MakeTidyBins



library(tidybins)
suppressPackageStartupMessages(library(dplyr))

Bin Value

Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing task: binning accounts by their sales. For example, you might create 10 bins where each bin represents 10% of all market sales. The bin with the highest-sales accounts then contains the fewest accounts, whereas the bin with the lowest-sales accounts needs the largest number of accounts to reach 10% of market sales.
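The idea can be sketched in base R. This is a minimal illustration, not the tidybins implementation: bin_by_value_sketch is a hypothetical helper, the seed is arbitrary, and clamping negative values to 0 is a simplifying assumption. It sorts the values, tracks the running share of the total, and cuts that share into equal slices.

```r
# Hypothetical helper for illustration; not the tidybins implementation.
bin_by_value_sketch <- function(x, n_bins = 10L) {
  x_pos <- pmax(x, 0)                           # assumption: clamp negative values to 0
  ord <- order(x_pos)                           # ascending: bin 1 gets the smallest values
  cum_share <- cumsum(x_pos[ord]) / sum(x_pos)  # running share of the total
  bins <- integer(length(x))
  # Each value lands in the slice of the total that contains its running share;
  # pmax/pmin guard the edges (zero values and floating-point overshoot).
  bins[ord] <- as.integer(pmin(pmax(ceiling(cum_share * n_bins), 1), n_bins))
  bins
}

set.seed(42)  # arbitrary seed so the sketch is reproducible
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- bin_by_value_sketch(sales)

# Per-bin totals should be roughly equal; per-bin counts should not be.
bin_sums <- tapply(pmax(sales, 0), bins, sum)
bin_counts <- table(bins)
round(bin_sums / sum(bin_sums), 3)
bin_counts
```

Because bin edges can only fall between whole accounts, the per-bin totals are close to equal rather than exactly equal. The sketch also assumes the values have a positive total.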


# Simulate sales for 1,000 accounts (no seed is set, so values differ between runs)
tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data

sales_data %>% 
  bin_cols(SALES, bin_type = "value") -> sales_data1
#> Warning: SALES contains negative values. Negative values are treated as 0.

sales_data1
#> # A tibble: 1,000 x 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1  7979          2
#>  2  5475          1
#>  3 13642          9
#>  4  9723          4
#>  5 17671         10
#>  6  9517          4
#>  7 10351          5
#>  8  2162          1
#>  9 14162          9
#> 10 12246          7
#> # … with 990 more rows

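Notice that the sum (.sum) is nearly equal across bins 1 through 10, at roughly 1,000,000 each, while the number of accounts per bin (.count) grows from 63 in bin 10 to 181 in bin 1. The single negative value is placed in a separate bin 0.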

sales_data1 %>% 
  bin_summary() %>% 
  print(width = Inf)
#> # A tibble: 11 x 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 14780 15919. 20855     63       62
#>  2 SALES  equal value     10     9 13553 14046. 14723     72       72
#>  3 SALES  equal value     10     8 12562 13007. 13552     77       76
#>  4 SALES  equal value     10     7 11855 12179. 12546     82       76
#>  5 SALES  equal value     10     6 11110 11502. 11848     87       84
#>  6 SALES  equal value     10     5 10290 10705. 11105     94       88
#>  7 SALES  equal value     10     4  9381  9835. 10289    101       95
#>  8 SALES  equal value     10     3  8366  8872.  9373    113      111
#>  9 SALES  equal value     10     2  7103  7783.  8360    129      120
#> 10 SALES  equal value     10     1  1229  5517.  7094    181      177
#> 11 SALES  equal value     10     0 -1420 -1420  -1420      1        1
#>    relative_value    .sum  .med   .sd width
#>             <dbl>   <int> <dbl> <dbl> <int>
#>  1         100    1002884 15741 1089.  6075
#>  2          88.2  1011335 14017  356.  1170
#>  3          81.7  1001533 13012  281.   990
#>  4          76.5   998703 12140  216.   691
#>  5          72.3  1000691 11511  211.   738
#>  6          67.2  1006264 10736  256.   815
#>  7          61.8   993372  9797  266.   908
#>  8          55.7  1002549  8869  302.  1007
#>  9          48.9  1004040  7803  357.  1257
#> 10          34.7   998607  5727 1239.  5865
#> 11          -8.92   -1420 -1420   NA      0
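As a quick arithmetic check, the .sum column printed above can be turned into shares of the total. The snippet below is a sketch that copies those printed values by hand (printed_sums and shares are illustrative names, and bin 0 is left out), so it runs without the package.

```r
# .sum values copied from the summary output, bins 10 down to 1 (bin 0 excluded)
printed_sums <- c(1002884, 1011335, 1001533, 998703, 1000691,
                  1006264, 993372, 1002549, 1004040, 998607)
names(printed_sums) <- 10:1

# Share of total sales held by each bin
shares <- printed_sums / sum(printed_sums)
round(100 * shares, 2)   # percentages, one per bin
range(100 * shares)      # smallest and largest share
```

Each bin's share comes out close to 10%, which is the property that equal-value binning aims for.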