DEV Community

Cover image for How to create bins with a monotonic weight of evidence trend in Excel
Wynn Tee for Quantitative Risk Solutions PLT

Posted on • Originally published at addin.qrstoolbox.com

How to create bins with a monotonic weight of evidence trend in Excel

Risk practitioners often apply weight of evidence (WOE) transformation to the independent variables of a logistic regression. A key challenge in the transformation is grouping data into bins that, when sorted logically, show a monotonic WOE trend.

QRS.BIN functions

QRS.BIN functions

QRS Toolbox for Excel includes functions for binning data with application to WOE transformation. The names of the functions all begin with QRS.BIN.

QRS.BIN.EQFREQ, QRS.BIN.EQWIDTH, and QRS.BIN.UNIQUE return bins using the equal frequency, equal width, and unique value methods respectively.

QRS.BIN.MONO returns bins using a novel combination of the Monotone Adjacent Pooling Algorithm and ChiMerge. It can be used to automatically create bins with a monotonic WOE trend.

QRS.BIN.MANUAL returns bins using user-defined criteria. It can be used to manually create bins with a monotonic WOE trend, such as by merging bins returned from one of the other QRS.BIN functions.

QRS.BIN.MAP returns mapped bin values. It can be used to map original data values to WOE values for subsequent use in QRS.LOGISTIC.REGRESSION.

QRS.BIN.MONO

To try QRS.BIN.MONO yourself, first add QRS Toolbox to your instance of Excel and start your free trial of QRS.BIN.MONO. Then, download and open the example workbook.

Data

The data in the workbook is based on the 2020 version of the South German credit dataset. In the workbook:

  • Cells A2–A1001 contain credit duration in months for 1000 debtors.
  • Cells B2–B1001 contain ones if credit defaults occurred and zeros otherwise.
=QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7)
Enter fullscreen mode Exit fullscreen mode

Formula

To group credit duration into 7 bins with a monotonic WOE trend, enter the formula =QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7) in cell E1. The third argument is 1, because credit risk generally increases as credit duration increases.

The result is a frequency table with 7 rows and 7 columns. The rows correspond to bins. The columns correspond to identifier, description, total frequency, frequency of zeros, frequency of ones, proportion of ones, and WOE respectively.

The WOE values in the final column increase monotonically, as required. Note that some applications assume a definition of WOE that differs by a negative sign.

=QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7, "HEADER", TRUE)
Enter fullscreen mode Exit fullscreen mode

Formula

To improve the presentation of the result, add "HEADER", TRUE to the formula. The result now contains a header row.

The identifiers in the first column represent the lower bounds of the bins. They can be used as binning keys in QRS.BIN.MAP.

=QRS.BIN.MAP(A2:A1001, E2:E9, K2:K9)
Enter fullscreen mode Exit fullscreen mode

Formula

Finally, to map the original credit duration values to WOE values, enter the formula =QRS.BIN.MAP(A2:A1001, E2:E9, K2:K9) in cell C2.

Please read the documentation to learn more about QRS.BIN.MONO, including how to include a total row and how to include only selected columns.

QRS.BIN.MANUAL

The bins created using QRS.BIN.MONO will always have a monotonic WOE trend, but they might not have "nice" lower bounds or "smooth" frequencies. For example, the lower bounds might not be integers, or the frequencies might not be unimodal.

Continuing with the example from the previous section, suppose that the bins are required to have not only a monotonic WOE trend, but also lower bounds that are multiples of 6 or 12 months.

=QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, , "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE")
Enter fullscreen mode Exit fullscreen mode

Formula

To create bins with such "nice" lower bounds, enter 0, 6, 12, 18, 24, 30, 36, 48, 60 in cells F15–F23, and enter the formula =QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, , "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE") in cell H14. The "EXC", TRUE option turns the lower bounds into exclusive, rather than inclusive, bounds.

The result contains 9 bins that satisfy the "nice" lower bounds requirement, but not the monotonic WOE trend requirement. The WOE of the ninth bin cannot even be calculated. A possible solution is to merge some of the bins.

=QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, E15:E23, "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE")
Enter fullscreen mode Exit fullscreen mode

Formula

To merge bins 3–4, 5–6, and 7–9, enter the new bin numbers 1, 2, 3, 3, 4, 4, 5, 5, 5 in cells E15–E23, and change the empty fourth argument in cell H14 to E15:E23.

The result now contains 5 bins that satisfy both the "nice" lower bounds and monotonic WOE trend requirements.

=QRS.BIN.MAP(E15:E23, H15:H19, K15:K19)
Enter fullscreen mode Exit fullscreen mode

Formula

Finally, to map the original credit duration values to WOE values, enter the formula =QRS.BIN.MAP(E15:E23, H15:H19, K15:K19) in cell G15.

=QRS.BIN.MAP(A2:A1001, F15:F23, G15:G23)
Enter fullscreen mode Exit fullscreen mode

Formula

Then, enter the formula =QRS.BIN.MAP(A2:A1001, F15:F23, G15:G23) in cell C2. The mapping is done in two steps, because the original values and WOE values are indirectly linked by the bin numbers.

Please read the documentation to learn more about QRS.BIN.MANUAL, including how to use regular expressions for text values and how to handle stray values.

Final remarks

If you would like to use QRS.BIN.MONO beyond your free trial period, you may purchase the right to use it indefinitely for as little as USD 9.00. If you find the QRS.BIN functions useful, please share this page with other potential users.

Discussion (0)