## DEV Community

Wynn Tee for Quantitative Risk Solutions PLT

Posted on • Originally published at addin.qrstoolbox.com

# How to create bins with a monotonic weight of evidence trend in Excel

Risk practitioners often apply weight of evidence (WOE) transformation to the independent variables of a logistic regression. A key challenge in the transformation is grouping data into bins that, when sorted logically, show a monotonic WOE trend.

## QRS.BIN functions

QRS Toolbox for Excel includes functions for binning data with application to WOE transformation. The names of the functions all begin with QRS.BIN.

QRS.BIN.EQFREQ, QRS.BIN.EQWIDTH, and QRS.BIN.UNIQUE return bins using the equal frequency, equal width, and unique value methods respectively.

QRS.BIN.MONO returns bins using a novel combination of the Monotone Adjacent Pooling Algorithm and ChiMerge. It can be used to automatically create bins with a monotonic WOE trend.

QRS.BIN.MANUAL returns bins using user-defined criteria. It can be used to manually create bins with a monotonic WOE trend, such as by merging bins returned from one of the other QRS.BIN functions.

QRS.BIN.MAP returns mapped bin values. It can be used to map original data values to WOE values for subsequent use in QRS.LOGISTIC.REGRESSION.

## QRS.BIN.MONO

The data in the workbook is based on the 2020 version of the South German credit dataset. In the workbook:

• Cells A2–A1001 contain credit duration in months for 1000 debtors.
• Cells B2–B1001 contain ones if credit defaults occurred and zeros otherwise.
``````=QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7)
``````

To group credit duration into 7 bins with a monotonic WOE trend, enter the formula `=QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7)` in cell E1. The third argument is 1, because credit risk generally increases as credit duration increases.

The result is a frequency table with 7 rows and 7 columns. The rows correspond to bins. The columns correspond to identifier, description, total frequency, frequency of zeros, frequency of ones, proportion of ones, and WOE respectively.

The WOE values in the final column increase monotonically, as required. Note that some applications assume a definition of WOE that differs by a negative sign.

``````=QRS.BIN.MONO(A2:A1001, B2:B1001, 1, 7, "HEADER", TRUE)
``````

To improve the presentation of the result, add `"HEADER", TRUE` to the formula. The result now contains a header row.

The identifiers in the first column represent the lower bounds of the bins. They can be used as binning keys in QRS.BIN.MAP.

``````=QRS.BIN.MAP(A2:A1001, E2:E9, K2:K9)
``````

Finally, to map the original credit duration values to WOE values, enter the formula `=QRS.BIN.MAP(A2:A1001, E2:E9, K2:K9)` in cell C2.

## QRS.BIN.MANUAL

The bins created using QRS.BIN.MONO will always have a monotonic WOE trend, but they might not have "nice" lower bounds or "smooth" frequencies. For example, the lower bounds might not be integers, or the frequencies might not be unimodal.

Continuing with the example from the previous section, suppose that the bins are required to have not only a monotonic WOE trend, but also lower bounds that are multiples of 6 or 12 months.

``````=QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, , "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE")
``````

To create bins with such "nice" lower bounds, enter 0, 6, 12, 18, 24, 30, 36, 48, 60 in cells F15–F23, and enter the formula `=QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, , "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE")` in cell H14. The `"EXC", TRUE` option turns the lower bounds into exclusive, rather than inclusive, bounds.

The result contains 9 bins that satisfy the "nice" lower bounds requirement, but not the monotonic WOE trend requirement. The WOE of the ninth bin cannot even be calculated. A possible solution is to merge some of the bins.

``````=QRS.BIN.MANUAL(A2:A1001, B2:B1001, F15:F23, E15:E23, "HEADER", TRUE, "EXC", TRUE, "COLS", "NO,DESC,N,WOE")
``````

To merge bins 3–4, 5–6, and 7–9, enter the new bin numbers 1, 2, 3, 3, 4, 4, 5, 5, 5 in cells E15–E23, and change the empty fourth argument in cell H14 to `E15:E23`.

The result now contains 5 bins that satisfy both the "nice" lower bounds and monotonic WOE trend requirements.

``````=QRS.BIN.MAP(E15:E23, H15:H19, K15:K19)
``````

Finally, to map the original credit duration values to WOE values, enter the formula `=QRS.BIN.MAP(E15:E23, H15:H19, K15:K19)` in cell G15.

``````=QRS.BIN.MAP(A2:A1001, F15:F23, G15:G23)
``````

Then, enter the formula `=QRS.BIN.MAP(A2:A1001, F15:F23, G15:G23)` in cell C2. The mapping is done in two steps, because the original values and WOE values are indirectly linked by the bin numbers.