DEV Community

Cover image for Financial Risk Analysis in Excel: How I Used COUNTIFS, AVERAGEIF & Pivot Tables to Uncover Loan Default Patterns
Maurine Nyongesa
Maurine Nyongesa

Posted on

Financial Risk Analysis in Excel: How I Used COUNTIFS, AVERAGEIF & Pivot Tables to Uncover Loan Default Patterns

Tool: Microsoft Excel | Dataset: 1,000 loans | Period: 2018–2023

Overview

This project is a comprehensive loan portfolio analysis built entirely in Excel. The goal was to understand how borrower characteristics influence loan issuance, default behavior, and overall portfolio risk — and to answer five specific credit risk questions that financial institutions deal with daily:

  1. What Is the Overall Health of the Loan Portfolio? — 37.5% of 1,000 loans ended in default or charge-off. That number alone demands investigation.
  2. Does Loan Grade Reliably Predict Default Risk? Grade A and B loans recorded 0% bad loan rates. Grade D hit 79%. Grade E reached 100%.
  3. Which Borrower Characteristics Drive the Highest Risk? Credit score below 650 produced a 77–100% bad loan rate. DTI above 30% pushed default rates to nearly 45%.
  4. Does Repayment Period Influence Default Probability? 48-month loans carried the highest bad rate at 39.6%, marginally ahead of 36-month (38.8%) and 60-month (34.1%) loans.
  5. Has Underwriting Quality Changed Over Time? Bad loan rates fluctuated between 33.3% (2021) and 41.5% (2023), with 2023 marking the worst recent vintage.

Through segmentation analysis, pivot modeling, and time-trend evaluation, this analysis simulates how a financial institution would assess portfolio health, risk concentration, and lending strategy — combining issuance behavior, risk drivers, and repayment performance into a single unified framework.
The objective was not just to visualize data. It was to let the data answer something.


The Dataset

1,000 loan records across 13 columns, covering two things: who the borrower is, and what happened to the loan.

Column What It Tells Us
Loan_ID Unique loan identifier
Issue_Date When the loan was issued (2018–2023)
Repayment_Period (Months) Loan term — 36, 48, or 60 months
Loan_Amount Principal borrowed (USD)
Interest_Rate (%) Annual interest rate on the loan
Annual_Income Borrower's declared yearly income (USD)
Debt_to_Income_Ratio (%) Monthly debt obligations as % of gross income
Employment_Length (Years) Years employed at time of application
Home_Ownership Rent / Own / Mortgage
Loan_Purpose Medical, Car, Business, Education, Home Improvement, Debt Consolidation
Credit_Score Borrower credit score at issuance (550–800)
Loan_Grade Lender-assigned risk grade — A (lowest risk) to E (highest)
Loan_Status The target variable — Fully Paid / Default / Charged Off

What This Analysis Covers

  1. Portfolio Overview — baseline KPIs: total loans, volume, fully paid rate, bad loan rate, average credit score, interest rate, and income.
  2. Risk Segmentation — bad loan rates broken down by loan grade, credit score band, DTI bracket, loan purpose, and home ownership.
  3. Repayment & Issuance Trends — year-over-year portfolio performance from 2018–2023 and default rates across 36, 48, and 60-month terms
  4. Grade × Status Cross-Tabulation — how each loan grade distributes across Fully Paid, Default, and Charged Off outcomes.
  5. Recommendations — data-backed actions for underwriting, risk concentration, and portfolio strategy All analysis was done in Microsoft Excel only — no Python, no SQL, no Power BI. Just formulas, pivot tables, conditional formatting, and charts.

Setting Up the Analysis — Helper Columns First

Before writing a single summary formula, two helper columns were added directly to the raw data sheet. These columns powered every calculation that followed.

Helper Column 1 — Bad Loan Flag

A new column called Bad_Loan was created using:

=IF(OR(M2="Default",M2="Charged Off"),1,0)

This assigned a 1 to every loan that defaulted or was charged off, and a 0 to every fully paid loan. Because the column contains only 1s and 0s, taking the average of this column at any level of segmentation instantly returns the bad loan rate as a decimal.

Helper Column 2 — Issue Year

A Year column was extracted from the Issue Date column for all trend analysis:

=YEAR(B2)

Every segmentation formula then followed one of these patterns:

-- Overall bad loan rate
=AVERAGE(N2:N1001)

-- Segmented bad loan rate (one condition)
=AVERAGEIF(criteria_range, criteria, N2:N1001)

-- Segmented bad loan rate (multiple conditions)
=AVERAGEIFS(N2:N1001, criteria_range_1, criteria_1, criteria_range_2, criteria_2)
Enter fullscreen mode Exit fullscreen mode

Note on AVERAGEIF vs AVERAGEIFS: The argument order is different and easy to mix up.

AVERAGEIF → range, criteria, average_range (average range last)

AVERAGEIFS → average_range, criteria_range, criteria (average range first)

Finding 1 — The Portfolio is Significantly Non-Performing

The first question was straightforward: what does the overall portfolio health look like?

Formulas used:

-- Count per status
=COUNTIF($M$2:$M$1001,"Fully Paid")
=COUNTIF($M$2:$M$1001,"Default")
=COUNTIF($M$2:$M$1001,"Charged Off")

-- Percentage of portfolio
=COUNTIF($M$2:$M$1001,"Fully Paid")/COUNTA($A$2:$A$1001)

-- Bad loan rate (using helper column)
=AVERAGE($N$2:$N$1001)
Enter fullscreen mode Exit fullscreen mode

What this tells us: More than one in three loans in this portfolio failed to perform. A 37.5% bad loan rate would trigger immediate portfolio review in any real lending institution. But an overall rate alone doesn't tell you where the problem is — that requires segmentation.

Finding 2 — Loan Grade is a Near-Perfect Risk Predictor

Loan grade is a risk classification assigned by the lender at issuance — A being the safest and E the most risky. It factors in credit score, income, employment, and DTI. The question was whether the grade system actually predicted outcomes.



Formulas used:

-- Bad loan rate per grade
=AVERAGEIF($L$2:$L$1001,"A",$N$2:$N$1001)

-- Count per grade
=COUNTIF($L$2:$L$1001,"A")

-- Average credit score per grade
=AVERAGEIF($L$2:$L$1001,"A",$K$2:$K$1001)

-- Average interest rate per grade
=AVERAGEIF($L$2:$L$1001,"A",$E$2:$E$1001)
Enter fullscreen mode Exit fullscreen mode

What this tells us: The grade system worked almost perfectly as a predictor. Grades A and B — 380 loans combined — produced zero bad loans. Grade C had just one charge-off out of 203. Then the cliff edge. Grade D collapsed to 79% and Grade E hit an absolute 100%. Not a single Grade E loan was repaid.

The most alarming detail however is not the default rates — it's the interest rates. Despite Grade E borrowers defaulting at 100%, their average interest rate (15.22%) was barely higher than Grade A (14.82%). The lender identified the risk through grading but failed to price it accordingly. Riskier borrowers should carry significantly higher rates to compensate for expected losses.

Finding 3 — Credit Score is the Single Strongest Default Predictor

Credit score represents a borrower's creditworthiness based on their historical borrowing and repayment behavior. A band analysis was conducted by first creating a helper column:

-- Credit band helper column
=IFS(K2<600,"550-600",K2<650,"600-650",K2<700,"650-700",K2<750,"700-750",K2<=800,"750-800")
Enter fullscreen mode Exit fullscreen mode



Formulas used:

-- Bad loan rate per credit band
=AVERAGEIF($O$2:$O$1001,"550-600",$N$2:$N$1001)

-- Count per credit band
=COUNTIF($O$2:$O$1001,"550-600")
Enter fullscreen mode Exit fullscreen mode

What this tells us: A credit score of 650 functions as a near-perfect binary cutoff. Below 600 — every single borrower defaulted or was charged off. The 600–650 band was nearly as bad at 76.7%. Cross 650 and the bad loan rate drops to exactly 0% — and stays there all the way to 800. No other variable in this dataset draws such a clean, decisive line between performing and non-performing loans. A hard underwriting rule requiring a minimum credit score of 650 would have eliminated the vast majority of bad loans in this portfolio.

Finding 4 — DTI Ratio is a Consistent Risk Escalator

Debt-to-Income ratio measures a borrower's monthly debt obligations as a percentage of their gross monthly income. The higher the DTI, the more financially stretched the borrower. A band analysis was conducted using:

-- DTI band helper column
=IFS(G2<=10,"0-10",G2<=20,"10-20",G2<=30,"20-30",G2<=40,"30-40",G2>40,"40+")
Enter fullscreen mode Exit fullscreen mode


Formulas used:

-- Bad loan rate per DTI band
=AVERAGEIF($P$2:$P$1001,"0-10",$N$2:$N$1001)

-- Average income per DTI band
=AVERAGEIF($P$2:$P$1001,"0-10",$F$2:$F$1001)  
Enter fullscreen mode Exit fullscreen mode

What this tells us: Unlike credit score which produces a sharp binary cutoff, DTI tells a gradual story. As debt burden increases the bad loan rate climbs steadily — from 27.1% at the lowest band to 44.9% at 30–40%. Critically, average credit scores were nearly identical across all DTI bands (668–676), confirming that DTI and credit score measure different dimensions of risk. A borrower can have an acceptable credit score but still be dangerously overleveraged — and this portfolio shows overleveraging carries real consequences regardless of score.

Finding 5 — No Loan Purpose is Safe



Formulas used:

-- Bad loan rate per purpose
=AVERAGEIF($J$2:$J$1001,"Medical",$N$2:$N$1001)

-- Bad loan count per purpose
=COUNTIFS($J$2:$J$1001,"Medical",$N$2:$N$1001,1)
Enter fullscreen mode Exit fullscreen mode

What this tells us: Across all six loan purposes bad loan rates ranged from 34.8% to 40.8% — a narrow 6 percentage point band.

Medical loans carried the highest risk, likely because borrowers taking medical loans are often already under financial stress at application. The narrow spread means loan purpose alone cannot be used to make reliable lending decisions. It needs to be combined with credit score and grade to add meaningful predictive value.

Finding 6 — Home Ownership Adds Almost No Predictive Value



Formulas used:

=AVERAGEIF($I$2:$I$1001,"Rent",$N$2:$N$1001)
=AVERAGEIF($I$2:$I$1001,"Own",$N$2:$N$1001)
=AVERAGEIF($I$2:$I$1001,"Mortgage",$N$2:$N$1001)
Enter fullscreen mode Exit fullscreen mode

What this tells us: The spread between all three groups is just 4.7 percentage points. Counterintuitively, outright homeowners performed worst of the three — directly challenging the conventional lending assumption that homeownership signals financial stability. In isolation home ownership is not a reliable risk indicator in this portfolio and should never be used as a standalone lending criterion.

Finding 7 — The Portfolio Has Been Deteriorating Since 2021

The trend analysis used the Year helper column combined with COUNTIFS and AVERAGEIFS to evaluate performance year over year.

Formulas used:

-- Loan count per year
=COUNTIF($O$2:$O$1001,A2)

-- Total volume per year
=SUMIF($O$2:$O$1001,A2,$D$2:$D$1001)

-- Bad loan rate per year
=AVERAGEIF($O$2:$O$1001,A2,$N$2:$N$1001)

-- Fully paid % per year
=COUNTIFS($O$2:$O$1001,A2,$M$2:$M$1001,"Fully Paid")/COUNTIF($O$2:$O$1001,A2)

-- Year over year bad rate change
=IFERROR((C3-C2)/C2,"-")  
Enter fullscreen mode Exit fullscreen mode

What this tells us: 2021 was the portfolio's strongest year — lowest bad loan rate (33.3%), highest fully paid rate (66.7%), and highest average credit score (682). From 2022 onwards the portfolio deteriorated consistently. By 2023 the bad loan rate hit 41.5% — the worst in the entire dataset — with the default rate alone reaching 24.5%, the highest single year figure across all six years. Average credit scores in 2022 and 2023 dropped back to 666, almost identical to 2018 levels. This pattern strongly suggests underwriting standards loosened after 2021.

Finding 8 — Grade D and Grade E Fail Differently

The cross-tabulation was built using a single COUNTIFS formula written once and dragged across the entire table:

-- Core cross-tabulation formula
=COUNTIFS($L$2:$L$1001,$A3,$M$2:$M$1001,B$2)
Enter fullscreen mode Exit fullscreen mode

The $A3 locks the column but lets the row move. B$2 locks the row but lets the column move — allowing the formula to be written once and dragged across all grade and status combinations automatically.



% Breakdown (referencing count table):

-- % of row total
=B3/$E3

What this tells us: Both Grade D and E record 100% bad loan rates but fail in completely different ways. Grade D loans predominantly charge off at 72.2% — the lender writes the debt off as unrecoverable in nearly three quarters of cases.

Grade E flips this pattern — 89.2% default while only 10.8% charge off, meaning the lender still pursues recovery on most Grade E failures. Since charge-offs represent total losses while defaults leave some recovery options open, Grade D is arguably more damaging to the lender than Grade E despite both recording identical bad loan rates.

Recommendations

  1. Enforce a Hard Credit Score Floor of 650 The data is unambiguous. No borrower above 650 defaulted. No borrower below 600 repaid. Implementing a minimum credit score of 650 as a hard underwriting rule would eliminate the vast majority of bad loans. 577 of 1,000 loans already sit above this threshold — meaning this rule would not dramatically reduce origination volume while significantly improving portfolio quality.
  2. Restrict Origination to Grades A, B, and C Grades A, B, and C combined produced just one bad loan out of 583. Grades D and E produced 374 bad loans out of 417. Concentrating origination in the top three grades while either eliminating or dramatically repricing Grades D and E would transform this portfolio's performance. At minimum, Grade D and E interest rates need to reflect the actual level of risk — not sit within 0.5% of Grade A rates.
  3. Investigate and Address the Post-2021 Deterioration The consistent worsening of bad loan rates from 2021 to 2023 — combined with declining average credit scores — points to a loosening of underwriting standards in recent years. The 2023 default rate of 24.5% is the highest in the dataset. Without intervention the trajectory points toward a 2024 bad loan rate exceeding 41.5%. Understanding what changed in underwriting policy after 2021 is the most urgent question this portfolio raises.

Conclusion

This analysis set out to answer one question — what does the data tell us about why loans go bad? The answer turned out to be surprisingly clean. Two variables dominate everything else: credit score and loan grade. Both draw near-perfect lines between performing and non-performing loans. DTI adds a secondary layer of independent risk. Everything else — purpose, home ownership, repayment period — contributes marginally at best.

The broader lesson for any data analyst working in financial risk is this: averages deceive, segmentation reveals. The average borrower in this portfolio looks reasonable — $82,924 income, 672 credit score, 9.8 years employed.

But segment by grade and credit score and the picture changes completely. The risk is not spread evenly across the portfolio. It is concentrated, identifiable, and largely predictable — which means it is also largely preventable.

Dataset sourced from a financial risk portfolio simulation.
Here is the Dataset: Financial Dataset

Top comments (0)