DEV Community

Judy

How to Use Python for Data Visualization in Credit Risk Assessment

Many people rely on credit to buy vehicles and real estate, pay for education through student loans, and start small businesses. For financial institutions, assessing credit risk data is crucial when deciding whether to approve a loan.

The dataset used for this credit risk assessment was sourced from kaggle.com. Start by loading the Python libraries used throughout the analysis:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Next, load the data from a CSV file:

data = pd.read_csv('credit_risk_dataset_test.csv')
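Before plotting anything, it can help to confirm that the file loaded with the columns the rest of the walkthrough expects. The sketch below is self-contained: it replaces the real CSV with a few made-up rows read through io.StringIO (a hypothetical stand-in for credit_risk_dataset_test.csv), and the expected-column set simply lists the names this article's plots use.

```python
import io

import pandas as pd

# Hypothetical stand-in for credit_risk_dataset_test.csv: made-up rows that
# use the column names referenced in this article. The empty field is a
# missing interest rate.
csv_text = """person_age,person_income,person_home_ownership,loan_intent,loan_grade,loan_amount,loan_int_rate,debt_to_income_ratio,cb_person_default_on_file
22,59000,RENT,PERSONAL,D,35000,16.02,0.59,Y
21,9600,OWN,EDUCATION,B,1000,11.14,0.10,N
25,9600,MORTGAGE,MEDICAL,C,5500,,0.57,N
"""

data = pd.read_csv(io.StringIO(csv_text))

# Columns the plots in this article rely on.
expected = {
    "person_age", "person_income", "person_home_ownership", "loan_intent",
    "loan_grade", "loan_amount", "loan_int_rate", "debt_to_income_ratio",
    "cb_person_default_on_file",
}
missing = expected - set(data.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

print(data.shape)  # (rows, columns)
print(data.head())
```

With the real file, replace io.StringIO(csv_text) with the file path.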

Once the data is loaded, clean it: fix any inconsistencies so they do not cause errors later in the analysis.
Start by checking the data type of each column:

print(data.dtypes)
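The article does not say which inconsistencies were fixed, so the following is only one possible approach, assuming the usual suspects are missing values and duplicate rows. It runs on a tiny made-up DataFrame, and filling missing interest rates with the column median is an illustrative choice, not the author's method.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the loaded dataset.
data = pd.DataFrame({
    "person_age": [22, 35, 35, 41],
    "loan_int_rate": [11.5, np.nan, np.nan, 13.2],
    "loan_grade": ["B", "C", "C", "A"],
})

print(data.dtypes)

# Count missing values per column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Drop exact duplicate rows (rows 1 and 2 are identical here; pandas
# treats NaN values as equal when checking for duplicates).
data = data.drop_duplicates()

# One possible choice (an assumption): fill missing interest rates
# with the median of the remaining values.
data["loan_int_rate"] = data["loan_int_rate"].fillna(data["loan_int_rate"].median())
print(data)
```

Whether to impute or drop rows with missing values depends on how many are missing and why; the counts printed first help make that call.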

This dataset contains a mix of data types, which can make manipulation harder, so convert the numeric columns to floats. I prefer floats because they represent values on plots accurately and work well with a wide range of libraries.

# Convert the numeric columns to float so they can be plotted
data['person_income'] = data['person_income'].astype(float)
data['loan_amount'] = data['loan_amount'].astype(float)
data['loan_int_rate'] = data['loan_int_rate'].astype(float)
data['debt_to_income_ratio'] = data['debt_to_income_ratio'].astype(float)
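astype(float) raises a ValueError if a column contains any text that cannot be parsed as a number. A more forgiving variant, sketched here on made-up values (the "n/a" entry is a hypothetical malformed value), uses pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they can be inspected or dropped afterwards.

```python
import pandas as pd

# Made-up values; "n/a" stands in for a malformed entry.
data = pd.DataFrame({
    "person_income": [59000, 9600, 65500],
    "loan_amount": ["35000", "1000", "n/a"],
})

numeric_cols = ["person_income", "loan_amount"]
for col in numeric_cols:
    # errors="coerce" turns values that cannot be parsed into NaN;
    # astype(float) then guarantees a float64 column.
    data[col] = pd.to_numeric(data[col], errors="coerce").astype(float)

print(data.dtypes)
print(data.isna().sum())  # how many values could not be converted
```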

Once the data is cleaned, visualize it. A histogram shows the distribution of borrower ages: in this dataset, most borrowers are between 20 and 40 years old, and the largest group is in their 20s.

plt.figure(figsize=(8, 6))
sns.histplot(data['person_age'], bins=20, kde=True)
plt.title('Distribution of Person Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
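The visual impression from a histogram can be backed with numbers. This sketch, run on made-up ages rather than the article's data, bins ages into decades with pd.cut and computes the share of borrowers aged 20 to 39; the bin edges are an assumption you can adjust.

```python
import pandas as pd

# Made-up ages for illustration only.
ages = pd.Series([21, 22, 23, 25, 27, 29, 31, 34, 38, 45, 52, 60])

# Group ages into decade-wide bins: [20, 30), [30, 40), ..., [60, 70).
bins = [20, 30, 40, 50, 60, 70]
age_groups = pd.cut(ages, bins=bins, right=False)
counts = age_groups.value_counts().sort_index()  # counts in bin order
print(counts)

# Share of borrowers aged 20 to 39 (between() is inclusive on both ends).
share_20_to_39 = ages.between(20, 39).mean()
print(f"{share_20_to_39:.0%} of borrowers are aged 20-39")
```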

Next, create a boxplot to visualize the distribution of loan amounts by loan grade. Each box represents one loan grade and shows the median, the quartiles, and any potential outliers of the loan amounts for that grade, which makes the boxplot a useful tool for comparing loan amounts across grades. In this dataset, Grade F loans have a higher median loan amount than the other grades.

plt.figure(figsize=(10, 6))
sns.boxplot(data=data, x='loan_grade', y='loan_amount')
plt.title('Loan Amount Distribution by Loan Grade')
plt.xlabel('Loan Grade')
plt.ylabel('Loan Amount')
plt.show()
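The quantities a boxplot draws can also be computed directly, which makes a claim like "this grade has the highest median" easy to check. The sketch uses made-up loan amounts, not the article's data. Note that seaborn's whiskers additionally extend up to 1.5 times the interquartile range by default, and points beyond that are drawn as outliers.

```python
import pandas as pd

# Made-up loans for illustration only.
data = pd.DataFrame({
    "loan_grade": ["A", "A", "A", "B", "B", "B", "F", "F", "F"],
    "loan_amount": [1000, 5000, 8000, 2000, 6000, 9000, 10000, 15000, 20000],
})

# The box of a boxplot: first quartile, median and third quartile per grade.
summary = (
    data.groupby("loan_grade")["loan_amount"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]  # height of each box
print(summary)

# Grade with the highest median loan amount.
print(summary["median"].idxmax())
```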

Next, use a scatter plot to explore the relationship between a borrower's debt-to-income ratio, their interest rate, and whether they have a default on file (the cb_person_default_on_file column). A higher debt-to-income ratio means that a larger share of the borrower's income is already committed to debt payments. In this dataset, loans with lower interest rates and lower debt-to-income ratios tend to have no default on file, while defaults appear among loans with higher interest rates.

plt.figure(figsize=(8, 6))
sns.scatterplot(data=data, x='debt_to_income_ratio', y='loan_int_rate', hue='cb_person_default_on_file')
plt.title('Debt-to-Income Ratio vs. Interest Rate')
plt.xlabel('Debt-to-Income Ratio')
plt.ylabel('Interest Rate')
plt.legend(title='Default')
plt.show()
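Scatter plots with many overlapping points can be hard to read, so a numeric summary per group is a useful complement. The following sketch, on made-up rows, averages the debt-to-income ratio and interest rate for each value of cb_person_default_on_file and computes the Pearson correlation between the two plotted axes.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "debt_to_income_ratio": [0.10, 0.15, 0.20, 0.45, 0.50, 0.60],
    "loan_int_rate": [7.5, 8.0, 9.0, 14.0, 15.5, 16.0],
    "cb_person_default_on_file": ["N", "N", "N", "Y", "N", "Y"],
})

# Average ratio and rate for each default flag ("N" and "Y").
by_default = data.groupby("cb_person_default_on_file")[
    ["debt_to_income_ratio", "loan_int_rate"]
].mean()
print(by_default)

# Pearson correlation between the scatter plot's x and y axes.
corr = data["debt_to_income_ratio"].corr(data["loan_int_rate"])
print(round(corr, 3))
```

A correlation only summarizes a linear trend; it does not show that a higher debt-to-income ratio causes a higher rate.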

Next, a count plot shows how common each type of home ownership is among borrowers. In this dataset, "RENT" is the most common type, followed by "MORTGAGE", while "OWN" has the fewest borrowers.
Understanding these borrower characteristics can help identify factors that affect credit risk.

plt.figure(figsize=(8, 6))
sns.countplot(data=data, x='person_home_ownership')
plt.title('Count of Home Ownership Types')
plt.xlabel('Home Ownership')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
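The bars in a count plot are just category frequencies. This sketch, using made-up labels, reproduces them with value_counts and also expresses them as proportions, which are easier to compare across datasets of different sizes.

```python
import pandas as pd

# Made-up home ownership labels for illustration only.
ownership = pd.Series(["RENT", "RENT", "MORTGAGE", "RENT", "OWN", "MORTGAGE"])

counts = ownership.value_counts()                # the numbers the count plot draws
shares = ownership.value_counts(normalize=True)  # proportions that sum to 1

print(counts)
print(shares.round(3))
```

Passing order=data['person_home_ownership'].value_counts().index to sns.countplot sorts the bars from most to least common.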

Lastly, a scatter plot explores the relationship between borrower income and loan amount. In this dataset, most borrowers are low-income earners, and they take out loans for a variety of purposes (loan_intent).

plt.figure(figsize=(10, 6))
sns.scatterplot(x='person_income', y='loan_amount', data=data, hue='loan_intent', palette='Dark2')
plt.xlabel('Person Income')
plt.ylabel('Loan Amount')
plt.title('Person Income vs. Loan Amount')
plt.show()
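Income data is often right-skewed, so a handful of high earners can squash most points into one corner of the scatter plot. This sketch, on made-up rows, compares the median and mean income, adds a hypothetical loan_to_income column averaged per loan_intent, and estimates the Spearman rank correlation by correlating the ranks of the two columns (equivalent to Spearman's method and avoids needing SciPy).

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "person_income": [12000, 18000, 25000, 30000, 45000, 250000],
    "loan_amount": [2000, 3500, 5000, 6000, 8000, 20000],
    "loan_intent": ["EDUCATION", "MEDICAL", "PERSONAL", "EDUCATION", "VENTURE", "PERSONAL"],
})

# One high earner pulls the mean far above the median.
median_income = data["person_income"].median()
mean_income = data["person_income"].mean()
print(median_income, round(mean_income, 2))

# Hypothetical derived column: loan amount as a share of income.
data["loan_to_income"] = data["loan_amount"] / data["person_income"]
ratio_by_intent = data.groupby("loan_intent")["loan_to_income"].mean()
print(ratio_by_intent.round(3))

# Spearman correlation = Pearson correlation of the ranks;
# it is less sensitive to extreme incomes than Pearson on raw values.
spearman = data["person_income"].rank().corr(data["loan_amount"].rank())
print(round(spearman, 3))
```

If the real plot looks crowded near zero, calling plt.xscale('log') before plt.show() spreads out the low-income range.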

Because Python is open source, it gives you access to a wide range of libraries that can handle large datasets and are easy to customize.

To view the output of the code above, visit github.com

Thanks for reading!
