This notebook applies a Random Forest classifier (RFC) to a finance problem: predicting whether a loan granted to an applicant is a good or bad credit risk.
German Credit Risk: Random Forest
Course: Finance & Machine Learning
What is Random Forest?
A Random Forest is an ensemble of many Decision Trees. Instead of relying on a single tree (which can overfit), it builds hundreds of trees on random subsets of data and features, then takes a majority vote for classification.
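The idea in the paragraph above can be sketched by hand: train several trees on bootstrap samples with a random feature subset per split, then let them vote. This is only an illustration on synthetic data from make_classification (an assumption, not the German Credit dataset), and the tree count of 25 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the German Credit dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):  # odd number of trees, so a binary vote cannot tie
    # Bootstrap sample: draw row indices with replacement
    rows = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(
        max_features="sqrt",                      # random feature subset per split
        random_state=int(rng.integers(1_000_000)),
    )
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Each tree votes; the forest predicts the class chosen by most trees
votes = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Training accuracy of the hand-built forest:", (forest_pred == y).mean())
```

Note that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than counting hard votes; the two usually agree, but they are not identical.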
Why use Random Forest for Credit Risk?
- Credit datasets are imbalanced (more good loans than bad)
- Credit decisions involve complex, non-linear relationships between features
- Random Forest is robust to outliers and works with mixed data types once they are encoded (missing values are imputed in Step 4)
- It provides feature importance, which is crucial for regulatory explainability in finance
Key Advantages over a single Decision Tree:
| Feature | Decision Tree | Random Forest |
|---|---|---|
| Overfitting | High risk | Low risk |
| Stability | Low (sensitive to data) | High |
| Accuracy | Moderate | Higher |
| Interpretability | High | Moderate |
| Handles imbalance | Poor | Better (with class_weight) |
Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
# Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
# Evaluation
from sklearn.metrics import (
classification_report, confusion_matrix,
roc_auc_score, roc_curve, ConfusionMatrixDisplay
)
# Save/Load model
import joblib
print("All libraries loaded successfully!")
All libraries loaded successfully!
Step 2: Load the Dataset
We load the German Credit Risk dataset.
- Each row represents a loan applicant.
- The target variable Risk tells us if the applicant was a good or bad credit risk.
df = pd.read_csv('data_german_credit.csv', index_col=0)
# df = pd.read_csv('https://raw.githubusercontent.com/ash322ash422/data/refs/heads/main/data_german_credit.csv', index_col=0)
df
| | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose | Risk |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67 | male | 2 | own | NaN | little | 1169 | 6 | radio/TV | good |
| 1 | 22 | female | 2 | own | little | moderate | 5951 | 48 | radio/TV | bad |
| 2 | 49 | male | 1 | own | little | NaN | 2096 | 12 | education | good |
| 3 | 45 | male | 2 | free | little | little | 7882 | 42 | furniture/equipment | good |
| 4 | 53 | male | 2 | free | little | little | 4870 | 24 | car | bad |
| … | … | … | … | … | … | … | … | … | … | … |
| 995 | 31 | female | 1 | own | little | NaN | 1736 | 12 | furniture/equipment | good |
| 996 | 40 | male | 3 | own | little | little | 3857 | 30 | car | good |
| 997 | 38 | male | 2 | own | little | NaN | 804 | 12 | radio/TV | good |
| 998 | 23 | male | 2 | free | little | little | 1845 | 45 | radio/TV | bad |
| 999 | 27 | male | 2 | own | moderate | moderate | 4576 | 45 | car | good |
1000 rows × 10 columns
# print(f"Dataset shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"Dataset shape: {df.shape}") # rows, columns
Dataset shape: (1000, 10)
# First 5 rows:
print(df.head())
First 5 rows:
Age Sex Job Housing Saving accounts Checking account Credit amount \
0 67 male 2 own NaN little 1169
1 22 female 2 own little moderate 5951
2 49 male 1 own little NaN 2096
3 45 male 2 free little little 7882
4 53 male 2 free little little 4870
Duration Purpose Risk
0 6 radio/TV good
1 48 radio/TV bad
2 12 education good
3 42 furniture/equipment good
4 24 car bad
# Quick summary of column types, etc
print(df.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Age               1000 non-null   int64
 1   Sex               1000 non-null   object
 2   Job               1000 non-null   int64
 3   Housing           1000 non-null   object
 4   Saving accounts   817 non-null    object
 5   Checking account  606 non-null    object
 6   Credit amount     1000 non-null   int64
 7   Duration          1000 non-null   int64
 8   Purpose           1000 non-null   object
 9   Risk              1000 non-null   object
dtypes: int64(4), object(6)
memory usage: 85.9+ KB
None
print("\nNumeric Statistics:")
print(df.describe())
Numeric Statistics:
Age Job Credit amount Duration
count 1000.000000 1000.000000 1000.000000 1000.000000
mean 35.546000 1.904000 3271.258000 20.903000
std 11.375469 0.653614 2822.736876 12.058814
min 19.000000 0.000000 250.000000 4.000000
25% 27.000000 2.000000 1365.500000 12.000000
50% 33.000000 2.000000 2319.500000 18.000000
75% 42.000000 2.000000 3972.250000 24.000000
max 75.000000 3.000000 18424.000000 72.000000
print("\nCategorical Statistics:")
print(df.describe(include='object'))
Categorical Statistics:
Sex Housing Saving accounts Checking account Purpose Risk
count 1000 1000 817 606 1000 1000
unique 2 3 4 3 8 2
top male own little little car good
freq 690 713 603 274 337 700
Step 3: Exploratory Data Analysis (EDA)
Before building any model, we need to understand the data. In finance, this is called “Know Your Data” (KYD). We check:
- Class imbalance in the target
- Distribution of key features
- Missing values
- Correlations
target_counts = df['Risk'].value_counts()
target_counts
Risk
good    700
bad     300
Name: count, dtype: int64
# --- 3.1 Visualize target class imbalance
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
target_counts = df['Risk'].value_counts()
colors = ['#2ecc71', '#e74c3c']
axes[0].bar(target_counts.index, target_counts.values, color=colors, edgecolor='black')
axes[0].set_title('Target Class Distribution (Count)', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Risk')
axes[0].set_ylabel('Count')
# Write the count above each bar
for i, v in enumerate(target_counts.values):
    axes[0].text(i, v + 5, str(v), ha='center', fontweight='bold')
axes[1].pie(target_counts.values, labels=target_counts.index,
            autopct='%1.1f%%', colors=colors, startangle=90,
            wedgeprops={'edgecolor': 'black'})
axes[1].set_title('Target Class Distribution (%)', fontsize=13, fontweight='bold')
plt.suptitle('Class Imbalance: ~70% Good vs ~30% Bad', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
print("\nNOTE: This 70/30 imbalance means a naive model that predicts 'good' every time")
print("      would get 70% accuracy, but it would MISS all bad risks! We must address this.")
NOTE: This 70/30 imbalance means a naive model that predicts 'good' every time
      would get 70% accuracy, but it would MISS all bad risks! We must address this.
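The naive-model claim above can be checked with a few lines. The sketch below rebuilds only the label counts (700 good, 300 bad) rather than loading the CSV, and uses scikit-learn's DummyClassifier as the "always predict good" baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Labels mirroring the dataset's 700 good (1) / 300 bad (0) split
y = np.array([1] * 700 + [0] * 300)
X = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# Always predicts the most frequent class, i.e. 'good'
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
bad_recall = recall_score(y, y_pred, pos_label=0)  # share of bad loans caught
print(f"Accuracy: {acc:.2f}")
print(f"Recall on bad risk: {bad_recall:.2f}")
```

Any real model should be judged against this baseline: 70% accuracy with zero recall on the bad class is the floor, not an achievement.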
# --- 3.2 Missing values. Ideally these are handled after the split into train + test;
# to keep it simple, they are handled early here.
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
missing_df = pd.DataFrame({'Missing Count': missing, 'Missing %': missing_pct})
missing_df = missing_df[missing_df['Missing Count'] > 0].sort_values('Missing %', ascending=False)
print("Missing Values Summary:")
print(missing_df)
# Visualise
if not missing_df.empty:
    missing_df['Missing %'].plot(kind='bar', color='#e67e22', figsize=(8, 4), edgecolor='black')
    plt.title('Missing Values by Feature (%)', fontweight='bold')
    plt.ylabel('Missing %')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
Missing Values Summary:
Missing Count Missing %
Checking account 394 39.4
Saving accounts 183 18.3
# --- 3.3 Age, credit amount, and duration distributions by Risk
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
# Age
df.groupby('Risk')['Age'].plot(kind='hist', bins=20, alpha=0.6, ax=axes[0],
legend=True, color=['#2ecc71', '#e74c3c'])
axes[0].set_title('Age Distribution by Risk', fontweight='bold')
axes[0].set_xlabel('Age')
# Credit Amount
df.groupby('Risk')['Credit amount'].plot(kind='hist', bins=20, alpha=0.6,
ax=axes[1], legend=True)
axes[1].set_title('Credit Amount by Risk', fontweight='bold')
axes[1].set_xlabel('Credit Amount')
# Duration
df.groupby('Risk')['Duration'].plot(kind='hist', bins=20, alpha=0.6,
ax=axes[2], legend=True)
axes[2].set_title('Loan Duration by Risk', fontweight='bold')
axes[2].set_xlabel('Duration (months)')
plt.tight_layout()
plt.show()
# --- 3.4 (1) Categorical features vs Risk
ct = pd.crosstab(df["Sex"], df['Risk'], normalize='index') * 100
ct
| Sex | bad | good |
|---|---|---|
| female | 35.161290 | 64.838710 |
| male | 27.681159 | 72.318841 |
# --- 3.4 (2) Categorical features vs Risk
cat_features = ['Sex', 'Housing', 'Saving accounts', 'Checking account', 'Purpose']
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()
# One bar chart per categorical feature: share of good/bad within each category
for i, col in enumerate(cat_features):
    ct = pd.crosstab(df[col], df['Risk'], normalize='index') * 100
    ct.plot(kind='bar', ax=axes[i], color=['#e74c3c', '#2ecc71'],
            edgecolor='black', width=0.7)
    axes[i].set_title(f'{col} vs Risk (%)', fontweight='bold')
    axes[i].set_xlabel('')
    axes[i].set_ylabel('Percentage')
    axes[i].legend(title='Risk')
    axes[i].tick_params(axis='x', rotation=30)
axes[-1].axis('off')  # hide unused subplot
plt.suptitle('Categorical Features vs Credit Risk', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
Step 4: Preprocessing
Random Forest can handle many data types, but sklearn requires numerical inputs. We need to:
- Encode categorical variables using Label Encoding
- Handle missing values: older scikit-learn releases reject NaN in Random Forest inputs, and the missing entries here sit in categorical columns that must be encoded anyway, so we impute with the mode (most frequent value)
- Address class imbalance using class_weight='balanced' inside the model (no need for SMOTE)
Finance Insight: Label encoding is acceptable here because Random Forest splits on thresholds and can isolate any category with a few successive splits, so the arbitrary integer order matters far less than it would in a linear model.
# --- 4.1 Copy the data and encode the target
data = df.copy()
# Encode target: good=1, bad=0
data['Risk'] = data['Risk'].map({'good': 1, 'bad': 0})
data
| | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose | Risk |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67 | male | 2 | own | NaN | little | 1169 | 6 | radio/TV | 1 |
| 1 | 22 | female | 2 | own | little | moderate | 5951 | 48 | radio/TV | 0 |
| 2 | 49 | male | 1 | own | little | NaN | 2096 | 12 | education | 1 |
| 3 | 45 | male | 2 | free | little | little | 7882 | 42 | furniture/equipment | 1 |
| 4 | 53 | male | 2 | free | little | little | 4870 | 24 | car | 0 |
| … | … | … | … | … | … | … | … | … | … | … |
| 995 | 31 | female | 1 | own | little | NaN | 1736 | 12 | furniture/equipment | 1 |
| 996 | 40 | male | 3 | own | little | little | 3857 | 30 | car | 1 |
| 997 | 38 | male | 2 | own | little | NaN | 804 | 12 | radio/TV | 1 |
| 998 | 23 | male | 2 | free | little | little | 1845 | 45 | radio/TV | 0 |
| 999 | 27 | male | 2 | own | moderate | moderate | 4576 | 45 | car | 1 |
1000 rows × 10 columns
# --- 4.2 Impute missing values with mode (most frequent)
# 'Saving accounts' and 'Checking account' have NAs, which likely mean 'none/no account'
cat_cols = data.select_dtypes(include='object').columns.tolist()
print(f"Categorical columns: {cat_cols}")
for col in cat_cols:
    mode_val = data[col].mode()[0]
    missing_count = data[col].isnull().sum()
    # Assign the result back; inplace=True on a column selection is unreliable in recent pandas
    data[col] = data[col].fillna(mode_val)
    if missing_count > 0:
        print(f"  -> '{col}': filled {missing_count} NAs with mode = '{mode_val}'")
Categorical columns: ['Sex', 'Housing', 'Saving accounts', 'Checking account', 'Purpose']
  -> 'Saving accounts': filled 183 NAs with mode = 'little'
  -> 'Checking account': filled 394 NAs with mode = 'little'
data.isnull().sum()
Age                 0
Sex                 0
Job                 0
Housing             0
Saving accounts     0
Checking account    0
Credit amount       0
Duration            0
Purpose             0
Risk                0
dtype: int64
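The notebook imputes before the train/test split to keep things simple. A leakage-free variant could look like the sketch below, where the imputer and encoder are fitted on the training rows only. The tiny frame is made up for illustration, and the choice of SimpleImputer plus OrdinalEncoder inside a Pipeline is an assumption, not what the notebook itself does.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Tiny made-up frame with the same kind of columns (hypothetical values)
toy = pd.DataFrame({
    "Age": [25, 40, 33, 58, 29, 46, 37, 51],
    "Checking account": pd.Series(
        ["little", np.nan, "moderate", "little", np.nan, "rich", "little", "moderate"],
        dtype="object",
    ),
    "Risk": [0, 1, 1, 1, 0, 1, 0, 1],
})
X, y = toy.drop(columns="Risk"), toy["Risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

categorical = ["Checking account"]
preprocess = ColumnTransformer(
    [("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # mode learned from train only
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]), categorical)],
    remainder="passthrough",  # numeric columns pass through unchanged
)
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)),
])

model.fit(X_train, y_train)          # imputer and encoder see training rows only
test_pred = model.predict(X_test)    # the same fitted transforms are reused here
print(test_pred)
```

Because the preprocessing lives inside the pipeline, the same object can be saved and applied to new applicants without re-deriving modes or category codes by hand.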
# --- 4.3 Label encode all categorical columns
le = LabelEncoder()
encoding_map = {}  # store mapping for interpretability
for col in cat_cols:
    data[col] = le.fit_transform(data[col])  # refit the encoder on each column
    encoding_map[col] = dict(zip(le.classes_, le.transform(le.classes_)))
print("\n Encoding complete. Mappings:")
for col, mapping in encoding_map.items():
    print(f"  {col}: {mapping}")
Encoding complete. Mappings:
Sex: {'female': 0, 'male': 1}
Housing: {'free': 0, 'own': 1, 'rent': 2}
Saving accounts: {'little': 0, 'moderate': 1, 'quite rich': 2, 'rich': 3}
Checking account: {'little': 0, 'moderate': 1, 'rich': 2}
Purpose: {'business': 0, 'car': 1, 'domestic appliances': 2, 'education': 3, 'furniture/equipment': 4, 'radio/TV': 5, 'repairs': 6, 'vacation/others': 7}
# --- 4.4 Correlation heatmap (after encoding)
plt.figure(figsize=(10, 8))
corr = data.corr()
print("correlation matrix:\n",corr,"\n")
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='RdYlGn',
center=0, linewidths=0.5, square=True)
plt.title('Correlation Heatmap (Encoded Features)', fontweight='bold', fontsize=13)
plt.tight_layout()
plt.show()
print("\nNote: Random Forest does NOT require uncorrelated features (unlike linear models).")
print("      High correlation between features is not a problem here.")
correlation matrix:
Age Sex Job Housing Saving accounts \
Age 1.000000 0.161694 0.015673 -0.301419 0.015772
Sex 0.161694 1.000000 0.070298 -0.219844 -0.014425
Job 0.015673 0.070298 1.000000 -0.107191 -0.034596
Housing -0.301419 -0.219844 -0.107191 1.000000 0.043324
Saving accounts 0.015772 -0.014425 -0.034596 0.043324 1.000000
Checking account -0.027176 -0.012705 -0.043277 -0.028196 0.015763
Credit amount 0.032716 0.093482 0.285385 -0.135632 -0.077929
Duration -0.036136 0.081432 0.210910 -0.157049 -0.043274
Purpose -0.074084 -0.063231 -0.025326 0.020633 -0.024817
Risk 0.091127 0.075493 -0.032735 -0.019315 0.102751
Checking account Credit amount Duration Purpose \
Age -0.027176 0.032716 -0.036136 -0.074084
Sex -0.012705 0.093482 0.081432 -0.063231
Job -0.043277 0.285385 0.210910 -0.025326
Housing -0.028196 -0.135632 -0.157049 0.020633
Saving accounts 0.015763 -0.077929 -0.043274 -0.024817
Checking account 1.000000 0.006953 0.004163 0.018577
Credit amount 0.006953 1.000000 0.624984 -0.151720
Duration 0.004163 0.624984 1.000000 -0.083459
Purpose 0.018577 -0.151720 -0.083459 1.000000
Risk -0.052375 -0.154739 -0.214927 0.061145
Risk
Age 0.091127
Sex 0.075493
Job -0.032735
Housing -0.019315
Saving accounts 0.102751
Checking account -0.052375
Credit amount -0.154739
Duration -0.214927
Purpose 0.061145
Risk 1.000000
Note: Random Forest does NOT require uncorrelated features (unlike linear models).
      High correlation between features is not a problem here.
# --- 4.5 Train-test split
X = data.drop('Risk', axis=1)
y = data['Risk']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42,
stratify=y # stratify preserves 70/30 split
)
print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"\nClass balance in training set:")
print(y_train.value_counts(normalize=True).rename({1: 'good', 0: 'bad'}).apply(lambda x: f"{x:.1%}"))
Training set: 800 samples
Test set: 200 samples

Class balance in training set:
Risk
good    70.0%
bad     30.0%
Name: proportion, dtype: object
Step 5: Train the Random Forest
Key Hyperparameters to understand:
| Parameter | What it controls | Our choice |
|---|---|---|
| n_estimators | Number of trees in the forest | 200 |
| max_depth | How deep each tree can grow | None (full) |
| max_features | Features considered at each split | 'sqrt' |
| class_weight | Penalise misclassifying minority class | 'balanced' |
| random_state | Reproducibility | 42 |
class_weight='balanced' is our fix for the 70/30 imbalance. It automatically weights bad-risk samples ~2.3× more heavily than good-risk samples, so the model pays equal attention to both classes.
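The ~2.3× figure follows from the 'balanced' formula, weight = n_samples / (n_classes × class count). A short check, using the 560 good / 240 bad counts implied by the stratified 800-row training set:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Training labels from the stratified split: 560 good (1) and 240 bad (0)
y_train = np.array([1] * 560 + [0] * 240)

# 'balanced' weight for class c = n_samples / (n_classes * count(c))
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_train)
w_bad, w_good = weights

print(f"bad weight  = {w_bad:.3f}")    # 800 / (2 * 240)
print(f"good weight = {w_good:.3f}")   # 800 / (2 * 560)
print(f"ratio       = {w_bad / w_good:.2f}")
```

The ratio equals 560 / 240, so each class ends up contributing the same total weight to training.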
# --- 5.1 Train Random Forest
rf_model = RandomForestClassifier(
    n_estimators=200,         # 200 decision trees in the ensemble
    max_features='sqrt',      # sqrt(n_features) features considered per split; reduces correlation between trees
    class_weight='balanced',  # KEY: handles the 70/30 imbalance automatically
    random_state=42,
    n_jobs=-1                 # use all CPU cores
)
rf_model.fit(X_train, y_train)
print("Random Forest trained successfully!")
print(f" Number of trees: {rf_model.n_estimators}")
print(f" Number of features: {rf_model.n_features_in_}")
Random Forest trained successfully!
 Number of trees: 200
 Number of features: 9
# --- 5.2 Feature Importance: the 'why' behind predictions
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=True)
print("feature_importance:\n", feature_importance)
plt.figure(figsize=(9, 6))
bars = plt.barh(feature_importance['Feature'], feature_importance['Importance'],
                color='steelblue', edgecolor='black')
plt.xlabel('Feature Importance (Mean Decrease in Impurity)', fontsize=11)
plt.title('Random Forest: Feature Importance', fontweight='bold', fontsize=13)
# Label each bar with its importance value
for bar, val in zip(bars, feature_importance['Importance']):
    plt.text(val + 0.002, bar.get_y() + bar.get_height()/2,
             f'{val:.3f}', va='center', fontsize=9)
plt.tight_layout()
plt.show()
print("\nFinance Insight: Feature importance helps satisfy regulatory explainability")
print("   requirements (e.g., Basel III, GDPR 'right to explanation').")
feature_importance:
Feature Importance
1 Sex 0.032913
3 Housing 0.044831
5 Checking account 0.045936
4 Saving accounts 0.047025
2 Job 0.057612
8 Purpose 0.096604
7 Duration 0.177227
0 Age 0.209685
6 Credit amount 0.288167
Finance Insight: Feature importance helps satisfy regulatory explainability
   requirements (e.g., Basel III, GDPR 'right to explanation').
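The importances above are impurity-based (Mean Decrease in Impurity), which is computed on training data and tends to favour features with many distinct values such as Credit amount and Age. As a cross-check, permutation importance on held-out data could be used. The sketch below runs on synthetic data (an assumption), so the feature names and numbers are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the German Credit dataset)
X, y = make_classification(n_samples=600, n_features=6, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle one feature at a time on the test set and measure the drop in AUC
result = permutation_importance(
    rf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0
)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

If the two rankings disagree strongly on the real data, the permutation ranking is usually the safer one to report.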
Step 6: Test on 1 Sample
Let’s simulate what happens when a new loan applicant comes in. We pass their data through the model and get a prediction with probability.
# --- 6.1 Pick one sample from the test set
sample_idx = 0
sample = X_test.iloc[[sample_idx]]  # double brackets keep it as a DataFrame
actual_label = y_test.iloc[sample_idx]
# Predict
prediction = rf_model.predict(sample)[0]
probabilities = rf_model.predict_proba(sample)[0]
label_map = {1: 'GOOD', 0: 'BAD'}
print("=" * 50)
print(" LOAN APPLICATION DECISION")
print("=" * 50)
print(f"\nApplicant features:")
print(sample.to_string())
print(f"\nModel Prediction : {label_map[prediction]}")
print(f"Actual Label : {label_map[actual_label]}")
print(f"\nProbability of BAD risk : {probabilities[0]:.1%}")
print(f"Probability of GOOD risk : {probabilities[1]:.1%}")
print("=" * 50)
if prediction == actual_label:
    print("\nPrediction is CORRECT!")
else:
    print("\nPrediction is WRONG: this is the cost of Type I / Type II error in credit risk.")
==================================================
LOAN APPLICATION DECISION
==================================================
Applicant features:
Age Sex Job Housing Saving accounts Checking account Credit amount Duration Purpose
977 42 1 2 1 0 1 2427 18 0
Model Prediction : GOOD
Actual Label : GOOD
Probability of BAD risk : 16.0%
Probability of GOOD risk : 84.0%
==================================================
Prediction is CORRECT!
Step 7: Model Evaluation
In credit risk, the cost of errors is asymmetric. Treating 'bad' as the class we want to detect:
- False Negative (predicting ‘good’ when actually ‘bad’): the bank loses money on a defaulting loan
- False Positive (predicting ‘bad’ when actually ‘good’): the bank loses a business opportunity
Note that the code encodes good=1, so scikit-learn's own positive class is 'good'; the False Positive Rate axis of the ROC curve in 7.3 therefore counts bad loans predicted as good.
We use multiple metrics:
- Precision / Recall / F1: Better than accuracy for imbalanced data
- ROC-AUC: Measures overall discriminating power (closer to 1 = better)
- Confusion Matrix: Shows exact breakdown of prediction errors
# --- 7.1 Predictions on full test set
y_pred_rf = rf_model.predict(X_test)
y_prob_rf = rf_model.predict_proba(X_test)[:, 1]
# Also compare with a single Decision Tree (to show RF advantage)
dt_model = DecisionTreeClassifier(class_weight='balanced', random_state=42)
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
y_prob_dt = dt_model.predict_proba(X_test)[:, 1]
print("=" * 55)
print("  RANDOM FOREST: Classification Report")
print("=" * 55)
print(classification_report(y_test, y_pred_rf, target_names=['Bad (0)', 'Good (1)']))
print("=" * 55)
print("  DECISION TREE: Classification Report (for comparison)")
print("=" * 55)
print(classification_report(y_test, y_pred_dt, target_names=['Bad (0)', 'Good (1)']))
=======================================================
RANDOM FOREST: Classification Report
=======================================================
precision recall f1-score support
Bad (0) 0.59 0.32 0.41 60
Good (1) 0.76 0.91 0.82 140
accuracy 0.73 200
macro avg 0.67 0.61 0.62 200
weighted avg 0.71 0.73 0.70 200
=======================================================
DECISION TREE: Classification Report (for comparison)
=======================================================
precision recall f1-score support
Bad (0) 0.36 0.35 0.35 60
Good (1) 0.72 0.73 0.73 140
accuracy 0.61 200
macro avg 0.54 0.54 0.54 200
weighted avg 0.61 0.61 0.61 200
cm = confusion_matrix(y_test, y_pred_rf)
print("confusion matrix for random forest:\n", cm)
# 'true' normalizes over the true labels (rows)
cm_normalized = confusion_matrix(y_test, y_pred_rf, normalize='true').round(2) * 100
print("confusion matrix for random forest(Normalized):\n", cm_normalized)
disp = ConfusionMatrixDisplay(confusion_matrix=cm_normalized)
disp.plot()
# plt.show()
confusion matrix for random forest:
 [[ 19  41]
 [ 13 127]]
confusion matrix for random forest(Normalized):
 [[32. 68.]
 [ 9. 91.]]
Interpretation
Looking at these results through a credit risk lens, here’s what the numbers actually mean for a bank.
The Core Problem: Bad Loans Are Being Missed
The confusion matrix tells the real story. For every 100 actual bad-risk applicants, the Random Forest:
- Correctly rejects only 32 of them
- Approves 68 of them as if they were good risk. This is the bank’s financial loss
For every 100 actual good-risk applicants, the model:
- Correctly approves 91 of them
- Wrongly rejects only 9. Lost business, but no financial loss
Metric-by-Metric Breakdown
Random Forest vs Decision Tree
| Metric | RF: Bad (0) | RF: Good (1) | DT: Bad (0) | DT: Good (1) |
|---|---|---|---|---|
| Precision | 0.59 | 0.76 | 0.36 | 0.72 |
| Recall | 0.32 | 0.91 | 0.35 | 0.73 |
| F1-Score | 0.41 | 0.82 | 0.35 | 0.73 |
Precision (Bad class = 0.59 for RF): When the model does flag someone as bad risk, it is right 59% of the time. That’s decent: roughly twice the 30% hit rate of flagging applicants at random. The Decision Tree is right only 36% of the time, barely above that 30% base rate.
Recall (Bad class = 0.32 for RF): This is the critical number. The model catches only 32% of actual bad borrowers, so 68 out of every 100 bad loans get approved. In a real bank portfolio, this would translate directly into Non-Performing Loans (NPLs).
F1-Score (Bad class = 0.41 for RF): The harmonic mean of precision and recall. Anything below 0.5 on the minority class is a signal that the model is struggling with bad-risk detection, despite class_weight='balanced'.
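The bad-class figures quoted above can be reproduced directly from the Random Forest confusion matrix printed earlier, which makes the definitions concrete:

```python
import numpy as np

# Random Forest confusion matrix from the output above
# rows = actual (bad, good), columns = predicted (bad, good)
cm = np.array([[19, 41],
               [13, 127]])

tp_bad = cm[0, 0]                          # bad loans correctly flagged
precision_bad = tp_bad / cm[:, 0].sum()    # 19 / (19 + 13)
recall_bad = tp_bad / cm[0, :].sum()       # 19 / (19 + 41)
f1_bad = 2 * precision_bad * recall_bad / (precision_bad + recall_bad)
accuracy = np.trace(cm) / cm.sum()         # (19 + 127) / 200

print(f"precision(bad) = {precision_bad:.2f}")
print(f"recall(bad)    = {recall_bad:.2f}")
print(f"F1(bad)        = {f1_bad:.2f}")
print(f"accuracy       = {accuracy:.2f}")
```

The 41 in the top-right cell is the number that drives everything: those are bad loans the model approved.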
Why Is Bad-Recall So Low?
Three likely causes:
1. The 70/30 imbalance is still dominating. Even with class_weight='balanced', the model has seen far more “good” examples and has learned more confident patterns for that class. The 32% recall on bad risk means the signal for default is genuinely weak in this dataset.
2. The dataset is small. With only 1,000 rows (200 in test), you have just 60 bad-risk test cases. Statistical noise is high β even one or two wrong splits in the trees affect recall significantly.
3. The default 0.5 decision threshold is wrong for this problem. The model outputs a probability, and we call someone “bad risk” only if P(bad) > 0.5. But in credit risk, you might want to flag anyone with P(bad) > 0.3 as bad. This threshold controls the precision-recall tradeoff.
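Cause 2 (a small, noisy test set) can be probed by scoring the model on several stratified folds instead of one 200-row split. The sketch below uses synthetic 70/30 data as a stand-in (an assumption); on the real notebook one would pass X and y from Step 4 instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 70/30 data standing in for the credit dataset (an assumption)
X, y = make_classification(n_samples=1000, n_features=9, weights=[0.3, 0.7], random_state=0)

# Five stratified folds keep the class ratio roughly constant in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
rf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42)

scores = cross_val_score(rf, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", scores.round(3))
print(f"Mean AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A wide spread across folds would indicate that a single test-set number should not be over-interpreted.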
The Business Tradeoff
This is the fundamental tension in credit risk modeling:
Lower the P(bad) threshold -> Catch more bad loans (higher recall)
but reject more good customers (lower precision)
Raise the P(bad) threshold -> Approve more good customers (good business)
but miss more bad loans (financial loss)
The right threshold depends on the cost ratio: how much does one bad loan cost vs. how much profit does one good loan generate?
This is a business decision, not a model decision.
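One way to turn the cost ratio into a threshold is to sweep candidate cut-offs on P(bad) and compute the total cost of each. The sketch below uses synthetic data and two hypothetical cost figures (COST_MISSED_BAD and COST_REJECTED_GOOD are made-up placeholders, not values from any bank).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic data: class 1 = good (~70%), class 0 = bad (~30%); an assumption
X, y = make_classification(n_samples=1000, n_features=9, weights=[0.3, 0.7],
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=42).fit(X_tr, y_tr)

p_bad = rf.predict_proba(X_te)[:, 0]   # column 0 corresponds to class 0 = bad
is_bad = (y_te == 0).astype(int)

COST_MISSED_BAD = 5.0      # hypothetical loss from approving a bad loan
COST_REJECTED_GOOD = 1.0   # hypothetical profit forgone by rejecting a good loan

results = []
for threshold in (0.5, 0.4, 0.3, 0.2):
    flagged = (p_bad > threshold).astype(int)          # 1 = reject as bad risk
    missed_bad = int(np.sum((flagged == 0) & (is_bad == 1)))
    rejected_good = int(np.sum((flagged == 1) & (is_bad == 0)))
    cost = COST_MISSED_BAD * missed_bad + COST_REJECTED_GOOD * rejected_good
    recall = recall_score(is_bad, flagged)
    precision = precision_score(is_bad, flagged, zero_division=0)
    results.append((threshold, recall, precision, cost))
    print(f"P(bad) > {threshold:.1f}: recall={recall:.2f} "
          f"precision={precision:.2f} cost={cost:.0f}")

best_threshold = min(results, key=lambda r: r[3])[0]
print("Lowest-cost threshold:", best_threshold)
```

In practice the threshold should be chosen on a validation set rather than the final test set, and the two cost figures have to come from the business side.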
What RF Does Well vs Decision Tree
RF is better on almost every metric; the one exception is bad-risk recall, where the Decision Tree is marginally ahead (0.35 vs 0.32):
- Overall accuracy: 73% vs 61%, a 12-point improvement
- Bad-risk precision: 0.59 vs 0.36, so RF’s rejections are about 1.6 times as trustworthy
- Good-risk recall: 0.91 vs 0.73, so RF approves 18 percentage points more of the legitimate customers
- Macro F1: 0.62 vs 0.54, so RF is more balanced across both classes
The Decision Tree is essentially unreliable for the bad class (precision 0.36, recall 0.35, barely better than random). This is the overfitting problem of single trees in action.
The One Number That Matters to a Risk Manager
68% of bad loans are being approved. Everything else is context around that number. A risk manager’s job is to decide whether that is acceptable, and if not, what precision-recall tradeoff they are willing to accept to reduce it.
# --- 7.2 Confusion matrices side-by-side
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
for ax, y_pred, title in zip(
    axes,
    [y_pred_rf, y_pred_dt],
    ['Random Forest', 'Decision Tree (baseline)']
):
    cm = confusion_matrix(y_test, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                                  display_labels=['Bad', 'Good'])
    disp.plot(ax=ax, colorbar=False, cmap='Blues')
    ax.set_title(title, fontweight='bold', fontsize=12)
plt.suptitle('Confusion Matrix Comparison', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
# --- 7.3 ROC Curves
plt.figure(figsize=(8, 6))
for y_prob, label, color in [
    (y_prob_rf, 'Random Forest', 'steelblue'),
    (y_prob_dt, 'Decision Tree', 'tomato')
]:
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc = roc_auc_score(y_test, y_prob)
    plt.plot(fpr, tpr, color=color, lw=2.5, label=f'{label} (AUC = {auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--', lw=1.5, label='Random Classifier (AUC = 0.5)')
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate (Recall)', fontsize=12)
plt.title('ROC Curve: Random Forest vs Decision Tree', fontweight='bold', fontsize=13)
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
rf_auc = roc_auc_score(y_test, y_prob_rf)
dt_auc = roc_auc_score(y_test, y_prob_dt)
print(f"\nAUC Score, Random Forest : {rf_auc:.4f}")
print(f"AUC Score, Decision Tree : {dt_auc:.4f}")
print(f"Improvement              : +{(rf_auc - dt_auc):.4f}")
print("\nAUC > 0.7 is a common rule of thumb for an acceptable credit scoring model.")
AUC Score, Random Forest : 0.6573
AUC Score, Decision Tree : 0.5393
Improvement              : +0.1180

AUC > 0.7 is a common rule of thumb for an acceptable credit scoring model.
An AUC of about 0.66 is mediocre
(OPTIONAL)
Step 8: Save and Load the Model
In production banking systems, models are saved and versioned so that:
- The same model is used consistently for all decisions
- Models can be audited and rolled back if needed
- Regulators can inspect the exact model used on any given date
We use joblib, a standard tool for saving sklearn models.
# --- 8.1 Save the model
model_path = 'rf_credit_risk_model.joblib'
joblib.dump(rf_model, model_path)
print(f"Model saved to: {model_path}")
# --- 8.2 Load the model back
loaded_model = joblib.load(model_path)
print("Model loaded successfully!")
# --- 8.3 Verify loaded model gives same predictions
loaded_pred = loaded_model.predict(X_test)
assert (loaded_pred == y_pred_rf).all(), "Loaded model gives different predictions!"
print("Verification passed: loaded model matches original model predictions exactly.")
Model saved to: rf_credit_risk_model.joblib
Model loaded successfully!
Verification passed: loaded model matches original model predictions exactly.
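For the audit and rollback goals listed at the start of this step, it can help to store some metadata next to the model. The sketch below is one possible layout, not an established convention: the dictionary keys, the version label, and the file name are all hypothetical, and the model is trained on synthetic data only so that the example runs on its own.

```python
import os
import tempfile
from datetime import datetime, timezone

import joblib
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model on synthetic data (an assumption)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Bundle the model with the information an auditor would ask for
artifact = {
    "model": model,
    "sklearn_version": sklearn.__version__,               # needed to reload safely
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "model_version": "1.0.0",                             # hypothetical version label
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_credit_risk_v1.joblib")  # hypothetical file name
    joblib.dump(artifact, path)
    restored = joblib.load(path)

same = bool((restored["model"].predict(X) == model.predict(X)).all())
print(restored["sklearn_version"], restored["model_version"], same)
```

Recording the scikit-learn version matters because a model saved with one release is not guaranteed to load correctly under another.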
Step 9: Using the Loaded Model on a New Applicant
This simulates how a bank’s loan officer would use the model in real life.
# --- New applicant (manually entered data)
# Create a new applicant using the encoded values from encoding_map above
# For reference: Sex {'female':0,'male':1}, Housing {'free':0,'own':1,'rent':2},
# Saving accounts {'little':0,'moderate':1,'quite rich':2,'rich':3}
# Checking account {'little':0,'moderate':1,'rich':2}
# (missing values were imputed with 'little' in Step 4, so there is no separate NA code)
new_applicant = pd.DataFrame([{
    'Age': 34,
    'Sex': 1,               # male
    'Job': 2,
    'Housing': 1,           # own
    'Saving accounts': 0,   # little
    'Checking account': 2,  # rich (code 2 in encoding_map)
    'Credit amount': 8500,
    'Duration': 36,
    'Purpose': encoding_map['Purpose'].get('car', 0)
}])
pred = loaded_model.predict(new_applicant)[0]
prob = loaded_model.predict_proba(new_applicant)[0]
print("LOAN DECISION SYSTEM (using saved model)")
print("=" * 45)
print(f" Prediction : {'APPROVED' if pred == 1 else 'REJECTED'}")
print(f" P(Good risk) : {prob[1]:.1%}")
print(f" P(Bad risk) : {prob[0]:.1%}")
print("=" * 45)
LOAN DECISION SYSTEM (using saved model)
=============================================
 Prediction : APPROVED
 P(Good risk) : 83.0%
 P(Bad risk) : 17.0%
=============================================
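Typing integer codes by hand is error-prone, so a small helper that translates category labels through the stored mapping may be safer. The function below, encode_applicant, is a hypothetical helper written for this sketch; the mappings are copied from the encoding output in Step 4.

```python
import pandas as pd

# Mappings copied from the "Encoding complete" output in Step 4
encoding_map = {
    "Sex": {"female": 0, "male": 1},
    "Housing": {"free": 0, "own": 1, "rent": 2},
    "Saving accounts": {"little": 0, "moderate": 1, "quite rich": 2, "rich": 3},
    "Checking account": {"little": 0, "moderate": 1, "rich": 2},
    "Purpose": {"business": 0, "car": 1, "domestic appliances": 2, "education": 3,
                "furniture/equipment": 4, "radio/TV": 5, "repairs": 6,
                "vacation/others": 7},
}

def encode_applicant(raw: dict, mapping: dict) -> pd.DataFrame:
    """Hypothetical helper: translate category labels to the codes used in training."""
    encoded = {}
    for column, value in raw.items():
        if column in mapping:
            if value not in mapping[column]:
                # Fail loudly instead of silently falling back to code 0
                raise ValueError(f"Unknown category {value!r} for column {column!r}")
            encoded[column] = mapping[column][value]
        else:
            encoded[column] = value  # numeric columns pass through unchanged
    return pd.DataFrame([encoded])

raw_applicant = {
    "Age": 34, "Sex": "male", "Job": 2, "Housing": "own",
    "Saving accounts": "little", "Checking account": "rich",
    "Credit amount": 8500, "Duration": 36, "Purpose": "car",
}
new_applicant = encode_applicant(raw_applicant, encoding_map)
print(new_applicant.to_string(index=False))
```

The resulting one-row frame has the same column order as the training data and could be passed to loaded_model.predict in place of the hand-coded version.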
Summary
| Topic | Key Takeaway |
|---|---|
| Class Imbalance | Use class_weight='balanced'; never ignore imbalance in finance |
| Random Forest vs Decision Tree | RF is more stable, less prone to overfitting |
| Feature Importance | Credit amount, age, and duration drive risk most |
| AUC over Accuracy | Prefer AUC and per-class recall to raw accuracy on imbalanced credit data |
| Model Saving | Essential for audit trails and regulatory compliance |
| Probability scores | More useful than binary yes/no β allows risk tiering |
