hemamalini0708/Credit-Card-Risk-Prediction

Credit Risk Prediction System

An end-to-end Machine Learning project that predicts whether a customer is a Good or Bad credit risk based on financial and demographic attributes.

This project follows a structured ML pipeline including data preprocessing, feature engineering, model training, evaluation, and deployment using Streamlit.

App Screenshot


Problem Statement

Banks need to identify high-risk customers before approving loans.

The objective of this project is to build a binary classification model that assigns each customer to one of two classes:

  • Good Customer (Low Risk)
  • Bad Customer (High Risk)

This helps financial institutions reduce default risk and improve decision-making.


Dataset

The dataset contains customer financial and demographic information including:

  • Debt Ratio
  • Monthly Income
  • Number of Open Credit Lines
  • Real Estate Loans
  • Number of Dependents
  • Education Level
  • Region
  • NPA (Non-Performing Asset) Status, the target variable

🎯 Target Variable

  • Good → 1
  • Bad → 0

Machine Learning Pipeline

The project strictly follows a production-style ML workflow:

Data Preprocessing

  • Removed invalid rows
  • Handled missing values using random sampling
  • Log transformation on skewed features
  • Outlier treatment using quantile capping
  • Train-test split (before feature engineering to prevent data leakage)
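The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the toy data, the column names (`MonthlyIncome`, `DebtRatio`, `Good`), the helper functions, and the 5%/95% capping quantiles are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical toy frame; the column names are assumptions, not the repo's schema.
df = pd.DataFrame({
    "MonthlyIncome": [5000.0, np.nan, 3200.0, 150000.0, 4100.0, np.nan, 2800.0, 6100.0],
    "DebtRatio": [0.30, 0.80, 0.10, 0.50, 2.40, 0.20, 0.60, 0.40],
    "Good": [1, 1, 0, 1, 0, 1, 1, 0],
})


def random_sample_impute(series, rng):
    """Fill NaNs by drawing, with replacement, from the observed values."""
    out = series.copy()
    missing = out.isna()
    out[missing] = rng.choice(out.dropna().to_numpy(), size=int(missing.sum()))
    return out


def cap_quantiles(series, lower=0.05, upper=0.95):
    """Clip values that fall outside the chosen quantiles."""
    return series.clip(series.quantile(lower), series.quantile(upper))


df["MonthlyIncome"] = random_sample_impute(df["MonthlyIncome"], rng)
df["MonthlyIncome"] = np.log1p(df["MonthlyIncome"])  # log(1 + x) compresses the right tail
df["DebtRatio"] = cap_quantiles(df["DebtRatio"])

# Split before any encoders, selectors, or scalers are fitted.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="Good"), df["Good"],
    test_size=0.25, random_state=42, stratify=df["Good"],
)
```

Random-sample imputation keeps the shape of the observed distribution, which a single mean or median fill would flatten.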

Feature Engineering

  • One-Hot Encoding for nominal features
  • Ordinal Encoding for ordered categorical features
  • Variance Threshold feature selection
  • Hypothesis testing (p-value filtering)
  • SMOTE for handling class imbalance
  • StandardScaler for feature scaling
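A rough sketch of the encoding, variance-threshold, and scaling steps is shown here. The toy columns, the category order for `Education`, and the `Constant` column are assumptions made for illustration. P-value filtering and SMOTE are left out to keep the example short; with Imbalanced-Learn, resampling would typically be done with `SMOTE().fit_resample(X_train, y_train)` on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical training split; column names and categories are assumptions.
train = pd.DataFrame({
    "Region": ["North", "South", "East", "South", "North", "East"],
    "Education": ["Graduate", "Matric", "PhD", "Graduate", "Matric", "PhD"],
    "DebtRatio": [0.3, 0.8, 0.1, 0.5, 0.9, 0.2],
    "Constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # zero variance, should be dropped
})

# Nominal feature: one binary column per category.
onehot = OneHotEncoder(handle_unknown="ignore")
region = onehot.fit_transform(train[["Region"]]).toarray()

# Ordered feature: integer codes that respect the stated order.
ordinal = OrdinalEncoder(categories=[["Matric", "Graduate", "PhD"]])
education = ordinal.fit_transform(train[["Education"]])

numeric = train[["DebtRatio", "Constant"]].to_numpy()
X_train = np.hstack([region, education, numeric])

# Drop features whose variance is zero.
selector = VarianceThreshold(threshold=0.0)
X_train = selector.fit_transform(X_train)

# Fit the scaler on the training split only; reuse it for test data and the app.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
```

Keeping the fitted `onehot`, `ordinal`, `selector`, and `scaler` objects around matters because the test split and any deployed app must apply exactly the same transformations.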

Model Training

Models implemented:

  • K-Nearest Neighbors
  • Naive Bayes
  • Logistic Regression
  • Decision Tree
  • Random Forest
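Benchmarking the five model families could look something like the sketch below. It runs on synthetic data from `make_classification`, and the hyperparameters and the choice of `GaussianNB` as the Naive Bayes variant are assumptions, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed credit data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} accuracy={acc:.3f}")
```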

Model Evaluation

  • Accuracy Score
  • Confusion Matrix
  • Classification Report
  • ROC Curve comparison
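The four evaluation outputs listed above map onto functions in `sklearn.metrics`. The following is a sketch on synthetic, imbalanced data with a single stand-in model; the class weights and the label names passed to `target_names` are assumptions. With imbalanced classes, accuracy alone can look good while the minority class is poorly predicted, which is why the confusion matrix and ROC AUC are worth reading alongside it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 20% "Bad" (0) and 80% "Good" (1).
X, y = make_classification(n_samples=400, n_features=6, weights=[0.2, 0.8], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
report = classification_report(y_test, y_pred, target_names=["Bad (0)", "Good (1)"])
fpr, tpr, _ = roc_curve(y_test, y_score)  # points to plot for the ROC curve
auc = roc_auc_score(y_test, y_score)

print(f"accuracy={acc:.3f}  auc={auc:.3f}")
print(cm)
print(report)
```

To compare several models on one ROC plot, the `fpr` and `tpr` arrays for each model can be drawn on the same Matplotlib axes.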

Final Model Selection

Based on the performance comparison and ROC analysis, the best-performing model (Random Forest, according to the repository description) was selected and saved as:

credit_card.pkl
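Saving and reloading a fitted model with `pickle` might look like this. It is a sketch only: the model is trained on synthetic data, and the file is written to a temporary directory because the README does not say where `credit_card.pkl` lives in the repository.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Temporary location; the real path of credit_card.pkl is an unknown here.
model_path = Path(tempfile.mkdtemp()) / "credit_card.pkl"
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Reload and confirm the restored model predicts identically.
with model_path.open("rb") as f:
    restored = pickle.load(f)

same = bool((restored.predict(X) == model.predict(X)).all())
print("round trip OK:", same)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.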


Deployment

The trained model is deployed using Streamlit.

The web app:

  • Accepts financial details entered by the user
  • Scales the inputs automatically
  • Predicts the credit risk class
  • Displays the risk level with a confidence score
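The app flow described above (input, scaling, prediction, confidence) can be approximated as follows. The function names `predict_risk` and `main`, the two input fields, and the stand-in model and scaler are hypothetical; the actual `app/credit_risk_streamlit.py` may be organised differently and would load the saved model instead of training one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def predict_risk(model, scaler, features):
    """Scale one row of raw inputs and return (label, confidence)."""
    row = scaler.transform(np.asarray(features, dtype=float).reshape(1, -1))
    proba = model.predict_proba(row)[0]
    idx = int(np.argmax(proba))
    label = "Good (Low Risk)" if model.classes_[idx] == 1 else "Bad (High Risk)"
    return label, float(proba[idx])


def main(model, scaler):
    """Streamlit UI; streamlit is imported lazily so predict_risk works without it."""
    import streamlit as st

    st.title("Credit Risk Prediction")
    debt_ratio = st.number_input("Debt Ratio", min_value=0.0, value=0.3)
    income = st.number_input("Monthly Income", min_value=0.0, value=5000.0)
    if st.button("Predict"):
        label, confidence = predict_risk(model, scaler, [debt_ratio, income])
        st.write(f"{label} (confidence {confidence:.1%})")


# Tiny stand-in model so the helper can be exercised outside Streamlit.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 1] > X[:, 0]).astype(int)
demo_scaler = StandardScaler().fit(X)
demo_model = LogisticRegression().fit(demo_scaler.transform(X), y)

label, confidence = predict_risk(demo_model, demo_scaler, [0.3, 1.5])
print(label, round(confidence, 3))
```

In a real app script, `main(model, scaler)` would be called at module level so that `streamlit run` renders the page. The confidence shown is the predicted probability of the chosen class.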

Installation

git clone https://github.com/hemamalini0708/Credit-Card-Risk-Prediction.git
cd Credit-Card-Risk-Prediction
pip install -r requirements.txt

Run Locally

streamlit run app/credit_risk_streamlit.py

Tech Stack

  • Python
  • Pandas
  • NumPy
  • Scikit-Learn
  • Imbalanced-Learn (SMOTE)
  • Matplotlib
  • Streamlit

Project Structure

credit-risk-prediction/

│
├── data/
├── models/
├── src/
├── app/
├── README.md
├── requirements.txt
└── .gitignore

Business Impact

  • Reduces loan default risk
  • Supports automated credit approval decisions
  • Enables risk-based customer segmentation
  • Improves financial risk management strategy

Author

Hema Malini Gangumalla

Aspiring Data Scientist

📧 hemamalinig07@gmail.com

License

MIT License

About

End-to-end binary classifier on 150K+ records. Used SMOTE to resolve severe class imbalance (8K → 111K samples), benchmarked 5 models, achieved 89.1% accuracy and AUC 0.818 with Random Forest. Production-ready with pickle deployment.
