An end-to-end Machine Learning project that predicts whether a customer is a Good or Bad credit risk based on financial and demographic attributes.
This project follows a structured ML pipeline including data preprocessing, feature engineering, model training, evaluation, and deployment using Streamlit.
Banks need to identify high-risk customers before approving loans.
The objective of this project is to build a classification model that labels each customer as:
- Good Customer (Low Risk)
- Bad Customer (High Risk)
This helps financial institutions reduce default risk and improve decision-making.
The dataset contains customer financial and demographic information including:
- Debt Ratio
- Monthly Income
- Number of Open Credit Lines
- Real Estate Loans
- Number of Dependents
- Education Level
- Region
- NPA Status (Target Variable)
  - Good → 1
  - Bad → 0
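As a rough illustration of how these columns could become numeric model inputs, the sketch below applies the Good → 1 / Bad → 0 target mapping, an ordinal mapping for Education Level, and one-hot encoding for Region with pandas. The column spellings, the education categories, and their order are assumptions for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical category labels and ordering; the real dataset's values may differ.
EDUCATION_ORDER = {"High School": 0, "Graduate": 1, "Post Graduate": 2}
TARGET_MAP = {"Good": 1, "Bad": 0}  # NPA Status encoding described above


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Encode the target, the ordinal feature, and the nominal feature."""
    out = df.copy()
    # Target: Good -> 1, Bad -> 0
    out["NPA Status"] = out["NPA Status"].map(TARGET_MAP)
    # Ordinal feature: education levels have a natural order
    out["Education Level"] = out["Education Level"].map(EDUCATION_ORDER)
    # Nominal feature: one 0/1 column per region, no implied order
    out = pd.get_dummies(out, columns=["Region"], prefix="Region", dtype=int)
    return out


sample = pd.DataFrame({
    "Education Level": ["Graduate", "High School", "Post Graduate"],
    "Region": ["North", "South", "North"],
    "NPA Status": ["Good", "Bad", "Good"],
})
encoded = encode(sample)
print(encoded)
```

Because `get_dummies` only sees the categories present in the frame it is given, a train/test setup usually fits scikit-learn's `OneHotEncoder` (with `handle_unknown="ignore"`) on the training split so that both splits end up with the same columns.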
The project strictly follows a production-style ML workflow:
- Removed invalid rows
- Handled missing values using random sampling
- Log transformation on skewed features
- Outlier treatment using quantile capping
- Train-test split (before feature engineering to prevent data leakage)
- One-Hot Encoding for nominal features
- Ordinal Encoding for ordered categorical features
- Variance Threshold feature selection
- Hypothesis testing (p-value filtering)
- SMOTE for handling class imbalance
- StandardScaler for feature scaling
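A minimal sketch of several of these steps, written with plain pandas/NumPy so the mechanics are visible: random-sample imputation, a `log1p` transform, quantile capping whose bounds are learned on the training split only, and a bare-bones SMOTE that creates synthetic minority rows by interpolating towards a nearest minority neighbour. The project itself lists imbalanced-learn for SMOTE; this re-implementation, the 1%/99% capping quantiles, the helper names, and the toy values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def random_sample_impute(train: pd.Series, col: pd.Series) -> pd.Series:
    """Fill NaNs in `col` with values drawn at random from the observed training values."""
    out = col.copy()
    missing = out.isna()
    pool = train.dropna().to_numpy()
    out.loc[missing] = rng.choice(pool, size=int(missing.sum()))
    return out


def fit_caps(train: pd.Series, lower_q: float = 0.01, upper_q: float = 0.99):
    """Learn quantile capping bounds from the training split only (assumed 1%/99%)."""
    return train.quantile(lower_q), train.quantile(upper_q)


def smote(X_min: np.ndarray, n_new: int, k: int = 5) -> np.ndarray:
    """Create n_new synthetic rows between minority samples and their k nearest minority neighbours."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                   # a row is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)   # random minority rows to start from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # position along the connecting segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Toy monthly-income values (hypothetical); the last test value is an extreme outlier
train_income = pd.Series([2500.0, np.nan, 4000.0, 5200.0, 150000.0, 3100.0])
test_income = pd.Series([np.nan, 3900.0, 900000.0])

train_income = random_sample_impute(train_income, train_income)
test_income = random_sample_impute(train_income, test_income)  # pool comes from train only

train_log, test_log = np.log1p(train_income), np.log1p(test_income)  # reduce right skew
lo, hi = fit_caps(train_log)
train_log, test_log = train_log.clip(lo, hi), test_log.clip(lo, hi)

# Toy numeric training features: class 0 (Bad) is the minority here
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1, 1])

X_min = X_train[y_train == 0]
n_new = int((y_train == 1).sum() - (y_train == 0).sum())  # rows needed to balance classes
X_syn = smote(X_min, n_new, k=2)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.zeros(n_new, dtype=int)])
```

Oversampling is applied to the training split only, so the test set still reflects the real class balance; `StandardScaler` would likewise be fitted on the training data and reused to transform the test data.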
Models implemented:
- K-Nearest Neighbors
- Naive Bayes
- Logistic Regression
- Decision Tree
- Random Forest
Evaluation metrics:
- Accuracy Score
- Confusion Matrix
- Classification Report
- ROC Curve comparison
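The comparison could look roughly like the following scikit-learn sketch, which trains the five listed models and collects accuracy, a confusion matrix, ROC AUC, and ROC curve points for each. It runs on synthetic data from `make_classification` rather than the credit dataset, and picking the winner by ROC AUC is an assumption about how "best" was decided.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed credit data (not the real dataset)
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.3, 0.7], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    y_score = model.predict_proba(X_test_s)[:, 1]  # probability of class 1 (Good)
    fpr, tpr, _ = roc_curve(y_test, y_score)       # points to plot for the ROC comparison
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "confusion": confusion_matrix(y_test, y_pred),
        "roc": (fpr, tpr),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f} AUC={results[name]['auc']:.3f}")

best = max(results, key=lambda n: results[n]["auc"])
print("Best by ROC AUC:", best)
print(classification_report(y_test, models[best].predict(X_test_s), target_names=["Bad", "Good"]))
```

The stored `(fpr, tpr)` pairs can be passed to Matplotlib's `plt.plot` to draw all ROC curves on one figure.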
Based on the performance comparison and ROC analysis, the best-performing model was selected and saved as:
credit_card.pkl
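Saving and reloading a fitted model with Python's `pickle` module can be sketched as below. Whether credit_card.pkl stores the model alone or bundles the scaler as well is not stated, so this example assumes a single estimator; the toy `LogisticRegression` and the temporary directory are placeholders.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the selected model; the real choice comes from the model comparison
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "credit_card.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)  # serialize the fitted estimator

with open(path, "rb") as f:
    loaded = pickle.load(f)  # restore it, e.g. when the app starts

print((loaded.predict(X) == model.predict(X)).all())
```

`pickle.load` can execute arbitrary code, so only load .pkl files from a trusted source; `joblib.dump` / `joblib.load` is a common alternative for scikit-learn models.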
The trained model is deployed using Streamlit.
The web app allows users to:
- Enter financial details
- Automatically scale inputs
- Predict credit risk
- Display risk level with confidence score
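The prediction step behind the app can be sketched as a plain function: scale one applicant's inputs with a scaler fitted on training data, call `predict_proba`, and report the more likely class with its probability as the confidence score. The function name `predict_risk`, the five feature names, and the toy model are hypothetical, and reading "confidence" as the maximum class probability is an assumption about how the app defines it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric inputs mirroring the listed financial attributes
FEATURES = ["debt_ratio", "monthly_income", "open_credit_lines", "real_estate_loans", "dependents"]


def predict_risk(model, scaler, values):
    """Scale one applicant's raw inputs and return (risk label, confidence)."""
    x = scaler.transform(np.asarray(values, dtype=float).reshape(1, -1))
    proba = model.predict_proba(x)[0]  # one probability per class in model.classes_
    pred = model.classes_[proba.argmax()]
    label = "Good Customer (Low Risk)" if pred == 1 else "Bad Customer (High Risk)"
    return label, float(proba.max())


# Toy model: in this synthetic data a lower first feature means a Good customer
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(FEATURES)))
y = (X[:, 0] < 0).astype(int)
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

# In a Streamlit app, `values` would come from widgets such as st.number_input(...)
# and the result would be shown with st.success(...) or st.error(...).
label, confidence = predict_risk(model, scaler, [-2.0, 0.0, 0.0, 0.0, 0.0])
print(label, f"{confidence:.1%}")
```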
git clone https://github.com/hemamalini0708/Credit-Card-Risk-Prediction.git
cd Credit-Card-Risk-Prediction
pip install -r requirements.txt
streamlit run app/credit_risk_streamlit.py
Tech Stack
- Python
- Pandas
- NumPy
- Scikit-Learn
- Imbalanced-Learn (SMOTE)
- Matplotlib
- Streamlit
credit-risk-prediction/
│
├── data/
├── models/
├── src/
├── app/
├── README.md
├── requirements.txt
└── .gitignore
Business Impact
- Helps reduce loan default risk by flagging high-risk applicants before approval
- Supports automated credit approval decisions
- Enables risk-based customer segmentation
- Supports data-driven financial risk management
Author
Hema Malini Gangumalla
Aspiring Data Scientist
License
MIT License
