CemHarput/Mini-Fraud-Detection-Project

Numpy + Pandas Lab (Mini Fraud Detection Project)

A small data-processing project for practicing pandas and numpy on synthetic banking transactions.

The pipeline:

  • Loads transaction data from data/fake_banking_transactions.csv
  • Cleans the data (removes duplicates, fills missing values)
  • Adds a feature flag (is_large) for high-value transactions
  • Applies a simple fraud rule (fraud_flag)
  • Saves processed output to data/processed/clean_transactions.csv
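The cleaning step above could look roughly like the following. This is a minimal sketch, not the code in src/clean.py: the function name clean_transactions is hypothetical, and the fill strategy (median for amount, "unknown" for text columns) is an assumption, since this README does not say how missing values are filled.

```python
import numpy as np
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing values.

    Hypothetical helper: the fill strategy below is an assumption.
    """
    out = df.drop_duplicates().copy()
    # Assumption: numeric gaps get the column median.
    out["amount"] = out["amount"].fillna(out["amount"].median())
    # Assumption: text gaps get the placeholder "unknown".
    for col in ("merchant", "category"):
        out[col] = out[col].fillna("unknown")
    return out.reset_index(drop=True)


# Made-up rows that mimic the documented schema.
raw = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3],
        "amount": [100.0, 100.0, np.nan, 300.0],
        "merchant": ["A", "A", "B", None],
        "category": ["food", "food", None, "crypto"],
        "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"],
    }
)
cleaned = clean_transactions(raw)
print(cleaned)
```

Dropping duplicates before filling keeps the median from being skewed by repeated rows.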

Project Structure

main.py - Entry point for the pipeline
src/clean.py - Data cleaning helpers
src/features.py - Feature engineering (is_large)
src/fraud_rules.py - Rule-based fraud tagging
data/fake_banking_transactions.csv - Raw synthetic dataset
data/processed/clean_transactions.csv - Processed output
notebooks/analysis.ipynb - Notebook for exploratory analysis

Dataset Schema

Input CSV columns:

  • customer_id (int)
  • amount (float)
  • merchant (str)
  • category (str)
  • date (YYYY-MM-DD)

The synthetic data intentionally includes:

  • Missing values
  • Duplicate rows
  • Outliers (very large and some negative transaction amounts)
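Before cleaning, it can help to count how many of these intentional defects are present. The sketch below uses a small inline CSV with made-up rows as a stand-in for data/fake_banking_transactions.csv; the report keys are illustrative names, not part of the project.

```python
import io

import pandas as pd

# Tiny inline stand-in for the raw dataset (made-up rows, same columns).
sample_csv = io.StringIO(
    "customer_id,amount,merchant,category,date\n"
    "1,120.50,Shop A,groceries,2024-01-05\n"
    "1,120.50,Shop A,groceries,2024-01-05\n"
    "2,,Shop B,crypto,2024-01-06\n"
    "3,-40.00,Shop C,travel,2024-01-07\n"
    "4,98000.00,Shop D,crypto,2024-01-08\n"
)

# Parse the date column as datetimes instead of leaving it as strings.
df = pd.read_csv(sample_csv, parse_dates=["date"])

# Count each kind of defect the synthetic data is said to contain.
report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_amount": int(df["amount"].isna().sum()),
    "negative_amount": int((df["amount"] < 0).sum()),
}
print(report)
```

To run the same check on the real file, pass its path to pd.read_csv in place of sample_csv.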

Fraud Logic

Current rule in src/fraud_rules.py:

  • Mark a transaction as fraud (fraud_flag = 1) when:
    • amount > 5000 and
    • category == "crypto"
  • Otherwise fraud_flag = 0
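The rule above maps directly onto a vectorized boolean mask. The following is a sketch under assumptions: the function name add_fraud_flag and its parameters are hypothetical and may not match what src/fraud_rules.py actually defines.

```python
import numpy as np
import pandas as pd


def add_fraud_flag(
    df: pd.DataFrame,
    amount_threshold: float = 5000.0,
    category: str = "crypto",
) -> pd.DataFrame:
    """Return a copy of df with fraud_flag set to 1 or 0 (hypothetical helper)."""
    # Both conditions must hold; the comparison is strict (> not >=).
    is_fraud = (df["amount"] > amount_threshold) & (df["category"] == category)
    out = df.copy()
    out["fraud_flag"] = np.where(is_fraud, 1, 0)
    return out


# Made-up rows covering each branch of the rule.
tx = pd.DataFrame(
    {
        "amount": [6000.0, 6000.0, 5000.0, 100.0],
        "category": ["crypto", "food", "crypto", "crypto"],
    }
)
flagged = add_fraud_flag(tx)
print(flagged)
```

Only the first row is flagged: the second has the wrong category, and the third sits exactly at the threshold, which the strict comparison excludes.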

Feature in src/features.py:

  • is_large = 1 when amount > 2000, else 0
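A minimal sketch of this feature, assuming a helper named add_is_large (a hypothetical name; the real function in src/features.py may differ):

```python
import pandas as pd


def add_is_large(df: pd.DataFrame, threshold: float = 2000.0) -> pd.DataFrame:
    """Return a copy of df with is_large set to 1 when amount > threshold."""
    out = df.copy()
    # The boolean mask is cast to int so the column holds 1/0, not True/False.
    out["is_large"] = (out["amount"] > threshold).astype(int)
    return out


# Made-up amounts around the threshold, plus a negative outlier.
tx = pd.DataFrame({"amount": [1999.99, 2000.0, 2000.01, -50.0]})
featured = add_is_large(tx)
print(featured)
```

A missing amount compares as False, so it would get is_large = 0 unless the data is cleaned first.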

Setup

1) Create and activate a virtual environment (Windows PowerShell)

python -m venv .venv
.\.venv\Scripts\Activate.ps1

2) Install dependencies

pip install -r requirements.txt

Run the Project

.\.venv\Scripts\python.exe main.py

You should see:

  • Preview rows in the terminal
  • Fraud count summary
  • Output written to data/processed/clean_transactions.csv

Next Improvements (Optional)

  • Replace rule-based logic with an anomaly detection or ML baseline
  • Add tests for cleaning and fraud rules
  • Add configuration (thresholds, input/output paths) via .env or CLI arguments
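As one possible shape for the configuration item, the thresholds and paths could be exposed through argparse from the standard library. This is only a sketch: the flag names are invented for illustration and main.py does not currently accept them. The defaults mirror the values documented in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with hypothetical flags for paths and thresholds."""
    parser = argparse.ArgumentParser(description="Mini fraud detection pipeline")
    parser.add_argument("--input", default="data/fake_banking_transactions.csv")
    parser.add_argument("--output", default="data/processed/clean_transactions.csv")
    parser.add_argument("--large-threshold", type=float, default=2000.0)
    parser.add_argument("--fraud-threshold", type=float, default=5000.0)
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--fraud-threshold", "7500"])
print(args.input, args.output, args.large_threshold, args.fraud_threshold)
```

In a real entry point, calling parse_args() with no arguments would read the flags from the command line instead.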
