A small data-processing project for practicing pandas and numpy on synthetic banking transactions.
The pipeline:
- Loads transaction data from `data/fake_banking_transactions.csv`
- Cleans the data (removes duplicates, fills missing values)
- Adds a feature flag (`is_large`) for high-value transactions
- Applies a simple fraud rule (`fraud_flag`)
- Saves processed output to `data/processed/clean_transactions.csv`
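Assuming the steps above, the whole pipeline can be sketched end to end as follows. The function name and the median fill strategy are illustrative; the project's actual helpers live in `src/` and may differ:

```python
import pandas as pd

def run_pipeline(in_path: str, out_path: str) -> pd.DataFrame:
    """Sketch of the pipeline described above (illustrative, not the real code)."""
    df = pd.read_csv(in_path, parse_dates=["date"])

    # Cleaning: drop duplicate rows, fill missing amounts (median is an assumption)
    df = df.drop_duplicates()
    df["amount"] = df["amount"].fillna(df["amount"].median())

    # Feature flag: is_large = 1 when amount > 2000
    df["is_large"] = (df["amount"] > 2000).astype(int)

    # Fraud rule: amount > 5000 AND category == "crypto"
    df["fraud_flag"] = (
        (df["amount"] > 5000) & (df["category"] == "crypto")
    ).astype(int)

    df.to_csv(out_path, index=False)
    return df
```

Each step is vectorized over the whole frame, so no Python-level row loop is needed.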
- `main.py` - Entry point for the pipeline
- `src/clean.py` - Data cleaning helpers
- `src/features.py` - Feature engineering (`is_large`)
- `src/fraud_rules.py` - Rule-based fraud tagging
- `data/fake_banking_transactions.csv` - Raw synthetic dataset
- `data/processed/clean_transactions.csv` - Processed output
- `notebooks/analysis.ipynb` - Notebook for exploratory analysis
Input CSV columns:
- `customer_id` (int)
- `amount` (float)
- `merchant` (str)
- `category` (str)
- `date` (YYYY-MM-DD)
The synthetic data intentionally includes:
- Missing values
- Duplicate rows
- Outliers (very large and some negative transaction amounts)
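One way a cleaning helper could handle these quirks, sketched below. The function name, the median fill, and the choice to drop negative amounts are assumptions; `src/clean.py` may take a different approach:

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning sketch for duplicates, missing values, and outliers."""
    out = df.drop_duplicates().copy()

    # Missing amounts: fill with the median (an assumption; mean or drop also work)
    out["amount"] = out["amount"].fillna(out["amount"].median())

    # Rows with no category are unusable for the fraud rule; drop them
    out = out.dropna(subset=["category"])

    # Negative amounts are treated here as data-entry errors and removed
    out = out[out["amount"] >= 0]

    return out.reset_index(drop=True)
```

Very large positive amounts are deliberately kept, since the `is_large` and fraud rules depend on them.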
Current rule in `src/fraud_rules.py`:
- Mark a transaction as fraud (`fraud_flag = 1`) when `amount > 5000` and `category == "crypto"`
- Otherwise `fraud_flag = 0`
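The rule translates to a single vectorized expression. The function name below is illustrative; only the threshold and category come from the rule stated above:

```python
import numpy as np
import pandas as pd

def tag_fraud(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the rule: fraud_flag = 1 iff amount > 5000 and category == "crypto"."""
    out = df.copy()
    out["fraud_flag"] = np.where(
        (out["amount"] > 5000) & (out["category"] == "crypto"), 1, 0
    )
    return out
```

Working on a copy keeps the caller's frame unmodified, which makes the helper easier to test in isolation.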
Feature in `src/features.py`:
- `is_large = 1` when `amount > 2000`, else `0`
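A minimal sketch of this feature, with the threshold exposed as a parameter (the parameter itself is an assumption; the stated default of 2000 comes from the rule above):

```python
import pandas as pd

def add_is_large(df: pd.DataFrame, threshold: float = 2000) -> pd.DataFrame:
    """is_large = 1 when amount strictly exceeds the threshold, else 0."""
    out = df.copy()
    out["is_large"] = (out["amount"] > threshold).astype(int)
    return out
```

Note the comparison is strict: an amount of exactly 2000 is not flagged as large.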
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
.\.venv\Scripts\python.exe main.py
```

You should see:
- Preview rows in the terminal
- A fraud count summary
- Output written to `data/processed/clean_transactions.csv`
- Replace the rule-based logic with an anomaly detection / ML baseline
- Add tests for cleaning and fraud rules
- Add configuration (thresholds, input/output paths) via `.env` or CLI arguments
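For the configuration item, a hedged sketch of what CLI arguments could look like with `argparse`. All flag names and defaults are illustrative (the defaults just mirror the paths and thresholds mentioned in this README):

```python
import argparse

def parse_args(argv=None):
    """Illustrative CLI for the pipeline; flag names are assumptions."""
    p = argparse.ArgumentParser(description="Synthetic banking transactions pipeline")
    p.add_argument("--input", default="data/fake_banking_transactions.csv",
                   help="Path to the raw CSV")
    p.add_argument("--output", default="data/processed/clean_transactions.csv",
                   help="Path for the processed CSV")
    p.add_argument("--large-threshold", type=float, default=2000,
                   help="Amount above which is_large = 1")
    p.add_argument("--fraud-threshold", type=float, default=5000,
                   help="Amount threshold in the fraud rule")
    return p.parse_args(argv)
```

Passing `argv=None` makes `argparse` read `sys.argv` in production while tests can inject an explicit list.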