AI Resume Matching System

Rabeeha Kamran, Zainab Tahir, Hashaam Waheed

Project Structure

resume_matcher/
├── app.py                   # Streamlit web UI
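├── main.py                  # Command-line pipeline runner
├── create_sample_data.py    # Generates the sample resume PDFs
├── dataset_loader.py        # Loads Resume.csv and retrains the ML model
├── requirements.txt
├── resumes/                 # Resume PDFs (generated locally, not in the repo)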
└── modules/
    ├── text_extractor.py    # Stage 1: PDF text extraction
    ├── nlp_processor.py     # Stage 2: NER + preprocessing
    ├── csp_engine.py        # Stage 3: Hard constraint filtering
    ├── similarity_scorer.py # Stage 4: TF-IDF cosine similarity
    └── ml_ranker.py         # Stage 5: ML suitability ranking

Datasets & Input Files

Sample Resume PDFs (for pipeline testing)

The resumes/ folder is not included in the repository (excluded via .gitignore). Generate the 15 sample resume PDFs locally by running:

python create_sample_data.py

This creates candidate profiles covering strong matches, moderate matches, and candidates that fail the hard constraints (CSP failures).

Kaggle Resume Dataset (for ML model training)

The ML model is trained on the Resume Dataset from Kaggle. Download Resume.csv and place it at:

Resume/Resume.csv

Then retrain the model:

python dataset_loader.py --csv Resume/Resume.csv --samples 30

The pre-trained model file saved_model.pkl is also excluded from the repository. It is generated automatically on the first run of main.py or app.py, using synthetic data if the Kaggle CSV is not available.

Setup (One-time)

1. Create a virtual environment

python -m venv venv

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Download the spaCy model

python -m spacy download en_core_web_sm

Running the System

Option A: Streamlit Web UI (recommended)

streamlit run app.py

Then open http://localhost:8501 in your browser.

Option B: Command Line

# Create the resumes/ folder if it does not exist, then put the resume PDFs in it
mkdir resumes
# Then run the pipeline:
python main.py

Pipeline Stages

| Stage | Module               | Technique                             |
|-------|----------------------|---------------------------------------|
| 1     | text_extractor.py    | PyPDF2 + pdfplumber                   |
| 2     | nlp_processor.py     | spaCy NER + NLTK lemmatization        |
| 3     | csp_engine.py        | Constraint Satisfaction (hard filter) |
| 4     | similarity_scorer.py | TF-IDF + Cosine Similarity            |
| 5     | ml_ranker.py         | Random Forest classifier              |
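
To make stages 3 and 4 concrete, here is a minimal standard-library sketch of a hard-constraint filter followed by TF-IDF cosine scoring. It is not the code in csp_engine.py or similarity_scorer.py: the function names, the candidate dictionary layout, the whitespace tokenizer, and the smoothed IDF weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def passes_hard_constraints(candidate, required_skills, min_years):
    """Stage 3 idea: reject a candidate who violates any hard constraint."""
    has_skills = required_skills <= candidate["skills"]  # subset test
    return has_skills and candidate["years"] >= min_years


def tfidf_vectors(documents):
    """Stage 4 idea: one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: only candidate A meets both hard constraints.
job_description = "python machine learning sql"
candidates = [
    {"name": "A", "skills": {"python", "sql"}, "years": 4,
     "text": "python sql machine learning projects"},
    {"name": "B", "skills": {"java"}, "years": 6,
     "text": "java spring backend"},
    {"name": "C", "skills": {"python", "sql"}, "years": 1,
     "text": "python sql intern"},
]

shortlisted = [
    c for c in candidates
    if passes_hard_constraints(c, required_skills={"python", "sql"}, min_years=2)
]
vectors = tfidf_vectors([job_description] + [c["text"] for c in shortlisted])
scores = {
    c["name"]: cosine_similarity(vectors[0], vec)
    for c, vec in zip(shortlisted, vectors[1:])
}
```

Filtering before scoring means a candidate who misses a required skill or the minimum experience is never ranked, however similar the resume text is to the job description.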

Composite Score Formula

f(C) = 0.5 × CosineSimilarity
     + 0.3 × (Matched_Skills / Total_JD_Skills)
     + 0.2 × Normalized_Experience
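
As a worked example of the formula, the sketch below computes the composite score for one candidate. The function and parameter names are hypothetical, and it assumes that Normalized_Experience and the cosine similarity are already scaled to the range 0 to 1; the guard for a job description with no listed skills is also an assumption.

```python
def composite_score(cosine, matched_skills, total_jd_skills, normalized_experience):
    """Weighted sum from the formula: 0.5 / 0.3 / 0.2."""
    # Assumption: treat the skill ratio as 0 when the JD lists no skills.
    skill_ratio = matched_skills / total_jd_skills if total_jd_skills else 0.0
    return 0.5 * cosine + 0.3 * skill_ratio + 0.2 * normalized_experience


# Cosine 0.6, 3 of 4 JD skills matched, normalized experience 0.5:
# 0.5 * 0.6 + 0.3 * 0.75 + 0.2 * 0.5 = 0.3 + 0.225 + 0.1 = 0.625
score = composite_score(0.6, 3, 4, 0.5)
```

Because the weights sum to 1.0, the score stays between 0 and 1 whenever all three components do.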
