rabeehakamran/Resume-Matching

AI Resume Matching System

Rabeeha Kamran, Zainab Tahir, Hashaam Waheed

Project Structure

resume_matcher/
├── app.py                   # Streamlit web UI
├── main.py                  # Command-line pipeline runner
├── requirements.txt
├── resumes/
└── modules/
    ├── text_extractor.py    # Stage 1: PDF text extraction
    ├── nlp_processor.py     # Stage 2: NER + preprocessing
    ├── csp_engine.py        # Stage 3: Hard constraint filtering
    ├── similarity_scorer.py # Stage 4: TF-IDF cosine similarity
    └── ml_ranker.py         # Stage 5: ML suitability ranking

Datasets & Input Files

Sample Resume PDFs (for pipeline testing)

The resumes/ folder is not included in the repository (excluded via .gitignore). Generate the 15 sample resume PDFs locally by running:

python create_sample_data.py

This creates profiles covering strong matches, moderate matches, and CSP failures (resumes that are rejected by the hard constraint filter in stage 3).
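To make "CSP failure" concrete: a hard constraint filter rejects a resume outright if it violates any required condition, before any scoring takes place. The sketch below is illustrative only. The `Candidate` fields, the two constraints, and the `passes_hard_constraints` helper are assumptions made for this example, not the contents of csp_engine.py.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical fields; the real pipeline may extract different ones.
    name: str
    skills: set = field(default_factory=set)
    years_experience: float = 0.0


def passes_hard_constraints(candidate, required_skills, min_years):
    """Return True only if every hard constraint is satisfied."""
    has_required_skills = required_skills <= candidate.skills  # subset test
    has_enough_experience = candidate.years_experience >= min_years
    return has_required_skills and has_enough_experience


candidates = [
    Candidate("A", {"python", "sql", "nlp"}, 4),
    Candidate("B", {"python"}, 6),         # missing a required skill
    Candidate("C", {"python", "sql"}, 1),  # too little experience
]
required = {"python", "sql"}
shortlist = [
    c.name for c in candidates
    if passes_hard_constraints(c, required, min_years=2)
]
print(shortlist)  # ['A']
```

Candidates B and C are the kind of profile the sample data calls a CSP failure: each violates one hard constraint, so neither would reach the similarity or ranking stages.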

Kaggle Resume Dataset (for ML model training)

The ML model is trained on the Resume Dataset from Kaggle. Download Resume.csv and place it at:

Resume/Resume.csv

Then retrain the model:

python dataset_loader.py --csv Resume/Resume.csv --samples 30

The pre-trained saved_model.pkl is also excluded from the repository. If it is missing, it is generated automatically the first time main.py or app.py runs, using synthetic data when the Kaggle CSV is not available.
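The load-or-generate behaviour can be pictured as a small helper like the one below. This is a minimal sketch under assumptions: `load_or_train` and its `train_fn` argument are hypothetical names, and the actual logic in main.py and app.py may be organised differently.

```python
import os
import pickle


def load_or_train(model_path, train_fn):
    """Load a pickled model if present; otherwise train, save, and return one."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as fh:
            return pickle.load(fh)
    # train_fn is a hypothetical callable, e.g. one that fits on synthetic data.
    model = train_fn()
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model
```

Called as `load_or_train("saved_model.pkl", train_fn)`, the helper trains only on the first run and reuses the saved file afterwards. Deleting the file forces a retrain.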


Setup (One-time)

1. Create a virtual environment

python -m venv venv

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Download SpaCy model

python -m spacy download en_core_web_sm

Running the System

Option A: Streamlit Web UI (recommended)

streamlit run app.py

Then open http://localhost:8501 in your browser.

Option B: Command Line

# Create the resumes/ folder (skip this if it already exists) and copy your resume PDFs into it
mkdir resumes
# Then run the pipeline:
python main.py

Pipeline Stages

| Stage | Module | Technique |
|-------|--------|-----------|
| 1 | text_extractor.py | PyPDF2 + pdfplumber |
| 2 | nlp_processor.py | SpaCy NER + NLTK lemmatization |
| 3 | csp_engine.py | Constraint Satisfaction (hard filter) |
| 4 | similarity_scorer.py | TF-IDF + Cosine Similarity |
| 5 | ml_ranker.py | Random Forest classifier |
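As an illustration of stage 4, the sketch below computes TF-IDF vectors and cosine similarity between a job description and two resumes. It uses only the standard library so that it runs anywhere; similarity_scorer.py may well use a library vectorizer instead, and the whitespace tokenizer, the smoothed IDF formula, and the function names here are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (as dicts) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms in every document.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


job_description = "python developer with nlp and sql experience"
resumes = [
    "python nlp engineer with sql experience",
    "graphic designer skilled in illustration",
]
jd_vec, *resume_vecs = tfidf_vectors([job_description] + resumes)
scores = [cosine_similarity(jd_vec, vec) for vec in resume_vecs]
print(scores)  # the first resume shares terms with the job description
```

The second resume shares no terms with the job description, so its similarity is exactly zero, while the first scores somewhere between 0 and 1.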

Composite Score Formula

f(C) = 0.5 × CosineSimilarity
     + 0.3 × (Matched_Skills / Total_JD_Skills)
     + 0.2 × Normalized_Experience
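
The formula above translates directly into a small function. This is a sketch rather than the repository's code: the function name and the guard for a job description with zero listed skills are assumptions, and because the README does not say how Normalized_Experience is derived, it is taken here as an input already scaled to [0, 1].

```python
def composite_score(cosine_similarity, matched_skills, total_jd_skills,
                    normalized_experience):
    """Weighted sum of the three terms; each is expected to lie in [0, 1]."""
    # Assumption: treat a job description with no listed skills as ratio 0.
    skill_ratio = matched_skills / total_jd_skills if total_jd_skills else 0.0
    return (0.5 * cosine_similarity
            + 0.3 * skill_ratio
            + 0.2 * normalized_experience)


score = composite_score(
    cosine_similarity=0.6,
    matched_skills=4,
    total_jd_skills=5,
    normalized_experience=0.5,
)
print(round(score, 2))  # 0.64
```

Since the weights sum to 1.0, the composite score stays in [0, 1] whenever all three inputs do. In the example, 0.5 × 0.6 + 0.3 × 0.8 + 0.2 × 0.5 = 0.30 + 0.24 + 0.10 = 0.64.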
