resume_matcher/
├── app.py # Streamlit web UI
├── main.py # Command-line pipeline runner
├── requirements.txt
├── resumes/
└── modules/
    ├── text_extractor.py # Stage 1: PDF text extraction
    ├── nlp_processor.py # Stage 2: NER + preprocessing
    ├── csp_engine.py # Stage 3: Hard constraint filtering
    ├── similarity_scorer.py # Stage 4: TF-IDF cosine similarity
    └── ml_ranker.py # Stage 5: ML suitability ranking

The resumes/ folder is not included in the repository (excluded via .gitignore).
Generate the 15 sample resume PDFs locally by running:
python create_sample_data.py

This creates profiles covering strong matches, moderate matches, and CSP failures.
The ML model is trained on the Resume Dataset from Kaggle.
Download Resume.csv and place it at:
Resume/Resume.csv
Then retrain the model:
python dataset_loader.py --csv Resume/Resume.csv --samples 30

The pre-trained saved_model.pkl is also excluded from the repo. It is generated automatically on the first run of main.py or app.py, using synthetic data if the Kaggle CSV is not available.

python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate

pip install -r requirements.txt

python -m spacy download en_core_web_sm

streamlit run app.py

Then open http://localhost:8501 in your browser.

# Create the resumes/ folder and put the resume PDFs in it first
mkdir resumes
# Then run the pipeline:
python main.py

| Stage | Module | Technique |
|---|---|---|
| 1 | text_extractor.py | PyPDF2 + pdfplumber |
| 2 | nlp_processor.py | spaCy NER + NLTK lemmatization |
| 3 | csp_engine.py | Constraint Satisfaction (hard filter) |
| 4 | similarity_scorer.py | TF-IDF + Cosine Similarity |
| 5 | ml_ranker.py | Random Forest classifier |
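
To make Stage 4 concrete, here is a minimal, standard-library sketch of TF-IDF weighting followed by cosine similarity between a job description and resume texts. It is an illustration only, not the code in similarity_scorer.py: the function names, the whitespace tokenizer, the smoothed IDF formula, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    # Assumption: naive lowercase + whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in term_counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, for illustration only.
job_description = "python machine learning sql"
resumes = ["python sql data analysis", "graphic design photoshop"]

vectors = tfidf_vectors([job_description] + resumes)
scores = [cosine_similarity(vectors[0], v) for v in vectors[1:]]
```

In this sketch the first resume shares two terms with the job description and gets a positive score, while the second shares none and scores 0.0.
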
f(C) = 0.5 × CosineSimilarity
+ 0.3 × (Matched_Skills / Total_JD_Skills)
+ 0.2 × Normalized_Experience
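
The weights in f(C) sum to 1.0, so the score stays in [0, 1] when each term is in [0, 1]. A minimal sketch of the formula in Python follows; the function name, argument names, and the zero-skill fallback are assumptions for illustration, not taken from the project's code.

```python
def suitability_score(cosine_similarity, matched_skills, total_jd_skills,
                      normalized_experience):
    """Weighted score f(C) as given in the formula above.

    cosine_similarity and normalized_experience are expected in [0, 1].
    """
    # Assumption: treat a job description with no listed skills as a 0 skill ratio
    # rather than dividing by zero.
    skill_ratio = matched_skills / total_jd_skills if total_jd_skills else 0.0
    return (0.5 * cosine_similarity
            + 0.3 * skill_ratio
            + 0.2 * normalized_experience)


# Hypothetical candidate: similarity 0.8, 3 of 4 required skills, experience 0.5.
example_score = suitability_score(0.8, 3, 4, 0.5)  # roughly 0.725
```
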