Applications-Screening

A privacy-conscious pilot data pipeline that supports screening and planning for the Sanctuary Scholarship. Free-text responses were coded using a controlled taxonomy with three layers: barriers and support needs, motivation and aspiration themes, and data-quality indicators. Course-interest responses often describe social issues as part of the course subject rather than as barriers the applicant personally faces, so all barrier tags were treated as requiring human validation.

Project Overview

This project transforms fragmented, multi-year scholarship application records into a consistent analytics-ready dataset to support:

cleaner and more consistent screening workflows,
transparent follow-up flags for incomplete applications,
thematic insight from anonymised free-text responses,
trend reporting across cycles,
planning dashboards for demand, capacity, and support needs.

This pipeline is designed for decision support, not decision automation. Human review remains central.

Why this project exists

Scholarship applications often live across different workbooks, sheet structures, and historical formats. Valuable insight is lost when data cannot be consistently joined and analysed.

This repository provides a practical framework to:

inventory source files and structures,
standardise fields into a canonical schema,
anonymise sensitive data,
generate screening quality flags,
produce dashboard-ready outputs.
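
As a rough illustration of the screening quality flags step, the sketch below checks one standardised record for missing required fields. The field names and the `REQUIRED_FIELDS` tuple are hypothetical and are not taken from the project's canonical schema.

```python
from typing import Mapping

# Hypothetical required fields; the real canonical schema may differ.
REQUIRED_FIELDS = ("application_id", "course_interest", "referee_received")


def completeness_flags(record: Mapping[str, object]) -> dict:
    """Return follow-up flags for one standardised application record."""
    missing = [
        field for field in REQUIRED_FIELDS
        if record.get(field) is None or str(record.get(field)).strip() == ""
    ]
    return {
        "missing_fields": missing,
        "is_complete": not missing,
        # Share of required fields that are filled in, from 0.0 to 1.0.
        "completeness": 1 - len(missing) / len(REQUIRED_FIELDS),
    }


# A whitespace-only answer counts as missing.
example = {"application_id": "A-001", "course_interest": "  ", "referee_received": "yes"}
flags = completeness_flags(example)
```

The flags list which fields need follow-up and do not score the applicant, which keeps the output in line with decision support.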

Scope

Included

Data inventory and schema mapping.
Multi-sheet / multi-file integration by application ID.
Data quality checks and completeness indicators.
Anonymisation and privacy-by-design handling.
Starter pathway for rule-based and AI-assisted thematic tagging on anonymised text.
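
One possible shape for the rule-based tagging pathway is sketched below. The `TAXONOMY` keywords, tag names, and the three-word minimum are illustrative assumptions, not the project's controlled taxonomy. Every barrier tag is marked for human review, since a keyword match cannot tell a course subject apart from a personal barrier.

```python
import re

# Hypothetical keyword taxonomy: layer -> tag -> trigger keywords.
TAXONOMY = {
    "barrier": {
        "financial": ["tuition", "fees", "afford"],
        "language": ["english", "ielts"],
    },
    "motivation": {
        "career": ["career", "profession", "job"],
        "community": ["community", "give back"],
    },
}


def tag_response(text: str) -> list:
    """Tag one anonymised free-text response using simple keyword rules."""
    cleaned = text.strip().lower()
    if len(cleaned.split()) < 3:
        # Data-quality layer: too short to code reliably.
        return [{"layer": "data_quality", "tag": "too_short", "needs_review": False}]
    tags = []
    for layer, layer_tags in TAXONOMY.items():
        for tag, keywords in layer_tags.items():
            if any(re.search(r"\b" + re.escape(k), cleaned) for k in keywords):
                # Barrier tags may describe the course subject rather than
                # the applicant, so they are always routed to a human.
                tags.append({"layer": layer, "tag": tag, "needs_review": layer == "barrier"})
    return tags


tags = tag_response("I want a career in social work and cannot afford the fees.")
```

Here the response receives a `financial` barrier tag flagged for review and a `career` motivation tag that is not.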

Not included

Automated scholarship decision-making.
Use of identifiable applicant data in external AI tools.

Repository Structure

Applications-Screening/
├── README.md
├── data/
│   ├── raw/           # Original source files (do not edit)
│   ├── processed/     # Cleaned and standardised outputs
│   └── anonymised/    # Analysis-ready anonymised outputs
├── docs/
│   ├── data_inventory.xlsx
│   ├── data_dictionary.csv
│   └── column_mapping.csv
└── src/
    ├── 01_profile_raw.py
    ├── 02_map_and_union.py
    └── 03_anonymise.py

Suggested Workflow

Inventory all files, sheets, and relationships.
Map source columns to canonical fields.
Ingest and union yearly records, recording provenance for each row.
Anonymise sensitive columns and generate internal IDs.
Flag missing/inconsistent screening fields.
Tag themes in anonymised free text with human validation.
Report in Power BI.
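
The mapping and union steps above could look something like the following sketch. The file names, source column names, and the in-code `COLUMN_MAPPING` dictionary are hypothetical; in the repository the mapping would presumably be read from `docs/column_mapping.csv`, whose layout is not shown here.

```python
# Hypothetical mapping: (source_file, source_column) -> canonical field.
COLUMN_MAPPING = {
    ("applications_2022.xlsx", "App ID"): "application_id",
    ("applications_2022.xlsx", "Course"): "course_interest",
    ("applications_2023.xlsx", "Application Ref"): "application_id",
    ("applications_2023.xlsx", "Preferred course"): "course_interest",
}


def map_and_union(sources: dict) -> list:
    """Rename source columns to canonical names and stack all rows.

    `sources` maps a file name to its rows (one dict per row). Each output
    row keeps provenance columns so it can be traced back to its source.
    """
    unioned = []
    for source_file, rows in sources.items():
        for row_number, row in enumerate(rows, start=1):
            record = {"source_file": source_file, "source_row": row_number}
            for column, value in row.items():
                canonical = COLUMN_MAPPING.get((source_file, column))
                if canonical is not None:  # unmapped columns are dropped
                    record[canonical] = value
            unioned.append(record)
    return unioned


sources = {
    "applications_2022.xlsx": [{"App ID": "22-001", "Course": "Nursing"}],
    "applications_2023.xlsx": [
        {"Application Ref": "23-001", "Preferred course": "Law", "Notes": "x"}
    ],
}
records = map_and_union(sources)
```

Keeping `source_file` and `source_row` on every record means any value in the unioned dataset can be traced back to the raw file it came from.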

Data Inventory Model

Use a relational inventory workbook with tabs:

files (one row per physical file),
tables (one row per sheet or standalone table),
relationships (table joins via application ID),
columns (one row per column per table),
issues_log (data quality/action tracker).

This structure handles both years where referee data sits in separate files and years where it sits in sheets of the same workbook.
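
To illustrate the join described by the relationships tab, here is a minimal sketch that attaches referee rows to applications by application ID. The column names `application_id` and `referee_type` are assumptions made for illustration.

```python
from collections import defaultdict


def attach_referees(applications: list, referees: list) -> list:
    """Left-join referee rows onto applications by application ID.

    Works the same whether referee rows came from a separate file or from
    a sheet in the same workbook, as long as both carry `application_id`.
    """
    by_application = defaultdict(list)
    for referee in referees:
        by_application[referee["application_id"]].append(referee)

    joined = []
    for application in applications:
        matches = by_application.get(application["application_id"], [])
        joined.append({**application, "referee_count": len(matches)})
    return joined


applications = [{"application_id": "A1"}, {"application_id": "A2"}]
referees = [
    {"application_id": "A1", "referee_type": "academic"},
    {"application_id": "A1", "referee_type": "personal"},
]
joined = attach_referees(applications, referees)
```

Applications with no matching referee rows are kept with a count of zero, so they can be picked up by follow-up flags rather than silently dropped.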

Privacy and Governance Principles

Remove or mask direct identifiers before analysis.
Use anonymised applicant IDs.
Restrict reporting to aggregated insights.
Keep sensitive raw data in controlled storage.
Maintain an auditable transformation process.
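
A minimal sketch of the first two principles, assuming a keyed hash (HMAC-SHA256) is an acceptable way to derive internal IDs. The identifier column names, the `APP-` prefix, and the example key are hypothetical, and the repository's `03_anonymise.py` may take a different approach.

```python
import hashlib
import hmac

# Hypothetical direct-identifier columns to drop before analysis.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymous_id(application_id: str, secret_key: bytes) -> str:
    """Derive a stable internal ID that cannot be recomputed without the key."""
    digest = hmac.new(secret_key, application_id.encode("utf-8"), hashlib.sha256)
    return "APP-" + digest.hexdigest()[:12]


def anonymise(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the application ID for an internal ID."""
    cleaned = {
        key: value for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "application_id"
    }
    cleaned["applicant_key"] = pseudonymous_id(record["application_id"], secret_key)
    return cleaned


# In practice the key would live in controlled storage, never in the repo.
key = b"example-key-for-illustration-only"
row = {"application_id": "22-001", "name": "Jane Doe", "course_interest": "Nursing"}
anonymised = anonymise(row, key)
```

The same application ID always maps to the same internal ID under a given key, so records can still be joined across cycles. This sketch only handles structured columns; names or contact details embedded in free-text answers would need separate treatment.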

Portfolio Positioning

This project demonstrates real-world data science capability in:

messy multi-source integration,
reproducible data engineering,
responsible NLP on sensitive domains,
analytics communication for non-technical stakeholders,
governance-aware AI implementation.

Next Steps

Add inventory templates and sample mapping files.
Implement initial profiling, mapping, and anonymisation scripts.
Add evaluation notebook for thematic tagging agreement.
Publish dashboard screenshots and methodology notes.
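
For the planned tagging-agreement evaluation, one common measure is Cohen's kappa between rule-based tags and human-validated tags. The sketch below is a plain-Python version under that assumption. The project has not specified which agreement metric it will use, and the example labels are made up.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels: the two sources agree on three of four responses.
rule_tags = ["financial", "financial", "none", "none"]
human_tags = ["financial", "none", "none", "none"]
kappa = cohens_kappa(rule_tags, human_tags)
```

With these labels, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.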
