Skip to content

barvepriyamvada/SQL-Analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📊 This project is Part 1 of a two-part portfolio.
See Healthcare Executive Dashboard for the Tableau visualization layer built on these findings.

Healthcare SQL Analytics — CMS Medicare Data

Python SQL SQLite Data

Structured SQL analysis on 169,635 real Medicare records to answer the cost and quality questions hospital executives and health insurers actually act on.


Project Overview

This project loads three publicly available CMS Medicare datasets — covering 3,000+ US hospitals, 6 tracked diagnoses, and one full fiscal year of readmission and cost data — into a local SQLite database and applies 20 layered SQL queries to surface operational and financial insights.

The analysis is organized the way a real analytics engagement would be: starting with data quality and exploration, moving through aggregations and segmentation, and ending with five executive-ready business question answers backed by specific numbers. The findings below are drawn directly from the query outputs — not approximations.


Key Business Findings

# Finding Key Number
1 Septicemia is the #1 national cost driver — DRG 871 alone generates more Medicare spend than the next two conditions combined $10.3B estimated total payments
2 Heart Failure has the largest readmission burden — nearly 1 in 5 patients returns within 30 days 19.4% 30-day readmission rate across 309,000 discharges
3 CABG is the worst cost-quality combination — the most expensive service line still carries above-benchmark readmission rates $49,121 avg cost per discharge, ERR > 1.0
4 AdventHealth Orlando is the highest national penalty risk — penalised on 5 of 6 tracked diagnoses simultaneously ~$911K estimated annual CMS penalty exposure
5 Massachusetts is structurally failing on readmissions — worst-performing state in 4 of 6 HRRP diagnoses despite high hospital density ERR > 1.05 across AMI, COPD, HF, Pneumonia
6 Critical-tier hospitals identified in Alaska and Alabama — excess readmission ratios above 1.20, flagged across multiple diagnoses ERR up to 1.23 (Central Peninsula General, AK)

Dataset Sources

All data is sourced from data.cms.gov — public domain, no license or account required.

Dataset What It Contains
Medicare Inpatient Hospitals by Provider and Service Average charges, total and Medicare payments by DRG code and hospital
Hospital Readmissions Reduction Program (HRRP) FY2024 Excess readmission ratios, predicted vs. expected rates, patient volumes by facility and diagnosis
Hospital General Information Facility type, ownership, state, CMS star rating — one row per hospital

Raw CSVs are excluded from this repo via .gitignore. To run immediately without downloading data, use python scripts/create_sample_db.py to generate a sample database — all 20 queries will execute against it.


SQL Techniques Demonstrated

Category Techniques
Exploration Row counts, NULL profiling, distinct value checks, schema validation
Aggregations GROUP BY, HAVING, conditional aggregation with CASE WHEN inside SUM
Segmentation Cost quartiles with NTILE(), diagnosis performance tiers, facility risk classification
JOINs INNER JOIN and LEFT JOIN across 3 tables; scalar subqueries in SELECT clause
Window Functions RANK() and PARTITION BY for within-state ranking, PERCENT_RANK() for national percentile, LAG() for period comparison, NTILE() for distribution buckets
CTEs Multi-step CTEs (4+ stages) for readmission risk scoring and cost efficiency indexing
Business Questions Five end-to-end queries combining all of the above into executive-ready outputs

Query Results

BQ16: Patient Segments by Readmission Risk BQ16 - Patient Segments by Readmission Risk

BQ17: Top Cost Drivers by Service Line BQ17 - Top Cost Drivers by Service Line

BQ18: Cost vs Quality Analysis BQ18 - Cost vs Quality Analysis

BQ19: Worst Performing Hospitals by State BQ19 - Worst Performing Hospitals by State

BQ20: Executive Summary Dashboard BQ20 - Executive Summary Dashboard


How to Run

  1. Clone the repo

    git clone https://github.com/barvepriyamvada/SQL-Analytics.git
    cd SQl-Analytics
  2. Install dependencies

    pip install -r requirements.txt
  3. Generate the sample database (no data download needed)

    python scripts/create_sample_db.py
  4. Run any query against the database

    sqlite3 data/healthcare_analytics.db < queries/07_business_questions/16_bq_highest_readmission_segments.sql

    Or open data/healthcare_analytics.db in DB Browser for SQLite and run any .sql file interactively.

  5. Optional — load real CMS data for full results Download the three CSVs from data.cms.gov, place them in data/raw/, then:

    python scripts/load_data.py

Project Structure

├── data/
│   ├── raw/                          # CMS CSVs — not committed to git
│   └── healthcare_analytics.db       # SQLite database (generated)
│
├── queries/
│   ├── 01_exploration/               # Row counts, data quality checks
│   ├── 02_aggregations/              # National averages, cost summaries
│   ├── 03_segmentation/              # Facility tiers, cost quartiles
│   ├── 04_joins/                     # Multi-table analysis
│   ├── 05_window_functions/          # RANK, LAG, NTILE, PERCENT_RANK
│   ├── 06_cte_analysis/              # Multi-step CTEs, composite scoring
│   └── 07_business_questions/        # Executive-ready answers to BQs 1–5
│
├── outputs/                          # Query result CSVs and screenshots
├── scripts/
│   ├── create_sample_db.py           # Generates sample database instantly
│   └── load_data.py                  # Full data loading and cleaning pipeline
└── README.md

Built as part of a data analytics portfolio. Data sourced from CMS.gov — public domain.

About

SQL analytics case study on 169,635 real CMS Medicare records — 20 queries across window functions, CTEs, and multi-table JOINs analyzing cost and readmission data across 3,000+ US hospitals.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors