Full data engineering pipeline for FMCG retail analytics: Amazon S3 to Snowflake to Power BI
End-to-end data analytics pipeline that analyzes two years of transaction data from 2,500 households for an FMCG company. It covers the full data engineering lifecycle, from raw data ingestion to executive dashboard delivery.
Raw CSV Files (transactions, households, products)
|
v
Amazon S3 (raw data lake)
|
v [Snowflake Storage Integration + AUTO_INGEST pipe]
Snowflake Data Warehouse
- STAGING schema (raw tables)
- ANALYTICS schema (transformed views)
|
v
Python EDA (Jupyter Notebook)
- Data quality checks
- Feature engineering
- Campaign effectiveness analysis
|
v
Power BI Dashboard
- Campaign performance KPIs
- Customer segmentation
- Product category analysis
- Campaign response rates across demographic segments
- Basket size and purchase frequency by household type
- Product category affinity and cross-sell patterns
- Coupon redemption effectiveness
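
The segment-level analyses listed above (campaign response rates, basket size, purchase frequency) can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy data, the column names (`household_id`, `household_type`, `redeemed_coupon`, `basket_id`, `quantity`) and both function names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the project's schema.
households = pd.DataFrame({
    "household_id": [1, 2, 3, 4],
    "household_type": ["single", "family", "family", "single"],
})
campaign_contacts = pd.DataFrame({
    "household_id": [1, 2, 3, 4],
    "redeemed_coupon": [False, True, True, True],
})
transactions = pd.DataFrame({
    "household_id": [1, 1, 2, 2, 2, 3],
    "basket_id": [10, 11, 20, 20, 21, 30],
    "quantity": [1, 2, 3, 1, 2, 5],
})


def response_rate_by_segment(contacts, households, segment_col):
    """Share of contacted households that redeemed a coupon, per segment."""
    merged = contacts.merge(
        households, on="household_id", how="left", validate="many_to_one"
    )
    # The mean of a boolean column is the fraction of True values.
    return merged.groupby(segment_col)["redeemed_coupon"].mean()


def basket_stats_by_household_type(transactions, households):
    """Average basket size and basket count per household, averaged by type."""
    # Collapse line items into one row per basket.
    baskets = transactions.groupby(
        ["household_id", "basket_id"], as_index=False
    )["quantity"].sum()
    per_household = baskets.groupby("household_id").agg(
        avg_basket_size=("quantity", "mean"),
        basket_count=("basket_id", "nunique"),
    ).reset_index()
    merged = per_household.merge(households, on="household_id", how="left")
    return merged.groupby("household_type")[
        ["avg_basket_size", "basket_count"]
    ].mean()


rates = response_rate_by_segment(campaign_contacts, households, "household_type")
stats = basket_stats_by_household_type(transactions, households)
print(rates)
print(stats)
```

Households with no transactions (household 4 in the toy data) drop out of the basket statistics because the aggregation starts from the transactions table. Whether they should instead count as zero-frequency households is a design choice the analysis would need to make explicitly.
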
| Layer | Technology |
|---|---|
| Storage | Amazon S3 |
| Data Warehouse | Snowflake (AUTO_INGEST pipes, IAM integration) |
| Transformation | Python (pandas, NumPy) |
| Visualization | Power BI (DAX measures, custom visuals) |
| Notebooks | Jupyter |
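
To make the S3-to-Snowflake hop concrete, the sketch below renders the three Snowflake statements such a setup typically needs: a storage integration, an external stage, and an `AUTO_INGEST` pipe. It is an illustration under assumptions, not the contents of the `sql/` scripts: every object name, the bucket URL, and the role ARN are placeholders, and `IngestConfig` and `render_ingest_ddl` are hypothetical helpers that only build SQL strings without connecting to Snowflake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestConfig:
    """Hypothetical settings; every value below is a placeholder."""
    integration: str
    role_arn: str
    bucket_url: str
    stage: str
    pipe: str
    target_table: str


def render_ingest_ddl(cfg: IngestConfig) -> list[str]:
    """Return the Snowflake DDL statements, in the order they should run."""
    integration = (
        f"CREATE STORAGE INTEGRATION {cfg.integration}\n"
        "  TYPE = EXTERNAL_STAGE\n"
        "  STORAGE_PROVIDER = 'S3'\n"
        "  ENABLED = TRUE\n"
        f"  STORAGE_AWS_ROLE_ARN = '{cfg.role_arn}'\n"
        f"  STORAGE_ALLOWED_LOCATIONS = ('{cfg.bucket_url}');"
    )
    stage = (
        f"CREATE STAGE {cfg.stage}\n"
        f"  URL = '{cfg.bucket_url}'\n"
        f"  STORAGE_INTEGRATION = {cfg.integration}\n"
        "  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);"
    )
    pipe = (
        f"CREATE PIPE {cfg.pipe} AUTO_INGEST = TRUE AS\n"
        f"  COPY INTO {cfg.target_table}\n"
        f"  FROM @{cfg.stage}\n"
        "  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);"
    )
    return [integration, stage, pipe]


cfg = IngestConfig(
    integration="s3_retail_int",                               # placeholder
    role_arn="arn:aws:iam::000000000000:role/snowflake-read",  # placeholder
    bucket_url="s3://example-retail-raw/transactions/",        # placeholder
    stage="STAGING.TRANSACTIONS_STAGE",                        # placeholder
    pipe="STAGING.TRANSACTIONS_PIPE",                          # placeholder
    target_table="STAGING.TRANSACTIONS",                       # placeholder
)
ddl = render_ingest_ddl(cfg)
print("\n\n".join(ddl))
```

Two manual steps sit between these statements in practice. After creating the integration, `DESC INTEGRATION` reports the IAM user ARN and external ID that go into the AWS role's trust policy. After creating the pipe, the S3 bucket needs an event notification pointing at the SQS queue listed in the pipe's `notification_channel` column (visible via `SHOW PIPES`).
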
Retail-Analysis/
├── notebooks/ # EDA and analysis notebooks
├── sql/ # Snowflake DDL and transformation scripts (19KB)
├── dashboards/ # Power BI .pbix files
├── docs/
│ ├── ERD.pdf # Entity relationship diagram
│ └── analysis_brief.pdf # Business analysis summary
└── README.md