End-to-end deep learning pipeline for multivariate time series forecasting. Benchmarks classical methods (XGBoost, Prophet) against modern neural architectures (LSTM, Transformer) on the Kaggle Store Sales dataset.
- Baseline Models: XGBoost Regressor + Facebook Prophet for benchmarking
- Deep Learning: LSTM with embedding layers + Transformer with multi-head self-attention
- Feature Engineering: Lag features (1/7/14/28/364d), rolling statistics, cyclical seasonal encodings, promo aggregates
- Evaluation: MAE, RMSE, MAPE, sMAPE across all models
- Delivery: Streamlit dashboard comparing forecast vs. actual
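The lag, rolling, and cyclical features listed above could be built roughly as follows. This is a minimal sketch, not the code in `scripts/feature_engineering.py`: the column names (`date`, `store_nbr`, `family`, `sales`), the function name `add_features`, and the rolling window sizes are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def add_features(df, lags=(1, 7, 14, 28, 364), windows=(7, 28)):
    """Add per-series lag, rolling, and cyclical calendar features.

    Assumes hypothetical columns: date, store_nbr, family, sales.
    """
    df = df.sort_values(["store_nbr", "family", "date"]).copy()
    grouped = df.groupby(["store_nbr", "family"])["sales"]

    # Lag features: the value observed `lag` days earlier in the same series.
    for lag in lags:
        df[f"sales_lag_{lag}"] = grouped.shift(lag)

    # Rolling statistics: shift(1) first so each window ends at yesterday
    # and never includes the current day's target.
    for w in windows:
        df[f"sales_roll_mean_{w}"] = grouped.transform(
            lambda s: s.shift(1).rolling(w, min_periods=1).mean()
        )
        df[f"sales_roll_std_{w}"] = grouped.transform(
            lambda s: s.shift(1).rolling(w, min_periods=2).std()
        )

    # Cyclical encoding: map day-of-week and month onto the unit circle
    # so that Sunday/Monday and December/January end up adjacent.
    dow = df["date"].dt.dayofweek
    month = df["date"].dt.month - 1
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    df["month_sin"] = np.sin(2 * np.pi * month / 12)
    df["month_cos"] = np.cos(2 * np.pi * month / 12)
    return df


# Tiny single-series demo frame.
demo = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "store_nbr": 1,
    "family": "GROCERY I",
    "sales": np.arange(10, dtype=float),
})
features = add_features(demo, lags=(1, 7), windows=(3,))
```

The first rows of each series have no history, so their lag and rolling columns are NaN. Whether to drop or impute those rows is left to the caller.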
Raw CSVs (train, stores, oil, holidays, transactions)
|
v
Preprocess ──> Date features, log-transform, external merges
|
v
Feature Eng ──> Lags, rolling mean/std, seasonal encoding, promo features
|
+---> XGBoost / Prophet (baselines)
+---> LSTM + Embeddings (deep learning)
+---> Transformer + Positional Encoding (deep learning)
|
v
Evaluate ──> MAE, RMSE, MAPE, sMAPE, residual analysis
|
v
Dashboard ──> Forecast comparison, error distribution, residual analysis
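As an illustration of the Preprocess stage in the diagram, the sketch below log-transforms sales and merges one external table. It is an assumption-laden example rather than the contents of `scripts/preprocess.py`: the function name `preprocess` is hypothetical, and the oil price column is assumed to be named `dcoilwtico` as in the Kaggle `oil.csv`.

```python
import numpy as np
import pandas as pd


def preprocess(train, oil):
    """Parse dates, merge daily oil prices, and log-transform sales."""
    df = train.copy()
    df["date"] = pd.to_datetime(df["date"])
    oil = oil.copy()
    oil["date"] = pd.to_datetime(oil["date"])

    # Oil prices are only quoted on trading days. Expand to a daily calendar
    # and forward-fill, so each gap takes the last known (past) price.
    calendar = pd.DataFrame(
        {"date": pd.date_range(df["date"].min(), df["date"].max(), freq="D")}
    )
    oil_daily = calendar.merge(oil, on="date", how="left").sort_values("date")
    oil_daily["dcoilwtico"] = oil_daily["dcoilwtico"].ffill()

    df = df.merge(oil_daily, on="date", how="left")

    # Simple date features.
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["dayofweek"] = df["date"].dt.dayofweek

    # log1p keeps zero-sales days finite and compresses the heavy right tail.
    df["sales_log"] = np.log1p(df["sales"].clip(lower=0))
    return df


train_demo = pd.DataFrame({
    "date": ["2017-01-06", "2017-01-07", "2017-01-08", "2017-01-09"],
    "sales": [0.0, 9.0, 99.0, 4.0],
})
oil_demo = pd.DataFrame({
    "date": ["2017-01-06", "2017-01-09"],
    "dcoilwtico": [50.0, 52.0],
})
out = preprocess(train_demo, oil_demo)
```

Forward-filling only looks backward in time, which keeps the merged column free of future information. Dates before the first available oil price stay NaN in this sketch.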
| Layer | Tools | Notes |
|---|---|---|
| ETL | pandas, numpy | Time-based train/val split (no random shuffle) |
| Feature Eng | pandas rolling, sklearn preprocessing | Lag/rolling features with shift(1) to prevent leakage |
| Baselines | XGBoost, Prophet | Tree-based + additive regression benchmarks |
| Deep Learning | PyTorch, PyTorch Lightning | LSTM + Transformer with categorical embeddings |
| Evaluation | sklearn metrics | MAE, RMSE, MAPE, sMAPE |
| Delivery | Streamlit | Side-by-side forecast comparison |
| Quality | pytest, ruff, GitHub Actions | CI validates pipeline end-to-end |
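The time-based split noted in the ETL row can be sketched as below. The function name `time_based_split` and the 28-day validation horizon are assumptions for illustration. The project's actual cutoff lives in its own configuration.

```python
import pandas as pd


def time_based_split(df, date_col="date", val_days=28):
    """Hold out the final `val_days` days as validation, with no shuffling."""
    cutoff = df[date_col].max() - pd.Timedelta(days=val_days)
    train = df[df[date_col] <= cutoff]
    val = df[df[date_col] > cutoff]
    return train, val


demo = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=100, freq="D"),
    "sales": range(100),
})
train_df, val_df = time_based_split(demo, val_days=28)
```

Because every validation date is strictly later than every training date, lag and rolling features computed from the training window cannot see validation targets.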
git clone https://github.com/MeaFew/multivariate-timeseries-forecasting.git
cd multivariate-timeseries-forecasting
# Download real dataset (GitHub Releases, ~21MB)
bash download_data.sh
# Run full pipeline
python run_all.py
# Or step by step
make preprocess
make features
make train-baseline # XGBoost + Prophet
make train-lstm # LSTM model
make train-transformer # Transformer model
make evaluate
# Launch dashboard
make dashboard
# Quality gates
make verify
├── scripts/
│ ├── generate_mock_data.py # Synthetic retail sales data
│ ├── preprocess.py # Date parsing, log-transform, external merges
│ ├── feature_engineering.py # Lags, rolling stats, seasonal encoding
│ ├── train_baseline.py # XGBoost + Prophet
│ ├── train_lstm.py # LSTM with PyTorch Lightning
│ ├── train_transformer.py # Transformer with positional encoding
│ ├── evaluate.py # Model comparison & residual analysis
│ ├── predict.py # Model loading and inference
│ ├── metrics.py # MAE/RMSE/MAPE/sMAPE, TimeSeriesDataset
│ └── audit_consistency.py # Cross-reference README claims vs outputs
├── dashboard/
│ └── app.py # Streamlit forecast comparison
├── tests/
│ └── test_pipeline.py # Unit + integration tests
├── config.py # Centralized paths & hyperparameters
├── Makefile # Workflow orchestration
└── requirements.txt
Based on Kaggle Store Sales - Time Series Forecasting (metric: RMSLE, lower is better).
| Reference | RMSLE | Notes |
|---|---|---|
| Kaggle Starter (naive) | ~0.90–1.20 | Historical mean / naive forecast |
| Competition Median | ~0.60–0.80 | Basic lag features + XGBoost |
| Competition Top 10% | ~0.45–0.50 | Complex feature engineering |
| Competition Top 1% | ~0.35–0.40 | Fine-grained external data usage |
| This Project (XGBoost CV) | ~0.24 | Local 5-fold CV on log-transformed sales |
Note: RMSLE values are not directly comparable across log-transformed vs. original scale. The Kaggle competition uses original-scale RMSLE. Local validation uses log-scale MAE/MAPE for training stability.
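For reference, RMSLE on the original scale is the square root of the mean squared difference between `log(1 + prediction)` and `log(1 + actual)`. The sketch below shows one way to score a model trained on `log1p(sales)` on that scale, by inverting its predictions with `expm1` first. The function names are hypothetical and are not taken from `scripts/metrics.py`.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error on original-scale values."""
    y_true = np.asarray(y_true, dtype=float)
    # Negative forecasts are clipped to zero so log1p stays defined.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def rmsle_from_log_predictions(y_true, y_pred_log):
    """Invert log1p-space predictions with expm1, then score on the original scale."""
    return rmsle(y_true, np.expm1(np.asarray(y_pred_log, dtype=float)))


sales = np.array([0.0, 10.0, 100.0])
pred_log = np.log1p(sales) + 0.1  # constant 0.1 error in log1p space
score = rmsle_from_log_predictions(sales, pred_log)
```

Scores are still only comparable when they are computed on the same validation rows, so a local CV number and a leaderboard number should be read side by side with care.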
| Model | MAE | RMSE | MAPE | sMAPE* | Dataset |
|---|---|---|---|---|---|
| XGBoost | 0.256 | 0.380 | 11.98% | 39.42% | Full (3M rows, 54 stores) |
| Prophet (aggregated) | n/a | n/a | n/a | n/a | (requires pystan compilation toolchain; verified in Docker/Linux CI) |
| LSTM | ~0.121 | ~0.150 | ~1.35% | ~1.34% | Subset (26K rows, top 20 groups) |
| Transformer | ~0.170 | ~0.210 | ~1.91% | ~1.88% | Subset (26K rows, top 20 groups) |
LSTM/Transformer metrics are expected benchmark values from DL training on the curated subset. Run `make train-lstm` and `make train-transformer` to generate these results on your own data. The metrics will be written to `reports/model_results.json` under the `"lstm_results"` and `"transformer_results"` keys. Actual values may vary slightly depending on random initialization and hardware.
*sMAPE is NOT comparable across rows: XGBoost metrics are from 5-fold CV on the full dataset (54 stores × 33 product families, ~3M rows). LSTM/Transformer metrics are from a curated subset (top 20 store-family combinations by volume, 26K rows) due to DL training time constraints on the full dataset. Direct comparison of sMAPE / MAPE across different validation sets is meaningless: the subset has lower variance and thus lower percentage error. All MAE/RMSE/MAPE values are computed in log1p(sales) space.
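The four metrics in the table can be computed as in the sketch below. This is an illustrative version, not the code in `scripts/metrics.py`: the function name `regression_metrics` and the `eps` guard against division by zero are assumptions, and sMAPE is given in its common 0 to 200% form, which may differ from the variant the project uses.

```python
import numpy as np


def regression_metrics(y_true, y_pred, eps=1e-8):
    """Return MAE, RMSE, MAPE (%), and sMAPE (%) for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # eps avoids division by zero when the actual value is 0.
    mape = 100 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))
    # Symmetric MAPE: the error is scaled by the mean magnitude of actual and forecast.
    denom = np.maximum(np.abs(y_true) + np.abs(y_pred), eps)
    smape = 100 * np.mean(2 * np.abs(err) / denom)

    return {"mae": float(mae), "rmse": float(rmse),
            "mape": float(mape), "smape": float(smape)}


metrics = regression_metrics([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Both percentage metrics divide by the magnitude of the target, so they become unstable when many targets are near zero, which is one reason values from different validation sets should not be compared.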
The project uses the Kaggle Store Sales - Time Series Forecasting dataset:
- 54 stores across Ecuador
- 33 product families
- Daily sales from 2013 to 2017
- External variables: oil prices, holidays, promotions
For local testing without Kaggle credentials, run `python scripts/generate_mock_data.py` to create a statistically similar synthetic dataset.
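| Project | Repository | Description |
|---|---|---|
| E-commerce User Behavior Analytics | MeaFew/ecommerce-user-analytics | 29 million real user-behavior records, 10 analysis modules |
| Marketing Attribution and Budget Optimization | MeaFew/marketing-attribution-mmm | MMM + multi-touch attribution + budget optimization |
| Credit Risk Scoring | MeaFew/credit-risk-scoring | WOE/IV + XGBoost/LightGBM + SHAP explainability |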
MIT