  • Spiro.ai
  • Corpus Christi, Texas
  • LinkedIn in/ifrg

ivanrivasgr/README.md

Ivan Gruber

Sports Data Engineer | Real-Time Analytics | Cloud Data Platforms

I build production-style data systems for sports analytics: event ingestion, medallion lakehouse pipelines, streaming decision engines, ML-ready feature layers, and dashboards that turn raw game data into operational insight.

Portfolio | LinkedIn | Email


Current Focus

I am building Bullpen Signal, a dual-path decision engine that compares real-time and batch signals for pitcher fatigue, bullpen readiness, and matchup leverage.

The project is designed to answer a practical architecture question: when does a manager need a fast provisional signal, when does the organization need canonical batch truth, and how do you reconcile the two when they disagree?

Bullpen Signal | CI/CD
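One way to picture the reconciliation question is a minimal sketch that assumes each path emits a single pitch count per pitcher. The `PitcherSignal` type, the `reconcile` function, and its `tolerance` parameter are hypothetical illustrations, not Bullpen Signal's actual code. When canonical batch values land, they replace the provisional streaming values. Any pitcher whose two counts differ by more than the tolerance is reported as drift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitcherSignal:
    pitcher_id: int
    pitch_count: int
    source: str  # "stream" (provisional) or "batch" (canonical)


def reconcile(provisional, canonical, tolerance=0):
    """Merge provisional and canonical signals keyed by pitcher_id.

    Batch values replace streaming values wherever both exist. Pitchers whose
    two pitch counts differ by more than `tolerance` are returned as drift
    tuples (pitcher_id, stream_count, batch_count), sorted by pitcher_id.
    """
    merged = dict(provisional)  # start from the fast, provisional view
    drift = []
    for pid, batch_sig in canonical.items():
        stream_sig = provisional.get(pid)
        if stream_sig is not None and abs(stream_sig.pitch_count - batch_sig.pitch_count) > tolerance:
            drift.append((pid, stream_sig.pitch_count, batch_sig.pitch_count))
        merged[pid] = batch_sig  # canonical batch truth wins
    return merged, sorted(drift)


if __name__ == "__main__":
    stream = {1: PitcherSignal(1, 88, "stream"), 2: PitcherSignal(2, 40, "stream")}
    batch = {1: PitcherSignal(1, 90, "batch")}
    merged, drift = reconcile(stream, batch)
    print(merged[1].pitch_count, merged[2].source)  # 90 stream
    print(drift)  # [(1, 88, 90)]
```

The sketch keeps the provisional value for any pitcher the batch path has not yet covered. A manager can therefore act on fresh data while the record still converges on batch truth.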


Selected Projects

Bullpen Signal

Dual-path baseball decision engine using deterministic MLB game replay, Redpanda/Kafka-style event streams, Flink jobs, Iceberg lakehouse tables, dbt incremental models, quality checks, lineage, observability, and a Streamlit serving layer.

Why it matters: demonstrates the boundary between streaming systems and batch truth in a high-pressure operational sports setting.

Redpanda Flink Iceberg dbt Great Expectations OpenLineage Prometheus Grafana Streamlit

Repository | GitHub Actions

BaseballIQ

End-to-end MLB analytics platform with Statcast-style data, Bronze/Silver/Gold modeling, pitcher CSW prediction, SHAP explainability, cached LLM scouting insights, and a polished Streamlit dashboard.

Why it matters: shows a complete analytics product, from data modeling and ML framing to user-facing decision support.

Python DuckDB Parquet XGBoost SHAP Streamlit Plotly Pytest

Live Demo | Repository
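For readers outside baseball, CSW (called strikes plus whiffs) is the share of a pitcher's pitches that end in either a called strike or a swinging strike. The following minimal sketch computes the metric that BaseballIQ's CSW model targets. It assumes simplified pitch-outcome labels loosely modeled on Statcast's pitch descriptions, and edge cases such as foul tips are simply not counted. The `csw_rate` function and the label set are illustrative, not BaseballIQ's code.

```python
# Simplified outcome labels, loosely modeled on Statcast's pitch descriptions (an assumption).
CSW_OUTCOMES = {"called_strike", "swinging_strike", "swinging_strike_blocked"}


def csw_rate(outcomes):
    """Return CSW as a fraction: (called strikes + whiffs) / total pitches."""
    if not outcomes:
        raise ValueError("CSW is undefined for zero pitches")
    csw = sum(1 for outcome in outcomes if outcome in CSW_OUTCOMES)
    return csw / len(outcomes)


if __name__ == "__main__":
    pitches = ["called_strike", "ball", "swinging_strike", "foul", "hit_into_play"]
    print(csw_rate(pitches))  # 0.4
```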

Soccer Data Platform

Production-style sports data platform using Bronze/Silver/Gold architecture for GPS tracking data, validation, Parquet transformation, CI/CD, Airflow orchestration, and Terraform-backed AWS S3 lake design.

Python Airflow Terraform AWS S3 Parquet Pytest GitHub Actions Streamlit

Live Demo | Repository

Sports Injury Risk Intelligence

ML pipeline for football injury risk using point-in-time-correct features, leakage guards, CI coverage gates, confidence intervals, and out-of-distribution flags.

Python Scikit-learn Great Expectations Delta Lake Pytest GitHub Actions

Live Demo | Repository
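Point-in-time correctness means that every training row sees only feature values observable before its label date. The following minimal sketch shows that as-of join plus a leakage guard, using only the standard library. The names `feature_as_of` and `build_training_rows`, the `(player_id, label_date, injured)` label shape, and the training-load numbers are hypothetical, not the project's code. A real pipeline might use a library as-of join such as pandas `merge_asof` instead.

```python
from bisect import bisect_left
from datetime import date


def feature_as_of(rows, as_of):
    """Return the latest (observed_on, value) with observed_on strictly before as_of.

    `rows` must be sorted by observed_on. bisect_left excludes rows dated on
    as_of itself, so same-day information cannot leak into the feature.
    """
    idx = bisect_left([observed_on for observed_on, _ in rows], as_of)
    return rows[idx - 1] if idx > 0 else None


def build_training_rows(features, labels):
    """Join each (player_id, label_date, injured) label to its point-in-time feature."""
    out = []
    for player_id, label_date, injured in labels:
        hit = feature_as_of(features.get(player_id, []), label_date)
        if hit is None:
            continue  # no history yet: skip instead of borrowing future values
        observed_on, value = hit
        # Defensive leakage guard: fail loudly if a feature ever postdates its label.
        if observed_on >= label_date:
            raise ValueError(f"leakage for player {player_id}: {observed_on} >= {label_date}")
        out.append((player_id, label_date, value, injured))
    return out


if __name__ == "__main__":
    # Toy weekly training-load values for one hypothetical player.
    features = {7: [(date(2024, 3, 1), 310.0), (date(2024, 3, 8), 420.0), (date(2024, 3, 15), 505.0)]}
    labels = [(7, date(2024, 3, 8), False), (7, date(2024, 3, 20), True), (9, date(2024, 3, 20), False)]
    for row in build_training_rows(features, labels):
        print(row)
```

The label dated March 8 receives the March 1 load, not the same-day value, and player 9 is dropped because no history exists yet.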


Engineering Strengths

  • Building data products around real operational decisions, not just dashboards.
  • Designing batch, streaming, and reconciliation layers with clear tradeoffs.
  • Working across ingestion, transformation, quality, orchestration, ML features, and serving.
  • Translating sports domain complexity into testable data contracts and useful analytics interfaces.

Stack

Languages: Python, SQL, Ruby
Data: DuckDB, Parquet, Iceberg, Delta Lake, BigQuery, Redshift
Streaming & orchestration: Redpanda, Kafka-style eventing, Apache Flink, Airflow, dbt, GitHub Actions
Cloud & infra: GCP, AWS, Docker Compose, Terraform, MinIO, S3
Quality & observability: Pytest, Great Expectations, OpenLineage, Prometheus, Grafana
Analytics & apps: Streamlit, Plotly, Power BI, SHAP, scikit-learn, XGBoost
Sports data: MLB Statcast, pybaseball, MLB StatsAPI, pitch-by-pitch tracking, GPS tracking data


Background

I have 5+ years of experience designing and operating analytics pipelines across sports data, cloud ETL, BI automation, and client-facing data delivery.

At Synergy Sports / Sportradar, I worked with live MLB game data, pitch-by-pitch feeds, QA workflows, and structured datasets used downstream for analytics and broadcast reporting.

At Vikua, I delivered cloud analytics systems across multiple client environments, improving time-to-insight, reliability, and compute efficiency.

I am currently completing the MIT MicroMasters Program in Statistics and Data Science, with a focus on statistical modeling and computation.


Contact

Portfolio
LinkedIn
Email

I care about data systems that hold up under pressure: live decisions, late corrections, measurable reliability, and interfaces that make complex information usable.

Pinned

  1. ruby_dropbox_file_automation- (Public)

    Automated workflow that reads files from Dropbox, cleans and formats CSVs, and sends them to a data pipeline, fully serverless and powered by Ruby and cron scheduling.

    Ruby 1

  2. financial_analytics_construction_projects (Public)

    End-to-end financial analytics project for construction companies. Built SQL models and interactive Metabase dashboards to track income, expenses, and profitability across multiple projects.