Ingests energy IoT data from your home devices.
Stores, processes, and delivers real-time alerts and AI insights.
Built for real production workloads with AWS EKS, decoupled microservices, and end-to-end observability.
Infrastructure
- AWS infrastructure is defined entirely as code in Terraform, with workloads deployed through Helm.
- Cloud-native Kubernetes with HPA autoscaling, self-healing, and rolling deployments.
CI/CD
- Automated GitHub Actions pipeline that validates changed code -> builds images -> deploys to EKS.
- Testcontainers (MySQL, InfluxDB) and EmbeddedKafka for complete E2E test coverage.
Observability
- Full Grafana observability stack with platform-wide tracing, log aggregation, and metrics.
Data Layer
- Scalable DB layer with MySQL + read replicas for relational data and InfluxDB + Redis for real-time analytics.
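A primary-plus-replica setup usually implies sending writes to the primary and spreading reads across the replicas. The sketch below illustrates that split. The host names and the `ReplicaRouter` class are hypothetical, and this README does not state how the services actually route queries.

```python
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaRouter:
    """Sends writes to the primary and round-robins reads across replicas."""

    primary: str
    replicas: List[str]
    _cycle: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # With no replicas configured, reads fall back to the primary.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def target_for(self, sql: str) -> str:
        """Return the host that should execute this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._cycle) if is_read else self.primary


# Hypothetical host names, for illustration only.
router = ReplicaRouter(
    primary="mysql-primary",
    replicas=["mysql-replica-0", "mysql-replica-1"],
)
```

Classifying statements by their first keyword is deliberately naive; a real service would more likely decide per transaction (read-only or not) than per statement.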
| Backend | |
| Frontend | |
| Data / Messaging | |
| AI Inference | |
| Observability | |
| Infrastructure | |
Data Pipeline
- Ingests streamed power readings from IoT devices into Kafka, writing consumption data into InfluxDB and enabling temporal aggregations and per-device statistics.
- Caches frequent queries in Redis to avoid bottlenecks across horizontally scaled microservices.
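Caching an aggregate in front of a time-series store is commonly done with the cache-aside pattern: look in the cache, compute on a miss, store the result with a TTL. The sketch below uses an in-memory `TtlCache` as a stand-in for Redis. The key format, the 60-second TTL, and the `load_readings` callback are all assumptions, not details taken from the project.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class TtlCache:
    """In-memory stand-in for Redis: get/set with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, float]] = {}

    def get(self, key: str) -> Optional[float]:
        entry = self._store.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: float, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def total_consumption(
    cache: TtlCache,
    device_id: str,
    load_readings: Callable[[str], Iterable[float]],
    ttl_seconds: float = 60.0,
) -> float:
    """Cache-aside: serve the per-device total from cache, else compute and store it."""
    key = f"usage:total:{device_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    total = sum(load_readings(device_id))  # the expensive query in a real system
    cache.set(key, total, ttl_seconds)
    return total
```

Because the cache lives outside the service process, every replica of a horizontally scaled service sees the same cached value and the backing store is queried at most once per TTL window per key.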
Alerts
- Emits real-time alerts when usage exceeds configured limits, sending email notifications and persisting events in MySQL.
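The threshold check itself can be sketched in a few lines. The `Reading` and `AlertEvent` shapes and the per-device `limits` mapping are hypothetical; the README does not document the real event schema or the unit in which limits are configured.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    watts: float  # assumed unit


@dataclass(frozen=True)
class AlertEvent:
    device_id: str
    watts: float
    limit_watts: float


def check_threshold(
    reading: Reading, limits: Mapping[str, float]
) -> Optional[AlertEvent]:
    """Return an alert event when a reading exceeds its device's configured limit."""
    limit = limits.get(reading.device_id)
    if limit is None or reading.watts <= limit:
        return None  # no limit configured, or usage is within bounds
    return AlertEvent(reading.device_id, reading.watts, limit)
```

In the pipeline described here, a non-`None` result would be published for the alert service to persist and email; this sketch stops at producing the event.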
AI Insights
- Generates sustainability recommendations, contextualized by your recent usage, devices, time of day, location, and more.
Dashboard and Auth
- Real-time dashboard with live summary cards, energy charts, alert history, and AI insights panel.
- Handles auth with Google OAuth and credential login, issuing JWTs with stateless backend validation and middleware-enforced route protection.
Application runs on AWS EKS with managed RDS and MSK Serverless.
- All 7 workloads (6 backend services plus the frontend) autoscale with HPA (2-5 replicas) and survive node drains with PodDisruptionBudgets.
- Infrastructure is defined in Terraform, and deployments use the same Helm charts with EKS-specific value overlays.
- IRSA roles provide pod-level IAM, eliminating shared node roles and static credentials.
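To see why a PodDisruptionBudget lets a workload survive a node drain, it helps to compute how many voluntary evictions the budget permits. For a `minAvailable` budget, that number is the count of healthy pods minus `minAvailable`, floored at zero. The helper below is only an illustration of that arithmetic, not code from this repository.

```python
def allowed_disruptions(healthy_pods: int, min_available: int = 1) -> int:
    """Voluntary evictions a minAvailable PodDisruptionBudget permits right now."""
    return max(0, healthy_pods - min_available)


# At the HPA floor of 2 replicas, a drain may evict one pod at a time;
# the second eviction waits until a replacement pod is healthy again.
for healthy in (2, 1):
    print(f"healthy={healthy} -> evictions allowed: {allowed_disruptions(healthy)}")
```

This is also why the HPA minimum of 2 matters: with a single replica and `minAvailable=1`, no eviction would ever be allowed and a drain would block.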
    Route53 (energy.aidanchien.com)
       │
    NLB (internet-facing)
       │
    ingress-nginx controller
       │
       ├───────────┬────────────┬────────────┬────────────┐
       │           │            │            │            │
    frontend   user-svc    device-svc    usage-svc   ... (7 svcs)
       │           │            │            │            │
       └──────── inter-service with ClusterIP DNS ────────┘
                   │                         │
                   ▼                         ▼
               RDS MySQL              MSK Serverless

    insight-service ──► AWS Bedrock (Claude Haiku)
                        via cross-region inference profile (IRSA)
⚙️ Services
| Service | Port | Persistence | What it does |
|---|---|---|---|
| user-service | 8080 | MySQL (primary + replica) | Handles registration, login, JWT issuance, and user profile CRUD. Flyway-managed schema, read replicas for query scaling. |
| device-service | 8081 | MySQL (primary + replica) | Manages IoT device registration and ownership. Validates users through inter-service REST calls. |
| ingestion-service | 8082 | - | Accepts energy readings via REST and publishes them to Kafka. Includes a multi-threaded simulator for load testing. |
| usage-service | 8083 | InfluxDB, Redis | Consumes readings from Kafka, stores time-series data in InfluxDB, caches queries in Redis, and creates alert events when thresholds are exceeded. |
| alert-service | 8084 | MySQL (primary + replica) | Consumes alert events from Kafka, persists them, and sends email notifications with SMTP. |
| insight-service | 8085 | - | Polls recent usage data and sends it to an LLM (Ollama locally, Bedrock on EKS) to generate energy efficiency recommendations. |
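The multi-threaded simulator mentioned for ingestion-service can be pictured as one worker thread per simulated device, each emitting readings into a shared sink. The sketch below uses a `queue.Queue` in place of Kafka. The payload field names (`deviceId`, `watts`) and the value range are invented for illustration and are not the project's actual schema.

```python
import queue
import random
import threading
from typing import Dict, List


def simulate_device(device_id: str, count: int, sink: queue.Queue, seed: int) -> None:
    """Emit `count` pseudo-random readings for one device into the sink."""
    rng = random.Random(seed)  # per-thread RNG keeps runs reproducible
    for _ in range(count):
        sink.put({"deviceId": device_id, "watts": round(rng.uniform(5.0, 1500.0), 1)})


def run_simulation(device_ids: List[str], readings_per_device: int) -> List[Dict]:
    """Run one thread per device and return every reading that was produced."""
    sink: queue.Queue = queue.Queue()  # thread-safe stand-in for a Kafka topic
    threads = [
        threading.Thread(target=simulate_device, args=(dev, readings_per_device, sink, i))
        for i, dev in enumerate(device_ids)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All producers have finished, so the queue size is stable here.
    return [sink.get() for _ in range(sink.qsize())]
```

For example, `run_simulation(["plug-1", "plug-2"], 100)` yields 200 readings whose interleaving varies from run to run, which is the property a load test wants to exercise.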
🎨 Frontend
Next.js dashboard (App Router, TypeScript, Tailwind, shadcn/ui) served through nginx reverse proxy.
- Google OAuth 2.0 + credentials login through NextAuth.js - backend validates tokens and issues HMAC-SHA signed JWTs.
- Route protection - middleware redirects unauthenticated users, stateless backend with no server-side sessions.
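Stateless validation works because the token carries its own proof: the backend recomputes the HMAC over the header and payload and compares it with the signature, so no session lookup is needed. The sketch below shows this with the standard library only. It assumes HS256 (HMAC-SHA256) and an `exp` claim; the README says only "HMAC-SHA", so the exact algorithm and claims are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact JWT signed with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, bad JSON, non-ASCII input
        return None
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= current:
        return None
    return claims
```

The verifier always recomputes HS256 and ignores the `alg` field in the header, which sidesteps algorithm-confusion tricks. A production service would normally use a maintained JWT library instead of hand-rolled code like this.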
| Route | Description |
|---|---|
| `/login` | Email/password + Google OAuth sign-in |
| `/register` | Account registration (auto-signs in on success) |
| `/dashboard` | Summary cards, energy bar chart, AI insights panel |
| `/dashboard/devices` | Device table, embedded InfluxDB explorer |
| `/dashboard/alerts` | Email alert inbox |
- Summary Cards - device count, 7-day energy consumption, alert count (live polling).
- Energy Chart - per-device 7-day consumption bar chart using Recharts.
- AI Insights Panel - LLM-generated efficiency recommendations with confidence scores.
Observability
All 6 Spring Boot services emit metrics, traces, and logs, and all three signals are aggregated in Grafana.
Spring Boot Services
     ├── /actuator/prometheus ──────────────► Prometheus ─────► Grafana
     ├── OTLP traces (HTTP :4318) ──────────► Tempo ──────────► Grafana
     └── stdout JSON (ECS format) ─► Promtail ─► Loki ────────► Grafana
- Metrics - Prometheus scrapes all 6 services every 15s (JVM, HTTP, Kafka producer/consumer).
- Tracing - OpenTelemetry traces pushed to Tempo, with trace context propagated across Kafka spans.
- Logs - Structured JSON to stdout, collected by Promtail into Loki. Log-to-trace correlation in Grafana.
| Dashboard | Description |
|---|---|
| Service Health Overview | HTTP request rates, error rates, latency per service |
| JVM Metrics | Heap, GC pause, thread count per service |
| Kafka Event Pipeline | Producer/consumer lag, throughput per topic |
| IoT Business Metrics | Device count, energy readings, alert frequency |
Dashboard screenshots: Service Health Overview, JVM Metrics.
CI/CD Pipeline

    push to main / tag v*
            │
            ▼
    build-and-push.yml
     ├── detect-changes -> dorny/paths-filter scans services/** + frontend/
     ├── build matrix -> one runner per changed service (parallel)
     │    ├── OIDC -> assume iot-tracker-dev-gha-deploy role
     │    ├── docker buildx -> push to ECR Public
     │    └── tags: <sha> + latest (insight-service: +bedrock variant)
     ▼
    deploy-eks.yml (auto-triggered or manual)
     ├── OIDC -> assume deploy role
     ├── helm upgrade infra (values-eks.yaml)
     ├── helm upgrade observability (values-eks.yaml)
     └── helm upgrade microservices (values-eks.yaml, --set image.tag=<sha>)

- Change detection - only rebuilds services with actual file changes.
- Concurrency guard - only one deploy-eks run at a time.
- Claude PR reviews - pull requests are automatically reviewed by Claude for security and compliance.
All workflows authenticate through GitHub OIDC and assume an IAM role, so no AWS credentials are stored in the repository.
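Change detection boils down to mapping a list of changed file paths onto the set of build targets. The pipeline delegates this to `dorny/paths-filter`; the function below is only a sketch of the same idea, and it assumes a `services/<name>/...` directory layout that the README implies but does not spell out.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def changed_targets(changed_files: Iterable[str]) -> List[str]:
    """Map changed file paths to the services (or frontend) that need a rebuild."""
    targets = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # services/<service-name>/<anything...>
        if len(parts) >= 3 and parts[0] == "services":
            targets.add(parts[1])
        # frontend/<anything...>
        elif len(parts) >= 2 and parts[0] == "frontend":
            targets.add("frontend")
    return sorted(targets)
```

Each returned name would become one entry in the build matrix, so a commit that touches only `frontend/` triggers a single image build.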
🛡️ Production Hardening
| Feature | Configuration |
|---|---|
| HPA | All 7 services autoscale: min 2, max 5 replicas, target 70% CPU |
| PDB | All 7 services: minAvailable=1 |
| NetworkPolicies | Default-deny + per-service ingress/egress allowlists |
| CloudWatch Alarms | RDS CPU >80%, connections >53/66 max, free storage <4 GiB, EKS failed nodes |
| TLS throughout | Let's Encrypt cert, TLS termination at ingress |
| AWS Secrets Manager | No plaintext in values files, synced using ExternalSecrets |
| IRSA | Pod-level IAM, e.g. insight-service has Bedrock access only |
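The HPA row above can be made concrete with the scaling rule Kubernetes documents for the Horizontal Pod Autoscaler: `desired = ceil(current * currentMetric / targetMetric)`, clamped to the configured bounds. The sketch below applies it with this table's values (min 2, max 5, target 70% CPU). It leaves out the controller's tolerance band and stabilization windows, so treat it as an approximation.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 5,
) -> int:
    """Apply the documented HPA ratio rule, then clamp to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# Two pods averaging 140% of requested CPU scale out to four.
print(desired_replicas(2, 140.0))
```

The clamp is what keeps an idle service at two replicas and prevents a traffic spike from scaling any single service beyond five.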
📦 Deployments (Local / EKS)
Three deployment targets: Docker Compose (local dev), Minikube (local k8s), and AWS EKS (production).
Docker Compose (local dev):

    docker compose up -d                                        # core stack
    docker compose -f docker-compose.observability.yml up -d    # observability stack

Minikube (local k8s):

    minikube start --gpus all --driver=docker --memory=8192 --cpus=6
    minikube addons enable ingress
    helm install infra ./k8s/charts/infra-chart
    helm install observability ./k8s/charts/observability-chart
    helm install microservices ./k8s/charts/microservices-chart
    minikube tunnel    # run in a separate terminal to expose services on localhost

Deploying to EKS
All infrastructure is defined in Terraform; deployments use the same Helm charts with EKS-specific value overlays.
| Resource | Details |
|---|---|
| VPC | 3 public, 3 private, 3 DB subnets, NAT gateway |
| EKS | K8s 1.31, managed node group (t3.large × 3) |
| RDS MySQL | db.t3.medium, encrypted, 7-day backup, Performance Insights |
| MSK Serverless | IAM auth (SASL/SSL), topics: energy-usage, energy-alerts |
| Route53 + ACM | energy.aidanchien.com, Let's Encrypt TLS via cert-manager |
| Secrets Manager | 3 secrets synced to cluster using External Secrets Operator |
| IRSA | 7 pod-level IAM roles (ALB Controller, EBS CSI, ESO, ExternalDNS, cert-manager, insight-service, GHA deploy) |
| CloudWatch + SNS | 4 alarms (RDS CPU, connections, storage, EKS node health) |
| ECR Public | 8 image repos, SHA-tagged deploys |
    # 1. Provision infrastructure (~8 min)
    cd terraform/envs/dev
    terraform init && terraform apply
    # 2. Configure kubectl
    aws eks update-kubeconfig --name iot-tracker-dev --region us-east-1
    # 3. Install cluster addons
    ./scripts/eks-bootstrap.sh
    # 4. Deploy Helm charts
    helm upgrade --install infra ./k8s/charts/infra-chart -f ./k8s/charts/infra-chart/values-eks.yaml
    helm upgrade --install observability ./k8s/charts/observability-chart -f ./k8s/charts/observability-chart/values-eks.yaml
    helm upgrade --install microservices ./k8s/charts/microservices-chart -f ./k8s/charts/microservices-chart/values-eks.yaml

Teardown:

    helm uninstall microservices && helm uninstall observability && helm uninstall infra
    cd terraform/envs/dev && terraform destroy

💵 Cost
| Component | Spec | Monthly |
|---|---|---|
| EKS control plane | K8s 1.31, standard support | $73.00 |
| EC2 nodes | 3× t3.large @ $0.0832/hr | $182.21 |
| RDS instance | db.t3.medium MySQL, single-AZ @ $0.068/hr | $49.64 |
| RDS storage | gp3, ~20 GB allocated @ $0.115/GB | $2.30 |
| MSK Serverless cluster | Flat $0.75/cluster-hr | $547.50 |
| MSK Serverless partitions | ~10 partitions @ $0.0015/hr | $10.95 |
| NAT Gateway | 1× @ $0.045/hr + data processing | ~$34 |
| NLB | Base @ $0.0225/hr + minimal LCU | ~$21 |
| EBS gp3 | 7 in-cluster PVCs + 3 node roots (~130 GB) | ~$11 |
| Secrets Manager | 3 secrets @ $0.40/mo + API calls | $1.20 |
| Route53 hosted zone | 1 zone | $0.50 |
| CloudWatch | 4 alarms (free tier), basic metrics/logs | ~$3 |
| ACM certificates | Public certs are free | $0 |
| Bedrock (Haiku 4.5) | Pay-per-token, low personal usage | ~$1-5 |
| Data transfer out | Internet egress | ~$1-3 |
| Total | | ~$940/mo |
This is very expensive :( so the production environment is currently spun down. All infrastructure is reproducible with `terraform apply` + `helm install` (see Deploying to EKS).
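The hourly-priced rows in the cost table are consistent with a 730-hour month, which makes it easy to check where the money goes. The snippet below redoes that arithmetic for a few rows using the rates quoted in the table. The helper name is illustrative and the ~$940 total is the README's own estimate.

```python
HOURS_PER_MONTH = 730  # average month; matches the table's hourly rows


def monthly(hourly_rate: float, count: int = 1) -> float:
    """Monthly cost in USD for `count` resources billed at `hourly_rate` per hour."""
    return round(hourly_rate * count * HOURS_PER_MONTH, 2)


ec2_nodes = monthly(0.0832, count=3)        # 3x t3.large
rds_instance = monthly(0.068)               # db.t3.medium
msk_cluster = monthly(0.75)                 # MSK Serverless flat cluster fee
msk_partitions = monthly(0.0015, count=10)  # ~10 partitions

msk_total = msk_cluster + msk_partitions
msk_share = msk_total / 940                 # against the ~$940/mo estimate

print(f"EC2 ${ec2_nodes:.2f}, RDS ${rds_instance:.2f}, MSK ${msk_total:.2f}")
print(f"MSK Serverless share of the total: {msk_share:.0%}")
```

MSK Serverless alone accounts for more than half of the estimate, so it is the first component to reconsider if the environment is ever brought back up on a budget.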
I connected the system to my own home using Shelly smartplugs. Check out the /shelly folder for how you can do it too!
    services/                        # 6 Spring Boot microservices (user, device, ingestion, usage, alert, insight)
    frontend/                        # Next.js 16 (App Router, Tailwind v4, shadcn/ui)
    shelly/                          # Shelly smart plug integration script + setup guide
    observability/
    ├── prometheus/                  # prometheus.yml - scrape configs for all services
    ├── grafana/                     # Provisioned datasources (Prometheus, Loki, Tempo) + 4 dashboards
    ├── loki/                        # loki.yaml - in-memory ring, TSDB storage
    ├── promtail/                    # promtail.yaml - Docker SD, ECS JSON pipeline
    └── tempo/                       # tempo.yaml - OTLP receivers, local storage
    k8s/
    └── charts/
        ├── infra-chart/             # MySQL (primary + replica), Kafka, InfluxDB, Redis, Mailpit, Kafka UI, Ollama
        ├── observability-chart/     # Prometheus, Grafana, Loki, Promtail, Tempo
        └── microservices-chart/     # 6 services + frontend, shared ConfigMap/Secret/Ingress, HPA/PDB/NetworkPolicies
    terraform/
    └── envs/dev/                    # VPC, EKS, RDS, MSK Serverless, IAM (IRSA), Route53, ACM, CloudWatch + SNS alarms
    scripts/
    └── eks-bootstrap.sh             # Installs EKS cluster addons (ALB Controller, ESO, cert-manager, etc.)
    .github/workflows/
    ├── build-and-push.yml           # CI: detect changed services, build, push to ECR
    ├── deploy-eks.yml               # CD: helm upgrade infra -> observability -> microservices
    ├── build-test.yml               # Manual single-service rebuild from any branch
    └── claude-review.yml            # AI code review on PRs
    docker-compose.yml               # Core application stack
    docker-compose.observability.yml # Prometheus, Grafana, Loki, Promtail, Tempo


