chieaid24/canopy

Canopy

Ingests energy IoT data from your home devices.

Stores, processes, and delivers real-time alerts and AI insights.

Built for real production workloads with AWS EKS, decoupled microservices, and end-to-end observability.

IoT Telemetry System Design

Technical Highlights

Infrastructure

  • AWS architecture is 100% Infrastructure as Code with Terraform, deployed through Helm.
  • Cloud-native Kubernetes with HPA autoscaling, self-healing, and rolling deployments.

CI/CD

  • Automated GitHub Actions pipeline that validates changed code -> builds images -> deploys to EKS.
  • Testcontainers (MySQL, InfluxDB) and EmbeddedKafka for complete E2E test coverage.

Observability

  • Full Grafana observability stack with platform-wide tracing, log aggregation, and metrics.

Data Layer

  • Scalable DB layer with MySQL + read replicas for relational data and InfluxDB + Redis for real-time analytics.

Tools Used

  • Backend: Java, Spring Boot, Spring AI
  • Frontend: TypeScript, Next.js, Tailwind CSS
  • Data / Messaging: MySQL (RDS), InfluxDB, Flyway, Redis, Apache Kafka (MSK)
  • AI Inference: AWS Bedrock, Ollama
  • Observability: Prometheus, Grafana, Loki
  • Infrastructure: Kubernetes (EKS), Helm, Docker, Terraform, GitHub Actions

Functional Overview

Data Pipeline

  • Ingests streamed power readings from IoT devices into Kafka, writing consumption data into InfluxDB and enabling temporal aggregations and per-device statistics.
  • Caches frequent queries in Redis so that horizontally scaled microservices do not bottleneck on repeated InfluxDB reads.
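
The caching step above follows a cache-aside pattern, which can be sketched roughly as below. This is an illustration only, not the usage-service code: a plain in-memory dict stands in for Redis, and `TTLCache`, `get_device_total`, the key layout, and the TTL are hypothetical names and values.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis (assumption: the real service talks to Redis)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, float]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[float]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: str, value: float) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_device_total(device_id: str, cache: TTLCache,
                     query_db: Callable[[str], float]) -> float:
    """Cache-aside read: try the cache first, fall back to the time-series query."""
    key = f"usage:total:{device_id}"  # hypothetical key layout
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = query_db(device_id)  # in the real system this would be an InfluxDB aggregation
    cache.set(key, value)
    return value
```

Because the cache lives outside the service process, every replica behind the autoscaler sees the same cached result instead of re-running the aggregation.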

Alerts

  • Emits real-time alerts when usage exceeds configured limits, sending email notifications and persisting events in MySQL.
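
As a rough sketch of the threshold check described above: the function below compares one reading against a per-device limit and produces an alert event when the limit is exceeded. `Reading`, `AlertEvent`, `check_reading`, and the choice of watts as the unit are assumptions made for illustration. The real services exchange these events over Kafka and may evaluate aggregated usage rather than single readings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    watts: float


@dataclass(frozen=True)
class AlertEvent:
    device_id: str
    watts: float
    limit_watts: float
    message: str


def check_reading(reading: Reading, limit_watts: float) -> Optional[AlertEvent]:
    """Return an AlertEvent when the reading is strictly above the limit, else None."""
    if reading.watts <= limit_watts:
        return None
    return AlertEvent(
        device_id=reading.device_id,
        watts=reading.watts,
        limit_watts=limit_watts,
        message=f"{reading.device_id} drew {reading.watts:.1f} W (limit {limit_watts:.1f} W)",
    )
```

A consumer would call `check_reading` for each incoming reading, publish any non-`None` result to the alerts topic, and leave persistence and email delivery to the downstream alert consumer.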

AI Insights

  • Generates sustainability recommendations, contextualized by your recent usage, devices, time of day, location, and more.

Dashboard and Auth

  • Real-time dashboard with live summary cards, energy charts, alert history, and AI insights panel.
  • Handles auth with Google OAuth and credential login, issuing JWTs with stateless backend validation and middleware-enforced route protection.

AWS-Specific Architecture

Application runs on AWS EKS with managed RDS and MSK Serverless.

  • All 7 services autoscale with HPA (2-5 replicas) and survive node drains with PodDisruptionBudgets.
  • Infrastructure defined in Terraform, and deployments use the same Helm charts with EKS-specific value overlays.
  • IRSA roles for pod level IAM, eliminating shared node roles and static credentials.
                Route53 (energy.aidanchien.com)
                                 │
                          NLB (internet-facing)
                                 │
                        ingress-nginx controller
                                 │
        ┌─────────────────────────────────────────────────┐
        │           │            │           │            │
    frontend   user-svc    device-svc   usage-svc    ... (7 svcs)
        │           │            │           │            │
        └─────── inter-service with ClusterIP DNS ────────┘
                     │                    │
                     ▼                    ▼
                 RDS MySQL           MSK Serverless

        insight-service ──► AWS Bedrock (Claude Haiku)
                            via cross-region inference profile (IRSA)

Feature Details

βš™οΈ Services
Service Port Persistence What it does
user-service 8080 MySQL (primary + replica) Handles registration, login, JWT issuance, and user profile CRUD. Flyway-managed schema, read replicas for query scaling.
device-service 8081 MySQL (primary + replica) Manages IoT device registration and ownership. Validates users through inter-service REST calls.
ingestion-service 8082 - Accepts energy readings via REST and publishes them to Kafka. Includes multi-threaded simulator for load testing.
usage-service 8083 InfluxDB, Redis Consumes readings from Kafka, stores time-series data in InfluxDB, caches queries in Redis, and creates alert events when thresholds are exceeded.
alert-service 8084 MySQL (primary + replica) Consumes alert events from Kafka, persists them, and sends email notifications with SMTP.
insight-service 8085 - Polls recent usage data and sends it to an LLM (Ollama locally, Bedrock on EKS) to generate energy efficiency recommendations.
🎨 Frontend

Next.js dashboard (App Router, TypeScript, Tailwind, shadcn/ui) served through nginx reverse proxy.

Demo

energy_tracker.mp4

Auth

  • Google OAuth 2.0 + credentials login through NextAuth.js - backend validates tokens and issues HMAC-SHA signed JWTs.
  • Route protection - middleware redirects unauthenticated users, stateless backend with no server-side sessions.
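
To show what stateless validation of an HMAC-signed JWT involves, here is a minimal standard-library sketch. It assumes HS256 (the README only says HMAC-SHA) and a `sub`/`iat`/`exp` claim set. `issue_token` and `verify_token` are hypothetical helpers, not the user-service API, and a production service would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject: str, secret: bytes, ttl_seconds: int,
                now: Optional[int] = None) -> str:
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes, now: Optional[int] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None.

    The algorithm is fixed to HMAC-SHA256 here; the header's "alg" is never trusted.
    """
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, bad JSON, non-ASCII input
        return None
    current = int(time.time()) if now is None else now
    return claims if current < claims.get("exp", 0) else None
```

Verification needs only the shared secret and the token itself, which is why no server-side session store is required.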

Pages

| Route | Description |
|---|---|
| /login | Email/password + Google OAuth sign-in |
| /register | Account registration (auto-signs in on success) |
| /dashboard | Summary cards, energy bar chart, AI insights panel |
| /dashboard/devices | Device table, embedded InfluxDB explorer |
| /dashboard/alerts | Email alert inbox |

Dashboard Components

  • Summary Cards - device count, 7-day energy consumption, alert count (live polling).
  • Energy Chart - per-device 7-day consumption bar chart using Recharts.
  • AI Insights Panel - LLM-generated efficiency recommendations with confidence scores.
πŸ” Observability

All 6 services are tracked across the three pillars, all aggregated in Grafana.

Spring Boot Services
  ├── /actuator/prometheus  ──────────────► Prometheus ─────► Grafana
  ├── OTLP traces (HTTP :4318) ──────────► Tempo ──────────► Grafana
  └── stdout JSON (ECS format) ─► Promtail ─► Loki ─────────► Grafana
  • Metrics - Prometheus scrapes all 6 services every 15s (JVM, HTTP, Kafka producer/consumer).
  • Tracing - OpenTelemetry traces pushed to Tempo, with trace context propagated across Kafka spans.
  • Logs - Structured JSON to stdout, collected by Promtail into Loki. Log-to-trace correlation in Grafana.
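
Propagating trace context across Kafka comes down to carrying the trace ID in message headers so the consumer's span joins the producer's trace. The sketch below assumes the W3C Trace Context `traceparent` header format, which is OpenTelemetry's default propagator. Whether these services use that default is an assumption, and `SpanContext`, `inject`, `extract`, and `child_of` are illustrative names rather than OpenTelemetry APIs. Headers are modelled as a plain string dict, while real Kafka headers carry byte values.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Producer side: write the current span's context into the message headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Consumer side: read the parent context back, or None if absent or malformed."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid in the W3C format
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a consumer-side span: same trace ID, fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

Since the trace ID survives the hop through Kafka, Tempo can show the ingestion request and the downstream consumer work as a single trace.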

Provisioned Dashboards

| Dashboard | Description |
|---|---|
| Service Health Overview | HTTP request rates, error rates, latency per service |
| JVM Metrics | Heap, GC pause, thread count per service |
| Kafka Event Pipeline | Producer/consumer lag, throughput per topic |
| IoT Business Metrics | Device count, energy readings, alert frequency |

Screenshots: Grafana Service Health Overview, Grafana JVM Metrics

🚀 CI/CD Pipeline

Overview

push to main / tag v*
    │
    ▼
build-and-push.yml
    ├── detect-changes      ← dorny/paths-filter scans services/** + frontend/
    ├── build matrix        ← one runner per changed service (parallel)
    │     ├── OIDC → assume iot-tracker-dev-gha-deploy role
    │     ├── docker buildx → push to ECR Public
    │     └── tags: <sha> + latest (insight-service: +bedrock variant)
    ▼
deploy-eks.yml (auto-triggered or manual)
    ├── OIDC → assume deploy role
    ├── helm upgrade infra (values-eks.yaml)
    ├── helm upgrade observability (values-eks.yaml)
    └── helm upgrade microservices (values-eks.yaml, --set image.tag=<sha>)
  • Change detection - only rebuilds services with actual file changes.
  • Concurrency guard - only one deploy-eks run at a time.
  • Claude PR reviews - pull requests are automatically reviewed for security and compliance by Claude.

All workflows authenticate with GitHub OIDC and assume an IAM role, so no AWS credentials are stored.
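
The change-detection step maps changed file paths to the set of images that need rebuilding. The actual pipeline does this with dorny/paths-filter in workflow YAML. The Python below only sketches the same idea, assuming each service lives under `services/<name>/` (consistent with the Project Layout, but the exact directory names are not confirmed here). `changed_targets` is a hypothetical helper.

```python
from typing import Iterable, List


def changed_targets(paths: Iterable[str]) -> List[str]:
    """Map changed file paths to build targets (sorted, de-duplicated)."""
    targets = set()
    for path in paths:
        parts = path.split("/")
        if parts[0] == "services" and len(parts) > 2:
            targets.add(parts[1])      # services/<name>/... -> <name>
        elif parts[0] == "frontend" and len(parts) > 1:
            targets.add("frontend")    # any file under frontend/
    return sorted(targets)
```

Each returned name would become one entry in the build matrix, and an empty list means there is nothing to build for that push.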

πŸ›‘οΈ Production Hardening

| Feature | Configuration |
|---|---|
| HPA | All 7 services autoscale: min 2, max 5 replicas, target 70% CPU |
| PDB | All 7 services: minAvailable=1 |
| NetworkPolicies | Default-deny + per-service ingress/egress allowlists |
| CloudWatch Alarms | RDS CPU >80%, connections >53/66 max, free storage <4 GiB, EKS failed nodes |
| TLS throughout | Let's Encrypt cert, TLS termination at ingress |
| AWS Secrets Manager | No plaintext in values files, synced using ExternalSecrets |
| IRSA | Pod-level IAM, e.g. insight-service has Bedrock access only |

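
For a feel of how the HPA settings above behave, the documented Kubernetes scaling rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the min/max bounds. The sketch below applies it with this project's values (min 2, max 5, target 70% CPU). It is a simplification that assumes the default 10% tolerance and ignores stabilization windows and pod readiness handling.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a single CPU utilization metric."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas averaging 140% of their CPU request scale to 4, while 3 replicas at 210% would want 9 and are capped at 5.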
📦 Deployments (Local / EKS)

Three deployment targets: Docker Compose (local dev), Minikube (local k8s), and AWS EKS (production).

Run With Docker

docker compose up -d                                    # core stack
docker compose -f docker-compose.observability.yml up -d  # observability stack

Run With Kubernetes (Minikube)

Requires: Minikube, Helm, kubectl

minikube start --gpus all --driver=docker --memory=8192 --cpus=6
minikube addons enable ingress

helm install infra ./k8s/charts/infra-chart
helm install observability ./k8s/charts/observability-chart
helm install microservices ./k8s/charts/microservices-chart

minikube tunnel  # in separate terminal, so we can expose services to localhost

Run on AWS EKS (Production)

All infrastructure defined in Terraform; deployments use the same Helm charts with EKS-specific value overlays.

AWS Resources (Terraform)

| Resource | Details |
|---|---|
| VPC | 3 public, 3 private, 3 DB subnets, NAT gateway |
| EKS | K8s 1.31, managed node group (t3.large × 3) |
| RDS MySQL | db.t3.medium, encrypted, 7-day backup, Performance Insights |
| MSK Serverless | IAM auth (SASL/SSL), topics: energy-usage, energy-alerts |
| Route53 + ACM | energy.aidanchien.com, Let's Encrypt TLS via cert-manager |
| Secrets Manager | 3 secrets synced to cluster using External Secrets Operator |
| IRSA | 7 pod-level IAM roles (ALB Controller, EBS CSI, ESO, ExternalDNS, cert-manager, insight-service, GHA deploy) |
| CloudWatch + SNS | 4 alarms (RDS CPU, connections, storage, EKS node health) |
| ECR Public | 8 image repos, SHA-tagged deploys |

Deploy from Scratch

# 1. Provision infrastructure (~8 min)
cd terraform/envs/dev
terraform init && terraform apply

# 2. Configure kubectl
aws eks update-kubeconfig --name iot-tracker-dev --region us-east-1

# 3. Install cluster addons
./scripts/eks-bootstrap.sh

# 4. Deploy Helm charts
helm upgrade --install infra ./k8s/charts/infra-chart -f ./k8s/charts/infra-chart/values-eks.yaml
helm upgrade --install observability ./k8s/charts/observability-chart -f ./k8s/charts/observability-chart/values-eks.yaml
helm upgrade --install microservices ./k8s/charts/microservices-chart -f ./k8s/charts/microservices-chart/values-eks.yaml

Teardown

helm uninstall microservices && helm uninstall observability && helm uninstall infra
cd terraform/envs/dev && terraform destroy

💵 Cost

AWS Management Costs

| Component | Spec | Monthly |
|---|---|---|
| EKS control plane | K8s 1.31, standard support | $73.00 |
| EC2 nodes | 3× t3.large @ $0.0832/hr | $182.21 |
| RDS instance | db.t3.medium MySQL, single-AZ @ $0.068/hr | $49.64 |
| RDS storage | gp3, ~20 GB allocated @ $0.115/GB | $2.30 |
| MSK Serverless cluster | Flat $0.75/cluster-hr | $547.50 |
| MSK Serverless partitions | ~10 partitions @ $0.0015/hr | $10.95 |
| NAT Gateway | 1× @ $0.045/hr + data processing | ~$34 |
| NLB | Base @ $0.0225/hr + minimal LCU | ~$21 |
| EBS gp3 | 7 in-cluster PVCs + 3 node roots (~130 GB) | ~$11 |
| Secrets Manager | 3 secrets @ $0.40/mo + API calls | $1.20 |
| Route53 hosted zone | 1 zone | $0.50 |
| CloudWatch | 4 alarms (free tier), basic metrics/logs | ~$3 |
| ACM certificates | Public certs are free | $0 |
| Bedrock (Haiku 4.5) | Pay-per-token, low personal usage | ~$1–5 |
| Data transfer out | Internet egress | ~$1–3 |
| Total | | ~$940/mo |
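
The hourly line items in the table can be reproduced with a short calculation. The sketch assumes a 730-hour month (365 × 24 / 12), which matches the figures above, and uses only rates stated in the table. `monthly` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = Decimal(730)  # assumption: 365 days * 24 h / 12 months


def monthly(hourly_rate: str, count: int = 1) -> Decimal:
    """Monthly cost in USD for `count` units billed at `hourly_rate` per hour."""
    cost = Decimal(hourly_rate) * count * HOURS_PER_MONTH
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


line_items = {
    "EC2 nodes (3x t3.large)": monthly("0.0832", count=3),
    "RDS instance (db.t3.medium)": monthly("0.068"),
    "MSK Serverless cluster": monthly("0.75"),
    "MSK Serverless partitions (~10)": monthly("0.0015", count=10),
}

for name, cost in line_items.items():
    print(f"{name:<34} ${cost}")
```

The flat MSK Serverless cluster charge alone accounts for well over half of the estimated total, which makes it the obvious first target for cost reduction.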

This is very expensive :( so the production environment is currently spun down. All infrastructure is reproducible with terraform apply + helm upgrade --install (see Run on AWS EKS (Production)).

Real Usage

I connected the system to my own home using Shelly smart plugs. Check out the /shelly folder to see how you can do it too!

Project Layout

services/               # 6 Spring Boot microservices (user, device, ingestion, usage, alert, insight)
frontend/               # Next.js 16 (App Router, Tailwind v4, shadcn/ui)
shelly/                 # Shelly smart plug integration script + setup guide
observability/
  ├── prometheus/       # prometheus.yml - scrape configs for all services
  ├── grafana/          # Provisioned datasources (Prometheus, Loki, Tempo) + 4 dashboards
  ├── loki/             # loki.yaml - in-memory ring, TSDB storage
  ├── promtail/         # promtail.yaml - Docker SD, ECS JSON pipeline
  └── tempo/            # tempo.yaml - OTLP receivers, local storage
k8s/
  └── charts/
      ├── infra-chart/          # MySQL (primary + replica), Kafka, InfluxDB, Redis, Mailpit, Kafka UI, Ollama
      ├── observability-chart/  # Prometheus, Grafana, Loki, Promtail, Tempo
      └── microservices-chart/  # 6 services + frontend, shared ConfigMap/Secret/Ingress, HPA/PDB/NetworkPolicies
terraform/
  └── envs/dev/         # VPC, EKS, RDS, MSK Serverless, IAM (IRSA), Route53, ACM, CloudWatch + SNS alarms
scripts/
  └── eks-bootstrap.sh  # Installs EKS cluster addons (ALB Controller, ESO, cert-manager, etc.)
.github/workflows/
  ├── build-and-push.yml    # CI: detect changed services, build, push to ECR
  ├── deploy-eks.yml        # CD: helm upgrade infra -> observability -> microservices
  ├── build-test.yml        # Manual single-service rebuild from any branch
  └── claude-review.yml     # AI code review on PRs
docker-compose.yml                 # Core application stack
docker-compose.observability.yml   # Prometheus, Grafana, Loki, Promtail, Tempo