Skip to content

[Backend] Build High-Scale User Activity Tracking System (Relational DB, Analytics, Multi-Tenant, Production Ready) #6

@abhishek-nexgen-dev

Description

@abhishek-nexgen-dev

We are building a high-scale activity tracking system to capture all user and system events across the platform.

User actions occur
→ events captured
→ queued (async)
→ processed by workers
→ stored in relational DB
→ queried for analytics
→ exposed via APIs

🎯 Goals

System must be:

  • High throughput (100K–1M events/day)
  • Strongly structured (Relational DB)
  • Multi-tenant safe
  • Analytics-ready (fast queries)
  • Fault-tolerant
  • Production-grade

🧱 SYSTEM ARCHITECTURE


Tech Stack

Backend → Node.js + Express
Database → PostgreSQL (preferred)
Queue → Redis (BullMQ) or Kafka (future)
ORM → Prisma / Sequelize

High-Level Flow

App Event
→ Event Emitter (trackEvent)
→ Queue (Redis/Kafka)
→ Worker processes
→ Batch insert into DB
→ Analytics APIs query DB

🧠 CORE MODULES


1️⃣ EVENT TRACKING SYSTEM


Internal API

trackEvent({
  userId,
  communityId,
  eventType,
  entityType,
  entityId,
  metadata
})

Event Types

user.signup
user.login
community.created
member.created
member.activated
event.created
hackathon.created
webhook.triggered
github.push
github.pr.opened

Requirements

  • non-blocking (async)
  • lightweight
  • reusable across services


2️⃣ RELATIONAL DATABASE DESIGN


Table: users

id UUID PRIMARY KEY
email TEXT UNIQUE
created_at TIMESTAMP

Table: communities

id UUID PRIMARY KEY
name TEXT
created_at TIMESTAMP

Table: activities (CORE)

id BIGSERIAL PRIMARY KEY
user_id UUID REFERENCES users(id)
community_id UUID REFERENCES communities(id)

event_type TEXT NOT NULL
entity_type TEXT
entity_id TEXT

metadata JSONB

ip_address TEXT
user_agent TEXT

created_at TIMESTAMP DEFAULT NOW()

Optional: partitions

PARTITION BY RANGE (created_at)


3️⃣ INDEXING STRATEGY (CRITICAL)


Basic Indexes

CREATE INDEX idx_user ON activities(user_id);
CREATE INDEX idx_community ON activities(community_id);
CREATE INDEX idx_event_type ON activities(event_type);
CREATE INDEX idx_created_at ON activities(created_at);

Composite Indexes

(user_id, created_at)
(community_id, event_type)
(event_type, created_at)

JSON Index

GIN (metadata)


4️⃣ EVENT INGESTION PIPELINE


Flow

Event occurs
→ push to queue
→ worker consumes
→ validate
→ batch insert into DB

Batch Insert

  • insert 100–1000 rows per batch
  • reduces DB load


5️⃣ SCALING STRATEGY


Horizontal Scaling

  • multiple workers
  • load-balanced API

DB Scaling

  • read replicas
  • partitioned tables

Queue Scaling

  • Redis cluster
  • Kafka (future upgrade)


6️⃣ ANALYTICS SYSTEM


API Endpoints

GET /api/v1/analytics/overview
GET /api/v1/analytics/community/:id
GET /api/v1/analytics/user/:id
GET /api/v1/analytics/events

Sample Queries


Daily Active Users

SELECT COUNT(DISTINCT user_id)
FROM activities
WHERE created_at >= NOW() - INTERVAL '1 day';

Event Count

SELECT event_type, COUNT(*)
FROM activities
GROUP BY event_type;


7️⃣ SECURITY 🔐


Data Safety

  • no sensitive data in metadata
  • sanitize all inputs

Abuse Protection

  • rate limit event ingestion
  • detect spam events

Access Control

  • analytics APIs protected
  • role-based access


8️⃣ IDEMPOTENCY


Problem

duplicate events (retry, network)


Solution

Add:

event_id (unique)

DB Constraint

UNIQUE(event_id)


9️⃣ DATA RETENTION & ARCHIVING


Policy

Hot data → 3 months
Warm data → 6–12 months
Cold → archive (S3)


🔟 OBSERVABILITY


Logging

  • structured logs (Pino)

Metrics

events/sec
queue size
DB latency
error rate

Alerts

  • DB slow queries
  • queue backlog
  • ingestion failures


1️⃣1️⃣ ERROR HANDLING


Strategy

  • retry failed jobs
  • dead letter queue

Failure Cases

DB down
queue failure
invalid payload


1️⃣2️⃣ TESTING


Unit

  • event validator
  • metadata sanitizer

Integration

  • event → queue → DB

Load Testing

  • simulate 100K events


1️⃣3️⃣ EDGE CASES


duplicate events
high traffic spikes
DB failure
queue crash
invalid metadata


⚙️ PERFORMANCE OPTIMIZATION


  • batch inserts
  • connection pooling
  • prepared statements


🌍 ENVIRONMENT


DEV
STAGING
PROD


📦 FOLDER STRUCTURE


/events
/services
/workers
/repositories
/models
/utils


✅ ACCEPTANCE CRITERIA


✔ Events tracked across system
✔ Stored in relational DB
✔ Query time < 200ms
✔ Handles high traffic
✔ Partitioning implemented
✔ Analytics APIs working
✔ Secure & scalable


🔥 FINAL SUMMARY

This system behaves like:

Mixpanel / Google Analytics backend system

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions