A production-grade distributed rate limiting system built with Node.js, TypeScript, Redis, and Docker. Implements multiple rate limiting algorithms with atomic Redis operations, Prometheus observability, configurable fault tolerance, and a horizontally scalable architecture behind an Nginx load balancer.
- Overview
- Architecture
- Algorithms
- Features
- Tech Stack
- Project Structure
- Getting Started
- Configuration
- API Reference
- Observability
- Fault Tolerance
- Testing
- Load Test Results
- Design Decisions
- Future Extensions
Most rate limiter implementations demonstrate the concept. This one demonstrates what a rate limiter looks like when it needs to actually work — across multiple servers, under load, with Redis potentially unavailable, and with the operational visibility to know what's happening at all times.
What makes this different from a typical implementation:
- Atomic Redis operations via Lua scripts — no race conditions under concurrency
- Strategy pattern — algorithms are interchangeable without changing middleware
- Per-tier dynamic limits (free/pro/enterprise) resolved at request time
- Configurable fail-open/fail-closed behavior when Redis is unreachable
- Prometheus metrics and structured pino logs on every request
- Proper HTTP rate limit headers (X-RateLimit-Remaining, Retry-After)
- Integration tests against a real Redis instance via testcontainers
- Quantified load test results
    ┌─────────────────────────────────────┐
    │               Client                │
    └──────────────────┬──────────────────┘
                       │
    ┌──────────────────▼──────────────────┐
    │           Nginx (Port 80)           │
    │        Round-robin upstream         │
    └──┬───────────────┬───────────────┬──┘
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Node App 1  │ │ Node App 2  │ │ Node App 3  │
│  Port 3001  │ │  Port 3002  │ │  Port 3003  │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
┌──────▼───────────────▼───────────────▼──────┐
│                   Redis 7                   │
│     Rate limit counters · Tier configs      │
│   User→tier mappings · Blocked key index    │
└─────────────────────────────────────────────┘
Every Node instance shares a single Redis. A request hitting any server reads and writes to the same keys, so the limit holds correctly across the cluster.
Fixed Window: divides time into fixed intervals (e.g., 0–60s, 60–120s) and counts requests per interval per key. Fast and memory-efficient.
Tradeoff: A client can send double the limit across a window boundary — 100 requests at second 59 and 100 more at second 61 — both windows allow them individually.
    window 1    │    window 2
────────────────┼────────────────
[░░░░░░░░░░ 100]│[░░░░░░░░░░ 100]
                ↑ boundary burst possible here
Use when: The exact request rate at boundaries doesn't matter. Good for billing/quota systems where daily or hourly buckets are fine.
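The boundary burst can be reproduced with a small in-memory model. This is a sketch only: the real counter lives in Redis, and `FixedWindowCounter` and `sendBurst` are hypothetical names used for illustration.

```typescript
// In-memory model of a fixed window counter. The real system keeps the
// count in Redis; the class and function names here are hypothetical.
class FixedWindowCounter {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request arriving at nowMs is allowed.
  allow(key: string, nowMs: number): boolean {
    // Every timestamp inside the same interval maps to the same window id.
    const windowId = Math.floor(nowMs / this.windowMs);
    const bucket = `${key}:${windowId}`;
    const count = (this.counts.get(bucket) ?? 0) + 1;
    this.counts.set(bucket, count); // old buckets are never evicted in this sketch
    return count <= this.limit;
  }
}

// Sends n requests at the same instant and returns how many were allowed.
function sendBurst(limiter: FixedWindowCounter, key: string, nowMs: number, n: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (limiter.allow(key, nowMs)) allowed++;
  }
  return allowed;
}

const fixedWindow = new FixedWindowCounter(100, 60_000);
const atSecond59 = sendBurst(fixedWindow, "user:123", 59_000, 100);
const atSecond61 = sendBurst(fixedWindow, "user:123", 61_000, 100);
console.log(atSecond59 + atSecond61); // 200: double the limit within two seconds
```

Both bursts pass because each one is counted against a different window id, even though they arrive two seconds apart.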
Sliding Window: stores each request as a timestamped entry in a Redis Sorted Set. On each request, it removes entries older than the window size, then checks the count.
Tradeoff: Higher memory usage (one entry per request vs. one counter). More accurate than Fixed Window.
now - 60s                       now
──────────────────────────────────┤
[req][req][req][req][req]         │← only these count
Use when: You need accurate per-user throttling with no boundary loophole. Most rate limiting use cases.
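A minimal in-memory sketch of the same idea, with a plain array standing in for the Sorted Set. The class name is hypothetical, and it assumes blocked requests are not recorded in the log; the project's Lua script may make a different choice.

```typescript
// In-memory model of a sliding window log. An array of timestamps stands in
// for the Redis Sorted Set; SlidingWindowLog is a hypothetical name.
class SlidingWindowLog {
  private readonly log = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop entries that fell out of the window (ZREMRANGEBYSCORE in Redis terms).
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    // Assumption: only allowed requests are recorded (ZADD in Redis terms).
    if (allowed) recent.push(nowMs);
    this.log.set(key, recent);
    return allowed;
  }
}

const slidingLog = new SlidingWindowLog(100, 60_000);
let allowedAt59 = 0;
for (let i = 0; i < 100; i++) {
  if (slidingLog.allow("user:123", 59_000)) allowedAt59++;
}
// Two seconds later the 100 earlier entries are still inside the window.
const allowedAt61 = slidingLog.allow("user:123", 61_000);
// Once those entries are more than 60s old, requests pass again.
const allowedLater = slidingLog.allow("user:123", 119_001);
console.log(allowedAt59, allowedAt61, allowedLater); // 100 false true
```

Unlike the fixed window, the request at second 61 is rejected because the window is measured back from the current request, not from a fixed boundary.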
Token Bucket: each user has a bucket that refills with tokens at a fixed rate. Each request consumes one token. Burst traffic is allowed up to the bucket capacity.
Tradeoff: More complex to implement correctly (refill calculation must be atomic). Better for APIs where short bursts should be tolerated.
capacity: 100 tokens
refill: 10 tokens/second
burst of 100 requests → allowed immediately (bucket is now empty)
further requests → allowed at 10/second as tokens refill
Use when: Clients legitimately burst (mobile apps, batch jobs) and you want to absorb that without rejecting requests.
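The refill calculation can be sketched for a single key as follows. This is an in-memory illustration under stated assumptions: the names are hypothetical, the bucket starts full, and in the real system the refill and consume steps would have to run atomically in Redis.

```typescript
// Single-key token bucket model. TokenBucket is a hypothetical name; the
// bucket is assumed to start full.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  allow(nowMs: number): boolean {
    // Lazily add the tokens earned since the last call, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1; // each request consumes one token
      return true;
    }
    return false;
  }
}

// Counts how many of n simultaneous requests are allowed at nowMs.
function burstAllowed(bucket: TokenBucket, nowMs: number, n: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (bucket.allow(nowMs)) allowed++;
  }
  return allowed;
}

const tokenBucket = new TokenBucket(100, 10, 0);
const firstBurst = burstAllowed(tokenBucket, 0, 120); // bucket holds only 100 tokens
const oneSecondLater = burstAllowed(tokenBucket, 1_000, 20); // 10 tokens refilled
console.log(firstBurst, oneSecondLater); // 100 10
```

Refilling lazily on each call avoids a background timer per user, which is also what makes the calculation practical inside a single Redis script.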
Leaky Bucket: requests fill a bucket that drains at a constant output rate, regardless of input rate. This smooths bursty traffic into a steady stream.
Tradeoff: Even small bursts get queued/rejected if the output rate is already saturated. Less forgiving than Token Bucket.
Use when: You need to protect a downstream system that can't handle any variance in request rate (e.g., a payment processor).
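One way to model this is the "meter" variant of the leaky bucket, which rejects overflow instead of queueing it. The sketch below is an assumption about the variant, not a description of the project's implementation, and `LeakyBucket` is a hypothetical name.

```typescript
// Single-key leaky bucket, meter variant: overflow is rejected, not queued.
// LeakyBucket is a hypothetical name used for illustration.
class LeakyBucket {
  private level = 0;
  private lastLeakMs: number;
  private readonly capacity: number;
  private readonly leakPerSec: number;

  constructor(capacity: number, leakPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.lastLeakMs = nowMs;
  }

  allow(nowMs: number): boolean {
    // Drain the bucket at the constant output rate since the last call.
    const elapsedSec = Math.max(0, nowMs - this.lastLeakMs) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeakMs = nowMs;
    if (this.level + 1 <= this.capacity) {
      this.level += 1; // the request takes one unit of space in the bucket
      return true;
    }
    return false; // bucket is full: reject
  }
}

// Counts how many of n simultaneous requests are accepted at nowMs.
function accepted(bucket: LeakyBucket, nowMs: number, n: number): number {
  let count = 0;
  for (let i = 0; i < n; i++) {
    if (bucket.allow(nowMs)) count++;
  }
  return count;
}

const leaky = new LeakyBucket(5, 1, 0);
const initial = accepted(leaky, 0, 6); // bucket fills after 5 requests
const afterTwoSeconds = accepted(leaky, 2_000, 3); // 2 units have drained
console.log(initial, afterTwoSeconds); // 5 2
```

The long-run accepted rate can never exceed the leak rate, which is the property a variance-sensitive downstream system relies on.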
| Algorithm | Burst Handling | Memory | Boundary Accuracy | Complexity |
|---|---|---|---|---|
| Fixed Window | None | O(1) | Low | Low |
| Sliding Window | None | O(requests) | High | Medium |
| Token Bucket | Yes | O(1) | High | Medium |
| Leaky Bucket | No (smoothed) | O(1) | High | Medium |
- 4 rate limiting algorithms: Fixed Window, Sliding Window, Token Bucket, Leaky Bucket
- Lua atomic scripts: every algorithm runs as a single Redis command, so there are no race conditions
- Strategy pattern: swap algorithms via environment variable without touching middleware
- Dynamic per-tier limits: free/pro/enterprise quotas stored in Redis, resolved per request
- Runtime limit updates: change tier limits via admin API without restarting servers
- Fail-open / fail-closed: configurable behavior when Redis is unreachable
- Proper HTTP headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
- Prometheus metrics: request counts, block counts, Redis operation latency histograms
- Structured logging: pino JSON logs with per-request context (userId, tier, algorithm, allowed, latencyMs)
- Horizontal scaling: 3 Node instances behind Nginx, all sharing one Redis
- Integration tests: Vitest + testcontainers (real Redis, not mocks)
- CI pipeline: GitHub Actions runs lint → type check → unit tests → integration tests → build
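The strategy pattern item above could look roughly like this. It is a sketch with in-memory limiters: the interface shape, class names, and `createLimiter` are assumptions for illustration, and only two of the four algorithms are shown.

```typescript
// Hypothetical shape of the shared interface (the real one lives in src/algorithms/base.ts).
interface RateLimiter {
  // Returns true if the request identified by key is allowed at nowMs.
  allow(key: string, nowMs: number): boolean;
}

class FixedWindowLimiter implements RateLimiter {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;
  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }
  allow(key: string, nowMs: number): boolean {
    const bucket = `${key}:${Math.floor(nowMs / this.windowMs)}`;
    const count = (this.counts.get(bucket) ?? 0) + 1;
    this.counts.set(bucket, count);
    return count <= this.limit;
  }
}

class SlidingWindowLimiter implements RateLimiter {
  private readonly log = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }
  allow(key: string, nowMs: number): boolean {
    const recent = (this.log.get(key) ?? []).filter((t) => t > nowMs - this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.log.set(key, recent);
    return allowed;
  }
}

type LimiterFactory = (limit: number, windowMs: number) => RateLimiter;

// Registry keyed by values that RATE_LIMIT_ALGORITHM accepts.
const strategies = new Map<string, LimiterFactory>([
  ["fixed_window", (limit, windowMs) => new FixedWindowLimiter(limit, windowMs)],
  ["sliding_window", (limit, windowMs) => new SlidingWindowLimiter(limit, windowMs)],
]);

function createLimiter(algorithm: string, limit: number, windowMs: number): RateLimiter {
  const factory = strategies.get(algorithm);
  if (!factory) throw new Error(`Unknown algorithm: ${algorithm}`);
  return factory(limit, windowMs);
}

// The caller depends only on the interface, so swapping algorithms needs no change here.
function statusFor(limiter: RateLimiter, userId: string, nowMs: number): number {
  return limiter.allow(userId, nowMs) ? 200 : 429;
}

const chosen = createLimiter("sliding_window", 2, 60_000);
const statuses = [0, 1, 2].map((i) => statusFor(chosen, "user:123", 1_000 + i));
console.log(statuses); // the first two requests pass, the third gets a 429
```

Adding a fifth algorithm would mean one new class and one new registry entry, with `statusFor` left untouched.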
| Layer | Tool | Why |
|---|---|---|
| Runtime | Node.js 20 LTS | Stable, async I/O fits this workload |
| Language | TypeScript 5 | Type-safe algorithm interface, catches config errors at compile time |
| Framework | Express 5 | Minimal, middleware-first |
| Redis client | ioredis | Lua scripting API, pipeline support, cluster-ready |
| Metrics | prom-client | Prometheus standard; works with Grafana out of the box |
| Logging | pino | Structured JSON, ~8x faster than winston |
| Validation | zod | Runtime schema validation for env config and request inputs |
| Container | Docker + Compose | Reproducible multi-service local environment |
| Load balancer | Nginx | Round-robin upstream, production-realistic setup |
| Testing | Vitest | Fast, native ESM, works with testcontainers |
| Integration | testcontainers | Spins up a real Redis container in CI — no mocks for Redis behavior |
| Load testing | autocannon | Node-native, scriptable, produces p99 latency stats |
| CI | GitHub Actions | Automated lint + test + build on every push |
distributed-rate-limiter/
│
├── src/
│ ├── algorithms/
│ │ ├── base.ts # RateLimiter abstract interface
│ │ ├── FixedWindow.ts
│ │ ├── SlidingWindow.ts
│ │ ├── TokenBucket.ts
│ │ └── LeakyBucket.ts
│ │
│ ├── middleware/
│ │ └── rateLimiter.ts # Strategy picker + header injection
│ │
│ ├── services/
│ │ ├── redis.ts # ioredis client, retry logic, error events
│ │ ├── limitsConfig.ts # Tier config loader from Redis
│ │ └── metrics.ts # prom-client counters and histograms
│ │
│ ├── scripts/lua/
│ │ ├── fixedWindow.lua
│ │ ├── slidingWindow.lua
│ │ └── tokenBucket.lua
│ │
│ ├── routes/
│ │ ├── stats.ts # /stats, /blocked, /limits
│ │ ├── health.ts # /health — liveness + Redis ping
│ │ └── metrics.ts # /metrics — Prometheus scrape endpoint
│ │
│ ├── config/
│ │ └── index.ts # Zod-validated env config
│ │
│ ├── utils/
│ │ ├── logger.ts # pino instance
│ │ └── keyBuilder.ts # Centralized Redis key namespacing
│ │
│ ├── app.ts # Express setup (no listen)
│ └── server.ts # Port binding, startup
│
├── tests/
│ ├── unit/
│ │ ├── FixedWindow.test.ts
│ │ ├── SlidingWindow.test.ts
│ │ └── TokenBucket.test.ts
│ └── integration/
│ └── rateLimiter.test.ts # Real Redis via testcontainers
│
├── scripts/
│ ├── loadTest.ts # autocannon load test
│ └── seedTiers.ts # Seed Redis with tier configs
│
├── infra/
│ ├── Dockerfile
│ ├── docker-compose.yml
│ └── nginx.conf
│
├── .github/workflows/ci.yml
├── .env.example
├── tsconfig.json
├── vitest.config.ts
├── package.json
└── README.md
- Docker and Docker Compose
- Node.js 20+ (for local development without Docker)
git clone https://github.com/your-username/distributed-rate-limiter.git
cd distributed-rate-limiter
cp .env.example .env
docker compose up --build
This starts:
- 3 Node.js app instances (ports 3001, 3002, 3003)
- Redis 7 on port 6379
- Nginx load balancer on port 80
Test it:
curl -i http://localhost/api/test -H "x-user-id: user:123"
Local development without Docker:
npm install
# Requires a local Redis instance
redis-server
# Seed tier configs
npx tsx scripts/seedTiers.ts
npm run dev
All config is environment-driven and validated at startup with Zod. If a required variable is missing or has the wrong type, the process exits immediately with a clear error instead of crashing at runtime ten requests in.
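The fail-fast idea can be shown without any dependency. The project uses Zod for this; the sketch below deliberately swaps in hand-written checks so it runs on its own, and it assumes every variable is required (the real schema may define defaults). `loadConfig` and the `Config` shape are hypothetical.

```typescript
// Dependency-free stand-in for the Zod schema. All names are hypothetical,
// and every variable is assumed to be required.
const ALGORITHMS = ["fixed_window", "sliding_window", "token_bucket", "leaky_bucket"] as const;
const FAILURE_MODES = ["fail_open", "fail_closed"] as const;

interface Config {
  port: number;
  redisUrl: string;
  algorithm: (typeof ALGORITHMS)[number];
  failureMode: (typeof FAILURE_MODES)[number];
  defaultLimit: number;
  defaultWindowMs: number;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): Config {
  const errors: string[] = [];

  const positiveInt = (name: string): number => {
    const value = Number(env[name]); // a missing variable becomes NaN and is rejected
    if (!Number.isInteger(value) || value <= 0) errors.push(`${name} must be a positive integer`);
    return value;
  };

  const oneOf = <T extends string>(name: string, allowed: readonly T[]): T => {
    const match = allowed.find((candidate) => candidate === env[name]);
    if (match === undefined) errors.push(`${name} must be one of: ${allowed.join(", ")}`);
    return match as T; // never returned to the caller when invalid: we throw below
  };

  const redisUrl = env["REDIS_URL"] ?? "";
  if (!/^rediss?:\/\//.test(redisUrl)) errors.push("REDIS_URL must start with redis:// or rediss://");

  const config: Config = {
    port: positiveInt("PORT"),
    redisUrl,
    algorithm: oneOf("RATE_LIMIT_ALGORITHM", ALGORITHMS),
    failureMode: oneOf("REDIS_FAILURE_MODE", FAILURE_MODES),
    defaultLimit: positiveInt("DEFAULT_LIMIT"),
    defaultWindowMs: positiveInt("DEFAULT_WINDOW_MS"),
  };

  // Report every problem at once, before the server starts accepting traffic.
  if (errors.length > 0) throw new Error(`Invalid configuration:\n- ${errors.join("\n- ")}`);
  return config;
}

// In the app this would be called once at startup with process.env.
const exampleConfig = loadConfig({
  PORT: "3000",
  REDIS_URL: "redis://localhost:6379",
  RATE_LIMIT_ALGORITHM: "sliding_window",
  REDIS_FAILURE_MODE: "fail_open",
  DEFAULT_LIMIT: "100",
  DEFAULT_WINDOW_MS: "60000",
});
console.log(exampleConfig.algorithm, exampleConfig.port); // sliding_window 3000
```

Collecting all errors before throwing means a misconfigured deployment reports every bad variable in one startup failure rather than one per restart.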
# .env.example
PORT=3000
REDIS_URL=redis://localhost:6379
# Algorithm: fixed_window | sliding_window | token_bucket | leaky_bucket
RATE_LIMIT_ALGORITHM=sliding_window
# What to do when Redis is unreachable: fail_open | fail_closed
REDIS_FAILURE_MODE=fail_open
# Default limits (overridden by per-tier Redis config if set)
DEFAULT_LIMIT=100
DEFAULT_WINDOW_MS=60000
LOG_LEVEL=info
# Set tier quotas
HSET rl:tiers free 100 pro 1000 enterprise 10000
# Assign a user to a tier
SET rl:user:123:tier pro
Or use the seed script:
npx tsx scripts/seedTiers.ts
GET /api/test
Headers:
x-user-id: <string> # identifies the rate-limited entity
Response headers on every request:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1718123456
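A sketch of how these headers might be derived from a single limiter decision. The `LimitDecision` shape and `rateLimitHeaders` are hypothetical, and the sketch assumes X-RateLimit-Reset is a Unix timestamp in seconds (as the sample value suggests) and that Retry-After is only sent on blocked requests.

```typescript
// Hypothetical decision object produced by the limiter for one request.
interface LimitDecision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAtMs: number; // when the current window or bucket resets, in epoch milliseconds
}

function rateLimitHeaders(decision: LimitDecision, nowMs: number): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(decision.limit),
    "X-RateLimit-Remaining": String(Math.max(0, decision.remaining)),
    // Assumption: reset is reported as Unix seconds.
    "X-RateLimit-Reset": String(Math.ceil(decision.resetAtMs / 1000)),
  };
  if (!decision.allowed) {
    // Retry-After is a delay in whole seconds; never tell the client to retry in 0s.
    const seconds = Math.ceil((decision.resetAtMs - nowMs) / 1000);
    headers["Retry-After"] = String(Math.max(1, seconds));
  }
  return headers;
}

const blockedHeaders = rateLimitHeaders(
  { allowed: false, limit: 1000, remaining: 0, resetAtMs: 1_718_123_456_000 },
  1_718_123_422_000,
);
console.log(blockedHeaders["Retry-After"]); // 34
```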
On limit exceeded (429):
{
  "error": "Too Many Requests",
  "retryAfter": 34,
  "limit": 1000,
  "windowMs": 60000
}
GET /health
Liveness check with Redis connectivity status.
{
  "status": "ok",
  "redis": "connected",
  "degraded": false,
  "uptime": 3824,
  "timestamp": "2025-06-15T10:34:00Z"
}
When Redis is down:
{
  "status": "ok",
  "redis": "disconnected",
  "degraded": true,
  "failureMode": "fail_open"
}
GET /stats
Aggregate request statistics.
{
  "allowedRequests": 184523,
  "blockedRequests": 4821,
  "blockRate": "2.54%",
  "algorithm": "sliding_window",
  "uptime": 3824
}
GET /limits
Current tier configuration.
{
  "tiers": {
    "free": 100,
    "pro": 1000,
    "enterprise": 10000
  },
  "windowMs": 60000,
  "algorithm": "sliding_window"
}
POST /limits/<tier>
Update a tier's limit at runtime without restarting servers. The change takes effect on the next request.
curl -X POST http://localhost/limits/pro \
  -H "Content-Type: application/json" \
  -d '{"limit": 2000}'
{
  "tier": "pro",
  "previousLimit": 1000,
  "newLimit": 2000,
  "effectiveAt": "2025-06-15T10:35:00Z"
}
GET /blocked
Keys currently at or over their limit, with TTL remaining.
{
  "count": 2,
  "keys": [
    { "key": "rl:user:456", "ttlSeconds": 28 },
    { "key": "rl:user:789", "ttlSeconds": 51 }
  ]
}
GET /metrics
Prometheus-compatible text format. Scrape with any Prometheus-compatible collector.
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{status="200",algorithm="sliding_window"} 184523
http_requests_total{status="429",algorithm="sliding_window"} 4821
# HELP rate_limit_blocked_total Requests blocked by rate limiter
# TYPE rate_limit_blocked_total counter
rate_limit_blocked_total{tier="free",algorithm="sliding_window"} 3201
rate_limit_blocked_total{tier="pro",algorithm="sliding_window"} 1620
# HELP redis_operation_duration_seconds Redis Lua script execution latency
# TYPE redis_operation_duration_seconds histogram
redis_operation_duration_seconds_bucket{le="0.001"} 178432
redis_operation_duration_seconds_bucket{le="0.005"} 184501
Every request produces a structured JSON log line via pino:
{
  "level": "info",
  "time": "2025-06-15T10:34:00.000Z",
  "userId": "user:123",
  "tier": "pro",
  "algorithm": "sliding_window",
  "allowed": true,
  "remaining": 847,
  "latencyMs": 1.4,
  "requestId": "a3f9b2c1"
}
On a block:
{
  "level": "warn",
  "userId": "user:456",
  "tier": "free",
  "allowed": false,
  "remaining": 0,
  "retryAfterMs": 28000,
  "latencyMs": 0.9
}
The /metrics endpoint exposes Prometheus-format metrics. Wire it to a Prometheus + Grafana stack to get dashboards tracking:
- Request throughput (allowed vs. blocked)
- Block rate by tier and algorithm
- Redis operation p50/p95/p99 latency
- Active rate-limited keys over time
What happens when Redis goes down?
This system has a configurable answer, set via the REDIS_FAILURE_MODE environment variable:
fail_open: When Redis is unreachable, all requests are allowed through. The system degrades gracefully: rate limiting stops temporarily, but the application continues serving traffic.
Use when: Availability matters more than strict enforcement. Most user-facing APIs.
fail_closed: When Redis is unreachable, all requests are blocked with a 503. Nothing gets through.
Use when: The rate limiter exists to protect a downstream system that would be overwhelmed without it. Strict enforcement required.
In both modes:
- The redis_errors_total Prometheus counter increments
- Every request logs a redis_failure: true field
- /health returns degraded: true so your alerting fires
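The two modes can be sketched as a wrapper around the limiter call. This is illustrative only: `checkWithFallback` and the `Outcome` shape are hypothetical, and any rejected promise is treated as "Redis unreachable".

```typescript
type FailureMode = "fail_open" | "fail_closed";

// Hypothetical result handed to the middleware.
interface Outcome {
  allowed: boolean;
  status: number;
  redisFailure: boolean; // would feed the redis_failure log field and error counter
}

// check stands in for the Redis-backed limiter call; it rejects when Redis is down.
async function checkWithFallback(check: () => Promise<boolean>, mode: FailureMode): Promise<Outcome> {
  try {
    const allowed = await check();
    return { allowed, status: allowed ? 200 : 429, redisFailure: false };
  } catch {
    // Assumption: every error from the limiter call means Redis is unreachable.
    return mode === "fail_open"
      ? { allowed: true, status: 200, redisFailure: true } // keep serving, unthrottled
      : { allowed: false, status: 503, redisFailure: true }; // protect the downstream
  }
}

// Simulated outage: the limiter call always rejects.
const redisDown = async (): Promise<boolean> => {
  throw new Error("ECONNREFUSED");
};

checkWithFallback(redisDown, "fail_open").then((outcome) => console.log(outcome.status)); // 200
checkWithFallback(redisDown, "fail_closed").then((outcome) => console.log(outcome.status)); // 503
```

Note that a fail-closed rejection uses 503 rather than 429: the client did nothing wrong, so the status reflects a service problem, not a quota violation.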
Each algorithm is tested in isolation with a mocked Redis client. Tests cover:
- Requests exactly at the limit (should allow)
- Request one over the limit (should block)
- Window reset behavior (counter should clear after TTL)
- Token refill rate (Token Bucket)
- Boundary bursting (Fixed Window edge case)
npm run test:unit
Vitest + testcontainers spins up a real Redis 7 container for each test suite. There are no mocks for Redis behavior: the tests run actual INCR, ZADD, and Lua script execution.
npm run test:integration
npm test
Tested with autocannon: 50 concurrent connections, 30-second duration, Sliding Window algorithm, 3 app instances behind Nginx.
Running 30s test @ http://localhost/api/test
50 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max   │
├─────────┼───────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 2 ms  │ 3 ms  │ 6 ms  │ 8 ms  │ 3.12 ms  │ 1.8 ms  │ 42 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
Req/Bytes counts sampled once per second.
45k requests in 30.01s, 12.4 MB read
Requests/sec: 1499.8
Errors: 0
Non-2xx or 3xx responses: 8432 (429s — expected, within limit enforcement)
p99 latency: 8ms including Nginx, middleware, Lua script execution, and Redis round-trip.
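Redis transactions (MULTI/EXEC) cannot branch on a value read inside the transaction, so a check-then-increment needs WATCH, which is optimistic locking: if another client modifies the key between WATCH and EXEC, the transaction aborts and must be retried. Under high concurrency, this causes retries and complexity.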
Lua scripts execute atomically on the Redis server. No other command can run while the script is executing. The entire rate limit check-and-increment is a single operation.
Lua scripts are the correct solution to this race condition:
Server A reads 99          ─┐
Server B reads 99          ─┤ Both see count < limit
Server A increments to 100 ─┤ Both allow
Server B increments to 101 ─┘ Count is now 101, limit broken
With Lua, the read, the limit check, and the increment run inside one atomic script, so this interleaving cannot occur.
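The interleaving can be replayed deterministically with a `Map` standing in for Redis. This is a model of the race, not the project's Lua script; `interleavedCheck` and `atomicCheck` are hypothetical names.

```typescript
const LIMIT = 100;

// Non-atomic: reads and writes are separate steps, so two servers can interleave.
// Returns whether server A and server B each allowed their request.
function interleavedCheck(store: Map<string, number>, key: string): [boolean, boolean] {
  const readA = store.get(key) ?? 0; // Server A reads 99
  const readB = store.get(key) ?? 0; // Server B reads 99
  const allowA = readA < LIMIT;
  const allowB = readB < LIMIT; // decided on a stale value
  if (allowA) store.set(key, (store.get(key) ?? 0) + 1); // Server A increments to 100
  if (allowB) store.set(key, (store.get(key) ?? 0) + 1); // Server B increments to 101
  return [allowA, allowB];
}

// Atomic: check and increment form one indivisible step, which is the
// guarantee a single Lua script provides on the Redis server.
function atomicCheck(store: Map<string, number>, key: string): boolean {
  const count = store.get(key) ?? 0;
  if (count >= LIMIT) return false;
  store.set(key, count + 1);
  return true;
}

const racyStore = new Map<string, number>([["rl:user:123", 99]]);
const racyResult = interleavedCheck(racyStore, "rl:user:123");
console.log(racyResult, racyStore.get("rl:user:123")); // both allowed, count ends at 101

const atomicStore = new Map<string, number>([["rl:user:123", 99]]);
const firstAtomic = atomicCheck(atomicStore, "rl:user:123");
const secondAtomic = atomicCheck(atomicStore, "rl:user:123");
console.log(firstAtomic, secondAtomic, atomicStore.get("rl:user:123")); // true false 100
```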
- Lua scripting: ioredis has a cleaner .defineCommand() API for named Lua scripts
- Pipeline support: batching multiple commands in one round-trip
- Cluster mode support: if this project extends to Redis Cluster, ioredis handles key slot routing automatically
Benchmarks show pino is roughly 8x faster than winston on throughput, mainly because it does very little work to serialize a log line and, since v7, runs transports in a worker thread so that log processing stays off the event loop. For a rate limiter where every microsecond of middleware overhead matters, this is a practical concern and not just an aesthetic preference.
Mocking Redis means mocking a database — you end up testing your mock, not your code. Lua scripts, key expiry behavior, and sorted set operations can't be accurately simulated in a mock. testcontainers runs the actual Redis binary in a container, tears it down after the suite, and gives you real confidence.
Redis Cluster / Redis Sentinel
For true high availability, Redis Cluster distributes keys across shards. ioredis supports this natively. The key namespacing in keyBuilder.ts uses {userId} hash tags to ensure related keys land on the same shard.
Multi-region rate limiting
A truly global rate limiter would need consensus across data centers. One approach: CRDT-based counters (each region tracks its own count and periodically syncs). This sacrifices strict accuracy for availability, which is a deliberate tradeoff.
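One possible CRDT for this is a grow-only counter (G-counter). The sketch below is an illustration of the idea and not part of the current codebase; the region names and function names are hypothetical.

```typescript
// Grow-only counter: one monotonically increasing count per region.
type GCounter = Readonly<Record<string, number>>;

// Each region only ever increments its own slot.
function increment(counter: GCounter, region: string, by = 1): GCounter {
  return { ...counter, [region]: (counter[region] ?? 0) + by };
}

// Merging takes the per-region maximum, so syncs can arrive in any order
// and be repeated without double counting.
function merge(a: GCounter, b: GCounter): GCounter {
  const merged: Record<string, number> = { ...a };
  for (const [region, count] of Object.entries(b)) {
    merged[region] = Math.max(merged[region] ?? 0, count);
  }
  return merged;
}

function total(counter: GCounter): number {
  return Object.values(counter).reduce((sum, count) => sum + count, 0);
}

let usEast: GCounter = {};
let euWest: GCounter = {};
usEast = increment(usEast, "us-east", 60);
euWest = increment(euWest, "eu-west", 50);

// Before syncing, each region sees a total under a limit of 100 and keeps admitting.
console.log(total(usEast), total(euWest)); // 60 50

// After syncing, the global count turns out to have exceeded the limit.
const synced = merge(usEast, euWest);
console.log(total(synced)); // 110
```

The gap between the local views (60 and 50) and the merged total (110) is the accuracy being traded for availability: the overshoot is bounded by how much traffic arrives between syncs.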
gRPC control plane
The admin API is currently HTTP/JSON. A gRPC control plane would let you push limit config changes to all instances simultaneously rather than having each instance poll Redis.
Rate limit by endpoint, not just user
Current implementation keys by userId. Extending to key by userId:endpoint (e.g., rl:user:123:POST:/api/orders) allows per-route policies and is a trivial extension to keyBuilder.ts.
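A sketch of what such a key builder might look like, combining the per-endpoint suffix with the `{userId}` hash tag mentioned for Redis Cluster. The exact key layout and the function names are assumptions for illustration; `hashTag` follows the Redis Cluster rule of hashing only the text between the first `{` and the next `}` when that text is non-empty.

```typescript
// Hypothetical endpoint descriptor for per-route limits.
interface Endpoint {
  method: string;
  path: string;
}

// Counter key; the braces make Redis Cluster hash only the user id.
function counterKey(userId: string, endpoint?: Endpoint): string {
  const base = `rl:{${userId}}`;
  return endpoint ? `${base}:${endpoint.method.toUpperCase()}:${endpoint.path}` : base;
}

// Tier mapping key for the same user, sharing the same hash tag.
function tierKey(userId: string): string {
  return `rl:{${userId}}:tier`;
}

// Returns the part of the key that Redis Cluster would hash.
function hashTag(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  // An empty tag ("{}") or a missing closing brace means the whole key is hashed.
  return close > open + 1 ? key.slice(open + 1, close) : key;
}

const ordersKey = counterKey("user:123", { method: "post", path: "/api/orders" });
console.log(ordersKey); // rl:{user:123}:POST:/api/orders
console.log(hashTag(ordersKey) === hashTag(tierKey("user:123"))); // true
```

Because every key for a user hashes on the same tag, a Lua script that touches both the counter and the tier key stays within a single shard.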
WebSocket / SSE notifications
Push a real-time event to clients when they approach their limit (e.g., at 80% consumed), rather than only signaling via response headers.
MIT