Alidmo/OpenReserve

OpenReserve

A domain-agnostic, high-concurrency reservation engine built as an open-source, attachable microservice. It handles the math and concurrency of inventory — hotel rooms, concert tickets, warehouse stock — under extreme load without overselling.


Architecture: Two-Phase Commit (Saga-lite)

The system prevents overselling with a two-phase reserve-then-commit flow: an atomic Postgres UPDATE takes the stock, and a Redis key with a TTL bounds how long the hold can last before it is committed:

Client                    OpenReserve                     Postgres          Redis
  │                            │                              │                │
  │ POST /reserve              │                              │                │
  │ Idempotency-Key: <uuid>    │                              │                │
  │ ─────────────────────────► │                              │                │
  │                            │ Check idempotency key        │                │
  │                            │ ────────────────────────────────────────────► │
  │                            │                              │                │
  │                            │ Atomic stock decrement ──────►                │
  │                            │  UPDATE resources            │                │
  │                            │  SET available_stock -= qty  │                │
  │                            │  WHERE available_stock >= qty│                │
  │                            │  → 0 rows: 409 OUT OF STOCK  │                │
  │                            │  → 1 row:  HOLD created      │                │
  │                            │                              │                │
  │                            │ Cache idempotency key + hold TTL ────────────► │
  │                            │                              │                │
  │◄───────────────────────── 200 {token, expiresAt}          │                │
  │                            │                              │                │
  │ POST /commit {token}       │                              │                │
  │ ─────────────────────────► │                              │                │
  │                            │ Delete hold key (TTL cancelled) ─────────────► │
  │                            │ Mark reservation COMMITTED ──►                │
  │◄───────────────────────── 200 {COMMITTED}                 │                │
  │                            │                              │                │
  │  [TTL expires without commit]                             │                │
  │                            │◄───── keyspace expired event ──────────────── │
  │                            │ Restore stock ───────────────►                │
  │                            │ Mark reservation EXPIRED     │                │
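
The lifecycle in the diagram can be read as a small state machine. The sketch below is illustrative only: HELD, COMMITTED and EXPIRED appear in this README, while RELEASED, the HoldState enum and the canTransitionTo helper are assumed names, not taken from the codebase.

```kotlin
// Illustrative model of the reservation lifecycle; not the project's actual enum.
enum class HoldState { HELD, COMMITTED, EXPIRED, RELEASED }

// Only a HELD reservation can move on; COMMITTED, EXPIRED and RELEASED are terminal.
fun HoldState.canTransitionTo(next: HoldState): Boolean = when (this) {
    HoldState.HELD -> next != HoldState.HELD
    else -> false
}
```

Treating EXPIRED as terminal matters for the race between a late commit and the TTL: once stock has been restored, a commit for the same token should be rejected rather than accepted.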

Oversell Prevention Proof

The single source of truth is one atomic SQL statement:

UPDATE resources
SET available_stock = available_stock - :qty,
    updated_at      = NOW()
WHERE resource_id = :resourceId
  AND available_stock >= :qty   -- the guard

If available_stock is less than :qty, the WHERE clause matches no row and 0 rows are updated. The service receives rowsAffected == 0, throws OutOfStockException, and returns HTTP 409. Two concurrent transactions cannot both push the count below zero: the UPDATE takes a row-level lock, so a competing UPDATE on the same row waits and, at Postgres's default READ COMMITTED isolation, re-evaluates the guard against the newly committed value before it proceeds. The Gatling simulation exercises this: 2,000 users competing for 100 items produce exactly 100 × 200 OK and 1,900 × 409 Conflict, with zero oversells.
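
The guard can be imitated in memory to see why the count of successes is capped by the stock. This is a rough analogy, not the service's code: the StockRow class, the ReentrantLock standing in for the Postgres row lock, and the simulateFlashSale function are all assumptions made for illustration.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// In-memory stand-in for one row of the resources table (hypothetical class).
class StockRow(initial: Int) {
    private val rowLock = ReentrantLock() // plays the role of the Postgres row lock
    var available: Int = initial
        private set

    // Mirrors UPDATE ... WHERE available_stock >= :qty; returns rows affected (0 or 1).
    fun guardedDecrement(qty: Int): Int = rowLock.withLock {
        if (available >= qty) { available -= qty; 1 } else 0
    }
}

// Fires `users` concurrent single-unit requests; returns (granted, remaining stock).
fun simulateFlashSale(stock: Int, users: Int): Pair<Int, Int> {
    val row = StockRow(stock)
    val granted = AtomicInteger()
    val start = CountDownLatch(1)
    val pool = Executors.newFixedThreadPool(16)
    repeat(users) {
        pool.execute {
            start.await() // release the workers together to maximise contention
            if (row.guardedDecrement(1) == 1) granted.incrementAndGet()
        }
    }
    start.countDown()
    pool.shutdown()
    check(pool.awaitTermination(30, TimeUnit.SECONDS)) { "simulation timed out" }
    return granted.get() to row.available
}
```

Calling simulateFlashSale(100, 2000) grants exactly 100 requests and leaves 0 units, because the check and the decrement happen under the same lock.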

Fallback: Reconciliation Scheduler

Redis keyspace notifications are delivered over Pub/Sub, which is fire-and-forget: a subscriber that is disconnected or too slow when the event is published never sees it. If a TTL event is dropped (Redis restart, slow consumer, GC pause), ReconciliationScheduler sweeps the reservations table every 60 seconds for HELD records past their expires_at and closes them.

Primary path:   Redis TTL fires → HoldExpirationListener → restore stock + EXPIRED
Fallback path:  @Scheduled every 60s → findHeldAndExpired(now) → restore stock + EXPIRED
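
The fallback path can be sketched as a sweep over in-memory records. The Status enum, the Hold class and the sweepExpired function below are hypothetical stand-ins for the real entities and repository query; only the rule (HELD and past expires_at means restore stock and mark EXPIRED) comes from the description above.

```kotlin
import java.time.Instant

enum class Status { HELD, COMMITTED, EXPIRED }

// Hypothetical in-memory reservation record.
data class Hold(
    val token: String,
    val resourceId: String,
    val quantity: Int,
    val expiresAt: Instant,
    var status: Status = Status.HELD,
)

// Closes every HELD record past its expiry and returns how many were closed.
// Records already closed by the primary path are skipped, so both paths can coexist.
fun sweepExpired(holds: List<Hold>, stock: MutableMap<String, Int>, now: Instant): Int {
    val overdue = holds.filter { it.status == Status.HELD && it.expiresAt.isBefore(now) }
    for (hold in overdue) {
        stock[hold.resourceId] = (stock[hold.resourceId] ?: 0) + hold.quantity // restore stock
        hold.status = Status.EXPIRED
    }
    return overdue.size
}
```

Filtering on the HELD status is what keeps the sweep from restoring stock twice when the Redis listener has already handled a hold.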

Tech Stack

Concern            Technology
Language           Kotlin 1.9
Framework          Spring Boot 3.2, WebFlux + Coroutines
Reactive DB        R2DBC + PostgreSQL 15
Cache / Locks      Redis 7 (keyspace notifications for TTL expiry)
DB Migrations      Flyway (JDBC, runs before the R2DBC pool opens)
Observability      Micrometer + Prometheus + Grafana
Unit Tests         JUnit 5, MockK
Integration Tests  Testcontainers (real Postgres + Redis), WireMock (webhooks), Awaitility
Load Tests         Gatling 3.10 (Java DSL, called from Kotlin)
CLI                Clikt 4.2 + Java HttpClient (no Spring context, ~100ms startup)
DevOps             Docker Compose

Project Structure

.
├── src/
│   ├── main/kotlin/com/openreserve/
│   │   ├── api/
│   │   │   ├── controller/       # ResourceController, ReservationController
│   │   │   └── dto/              # Request/Response DTOs
│   │   ├── config/               # R2dbcConfig, RedisConfig, SchedulingConfig, WebClientConfig
│   │   ├── domain/               # Resource, Reservation, ReservationStatus
│   │   ├── exception/            # Domain exceptions + GlobalExceptionHandler
│   │   ├── repository/           # ResourceRepository, ReservationRepository (CoroutineCrudRepository)
│   │   ├── service/
│   │   │   ├── ReservationService.kt      # Core: hold / commit / release
│   │   │   ├── HoldExpirationListener.kt  # Redis keyspace TTL handler
│   │   │   ├── ReconciliationScheduler.kt # Fallback scheduler for dropped events
│   │   │   └── WebhookService.kt          # Outbound webhook publisher
│   │   └── cli/
│   │       └── OpenReserveCli.kt          # Clikt CLI (no Spring context)
│   ├── main/resources/
│   │   ├── application.yml
│   │   └── db/migration/V1__create_initial_schema.sql
│   ├── test/kotlin/com/openreserve/
│   │   ├── AbstractIntegrationTest.kt     # Testcontainers base class
│   │   ├── api/ReservationControllerTest.kt   # Concurrency + idempotency proofs
│   │   ├── repository/ResourceRepositoryTest.kt
│   │   └── service/HoldExpirationIntegrationTest.kt  # Expiry + WireMock webhook tests
│   └── gatling/kotlin/com/openreserve/simulation/
│       └── BlackFridaySimulation.kt       # 2,000-user flash-sale load test
├── cli/
│   └── sample-resources.csv
├── grafana/
│   ├── dashboards/openreserve-dashboard.json
│   └── provisioning/
│       ├── datasources/prometheus.yml
│       └── dashboards/dashboards.yml
├── docker-compose.yml
├── prometheus.yml
├── openreserve          # Unix CLI wrapper
├── openreserve.bat      # Windows CLI wrapper
└── build.gradle.kts

Quick Start

1. Start the infrastructure

docker-compose up -d

This starts Postgres, Redis, Prometheus, and Grafana. Grafana auto-provisions the Prometheus datasource and the OpenReserve dashboard on first boot.

2. Run the application

./gradlew bootRun

The application starts on port 8080. Flyway runs migrations automatically on startup.

3. Verify health

curl http://localhost:8080/actuator/health

API Reference

POST /api/v1/resources — Create or update inventory

curl -X POST http://localhost:8080/api/v1/resources \
  -H "Content-Type: application/json" \
  -d '{"resourceId": "hotel-room-101", "totalStock": 50}'

POST /api/v1/reserve — Hold inventory (Phase 1)

Requires an Idempotency-Key header. Retrying with the same key returns the cached reservation instead of creating a new hold.

curl -X POST http://localhost:8080/api/v1/reserve \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{"resourceId": "hotel-room-101", "quantity": 1, "ttlSeconds": 300}'

Response:

{
  "token": "550e8400-e29b-41d4-a716-446655440000",
  "resourceId": "hotel-room-101",
  "quantity": 1,
  "status": "HELD",
  "expiresAt": "2026-03-31T10:47:11Z"
}
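
For JVM clients, the same request can be assembled with the JDK's java.net.http API. This is a minimal sketch under assumptions: buildReserveRequest is a hypothetical helper, and the JSON body is built by string interpolation, which is only safe when resourceId contains no characters that need JSON escaping.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds (but does not send) a reserve request. Reuse the same key when retrying.
fun buildReserveRequest(
    baseUrl: String,
    idempotencyKey: String,
    resourceId: String,
    quantity: Int,
    ttlSeconds: Int,
): HttpRequest {
    // Assumes resourceId needs no JSON escaping (illustration only).
    val body = "{\"resourceId\": \"$resourceId\", \"quantity\": $quantity, \"ttlSeconds\": $ttlSeconds}"
    return HttpRequest.newBuilder(URI.create("$baseUrl/api/v1/reserve"))
        .header("Content-Type", "application/json")
        .header("Idempotency-Key", idempotencyKey)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Generating the key once (for example with UUID.randomUUID()) and keeping it for every retry of the same logical reservation is what lets the server return the cached result.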

POST /api/v1/commit — Finalize reservation (Phase 2)

curl -X POST http://localhost:8080/api/v1/commit \
  -H "Content-Type: application/json" \
  -d '{"token": "550e8400-e29b-41d4-a716-446655440000"}'

POST /api/v1/release — Cancel hold, restore stock

curl -X POST http://localhost:8080/api/v1/release \
  -H "Content-Type: application/json" \
  -d '{"token": "550e8400-e29b-41d4-a716-446655440000"}'

GET /api/v1/resources/{resourceId} — Check availability

curl http://localhost:8080/api/v1/resources/hotel-room-101

Running Tests

Integration tests (Testcontainers — real Postgres + Redis)

# On Windows, replace ./gradlew with .\gradlew

# All tests
./gradlew test

# Repository layer only
./gradlew test --tests "com.openreserve.repository.*"

# Concurrency + idempotency proof
./gradlew test --tests "com.openreserve.api.ReservationControllerTest"

# Expiration + WireMock webhook tests
./gradlew test --tests "com.openreserve.service.HoldExpirationIntegrationTest"

Load Testing — The Black Friday Simulation

This simulation is the project's main end-to-end check. It runs 2,000 concurrent users against 100 items and expects zero oversells.

Prerequisites

# 1. Start infrastructure
docker-compose up -d postgres redis

# 2. Start the application (leave running in a separate terminal)
./gradlew bootRun

Run the simulation

./gradlew gatlingRun -Dgatling.simulationClass=com.openreserve.simulation.BlackFridaySimulation

Expected console output

╔═══════════════════════════════════════════════════════════╗
║         BLACK FRIDAY SIMULATION — OpenReserve             ║
╠═══════════════════════════════════════════════════════════╣
║  Target     : http://localhost:8080
║  Stock      : 100 units of ps5_sku_123
║  Users      : 2000 virtual users over 5s
║  Expect     : 100 × HTTP 200, 1900 × HTTP 409
╚═══════════════════════════════════════════════════════════╝

✓ Seeded ps5_sku_123 with 100 units

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       2000 (OK=100    KO=1900  )
> min response time                                      3 (OK=3      KO=2     )
> mean response time                                    47 (OK=89     KO=38    )
> max response time                                    312 (OK=312    KO=178   )
> mean requests/sec                                  357.8 (OK=17.9   KO=339.9 )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          2000 (100%)
> t >= 800 ms                                            0 (  0%)
> failed                                              1900 ( 95%)
================================================================================

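The OK=100 KO=1900 line is the evidence that the atomic guard holds: Gatling counts each 409 as KO, so exactly 100 reservations succeeded against 100 units of stock.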

Custom parameters

# Run against a staging environment with 5,000 users fighting for 500 items
# (PowerShell: use .\gradlew and a trailing backtick instead of \)
./gradlew gatlingRun \
  -Dgatling.simulationClass=com.openreserve.simulation.BlackFridaySimulation \
  -Dgatling.baseUrl=http://staging-host:8080 \
  -Dgatling.users=5000 \
  -Dgatling.stockCount=500 \
  -Dgatling.rampSeconds=10

HTML Report

After the run, open the generated report:

build/reports/gatling/blackfridaysimulation-<timestamp>/index.html

The report contains a per-percentile latency breakdown, response time distribution charts, and the full request log.


Observability — Grafana Dashboard

With Docker Compose running, the dashboard is available at:

http://localhost:3000
Login: admin / admin

Grafana auto-provisions the Prometheus datasource and the OpenReserve — Load & Health dashboard on first boot. No manual setup required.

Dashboard Panels

Panel                     What it shows
HTTP Request Throughput   Total req/s and per-endpoint breakdown. Spikes to 400+ req/s during a Gatling run.
HTTP Response Code Split  200 (Reserved) vs 409 (Out-of-Stock) vs 5xx (errors; must stay at zero). The visual proof of the atomic lock.
API Latency — p95 & p99   Server-side response time percentiles from Micrometer histograms. Shows latency impact of DB contention under load.
R2DBC Connection Pool     Acquired vs idle vs pending connections. High "pending" means pool saturation; tune r2dbc.pool.max-size.

Prometheus metrics endpoint

curl http://localhost:8080/actuator/prometheus | grep http_server_requests
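
The 200-versus-409 split can also be read straight from the scraped text. The sketch below assumes Micrometer's default Prometheus naming, where the request timer is exported as http_server_requests_seconds_count with a status label; countByStatus is a hypothetical helper, not part of the project.

```kotlin
// Sums http_server_requests_seconds_count samples by their status label.
fun countByStatus(metricsText: String): Map<String, Double> {
    val statusLabel = Regex("status=\"(\\d{3})\"")
    val totals = sortedMapOf<String, Double>()
    for (line in metricsText.lineSequence()) {
        if (!line.startsWith("http_server_requests_seconds_count")) continue
        val status = statusLabel.find(line)?.groupValues?.get(1) ?: continue
        // The sample value is the last space-separated field on the line.
        val value = line.substringAfterLast(' ').toDoubleOrNull() ?: continue
        totals[status] = (totals[status] ?: 0.0) + value
    }
    return totals
}
```

After a Black Friday run, the totals for status 200 and 409 on the reserve endpoint should match the OK and KO counts Gatling reports, provided no other traffic hit the service.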

CLI Usage

The CLI is a standalone binary (~100ms startup, no Spring context).

Check resource availability

# Unix
./openreserve stock check --resource hotel-room-101

# Windows
openreserve stock check --resource hotel-room-101

Output:

Resource : hotel-room-101
─────────────────────────────────────
Available : 35 / 50  (70% remaining)
Version   : 3
Updated   : 2026-03-31T10:42:11Z
─────────────────────────────────────

Bulk import from CSV

./openreserve stock import --file cli/sample-resources.csv

CSV format (cli/sample-resources.csv):

# resource_id,total_stock
hotel-room-101,50
concert-seat-floor-A1,1
warehouse-sku-XYZ-001,500
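
A parser for this format might look like the sketch below. It is not the CLI's actual implementation: ResourceRow and parseResourceCsv are hypothetical names, and treating lines that start with # as comments is an assumption based on the first line of the sample file.

```kotlin
data class ResourceRow(val resourceId: String, val totalStock: Int)

// Parses resource_id,total_stock lines; '#' lines and blank lines are skipped.
fun parseResourceCsv(text: String): List<ResourceRow> =
    text.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("#") }
        .map { line ->
            val parts = line.split(",")
            require(parts.size == 2) { "expected resource_id,total_stock but got: $line" }
            val stock = parts[1].trim().toIntOrNull()
                ?: throw IllegalArgumentException("total_stock is not a number in: $line")
            require(stock >= 0) { "total_stock must not be negative in: $line" }
            ResourceRow(parts[0].trim(), stock)
        }
        .toList()
```

Rejecting malformed rows up front keeps a bad file from creating a partial import.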

Point at a non-local server

OPENRESERVE_SERVER_URL=http://staging:8080 ./openreserve stock check -r ps5_sku_123

Webhooks

Configure an outbound webhook to receive notifications when inventory is depleted or restored:

# application.yml
openreserve:
  webhooks:
    enabled: true
    url: "https://your-service.example.com/inventory-events"

Payload shape:

{
  "type": "STOCK_DEPLETED",
  "resourceId": "ps5_sku_123",
  "value": 0,
  "timestamp": "2026-03-31T10:47:11Z"
}

Event types: STOCK_DEPLETED (available stock hit zero), STOCK_RESTORED (hold expired or released).
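
On the receiving side, the payload maps naturally onto a small data class. The sketch below also guesses at the emission rule from the one-line descriptions above: InventoryEvent and eventFor are hypothetical names, and the reading of value as the available stock after the change is an assumption, since the README only shows it as 0 for STOCK_DEPLETED.

```kotlin
enum class InventoryEventType { STOCK_DEPLETED, STOCK_RESTORED }

// Field names follow the payload shape above; `value` is assumed to be available stock.
data class InventoryEvent(
    val type: InventoryEventType,
    val resourceId: String,
    val value: Int,
    val timestamp: String,
)

// Hypothetical rule: which event, if any, a change in available stock would emit.
fun eventFor(resourceId: String, before: Int, after: Int, timestamp: String): InventoryEvent? = when {
    before > 0 && after == 0 ->
        InventoryEvent(InventoryEventType.STOCK_DEPLETED, resourceId, after, timestamp)
    after > before ->
        InventoryEvent(InventoryEventType.STOCK_RESTORED, resourceId, after, timestamp)
    else -> null // an ordinary decrement that leaves stock above zero emits nothing
}
```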


Design Decisions & Trade-offs

Why R2DBC over JPA/Hibernate?

It keeps the I/O stack non-blocking from HTTP request to DB response. Under a 2,000-user Gatling run, a thread-per-request server needs one thread for every in-flight request, and requests beyond the pool size (200 worker threads by default in Spring Boot's embedded Tomcat) would queue. WebFlux + R2DBC lets a small number of event-loop threads serve thousands of concurrent requests.

Why not optimistic locking (@Version) for the reserve operation?

Optimistic locking throws OptimisticLockingFailureException on a version conflict and requires retry logic in the application layer. Under high contention (2,000 simultaneous reservations) most attempts would conflict and retry at once, producing a retry storm. The atomic WHERE available_stock >= qty guard is a single statement per request: contended requests queue briefly on the row lock, and each one either succeeds or gets a 409 without retrying.
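
The retry cost of the optimistic approach can be seen in a rough in-memory analogy. Nothing here is the project's code: VersionedStock and simulateOptimistic are hypothetical, and an AtomicReference compare-and-set stands in for the version check that an @Version column would perform.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.atomic.AtomicReference

// Versioned snapshot of a row, as an @Version column would provide (illustrative).
data class VersionedStock(val available: Int, val version: Long)

// Returns (granted, conflicts): conflicts counts failed version checks that forced a retry.
fun simulateOptimistic(stock: Int, users: Int): Pair<Int, Int> {
    val row = AtomicReference(VersionedStock(stock, 0))
    val granted = AtomicInteger()
    val conflicts = AtomicInteger()
    val pool = Executors.newFixedThreadPool(16)
    repeat(users) {
        pool.execute {
            while (true) {
                val seen = row.get()                  // read the row and its version
                if (seen.available < 1) break         // out of stock: give up
                val next = VersionedStock(seen.available - 1, seen.version + 1)
                if (row.compareAndSet(seen, next)) { granted.incrementAndGet(); break }
                conflicts.incrementAndGet()           // someone else won: retry
            }
        }
    }
    pool.shutdown()
    check(pool.awaitTermination(30, TimeUnit.SECONDS)) { "simulation timed out" }
    return granted.get() to conflicts.get()
}
```

The granted count still equals the stock, so correctness is not the issue. The conflicts counter varies from run to run and machine to machine; against a real database each conflict would be a wasted round trip plus a retry.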

Why is Redis idempotency key storage done after the DB commit?

To avoid needing a distributed transaction across Postgres and Redis. If Redis were written first and the DB write then failed, the key would be cached with no reservation behind it. Writing the key only after the DB succeeds means a rare crash between the two writes lets the next identical request create a second reservation. The first hold never reaches the client, so it is not committed and the reconciliation scheduler closes it after its expires_at, which makes this the safer failure mode. Idempotency is therefore best-effort, not strict.
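
The ordering and its failure mode can be sketched with in-memory stand-ins for the two stores. ReserveFlow, its fields and the crashBeforeCache flag are hypothetical and exist only to make the crash window visible.

```kotlin
// In-memory stand-ins for Postgres and Redis (hypothetical, for illustration only).
class ReserveFlow {
    val reservations = mutableListOf<String>()            // "database" rows, by token
    val idempotencyCache = mutableMapOf<String, String>() // idempotency key -> token

    // crashBeforeCache simulates the process dying between the two writes.
    fun reserve(idempotencyKey: String, crashBeforeCache: Boolean = false): String {
        idempotencyCache[idempotencyKey]?.let { return it } // replay: return cached token
        val token = "token-${reservations.size + 1}"
        reservations += token                                // 1. durable DB write first
        if (crashBeforeCache) error("crashed after DB commit")
        idempotencyCache[idempotencyKey] = token             // 2. cache the key afterwards
        return token
    }
}
```

A normal retry with the same key returns the same token. A retry after the simulated crash creates a second reservation, leaving the first one orphaned until it expires.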

Why a reconciliation scheduler alongside Redis keyspace notifications?

Redis Pub/Sub is fire-and-forget, so the scheduler is the operational safety net. In production, a Prometheus alert on a rising openreserve_reconciled_holds_total indicates that the primary Redis expiry path is degraded.


License

MIT
