
feat(k8s, helm): Enable running OpenShell Gateway with multiple replicas #1021

@TaylorMutch

Description


Problem Statement

When running on Kubernetes, the OpenShell Gateway currently runs as a single-replica StatefulSet. This blocks rolling deployments (upgrades cause a full outage for existing supervisor connections), prevents horizontal scaling under load, and means any Gateway pod failure interrupts all active sandbox connections until the pod restarts.

Proposed Design

Multi-replica support requires changes across four areas, delivered in sequence. SQLite remains the supported backend for single-replica deployments (dev, small, or cost-sensitive installs). PostgreSQL is required for multi-replica.

Blockers to resolve

1. Storage — SQLite cannot support multiple writers

SQLite uses per-process file locking, and the StatefulSet's ReadWriteOnce PVC physically prevents two pods from mounting the same volume on different nodes. The persistence layer already supports PostgreSQL (crates/openshell-server/src/persistence/postgres.rs). Multi-replica deployments must use PostgreSQL; the Helm chart should document this and warn when replicaCount > 1 is set with a SQLite dbUrl. SQLite deployments remain on the current StatefulSet + PVC path unchanged.
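One way to enforce this in the chart is to fail the render (or downgrade to a NOTES.txt warning) when the two settings conflict. A minimal sketch, assuming hypothetical values keys `replicaCount` and `gateway.dbUrl` rather than the chart's actual ones:

```yaml
{{- /* Sketch: refuse multi-replica with a SQLite dbUrl at template time. */}}
{{- if and (gt (int .Values.replicaCount) 1) (hasPrefix "sqlite:" .Values.gateway.dbUrl) }}
{{- fail "replicaCount > 1 requires a PostgreSQL dbUrl; SQLite supports only a single replica" }}
{{- end }}
```

Failing at `helm install`/`helm upgrade` time surfaces the misconfiguration before any pods are scheduled, which is usually preferable to a warning that is easy to miss.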

2. Reconciliation — concurrent loops cause data races

reconcile_loop() and watch_loop() run independently on every replica (compute/mod.rs:533–542). With multiple replicas this causes double-deletes and conflicting updates on shared sandbox records. Replicas need a coordination mechanism so only the relevant replica reconciles a given sandbox. The design for this is tracked separately.
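As an illustration only (the actual coordination design is tracked separately), one deployment-agnostic shape for replica ownership is stable hashing of the sandbox ID over the replica count, so every replica computes the same owner locally without any shared lock:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Returns true if the replica at `replica_index` (0-based, out of
/// `replica_count`) should reconcile the sandbox with this ID. All
/// replicas compute the same answer, so exactly one reconciles each
/// sandbox. Note: a real implementation would need a hash that is
/// stable across builds; DefaultHasher only guarantees stability
/// within one std version.
fn owns_sandbox(sandbox_id: &str, replica_index: u64, replica_count: u64) -> bool {
    let mut hasher = DefaultHasher::new();
    sandbox_id.hash(&mut hasher);
    hasher.finish() % replica_count == replica_index
}

fn main() {
    // With 3 replicas, exactly one replica owns a given sandbox.
    let owner_count = (0..3).filter(|&i| owns_sandbox("sbx-42", i, 3)).count();
    assert_eq!(owner_count, 1);
    println!("ok");
}
```

This avoids the Kubernetes Lease dependency discussed under Alternatives, at the cost of requiring each replica to know its own index and the total replica count (available from the StatefulSet/Deployment downward API or configuration).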

3. Supervisor sessions — in-memory state is not shared

The SupervisorSessionRegistry (supervisor_session.rs:70) is per-replica and in-memory. A supervisor reconnecting to a different replica after a pod restart will fail to find its session, breaking SSH relay. Replicas need to either own a stable subset of supervisors or share session state. The design for this is tracked separately.
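To make the failure mode concrete, here is a simplified sketch (hypothetical trait and type names, not the actual SupervisorSessionRegistry code) of why a per-replica in-memory registry breaks reconnection. Abstracting the registry behind a store trait is one possible seam for a shared, database-backed implementation later:

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over supervisor-session storage. The current
/// in-memory registry would be one implementation; a database-backed
/// store shared by all replicas would be another.
trait SessionStore {
    fn register(&mut self, supervisor_id: String, session: String);
    fn lookup(&self, supervisor_id: &str) -> Option<String>;
}

/// Per-replica store: all routing state is lost to other replicas.
struct InMemoryStore {
    sessions: HashMap<String, String>,
}

impl SessionStore for InMemoryStore {
    fn register(&mut self, supervisor_id: String, session: String) {
        self.sessions.insert(supervisor_id, session);
    }
    fn lookup(&self, supervisor_id: &str) -> Option<String> {
        self.sessions.get(supervisor_id).cloned()
    }
}

fn main() {
    // Replica A registers the session.
    let mut replica_a = InMemoryStore { sessions: HashMap::new() };
    replica_a.register("sup-1".into(), "session-xyz".into());
    assert_eq!(replica_a.lookup("sup-1").as_deref(), Some("session-xyz"));

    // The supervisor reconnects and lands on replica B, whose store is
    // empty — the lookup fails and the SSH relay breaks.
    let replica_b = InMemoryStore { sessions: HashMap::new() };
    assert_eq!(replica_b.lookup("sup-1"), None);
    println!("ok");
}
```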

4. SSH connection limits — not globally enforced

Connection slots are tracked in per-replica Mutex<HashMap> (ssh_tunnel.rs:38), making per-token and per-sandbox limits "per replica" rather than global. Once replica ownership is established these limits can be enforced locally; global enforcement will require the shared database.
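A simplified sketch of the per-replica accounting pattern (illustrative names, not the actual ssh_tunnel.rs code) shows why a configured limit of N becomes N × replica_count globally:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified per-replica slot tracker. Because the map lives only in
/// this process, other replicas never see these counts, so the limit
/// is enforced per replica rather than globally.
struct SlotTracker {
    per_token: Mutex<HashMap<String, u32>>,
    limit: u32,
}

impl SlotTracker {
    fn try_acquire(&self, token: &str) -> bool {
        let mut slots = self.per_token.lock().unwrap();
        let count = slots.entry(token.to_string()).or_insert(0);
        if *count >= self.limit {
            return false; // rejected locally; invisible to other replicas
        }
        *count += 1;
        true
    }
}

fn main() {
    let tracker = SlotTracker { per_token: Mutex::new(HashMap::new()), limit: 2 };
    assert!(tracker.try_acquire("tok-a"));
    assert!(tracker.try_acquire("tok-a"));
    assert!(!tracker.try_acquire("tok-a")); // third connection rejected locally
    println!("ok");
}
```

With replica ownership in place (Phase 2), each supervisor's connections land on one replica, so this local check becomes effectively global for that supervisor; truly global per-token accounting (Phase 4) would move the counts into the shared database.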

Phased delivery

| Phase | Change | Unblocks |
| --- | --- | --- |
| 1 | PostgreSQL backend + Deployment (replacing StatefulSet) | everything else |
| 2 | Replica ownership for reconciliation and session routing | safe rolling deploys |
| 3 | Persistent supervisor session state | transparent pod failure recovery |
| 4 | Shared connection limit accounting | correct global per-token limits |

Phases 1 and 2 are the minimum for safe rolling deployments. Phases 3 and 4 are required for full HA.

Alternatives Considered

ReadWriteMany PVC with SQLite over NFS: Avoids the PostgreSQL dependency but SQLite over NFS is unreliable — WAL lock propagation is slow and network failures can corrupt the database. Not recommended.

Kubernetes Lease-based leader election for reconciliation: Solves the reconciliation race but ties multi-replica behavior to Kubernetes, breaking Docker and Podman deployments. A deployment-agnostic coordination mechanism is preferred.

Agent Investigation

  • PostgreSQL persistence is fully implemented at crates/openshell-server/src/persistence/postgres.rs — no new persistence code is needed for Phase 1.
  • reconcile_loop() and watch_loop() are spawned unconditionally per replica at compute/mod.rs:533–542. The sync_lock mutex at line 1015 is process-local only.
  • SupervisorSessionRegistry at supervisor_session.rs:70–94 is purely in-memory with no cross-replica sharing.
  • SSH connection slots are tracked in two Mutex<HashMap> fields on ServerState at ssh_tunnel.rs:38–64.
  • The StatefulSet PVC uses accessModes: ["ReadWriteOnce"] (templates/statefulset.yaml:179), physically preventing multi-node pod scheduling with SQLite.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
