Problem Statement
When running on Kubernetes, the OpenShell Gateway currently runs as a single-replica StatefulSet. This blocks rolling deployments (upgrades cause a full outage for existing supervisor connections), prevents horizontal scaling under load, and means any Gateway pod failure interrupts all active sandbox connections until the pod restarts.
Proposed Design
Multi-replica support requires changes across four areas, delivered in sequence. SQLite remains the supported backend for single-replica deployments (development, small, or cost-sensitive installs); PostgreSQL is required for any multi-replica deployment.
Blockers to resolve
1. Storage — SQLite cannot support multiple writers
SQLite uses per-process file locking, and the StatefulSet's `ReadWriteOnce` PVC physically prevents two pods from mounting the same volume on different nodes. The persistence layer already supports PostgreSQL (`crates/openshell-server/src/persistence/postgres.rs`). Multi-replica deployments must use PostgreSQL; the Helm chart should document this and warn when `replicaCount > 1` is set with a SQLite `dbUrl`. SQLite deployments remain on the current StatefulSet + PVC path unchanged.
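As an illustration only, and not in place of the Helm chart warning described above, the sketch below shows how a hypothetical server-side guard could reject a SQLite `dbUrl` when more than one replica is configured. `GatewayConfig`, its fields, and `validate_multi_replica` are invented for this sketch and do not exist in the codebase.

```rust
/// Hypothetical startup guard (not existing code): refuse to run more than
/// one replica against a SQLite URL. Field and type names are illustrative;
/// the chart-level warning described above is the real deliverable.
struct GatewayConfig {
    db_url: String,
    replica_count: u32,
}

fn validate_multi_replica(cfg: &GatewayConfig) -> Result<(), String> {
    let is_sqlite = cfg.db_url.starts_with("sqlite:");
    if is_sqlite && cfg.replica_count > 1 {
        return Err(format!(
            "replicaCount = {} requires a PostgreSQL dbUrl; SQLite supports a single replica only",
            cfg.replica_count
        ));
    }
    Ok(())
}

fn main() {
    let cfg = GatewayConfig {
        db_url: "sqlite:///data/openshell.db".to_string(),
        replica_count: 2,
    };
    // Expected to be rejected: SQLite cannot back more than one replica.
    if let Err(reason) = validate_multi_replica(&cfg) {
        eprintln!("configuration rejected: {reason}");
    }
}
```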
2. Reconciliation — concurrent loops cause data races
`reconcile_loop()` and `watch_loop()` run independently on every replica (`compute/mod.rs:533–542`). With multiple replicas, this causes double-deletes and conflicting updates on shared sandbox records. Replicas need a coordination mechanism so that only the relevant replica reconciles a given sandbox. The design for this is tracked separately.
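The coordination design is tracked separately; purely as one possible shape, the sketch below uses a rendezvous-style hash so that every replica can decide locally, from the sandbox ID alone, which single replica reconciles that sandbox. All names and parameters here are hypothetical, and a production version would need a hash that is stable across replicas and releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical ownership check: a sandbox is reconciled only by the replica
/// that "wins" a rendezvous hash over (sandbox_id, replica_index). Replica
/// identity and count would have to come from the deployment (for example a
/// pod ordinal or a lease table); none of these names exist in the codebase.
/// DefaultHasher is used for brevity; a real version needs a hash function
/// that is stable across replicas and Rust versions.
fn owns_sandbox(sandbox_id: &str, replica_index: u32, replica_count: u32) -> bool {
    let score = |replica: u32| {
        let mut h = DefaultHasher::new();
        sandbox_id.hash(&mut h);
        replica.hash(&mut h);
        h.finish()
    };
    (0..replica_count).max_by_key(|r| score(*r)) == Some(replica_index)
}

fn main() {
    // Each replica evaluates ownership independently and skips sandboxes it
    // does not own, avoiding double-deletes and conflicting updates.
    for sandbox in ["sb-1", "sb-2", "sb-3"] {
        let verdict = if owns_sandbox(sandbox, 0, 3) { "reconciles" } else { "skips" };
        println!("replica 0 {verdict} {sandbox}");
    }
}
```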
3. Supervisor sessions — in-memory state is not shared
The `SupervisorSessionRegistry` (`supervisor_session.rs:70`) is per-replica and in-memory. A supervisor reconnecting to a different replica after a pod restart will fail to find its session, breaking the SSH relay. Replicas need to either own a stable subset of supervisors or share session state. The design for this is tracked separately.
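As a hedged sketch of the "share session state" option only: persisting a small session-location record through the shared database would let a replica that receives a reconnect discover which replica last held the session instead of failing the lookup. `SupervisorSessionRecord` and `SharedSessionIndex` are invented names, and the in-process map below stands in for what would really be a table behind the existing persistence layer.

```rust
use std::collections::HashMap;

/// Hypothetical record describing where a supervisor session currently lives.
/// Rows like this, kept in the shared database rather than the per-replica
/// in-memory registry, would let any replica answer a reconnect by looking up
/// which replica holds (or last held) the live connection.
#[derive(Debug, Clone)]
struct SupervisorSessionRecord {
    supervisor_id: String,
    owning_replica: String,
    connected_at_unix: u64,
}

/// Minimal stand-in for a shared index; a real version would be a database
/// table, not an in-process map.
#[derive(Default)]
struct SharedSessionIndex {
    sessions: HashMap<String, SupervisorSessionRecord>,
}

impl SharedSessionIndex {
    fn register(&mut self, rec: SupervisorSessionRecord) {
        self.sessions.insert(rec.supervisor_id.clone(), rec);
    }

    /// A replica receiving a reconnect can discover the previous owner
    /// instead of failing the session lookup outright.
    fn lookup(&self, supervisor_id: &str) -> Option<&SupervisorSessionRecord> {
        self.sessions.get(supervisor_id)
    }
}

fn main() {
    let mut index = SharedSessionIndex::default();
    index.register(SupervisorSessionRecord {
        supervisor_id: "sup-42".into(),
        owning_replica: "gateway-1".into(),
        connected_at_unix: 1_700_000_000,
    });
    // A different replica can now answer "where is sup-42?" after a reconnect.
    println!("{:?}", index.lookup("sup-42"));
}
```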
4. SSH connection limits — not globally enforced
Connection slots are tracked in per-replica `Mutex<HashMap>` fields (`ssh_tunnel.rs:38`), making per-token and per-sandbox limits "per replica" rather than global. Once replica ownership is established, these limits can be enforced locally; global enforcement will require the shared database.
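For reference, a minimal sketch of the per-replica accounting pattern described above, with hypothetical names (`ConnectionSlots`, `try_acquire`); it illustrates why a per-token cap enforced this way scales with the number of replicas until accounting moves to the shared database.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Illustrative sketch of per-replica slot accounting. Field and limit names
/// are hypothetical; the real fields live on ServerState in ssh_tunnel.rs.
struct ConnectionSlots {
    per_token: Mutex<HashMap<String, u32>>,
    max_per_token: u32,
}

impl ConnectionSlots {
    fn try_acquire(&self, token: &str) -> bool {
        let mut map = self.per_token.lock().unwrap();
        let count = map.entry(token.to_string()).or_insert(0);
        if *count >= self.max_per_token {
            return false; // limit reached on this replica only
        }
        *count += 1;
        true
    }
}

fn main() {
    // With two replicas each running this code, a token capped at 4
    // connections can in fact hold up to 8 cluster-wide, because each
    // replica counts only the connections it terminates itself.
    let slots = ConnectionSlots {
        per_token: Mutex::new(HashMap::new()),
        max_per_token: 4,
    };
    let granted = (0..6).filter(|_| slots.try_acquire("token-a")).count();
    println!("granted {granted} of 6 on this replica"); // prints 4
}
```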
Phased delivery
| Phase | Change | Unblocks |
|---|---|---|
| 1 | PostgreSQL backend + Deployment (replacing StatefulSet) | everything else |
| 2 | Replica ownership for reconciliation and session routing | safe rolling deploys |
| 3 | Persistent supervisor session state | transparent pod failure recovery |
| 4 | Shared connection limit accounting | correct global per-token limits |
Phases 1 and 2 are the minimum for safe rolling deployments. Phases 3 and 4 are required for full HA.
Alternatives Considered
ReadWriteMany PVC with SQLite over NFS: Avoids the PostgreSQL dependency, but SQLite over NFS is unreliable — WAL lock propagation is slow and network failures can corrupt the database. Not recommended.
Kubernetes Lease-based leader election for reconciliation: Solves the reconciliation race but ties multi-replica behavior to Kubernetes, breaking Docker and Podman deployments. A deployment-agnostic coordination mechanism is preferred.
Agent Investigation
- PostgreSQL persistence is fully implemented at `crates/openshell-server/src/persistence/postgres.rs` — no new persistence code is needed for Phase 1.
- `reconcile_loop()` and `watch_loop()` are spawned unconditionally per replica at `compute/mod.rs:533–542`. The `sync_lock` mutex at line 1015 is process-local only.
- `SupervisorSessionRegistry` at `supervisor_session.rs:70–94` is purely in-memory with no cross-replica sharing.
- SSH connection slots are tracked in two `Mutex<HashMap>` fields on `ServerState` at `ssh_tunnel.rs:38–64`.
- The StatefulSet PVC uses `accessModes: ["ReadWriteOnce"]` (`templates/statefulset.yaml:179`), physically preventing multi-node pod scheduling with SQLite.
Checklist