High-availability Kubernetes Support #1012

@drew

Description

Create an officially supported Helm chart for deploying OpenShell onto existing Kubernetes infrastructure. Ensure these Kubernetes deployments are suitable for enterprise use. Establish a plan for making OpenShell components enterprise-ready.

Overview

  • Helm Chart
    • Publish as OCI resource on ghcr.io
    • Provide Ingress to OpenShell Gateway via Kubernetes Gateway API
  • Control plane authentication and authorization (Keycloak to start)
  • High Availability Gateway
    • Supports horizontal scaling, rollouts, and client connection rebalancing
  • Configuring Sandboxes
    • Specify memory, CPU, and other pod spec fields.
    • Reduce the need for privileged security capabilities (e.g., no more CAP_NET_ADMIN).
    • The OpenShell supervisor is injected into the container via image volumes.
  • Observability (includes both gateway and sandbox health)
    • Log exporting
    • Metrics (Prometheus, already landed)
    • Dashboards (Grafana/similar)
  • Resiliency
    • Sandboxes are resilient to network disconnection between supervisor and openshell-server for Y period of time.
    • Clarify the supervisor's heartbeat to openshell-server, possibly adding jitter to avoid thundering-herd scenarios.
  • Upgrading OpenShell
    • Gateway data/schemas are migrated as necessary on upgrades
    • Sandbox containers can be rolled with a new version of the supervisor
    • Sandbox containers themselves can be updated
  • Unified gateway configuration file (see RFC 0002 - Gateway Configuration File)
  • Kubernetes Operator
  • A way to develop and test Kubernetes features locally
    • k3s with Skaffold, Tilt, or some other dev script
  • Test infrastructure
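
The Resiliency items above (heartbeat jitter and tolerance for a disconnection period) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 20% jitter fraction, and the grace-period parameter (standing in for the undefined "Y") are hypothetical and not part of OpenShell.

```python
import random
from typing import Optional


def next_heartbeat_delay(
    base_seconds: float,
    jitter_fraction: float = 0.2,  # hypothetical default, not an OpenShell setting
    rng: Optional[random.Random] = None,
) -> float:
    """Return the base interval shifted by uniform jitter of +/- jitter_fraction.

    Spreading heartbeats this way keeps many supervisors from contacting
    the server at the same instant after a restart or network blip.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    source = rng if rng is not None else random
    spread = base_seconds * jitter_fraction
    return base_seconds + source.uniform(-spread, spread)


def within_grace_period(last_contact: float, now: float, grace_seconds: float) -> bool:
    """True while a disconnected sandbox should keep running.

    grace_seconds stands in for the "Y period of time" that is still
    to be decided; timestamps are seconds on any monotonic clock.
    """
    return (now - last_contact) <= grace_seconds
```

With a 10-second base interval and the default fraction, each supervisor would sleep somewhere between 8 and 12 seconds between heartbeats.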

Milestones

M1 - MVP

  • Helm Chart
  • Developer loop
  • OpenShell server accepts a kubeconfig and is decoupled from k3s
  • Documentation with RBAC details for running inside an existing Kubernetes cluster
  • Parameterize e2e tests to point to any cluster
  • Unified gateway config
  • Initial documentation and deployment guidance available
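
One possible shape for pointing the e2e tests at any cluster is a small kubeconfig resolver, sketched below. The `OPENSHELL_E2E_KUBECONFIG` variable is a hypothetical name invented for this example; `KUBECONFIG` and `~/.kube/config` are the usual kubectl conventions. Taking only the first entry of a multi-path `KUBECONFIG` is a simplification, not how kubectl merges files.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_kubeconfig(env: Mapping[str, str], home: Path) -> Path:
    """Pick the kubeconfig the e2e suite should use.

    Order: hypothetical test-specific override, then the standard
    KUBECONFIG variable, then the default ~/.kube/config location.
    """
    override = env.get("OPENSHELL_E2E_KUBECONFIG")  # hypothetical variable name
    if override:
        return Path(override)

    standard = env.get("KUBECONFIG")
    if standard:
        # KUBECONFIG may list several files separated by os.pathsep;
        # this sketch only uses the first one.
        first = standard.split(os.pathsep)[0]
        if first:
            return Path(first)

    return home / ".kube" / "config"
```

A test harness could call `resolve_kubeconfig(os.environ, Path.home())` once at startup and pass the result to whatever Kubernetes client the suite uses.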

M2 - reliability

  • Test coverage on OpenShift and major Kubernetes distributions
  • Security context privileges are dropped
    • Implement runtime class configs: Kata and gVisor
  • Gateway can horizontally scale
  • Agent- and operator-friendly observability (gateway and sandbox)
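
To make the security-context and runtime-class items concrete, here is a rough sketch of a sandbox pod spec with privileges dropped. The field names (`securityContext`, `capabilities.drop`, `runtimeClassName`, `resources`) are standard Kubernetes pod spec fields; the function, the container name, and the runtime class names `kata` and `gvisor` are assumptions, since RuntimeClass names are whatever the cluster administrator defines.

```python
from typing import Any, Dict, Optional


def sandbox_pod_spec(
    image: str,
    cpu: str,
    memory: str,
    runtime_class: Optional[str] = None,  # e.g. "kata" or "gvisor" (assumed names)
) -> Dict[str, Any]:
    """Build a pod spec dict for a sandbox with all capabilities dropped."""
    container: Dict[str, Any] = {
        "name": "sandbox",  # hypothetical container name
        "image": image,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        },
        "securityContext": {
            "privileged": False,
            "allowPrivilegeEscalation": False,
            "runAsNonRoot": True,
            # Dropping ALL means CAP_NET_ADMIN is not available either.
            "capabilities": {"drop": ["ALL"]},
        },
    }
    spec: Dict[str, Any] = {"containers": [container]}
    if runtime_class:
        # Must match a RuntimeClass object that exists in the cluster.
        spec["runtimeClassName"] = runtime_class
    return spec
```

For example, `sandbox_pod_spec("example/sandbox:latest", "500m", "512Mi", runtime_class="gvisor")` yields a spec that could be embedded in a Pod manifest or a Helm values-driven template.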

M3 - operating OpenShell at scale

  • OpenShell Kubernetes Operator
  • Support upgrading/snapshotting sandboxes

    Status

    In progress
