
feat(docs): Add documentation for OpenShell Kubernetes RBAC requirements #1018

@TaylorMutch


Problem Statement

There is no documentation for the Kubernetes RBAC permissions required by the OpenShell Gateway. Operators deploying to existing clusters — especially multi-tenant environments — cannot easily determine what access the Gateway service account needs without reading the Helm chart templates directly.

Proposed Design

Add a dedicated RBAC reference page to the docs (e.g., docs/deployment/kubernetes-rbac.mdx) that covers:

  1. What the Helm chart creates by default — ServiceAccount, Role, and RoleBinding installed by deploy/helm/openshell/templates/.

  2. Full permission table: document every rule the Gateway needs, including the cluster-scoped one that a namespaced Role cannot grant:

    | API group | Resources | Verbs | Scope | Purpose |
    |---|---|---|---|---|
    | agents.x-k8s.io | sandboxes, sandboxes/status | create, delete, get, list, patch, update, watch | Namespaced | Sandbox lifecycle management |
    | "" (core) | events | get, list, watch | Namespaced | Sandbox event observation |
    | "" (core) | nodes | get, list | Cluster | GPU capacity validation (required only when GPU sandboxes are enabled) |
  3. GPU deployments: note that GPU sandbox support requires an additional ClusterRole and ClusterRoleBinding granting nodes: [get, list], so the Gateway can read allocatable GPU resources on every node in the cluster. Non-GPU deployments do not need this permission.

  4. Bring-your-own ServiceAccount — how to disable the auto-created ServiceAccount (serviceAccount.create: false) and supply an existing one (serviceAccount.name).

  5. Multi-tenant guidance — how to scope the Gateway to a specific namespace and disable cluster-level resource access when GPU support is not needed.
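The permission set described in the table, and the way the GPU rule is gated, can be sketched as a small stdlib-only Rust model. This is a hypothetical illustration, not OpenShell code or a Kubernetes API type: `Rule`, `Scope`, `required_rules`, and `missing_permissions` are names invented here, and `gpu_enabled` stands in for whatever chart value ends up gating the node rule. It shows why the current namespaced Role covers non-GPU deployments but leaves the cluster-scoped nodes rule uncovered once GPU sandboxes are enabled.

```rust
use std::collections::BTreeSet;

/// Whether a rule fits in a namespaced Role or needs a ClusterRole.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Scope {
    Namespaced,
    Cluster,
}

/// Hypothetical model of one RBAC rule. An empty `api_group` is the core ("") group.
struct Rule {
    api_group: &'static str,
    resources: &'static [&'static str],
    verbs: &'static [&'static str],
    scope: Scope,
}

/// A single (scope, group, resource, verb) grant.
type Permission = (Scope, &'static str, &'static str, &'static str);

/// Rules from the permission table. `gpu_enabled` is a hypothetical stand-in
/// for the chart value that would gate GPU support.
fn required_rules(gpu_enabled: bool) -> Vec<Rule> {
    let mut rules = vec![
        Rule {
            api_group: "agents.x-k8s.io",
            resources: &["sandboxes", "sandboxes/status"],
            verbs: &["create", "delete", "get", "list", "patch", "update", "watch"],
            scope: Scope::Namespaced,
        },
        Rule {
            api_group: "",
            resources: &["events"],
            verbs: &["get", "list", "watch"],
            scope: Scope::Namespaced,
        },
    ];
    if gpu_enabled {
        // Nodes are cluster-scoped, so only a ClusterRole can grant this.
        rules.push(Rule {
            api_group: "",
            resources: &["nodes"],
            verbs: &["get", "list"],
            scope: Scope::Cluster,
        });
    }
    rules
}

/// Flattens rules into individual permissions. Wildcards ("*") are not modelled.
fn expand(rules: &[Rule]) -> BTreeSet<Permission> {
    let mut out = BTreeSet::new();
    for rule in rules {
        for &resource in rule.resources {
            for &verb in rule.verbs {
                out.insert((rule.scope, rule.api_group, resource, verb));
            }
        }
    }
    out
}

/// Permissions in `required` that `granted` does not cover.
fn missing_permissions(required: &[Rule], granted: &[Rule]) -> BTreeSet<Permission> {
    expand(required).difference(&expand(granted)).copied().collect()
}

fn main() {
    // Treat the chart's current Role as the non-GPU rule set.
    let chart_role = required_rules(false);
    for (scope, group, resource, verb) in missing_permissions(&required_rules(true), &chart_role) {
        println!("missing: {:?} group={:?} {} {}", scope, group, resource, verb);
    }
}
```

Running `main` reports nodes get and nodes list (cluster scope) as the only gaps, which matches the missing ClusterRole noted in the investigation below. Keeping scope in each permission tuple means a namespaced grant on nodes would still not count as covering the cluster-wide requirement.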

Alternatives Considered

Embedding RBAC details in the main deployment guide rather than a dedicated page would be more concise, but a standalone reference page is easier to link to from error messages or support docs and allows the table to expand without cluttering the deployment walkthrough.

Agent Investigation

  • The Helm Role at deploy/helm/openshell/templates/role.yaml covers agents.x-k8s.io/sandboxes and core events — both namespaced.
  • The driver calls Api::<Node>::all() at crates/openshell-driver-kubernetes/src/driver.rs:184–192 exclusively for GPU capacity checks. This is only exercised when a user requests a GPU sandbox (validate_sandbox_create, driver.rs:195–207). It requires nodes: [get, list] at cluster scope, which the current Helm chart does not include. The Helm chart likely needs a conditional ClusterRole/ClusterRoleBinding gated on a gpu.enabled or similar value.
  • The TLS secret (client_tls_secret_name) is referenced only in pod volume specs — the Gateway does not read it directly via the Secrets API, so no secrets: get permission is needed.
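The GPU capacity check described in the investigation can be illustrated with a simplified, stdlib-only Rust sketch. It neither uses kube-rs nor reproduces driver.rs: `NodeInfo`, `validate_gpu_request`, and the `node` helper are hypothetical, and keying on `nvidia.com/gpu` (the extended resource name advertised by NVIDIA's device plugin) is an assumption about what OpenShell checks. The point it shows is that the check has to read allocatable resources on every node, which is what forces cluster-scoped nodes: [get, list], while a request for zero GPUs returns early and never touches nodes.

```rust
use std::collections::HashMap;

/// Extended resource name advertised by NVIDIA's device plugin.
/// Assumption: OpenShell may key on a different name.
const GPU_RESOURCE: &str = "nvidia.com/gpu";

/// Hypothetical stand-in for a Kubernetes Node: only the name and
/// `status.allocatable`, with quantities already parsed to integers.
struct NodeInfo {
    name: String,
    allocatable: HashMap<String, u64>,
}

/// Hypothetical check. Returns Ok(None) when no GPUs are requested (no node
/// access needed), Ok(Some(name)) for the first node with enough allocatable
/// GPUs, or an error if no single node can satisfy the request.
fn validate_gpu_request(nodes: &[NodeInfo], requested: u64) -> Result<Option<&str>, String> {
    if requested == 0 {
        return Ok(None);
    }
    // A pod's GPUs come from one node, so compare per node instead of summing.
    nodes
        .iter()
        .find(|n| n.allocatable.get(GPU_RESOURCE).copied().unwrap_or(0) >= requested)
        .map(|n| Some(n.name.as_str()))
        .ok_or_else(|| format!("no node has {} allocatable {}", requested, GPU_RESOURCE))
}

/// Builds a node with the given GPU count (illustration only).
fn node(name: &str, gpus: u64) -> NodeInfo {
    let mut allocatable = HashMap::new();
    if gpus > 0 {
        allocatable.insert(GPU_RESOURCE.to_string(), gpus);
    }
    NodeInfo { name: name.to_string(), allocatable }
}

fn main() {
    // In the real driver this list would come from a cluster-wide node listing.
    let nodes = vec![node("cpu-only", 0), node("gpu-a", 2), node("gpu-b", 4)];
    for requested in [0, 1, 3, 8] {
        println!("{} GPU(s): {:?}", requested, validate_gpu_request(&nodes, requested));
    }
}
```

Checking per node rather than summing across the cluster is a choice made for this sketch; the actual driver logic may differ.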

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
