feat(actor-template): per-container securityContext by dims · Pull Request #73 · agent-substrate/substrate

Davanum Srinivas (dims) · 2026-05-24T00:18:26Z

Add an opt-in securityContext block on ActorTemplate.spec.containers[], plumbed through ateletpb to atelet's OCI bundle builder. Templates that omit it produce the same OCI bundle as before.

Two fields are exposed:

capabilities.add — Linux capabilities to grant on top of the default sandbox set (CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE). Entries may be written with or without the CAP_ prefix; case is normalised; duplicates collapse against the defaults.
runAsUser / runAsGroup — the UID and GID to start the container process as. Unset preserves atelet's existing default of root.
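The normalisation rules for capabilities.add can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in cmd/atelet/oci.go: the function body, the `defaultCaps` variable, and the output ordering (defaults first, then additions in template order) are all hypothetical; only the default set and the rules (optional CAP_ prefix, case folding, dedup, blank-entry skip) come from this PR's description.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCaps is the default sandbox set named in the PR description.
// The variable name is an assumption made for this sketch.
var defaultCaps = []string{"CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE"}

// resolveCapabilities is a hypothetical re-implementation of the rules the PR
// describes: upper-case each entry, add the CAP_ prefix when missing, skip
// blank entries, and drop anything already present (including the defaults).
func resolveCapabilities(add []string) []string {
	seen := make(map[string]bool, len(defaultCaps)+len(add))
	out := make([]string, 0, len(defaultCaps)+len(add))
	for _, c := range defaultCaps {
		seen[c] = true
		out = append(out, c)
	}
	for _, raw := range add {
		c := strings.ToUpper(strings.TrimSpace(raw))
		if c == "" {
			continue // blank entries are skipped
		}
		if !strings.HasPrefix(c, "CAP_") {
			c = "CAP_" + c
		}
		if seen[c] {
			continue // duplicate of a default or of an earlier entry
		}
		seen[c] = true
		out = append(out, c)
	}
	return out
}

func main() {
	in := []string{"net_admin", "CAP_SETUID", "  ", "kill", "SETGID", "cap_setuid"}
	fmt.Println(resolveCapabilities(in))
	// Prints: [CAP_AUDIT_WRITE CAP_KILL CAP_NET_BIND_SERVICE CAP_NET_ADMIN CAP_SETUID CAP_SETGID]
}
```

With no additions the sketch returns only the three defaults, which matches the claim that templates omitting the block are unchanged.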

The motivating workload is NVIDIA OpenShell's openshell-sandbox supervisor, which needs CAP_NET_ADMIN, CAP_SETUID, and CAP_SETGID to configure the actor's network and user namespaces, and a non-root start UID for the supervisor process itself. Capabilities alone are not enough: the entry point still runs as root until something drops privileges.

Test plan:

go vet clean on touched packages
Unit tests for resolveCapabilities: defaults, prefix normalisation, case folding, dedup, blank-entry skip
ContainerSecurityContext DeepCopy round-trip with pointer-isolation assertions for Capabilities.Add and RunAsUser
cmd/ateapi/internal/controlapi workflow tests pass with the new copy block in resume + suspend
kind end-to-end with a template that opts into both fields
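The pointer-isolation assertion in the DeepCopy test can be illustrated with a small sketch. The struct layout below is an assumption inferred from the field names in this PR (`Capabilities.Add`, `RunAsUser`, `RunAsGroup`); the real CRD types and the generated DeepCopy may differ.

```go
package main

import "fmt"

// Capabilities and ContainerSecurityContext are hypothetical shapes; whether
// Capabilities is a pointer or a value in the real CRD is not stated in the PR.
type Capabilities struct {
	Add []string
}

type ContainerSecurityContext struct {
	Capabilities *Capabilities
	RunAsUser    *int64
	RunAsGroup   *int64
}

// copyInt64 returns a pointer to a fresh copy of *p, or nil when p is nil.
func copyInt64(p *int64) *int64 {
	if p == nil {
		return nil
	}
	v := *p
	return &v
}

// DeepCopy duplicates every pointer and slice so that the copy shares no
// memory with the original.
func (in *ContainerSecurityContext) DeepCopy() *ContainerSecurityContext {
	if in == nil {
		return nil
	}
	out := &ContainerSecurityContext{
		RunAsUser:  copyInt64(in.RunAsUser),
		RunAsGroup: copyInt64(in.RunAsGroup),
	}
	if in.Capabilities != nil {
		out.Capabilities = &Capabilities{}
		if in.Capabilities.Add != nil {
			out.Capabilities.Add = make([]string, len(in.Capabilities.Add))
			copy(out.Capabilities.Add, in.Capabilities.Add)
		}
	}
	return out
}

func main() {
	uid := int64(1000)
	orig := &ContainerSecurityContext{
		Capabilities: &Capabilities{Add: []string{"NET_ADMIN"}},
		RunAsUser:    &uid,
	}
	cp := orig.DeepCopy()
	// Mutating the copy must not be visible through the original.
	cp.Capabilities.Add[0] = "SETUID"
	*cp.RunAsUser = 0
	fmt.Println(orig.Capabilities.Add[0], *orig.RunAsUser) // prints: NET_ADMIN 1000
}
```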

…text)

The substrate-side PR #73 (per-container `securityContext` on `ActorTemplate.spec.containers[]` with both `capabilities.add` and `runAsUser` / `runAsGroup`) is the field that lets this driver's `synthesize_template` start emitting capability adds and a non-root supervisor start UID once it merges. Empty templates produce the same OCI bundle as before; opt-in per container.

Surface the PR in three places: the top-of-doc header in poc-intro (alongside #66 and #67), the §3 "Companion changes" component table, and the §9 "Where to next" item 8 that was previously an open TODO about capability plumbing.

Also tidy the embedded `~/notes/...` references in poc-intro: the local agent-substrate notes (kind-local-dev runbook, Shorewall recipe) moved from `~/notes/` to `~/notes/agent-substrate/` to mirror the existing `~/notes/openshell-on-substrate/` layout.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>

a4-a4s1 · 2026-05-27T06:01:38Z

Looking at #73 alongside the production-readiness thread already on #20:

#73 (this PR, "feat(actor-template): per-container securityContext"): per-container securityContext; spec-default, workload author opts in.
#20 ("Support actortemplate secret env"): per-container env/secret resolution; spec-default, workload author opts in.
#51 ("ActorTemplate: Enforce that all images must be pinned", closes #10 "Prevent user from creating ActorTemplates with un-pinned images"): image-pinning via ValidatingAdmissionPolicy; operator-owned, cluster-wide.

Two layers showing up — spec-default vs admission-policy — both with K8s precedent.

One Q for maintainers: is a convention forming for which hardening axes land at which layer, or is this contributor-discretion right now? The first axis merged tends to anchor the shape for the rest.

[🤖a4s1]

Michael Taufen (mtaufen) · 2026-05-27T18:13:31Z

Where does the openshell-sandbox supervisor sit in the architecture? Is it duplicating some of the jobs ateom has, or is it more like runsc or a VMM? If the latter, we have plans to support multiple sandbox technologies, so maybe it fits best there? Taahir Ahmed (@ahmedtd), curious for your thoughts.

Davanum Srinivas (dims) · 2026-05-27T21:39:18Z

Michael Taufen (@mtaufen) openshell-sandbox is an in-container supervisor that runs as PID 1 of the workload container, inside the gVisor sandbox that ateom-gvisor + runsc already set up. From my understanding, it looks like this right now:

ateom-gvisor → runsc (single gVisor sandbox kernel)
                ├── pause container        (pid ns A, mount ns A, ...)
                │   └── PID 1: /pause
                └── supervisor container   (pid ns B, mount ns B, ...)
                    └── PID 1: openshell-sandbox  ← here
                        └── PID 2: workload (e.g. python3 agent.py)

It's just a regular Linux process under runsc, not a peer of runsc or a duplicate of ateom. Its job is application-layer policy enforcement: OPA/Rego rules over outbound syscalls and HTTP, Landlock filesystem allowlists, capability bounding, and drop_privileges from root to a non-root UID before execve-ing the actual workload. None of that overlaps with what ateom or runsc do.

This PR is needed because that supervisor has to start with a couple of capabilities (CAP_SETUID / CAP_SETGID for the privilege drop, etc.) and a configurable runAsUser — knobs that the driver currently can't request through ActorTemplate. The multi-sandbox-technology direction is orthogonal: openshell-sandbox would still want this same securityContext regardless of whether the underlying sandbox is runsc, Kata, or anything else.

…unAs)

Add an opt-in `securityContext` block on `ActorTemplate.spec.containers[]` carrying two K8s-shape sub-fields:

- capabilities.add: []string of Linux caps to add on top of the default sandbox set (CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE)
- runAsUser/runAsGroup: *int64, the UID/GID the container's entrypoint starts as

Empty templates produce the same OCI bundle as before. The pause container is unaffected: it always runs as root with the default sandbox cap set.

Plumbing:

ActorTemplate.spec.containers[].securityContext
  → ateletpb.Container.security_context
  → atelet's prepareOCIDirectory (via prepareOCIBundles)
  → OCI process.capabilities.{Bounding,Effective,Inheritable,Permitted} and process.user.{uid,gid}

`resolveCapabilities` in cmd/atelet/oci.go normalises each entry to its CAP_… form so templates may write either `NET_ADMIN` or `CAP_NET_ADMIN`, and de-duplicates against the default set.

`RunAsUser` / `RunAsGroup` are bare `int64` on the wire but `*int64` in the CRD. At the proto boundary "unset" and "0" both mean root, and atelet's OCI bundle builder collapses them into the same Process.User block. The CRD shape keeps `*int64` so K8s users can express the usual "unset vs. explicit 0" distinction in YAML even though the runtime ignores it.

The two halves are useful together: `Capabilities.Add` alone only enables `setresuid` inside the running process (useful for supervisors that drop privileges mid-startup), but the entry point still runs as root until they do. `RunAsUser` is the field that makes the container actually *start* at a non-root UID.

A gVisor compatibility spike confirmed runsc honours the OCI cap set exactly: granting CAP_SETUID/CAP_SETGID unblocks `setresuid` inside the actor, while `unshare(CLONE_NEWNET)` remains refused regardless of caps (architectural refusal in the sentry, unrelated to capability bits).

The motivating workload is NVIDIA OpenShell's `openshell-sandbox`, an in-container policy supervisor that runs as PID 1 of the workload's sub-container, needs CAP_NET_ADMIN/CAP_SETUID/CAP_SETGID to configure user namespaces and prepare the workload's filesystem, then drops privileges to a non-root UID before exec'ing the inner workload. The field godoc and CRD description describe this in generic terms; the test fixtures use `app` / `registry.example/app:test` rather than naming the downstream consumer.

Tests cover `resolveCapabilities` normalisation/dedup/blank-skip and a round-trip DeepCopy of `ContainerSecurityContext`.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
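The last hop of the plumbing, from the wire-level security context to the OCI process block, might look roughly like the sketch below. Everything named here (`wireSecurityContext`, `derefOrRoot`, `buildProcess`, and the local `oci*` structs) is hypothetical and stands in for atelet's real types. Only the JSON keys follow the OCI runtime spec's `process.user` and `process.capabilities` objects, and the "unset and 0 both mean root" behaviour follows the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireSecurityContext mirrors the assumed proto shape: bare int64 fields, so
// "unset" and "explicit 0" are indistinguishable once on the wire.
type wireSecurityContext struct {
	RunAsUser  int64
	RunAsGroup int64
}

// Local stand-ins for the OCI runtime-spec process objects.
type ociUser struct {
	UID uint32 `json:"uid"`
	GID uint32 `json:"gid"`
}

type ociCapabilities struct {
	Bounding    []string `json:"bounding"`
	Effective   []string `json:"effective"`
	Inheritable []string `json:"inheritable"`
	Permitted   []string `json:"permitted"`
}

type ociProcess struct {
	User         ociUser         `json:"user"`
	Capabilities ociCapabilities `json:"capabilities"`
}

// derefOrRoot converts the CRD's *int64 to the wire's bare int64.
// A nil pointer becomes 0, which the runtime treats as root.
func derefOrRoot(p *int64) int64 {
	if p == nil {
		return 0
	}
	return *p
}

// buildProcess writes the same resolved capability list into all four sets
// and fills process.user. A nil context leaves the user at uid 0 / gid 0.
// Range validation of the IDs is omitted in this sketch.
func buildProcess(sc *wireSecurityContext, caps []string) ociProcess {
	p := ociProcess{Capabilities: ociCapabilities{
		Bounding: caps, Effective: caps, Inheritable: caps, Permitted: caps,
	}}
	if sc != nil {
		p.User = ociUser{UID: uint32(sc.RunAsUser), GID: uint32(sc.RunAsGroup)}
	}
	return p
}

func main() {
	uid := int64(1000)
	sc := &wireSecurityContext{RunAsUser: derefOrRoot(&uid), RunAsGroup: derefOrRoot(nil)}
	caps := []string{"CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE", "CAP_SETUID"}
	b, err := json.MarshalIndent(buildProcess(sc, caps), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

In this sketch an omitted runAsGroup and an explicit 0 both end up as gid 0, which is the collapse the commit message describes at the proto boundary.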

Davanum Srinivas (dims) changed the title from "[WIP] Feat/actor template capabilities" to "feat(actor-template): per-container securityContext" May 24, 2026

Taahir Ahmed (ahmedtd) self-requested a review May 26, 2026 23:23

Taahir Ahmed (ahmedtd) self-assigned this May 26, 2026

Benjamin Elder (BenTheElder) added the "feature" label (An enhancement / feature request or implementation) May 27, 2026

Davanum Srinivas (dims) force-pushed the feat/actor-template-capabilities branch from 1f42a2a to 5eb794e May 27, 2026 21:57

Davanum Srinivas (dims) force-pushed the feat/actor-template-capabilities branch from 5eb794e to 45ad373 May 27, 2026 22:20

feat(actor-template): per-container securityContext #73
Davanum Srinivas (dims) wants to merge 1 commit into agent-substrate:main from dims:feat/actor-template-capabilities
4 participants