feat(actor-template): per-container securityContext #73
Davanum Srinivas (dims) wants to merge 1 commit
…text)

The substrate-side PR #73 (per-container `securityContext` on `ActorTemplate.spec.containers[]`, with both `capabilities.add` and `runAsUser` / `runAsGroup`) is the field that lets this driver's `synthesize_template` start emitting capability adds and a non-root supervisor start UID once it merges. Empty templates produce the same OCI bundle as before; opt-in per container.

Surface the PR in three places: the top-of-doc header in poc-intro (alongside #66 and #67), the §3 "Companion changes" component table, and the §9 "Where to next" item 8 that was previously an open TODO about capability plumbing.

Also tidy the embedded `~/notes/...` references in poc-intro: the local agent-substrate notes (kind-local-dev runbook, Shorewall recipe) moved from `~/notes/` to `~/notes/agent-substrate/` to mirror the existing `~/notes/openshell-on-substrate/` layout.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
Looking at #73 alongside the production-readiness thread already on #20:
Two layers are showing up, spec-default vs admission-policy, both with K8s precedent. One question for maintainers: is a convention forming for which hardening axes land at which layer, or is this contributor discretion right now? The first axis merged tends to anchor the shape for the rest. [🤖a4s1]
Where does the openshell-sandbox supervisor sit in the architecture? Is it duplicating some of the jobs ateom has, or is it more like runsc or a VMM? If the latter, we have plans to support multiple sandbox technologies; maybe it fits best there? Taahir Ahmed (@ahmedtd) curious for your thoughts
Michael Taufen (@mtaufen) openshell-sandbox is an in-container supervisor that runs as PID 1 of the workload container, inside the gVisor sandbox that ateom-gvisor + runsc already set up. From my understanding, it looks like this right now:

It's just a regular Linux process under runsc, not a peer of runsc or a duplicate of ateom. Its job is application-layer policy enforcement: OPA/rego rules over outbound syscalls and HTTP, Landlock filesystem allowlists, capability bounding, and drop_privileges from root to a non-root UID before execve-ing the actual workload. None of that overlaps with what ateom or runsc do.

This PR is needed because that supervisor has to start with a couple of capabilities (CAP_SETUID / CAP_SETGID for the privilege drop, etc.) and a configurable runAsUser, knobs that the driver currently can't request through ActorTemplate.

The multi-sandbox-technology direction is orthogonal: openshell-sandbox would still want this same securityContext regardless of whether the underlying sandbox is runsc, Kata, or anything else.
1f42a2a to 5eb794e
…unAs)
Add an opt-in `securityContext` block on `ActorTemplate.spec.containers[]`
carrying two K8s-shape sub-fields:

  - capabilities.add: []string of Linux caps to add on top of the
    default sandbox set (CAP_AUDIT_WRITE, CAP_KILL,
    CAP_NET_BIND_SERVICE)
  - runAsUser/runAsGroup: *int64, the UID/GID the container's
    entrypoint starts as

Empty templates produce the same OCI bundle as before. The pause
container is unaffected — it always runs as root with the default
sandbox cap set.

Plumbing:

  ActorTemplate.spec.containers[].securityContext
    → ateletpb.Container.security_context
    → atelet's prepareOCIDirectory (via prepareOCIBundles)
    → OCI process.capabilities.{Bounding,Effective,Inheritable,Permitted}
      and process.user.{uid,gid}
`resolveCapabilities` in cmd/atelet/oci.go normalises each entry to
its CAP_… form so templates may write either `NET_ADMIN` or
`CAP_NET_ADMIN`, and de-duplicates against the default set.
`RunAsUser` / `RunAsGroup` are bare `int64` on the wire but `*int64`
in the CRD. At the proto boundary "unset" and "0" both mean root, and
atelet's OCI bundle builder collapses them into the same Process.User
block. The CRD shape keeps `*int64` so K8s users can express the
usual "unset vs. explicit 0" distinction in YAML even though the
runtime ignores it.
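
The collapse at the proto boundary amounts to something like the sketch below. `wireID` is a hypothetical helper name, not a function from the PR; it only illustrates why "unset" and "explicit 0" are indistinguishable once they reach the wire.

```go
package main

import "fmt"

// wireID (hypothetical) converts the CRD's optional *int64 into the bare
// int64 carried on the wire. A nil pointer and an explicit 0 both yield 0,
// which the runtime treats as root.
func wireID(p *int64) int64 {
	if p == nil {
		return 0
	}
	return *p
}

func main() {
	zero, app := int64(0), int64(1000)
	fmt.Println(wireID(nil), wireID(&zero), wireID(&app)) // prints: 0 0 1000
}
```
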
The two halves are useful together: `Capabilities.Add` alone only
enables `setresuid` inside the running process (useful for
supervisors that drop privileges mid-startup), but the entry point
still runs as root until they do. `RunAsUser` is the field that
makes the container actually *start* at a non-root UID.
A gVisor compatibility spike confirmed runsc honours the OCI cap set
exactly: granting CAP_SETUID/CAP_SETGID unblocks `setresuid` inside
the actor, while `unshare(CLONE_NEWNET)` remains refused regardless
of caps (architectural refusal in the sentry, unrelated to capability
bits).
The motivating workload is NVIDIA OpenShell's `openshell-sandbox` —
an in-container policy supervisor that runs as PID 1 of the
workload's sub-container, needs CAP_NET_ADMIN/CAP_SETUID/CAP_SETGID
to configure user namespaces and prepare the workload's filesystem,
then drops privileges to a non-root UID before exec'ing the inner
workload. The field godoc and CRD description describe this in
generic terms; the test fixtures use `app` / `registry.example/app:test`
rather than naming the downstream consumer.
Tests cover `resolveCapabilities` normalisation/dedup/blank-skip and
a round-trip DeepCopy of `ContainerSecurityContext`.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
5eb794e to 45ad373
Add an opt-in `securityContext` block on `ActorTemplate.spec.containers[]`, plumbed through `ateletpb` to atelet's OCI bundle builder. Templates that omit it produce the same OCI bundle as before.

Two fields are exposed:

- `capabilities.add`: Linux capabilities to grant on top of the default sandbox set (`CAP_AUDIT_WRITE`, `CAP_KILL`, `CAP_NET_BIND_SERVICE`). Entries may be written with or without the `CAP_` prefix; case is normalised; duplicates collapse against the defaults.
- `runAsUser` / `runAsGroup`: the UID and GID to start the container process as. Unset preserves atelet's existing default of root.

The motivating workload is NVIDIA OpenShell's `openshell-sandbox` supervisor, which needs `CAP_NET_ADMIN`, `CAP_SETUID`, and `CAP_SETGID` to configure the actor's network and user namespaces, and a non-root start UID for the supervisor process itself. Capabilities alone are not enough: the entry point still runs as root until something drops privileges.

Test plan:

- `go vet` clean on touched packages
- `resolveCapabilities`: defaults, prefix normalisation, case folding, dedup, blank-entry skip
- `ContainerSecurityContext` DeepCopy round-trip with pointer-isolation assertions for `Capabilities.Add` and `RunAsUser`
- `cmd/ateapi/internal/controlapi` workflow tests pass with the new copy block in resume + suspend
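
For readers unfamiliar with pointer-isolation checks, the idea behind the DeepCopy item can be sketched like this. The struct shapes and the hand-written `DeepCopy` below are assumptions made for illustration; the real types and their generated copy code live in the PR and may differ.

```go
package main

import "fmt"

// Hypothetical shapes; field names follow the PR text, layout is assumed.
type Capabilities struct {
	Add []string
}

type ContainerSecurityContext struct {
	Capabilities *Capabilities
	RunAsUser    *int64
	RunAsGroup   *int64
}

// copyInt64 returns a pointer to a fresh copy of *p, or nil if p is nil.
func copyInt64(p *int64) *int64 {
	if p == nil {
		return nil
	}
	v := *p
	return &v
}

// DeepCopy is a hand-written sketch of what a generated deep copy does:
// every pointer and slice in the result has its own backing storage.
func (in *ContainerSecurityContext) DeepCopy() *ContainerSecurityContext {
	if in == nil {
		return nil
	}
	out := &ContainerSecurityContext{
		RunAsUser:  copyInt64(in.RunAsUser),
		RunAsGroup: copyInt64(in.RunAsGroup),
	}
	if in.Capabilities != nil {
		out.Capabilities = &Capabilities{Add: append([]string(nil), in.Capabilities.Add...)}
	}
	return out
}

func main() {
	uid := int64(1000)
	orig := &ContainerSecurityContext{
		Capabilities: &Capabilities{Add: []string{"NET_ADMIN"}},
		RunAsUser:    &uid,
	}
	cp := orig.DeepCopy()
	// Mutate the copy; the original must be unaffected.
	cp.Capabilities.Add[0] = "SETUID"
	*cp.RunAsUser = 0
	fmt.Println(orig.Capabilities.Add[0], *orig.RunAsUser) // prints: NET_ADMIN 1000
}
```
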