feat: rehydrate sandbox registry from Docker labels on startup #1

Merged
jonasnobile merged 2 commits into main from feat/rehydrate-sandboxes-on-reset on Apr 24, 2026

@jonasnobile

Summary

  • Rehydration on startup (feat): edvabe serve now enumerates every edvabe.managed=true container and rebuilds its in-memory sandbox registry from Docker-label + container-state metadata. Paused sandboxes survive edvabe restarts; SDK reconnect(sandboxId) works across them.
  • Paused-state guard for SetTimeout (fix): same bug class as 88f0245 (Connect). sandbox.setTimeout() on a paused sandbox past its original TTL no longer returns 410.
  • Label schema: sandbox-level identity is persisted as immutable Docker labels at Create — edvabe.sandbox.template.id, edvabe.sandbox.template.alias, edvabe.sandbox.token.envd, edvabe.sandbox.token.traffic, edvabe.sandbox.ontimeout. Mutable fields (ExpiresAt, PausedAt) are reset to conservative defaults on rehydration.
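
The label schema above can be sketched as a round-trip between a sandbox identity and a Docker label map. This is a minimal illustration, not the code from this PR: the label keys are the ones listed above, but the `Identity` struct, its field names, `ToLabels`, `IdentityFromLabels`, and the string encoding of the on-timeout value are all assumptions.

```go
package main

import "fmt"

// Label keys as listed in the PR summary.
const (
	LabelManaged       = "edvabe.managed"
	LabelTemplateID    = "edvabe.sandbox.template.id"
	LabelTemplateAlias = "edvabe.sandbox.template.alias"
	LabelTokenEnvd     = "edvabe.sandbox.token.envd"
	LabelTokenTraffic  = "edvabe.sandbox.token.traffic"
	LabelOnTimeout     = "edvabe.sandbox.ontimeout"
)

// Identity is a hypothetical struct for the immutable, sandbox-level fields.
type Identity struct {
	TemplateID    string
	TemplateAlias string
	EnvdToken     string
	TrafficToken  string
	OnTimeout     string // encoding is assumed; the PR does not spell it out
}

// ToLabels builds the label set to attach once, at Create.
func (id Identity) ToLabels() map[string]string {
	return map[string]string{
		LabelManaged:       "true",
		LabelTemplateID:    id.TemplateID,
		LabelTemplateAlias: id.TemplateAlias,
		LabelTokenEnvd:     id.EnvdToken,
		LabelTokenTraffic:  id.TrafficToken,
		LabelOnTimeout:     id.OnTimeout,
	}
}

// IdentityFromLabels reads the identity back. Missing keys yield empty
// fields, which is what a pre-upgrade container would produce.
func IdentityFromLabels(labels map[string]string) Identity {
	return Identity{
		TemplateID:    labels[LabelTemplateID],
		TemplateAlias: labels[LabelTemplateAlias],
		EnvdToken:     labels[LabelTokenEnvd],
		TrafficToken:  labels[LabelTokenTraffic],
		OnTimeout:     labels[LabelOnTimeout],
	}
}

func main() {
	id := Identity{TemplateID: "tpl-1", TemplateAlias: "base", EnvdToken: "e", TrafficToken: "t", OnTimeout: "pause"}
	back := IdentityFromLabels(id.ToLabels())
	fmt.Println("round-trip preserved identity:", back == id)
}
```

Because the labels are written once and never updated, only fields that cannot change over a sandbox's lifetime fit this scheme, which is why ExpiresAt and PausedAt are handled separately.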

Pre-upgrade caveat

Containers created before this change lack the new labels. They still rehydrate with State, PauseMode, resource limits, metadata, and TemplateID (via an edvabe.template.id image-label fallback), and they remain destroyable. The envd access token isn't recoverable, so SDK reconnect against pre-upgrade sandboxes will fail envd auth. Recommended workflow: destroy and recreate those sandboxes after deploying this change.
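
The fallback described above might look roughly like the sketch below. It assumes the image's `edvabe.template.id` label is visible in the same label map as the container's own labels (Docker containers inherit their image's labels); the function names `resolveTemplateID` and `envdToken` are hypothetical.

```go
package main

import "fmt"

const (
	LabelSandboxTemplateID = "edvabe.sandbox.template.id" // written at Create after this change
	LabelImageTemplateID   = "edvabe.template.id"         // image label used as the fallback
	LabelTokenEnvd         = "edvabe.sandbox.token.envd"
)

// resolveTemplateID prefers the sandbox-level label and falls back to the
// image label, so pre-upgrade containers still get a template ID.
func resolveTemplateID(labels map[string]string) (string, bool) {
	if v := labels[LabelSandboxTemplateID]; v != "" {
		return v, true
	}
	if v := labels[LabelImageTemplateID]; v != "" {
		return v, true
	}
	return "", false
}

// envdToken has no fallback: if the label is absent the token is simply
// unknown, and envd auth for that sandbox cannot succeed.
func envdToken(labels map[string]string) (string, bool) {
	v := labels[LabelTokenEnvd]
	return v, v != ""
}

func main() {
	preUpgrade := map[string]string{LabelImageTemplateID: "tpl-old"}
	id, _ := resolveTemplateID(preUpgrade)
	_, hasToken := envdToken(preUpgrade)
	fmt.Printf("template=%s tokenRecovered=%v\n", id, hasToken)
}
```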

Test plan

  • go test ./... — unit suite green, incl. four new TestRehydrate* covering running / paused-frozen / paused-stopped / pre-upgrade / no-clobber + TestSetTimeoutExtendsPausedSandboxPastOriginalTTL.
  • go test -tags=integration ./internal/runtime/docker/ — integration test TestDockerRuntimeListManagedReturnsLabeledContainers verifies real Docker round-trip.
  • End-to-end manual: ran the binary against a live paused sandbox, log shows rehydrated sandboxes from existing containers count=1, GET /sandboxes/{id} returns correct state + template + CPU/mem, POST /timeout returns 204.
  • After merge: bump to v0.1.2, then docker compose pull edvabe && docker compose up -d edvabe in downstream projects.

Docs

  • New "Sandbox registry persistence" section in docs/05-architecture.md with the full label-schema table + rationale for which fields are recovered vs defaulted.
  • CHANGELOG entries under Unreleased (one Fixed bullet for SetTimeout, one Added bullet for rehydration with the pre-upgrade caveat).

🤖 Generated with Claude Code

jonasnobile and others added 2 commits April 24, 2026 16:43
Manager.SetTimeout had the same state-blind ExpiresAt check that
Manager.Connect had before 88f0245 — a paused sandbox past its
original running-TTL returned ErrExpired, so any SDK code path that
calls sandbox.setTimeout() on a paused sandbox before connect() hit
410. Same bug class, same fix: the running-TTL check only applies to
running sandboxes. Paused sandboxes follow the pause-cycle reaper
(FreezeDuration → demote, StoppedGCAfter → destroy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

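
A minimal sketch of the state-aware guard described in this commit message, under stated assumptions: the `Sandbox` and `State` types, the method signature, and passing `now` explicitly are illustrative choices, not the actual `Manager.SetTimeout` code. Only the rule itself (the running-TTL check applies to running sandboxes only) comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State string

const (
	StateRunning State = "running"
	StatePaused  State = "paused"
)

// ErrExpired is the error the API layer would map to HTTP 410.
var ErrExpired = errors.New("sandbox expired")

// Sandbox is a hypothetical, trimmed-down registry entry.
type Sandbox struct {
	State     State
	ExpiresAt time.Time
}

// SetTimeout extends the sandbox deadline. The expiry check is applied only
// to running sandboxes; paused ones are left to the pause-cycle reaper.
func (s *Sandbox) SetTimeout(now time.Time, timeout time.Duration) error {
	if s.State == StateRunning && now.After(s.ExpiresAt) {
		return ErrExpired
	}
	s.ExpiresAt = now.Add(timeout)
	return nil
}

func main() {
	now := time.Now()
	stale := now.Add(-time.Hour) // original TTL elapsed an hour ago

	paused := &Sandbox{State: StatePaused, ExpiresAt: stale}
	running := &Sandbox{State: StateRunning, ExpiresAt: stale}

	fmt.Println("paused: ", paused.SetTimeout(now, 5*time.Minute))
	fmt.Println("running:", running.SetTimeout(now, 5*time.Minute))
}
```
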
edvabe's sandbox registry is in-memory, so an edvabe restart
previously left running/paused containers in Docker but invisible to
SDK reconnect (404 or 410 depending on state). Manager.Rehydrate
rebuilds the registry from Docker-label + container-state metadata
on serve startup.

Sandbox-level identity is persisted as immutable Docker labels at
Create time: template id, envd/traffic tokens, on-timeout mode.
Mutable fields (ExpiresAt, PausedAt) are reset to conservative
defaults on rehydration — running sandboxes get now+DefaultTimeout
(SDK typically re-extends shortly after), paused get a past
ExpiresAt so only the pause-cycle reaper owns them.
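
The ExpiresAt defaults described above could be expressed as follows. This is a sketch with assumed names (`Sandbox`, `applyRehydrationDefaults`) and a placeholder `DefaultTimeout` value. The one-second offset for paused sandboxes is arbitrary; the commit message only says the value lies in the past. PausedAt is left out because its default is not stated here.

```go
package main

import (
	"fmt"
	"time"
)

type State string

const (
	StateRunning State = "running"
	StatePaused  State = "paused"
)

// DefaultTimeout is a placeholder; the real default is not given in this PR.
const DefaultTimeout = 5 * time.Minute

// Sandbox is a hypothetical, trimmed-down registry entry.
type Sandbox struct {
	State     State
	ExpiresAt time.Time
}

// applyRehydrationDefaults resets ExpiresAt, which is not persisted in labels.
func applyRehydrationDefaults(s *Sandbox, now time.Time) {
	if s.State == StateRunning {
		// Fresh window; the SDK is expected to re-extend it shortly after.
		s.ExpiresAt = now.Add(DefaultTimeout)
		return
	}
	// Paused: any instant before now takes the running-TTL path out of play,
	// leaving the sandbox to the pause-cycle reaper.
	s.ExpiresAt = now.Add(-time.Second)
}

func main() {
	now := time.Now()
	running := Sandbox{State: StateRunning}
	paused := Sandbox{State: StatePaused}
	applyRehydrationDefaults(&running, now)
	applyRehydrationDefaults(&paused, now)
	fmt.Println("running expires in:", running.ExpiresAt.Sub(now))
	fmt.Println("paused already past:", paused.ExpiresAt.Before(now))
}
```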

Containers created before this change lack the new labels; they
still rehydrate state + metadata + resource limits, and TemplateID
is recovered via an image-label fallback so they remain
destroyable. The envd access token isn't recoverable for them, so
SDK reconnect will fail envd auth — documented in the CHANGELOG as
a migration note.

Rehydrate runs once before the HTTP listener starts, is non-fatal
on error, and logs skipped containers via slog. The runtime
interface gains ListManaged + a ManagedContainer DTO so the
sandbox package owns the label interpretation and the runtime stays
dumb. LabelMetaPrefix is centralized in the runtime package so
docker, noop, and the sandbox-label reader share one source of
truth for the user-metadata prefix.
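
The split between a "dumb" runtime and a label-interpreting sandbox package might be sketched like this. `ListManaged`, `ManagedContainer`, `LabelMetaPrefix`, and `Manager.Rehydrate` are names from the commit message; every field, signature, the prefix value, the skip condition, and the `fakeRuntime` and `rehydrateOnStartup` helpers are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// LabelMetaPrefix marks user-metadata labels. The value is a placeholder.
const LabelMetaPrefix = "edvabe.meta."

// ManagedContainer is a plain DTO: the runtime reports raw facts and the
// sandbox layer decides what the labels mean.
type ManagedContainer struct {
	SandboxID string
	State     string
	Labels    map[string]string
}

type Runtime interface {
	ListManaged(ctx context.Context) ([]ManagedContainer, error)
}

// fakeRuntime stands in for the docker or noop runtime.
type fakeRuntime struct{ containers []ManagedContainer }

func (f fakeRuntime) ListManaged(context.Context) ([]ManagedContainer, error) {
	return f.containers, nil
}

type Sandbox struct {
	ID, State string
	Metadata  map[string]string
}

type Manager struct {
	rt        Runtime
	sandboxes map[string]*Sandbox
}

func NewManager(rt Runtime) *Manager {
	return &Manager{rt: rt, sandboxes: map[string]*Sandbox{}}
}

// Rehydrate rebuilds registry entries and returns how many were added.
func (m *Manager) Rehydrate(ctx context.Context) (int, error) {
	containers, err := m.rt.ListManaged(ctx)
	if err != nil {
		return 0, fmt.Errorf("list managed containers: %w", err)
	}
	added := 0
	for _, c := range containers {
		if c.SandboxID == "" { // hypothetical skip condition
			slog.Warn("skipping container without sandbox id", "state", c.State)
			continue
		}
		if _, exists := m.sandboxes[c.SandboxID]; exists {
			continue // no-clobber: never overwrite a live registry entry
		}
		meta := map[string]string{}
		for k, v := range c.Labels {
			if key, ok := strings.CutPrefix(k, LabelMetaPrefix); ok {
				meta[key] = v
			}
		}
		m.sandboxes[c.SandboxID] = &Sandbox{ID: c.SandboxID, State: c.State, Metadata: meta}
		added++
	}
	return added, nil
}

// rehydrateOnStartup runs once before the listener starts; errors are logged, not fatal.
func rehydrateOnStartup(m *Manager) int {
	n, err := m.Rehydrate(context.Background())
	if err != nil {
		slog.Warn("rehydrate failed; continuing startup", "err", err)
	}
	slog.Info("rehydrated sandboxes from existing containers", "count", n)
	return n
}

func main() {
	rt := fakeRuntime{containers: []ManagedContainer{
		{SandboxID: "sb-1", State: "paused", Labels: map[string]string{LabelMetaPrefix + "owner": "ci"}},
		{State: "running"}, // no sandbox ID: skipped and logged
	}}
	m := NewManager(rt)
	rehydrateOnStartup(m)
	fmt.Println("owner metadata:", m.sandboxes["sb-1"].Metadata["owner"])
}
```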

Tests: unit round-trip through noop runtime, plus stopped-resume,
pre-upgrade-container, and no-clobber edge cases; integration test
against real Docker verifies ListManaged returns faithful labels,
state, and agent endpoint. End-to-end verified against a live
paused sandbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jonasnobile jonasnobile merged commit d17259f into main Apr 24, 2026
1 check passed