feat: rehydrate sandbox registry from Docker labels on startup #1
Merged
Manager.SetTimeout had the same state-blind ExpiresAt check that Manager.Connect had before 88f0245: a paused sandbox past its original running-TTL returned ErrExpired, so any SDK code path that calls sandbox.setTimeout() on a paused sandbox before connect() hit 410.

Same bug class, same fix: the running-TTL check only applies to running sandboxes. Paused sandboxes follow the pause-cycle reaper (FreezeDuration → demote, StoppedGCAfter → destroy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
edvabe's sandbox registry is in-memory, so an edvabe restart previously left running/paused containers in Docker but invisible to SDK reconnect (404 or 410 depending on state). Manager.Rehydrate rebuilds the registry from Docker-label + container-state metadata on serve startup.

Sandbox-level identity is persisted as immutable Docker labels at Create time: template id, envd/traffic tokens, on-timeout mode. Mutable fields (ExpiresAt, PausedAt) are reset to conservative defaults on rehydration: running sandboxes get now+DefaultTimeout (the SDK typically re-extends shortly after), and paused sandboxes get a past ExpiresAt so only the pause-cycle reaper owns them.

Containers created before this change lack the new labels. They still rehydrate state, metadata, and resource limits, and TemplateID is recovered via an image-label fallback so they remain destroyable. The envd access token isn't recoverable for them, so SDK reconnect will fail envd auth; this is documented in the CHANGELOG as a migration note.

Rehydrate runs once before the HTTP listener starts, is non-fatal on error, and logs skipped containers via slog. The runtime interface gains ListManaged plus a ManagedContainer DTO, so the sandbox package owns the label interpretation and the runtime stays dumb. LabelMetaPrefix is centralized in the runtime package so docker, noop, and the sandbox-label reader share one source of truth for the user-metadata prefix.

Tests: unit round-trip through the noop runtime, plus stopped-resume, pre-upgrade-container, and no-clobber edge cases; an integration test against real Docker verifies that ListManaged returns faithful labels, state, and agent endpoint. End-to-end verified against a live paused sandbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Rehydration (feat): `edvabe serve` now enumerates every `edvabe.managed=true` container and rebuilds its in-memory sandbox registry from Docker-label + container-state metadata. Paused sandboxes survive edvabe restarts; SDK `reconnect(sandboxId)` works across them.
- `SetTimeout` (fix): same bug class as `88f0245` (`Connect`). `sandbox.setTimeout()` on a paused sandbox past its original TTL no longer returns 410.
- Sandbox identity persisted as immutable Docker labels at Create time: `edvabe.sandbox.template.id`, `edvabe.sandbox.template.alias`, `edvabe.sandbox.token.envd`, `edvabe.sandbox.token.traffic`, `edvabe.sandbox.ontimeout`. Mutable fields (`ExpiresAt`, `PausedAt`) are reset to conservative defaults on rehydration.

Pre-upgrade caveat
Containers created before this change lack the new labels. They still rehydrate with
`State`, `PauseMode`, resource limits, metadata, and `TemplateID` (via an `edvabe.template.id` image-label fallback), and they remain destroyable. The envd access token isn't recoverable, so SDK reconnect against pre-upgrade sandboxes will fail envd auth. Workflow: destroy and recreate pre-upgrade sandboxes after deploying this change.

Test plan
- `go test ./...`: unit suite green, including four new `TestRehydrate*` tests covering running / paused-frozen / paused-stopped / pre-upgrade / no-clobber, plus `TestSetTimeoutExtendsPausedSandboxPastOriginalTTL`.
- `go test -tags=integration ./internal/runtime/docker/`: integration test `TestDockerRuntimeListManagedReturnsLabeledContainers` verifies a real Docker round-trip.
- End-to-end against a live paused sandbox: `rehydrated sandboxes from existing containers count=1`; `GET /sandboxes/{id}` returns correct state + template + CPU/mem; `POST /timeout` returns 204.
- `v0.1.2`, then `docker compose pull edvabe && docker compose up -d edvabe` in downstream projects.

Docs
- `docs/05-architecture.md`: full label-schema table + rationale for which fields are recovered vs. defaulted.
- CHANGELOG (one `Fixed` bullet for `SetTimeout`, one `Added` bullet for rehydration with the pre-upgrade caveat).

🤖 Generated with Claude Code