| status | done | ||
|---|---|---|---|
| depends | |||
| specs |
|
||
| issues |
|
||
| pr | 94 |
specs/behaviors/storage.md specifies that binary blobs (avatars, buzz images, featured-project hero images) live alongside their owning record via gitsheets' attachment plumbing, and are served via a streamed GET /api/attachments/<key> route. Serializers already construct these URLs (e.g. projects-members.ts:33 builds /api/attachments/${person.avatarKey} for avatarUrl), but the route doesn't exist — every reference 404s.
This is a pre-req for #32 (avatar upload): even once upload lands, the file can't be read back without this.
Closes #77.
- behaviors/storage.md#attachments — "Web serves attachments via a streamed
GET /api/attachments/<key>route with cache headers." - api/people.md —
avatarUrlshape (/api/attachments/<avatarKey>). - api/projects-buzz.md —
imageUrlshape (/api/attachments/<imageKey>).
GET /api/attachments/* (wildcard segment) in apps/api/src/routes/attachments.ts. The wildcard captures the full path-after-prefix as a single string, which is the gitsheets attachment key per spec (e.g. people/janedoe/avatar.jpg, project-buzz/squadquest/inquirer-praises-foo/image.jpg).
Bare-repo invariant (storage.md → "The data clone is bare") means we can't read files off disk; we read blobs from the git object DB. The attachment key IS the path in the HEAD tree, so:
git cat-file blob HEAD:<key>
streams the bytes. The route spawns this as a child process, pipes stdout into the Fastify reply, and sets headers.
The alternative — going through sheet.getAttachment(record, name) — requires parsing the key to identify (sheet, record, attachment-name), looking up the record in memory, then resolving the blob via the cached dataTree on the standing Sheet (which is itself a known staleness vector per #47). Plumbing skips both burdens.
Before invoking git, validate the key:
- Must not start with
/(Fastify wildcard already strips the leading/, but defensive). - Must not contain
..segments (path traversal). - Must not contain empty segments (
//, leading/trailing/). - Must not contain control characters or null bytes.
Invalid → 400. The git cat-file invocation itself sanitizes against most exploits because we pass HEAD:<key> as a single argument (not a shell string), but defense in depth.
Infer from the file extension. The set we care about today:
- Images: jpg/jpeg, png, gif, webp, svg
- Documents: pdf
- Fallback:
application/octet-stream
A small in-file table; if we ever want a larger set, lift the table from gitsheets' own inferMimeType (it has ~30 entries; not worth importing for our subset).
Cache-Control: public, max-age=3600 (1 hour). Path-keyed attachments mean an update to (say) people/janedoe/avatar.jpg keeps the same URL — we can't use immutable + long max-age. Future polish: derive an ETag from git rev-parse HEAD:<key> (the blob hash) so conditional GETs short-circuit with 304. Skipped for v1 — short max-age is good enough; ETag adds an extra git invocation per request.
- 404 —
git cat-fileexits non-zero (path not in HEAD tree) - 400 — path validation fails
- 500 — git invocation errors not matching the 404 case (rare; probably a corrupt repo)
All 4xx/5xx use the standard response envelope from lib/response.ts (the route helpers handle that).
git cat-file blob writes to stdout; the child's stdout is a Readable. Fastify accepts a Readable as the response body and streams it. For large attachments (PDF, hi-res images), this avoids buffering into memory. No size limit at the route level — the data repo's own write surface is what gates upload size.
-
GET /api/attachments/people/<slug>/avatar.pngagainst a seeded attachment returns 200 + the bytes + correct Content-Type. -
GET /api/attachments/<nonexistent>returns 404 with the standard envelope. - URL-based path traversal (
/api/attachments/../etc/passwd) — Fastify normalizes upstream of the handler, so these never reach our validator and never serve 200 from outside the data repo. Test confirmsstatusCode !== 200. - Null-byte-in-key (
%00) returns 422 from the validator. - Content-Type inferred for jpg/jpeg/png/gif/webp/avif/svg/pdf; unknown extensions get
application/octet-stream. - Binary content byte-identical end-to-end (test seeds all 256 byte values; route returns them unchanged).
- All 280 API tests pass (274 pre-existing + 6 new).
-
npm run type-check && npm run lintclean.
- Backpressure when the client is slow. Streaming from
git cat-fileto Fastify means a slow client could block the git process. Fastify handles backpressure on the destination side; we don't need to do anything special. - Concurrent attachment serves. Each request spawns a
gitprocess. For high concurrency (hundreds of avatars per page load via project lists), this could fork-bomb. Today's traffic is low; if profiling ever shows it, we can pool or batch. Out of scope. - No range request support. A 200 OK with the full body for every request — no
Range:/206. Fine for small avatars/images; if we ever serve video or very large PDFs, revisit. - No ETag in v1. Path-keyed attachments + 1-hour cache means a user who updates their avatar may see the old one for up to an hour. Acceptable for civic-platform avatars; we can add ETag in a follow-up if it bites.
- Stale gitsheets Sheet dataTree (#47) does NOT apply here. We read from
HEADvia plumbing on every request, not from any cached Sheet handle. So the staleness footgun documented inbehaviors/storage.mdis sidestepped by going direct.
Three commits — plan opening + route + tests/lint-fix.
Surprises:
- Fastify normalizes URL path segments before routing.
/api/attachments/../etc/passwdgets collapsed to/etc/passwdupstream of our handler — the validator never sees the..from URL-borne traversal attempts. The validator's..check is defense in depth, not the primary line. Adjusted the test to assert "doesn't 200 from outside the data repo" rather than "validator returns 422". The validator still catches null bytes, control chars, and leading/which Fastify doesn't normalize. - Hijacking the reply for streaming. First pass used
reply.header()+reply.raw.write()to inject the buffered first chunk, but headers set viareply.header()are tracked by Fastify and only flushed onreply.send()— bypassing send via.raw.write()skips header flush, so the response went out with no Content-Type. Fixed by callingreply.hijack()first (tells Fastify "I own the response from here"), thenreply.raw.writeHead(200, { ... })to write headers directly. - First-chunk-or-exit race.
git cat-fileexits non-zero withfatal:on stderr when a path is missing. Race-stylePromiseonfirstDatavsexitlets us distinguish "exited before any stdout" (translate to 404) from "first chunk arrived" (200 + stream). Zero-byte blobs would currently 404 — edge case not worth designing around for v1 since gitsheets'setAttachmentrejects zero-byte input.
- ETag + 304 conditional GETs. Easy win:
git rev-parse HEAD:<key>gives the blob hash; use it as ETag. WithIf-None-Match, 304 the request instead of streaming the blob. Saves the git-spawn cost on cache hits. Tracked as: low-priority polish; the 1-hour max-age already gives most clients fresh-enough responses. - Range request support. No
Range:handling today — full 200 OK on every request. Fine for small avatars/buzz-images; if we ever serve video or large PDFs, revisit. - Process pool for high concurrency. Each request spawns a
git cat-fileprocess. If a project-list page renders 30 avatars in parallel, that's 30 fork/execs. Today's traffic doesn't warrant pooling; if flamegraphs show fork overhead, switch togit cat-file --batchlong-running mode shared via a pool. - 404 vs 410 for tombstoned records. Plan originally mentioned 410 for tombstoned record attachments. Distinguishing requires re-querying the record by sheet/slug, which adds a lookup per request. Deferred until a concrete user-facing need for the distinction appears.