
Migrate the API backend to FastAPI (uvicorn · SSE · Pydantic · Scalar) #53

Merged: thalida merged 31 commits into main from worktree-fastapi-migration on Jun 10, 2026

thalida (Owner) commented Jun 10, 2026

Summary

Replaces the hand-rolled http.server backend (~1k-line api/server.py) with a clean, layered FastAPI app on uvicorn. Same product behavior, much simpler internals:

  • Layered structure — thin routers/ over framework-agnostic services/; Pydantic wire models/; security.py (the allowed_roots trust set); static.py (SPA serving).
  • SSE scan stream: GET /api/manifest is now Server-Sent Events (named events) instead of bespoke NDJSON + a select() disconnect watchdog. Disconnect/cancel is handled by Starlette's request.is_disconnected().
  • Pydantic + generated types: api/models/*.py are the single source of truth → OpenAPI → generated app/src/types/manifest.generated.ts (with a compile-time drift guard). Kills the old hand-synced manifest.ts drift.
  • Scalar docs at /api/docs; OpenAPI at /api/openapi.json.
  • Single uvicorn process by design — the in-memory trust set can't be split across workers (documented in security.py).
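
The named-event framing mentioned above can be sketched in a few lines. This is an illustration of the SSE wire format only, not code from this PR: `format_sse_event` is a hypothetical helper name, and the payload is made up for the example.

```python
import json


def format_sse_event(event: str, data: dict[str, object]) -> str:
    """Serialize one named Server-Sent Event frame.

    Hypothetical helper for illustration; the PR's actual encoder may differ.
    """
    # json.dumps escapes newlines inside strings, so the payload always fits
    # on a single `data:` line.
    payload = json.dumps(data, separators=(",", ":"))
    # `event:` sets the name that EventSource listeners dispatch on, and the
    # trailing blank line terminates the frame.
    return f"event: {event}\ndata: {payload}\n\n"


if __name__ == "__main__":
    print(format_sse_event("scan-progress", {"files_scanned": 42}), end="")
```

A browser `EventSource` would receive this frame through `addEventListener("scan-progress", ...)` rather than the default `message` handler, which is what lets each event name carry its own payload shape.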

Notable changes beyond the straight port

  • SSE event names describe what they deliver, not sequence position: clone-progress, scan-progress, manifest-partial, manifest-complete, error (synced across wire ↔ scan_tree phases ↔ Pydantic models ↔ frontend ScanPhase ↔ generated types).
  • Clone progress + mid-clone cancel restored: the git clone runs on the SSE worker thread, streaming clone-progress and abortable on disconnect.
  • Source resolution extracted to api/services/source.py; single-sourced the cache root in config.CACHE_ROOT (clone gets CLONES_ROOT).
  • Frontend streamManifest rewritten fetch+NDJSON → EventSource, same async-iterable contract; malformed frames reject instead of hanging.
  • Whole project is pyright-strict clean (the pre-existing scan/cache debt was fixed too).

Test plan

  • just test (api pytest + app vitest, in containers) — green (209 api, 2160 app).
  • uv run pyright → 0 errors; cd app && npm run typecheck && npm run build → clean.
  • Pre-push gate (pytest · vitest · eslint · prettier · typecheck) passes.
  • Manual browser smoke (recommended before merge): just dev, load a git URL → cloning→scanning→skeleton→final overlay advances, city builds, file preview + commit pane work; load a local repo via just dev <path> and confirm live-update poll.

Breaking / follow-ups

  • Breaking wire change: SSE event names changed (cloning → clone-progress, etc.). Backend + frontend ship together, so no concern for this app's single-bundle deploy.
  • Deferred (tracked in TODO.md): streaming gzip over SSE (ships uncompressed first — measure on a large repo before adding); wire-model optionality accuracy (GitMeta/RepoInfo Pydantic = None vs always-emitted).

🤖 Generated with Claude Code

thalida and others added 30 commits June 2, 2026 20:28
… change)

Creates api/services/ package and relocates the four data-layer modules;
updates all intra-service relative imports, test import paths, and
mock.patch string targets to the new api.services.* namespace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r docs

Adds api/app.py (create_app factory), api/routers/health.py (/api/health +
/api/config), api/static.py (SPA fallback with extension-based 404 guard and
fresh-router-per-app isolation), Scalar docs at /api/docs, OpenAPI JSON
relocated to /api/openapi.json. Adds httpx2 dev dep — Starlette 1.2.x's
TestClient prefers it (import httpx2 as httpx) and deprecates plain httpx.
All 8 TDD tests pass warning-free; 0 pyright strict errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds classify/resolve_source helpers and the GET /api/manifest/signature
and DELETE /api/manifest/cache FastAPI routes (SSE stream is Task 10).
All status codes and error messages match the legacy server.py behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relocate scanner TypedDicts -> api/services/manifest_types.py, FileEntry ->
cache.py, exceptions -> owning service modules. Delete server.py/types.py/
_reload.py. python -m api now launches a single uvicorn process; app.py
exposes a module-level app for the uvicorn import string.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
python -m api now launches a single uvicorn process (api.app:app). The
STATIC_DIR comment referenced the deleted api/server.py; point it at
api/app.py's DEFAULT_STATIC_DIR. docker-compose.dev.yml --reload flow is
unchanged (uvicorn --reload via __main__). Live docker smoke-test deferred
to the frontend swap (Task 15).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose the SSE event models (-> Manifest -> tree) in the OpenAPI schema via
the /api/manifest route's documented responses. Add scripts/gen_openapi.py +
a 'just gen-types' recipe that emits app/src/types/manifest.generated.ts via
openapi-typescript. manifest.ts stays hand-written (NodeKind is a frontend
scene enum); a new manifest.contract.ts compile-time guard fails typecheck if
the hand-written wire field sets drift from the generated schema.

openapi-typescript declares peer typescript@^5 but the app is on TS 6; pin
legacy-peer-deps in app/.npmrc so npm ci (CI/Docker/compose) reproduces the
install. The generated file is excluded from prettier/eslint (it is codegen).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Event contract)

streamManifest now bridges EventSource (push) to an async iterable, closing on
final/error to avoid reconnect storms. Server-describable failures arrive as a
named error event; transport drops reject. EventSource ctor is injectable for
tests. ScanPhase/ScanStreamEvent and all URL builders unchanged; callers untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
README: note the uvicorn single-process backend, SSE scan stream, Scalar API
docs at /api/docs, and OpenAPI-driven type generation. TODO: record the
deferred streaming-gzip-over-SSE measurement, the pre-existing scan/cache
pyright-strict debt, and the wire-model optionality accuracy follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…0 errors)

Type-only changes, behavior bit-identical (snapshot/signature tests unchanged):
- compute_tree_signature: type tree_root as DirNode, _walk as TreeNode, narrow
  on the 'type' discriminator before touching children (files never recurse).
- drop the dead 'if repo_info is not None' guard (_collect_repo_info always
  returns a RepoInfo).
- model the derived CommitEntry.same_day_total as NotRequired on the internal
  TypedDict — it is absent at collection/cache-load and baked in-place by
  _annotate_same_day_totals at wrap; the wire Pydantic model keeps it required.
- parse cached JSON via cast(dict[str, object], ...)/(list[object], ...) then
  isinstance-narrow each field, matching the existing _coerce pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
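
The cast-then-narrow pattern described in the last bullet of this commit can be sketched as follows. This is a rough illustration under assumptions: `CachedCommit` and its fields are hypothetical stand-ins, not the project's real cache schema or its `_coerce` helper.

```python
import json
from typing import Optional, TypedDict, cast


class CachedCommit(TypedDict):
    """Hypothetical cached record; the field names are illustrative only."""

    sha: str
    files_changed: int


def parse_cached_commit(raw: str) -> Optional[CachedCommit]:
    """Return a typed record, or None if the cached JSON has the wrong shape."""
    try:
        loaded: object = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(loaded, dict):
        return None
    # Treat every value as `object` so a strict type checker forces each field
    # to be narrowed with isinstance before it is used.
    data = cast(dict[str, object], loaded)
    sha = data.get("sha")
    files_changed = data.get("files_changed")
    if not isinstance(sha, str):
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(files_changed, bool) or not isinstance(files_changed, int):
        return None
    return CachedCommit(sha=sha, files_changed=files_changed)
```

The point of casting to `dict[str, object]` instead of `dict[str, Any]` is that `Any` would silently pass type checking, while `object` makes an un-narrowed field access an error.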
The Task-14 .npmrc (legacy-peer-deps=true, for openapi-typescript's stale
peer range vs TS 6) reached local/compose npm ci but NOT the Dockerfile build,
which COPYs only package.json + package-lock.json before npm ci — so the
image build failed with ERESOLVE. Copy .npmrc alongside the manifests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- static.py: register the SPA catch-all via add_api_route (not a decorated
  nested fn), dropping the reportUnusedFunction ignore. Same for app.py's
  Scalar + error handlers (now module-level, registered by reference). Remove
  the cargo-culted reportUnusedFunction ignores on the module-level routes.
- commit.py: _build_authors_list -> build_authors_list (public; it's used
  across modules) instead of suppressing reportPrivateUsage. No pyright
  ignores remain in api/.
- routers/health.py -> routers/meta.py: the router holds health + config
  (both server-meta endpoints, already tagged 'meta'); the filename now
  matches the grouping instead of implying health-only.
- cache.py: condense the long cache-version changelog comments (the per-bump
  archaeology lives in git history).
- manifest_types.py: docstring now explains WHY it's a standalone leaf module
  (shared by scan.py + cache.py without an import cycle).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#1/#2 clone progress + cancel: the SSE manifest stream now classifies the
source without cloning, then clones on the worker thread — emitting the
'cloning' event FIRST and streaming git's stage/percent via on_progress, with
the disconnect poll covering the clone. ensure_clone/_run_git_streaming gain a
cancel_event: a cancel watcher kills git so a mid-clone disconnect aborts the
clone instead of orphaning it.
#3 unexpected scan errors are now logged (logger.exception) before the error
event, not silently swallowed.
#4 frontend: JSON.parse in the SSE listeners is guarded — a malformed/truncated
frame rejects the iterator instead of hanging it forever.
#5 /api/file media responses set Content-Encoding: identity so GZipMiddleware
skips re-compressing already-compressed image/video/audio/pdf bytes.
#6 single-source the {error} shape: _api_error_handler returns ErrorResponse;
SSE error events go through ErrorEvent (no more hand-built dicts / dead model).
#7 create_app no longer resets the process-global TRUST set (factory is now
side-effect-free); tests isolate it via an autouse conftest fixture.
#8 same_day_total: regression test asserts both skeleton and final emits carry
it (NotRequired internally, required on the wire).

Also drops the redundant final_holder['err'] guard. Backend 209 + pyright 0;
frontend 2160 + typecheck/build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
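
The cancel-watcher idea from #1/#2 can be sketched with the standard library alone. This is a minimal sketch, not the PR's `ensure_clone`/`_run_git_streaming`: `run_cancellable` is a hypothetical name, and a sleeping Python child stands in for `git clone`.

```python
import subprocess
import sys
import threading
from typing import Optional


def run_cancellable(argv: list[str], cancel_event: threading.Event) -> Optional[int]:
    """Run a child process and kill it if cancel_event is set first.

    Returns the exit code, or None when the run was cancelled.
    """
    proc = subprocess.Popen(argv)
    finished = threading.Event()

    def watch() -> None:
        # Poll with a short timeout so the watcher also exits promptly once
        # the child has finished on its own.
        while not finished.is_set():
            if cancel_event.wait(timeout=0.05):
                proc.kill()
                return

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    try:
        code = proc.wait()
    finally:
        finished.set()
        watcher.join()
    return None if cancel_event.is_set() else code


if __name__ == "__main__":
    cancel = threading.Event()
    # Stand-in for a long clone: a child that would otherwise sleep 30 seconds.
    slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
    # Simulates the SSE disconnect poll noticing that the client went away.
    threading.Timer(0.2, cancel.set).start()
    print(run_cancellable(slow_child, cancel))
```

In the real stream the event would presumably be set by the disconnect poll rather than a timer; the sketch only shows that the child is killed instead of being orphaned.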
The wire event names + their Pydantic models + the frontend ScanPhase are
renamed end-to-end so each event says what it DELIVERS, never its position
('final' was wrong-by-design — it breaks if you add a stage):

  cloning  -> clone-progress      (CloneProgressEvent  / ScanPhase.CloneProgress)
  scanning -> scan-progress       (ScanProgressEvent   / ScanPhase.ScanProgress)
  skeleton -> manifest-partial    (PartialManifestEvent/ ScanPhase.PartialManifest)
  final    -> manifest-complete   (CompleteManifestEvent/ScanPhase.CompleteManifest)
  error    -> error               (ErrorEvent / ScanPhase.Error)

In sync across: wire event names, scan_tree's internal phase strings +
ScanStreamEvent, api/models/events.py, the OpenAPI responses doc, the frontend
ScanPhase enum + listeners + all consumers (loadingReactions, useManifestSource),
tests, and the regenerated manifest.generated.ts. The separate user-facing
LoadingStep UI vocabulary is intentionally left as-is.

Backend 209 + pyright 0; frontend 2160 + typecheck/lint/build clean; verified
live (event: scan-progress -> manifest-partial -> manifest-complete).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
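
For reference, the rename table above can be written as data. This is a Python rendering for illustration only: the real `ScanPhase` is a frontend TypeScript enum, and `LEGACY_EVENT_NAMES` and `is_terminal` are hypothetical names that do not appear in the PR.

```python
from enum import Enum


class ScanPhase(str, Enum):
    """Wire event names after the rename, one member per event."""

    CLONE_PROGRESS = "clone-progress"
    SCAN_PROGRESS = "scan-progress"
    PARTIAL_MANIFEST = "manifest-partial"
    COMPLETE_MANIFEST = "manifest-complete"
    ERROR = "error"


# Old wire name -> new phase, exactly as listed in the commit message.
LEGACY_EVENT_NAMES: dict[str, ScanPhase] = {
    "cloning": ScanPhase.CLONE_PROGRESS,
    "scanning": ScanPhase.SCAN_PROGRESS,
    "skeleton": ScanPhase.PARTIAL_MANIFEST,
    "final": ScanPhase.COMPLETE_MANIFEST,
    "error": ScanPhase.ERROR,
}


def is_terminal(phase: ScanPhase) -> bool:
    """True for the events after which a client should close the stream."""
    return phase in (ScanPhase.COMPLETE_MANIFEST, ScanPhase.ERROR)
```

Because the terminal check names the events by what they deliver, adding a new intermediate stage would not require touching it, which is the stated motivation for dropping `final`.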
ResolveError/Resolved + classify/resolve_local/resolve_source were defined
inline in the manifest router, but they're framework-agnostic domain logic (no
FastAPI), not Pydantic wire models — so they belong in the service layer beside
scan/cache/clone/media, not in models/ (the OpenAPI-serialization layer) and not
in the router. The router is now a thin handler that imports from the service;
the resolution exception lives next to the code that raises it (like
ScanCancelledError in scan.py, CloneError in clone.py). _resolve_local ->
resolve_local (now crosses a module boundary). No behavior change; backend 209
+ pyright 0, coverage retained via the existing route tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cache.py and clone.py each independently read CODECITY_CACHE_ROOT (with the same
default), and clone's constant was confusingly also named CACHE_ROOT while
actually meaning the clones/ subdir. Move the env-driven base to config.CACHE_ROOT
(the env-settings home, beside MAX_FILE_BYTES/GZIP_MIN_BYTES) as the single source;
cache.py imports it, clone.py derives CLONES_ROOT = CACHE_ROOT / 'clones'. Not
'pull from cache' — that would couple clone->cache and break the import-time
monkeypatch; config is the shared owner both consume. Tests patch the per-module
attrs (cache.CACHE_ROOT / clone.CLONES_ROOT) as before. Backend 209 + pyright 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The static file server's `GET /{full_path}` catch-all was showing up in
/api/openapi.json (and so in the Scalar docs) as a bogus API endpoint. Mark it
include_in_schema=False — it's the SPA fallback, not an API route. Serving is
unchanged (the flag only affects the schema). Regenerated manifest.generated.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ainer-safe

The test container sets GIT_AUTHOR_NAME=Test Runner (for the test_scan git
tests); that env var outranks the fixture's `git config user.name Tester`, so
the commit's %an became 'Test Runner' and the author assertion failed in the
container (passed locally where no such env var exists). --author on git commit
outranks the env var, making %an deterministic.

No Python formatter existed (only pyright). Add ruff with Black-compatible
defaults (88 cols, double quotes, magic trailing commas). New `ruff` service
in docker-compose.test.yml runs `ruff format --check` reproducibly like
pytest/vitest; wired into the pre-push hook (now 1..6) and CI so it gates like
the frontend's prettier. `just fmt` applies it (local uv run, like gen-types).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One-time reformat of the never-formatted backend to ruff defaults (36 files).
Purely cosmetic — 209 tests pass, pyright 0 errors, behavior unchanged.

The Pydantic models declared several always-emitted fields as optional (and
several absent-or-value fields as nullable), so the OpenAPI/Scalar docs +
generated TS didn't match what the scanner actually sends:

- GitMeta.created/modified, RepoInfo.* -> required-nullable (always present,
  can be null): drop the '= None' default.
- FileNode.media_width/height, Manifest.display_root, and the SSE event models'
  display_root/stage/percent/files_scanned -> optional-but-NON-nullable
  (absent-or-value, never null): keep the Optional Python type but emit a
  non-nullable JSON schema via pydantic.WithJsonSchema (shared OptionalInt/
  OptionalStr aliases). Avoids the pyright-strict 'None not assignable to int'
  that a plain non-Optional default would trip.

Generated TS now matches the hand-written manifest.ts exactly, so the drift
guard is upgraded: pure-scalar types (GitMeta/RepoInfo/CommitEntry/
ExtBreakdownEntry/BusynessThresholds) now use DEEP type equality (catches
optionality/nullability drift), not just key sets. Verified the deep guard bites.

Backend 209 + pyright 0 + ruff clean; frontend 2160 + typecheck/build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /api/manifest SSE stream shipped uncompressed (Starlette's GZipMiddleware
skips text/event-stream because it buffers, which would stall the live stream).
Measured: a 379-file repo's manifest is 250 KB -> 28 KB gzip (8.8x); JSON
compresses ~9x, so big repos win a lot.

Add SSEGZipMiddleware: an ASGI middleware that gzips text/event-stream with a
Z_SYNC_FLUSH after every event, so the skeleton/progress events still arrive
early (Z_FINISH writes the trailer on the last body). Content-negotiated on
Accept-Encoding (every browser sends gzip; raw sockets / odd proxies get
identity). Browsers + httpx decode EventSource gzip transparently. gzip (stdlib,
zero deps) over brotli — brotli would shave ~15-25% more but needs a dependency;
swappable in the one middleware if that changes.

Verified live (gzip magic bytes + round-trip) and via tests. Backend 211 +
pyright 0 + ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
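
The flush strategy behind the middleware can be shown with the standard library. This is only the compression core under assumptions, not the PR's `SSEGZipMiddleware`: `gzip_sse_frames` is a hypothetical name, and the ASGI plumbing and Accept-Encoding negotiation are left out.

```python
import gzip
import zlib
from collections.abc import Iterable, Iterator


def gzip_sse_frames(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one gzip chunk per SSE frame, then the gzip trailer."""
    # wbits=31 selects the gzip container (16) with a 32 KiB window (15).
    compressor = zlib.compressobj(wbits=31)
    for frame in frames:
        # Z_SYNC_FLUSH emits everything compressed so far on a byte boundary,
        # so the receiver can decode this frame without waiting for more.
        yield compressor.compress(frame) + compressor.flush(zlib.Z_SYNC_FLUSH)
    # Z_FINISH ends the deflate stream and appends the CRC32/length trailer.
    yield compressor.flush(zlib.Z_FINISH)


if __name__ == "__main__":
    frames = [
        b"event: manifest-partial\ndata: {}\n\n",
        b"event: manifest-complete\ndata: {}\n\n",
    ]
    chunks = list(gzip_sse_frames(frames))
    decoder = zlib.decompressobj(wbits=31)
    # The first chunk alone already decodes to the complete first frame.
    assert decoder.decompress(chunks[0]) == frames[0]
    # The concatenated chunks form one valid gzip stream.
    assert gzip.decompress(b"".join(chunks)) == b"".join(frames)
```

A sync flush costs a few bytes per event, so the compression ratio is slightly lower than one-shot gzip of the whole body; the trade is that partial manifests and progress events are not held back in the compressor's buffer.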
thalida merged commit 718f627 into main on Jun 10, 2026
1 check passed
thalida deleted the worktree-fastapi-migration branch on Jun 10, 2026 05:41