Migrate the API backend to FastAPI (uvicorn · SSE · Pydantic · Scalar) #53
Merged
… change) Creates api/services/ package and relocates the four data-layer modules; updates all intra-service relative imports, test import paths, and mock.patch string targets to the new api.services.* namespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r docs Adds api/app.py (create_app factory), api/routers/health.py (/api/health + /api/config), api/static.py (SPA fallback with extension-based 404 guard and fresh-router-per-app isolation), Scalar docs at /api/docs, OpenAPI JSON relocated to /api/openapi.json. Adds httpx2 dev dep — Starlette 1.2.x's TestClient prefers it (import httpx2 as httpx) and deprecates plain httpx. All 8 TDD tests pass warning-free; 0 pyright strict errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
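A minimal sketch of what an extension-based 404 guard for an SPA fallback can look like. The function name `resolve_spa_path` and the exact rules (path-traversal check, "has a suffix means asset") are assumptions made for illustration, not the contents of `api/static.py`.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def resolve_spa_path(static_dir: Path, request_path: str) -> Optional[Path]:
    """Return the file to serve, or None when the request should 404."""
    root = static_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse anything that escapes the static root (e.g. "../secrets").
    if not candidate.is_relative_to(root):
        return None
    if candidate.is_file():
        return candidate
    # A path with a file extension names an asset; a missing asset is a real
    # 404, not a client-side route, so do not fall back to index.html.
    if PurePosixPath(request_path).suffix:
        return None
    # Extension-less paths are treated as client-side routes.
    return root / "index.html"
```

The point of the guard is that a request for a missing `.js` or `.png` file fails loudly instead of receiving HTML with a 200 status, which tends to surface as confusing parse errors in the browser.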
Adds classify/resolve_source helpers and the GET /api/manifest/signature and DELETE /api/manifest/cache FastAPI routes (SSE stream is Task 10). All status codes and error messages match the legacy server.py behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nnected, cache-on-final)
Relocate scanner TypedDicts -> api/services/manifest_types.py, FileEntry -> cache.py, exceptions -> owning service modules. Delete server.py/types.py/_reload.py. python -m api now launches a single uvicorn process; app.py exposes a module-level app for the uvicorn import string. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
python -m api now launches a single uvicorn process (api.app:app). The STATIC_DIR comment referenced the deleted api/server.py; point it at api/app.py's DEFAULT_STATIC_DIR. docker-compose.dev.yml --reload flow is unchanged (uvicorn --reload via __main__). Live docker smoke-test deferred to the frontend swap (Task 15). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose the SSE event models (-> Manifest -> tree) in the OpenAPI schema via the /api/manifest route's documented responses. Add scripts/gen_openapi.py + a 'just gen-types' recipe that emits app/src/types/manifest.generated.ts via openapi-typescript. manifest.ts stays hand-written (NodeKind is a frontend scene enum); a new manifest.contract.ts compile-time guard fails typecheck if the hand-written wire field sets drift from the generated schema. openapi-typescript declares peer typescript@^5 but the app is on TS 6; pin legacy-peer-deps in app/.npmrc so npm ci (CI/Docker/compose) reproduces the install. The generated file is excluded from prettier/eslint (it is codegen). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Event contract) streamManifest now bridges EventSource (push) to an async iterable, closing on final/error to avoid reconnect storms. Server-describable failures arrive as a named error event; transport drops reject. EventSource ctor is injectable for tests. ScanPhase/ScanStreamEvent and all URL builders unchanged; callers untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
README: note the uvicorn single-process backend, SSE scan stream, Scalar API docs at /api/docs, and OpenAPI-driven type generation. TODO: record the deferred streaming-gzip-over-SSE measurement, the pre-existing scan/cache pyright-strict debt, and the wire-model optionality accuracy follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…0 errors)
Type-only changes, behavior bit-identical (snapshot/signature tests unchanged):
- compute_tree_signature: type tree_root as DirNode, _walk as TreeNode, narrow on the 'type' discriminator before touching children (files never recurse).
- drop the dead 'if repo_info is not None' guard (_collect_repo_info always returns a RepoInfo).
- model the derived CommitEntry.same_day_total as NotRequired on the internal TypedDict: it is absent at collection/cache-load and baked in-place by _annotate_same_day_totals at wrap; the wire Pydantic model keeps it required.
- parse cached JSON via cast(dict[str, object], ...)/(list[object], ...) then isinstance-narrow each field, matching the existing _coerce pattern.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
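To illustrate the discriminator narrowing mentioned above, here is a small sketch of a tree walk over a `TypedDict` union. The field names (`name`, `size`, `children`) and the helpers `signature_parts` / `tree_signature` are hypothetical; the real `compute_tree_signature` may hash different data.

```python
import hashlib
from typing import Literal, TypedDict, Union


class FileNode(TypedDict):
    type: Literal["file"]
    name: str
    size: int


class DirNode(TypedDict):
    type: Literal["dir"]
    name: str
    children: list["TreeNode"]


TreeNode = Union[FileNode, DirNode]


def signature_parts(node: TreeNode, prefix: str = "") -> list[str]:
    """Collect one 'path:size' string per file under node."""
    path = f"{prefix}/{node['name']}" if prefix else node["name"]
    # Checking the 'type' discriminator narrows the union for a type checker:
    # inside this branch the node is a FileNode and has no 'children' key.
    if node["type"] == "file":
        return [f"{path}:{node['size']}"]
    parts: list[str] = []
    for child in node["children"]:
        parts.extend(signature_parts(child, path))
    return parts


def tree_signature(root: DirNode) -> str:
    """Hash the sorted file entries so the result is order-independent."""
    joined = "\n".join(sorted(signature_parts(root)))
    return hashlib.sha256(joined.encode()).hexdigest()
```

Typing the root as `DirNode` and the recursive walker as `TreeNode` lets a strict checker verify that `children` is only read after the `'type'` check, which is the property the commit describes.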
The Task-14 .npmrc (legacy-peer-deps=true, for openapi-typescript's stale peer range vs TS 6) reached local/compose npm ci but NOT the Dockerfile build, which COPYs only package.json + package-lock.json before npm ci — so the image build failed with ERESOLVE. Copy .npmrc alongside the manifests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- static.py: register the SPA catch-all via add_api_route (not a decorated nested fn), dropping the reportUnusedFunction ignore. Same for app.py's Scalar + error handlers (now module-level, registered by reference). Remove the cargo-culted reportUnusedFunction ignores on the module-level routes.
- commit.py: _build_authors_list -> build_authors_list (public; it's used across modules) instead of suppressing reportPrivateUsage. No pyright ignores remain in api/.
- routers/health.py -> routers/meta.py: the router holds health + config (both server-meta endpoints, already tagged 'meta'); the filename now matches the grouping instead of implying health-only.
- cache.py: condense the long cache-version changelog comments (the per-bump archaeology lives in git history).
- manifest_types.py: docstring now explains WHY it's a standalone leaf module (shared by scan.py + cache.py without an import cycle).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#1/#2 clone progress + cancel: the SSE manifest stream now classifies the source without cloning, then clones on the worker thread, emitting the 'cloning' event FIRST and streaming git's stage/percent via on_progress, with the disconnect poll covering the clone. ensure_clone/_run_git_streaming gain a cancel_event: a cancel watcher kills git so a mid-clone disconnect aborts the clone instead of orphaning it.
#3 unexpected scan errors are now logged (logger.exception) before the error event, not silently swallowed.
#4 frontend: JSON.parse in the SSE listeners is guarded: a malformed/truncated frame rejects the iterator instead of hanging it forever.
#5 /api/file media responses set Content-Encoding: identity so GZipMiddleware skips re-compressing already-compressed image/video/audio/pdf bytes.
#6 single-source the {error} shape: _api_error_handler returns ErrorResponse; SSE error events go through ErrorEvent (no more hand-built dicts / dead model).
#7 create_app no longer resets the process-global TRUST set (factory is now side-effect-free); tests isolate it via an autouse conftest fixture.
#8 same_day_total: regression test asserts both skeleton and final emits carry it (NotRequired internally, required on the wire).
Also drops the redundant final_holder['err'] guard.
Backend 209 + pyright 0; frontend 2160 + typecheck/build clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
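The cancel-watcher idea from #1/#2 can be sketched with the standard library alone. `run_cancellable` is a hypothetical helper, and a sleeping Python child stands in for `git clone`; the actual `_run_git_streaming` also streams progress, which is omitted here.

```python
import subprocess
import sys
import threading


def run_cancellable(argv: list[str], cancel_event: threading.Event) -> int:
    """Run a child process, killing it if cancel_event is set first."""
    proc = subprocess.Popen(argv)
    done = threading.Event()

    def watch() -> None:
        # Poll until either the process exits or a cancel is requested.
        while not done.is_set():
            if cancel_event.wait(timeout=0.05):
                proc.kill()
                return

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    try:
        return proc.wait()
    finally:
        # Stop the watcher whether the child finished or was killed.
        done.set()
        watcher.join()


# Usage: request cancellation shortly after launch, as a disconnect would.
cancel = threading.Event()
threading.Timer(0.2, cancel.set).start()
exit_code = run_cancellable(
    [sys.executable, "-c", "import time; time.sleep(30)"], cancel
)
```

Because the watcher kills the child rather than just abandoning the wait, a client that disconnects mid-clone does not leave an orphaned process behind.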
The wire event names + their Pydantic models + the frontend ScanPhase are
renamed end-to-end so each event says what it DELIVERS, never its position
('final' was wrong-by-design — it breaks if you add a stage):
cloning -> clone-progress (CloneProgressEvent / ScanPhase.CloneProgress)
scanning -> scan-progress (ScanProgressEvent / ScanPhase.ScanProgress)
skeleton -> manifest-partial (PartialManifestEvent/ ScanPhase.PartialManifest)
final -> manifest-complete (CompleteManifestEvent/ScanPhase.CompleteManifest)
error -> error (ErrorEvent / ScanPhase.Error)
In sync across: wire event names, scan_tree's internal phase strings +
ScanStreamEvent, api/models/events.py, the OpenAPI responses doc, the frontend
ScanPhase enum + listeners + all consumers (loadingReactions, useManifestSource),
tests, and the regenerated manifest.generated.ts. The separate user-facing
LoadingStep UI vocabulary is intentionally left as-is.
Backend 209 + pyright 0; frontend 2160 + typecheck/lint/build clean; verified
live (event: scan-progress -> manifest-partial -> manifest-complete).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
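For reference, a named SSE event is just an `event:` line followed by a `data:` line and a blank line. The sketch below shows the renamed wire events as an enum plus a frame formatter; `WireEvent` and `format_sse` are illustrative names, not the models in `api/models/events.py`.

```python
import json
from enum import Enum


class WireEvent(str, Enum):
    """The renamed wire event names listed in the commit message."""

    CLONE_PROGRESS = "clone-progress"
    SCAN_PROGRESS = "scan-progress"
    MANIFEST_PARTIAL = "manifest-partial"
    MANIFEST_COMPLETE = "manifest-complete"
    ERROR = "error"


def format_sse(event: WireEvent, payload: dict[str, object]) -> str:
    """Serialize one named SSE frame: event line, data line, blank line."""
    # Compact JSON contains no raw newlines, so a single data: line suffices.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event.value}\ndata: {data}\n\n"
```

On the browser side, each name is what an `EventSource` listener subscribes to, which is why the backend strings and the frontend `ScanPhase` values have to change together.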
ResolveError/Resolved + classify/resolve_local/resolve_source were defined inline in the manifest router, but they're framework-agnostic domain logic (no FastAPI), not Pydantic wire models — so they belong in the service layer beside scan/cache/clone/media, not in models/ (the OpenAPI-serialization layer) and not in the router. The router is now a thin handler that imports from the service; the resolution exception lives next to the code that raises it (like ScanCancelledError in scan.py, CloneError in clone.py). _resolve_local -> resolve_local (now crosses a module boundary). No behavior change; backend 209 + pyright 0, coverage retained via the existing route tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cache.py and clone.py each independently read CODECITY_CACHE_ROOT (with the same default), and clone's constant was confusingly also named CACHE_ROOT while actually meaning the clones/ subdir. Move the env-driven base to config.CACHE_ROOT (the env-settings home, beside MAX_FILE_BYTES/GZIP_MIN_BYTES) as the single source; cache.py imports it, clone.py derives CLONES_ROOT = CACHE_ROOT / 'clones'. Not 'pull from cache' — that would couple clone->cache and break the import-time monkeypatch; config is the shared owner both consume. Tests patch the per-module attrs (cache.CACHE_ROOT / clone.CLONES_ROOT) as before. Backend 209 + pyright 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The static file server's `GET /{full_path}` catch-all was showing up in
/api/openapi.json (and so in the Scalar docs) as a bogus API endpoint. Mark it
include_in_schema=False — it's the SPA fallback, not an API route. Serving is
unchanged (the flag only affects the schema). Regenerated manifest.generated.ts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ainer-safe The test container sets GIT_AUTHOR_NAME=Test Runner (for the test_scan git tests); that env var outranks the fixture's `git config user.name Tester`, so the commit's %an became 'Test Runner' and the author assertion failed in the container (passed locally where no such env var exists). --author on git commit outranks the env var, making %an deterministic.
No Python formatter existed (only pyright). Add ruff with Black-compatible defaults (88 cols, double quotes, magic trailing commas). New `ruff` service in docker-compose.test.yml runs `ruff format --check` reproducibly like pytest/vitest; wired into the pre-push hook (now 1..6) and CI so it gates like the frontend's prettier. `just fmt` applies it (local uv run, like gen-types). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One-time reformat of the never-formatted backend to ruff defaults (36 files). Purely cosmetic — 209 tests pass, pyright 0 errors, behavior unchanged.
The Pydantic models declared several always-emitted fields as optional (and several absent-or-value fields as nullable), so the OpenAPI/Scalar docs + generated TS didn't match what the scanner actually sends:
- GitMeta.created/modified, RepoInfo.* -> required-nullable (always present, can be null): drop the '= None' default.
- FileNode.media_width/height, Manifest.display_root, and the SSE event models' display_root/stage/percent/files_scanned -> optional-but-NON-nullable (absent-or-value, never null): keep the Optional Python type but emit a non-nullable JSON schema via pydantic.WithJsonSchema (shared OptionalInt/OptionalStr aliases). Avoids the pyright-strict 'None not assignable to int' that a plain non-Optional default would trip.
Generated TS now matches the hand-written manifest.ts exactly, so the drift guard is upgraded: pure-scalar types (GitMeta/RepoInfo/CommitEntry/ExtBreakdownEntry/BusynessThresholds) now use DEEP type equality (catches optionality/nullability drift), not just key sets. Verified the deep guard bites.
Backend 209 + pyright 0 + ruff clean; frontend 2160 + typecheck/build clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /api/manifest SSE stream shipped uncompressed (Starlette's GZipMiddleware skips text/event-stream because it buffers, which would stall the live stream). Measured: a 379-file repo's manifest is 250 KB -> 28 KB gzip (8.8x); JSON compresses ~9x, so big repos win a lot. Add SSEGZipMiddleware: an ASGI middleware that gzips text/event-stream with a Z_SYNC_FLUSH after every event, so the skeleton/progress events still arrive early (Z_FINISH writes the trailer on the last body). Content-negotiated on Accept-Encoding (every browser sends gzip; raw sockets / odd proxies get identity). Browsers + httpx decode EventSource gzip transparently. gzip (stdlib, zero deps) over brotli — brotli would shave ~15-25% more but needs a dependency; swappable in the one middleware if that changes. Verified live (gzip magic bytes + round-trip) and via tests. Backend 211 + pyright 0 + ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
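The per-event flush can be shown with the standard `zlib` module. This is a sketch of the compression technique only, assuming a hypothetical `SseGzipEncoder` class; the ASGI plumbing and `Accept-Encoding` negotiation of the real `SSEGZipMiddleware` are left out.

```python
import gzip
import zlib


class SseGzipEncoder:
    """Incrementally gzips SSE frames, flushing after each one."""

    def __init__(self, level: int = 6) -> None:
        # wbits=31 (16 + 15) selects the gzip container instead of raw zlib.
        self._compressor = zlib.compressobj(level, zlib.DEFLATED, 31)

    def encode_event(self, frame: bytes) -> bytes:
        # Z_SYNC_FLUSH emits everything buffered so far on a byte boundary,
        # so the client can decode this event without waiting for more input.
        return self._compressor.compress(frame) + self._compressor.flush(
            zlib.Z_SYNC_FLUSH
        )

    def finish(self) -> bytes:
        # Z_FINISH writes the final block plus the gzip trailer.
        return self._compressor.flush(zlib.Z_FINISH)


FIRST = b'event: scan-progress\ndata: {"files_scanned": 10}\n\n'
LAST = b"event: manifest-complete\ndata: {}\n\n"

encoder = SseGzipEncoder()
first_chunk = encoder.encode_event(FIRST)
last_chunk = encoder.encode_event(LAST)
trailer = encoder.finish()

# A streaming decoder recovers the first event before the stream has ended.
decoder = zlib.decompressobj(31)
early = decoder.decompress(first_chunk)
```

The trade-off is a few extra bytes per flush in exchange for keeping the stream live, which matters here because progress events are only useful if they arrive before the final manifest.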
Summary
Replaces the hand-rolled `http.server` backend (~1k-line `api/server.py`) with a clean, layered FastAPI app on uvicorn. Same product behavior, much simpler internals:
- `routers/` over framework-agnostic `services/`; Pydantic wire `models/`; `security.py` (the `allowed_roots` trust set); `static.py` (SPA serving).
- `GET /api/manifest` is now Server-Sent Events (named events) instead of bespoke NDJSON + a `select()` disconnect watchdog. Disconnect/cancel is handled by Starlette's `request.is_disconnected()`.
- `api/models/*.py` are the single source of truth → OpenAPI → generated `app/src/types/manifest.generated.ts` (with a compile-time drift guard). Kills the old hand-synced `manifest.ts` drift.
- Scalar docs at `/api/docs`; OpenAPI at `/api/openapi.json`.
- …(`security.py`).

Notable changes beyond the straight port
- Wire events renamed to `clone-progress`, `scan-progress`, `manifest-partial`, `manifest-complete`, `error` (synced across wire ↔ `scan_tree` phases ↔ Pydantic models ↔ frontend `ScanPhase` ↔ generated types).
- Clone progress is streamed via `clone-progress` and the clone is abortable on disconnect.
- Source resolution moved to `api/services/source.py`; single-sourced the cache root in `config.CACHE_ROOT` (clone gets `CLONES_ROOT`).
- `streamManifest` rewritten fetch+NDJSON → `EventSource`, same async-iterable contract; malformed frames reject instead of hanging.

Test plan
- `just test` (api pytest + app vitest, in containers): green (209 api, 2160 app).
- `uv run pyright` → 0 errors; `cd app && npm run typecheck && npm run build` → clean.
- `just dev`, load a git URL → cloning→scanning→skeleton→final overlay advances, city builds, file preview + commit pane work; load a local repo via `just dev <path>` and confirm live-update poll.

Breaking / follow-ups
- SSE event names changed (`cloning`→`clone-progress`, etc.). Backend + frontend ship together, so no concern for this app's single-bundle deploy.
- Follow-ups (`TODO.md`): streaming gzip over SSE (ships uncompressed first; measure on a large repo before adding); wire-model optionality accuracy (`GitMeta`/`RepoInfo` Pydantic `= None` vs always-emitted).

🤖 Generated with Claude Code