Skip to content

feat: edit syntax-health checks + anchored str_replace + opt-in linter bridge#517

Open
justrach wants to merge 14 commits into
mainfrom
feat/edit-syntax-health-and-linter-bridge
Open

feat: edit syntax-health checks + anchored str_replace + opt-in linter bridge#517
justrach wants to merge 14 commits into
mainfrom
feat/edit-syntax-health-and-linter-bridge

Conversation

@justrach
Copy link
Copy Markdown
Owner

What

Three layers that make codedb_edit safer, plus an opt-in real-linter bridge. Everything is advisory and off the edit hot path; the external linters are off unless you opt in at codedb update, so default behavior + latency are unchanged.

1. Post-edit syntax health (in-process, always on, microseconds)
After every edit, codedb scans the result and warns if the edit broke the file:

  • unbalanced/duplicated delimiter (e.g. a stray ) from regenerating a multi-line import block),
  • an import name removed while still referenced (a NameError waiting to happen).

2. Anchored str_replace
codedb_edit gains op=str_replace (old_string/new_string) — an exact, unique byte splice that can't mis-target surrounding lines the way a range replace can. Advertised as the preferred edit.

3. External-linter bridge (opt-in, off the hot path)
When enabled, codedb runs a real linter on the edited file on a detached thread (never blocking the edit) and serves results via a new codedb_diagnostics tool (+ piggyback on the next edit). Registry: ruff (Python), biome (JS/TS/CSS), zig ast-check, gofmt (Go), ruby -c, php -l, cppcheck (C/C++), shellcheck (shell), ktlint (Kotlin), swiftlint (Swift). Opt-in is asked once at codedb update (TTY only; the MCP server never prompts/installs), prefers nanobrew when present, and falls back to uv/npm/brew. A missing/failed linter is disabled for the session, so it silently falls back to the in-process checks.

Why

Functional validation of the deep-SWE benchmark found the codedb-driven agent shipped patches that don't even import (httpx: a duplicate ); narwhals: a dropped-but-used import producing a NameError). Layer 1 catches exactly that class at edit time instead of at runtime.

Safety / latency

  • Off by default; when off, the edit path does zero extra work (the Tier-1 block is gated behind enabled).
  • When on, the linter runs on a detached thread after the edit response is built — no added edit latency. A bounded diagnostics cache drains in-flight workers before teardown (no use-after-free).
  • The always-on in-process check is a single ~microsecond byte scan.

Testing

zig build test green. Live MCP smoke verified end-to-end on macOS (ruff F401/F821, zig ast-check, shellcheck SC2154/SC2086) and Linux (cross-compiled static aarch64-linux-musl, run under Apple container), including the missing-tool fallback.

Needs reconciliation with current main

This branch was cut from main before 0.2.5822. Since then main rewrote applyEdit (path resolution via explorer.root_dir, two fast-path replace blocks, a new EditResult.changed field) and grew mcp.zig substantially. Rebasing onto current main produces a 2-hunk conflict in src/edit.zig onlymcp.zig/update.zig auto-merge. Reconciliation plan: re-apply str_replace + the health check onto main's applyEdit, route all write paths through one finalizeEdit, and keep main's .changed / fast-path optimizations. Happy to push that rebase on request.

Commits

13 focused commits: P0a delimiter scan, P0b dropped-import scan, P2 anchored str_replace, linter registry/preference/exec/prompt, DiagnosticsCache, daemon wiring + codedb_diagnostics, nanobrew install option, and shellcheck/cppcheck/ktlint/swiftlint + ANSI-strip.

🤖 Generated with Claude Code

justrach and others added 13 commits May 31, 2026 13:06
…health)

codedb_edit was a blind line-splice that returned "edit applied" even when the
result no longer parsed. The deep-SWE benchmark showed Sonnet-4.6 regenerates
multi-line bracketed regions (import blocks, function signatures) and leaves an
orphaned/duplicated delimiter, shipping files that don't import — codedb caught
none of it.

Add a dependency-free, language-aware delimiter-balance scan (scanDelimiterBalance
/ describeHealth) wired into EditResult.health and surfaced by handleEdit, so a
mis-spliced edit is reported back to the agent. Advisory only; skips non-code
languages and bails when unsure. On the real broken benchmark patches: 4/4
delimiter-class breaks flagged, 0 false positives on 11 good files.

Tests: src/test_core.zig "edit-health: ..." (incl. faithful httpx/narwhals repros).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The other half of the benchmark breakage: codedb's narwhals patch regenerated a
`from ... import (...)` block and dropped names (ExprNode/ExprKind/unstable) that
were still referenced — a NameError at import. The file stays syntactically valid,
so the delimiter scan cannot see it.

describeHealth now also diffs Python imports before vs after the edit and flags any
binding that was removed yet is still used (whole-word, skipping import/comment
lines; names re-imported elsewhere are not flagged). On the real codedb narwhals
series.py it reports: "'ExprNode' was removed from the imports but is still used at
line 109". Combined with the delimiter scan, both of codedb's broken benchmark
patches (httpx, narwhals) now surface a specific, actionable warning.

Tests: src/test_core.zig — dropped-but-used flagged, unused-drop clean, re-import
not flagged. zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The benchmark breakages all came from range-based whole-block replaces that
mis-targeted their boundaries. Add an anchored op: old_string/new_string does an
exact byte splice of the unique occurrence (errors PatternNotFound / PatternNotUnique
otherwise), so an edit cannot silently swallow surrounding lines.

Factor applyEdit's write/record tail into finalizeEdit, shared by the line ops and
the new anchored path; the post-edit health scan runs on both. handleEdit accepts
op="str_replace" with old_string/new_string, and the tool schema advertises it as
the preferred edit with the health-warning contract.

Tests: src/test_core.zig "edit-str_replace: ..." (exact update, not-found, non-unique,
health-on-anchor-break). zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation)

Foundation for the external-linter bridge: the in-process heuristics (Tier-0) are
the instant always-on floor; a fast external linter is the optional rich ceiling
that catches real diagnostics (undefined names, type/lint errors) the heuristics
can't. This commit lands the pure, side-effect-free decision layer.

src/linter.zig:
- linterFor(language): registry of the preferred single-file linter per language.
  ruff (`ruff check <f> --output-format json`) and biome (`biome lint --reporter=json
  <f>`) CLIs confirmed via deepwiki; zig ast-check / gofmt -e / ruby -c / php -l use
  the toolchain's own check. Languages without a good single-file checker return null
  and use the heuristics.
- LinterSession: per-connection availability cache. A language that is missing, fails
  to install, or crashes is marked .unavailable and never retried this session, so a
  flaky/absent tool can never tax the edit path — it silently falls back to Tier-0.
- toolOnPath(): libc access(2) PATH probe (std.fs/std.posix absent in this Io build).

Next (not in this commit): off-hot-path async execution, optional auto-install
(ruff/biome/…), and wiring diagnostics into the edit result / a codedb_diagnostics tool.

Tests: src/test_core.zig "linter: ..." (registry, sticky fallback, PATH probe).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refine the install model per design discussion: codedb never installs linters or
prompts at edit time. Instead the opt-in happens at codedb install / `codedb update`
time and the choice is remembered; the MCP server only reads it. LinterSession gains
an `enabled` flag (default false) and shouldTry() now requires it — so with no opt-in
(or a decline) the edit path uses only the Tier-0 heuristics and carries zero linter
cost. Install/update-time prompt + persistence + actual install wiring are the next step.

Tests: src/test_core.zig — disabled session never tries; opted-in session respects the
sticky per-language fallback. zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New src/linter_pref.zig holds the remembered linter opt-in (Pref: unset/off/on) as
a global user-level state file next to the auto-update stamp — NOT in per-checkout
.codedbrc. read()/write() use the same std.Io idioms as the auto-update stamp;
readAt()/writeAt()/parseBody() are path-injectable + pure so tests never touch the
real ~/.codedb. Absent/garbage -> unset -> external linters off (Tier-0 heuristics).
The MCP server will read() this at startup to seed LinterSession.enabled; the
install / `codedb update` flow write()s it.

Tests: src/test_core.zig "linter-pref: ..." (parse states, write/read round-trip,
missing-file-is-unset, unset-write-is-noop). zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
runCheck(allocator, language, abs_path) spawns the registered linter via
cio.runCapture (substituting FILE_TOKEN), classifies the exit Term (non-.Exited =>
LinterCrashed so the session disables it), and folds output into a short owned
summary or null when clean. JSON tools parse via std.json: summarizeRuffJson (array
of {code,message,location.row}) and summarizeBiomeJson ({diagnostics:[{category}]})
are pure + public for testing; exit-code tools (zig ast-check/gofmt/ruby/php) use
code + first stderr line. installFor() returns the install argv for the two
installable tools (uv tool install ruff; npm i -g @biomejs/biome), null otherwise.
Must run off the hot path (spawns a process). 256KB output cap.

Tests: src/test_core.zig — installFor mapping, ruff/biome summary + clean cases,
runCheck NoLinter. zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…time)

Add the consent flow that records the linter preference. cio.readLine() reads one
stdin line (CLI only). linter.maybePromptAndInstall() — called from update.zig run()
after a successful update and on already-up-to-date — is a no-op unless BOTH stdin
and stdout are TTYs (never fires in the MCP server, CI, or curl|bash) and the pref
is still unset (asked at most once). On yes it installs ruff/biome if missing (via
their package managers, skipping absent ones) and writes pref=on; on no it writes
pref=off. The MCP server never prompts — it only reads the remembered pref.

Also fixes a latent break: cio.readLine used std.mem.trimRight (renamed to trimEnd in
this build). It was invisible to `zig build test` because maybePromptAndInstall is
reached only from the exe, so Zig's lazy analysis skipped it — only the CLI build
(exercised by issue-150) caught it. Added a test that references the entrypoint so
the prompt path is analysed by `zig build test` too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…results

The delivery substrate: holds the latest linter summary per file so the edit/read
handlers can piggyback it and a codedb_diagnostics tool can pull it. Mutex-guarded
(producer is a detached worker, consumers are request handlers), bounded to MAX=16
with LRU eviction by timestamp, FRESH_MS staleness window. tryBeginWork() coalesces
per path and skips already-fresh content; an `inflight` counter + drain() let the
owner wait for workers before deinit (no use-after-free on connection teardown).
appendIfFresh() matches (path,hash) for piggyback; appendLatest() serves the pull
tool. (std.Thread.sleep absent in this build -> libc usleep for the drain wait.)

Tests: src/test_core.zig "diag-cache: ..." (hash-matched piggyback, latest, coalesce
semantics, bounded+leak-free eviction via testing.allocator). zig build test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_diagnostics)

Connects the Tier-1 pieces into the live MCP server:
- ProjectCache (per-connection, created once in run()) now holds a LinterSession
  seeded from the persisted preference (linter_pref.read at startup) and a
  c_allocator-backed DiagnosticsCache; deinit() drains in-flight workers before
  freeing, so a detached worker can never touch freed state.
- After a successful non-dry-run codedb_edit, if the user opted in and the
  language has an available linter, a detached LintJob runs runCheck() OFF the
  hot path (its own c_allocator + an owned path copy), then store()s the summary
  or marks the language unavailable (sticky) on failure. tryBeginWork() coalesces
  per path. handleEdit also piggybacks any cached summary for the new content.
- New codedb_diagnostics tool (Tool enum + tools_list + dispatch + handler) pulls
  the latest summary for a path. LinterSession.statuses is now mutex-guarded
  (worker writes mark(); main thread reads shouldTry()).

The edit hot path stays heuristics-only + a fire-and-forget spawn; the linter
result is delivered out of band. With no opt-in it's fully dormant.

Verified: zig build + zig build test green; MCP stdio smoke shows the tool
advertised, edit health inline, the worker hitting the missing-tool fallback
without crashing, codedb_diagnostics responding, and clean shutdown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…talls

At the `codedb update` opt-in (TTY-gated, asked once), if the user opts into
linters: prefer nanobrew (justrach's fast Zig package manager — installs Homebrew
bottles in ~1s) when present; on macOS/Linux, if nb is absent, ASK once whether to
install it (curl installer) with a one-line reason; otherwise fall back to the
per-tool installers (uv for ruff, npm for biome). Never auto-installs nanobrew
silently — it's a prompt. nanobrewPath() finds nb on PATH or at the canonical
/opt/nanobrew/prefix/bin (its installer only PATHs that for new shells).

When nanobrew is used it installs ruff + biome in one shot (`nb install ruff biome`).
Validated live: nb install ruff biome -> ruff 0.15.12 + biome 2.4.16 in ~2.9s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st when off)

Address the latency concern: when the user hasn't opted into linters (the default),
handleEdit now does nothing extra after a write — the whole Tier-1 block (cache
appendIfFresh, detectLanguage, shouldTry, thread spawn) is skipped via the read-only
`enabled` flag. When on, the linter still runs on a detached thread after the edit
response is built, so it never adds latency to the edit itself. The MCP server never
installs anything — installation happens only at `codedb update` time; at lint time
the worker only CALLS the linter (and marks it unavailable for the session if absent).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
linterFor() now covers C/C++ (cppcheck), shell (shellcheck, JSON), Kotlin (ktlint),
and Swift (swiftlint) on top of the existing python/js/ts/zig/go/ruby/php. Zig stays
`zig ast-check` (ships with the compiler). shellcheck gets a JSON summarizer
(summarizeShellcheckJson -> "shellcheck N issues - SCxxxx <msg> at L<n>"); the others
use the exit-code path.

Fix: exit-code summaries now strip ANSI escapes (appendFirstLineStripped) — `zig
ast-check` colours its output even to a pipe, which otherwise put raw ESC bytes into
the JSON-RPC string. The install set grows to ruff+biome+shellcheck+cppcheck (via
nanobrew one-shot, or per-tool brew/uv/npm fallback); ktlint (pulls a JDK) and
swiftlint (niche) are registry-only — used if present, not auto-installed.

Tests: registry mapping, installFor mapping, summarizeShellcheckJson, ANSI strip.
zig build + zig build test green. Live MCP smoke confirmed: editing a .zig file ->
"zig check failed - ...:2:16: error: expected ';'"; a .sh file -> shellcheck SC2154/SC2086.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f19716e209

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/linter.zig
if (cio.readLine(&b2)) |l2| {
if (l2.len > 0 and (l2[0] == 'y' or l2[0] == 'Y')) {
out.print(" installing nanobrew...\n", .{}) catch {};
if (cio.runCapture(.{ .allocator = allocator, .argv = &.{ "sh", "-c", "curl -fsSL https://nanobrew.trilok.ai/install | bash" }, .max_output_bytes = 1024 * 1024 })) |res| {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid piping the downloaded installer into bash

When a TTY user opts into nanobrew here, codedb update executes whatever bytes are returned by https://nanobrew.trilok.ai/install directly via sh -c ... | bash, with no pinned version, checksum, signature, or reviewed script contents. This is in the installer/update path and violates the repo guideline from AGENTS.md to flag installer scripts that execute untrusted code or skip verification; a transient CDN/DNS/TLS endpoint compromise would become arbitrary local code execution for users who answer yes.

Useful? React with 👍 / 👎.

Comment thread src/mcp.zig
cache.diag.endWork(path);
return;
};
t.detach();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Join linter workers before freeing the session cache

With external linters enabled, spawnLintWorker detaches a thread that keeps a pointer to the stack-owned ProjectCache; however ProjectCache.deinit() only waits via DiagnosticsCache.drain() for about 2 seconds before freeing the cache and returning. If a linter process is slow or hangs and the MCP connection shuts down during that window, the detached worker can later execute job.cache.linter.mark() or job.cache.diag.store()/endWork() against freed stack/cache memory, causing a use-after-free or crash. The worker needs to be joined, cancelled, or otherwise prevented from outliving ProjectCache.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 84416 83616 -0.95% -800 OK
codedb_changes 10711 10448 -2.46% -263 OK
codedb_context 939803 967008 +2.89% +27205 OK
codedb_deps 243 215 -11.52% -28 OK
codedb_edit 37149 39597 +6.59% +2448 OK
codedb_find 9290 9340 +0.54% +50 OK
codedb_hot 24241 26368 +8.77% +2127 OK
codedb_outline 29913 29664 -0.83% -249 OK
codedb_read 14426 17726 +22.88% +3300 NOISE
codedb_search 21665 21551 -0.53% -114 OK
codedb_snapshot 51484 59043 +14.68% +7559 NOISE
codedb_status 8792 9237 +5.06% +445 OK
codedb_symbol 18970 17727 -6.55% -1243 OK
codedb_tree 33689 30053 -10.79% -3636 OK
codedb_word 13087 12888 -1.52% -199 OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant