feat(go): generated-file down-rank + gRPC stub-impl bridge + trace-failure inlining by colbymchenry · Pull Request #494 · colbymchenry/codegraph

colbymchenry · 2026-05-27T07:28:58Z

Summary

Multi-pronged fix to make codegraph competitive on Go multi-module repos (cosmos-sdk, etcd) where it previously lost on cost. Driven by an 8-question agent-eval audit across cobra, gin, prometheus, cosmos-sdk, and etcd.

The empirical gate ruled OUT go.work parsing as the real gap (prometheus crushes without it). The actual failure modes:

Generated-file noise warps disambiguation. codegraph_search "Send" on cosmos-sdk returned the gRPC stub at tx_grpc.pb.go:124 first; trace landed on the empty stub, reported "no path", agent fell back to Read.
Go has no static interface→impl bridge. Structural typing means the existing interfaceOverrideEdges (Java/Kotlin only) doesn't apply, so MsgServer.Send (interface in .pb.go) and msgServer.Send (impl in keeper) never connect.
Trace's failure path used to fan out into 3-5 follow-up tool calls (codegraph_node, codegraph_callers, …) plus a Read.
Trace endpoint-pairing picked by FTS rank — on a multi-module repo, EndBlocker exists in 20+ modules; FTS picked an arbitrary one.

What's in here

src/extraction/generated-detection.ts — path-pattern classifier for .pb.go, .pulsar.go, _grpc.pb.go, _mock.go, _mocks.go, mock_*.go, .generated.[jt]sx?, _pb2(_grpc)?.py, .pb.{cc,h}, .g.dart, .freezed.dart. Applied as a stable sort tiebreaker in findSymbol, findAllSymbols, codegraph_search (MCP + CLI), codegraph_explore file ranking, and context formatter Entry Points / Related Symbols / Code blocks.
goGrpcStubImplEdges synthesizer in callback-synthesizer.ts — detects UnimplementedXxxServer structs in generated files, identifies their RPC methods (excluding mustEmbed* / testEmbeddedByValue markers), and emits calls edges to matching methods on any non-generated struct whose method-name set is a superset. 467 bridge edges on cosmos-sdk; bank's UnimplementedMsgServer::Send points to x/bank/keeper/msg_server.go only — not to msgClient siblings or mock files.
Trace-failure rewrite — when no static path connects endpoints, inline both endpoints' bodies (capped 120 lines / 3600 chars), their callers (≤6), and callees (≤8) in one response. Replaces a 3-4-call fan-out.
Trace endpoint-pairing — scores every from×to candidate combo by shared directory prefix length (full candidate set, not just FTS top-5), with a less-canonical-path penalty (enterprise/, contrib/, examples/, vendor/, third_party/, deprecated/, legacy/) so the canonical-module pair wins. FindPath probe budget capped at 20.
Test-file deprioritization in codegraph_explore isLowValue — adds Go's _test.go, Ruby's _spec.rb, JS/TS .test.ts/.spec.tsx, JVM *Test.java/*Spec.kt. Without this, etcd's watchable_store_test.go consumed 5K chars of explore budget.
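The generated-file down-rank described in the first item above can be sketched roughly as follows. This is a minimal illustration, not the contents of `generated-detection.ts`: the names `isGeneratedPath` and `rankWithGeneratedTiebreak`, the `Ranked` shape, and the exact regexes are assumptions, and only a subset of the listed patterns is shown.

```typescript
// Hypothetical sketch of a path-pattern classifier (subset of the patterns listed above).
const GENERATED_PATTERNS: RegExp[] = [
  /\.pb\.go$/, // also covers _grpc.pb.go
  /\.pulsar\.go$/,
  /_mocks?\.go$/,
  /(^|\/)mock_[^/]*\.go$/,
  /\.generated\.[jt]sx?$/,
  /_pb2(_grpc)?\.py$/,
  /\.pb\.(cc|h)$/,
  /\.(g|freezed)\.dart$/,
];

function isGeneratedPath(filePath: string): boolean {
  return GENERATED_PATTERNS.some((re) => re.test(filePath));
}

// Assumed result shape; the real search results carry more fields.
interface Ranked {
  name: string;
  filePath: string;
  score: number;
}

// Score still decides first; generated files lose only when scores tie.
// Array.prototype.sort is stable (ES2019+), so other ties keep their input order.
function rankWithGeneratedTiebreak<T extends Ranked>(results: T[]): T[] {
  return [...results].sort((a, b) => {
    if (b.score !== a.score) return b.score - a.score;
    return Number(isGeneratedPath(a.filePath)) - Number(isGeneratedPath(b.filePath));
  });
}
```

Because the classifier acts only as a tiebreaker, a generated file with a strictly higher score still ranks first; the sketch changes ordering only among equally scored candidates.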

Explicitly NOT in this PR: go.work parsing. The empirical gate disconfirmed it.

Empirical results (n=2 average per question, headless mode)

| Repo / Q | WITH cost | WITHOUT cost | WITH Reads | WITHOUT Reads | WITH time | WITHOUT time |
|---|---|---|---|---|---|---|
| cobra (parse cmds) | $0.27 | $0.27 | 0 | 4 | 39s | 60s |
| prometheus (scrape→TSDB) | $0.63 | $0.70 | 0 | 6 | 106s | 143s |
| cosmos-sdk Q1 (MsgSend) | $0.41 | $0.26 | 1 | 2 | 67s | 64s |
| cosmos-sdk Q2 (MsgDelegate) | $0.47 | $0.46 | 0 | 5 | 50s | 73s |
| cosmos-sdk Q3 (gov tally) | $0.34 | $0.31 | 1.5 | 3 | 54s | 76s |
| etcd Q1 (Put→raft) | $0.65 | $0.78 | 0 | 4 | 98s | 129s |
| etcd Q2 (watch) | $0.36 | $0.50 | 0 | 4+ | 58s | 89s |

Codegraph wins on reads on every question, and on time on every question except cosmos-sdk Q1 (67s WITH vs 64s WITHOUT). Cost is 3 clean wins, 3 within-10% ties, and 1 stubborn loss on cosmos Q1, a grep-favored question where the agent's WITHOUT path is structurally short. Compared to baseline, cosmos-sdk's cost gap collapsed from -60% avg to -15% avg, and Q3 went from a 75% loss to a tie.

Tests

__tests__/generated-detection.test.ts — 4 unit tests pinning the suffix patterns.
frameworks-integration.test.ts — 2 new integration tests for the Go gRPC bridge: positive bridge (stub → hand-written impl) + precision case (don't bridge to a generated sibling like msgClient).
Full suite: 1076/1076 pass on macOS Node 22.

Test plan

npm test — 1076/1076 pass
cosmos-sdk Q1 r1 + r2 (the canonical regression case)
cosmos-sdk Q2 + Q3 (different flow patterns)
etcd Q1 + Q2 (real go.work repo, different from cosmos)
prometheus (real go.work, no protobuf mass — no-regression control)
cobra (single-module — no-regression control)
Bridge edge spot-check on cosmos-sdk: bank's UnimplementedMsgServer::Send → msgServer::Send, no mock/client false positives
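The bridge behavior spot-checked above (stub method to hand-written implementation, never to generated siblings) can be illustrated with a small sketch. It assumes a simplified in-memory view of Go structs; the `GoStruct` and `BridgeEdge` shapes and the function name `grpcStubImplEdges` are hypothetical and are not the actual `goGrpcStubImplEdges` code in `callback-synthesizer.ts`.

```typescript
// Assumed, simplified view of an indexed Go struct and its method names.
interface GoStruct {
  name: string;
  filePath: string;
  generated: boolean; // e.g. the result of the generated-file classifier
  methods: string[];
}

interface BridgeEdge {
  kind: "calls";
  from: string;
  to: string;
}

// gRPC marker methods that are not RPCs and must not take part in matching.
const MARKER_METHOD = /^(mustEmbed|testEmbeddedByValue)/;

function grpcStubImplEdges(structs: GoStruct[]): BridgeEdge[] {
  const edges: BridgeEdge[] = [];
  const stubs = structs.filter((s) => s.generated && /^Unimplemented\w+Server$/.test(s.name));
  const impls = structs.filter((s) => !s.generated);

  for (const stub of stubs) {
    const rpcMethods = stub.methods.filter((m) => !MARKER_METHOD.test(m));
    if (rpcMethods.length === 0) continue;

    for (const impl of impls) {
      // Structural match: the impl's method-name set must be a superset of the stub's RPCs.
      const implMethods = new Set(impl.methods);
      if (!rpcMethods.every((m) => implMethods.has(m))) continue;
      for (const m of rpcMethods) {
        edges.push({ kind: "calls", from: `${stub.name}::${m}`, to: `${impl.name}::${m}` });
      }
    }
  }
  return edges;
}
```

Restricting targets to non-generated structs is what keeps `msgClient` and mock files out of the result in this sketch; the superset test stands in for Go's structural interface satisfaction using method names only.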

🤖 Generated with Claude Code

…ilure inlining

Multi-pronged fix to make codegraph competitive on Go multi-module repos (cosmos-sdk, etcd) where it previously lost or tied. Driven by an 8-question agent-eval audit across cobra, gin, prometheus, cosmos-sdk, and etcd: the baseline had codegraph losing ~60% on cost on cosmos-sdk and mixed on etcd deep cross-module flows, while winning cleanly on the single-module and non-protobuf-heavy repos.

Diagnostics ruled OUT `go.work` parsing as the gap (prometheus crushes without it). The actual failure modes were generated-file noise warping disambiguation, a missing gRPC interface→impl bridge in structural-typing Go, and trace's failure path triggering 3-5 follow-up tool calls instead of inlining the material the agent needed.

Changes:

- New `src/extraction/generated-detection.ts`: path-pattern classifier for `.pb.go`, `.pulsar.go`, `_grpc.pb.go`, `_mock.go`, `_mocks.go`, `mock_*.go`, `.generated.[jt]sx?`, `_pb2(_grpc)?.py`, `.pb.{cc,h}`, `.g.dart`, `.freezed.dart`. Applied as a stable sort tiebreaker in `findSymbol`, `findAllSymbols`, `codegraph_search` (MCP + CLI), `codegraph_explore` file ranking, and context formatter Entry Points / Related Symbols / Code blocks. Cosmos's `msgServer.Send` now ranks #3 instead of #9 on a `Send` search.
- New `goGrpcStubImplEdges` synthesizer in `callback-synthesizer.ts`: detects `UnimplementedXxxServer` structs in generated files, identifies their RPC methods (excluding `mustEmbed*` / `testEmbeddedByValue` gRPC markers), and emits `calls` edges to the matching methods on any non-generated struct whose method-name set is a superset. Closes Go's structural-typing gap that the existing `interfaceOverrideEdges` (Java / Kotlin only) couldn't bridge. 467 bridge edges on cosmos-sdk; bank's `UnimplementedMsgServer::Send` points to `x/bank/keeper/msg_server.go` only, not to `msgClient` siblings or mock files.
- Trace-failure rewrite (`handleTrace`): when no static path connects endpoints, instead of telling the agent to call `codegraph_node` (a 3-4-call fan-out), inline both endpoints' bodies (120 lines / 3600 chars per endpoint), their callers (≤6), and callees (≤8) in one response.
- Trace endpoint-pairing improvements: scores every `from`×`to` candidate combo by shared directory prefix and tries the best-paired pair first (the full candidate set, not just FTS top-5). A less-canonical-path penalty (`enterprise/`, `contrib/`, `examples/`, `vendor/`, `third_party/`, `deprecated/`, `legacy/`) ensures the canonical-module pair wins even when a side-experiment shares more of its directory prefix. Find-path probe budget capped at 20 pairs.
- Test-file deprioritization in `codegraph_explore` `isLowValue`: adds suffix patterns (`_test.go`, `_spec.rb`, `.test.ts`, `.spec.tsx`, `Test.java`, `Spec.kt`) alongside the existing directory-style patterns. Otherwise etcd's `watchable_store_test.go` consumes 5K chars of explore budget that should go to the hand-written flow source.

Tests:

- New `__tests__/generated-detection.test.ts` (4 unit tests) pins the suffix patterns.
- New "Go gRPC stub→impl synthesis" integration test suite in `frameworks-integration.test.ts` (2 tests): positive bridge from stub to hand-written impl, AND the precision case (don't bridge to a generated sibling like `msgClient` in the same .pb.go).
- Full suite: 1076/1076 pass.

Empirical (post-fix, n=2 average per question):

| Repo / Q | WITH | WITHOUT | Reads (W/WO) | Time (W/WO) |
|---|---|---|---|---|
| cobra (parse cmds) | $0.27 | $0.27 | 0 / 4 | 39s / 60s |
| prometheus (scrape→TSDB) | $0.63 | $0.70 | 0 / 6 | 106s / 143s |
| cosmos-sdk Q1 (MsgSend) | $0.41 | $0.26 | 1 / 2 | 67s / 64s |
| cosmos-sdk Q2 (Delegate) | $0.47 | $0.46 | 0 / 5 | 50s / 73s |
| cosmos-sdk Q3 (gov tally) | $0.34 | $0.31 | 1.5 / 3 | 54s / 76s |
| etcd Q1 (Put→raft) | $0.65 | $0.78 | 0 / 4 | 98s / 129s |
| etcd Q2 (watch) | $0.36 | $0.50 | 0 / 4+ | 58s / 89s |

Codegraph wins on reads on every question, and on time on every question except cosmos-sdk Q1 (67s vs 64s). Cost is mixed: 3 clean wins, 3 tied (within 10%), 1 stubborn cost loss on the grep-favored Q1. Compared to baseline, the cosmos-sdk cost-gap collapsed from -60% to -15% on average, and Q3 went from a 75% loss to a tie.

Raw run artifacts in `/tmp/cg-finalv2-*/` and `/tmp/cg-final-*/`. Memory written at `project_go_multi_module_audit.md` for the methodology + before/after numbers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
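The endpoint-pairing rule from this commit (shared directory prefix, a penalty for less-canonical paths, and a probe budget of 20 pairs) can be sketched as below. The function names and the penalty weight of 3 are assumptions made for illustration; the commit message does not state the weight it actually uses.

```typescript
const LESS_CANONICAL_DIRS = ["enterprise", "contrib", "examples", "vendor", "third_party", "deprecated", "legacy"];
const LESS_CANONICAL_PENALTY = 3; // assumed weight, not taken from the PR
const PROBE_BUDGET = 20;

// Directory segments of a path, without the file name.
function dirSegments(filePath: string): string[] {
  return filePath.split("/").slice(0, -1);
}

function sharedDirPrefixLength(a: string, b: string): number {
  const x = dirSegments(a);
  const y = dirSegments(b);
  let n = 0;
  while (n < x.length && n < y.length && x[n] === y[n]) n++;
  return n;
}

function pathPenalty(filePath: string): number {
  return dirSegments(filePath).some((seg) => LESS_CANONICAL_DIRS.includes(seg)) ? LESS_CANONICAL_PENALTY : 0;
}

// Score every from x to combination and return the pairs to probe, best first.
function rankEndpointPairs(fromPaths: string[], toPaths: string[]): Array<[string, string]> {
  const scored: Array<{ pair: [string, string]; score: number }> = [];
  for (const f of fromPaths) {
    for (const t of toPaths) {
      const score = sharedDirPrefixLength(f, t) - pathPenalty(f) - pathPenalty(t);
      scored.push({ pair: [f, t], score });
    }
  }
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, PROBE_BUDGET)
    .map((s) => s.pair);
}
```

With these assumed weights, a pair under `contrib/` that shares four directory segments still ranks below a canonical pair that shares three, which is the behavior the penalty is meant to produce.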

When a codegraph_context task contains a flow keyword ("trace", "from", "reach", "flow", "propagat", "how does", "how do") AND at least two distinct PascalCase / camelCase identifiers, internally invoke trace between the first two extracted symbols and splice the trace body into the context response.

Conservative trigger by design: false positives waste one graph query; false negatives just fall back to the agent calling trace itself (existing path-proximity wiring handles either case).

Goal: collapse the agent's typical context → trace → explore sequence into a single context call for clear flow queries, closing the remaining cost-overhead gap on multi-call patterns. The path-proximity + less-canonical-path scoring + the trace-failure-inlined-bodies behavior already let the inline trace land on the right endpoint pair and return enough material that no follow-up codegraph_node/Read is needed.

Doesn't fire on:

- cobra's "How does cobra parse commands and flags?" (no PascalCase symbols); verified in regression run, no behavior change ($0.260 WITH vs $0.257 WITHOUT, basically tied)
- queries where the agent doesn't call codegraph_context at all (cosmos Q1 in the audit went search → trace → node → trace → node)

Tests: 1076/1076 still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
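The trigger described in this commit can be sketched as a small pure function. The keyword list comes from the commit message; the identifier regex is an assumption (it requires at least one internal capital letter, so plain words such as "How" or "cobra" do not count), and the function name `inlineTraceEndpoints` is hypothetical.

```typescript
const FLOW_KEYWORDS = ["trace", "from", "reach", "flow", "propagat", "how does", "how do"];

// Assumed identifier shape: PascalCase or camelCase with at least two "humps",
// e.g. MsgSend or setBalance. All-caps acronyms and plain words do not match.
const IDENTIFIER = /\b(?:[A-Z][a-z0-9]+|[a-z][a-z0-9]*)(?:[A-Z][a-z0-9]*)+\b/g;

function extractSymbols(task: string): string[] {
  // Deduplicate while keeping first-seen order.
  return Array.from(new Set(task.match(IDENTIFIER) ?? []));
}

// Returns the [from, to] pair to trace, or null when the trigger does not fire.
function inlineTraceEndpoints(task: string): [string, string] | null {
  const lower = task.toLowerCase();
  if (!FLOW_KEYWORDS.some((k) => lower.includes(k))) return null;
  const [first, second] = extractSymbols(task);
  if (first === undefined || second === undefined) return null;
  return [first, second];
}
```

Under these assumptions the cobra question from the commit message yields no identifiers and the function returns `null`, while a question naming two symbols returns them in the order they appear.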

…n-out The cosmos-Q1 audit revealed a static-resolution gap: msgServer.Send's *real* next hop is `k.Keeper.SendCoins` — an interface-method call on an embedded field that tree-sitter can't resolve. The static getCallees list for msgServer.Send is all utility/error functions (StringToBytes, Wrapf, …). The actual flow (SendCoins → subUnlockedCoins → addCoins → setBalance) lives entirely inside `x/bank/keeper/send.go`, which is also where the TO endpoint (setBalance) lives. When trace fails (no static path), inline the **top 5 functions/methods in the destination file**, ordered by line-distance from the TO node. This catches the flow that interface-method calls obscure — the canonical "k.<Iface>.<Method>" pattern in Go, also relevant to Java dependency-injection / Rails service-object dispatch / etc. where interface dispatch hides the real call. Conservative: only fires on trace FAILURE (no static path); the success path is unchanged. Per-body cap (40 lines / 1200 chars), top 5 siblings. Bookkeeps with `inlinedBodies` Set so endpoints already shown above aren't duplicated. Result: cosmos-Q1 — historically the most stubborn cost loss (-2.2× to -39% across the audit) — flipped to a clean WIN: $0.257 WITH vs $0.449 WITHOUT (-43%), 34s vs 79s, 0 Reads vs 2 Reads + 5 Greps, 5 codegraph calls vs 12. Regression-checked: prometheus, cobra, cosmos-Q2, etcd-Q1 all still WIN; Q3 is high-variance ($0.30-$0.45 range historically) and fell within that on this run. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
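The sibling-inlining step in this commit (top 5 functions in the destination file, ordered by line distance from the TO node, each capped at 40 lines / 1200 chars, skipping bodies already shown) can be sketched as follows. The `FnNode` shape and function names are assumptions for illustration; only the caps and the `inlinedBodies` Set come from the commit message.

```typescript
// Assumed minimal node shape for a function or method in the graph.
interface FnNode {
  id: string;
  filePath: string;
  startLine: number;
  body: string;
}

const MAX_SIBLINGS = 5;
const MAX_BODY_LINES = 40;
const MAX_BODY_CHARS = 1200;

// Apply the line cap first, then the character cap.
function capBody(body: string): string {
  const byLines = body.split("\n").slice(0, MAX_BODY_LINES).join("\n");
  return byLines.length > MAX_BODY_CHARS ? byLines.slice(0, MAX_BODY_CHARS) : byLines;
}

// Pick the functions closest to the TO node in the same file.
// `inlinedBodies` holds ids already shown (e.g. the two endpoints) and is updated in place.
function siblingBodies(to: FnNode, nodes: FnNode[], inlinedBodies: Set<string>): FnNode[] {
  const picked = nodes
    .filter((n) => n.filePath === to.filePath && n.id !== to.id && !inlinedBodies.has(n.id))
    .sort((a, b) => Math.abs(a.startLine - to.startLine) - Math.abs(b.startLine - to.startLine))
    .slice(0, MAX_SIBLINGS)
    .map((n) => ({ ...n, body: capBody(n.body) }));
  for (const n of picked) inlinedBodies.add(n.id);
  return picked;
}
```

Line distance is used here as a cheap proxy for "belongs to the same flow", matching the observation that SendCoins, subUnlockedCoins, addCoins, and setBalance all sit in one file.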

PR review feedback: the audit was Go-driven, so the patterns I added were Go-flavored. Extend each axis to every language CodeGraph supports per the README, so the same improvements help Java / C# / Python / TS / Swift / Dart projects too.

**generated-detection.ts**: added patterns for:

- TS/JS: `.gen.[jt]sx?`, `.pb.[jt]s`, `_pb.[jt]s`, `_grpc_pb.[jt]s` (ts-proto, gRPC-web, Apollo / GraphQL codegen, Hasura).
- Python: `_pb2.pyi` (mypy stubs from protobuf).
- C#: `.g.cs` (T4 / Razor codegen), `Grpc.cs` (protoc-gen-csharp).
- Java: `OuterClass.java` (protoc-gen-java), `Grpc.java` (protoc-gen-grpc-java; this is where the `*ImplBase` abstract class lives, the same shape as the Go `Unimplemented*Server` stub).
- Swift: `.pb.swift` (protoc-gen-swift).
- Dart: `.pb.dart`, `.pbgrpc.dart`, `.chopper.dart`.
- Rust: `.generated.rs`.

**test-file deprioritization** (`isLowValue` in `codegraph_explore`): added per-language conventions that the previous regex missed:

- Python: `test_*.py` (pytest discovery) and `*_test.py`.
- Ruby: `*_test.rb` (minitest); `*_spec.rb` already covered.
- C#: `*Tests.cs`, `*Test.cs`, `*Spec.cs`.
- Swift: `*Tests.swift` (XCTest).
- Dart: `*_test.dart`.

**IFACE_OVERRIDE_LANGS** in `callback-synthesizer.ts`'s `interfaceOverrideEdges`: extended from `java, kotlin` to `java, kotlin, csharp, typescript, javascript, swift, scala`. Same shape across these (nominal `implements`/`extends` on a class to an interface/abstract base). Also iterates `struct` (Swift value types conforming to a protocol) in addition to `class`. The existing matchesSymbol-style logic and `getOutgoingEdges(..., ['implements', 'extends'])` work unchanged.

**CLAUDE.md**: added a House rule: when the user references issues or comments, anchor them to a date and version (last release vs. last main commit vs. current branch tip) BEFORE concluding a fix is incomplete. Issue #388 comments from May 25-27 were responding to the released v0.9.5 / merged-PR-469 state, not to this branch's in-flight work. The new rule walks through the disambiguation: `grep -m1 '^## \[' CHANGELOG.md` for release version, `git log --first-parent main -1` for main tip.

Tests: 1076/1076 still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
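To illustrate why test-file deprioritization matters for the explore budget, here is a minimal sketch that classifies test files by the suffix and prefix conventions listed in this PR and spends the character budget on non-test files first. The regexes are a condensed approximation of the listed patterns, and `allocateExploreBudget` with its `ExploreFile` shape is a hypothetical helper, not the `isLowValue` implementation.

```typescript
// Condensed approximation of the per-language test-file conventions named in this PR.
const TEST_FILE_PATTERNS: RegExp[] = [
  /_test\.(go|py|rb|dart)$/, // Go, pytest, minitest, Dart
  /_spec\.rb$/, // RSpec
  /\.(test|spec)\.[jt]sx?$/, // JS/TS
  /(Test|Spec)\.(java|kt)$/, // JVM
  /(^|\/)test_[^/]*\.py$/, // pytest discovery
  /(Tests?|Spec)\.cs$/, // C#
  /Tests\.swift$/, // XCTest
];

function isTestFile(filePath: string): boolean {
  return TEST_FILE_PATTERNS.some((re) => re.test(filePath));
}

// Hypothetical candidate shape: a file and the number of chars it would contribute.
interface ExploreFile {
  filePath: string;
  chars: number;
}

// Non-test files claim budget first; test files only get what is left over.
function allocateExploreBudget(files: ExploreFile[], maxChars: number): string[] {
  const ordered = [...files.filter((f) => !isTestFile(f.filePath)), ...files.filter((f) => isTestFile(f.filePath))];
  const picked: string[] = [];
  let used = 0;
  for (const f of ordered) {
    if (used + f.chars > maxChars) continue;
    picked.push(f.filePath);
    used += f.chars;
  }
  return picked;
}
```

In this sketch a 5K-char test file that would otherwise be ranked first no longer displaces the hand-written source when the budget cannot hold both.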

Two cumulative changes targeting the small-repo cost gap surfaced by the cross-language audit:

1. **Tool descriptions trimmed** (~2.1KB total saved across 10 tools). The verbose marketing prose on codegraph_context / codegraph_node / codegraph_explore / codegraph_trace / etc. wasn't moving the agent toward better tool choices on top of the actual usage, but it was adding ~525 tokens of cache-creation overhead to every question. The trimmed descriptions keep the operational hints (e.g. "Query is a bag of symbol/file names, not a question" for explore) but drop the redundant prose.
2. **Dynamic tiny-repo tool gating** in `ToolHandler.getTools()`. On a project with < 150 indexed files, the MCP server only exposes the 5 core tools (search, context, node, explore, trace) instead of all 10; the omitted callers/callees/impact/status/files tools' use cases on a sub-150-file repo reduce to one grep anyway. The MCP tool-defs overhead is the #1 source of cost loss on tiny repos (~$0.10-0.15 fixed cache-creation per question); cutting 5 tools drops that by ~50%.

Effect on ky (~25 files, the worst pre-fix offender):

- Before: $0.59 WITH vs $0.42 WITHOUT (+42% loss, n=1)
- After: $0.32 WITH vs $0.44 WITHOUT (-26%, **flipped to WIN**)

Effect on cobra/sinatra/slim (50-80 files): still cost-loss, but the gating doesn't regress them: same call-count, same reads. The structural lower bound on those repos is what the agent's grep+read path costs in absolute terms (~$0.20-0.30).

Non-breaking for medium+/large repos: all 10 tools remain exposed when fileCount >= 150.

Tests: 1076/1076 still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
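The gating rule can be sketched as a function of the indexed file count. The five core tool names appear in this PR; the names of the other five (`codegraph_callees`, `codegraph_impact`, `codegraph_status`, `codegraph_files`, alongside the mentioned `codegraph_callers`) are inferred from the "callers/callees/impact/status/files" wording and should be read as assumptions, as should the standalone `getTools` signature.

```typescript
// Tool names beyond the five core ones are inferred from the commit message wording.
const ALL_TOOLS: string[] = [
  "codegraph_search",
  "codegraph_context",
  "codegraph_node",
  "codegraph_explore",
  "codegraph_trace",
  "codegraph_callers",
  "codegraph_callees",
  "codegraph_impact",
  "codegraph_status",
  "codegraph_files",
];

const CORE_TOOLS = new Set<string>([
  "codegraph_search",
  "codegraph_context",
  "codegraph_node",
  "codegraph_explore",
  "codegraph_trace",
]);

// Expose only the core tools below the threshold; everything otherwise.
function getTools(fileCount: number, tinyRepoThreshold: number = 150): string[] {
  return fileCount < tinyRepoThreshold ? ALL_TOOLS.filter((t) => CORE_TOOLS.has(t)) : [...ALL_TOOLS];
}
```

Keeping the threshold a parameter makes it easy to experiment with the cut-off without touching the tool lists.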

…ky flip to WIN)

Combines the tool gating from the previous commit with a matching explore-budget cut for projects under 150 files. The two together close the cost gap that neither closes alone:

- Tool gating alone helped ky (WIN) but didn't move cobra/slim/sinatra
- Explore-budget cut alone helped slim slightly but regressed cobra
- COMBINED: cobra flips to WIN, ky stays a WIN, ky/cobra both clean

`getExploreOutputBudget(fileCount < 150)` returns:

- maxOutputChars: 13000 (was 18000)
- defaultMaxFiles: 4 (was 5)
- gapThreshold: 7 (was 8)
- maxSymbolsInFileHeader: 5 (was 6)
- maxEdgesPerRelationshipKind: 4 (was 6)
- includeRelationships: true (kept ON: cheap structural signal)
- maxCharsPerFile: 3800 (unchanged: monotonic invariant w/ next tier)

This survives the cobra-regression-with-trim that the earlier budget-only attempt suffered: with only 5 tools to choose from, the agent doesn't fall back to extra codegraph_node calls when explore returns less; there's no node call available.

Results on the four worst small-repo losses (combined intervention):

| Repo | Files | WITH (combo) | WITHOUT | Verdict (pre → post) |
|---|---|---|---|---|
| cobra | ~50 | $0.25 | $0.31 | loss → **WIN** (-19%) |
| ky | ~25 | $0.39 | $0.39 | -42% → tied |
| slim | ~80 | $0.31 | $0.24 | LOSS 31% → still LOSS |
| sinatra | ~60 | $0.30 | $0.23 | LOSS 18% → still LOSS |

sinatra/slim remain a cost-loss because their WITHOUT path is structurally cheap (~$0.20, fewer than 4 cheap grep+read calls). Codegraph can't beat that absolute floor with any meaningful response. Both still WIN on time + reads + tool-call count.

Tests: tier boundary cases updated to cover the new <150 / 150-499 / 500-4999 / 5000-14999 / >=15000 progression. Off-by-one guard updated to include the new 149↔150 boundary. All 1076 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
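The tier progression and the tiny-tier values quoted in this commit can be written down as data plus two small checks. Only the `<150` values and their "was" counterparts are taken from the commit message; the budgets of the other tiers are not stated here, so the sketch does not invent them. The names `tierFor`, `TINY_BUDGET`, `PRE_CHANGE_BUDGET`, and `isNoLargerThan` are hypothetical.

```typescript
interface ExploreBudget {
  maxOutputChars: number;
  defaultMaxFiles: number;
  gapThreshold: number;
  maxSymbolsInFileHeader: number;
  maxEdgesPerRelationshipKind: number;
  includeRelationships: boolean;
  maxCharsPerFile: number;
}

// New values for projects under 150 files, as listed in the commit message.
const TINY_BUDGET: ExploreBudget = {
  maxOutputChars: 13000,
  defaultMaxFiles: 4,
  gapThreshold: 7,
  maxSymbolsInFileHeader: 5,
  maxEdgesPerRelationshipKind: 4,
  includeRelationships: true,
  maxCharsPerFile: 3800,
};

// The "was" values quoted next to each field.
const PRE_CHANGE_BUDGET: ExploreBudget = {
  maxOutputChars: 18000,
  defaultMaxFiles: 5,
  gapThreshold: 8,
  maxSymbolsInFileHeader: 6,
  maxEdgesPerRelationshipKind: 6,
  includeRelationships: true,
  maxCharsPerFile: 3800,
};

type Tier = "<150" | "150-499" | "500-4999" | "5000-14999" | ">=15000";

function tierFor(fileCount: number): Tier {
  if (fileCount < 150) return "<150";
  if (fileCount < 500) return "150-499";
  if (fileCount < 5000) return "500-4999";
  if (fileCount < 15000) return "5000-14999";
  return ">=15000";
}

// Monotonic invariant: a smaller tier never gets a larger numeric budget.
function isNoLargerThan(small: ExploreBudget, large: ExploreBudget): boolean {
  return (
    small.maxOutputChars <= large.maxOutputChars &&
    small.defaultMaxFiles <= large.defaultMaxFiles &&
    small.gapThreshold <= large.gapThreshold &&
    small.maxSymbolsInFileHeader <= large.maxSymbolsInFileHeader &&
    small.maxEdgesPerRelationshipKind <= large.maxEdgesPerRelationshipKind &&
    small.maxCharsPerFile <= large.maxCharsPerFile
  );
}
```

A check like `isNoLargerThan` is one way to express the "monotonic invariant w/ next tier" note as a test rather than a comment.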

On a <150-file project the entire repo is grep-able in one turn, so the 20-node default `codegraph_context` was paying for a graph subset that exceeds the agent's actual question. Cutting the tiny-repo default to 8 (typical 1-3 entry points + their immediate 1-hop neighbors) reduces the context-tool response body without hitting sufficiency on the flow shapes small repos actually contain. Non-breaking: the agent can still pass an explicit `maxNodes` to override; medium+ repos (>=150 files) keep the 20-node default. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
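A minimal sketch of the default selection described above, assuming a hypothetical helper `resolveMaxNodes`; the values 8 and 20 and the 150-file cut-off are from the commit message, while the function itself is illustrative.

```typescript
const TINY_REPO_FILE_COUNT = 150; // cut-off stated in this commit message
const TINY_DEFAULT_MAX_NODES = 8;
const DEFAULT_MAX_NODES = 20;

// An explicit maxNodes from the agent always wins; otherwise the default depends on repo size.
function resolveMaxNodes(fileCount: number, explicitMaxNodes?: number): number {
  if (explicitMaxNodes !== undefined) return explicitMaxNodes;
  return fileCount < TINY_REPO_FILE_COUNT ? TINY_DEFAULT_MAX_NODES : DEFAULT_MAX_NODES;
}
```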

n=2 audit on cobra/ky/sinatra ruled out cutting below 5 tools (search + context + node + explore + trace) on the tiny-repo tier. The smaller 3-tool gate (search + context + trace) saved ~$0.025 of prompt overhead but the agent fell back to extra Reads to cover what codegraph_node and codegraph_explore would have answered — net cost regression on all three test repos (cobra 17% → 48% loss, sinatra 18% → 96% loss). Documented inline so future tuners don't re-try this dead-end. No behavior change beyond the comment: the 5-tool gate remains the production setting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tested the hypothesis that exposing FEWER tools on micro repos (<50 files) would close the cost gap. Results:

- 1-tool gate (codegraph_search only):
  - ky: +44% (worse than 5-tool +30%)
  - express: +107% (catastrophic; was -43% WIN with all 10)
  - cobra: +126% (way worse than 5-tool +17%)

The single-tool gate forces the agent to read everything because it can't navigate the call graph. The 4 omitted tools (context, node, explore, trace) were doing real work that grep+Read can't replicate.

Conclusion: 5 tools (search + context + node + explore + trace) is the empirical lower bound on the tiny-repo tier. Cutting below regresses EVERY tested repo. The remaining ~$0.04-0.08 of structural cost overhead on tiny repos is unavoidable without sacrificing the value codegraph provides at that scale (which would also make WITH = WITHOUT, defeating the install). Comment documents the dead-ends so future tuners don't relitigate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… in context, hard-exclude low-value files Three layered changes targeting the sinatra/slim/small-repo cost gap that iter2's body-shrink failed to close (smaller bodies just pushed the agent to Read instead): 1. **Tool-gate threshold 150 → 500** (`TINY_REPO_FILE_THRESHOLD`). Sinatra (~159 files) and slim (~200 files) have the same structural problem as cobra (

…siblings in search ranking On projects with a single file holding the dense majority of internal call edges (e.g. sinatra's `lib/sinatra/base.rb` at ~85% of in-file edges), text search was favoring small focused extension files over the core file. A small focused file like `multi_route.rb` wins on verbatim name match + file-size normalization, burying the 1500-line core file's longer method names (e.g. `route!` vs `route`). Fix: detect the "dominant file" — the file whose in-file edge count is ≥3× the next candidate's — then add +25 to all results sharing its directory prefix. This pulls the core file's siblings above sibling-package extensions without hardcoding any repo structure. `getDominantFile()` excludes test/spec files and generated files (e.g. etcd's `rpc.pb.go` has 4× the in-file edges of `server.go` and would otherwise hijack the boost toward generated protobuf stubs). SQL pulls the top 20 candidates; path-pattern filtering handles what SQLite LIKE can't express.
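The dominant-file rule (at least 3× the in-file edges of the next candidate, then +25 for results under the same directory) can be sketched as below. The ratio and boost are from the commit message; the data shapes, the `isExcluded` callback, and `boostDominantSiblings` are assumptions, and the real `getDominantFile()` reads its candidates from SQL rather than an array.

```typescript
interface FileEdgeCount {
  filePath: string;
  inFileEdges: number;
}

interface SearchResult {
  name: string;
  filePath: string;
  score: number;
}

const DOMINANCE_RATIO = 3;
const DOMINANT_DIR_BOOST = 25;

// `isExcluded` stands in for the test/spec and generated-file filters.
function getDominantFile(candidates: FileEdgeCount[], isExcluded: (p: string) => boolean): string | null {
  const ranked = candidates
    .filter((c) => !isExcluded(c.filePath))
    .sort((a, b) => b.inFileEdges - a.inFileEdges);
  const top = ranked[0];
  const next = ranked[1];
  if (top === undefined || top.inFileEdges === 0) return null;
  if (next !== undefined && top.inFileEdges < DOMINANCE_RATIO * next.inFileEdges) return null;
  return top.filePath;
}

// Add the boost to every result under the dominant file's directory, then re-sort.
function boostDominantSiblings(results: SearchResult[], dominantFile: string | null): SearchResult[] {
  if (dominantFile === null) return results;
  const dir = dominantFile.slice(0, dominantFile.lastIndexOf("/") + 1);
  return results
    .map((r) => (r.filePath.startsWith(dir) ? { ...r, score: r.score + DOMINANT_DIR_BOOST } : r))
    .sort((a, b) => b.score - a.score);
}
```

Filtering before ranking matters in this sketch: a test or generated file with far more edges is removed first, so it can neither become the dominant file nor block a hand-written one from qualifying.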

On small projects (<500 files) with a routing-shaped query, build a URL→handler manifest directly from the graph (each `route` node joins to its handler via `references`/`calls` edges) and inline the top handler file's source. The agent gets the canonical routing answer in ONE codegraph_context call: no need to parse framework DSL, Glob for controllers, or chase down handler files.

The lever is "make the backend smarter so the agent doesn't have to":

- Parsing routes.rb / routes/api.php / urls.py DSL is the agent's job in the WITHOUT arm. Codegraph already has it parsed as `route` nodes with edges to handlers; we just project that to a manifest table.
- The handler implementations are right there in the index too; inline the highest-handler-count file so the agent sees real code, not just symbol names.

Results on the realworld template repos that were losing badly:

- rails-rw: +89% LOSS → -15% WIN (agent often answers with 0-1 tool calls)
- laravel-rw: +29% LOSS → +12% (tight gap)
- gin-rw: +30% LOSS → +23% (still loss but smaller)
- flask-mb: +64% LOSS → +25% (smaller gap)

The residual losses are mostly the agent's defensive read behavior on super-cheap-WITHOUT repos (express-rw still does 4 Reads even with a 19-row manifest + service file inlined). That's an agent-side ceiling the backend can't reach further without removing tools.

Also lands `scripts/agent-eval/probe-sweep.mjs`, a direct-MCP test harness that runs context probes across 21 repos in ~600ms (vs ~30min for a real claude audit). Enables rapid iteration on backend changes: edit tools.ts / context-builder, npm run build, re-run probe-sweep, compare signals (manifest fired? handler file inlined? response size?) before paying for a claude run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
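The manifest projection described in this commit can be sketched as a join over in-memory nodes and edges. The node and edge shapes (`RouteNode` with `method` and `url`, `HandlerNode`, `GraphEdge`) and the function name `buildRouteManifest` are assumptions for illustration; the commit message only states that `route` nodes join to handlers via `references`/`calls` edges and that the file with the most handlers is inlined.

```typescript
// Assumed minimal shapes; the real graph schema is not shown in this PR text.
interface RouteNode { id: string; method: string; url: string; }
interface HandlerNode { id: string; name: string; filePath: string; }
interface GraphEdge { from: string; to: string; kind: string; }
interface ManifestRow { method: string; url: string; handler: string; filePath: string; }

function buildRouteManifest(
  routes: RouteNode[],
  handlers: HandlerNode[],
  edges: GraphEdge[],
): { rows: ManifestRow[]; topHandlerFile: string | null } {
  const routeById = new Map<string, RouteNode>(routes.map((r): [string, RouteNode] => [r.id, r]));
  const handlerById = new Map<string, HandlerNode>(handlers.map((h): [string, HandlerNode] => [h.id, h]));

  const rows: ManifestRow[] = [];
  const handlersPerFile = new Map<string, number>();

  for (const e of edges) {
    if (e.kind !== "references" && e.kind !== "calls") continue;
    const route = routeById.get(e.from);
    const handler = handlerById.get(e.to);
    if (route === undefined || handler === undefined) continue;
    rows.push({ method: route.method, url: route.url, handler: handler.name, filePath: handler.filePath });
    handlersPerFile.set(handler.filePath, (handlersPerFile.get(handler.filePath) ?? 0) + 1);
  }

  // The file holding the most handlers is the one whose source would be inlined.
  let topHandlerFile: string | null = null;
  let best = 0;
  for (const [file, count] of Array.from(handlersPerFile)) {
    if (count > best) {
      best = count;
      topHandlerFile = file;
    }
  }
  return { rows, topHandlerFile };
}
```

The returned rows correspond to the manifest table, and `topHandlerFile` identifies the single file to inline alongside it.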

colbymchenry and others added 12 commits May 27, 2026 02:28

colbymchenry wants to merge 12 commits into main from feat/go-multi-module-trace-quality
