Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .erpaval/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ development sessions. Solutions are reusable; specs are per-feature.
- [Fixed-offset asset resolvers break on bundle collapse](solutions/best-practices/fixed-offset-asset-resolvers-break-on-bundle-collapse.md) — after a tsup collapse flattens `dist/commands/x.js`→`dist/x-<hash>.js`, EVERY `import.meta.url` resolver with a fixed `..` offset shifts by a level. #189 fixed only doctor.ts; 6 others shipped broken ~5 days (init/ci-init/setup/betterleaks + 2 ingestion WASM resolvers), two SILENTLY (`analyze` → 0 symbols, exit 0). Convert all to walk-up probes; the CLI bundles deps from `dist/` so rebuild in dep order; verify from a packed-tarball global install, not the hot node_modules.
- [Silent degradation needs a run-level guard](solutions/best-practices/silent-degradation-needs-a-run-level-guard.md) — a `catch→warn→continue` loop hides a GLOBAL stage failure: every item fails identically, the run exits 0 with empty output. Fix is a PAIR — (a) a thrown sentinel distinguishing global-init failure from per-item failure (set `.name`, Piscina drops the prototype across threads), (b) a run-level "processed N but produced 0 expected outputs" guard with a distinct exit code. Pick the denominator carefully (exclude regex-provider/placeholder output) and make smoke tests assert a KNOWN symbol, not "any non-empty result".
- [Squash-merge masks pre-existing repo-wide debt](solutions/best-practices/squash-merge-masks-pre-existing-debt.md) — first action on a fresh branch from main is `mise run check` BEFORE starting work; lint rules / transitive deps / cross-package test assertions drift across squash boundaries even when per-commit gating was green inside the prior PR.
- [CLI↔MCP parity via one shared core + a split parity test](solutions/best-practices/cli-mcp-parity-via-one-shared-core-plus-split-test.md) — parity = both surfaces call ONE `@opencodehub/analysis` core with identical query args (copy `detect_changes`), never per-surface logic. A cross-package test (importing `cli/src` from `mcp/src`) breaks the mcp `rootDir`; SPLIT it — CLI asserts `--json` deep-equals the raw result, MCP asserts `toStructured` recases losslessly. Adding a tool trips `server.test.ts`'s `EXPECTED_TOOL_NAMES` + `length===N` roster contract by design — bump both.
- [No spec-coordinate leakage into source](solutions/best-practices/no-spec-coordinate-leakage-into-source.md) — ERPAVal `AC-*`, `M-*`, `W-*`, `CL-*` prefixes belong in commits, PR bodies, ADR refs sections — NOT in JSDoc, inline comments, CLI flag help, MCP tool descriptions, or test names. Sweep `rg -n "AC-[A-Z]-[0-9]" packages/` before every PR-open; LLM clients pick up the leakage and start citing it back.
- [release: published events need PAT or inline](solutions/conventions/release-published-event-needs-pat-or-inline.md) — release-please-action with default `GITHUB_TOKEN` does NOT fire downstream `release: [published]` workflows; inline asset-attach in `release-please.yml` gated on `steps.release.outputs.release_created`. Fixed AC-D-4; sbom.yml has same latent bug for follow-on.
- [Dogfood pre-push hook catches CLI spec drift on first push](solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md) — the first `git push` of the commit that adds a self-targeting pre-push hook is where spec/CLI-flag mismatches and "missing index" foot-guns surface. Pattern: SKIP-with-message shape from `pack-determinism-audit.sh` for any gate that depends on a derived artifact.
Expand All @@ -57,6 +58,7 @@ development sessions. Solutions are reusable; specs are per-feature.
- [Collapse parallel switches into a Record registry](solutions/architecture-patterns/collapse-parallel-switches-into-record-registry.md) — when 2+ functions each switch over the same closed union (one per derived attribute), fold them into `Record<Union, Entry>`. tsc preserves exhaustiveness, the functions become one-line lookups, honest `| null` replaces placeholder lies, and ONE table-driven test with a `Record<Union, Expected>` fixture pins every (key, attribute) pair — better coverage than the zero direct tests the switches had.

- [A vendored-artifact dep bump must re-vendor in the same PR](solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md) — bumping a dep that has a committed vendored artifact (`web-tree-sitter` → `vendor/wasms/*.wasm` + manifest) without re-running the vendor script passes ALL CI (the `prepublishOnly` guard isn't a CI step) and detonates at `npm publish`, aborting the dependency-ordered multi-package release mid-stream. Scan Dependabot consolidations for vendored-artifact deps and re-vendor before merge.
- [tokenizerId is provenance, not an encoder](solutions/conventions/tokenizer-id-is-provenance-not-an-encoder.md) — the `openai:o200k_base@tiktoken-0.8.0` pin in `@opencodehub/pack` is a reproducibility label, NOT a tokenizer; chonkie's `'character'` tokenizer does the pack's counting. For real "token cost / tokens saved" numbers, add a real encoder: `change-pack` ships `gpt-tokenizer` (pure-JS, MIT, zero-dep) via the `…/encoding/o200k_base` subpath for synchronous deterministic counts, char heuristic only as a throw-fallback. Pick pure-JS gpt-tokenizer over native/WASM `tiktoken` to honor the no-native-binding rail (ADR 0015).

- [npm trusted publisher matches the ENTRY workflow, not the reusable one](solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md) — npm OIDC matches "Workflow filename" against the workflow that INITIATED the run, not the one running `npm publish`. With `release-please.yml` → `workflow_call` → `release.yml`, register `release-please.yml`. Wrong registration silently 404s the token exchange; only `workflow_dispatch` runs (entry = release.yml) ever publish, so npm lags the tags. Config is web-UI-only, passkey-gated, one entry per package (17 here).

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
---
name: cli-mcp-parity-via-one-shared-core-plus-split-test
description: To give a capability true CLI<->MCP parity in OCH, BOTH surfaces must call one shared @opencodehub/analysis core function with structurally identical query args — never reimplement logic per surface. The CLI emits the raw camelCase result via JSON.stringify; the MCP tool recases a PARTIAL set of keys to snake_case (top-level + impacted_subgraph/cost_attribution interiors only; nested array elements + verdict stay camelCase). A cross-package test importing cli/src from mcp/src breaks the MCP package rootDir — so SPLIT the parity proof: CLI test asserts --json deep-equals the raw pack (passthrough), MCP test asserts toStructured recases losslessly (recase-back == raw). Also: adding a tool trips the server.test.ts roster contract (EXPECTED_TOOL_NAMES + length assertion) by design — bump both.
metadata:
type: best-practice
category: best-practices
tags: [cli, mcp, parity, change-pack, rootdir, tool-roster, contract-test, snake-case]
discovered: 2026-06-14
session: session-6afa8d
related:
- tokenizer-id-is-provenance-not-an-encoder
---

# CLI<->MCP parity: one shared core + a split parity test

## The pattern that gives parity

Parity is NOT "two surfaces that happen to format the same." It is: both the CLI
command (`packages/cli/src/commands/X.ts`) and the MCP tool
(`packages/mcp/src/tools/X.ts`) delegate to ONE exported
`@opencodehub/analysis` (or `pack`) core function, passing structurally
identical query args. The cleanest exemplar is `detect_changes` →
`runDetectChanges`. Copy it. The core is the single source of truth; the
surfaces only resolve args, open a store, and format.

Surface conventions to match:
- CLI: commander v15, per-command `--json` that does `console.log(JSON.stringify(result, null, 2))` (raw camelCase, no reshaping). Reuse `pack.verdict.exitCode` (0|1|2) so CI gates match `codehub verdict`.
- MCP: `withStore` + `withNextSteps` + `stalenessFromMeta` + `toolErrorFromUnknown`, registered in `server.ts`, NO `outputSchema`. The structuredContent uses a PARTIAL snake_case recasing — only the top-level keys plus the interiors of nested objects the tool explicitly recases; array element objects and pass-through objects (e.g. the verdict) stay camelCase. Document the exact map.

## The split parity test (the non-obvious part)

A single test that imports BOTH `cli/src/...` and the MCP tool from inside the
mcp package fails `tsc` with TS6059/TS6307: `cli/src/*` is not under the mcp
package's `rootDir`. Don't fight it. Split the proof so each half lives in its
own package and together they're airtight:

- **CLI package test**: assert `--json` output deep-equals the analysis result
fixture → proves the CLI is a pure passthrough (no reshaping).
- **MCP package test**: assert `mcpToCamel(toStructured(fixture))` deep-equals
the fixture → proves the recasing is LOSSLESS (drops no field, mis-spells no
key). Export `toStructured` from the tool module so the test can call it.

Pure passthrough + lossless recasing of the SAME core output ⇒ identical values
on both surfaces. Drive both hermetically via the `_`-prefixed seams
(`_openStore`/`_runChangePack` on the CLI; the analysis core is deterministic and
fails-open on a fake store) — no git, no DB, no process spawn.

## The roster contract will fail — that's intentional

`packages/mcp/src/server.test.ts` pins `EXPECTED_TOOL_NAMES` (sorted) AND
`assert.equal(registered.length, N)`. Registering a new tool makes both fail.
This is the guard against a tool silently dropping out of `buildServer`. Adding
a tool means: add the wire-name to `EXPECTED_TOOL_NAMES` and bump the length.
Also sweep for stale "N tools" counts in CLAUDE.md / AGENTS.md / docs prose
(those are not gated but go stale).

## Why this matters

Per-surface logic drifts the moment one side gets a fix the other doesn't. One
shared core makes drift structurally impossible; the split test makes the
serialization layer (the only place they CAN diverge) provably equivalent
without violating the package boundary.
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
---
name: tokenizer-id-is-provenance-not-an-encoder
description: The `openai:o200k_base@tiktoken-0.8.0` tokenizerId string threaded through @opencodehub/pack is a PROVENANCE LABEL, not an enforced encoder — @chonkiejs/core's default 'character' tokenizer (1 char = 1 token) does the pack's counting. A feature needing real "tokens saved / token cost" numbers must add a real encoder; change-pack ships gpt-tokenizer (pure-JS, MIT, zero-dep) via the isolated `gpt-tokenizer/encoding/o200k_base` subpath for synchronous deterministic o200k_base counts, with a len/4 char heuristic only as a throw-fallback. Prefer pure-JS gpt-tokenizer over native/WASM tiktoken to honor the no-native-binding rail (ADR 0015).
metadata:
type: convention
category: conventions
tags: [tokenizer, pack, chonkie, cost-attribution, change-pack, determinism, estimate]
discovered: 2026-06-14
session: session-6afa8d
related:
- tsup-collapse-monorepo-to-single-cli
---

# tokenizerId is provenance, not an encoder

## The trap

`@opencodehub/pack` threads a `tokenizerId` like `openai:o200k_base@tiktoken-0.8.0`
through the manifest and into `buildAstChunks`. It LOOKS like the pack counts
tokens with the o200k_base encoder. It does not.

- `ast-chunker.ts` explicitly "does not interpret" the tokenizer id (only uses it
to pick `determinismClass`: `anthropic:` → `best_effort`, else `strict`).
- Grep the whole `pnpm-lock.yaml` for `tiktoken|gpt-tokenizer|o200k` → zero hits.
`@chonkiejs/token` (the HF-tokenizer add-on) is not installed.
- chonkie's `CodeChunker` defaults to the `'character'` tokenizer (1 char = 1
token); the pack passes only `{language, chunkSize}`, never a tokenizer. The
degraded path uses `Math.max(1, Math.ceil(len / 4))`.
- `budgetTokens` maps straight to chonkie's `chunkSize` (per-chunk cap) — there
is no binary-search-to-fit and no total-pack budget enforcement.

So the tokenizerId is a reproducibility *label* (it correctly busts `packHash`
when it changes, which is its real job) — not a guarantee that any o200k_base or
Anthropic encoder ran.

## How to apply

When a feature needs token counts (cost attribution, budget trimming, "tokens
saved"):

1. **The pin is not a counter — add a real encoder if you want real tokens.**
`change-pack` ships `gpt-tokenizer` (v3.4.0, MIT, **pure-JS, zero-dep**) and
counts via `import { encode } from "gpt-tokenizer/encoding/o200k_base"`. The
encoding subpath bundles its BPE ranks inline, so `encode` is **synchronous
and deterministic** — no async rank fetch to break byte-identity.
2. **Pick pure-JS over native/WASM `tiktoken`.** The headline package
`tiktoken` is a WASM/native binding; OCH's no-native-binding-at-the-
npm-distributed-boundary rail (ADR 0015) rules it out. `gpt-tokenizer` is the
rail-compatible way to get the SAME o200k_base counts.
3. **Keep a heuristic FALLBACK, not as the primary.** `countTokens` wraps the
encoder in try/catch and falls back to `max(1, ceil(len/4))` only on
pathological input that throws — so cost attribution never crashes the pack,
but the normal path is real model tokens. Record the basis:
`estimate: false`, `tokenizerModel: "openai/o200k_base"` (fold it into the
content hash so a tokenizer swap changes the hash).
4. **Compute the baseline auditably from the graph**, not from a borrowed
marketing percentage — sum real tokens over every File node in the impacted
subgraph as the "agent reads each file blind" baseline.

## Why this matters

A cost feature that silently reports character counts as "tokens" is worse than
no number — it reads as authoritative and is wrong by a model-specific factor.
The honest options are (a) a clearly-labeled heuristic or (b) a real encoder.
`change-pack` started at (a) and shipped (b) the same session — the contract
shape (`estimate`/`tokenizerModel` fields) was built wide enough that swapping
the basis touched only the counter + the field values, never the structure.
When you do add a tokenizer, prefer a **pure-JS** one if the repo bans native
bindings, and import the **isolated encoding subpath** so counting stays
synchronous and deterministic.
Loading
Loading