Review fixes #1
Merged
… review findings
Task-handling (model dispatch) overhaul:
- Add agents/nexum-impl-opus.md so needs-strong step content is delegated to an
Opus-tier executor instead of forcing the whole orchestrator onto Opus.
- Batch steps by tier into one warm executor dispatch (dispatch_granularity:
group) instead of one cold start per step; executors self-run guardrail.py and
return its verdict, so the orchestrator skips a round-trip.
- Gate the reviewer to escalation / needs-strong / many-file steps; retry ladder
is one same-tier patch attempt then escalate. Orchestration no longer assumes
Opus. New config: dispatch_granularity, max_same_tier_retries.
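The batching and retry ladder described above can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the step dictionaries, the tier labels, and the helper names `group_steps_by_tier` and `next_action` are hypothetical. Only the config name max_same_tier_retries comes from the commit message, and the default of 1 is inferred from "one same-tier patch attempt".

```python
from collections import defaultdict


def group_steps_by_tier(steps):
    """Return {tier: [step ids]}, keeping plan order within each tier.

    Each batch would then go to one warm executor dispatch instead of
    one cold start per step.
    """
    batches = defaultdict(list)
    for step in steps:
        batches[step["tier"]].append(step["id"])
    return dict(batches)


def next_action(failed_attempts, max_same_tier_retries=1):
    """Retry ladder: patch at the same tier up to the limit, then escalate."""
    if failed_attempts <= max_same_tier_retries:
        return "patch"
    return "escalate"


# Hypothetical plan; the tier labels are placeholders.
plan = [
    {"id": "s1", "tier": "standard"},
    {"id": "s2", "tier": "needs-strong"},
    {"id": "s3", "tier": "standard"},
]

print(group_steps_by_tier(plan))
print(next_action(1), next_action(2))
```

Grouping by tier ignores dependencies between steps, so a real scheduler would presumably need to split a batch wherever a step depends on output from another tier.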
Metered cost + cache-aware savings:
- Snapshot Claude Code's own cost.total_cost_usd into a session_cost table; the
cost report prints this cache-accurate total beside the per-tier breakdown.
- Weight dedup savings by dedup_cache_weight (repeated reads bill at cache-read
rate); record raw + effective tokens.
- Add test_determinism.py and test_metering.py.
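To make the cache-aware weighting concrete, here is a small sketch. It assumes the effective saving is simply the raw token count multiplied by dedup_cache_weight; the function name, the returned field names, and the example weight of 0.1 are illustrative guesses, not values taken from the repository.

```python
def weighted_dedup_savings(raw_saved_tokens, dedup_cache_weight):
    """Return both the raw and the cache-weighted token savings.

    A repeated read would have been billed at the (cheaper) cache-read
    rate, so avoiding it saves only a fraction of the raw token count.
    """
    if not 0.0 <= dedup_cache_weight <= 1.0:
        raise ValueError("dedup_cache_weight must be between 0 and 1")
    effective = round(raw_saved_tokens * dedup_cache_weight)
    return {"raw_tokens": raw_saved_tokens, "effective_tokens": effective}


# Assumed weight of 0.1, i.e. cache reads cost a tenth of fresh input.
print(weighted_dedup_savings(12_000, 0.1))
```

Recording both numbers keeps the report honest: the raw figure shows how much text was deduplicated, while the effective figure approximates what that was worth in billing terms.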
Review-finding fixes:
- truncate.py: drop unreachable hard-cut else branch.
- audit.py: drop condition subsumed by the "# nexum" prefix check.
- store.py: corruption fallback uses a shared-cache in-memory DB (held open by a
module keeper) so separate db() calls share state instead of silently becoming
no-ops; drop the no-op foreign_keys pragma.
- dedup.py: measure pointer-collapse savings from the recorded shrunk token count
(what the model actually saw), not the original output.
- scan_guard.py: fix _under_deny normalization (lstrip("./") mangled dot-leading
paths like .git); remove the now-redundant raw fallback and dead FLAGS_WITH_VALUE.
- context_watch.py: derive task type over sorted words so the intent-guard
decision is stable across PYTHONHASHSEED; collapse identical if/else branch.
- hooks.json: drop redundant truncate.py PostToolUse hook (dedup re-applies shrink).
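The scan_guard.py item is easy to reproduce. str.lstrip treats its argument as a set of characters rather than a prefix, so it also eats the leading dot of a dot-directory. The sketch below shows the pitfall and one possible replacement using posixpath.normpath; the helper name `normalize_rel` is hypothetical, and the actual fix in _under_deny may differ.

```python
import posixpath


def normalize_rel(path):
    """Normalize a relative path without touching dot-leading names.

    posixpath.normpath collapses "./" and duplicate slashes but keeps
    names such as ".git" intact.
    """
    return posixpath.normpath(path)


# The pitfall: lstrip("./") strips ANY leading '.' or '/' characters.
print(".git/config".lstrip("./"))       # git/config  (dot lost)
print("./src/app.py".lstrip("./"))      # src/app.py  (looks fine, hides the bug)

# A prefix-aware normalization keeps the dot-directory.
print(normalize_rel("./.git/config"))   # .git/config
print(normalize_rel("./src/app.py"))    # src/app.py
```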
…levers
- Handoff: auto-skeleton writer (scripts/handoff.py) + /nx-save and /nx-load
commands; context_watch auto-writes a resume handoff past the threshold.
- Rename commands nexum-* -> nx-* (audit/build/plan/status) and add
nx-save/nx-load.
- scan_guard: shlex-based tokenizer so quoted args don't evade the grep guard;
PreToolUse Read-guard injects limit/offset for large files.
- dedup: gate truncate/dedup savings behind a per-session self-test, since
PostToolUse updatedToolOutput is ignored for built-in tools on current CC.
- statusline: capture real metered cost/context size.
- README/SPEC: honest description of which context levers work today.
- Scratch notes: HANDOFF-hook-investigation.md, nexum-review.md.
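The scan_guard tokenizer change can be illustrated with the standard-library shlex module. With whitespace splitting, a quoted command name keeps its quotes and slips past a simple equality check; shlex.split applies shell-style quoting rules first. The helper `command_name` is a hypothetical stand-in for whatever the guard actually inspects.

```python
import shlex


def command_name(command):
    """Return the first shell token of a command line, or None.

    shlex.split removes shell quoting, so '"grep"' and 'grep' compare
    equal. Unbalanced quotes raise ValueError, reported here as None.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None
    return tokens[0] if tokens else None


cmd = '"grep" -r "needle in haystack" src'

print(cmd.split()[0])      # "grep"  (quotes retained, a naive check misses it)
print(command_name(cmd))   # grep
print(shlex.split(cmd))    # ['grep', '-r', 'needle in haystack', 'src']
```

How the guard should treat a command that fails to tokenize is a policy choice; returning None here simply leaves that decision to the caller.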
The auto-handoff never fired in practice. context_watch drove the threshold off
max(prompt-text estimate, real_context_tokens flag): the estimate only counts
the typed prompt (never reaches 100k), and the flag is written by the statusLine
hook, which runs the cache copy lacking that write and can resolve a different
data dir. So token_total stayed tiny and no handoff was written.
- context_watch: read the REAL context size directly from the session transcript
(input + cache_creation + cache_read of the last usage block) via new
store.context_tokens_from_transcript; fall back to the estimate/flag only when
the transcript is unavailable. Removes the statusline/data-dir/flag chain.
- Re-arm the handoff/compaction nudges when context drops back below the
threshold (e.g. after /clear), instead of warning once per session forever.
- Handoff message now explicitly says to run /clear (or a fresh session) then
/nx-load.
- handoff.write_skeleton + /nx-save + /nx-load resolve a project-scoped data dir
(store.project_data_dir: $CLAUDE_PLUGIN_DATA, else git-root/.nexum-data, else
cwd/.nexum-data) so writer and reader always agree per-project.
- Tests for the transcript reader, project_data_dir, transcript-driven handoff,
and re-arm.
Full suite: 269 passed.
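A transcript-based reader along the lines described above might look like the sketch below. It is not the code of store.context_tokens_from_transcript. It assumes the transcript is JSONL with one record per line, that a record may carry a `message.usage` object, and that the usage fields are named `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. The function takes an iterable of lines so it can be exercised without a real transcript file.

```python
import json

# Assumed usage field names; adjust if the transcript uses different keys.
USAGE_KEYS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def context_tokens_from_lines(lines):
    """Sum the token counts of the LAST usage block, or return None.

    None signals "transcript unavailable or has no usage yet", so the
    caller can fall back to a prompt-text estimate.
    """
    last_usage = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if isinstance(usage, dict):
            last_usage = usage
    if last_usage is None:
        return None
    return sum(int(last_usage.get(key) or 0) for key in USAGE_KEYS)


sample = [
    '{"message": {"usage": {"input_tokens": 10, "cache_read_input_tokens": 500}}}',
    '{"type": "user", "message": {"content": "hi"}}',
    '{"message": {"usage": {"input_tokens": 12, "cache_creation_input_tokens": 300, "cache_read_input_tokens": 90000}}}',
    '{"truncated',
]
print(context_tokens_from_lines(sample))  # 90312
```

Only the last usage block is used because each block already reflects the full context sent on that turn; summing every block would count the same context repeatedly.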
Three optimizations folded into 0.3.0:
- predup.py (PreToolUse): deny an identical repeated Read/Grep/Glob (and
optional read-only Bash) call already seen this session, with an mtime guard for
Read. A PreToolUse deny is honored (unlike the inert PostToolUse shrink), so the
avoided re-injection is recorded ungated and the status-line "saved" figure
finally moves. Backed by a new input-keyed tool_calls table in store.py,
populated by dedup.py on first occurrence.
- plan_preview.py: /nx-build prints a projected per-tier cost vs all-opus
baseline before dispatching, so routing savings are visible up front.
- resume_nudge.py (SessionStart): one-line "run /nx-load" hint when a fresh
handoff matches the current branch; nothing auto-loads.
Wire both new hooks in hooks.json, document in README/CHANGELOG, and sync
marketplace.json to 0.3.0 (was 0.2.1, failing check_version). Full suite green.
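The predup idea, denying a repeated call keyed on its input with an mtime guard for Read, can be sketched with an in-memory dictionary. This is a simplification under stated assumptions: the commit describes a tool_calls table in store.py populated by dedup.py, whereas the hypothetical `SeenCalls` class below records and checks in one place, and the caller supplies the file mtime.

```python
import json

_MISSING = object()


class SeenCalls:
    """In-memory stand-in for an input-keyed record of tool calls."""

    def __init__(self):
        self._seen = {}

    def should_deny(self, tool, tool_input, mtime=None):
        """Return True when an identical call was already seen.

        For Read, a changed mtime means the file content may differ,
        so the repeat is allowed and the stored mtime is refreshed.
        """
        # sort_keys makes the key independent of dict insertion order.
        key = (tool, json.dumps(tool_input, sort_keys=True))
        previous = self._seen.get(key, _MISSING)
        self._seen[key] = mtime
        if previous is _MISSING:
            return False  # first occurrence: record and allow
        if tool == "Read" and previous != mtime:
            return False  # file changed since the last read
        return True


calls = SeenCalls()
read_input = {"file_path": "src/app.py"}
print(calls.should_deny("Read", read_input, mtime=1.0))  # False
print(calls.should_deny("Read", read_input, mtime=1.0))  # True
print(calls.should_deny("Read", read_input, mtime=2.0))  # False
```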
No description provided.