Model/effort default recommendations from compare data (design-gated) #607

@iamtoruk

Part of the acting-layer epic. Blocked by the journal and --apply issues. Additionally gated: post a short design comment on this issue (recommendation thresholds, exact settings keys verified against current Claude Code docs, rollout plan) and get it approved before writing code.

Idea

compare already computes per-model one-shot rate, cost per edit turn, and per-category (classifier) splits from real local history. When one model demonstrably matches another's edit reliability at a fraction of the cost for a given project's workload, CodeBurn can recommend, and optionally apply, a per-project default model change. This is the no-proxy version of "routing": it changes a default in config, never a live request, and the user sees exactly what changed.

What to build (after design approval)

Recommendation engine (src/act/model-defaults.ts)

Using src/compare-stats.ts aggregates per project over the last 30 days:

  • Consider a pair (current dominant model A, candidate B) only when both have at least 30 edit turns in the project (raise if the design review says so).
  • Recommend B when: B one-shot rate >= A one-shot rate minus 3 percentage points, and B cost per edit turn <= 60% of A's, computed on this project's own turns.
  • Never recommend across capability tiers on tolerance alone for Debugging-heavy projects: if more than 40% of the project's classifier split is Debugging, require B one-shot rate >= A one-shot rate (no tolerance).
  • Output: recommendation objects with the full evidence (turn counts, rates, costs) for display.
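
A minimal sketch of the rules above, as a starting point for the design comment rather than a settled implementation. The `ModelStats` and `Recommendation` shapes and the `recommend` function are hypothetical; the real aggregates in src/compare-stats.ts may be shaped differently. Rates are assumed to be fractions in [0, 1], so 3 percentage points is 0.03.

```typescript
// Hypothetical per-project, per-model aggregate (not the real compare-stats type).
interface ModelStats {
  model: string;
  editTurns: number;
  oneShotRate: number;     // fraction in [0, 1]
  costPerEditTurn: number; // same currency unit for both models
}

// Hypothetical recommendation object carrying the full evidence for display.
interface Recommendation {
  from: ModelStats;
  to: ModelStats;
  debuggingShare: number;
  toleranceUsed: number;
}

// Thresholds taken from the issue text; subject to the design review.
const MIN_EDIT_TURNS = 30;
const ONE_SHOT_TOLERANCE = 0.03;
const MAX_COST_RATIO = 0.6;
const DEBUGGING_SHARE_LIMIT = 0.4;

function recommend(
  current: ModelStats,
  candidate: ModelStats,
  debuggingShare: number,
): Recommendation | null {
  // Both models need enough edit turns in this project.
  if (current.editTurns < MIN_EDIT_TURNS || candidate.editTurns < MIN_EDIT_TURNS) {
    return null;
  }
  // Debugging-heavy projects get no one-shot tolerance.
  const toleranceUsed = debuggingShare > DEBUGGING_SHARE_LIMIT ? 0 : ONE_SHOT_TOLERANCE;
  if (candidate.oneShotRate < current.oneShotRate - toleranceUsed) {
    return null;
  }
  // Candidate must cost at most 60% of the current model per edit turn.
  if (candidate.costPerEditTurn > MAX_COST_RATIO * current.costPerEditTurn) {
    return null;
  }
  return { from: current, to: candidate, debuggingShare, toleranceUsed };
}
```

Comparisons that land exactly on a threshold are sensitive to floating-point rounding; an implementation may prefer to compare raw counts or integer percentage points.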

Surface

  • codeburn compare (non-interactive summary path) and codeburn optimize gain a low-key recommendation block listing qualifying projects with evidence, and the apply hint.
  • codeburn act apply-model <project>: writes the model default into <project>/.claude/settings.json (verify the exact key against current docs at implementation time; do not guess) via the action journal (kind: model-default). Undo restores the previous file contents.
  • Always print the evidence and the escape hatch (/model in-session overrides per session) when applying.
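
The single-key edit and its undo could look roughly like the sketch below. It is deliberately pure (text in, text out) so the journal layer decides how files are read and written. The `JournalEntry` shape, `planModelDefault`, and `undoText` are hypothetical names, and the settings key is passed in as a parameter because the issue requires it to be verified against current docs rather than guessed.

```typescript
// Hypothetical journal entry; the real action journal format is defined elsewhere.
interface JournalEntry {
  kind: "model-default";
  path: string;
  before: string | null; // exact prior file text, or null if the file did not exist
  after: string;         // new file text to write
}

function planModelDefault(
  path: string,
  existingText: string | null,
  key: string,   // the verified settings key; intentionally not hard-coded here
  model: string,
): JournalEntry {
  const parsed: unknown = existingText === null ? {} : JSON.parse(existingText);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`${path} does not contain a JSON object`);
  }
  // Spread keeps every existing key and value; only `key` is set or replaced.
  const next = { ...(parsed as Record<string, unknown>), [key]: model };
  return {
    kind: "model-default",
    path,
    before: existingText,
    after: JSON.stringify(next, null, 2) + "\n",
  };
}

// Undo hands back the stored original text, so restoring it is byte-identical.
function undoText(entry: JournalEntry): string | null {
  return entry.before;
}
```

Storing the prior text verbatim, rather than re-serializing parsed JSON, is what lets undo restore the original formatting exactly.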

Explicit non-goals

  • No per-turn routing, no proxy, no request rewriting. If the recommendation is wrong for a particular task, the user overrides in-session; nothing intercepts anything.
  • No cross-provider recommendations in v1 (Claude Code projects recommend Claude models only; Codex analogue is a follow-up).
  • No automatic apply. This kind is never included in optimize --apply --yes; it requires the explicit per-project command.
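
One way to enforce the "no automatic apply" rule is to filter explicit-only kinds out of the bulk path before anything is applied. The sketch below assumes a hypothetical `PlannedAction` shape and helper name; only the `model-default` kind comes from this issue.

```typescript
// Hypothetical planned-action shape used by the bulk apply path.
interface PlannedAction {
  kind: string;
  project: string;
}

// Kinds that always require their own explicit per-project command.
const EXPLICIT_ONLY_KINDS: ReadonlySet<string> = new Set(["model-default"]);

// Returns only the actions that a bulk apply is allowed to run.
function bulkApplicable(actions: PlannedAction[]): PlannedAction[] {
  return actions.filter((action) => !EXPLICIT_ONLY_KINDS.has(action.kind));
}
```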

Tests

  • Fixture stats where B qualifies (rates/costs asserted against the thresholds), where B fails the one-shot tolerance, where turn counts are insufficient, and where the Debugging-share rule blocks an otherwise-qualifying pair.
  • Apply writes only the expected key into project settings, preserves the rest, and journals the change; undo restores the file byte-for-byte.
  • Recommendation block is absent when nothing qualifies.
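
For the last case, the renderer can simply return an empty string when no project qualifies, which keeps the test trivial. The sketch below is illustrative only: the `Evidence` shape, the function name, and the exact wording and layout of the block are assumptions, not a proposed output format.

```typescript
// Hypothetical flattened evidence for one qualifying project.
interface Evidence {
  project: string;
  from: string;
  to: string;
  fromTurns: number;
  toTurns: number;
  fromRate: number; // one-shot rate, fraction in [0, 1]
  toRate: number;
  fromCost: number; // cost per edit turn
  toCost: number;
}

const pct = (rate: number): string => `${(rate * 100).toFixed(1)}%`;

function renderRecommendationBlock(items: Evidence[]): string {
  // Nothing qualifies: emit no block at all, not an empty header.
  if (items.length === 0) return "";
  const lines = items.map(
    (e) =>
      `  ${e.project}: ${e.from} -> ${e.to}` +
      ` | turns ${e.fromTurns} vs ${e.toTurns}` +
      ` | one-shot ${pct(e.fromRate)} vs ${pct(e.toRate)}` +
      ` | cost/edit turn ${e.fromCost.toFixed(3)} vs ${e.toCost.toFixed(3)}`,
  );
  return [
    "Model default recommendations",
    ...lines,
    "  Apply with: codeburn act apply-model <project>",
  ].join("\n");
}
```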

Acceptance

  • On real data, recommendations only appear with the full evidence line, and applying one is a single reviewable config edit with undo.
  • Design comment approved on this issue before any implementation PR.

Part of epic #602.

Labels: enhancement (New feature or request)
