Tool/function-call edit mode#33
Conversation
main.rs::refactor() matched on Provider, re-derived the API key inline, and called a free anthropic::complete / openai::complete — a match-and-duplicate that grows worse with each new call shape. Introduce a Backend trait and a single resolve() that turns a provider choice into a key-bearing, callable backend, so the rest of refac stays provider-agnostic. No behavior change: wire formats, error paths, and "no key" messages are identical. Paves the way for an Anthropic-only tool/edit capability (a separate trait, hence Box<dyn Backend> over a closed enum). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…only Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The edit tool's core. The model calls a single-edit `edit({old,new,replace_all?})`
tool, possibly several times per turn (both providers do parallel tool calls);
`apply` runs one such edit against the buffer. Matching walks a chain of
progressively looser strategies (exact, line-trimmed, block-anchor,
whitespace-normalized, indentation-flexible — after opencode's replace()), exact
first, first unique hit wins. Missing or ambiguous matches are typed errors
(after claude-code's str_replace contract) to feed back to the model, never a
silent mis-apply.
Pure and provider-agnostic. 14 unit tests. Not yet wired into a backend.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The model drives a session over the selected text via four tools — edit (the workhorse), view (re-anchor if it loses track), reset (restore the original), and finish (the done signal) — and refac applies each, feeding results back, until the model finishes or a guard trips. A Model is one abstract assistant turn (send conversation + tools, get tool calls); the real providers implement it over their wire formats, so this loop is IO-free and tested end to end with a scripted model. Guards: a turn cap (view/ reset can't spin) and a consecutive-all-edits-failed cap (a stuck model stops burning tokens). Parallel tool calls in one turn apply in order. 10 unit tests. Not yet wired to a real provider or main. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AnthropicAgent implements agent::Model: each turn posts the running conversation plus the tool defs (tool_choice auto) and returns the model's tool_use calls; respond threads results back as a tool_result user turn. The assistant content is echoed verbatim as JSON, which the tool_use/tool_result protocol requires. The typed rewrite path (build_request/send) is unchanged. Tool threading uses serde_json::Value since the echoed assistant content is dynamic by nature. Request shaping and tool_use parsing are pure and unit-tested; the HTTP turn needs the live API. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
OpenaiAgent implements agent::Model over chat-completions: each turn posts the conversation plus the function tools and returns the model's tool_calls; respond threads results back as role:"tool" messages (errors marked in content, since the API has no error flag). Assistant messages are echoed verbatim so tool_call_ids line up. prompt::edit_prefix is the edit-mode system prompt — task, workflow, and refac's personality only; the per-tool mechanics live on the tool descriptions, not here. Request shaping and tool_calls parsing are unit-tested. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- config: edit_mode (tool | rewrite), default tool, via config.toml or REFAC_EDIT_MODE. Folded Config's manual Default into a derive. - backend: resolve_agent builds a Box<dyn agent::Model> per provider; key sourcing factored into one key_for helper shared with resolve. - agent::run_with reports each edit attempt to a callback. - main::refactor switches on the mode: tool mode seeds the edit prompt, builds the agent, and runs the loop; every edit attempt (with its EditError, if any) is logged to edits.jsonl — the failure-rate signal. Rewrite mode is unchanged. 37 unit tests; clippy clean. Live API smoke test next. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
From a fan-out review of the edit-mode feature: - edit: whitespace_normalized advanced the search cursor by one byte, which could split a multi-byte char and panic on ordinary non-ASCII input. Advance by the matched char's length. Regression test added. - edit: block_anchor no longer accepts a match on first/last anchors alone when every middle line is blank (too weak); require a real middle-line match. - agent: collapse Model::turn + Model::respond into one turn(results) -> calls. The two-method split was an unenforced state machine (advance without answering the prior calls => an API 400); merging makes "answer the outstanding calls" and "take the next turn" one indivisible step. - dedup: Role::as_str replaces the hand-written role mapping in both providers (it also reimplemented serde's rename); agent::http_client is now shared. 38 unit tests; clippy clean. Re-smoke-tested live against Anthropic (incl. a multi-byte selection — no panic). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop references to the removed Model::respond in the agent doc comments; correct the Backend trait doc to point the tool path at agent::Model/resolve_agent; note why the OpenAI seed skips the empty-field placeholder. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per owner: this is a personal tool; the whole-text rewrite path was dead weight
once tool edits work. Removing it (not gating it) deletes a pile of code:
- the Backend trait, resolve(), and both providers' send()/complete() paths
- src/api_client.rs (only the OpenAI rewrite path used it)
- the rewrite few-shot system prompt + all SAMPLES
- the typed Anthropic/OpenAI wire structs the rewrite path needed
- EditMode / REFAC_EDIT_MODE config and Message::{cache, assistant}
What remains is the edit loop and nothing else: refac always drives the model
through the edit/view/reset/finish tools. ~675 fewer lines. 35 tests; clippy
clean; re-smoke-tested live on Anthropic.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
From a sub-agent simplification sweep:
- Drop unused deps itertools and similar (leftovers from the rewrite path).
- Replace the run_with + EditOutcome callback with run() returning an Outcome
{ text, attempts }; main logs the attempts after the loop. Removes the
observer protocol, a lifetime, and the run/run_with split.
- Drop the Limits struct: run() takes max_turns directly (the only thing a caller
varies) and the failure cap is a const.
- Fix the block_anchor test to actually exercise a reworded middle (it previously
passed via the exact matcher because old == src).
35 tests; clippy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
refac only ever sends one conversation shape: the system prompt plus a
single user (selected, transform) turn. The general Message/Role message
list modeled turns and an Assistant role that the edit-only path never
used.
Replace it with agent::Seed { system, selected, transform } — named
fields encode the real shape and make a malformed conversation
unrepresentable. Each agent's new() takes &Seed and builds its own wire
format directly; prompt::edit_prefix() collapses to the pub
EDIT_SYSTEM_PROMPT const.
No behavior change: 35 tests green, clippy clean, Anthropic edit live-tested.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `edit` tool's JSON Schema was a hand-built `json!` literal sitting next to the `Edit` struct it described — two copies of the same shape, free to drift. Derive `JsonSchema` on `Edit` so the advertised schema and the type the call deserializes into are one source of truth; the no-arg tools get their schema from an empty `NoArgs` struct the same way. Field doc comments become the per-property descriptions. schemars dep added. 35 tests green, clippy clean, Anthropic live-tested. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the review batch on bddap#33. agent.rs: replace the Action enum + parse() + match-dispatch with a tool registry. Each tool is a `Tool { name, description, input_schema, run }` built by a generic `Tool::new::<A>` that derives the schema from the args type `A` and wraps a typed handler — so the advertised schema and the parsed call come from one type and can't drift, and a tool's whole behavior lives in one place. A `Step { Continue { reply, attempt }, Finish }` return lets each handler own its loop effect: `finish` returns `Finish`, `edit` carries its `Attempt` to log. run() is now generic — no per-tool name checks, no dead finish handler, no re-parsing edit args for the log. input_schema is now schemars' own `Schema` type, not serde_json::Value; it serializes transparently so the providers are untouched. prompt.rs: drop the "you're well-trained, you only write high-quality" model flattery; apply the suggested tighter task wording; make the --refac sign-off sass meaner (contextual insults); "Aim to please by showing off your cleverness". Rename EDIT_SYSTEM_PROMPT -> SYSTEM_PROMPT (one mode now). Trim comments to non-obvious WHY throughout. 35 tests, clippy clean. Anthropic live-tested (multi-edit, advice, countdown); edit-attempt JSONL logging verified. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
pascal that pr description please @bddap-bot |
|
Done — cut to ~250 words from ~430, same substance (engine → loop/registry → typed-throughout → tests). Your two inline notes are queued: inline the |
|
Heads up — these review notes landed on an outdated diff: that hunk's
I'll land the comment sweep + the view-seed prototype as one revision (and the |
Cut the comments that just restate names/fields — NoArgs, tools(), RawCall, ToolResult, Outcome, http_client — and trimmed Seed to its invariant. Kept the WHYs (Tool::new's no-drift, the Model-trait folding, failure-counting, the malformed-call handling). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| /// An edit-mode session against the Messages API. Implements [`Model`]: each | ||
| /// `turn` first threads the previous turn's results back as a `tool_result` user | ||
| /// turn, posts the running conversation plus the tool definitions, and returns | ||
| /// the model's tool calls. The assistant's content is echoed back verbatim (as | ||
| /// JSON), which is what the API requires for a `tool_use`/`tool_result` exchange. |
There was a problem hiding this comment.
wtf accepted — cut. The old CacheControl/ContentBlock cruft this pointed at is already gone; also deleted the leftover role-tag narration on Message here and the matching one on the OpenAI side. 8f19ea1.
The Anthropic and OpenAI agents held their conversation state as untyped Vec<Value>, in a PR whose thesis is strong typing. Replace it with structs that serde-serialize to the exact same wire JSON: - anthropic: SystemBlock, ContentBlock (Text/ToolResult), Message (User/Assistant), ToolDef, and a borrowing Request. Message is tagged by `role`; ContentBlock by `type`. - openai: Message (System/User/Tool/Assistant) + Role, ToolDef/FunctionDef, and a borrowing Request. The one Value kept on each side is the echoed *assistant* turn — the verbatim bytes the API returned, preserved for round-trip fidelity (re-serializing parsed blocks reorders fields and drops ones refac doesn't model, e.g. Anthropic thinking signatures, which the tool_use/tool_result handshake depends on). Everything refac constructs is typed. OpenAI's Message is `untagged` rather than tag = "role": the assistant value already carries its own `role`, so a role discriminant emitted it twice (malformed duplicate key). A test guards the single-role invariant. Request-shaping tests assert the typed structs serialize to the same wire JSON as before; 39 tests pass, clippy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bddap
left a comment
There was a problem hiding this comment.
Make sure not to treat anything but the "done" function call (or api error) as terminating. Normal chat messages should be errors reported to the agent because I don't have a way to read them.
Multi-lens review pass over the edit-mode PR.
- Wire types: tool schemas now travel as `schemars::Schema` end to end
(drop the `serde_json::to_value` round-trip in both providers), and
Anthropic's `tool_choice` is a typed `{"type":"auto"}` unit instead of a
`json!` `Value` — one fewer stringly hop on the request path.
- `http_client()` moves to `backend.rs`, so `agent.rs` is genuinely IO-free
as its module doc claims.
- `dedent` no longer panics on multi-byte leading whitespace: a `min` indent
can land mid-char on another line, so slice via `get(..)`.
- Comment sweep to the WHY-only bar: cut WHAT/restatement doc comments on
internal items and narration inline comments; kept the real gotchas
(handshake byte-fidelity, the matcher's char-boundary notes, the 0600
rationale) and the schemars/clap-consumed docs.
Tests: 39 pass; clippy -D warnings clean; rustfmt clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Meaningless to the model, and outdated under the tool-result text injection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drops the make-exactly-the-changes/flag-bugs sentence per the suggestion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fresh-context reviewer rounds (correctness/design/security/owner-taste)
over the PR diff; this lands what survived triage.
Correctness:
- edit.rs: guard against empty match candidates. A whitespace-only `old`
trims to "" and the line/block matchers yield zero-length spans; with
replace_all, `src.replace("", new)` splatters `new` between every char and
returns Ok (silent buffer corruption). Skip empty candidates -> NotFound.
- Empty selection was broken: the buffer was seeded with the raw (empty)
`selected` while the model was shown "(empty)", and no tool can fill an
empty buffer, so README's generate-from-nothing examples (fizzbuzz) failed
three edits and aborted. Seed the buffer with the same placeholder the
model sees, via a shared `placeholder_if_empty`, so the model edits it away.
HTTP / errors:
- send_json in backend.rs: read the body as text before parsing, so a non-JSON
error page (gateway/proxy 429/5xx) survives into the error instead of being
lost to a parse failure. Both providers' duplicated `post` now call it.
Types (make-illegal-states):
- ToolResult: collapse {content, is_error} into one Reply (Result<String,String>)
so the error flag can't desync from the content; drop the two bridge helpers.
- openai.rs: per-variant singleton role types so a message's role can't be
constructed wrong (matches the existing type-tag idiom).
- config_files.rs: parse REFAC_PROVIDER through Provider's ValueEnum instead of
a hand-written string match, so accepted spellings can't drift from the enum.
Comments trimmed to WHY-only; provenance/dev-history removed. Two tests added
(empty-candidate guard, empty-selection generation). cargo test 41 ok, clippy
--all-targets -D warnings clean, rustfmt clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The provider-typing refactor left the OpenAI `Message` enum `#[serde(untagged)]`
with the assistant turn as a raw `serde_json::Value`, to dodge a double `role`
key and to preserve the verbatim `function.arguments` bytes. Model it properly
instead.
- `Message` is now a clean `#[serde(tag = "role")]` enum like the Anthropic side;
each variant fixes its own role, so no construction can set the wrong one and
the wire still carries `role` exactly once (the single-role regression test is
kept and still passes).
- The assistant turn is a typed `AssistantTurn { content: Option<String>,
tool_calls: Option<Vec<ToolCall>> }`, used as both the receive and the echo
type via a newtype variant `Assistant(AssistantTurn)` — the parsed turn flows
straight back out, no field-by-field copy to drift.
- `ToolCall.function.arguments` stays a `String`: it is a JSON string on the wire,
so keeping it opaque preserves byte-for-byte fidelity (a reparse would reorder
keys and renormalize numbers/whitespace) — the real reason `Value` was used,
now had for free. Added a test proving the bytes survive.
chat-completions, unlike Anthropic's Messages API, carries no echo-required
opaque blocks (`thinking` signatures), so the turn can be fully typed; unmodeled
fields (`refusal`, …) are dropped on echo, which the API ignores on input.
`cargo test` (43 pass) + `cargo clippy --all-targets -D warnings` clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`selected` was a plain user message. Per review, it must reach the model exactly once, as the return value of a pre-seeded function call. Both providers now open the conversation as if the model had already called `view`: a user message with the instruction (`transform`), an assistant `view` tool call (id `seed_view`), and that call's result carrying `selected`. The model reads `selected` the same way it reads every later `view`, and it appears once — never as a user message. `SEED_TOOL`/ `SEED_CALL_ID` live in agent.rs so the two wire formats can't disagree. The empty-input placeholder is untouched: the seeded `selected` is still the placeholder-applied string that also seeds the edit buffer, so the buffer the model edits and the text it's shown remain identical (and never an empty block). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Andrew flagged comment-flood on bddap#33 four times. Delete every comment that only restates the code, a signature, or a type, or narrates WHAT the next lines do; keep only WHY / gotcha / constraint. Net -33 comment lines across agent.rs, anthropic.rs, openai.rs, main.rs. Named offenders cut: the `ToolResult` "one field not a tuple" doc (agent.rs), the `Message` role-tag narration (anthropic.rs / openai.rs). Add AGENTS.md codifying the bar ("comments are code; a comment must provably earn its place; when in doubt, delete"), so the rule outlives this PR. Also note the real-types-over-Value preference and the nix build/test commands. Strong-typing note (anthropic.rs): the wire types already landed — SystemBlock/Message/ToolDef are real types; the lone `Value` is the assistant turn echoed back verbatim for byte-fidelity, which carries its WHY. Ignore the in-tree /.cargo-home build cache. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per review: comment privileges revoked for this PR — remove every //, ///, and //! from the files this PR touches (agent.rs, anthropic.rs, backend.rs, config_files.rs, edit.rs, main.rs, openai.rs, prompt.rs). The `edit` tool's three arg-field doc comments are model-facing (schemars turns them into the tool-schema `description`), so move them verbatim to `#[schemars(description = "…")]` attributes — code, not comments — keeping the sent schema byte-identical. Wire/schema tests pass unchanged. Also inline four pre-existing `uninlined_format_args` in main.rs so `cargo clippy --all-targets -- -D warnings` is green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…xtra
anthropic: replace the hand-rolled `ToolChoiceAuto` struct with an
internally-tagged `ToolChoice { Auto, Any, Tool { name } }` enum matching
Anthropic's `tool_choice` wire shape; replace the assistant message's raw
`content: Value` with a typed `Vec<AssistantBlock>` (internally tagged on
`type`: Text / Thinking / RedactedThinking / ToolUse), each variant carrying a
flattened `Map<String, Value>` so unmodeled keys (e.g. a thinking block's
signature) survive the next-turn handshake.
openai: give the assistant carry types the same treatment — `AssistantTurn`,
`ToolCall`, `FunctionCall` each gain a flattened `Map<String, Value>` for
unmodeled fields (refusal, reasoning, tool-call index). A deserialize-only
`role` field absorbs the incoming role so it can't land in the flattened map and
re-emit as a duplicate alongside the `#[serde(tag = "role")]` tag.
`extra` is `Map<String, Value>`, not `Value`, so a non-object flatten target is
unrepresentable. The tool-call `arguments` stays a verbatim `String`. New tests
pin the ToolChoice wire shapes and the OpenAI flatten/no-duplicate-role
round-trip; existing wire/verbatim tests are unchanged. 45 tests pass, clippy
clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Honestly kinda regret being so micro-managy. |
|
Please add the new version to my dotfiles. |
PR B of the plan in #31, and now refac's only mode: the model edits via tool calls instead of re-emitting the whole text. Whole-text rewrite is removed (personal tool — no reason to carry it); #32 closed as obsoleted.
The model calls a single-edit
edittool (as often as it likes per turn) plusview/reset/finish; refac applies each edit, feeds the result back, and loops untilfinish.Design
edit.rs— pure edit engine.applymatchesoldthrough progressively looser strategies (exact → line-trimmed → block-anchor → whitespace/indent-flexible, after opencode); first unique hit wins. Missing/ambiguous/no-op are typedEditErrors fed back to the model — never a silent mis-apply.agent.rs— provider-agnostic loop + tool registry. A genericTool::new::<A>derives each tool's JSON schema and parser from its args type (can't drift); theModeltrait is oneturn(results) -> calls, so the loop is tested with a scripted model. Providers translate to their own wire format (Anthropictool_use, OpenAItool_calls).input_schemais schemars'Schema, notValue. One conversation shape (Seed { system, selected, transform }) replaces a general message list — malformed conversations unrepresentable.EditErrorlogged toedits.jsonl.Tests
35 unit tests (engine, loop, both providers' shaping/parsing), clippy clean. Live-tested on Anthropic (grammar, docstring insert, multi-edit rename, multi-byte); OpenAI path unit-tested only. Two review rounds fixed a UTF-8 boundary panic, an API-400 trait footgun, and the block-anchor matcher.
Follow-up (separate): OpenAI schemas aren't
strict-mode (fine as-is).🤖 Generated with Claude Code