Function-call edits (structured replacements, not full rewrite) by bddap-bot · Pull Request #31 · bddap/refac

bddap-bot · 2026-05-29T17:55:20Z

Makes refac express changes as function-call edits instead of re-emitting the whole input, as requested.

Stacks on #30 (Anthropic backend) — merge that first, or merge this to get both.

How it works

The model is given the selected text + transform and must call an apply_edits tool returning a list of exact-substring replacements ({old, new}).
refac applies them in order to the original text (apply_edits in main.rs), erroring if an old isn't found verbatim — so a stale/hallucinated edit fails loudly instead of silently no-op'ing.
New default on Anthropic: edit_mode = "tool". Set edit_mode = "rewrite" (or REFAC_EDIT_MODE=rewrite) for the previous full-rewrite behavior. The OpenAI path always rewrites.

Details

anthropic.rs: request_edits() adds the tool + forces tool_choice, parses the tool_use input into Vec<Edit>; HTTP factored into a shared send().
prompt.rs: an edit-focused system prompt (keeps refac's personality; no rewrite few-shot).
Edit-mode skips the full-text few-shot, so requests are much smaller.

Testing (live, with the provided key)

refac tor "Me like toast." "Correct grammar." → I like toast. (one substring edit)

refac tor "<def add…>" "add a docstring" → inserts a docstring, rest untouched:

def add(a, b):
    """Return the sum of a and b. Groundbreaking stuff, truly. --refac"""
    return a + b

cargo test → 6 passing (edit application order, insert-via-anchor, delete-via-empty, missing-target error, request shaping).

🤖 Generated with Claude Code

refac now defaults to the Claude Messages API (model claude-opus-4-8); OpenAI remains available via `provider = "openai"`. New `src/anthropic.rs` talks to the REST API with reqwest (no official Rust SDK): x-api-key + anthropic-version headers, top-level `system`, and required `max_tokens`. It adapts refac's flat message list to Anthropic's shape — lifts the system prompt out of messages and merges the consecutive user turns to satisfy user/assistant alternation — and marks the static system prompt + few-shot examples `cache_control: ephemeral` so repeat calls only pay for the varying input. Config gains `provider`, optional `model` (defaulted per provider), and `max_tokens`; secrets hold either/both keys. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Some few-shot samples have an empty `selected`; Anthropic rejects empty text content blocks (OpenAI tolerated them), so skip empty messages when building the request. Add a REFAC_DEBUG env that dumps the request JSON. Verified end-to-end: 'Me like toast.' / 'Correct grammar.' -> 'I like toast.'

Adds an `apply_edits` tool the model must call, returning a list of exact substring replacements that refac applies to the input — so it no longer re-emits the entire selected text. New default on the Anthropic provider (edit_mode = "tool"); set edit_mode = "rewrite" (or REFAC_EDIT_MODE=rewrite) for the old behavior. OpenAI always rewrites. - anthropic.rs: `request_edits` forces `tool_choice` to `apply_edits`, parses the tool_use input into `Vec<Edit>`. HTTP factored into a shared `send`. - prompt.rs: edit-mode system prompt (no rewrite few-shot). - main.rs: `apply_edits` applies edits in order, erroring if a target isn't found. Live-tested against the API: "Me like toast."/"Correct grammar." -> "I like toast."; "add a docstring" inserts one without touching the rest. cargo test green (6 tests incl edit application + missing-target error). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two rounds of concurrent reviewers (taste/Niko, correctness/API, Rust idiom); findings triaged and the real ones fixed: - apply_edits: resolve every `old` against the ORIGINAL text and reject empty, missing, ambiguous (>1 occurrence), or overlapping edits; apply right-to-left. Edits are now order-independent — closes a silent buffer-corruption hazard. - Check `stop_reason`: error on `max_tokens` truncation in complete + request_edits (a truncated rewrite or half-finished tool call was silently returned/mis-parsed). - login: `--provider` flag + state which provider's key is being saved (avoids an upgrade footgun where an OpenAI key gets stored as the Anthropic one). - Unknown REFAC_PROVIDER / REFAC_EDIT_MODE now error instead of silently defaulting. - Unify the tool name to APPLY_EDITS; rewrite the tool/system prompt to match the new unique-and-independent edit contract. Delete dead OpenAI edits structs (clears all build warnings). Extract DEFAULT_ANTHROPIC_MODEL. - Tests: ambiguous/empty/overlapping edits + resolve-against-original. 10 passing. Re-tested live against the API after the changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

bddap · 2026-06-02T21:17:13Z

Time for a rewrite atop the merged anthropic pr. Consider including a wide-scope refactor before this pr to pave the way for this change. Common interface to both providers might be needed, put your software design hat on.

bddap-bot · 2026-06-02T21:22:15Z

Plan, software-design hat on. Splitting into two stacked PRs as you suggested:

PR A — refactor: a common Backend interface (no behavior change).
Today main.rs::refactor() matches on Provider, re-derives the key, and calls a free anthropic::complete / openai::complete. That match-and-duplicate is exactly what grows ugly when a second call shape (tool edits) lands. So first:

trait Backend { fn complete(&self, &[Message]) -> Result<String>; } in a new backend.rs.
Each provider module owns its struct (Anthropic{key,model} / Openai{key,model}) and impl Backend.
backend::resolve(provider, model, &secrets) -> Result<Box<dyn Backend>> holds the key-extraction + "no key" errors in one place. refactor() collapses to resolve(...)?.complete(&messages).
Pure refactor: existing tests stay green, wire formats unchanged.

PR B — rewrite #31 atop A. Function-call edits become the natural second capability on the interface (Anthropic-only request_edits → Vec<Edit>; the substring-apply logic is provider-agnostic and moves to its own module). OpenAI keeps rewriting. This drops the #31-vs-#30 overlap since anthropic.rs already lives on main now.

Starting on A now. Will run my reviewer loop before flagging either for your review.

bddap · 2026-06-02T21:25:17Z

I don't think you'll want to write the funtion-call loop twice. Consider a common abstraction that takes function-calling into account.

bddap-bot · 2026-06-02T21:28:24Z

PR A is up: #32 (the paving refactor, no behavior change). PR B = this branch rewritten atop it.

bddap-bot · 2026-06-02T21:30:44Z

Right, and it corrects a wrong assumption I baked into A. Function-calling isn't Anthropic-only — OpenAI has tools too. So the abstraction shouldn't be an "Anthropic-only edits" capability; it should make a single tool-call round-trip a first-class Backend operation that both providers implement, and the function-call loop lives once on top of it.

Revised design for B:

trait Backend {
    fn complete(&self, &[Message]) -> Result<String>;
    /// One forced tool-call round-trip: hand the model a tool schema, get back
    /// the parsed call arguments. Each provider does this in its own wire format
    /// (Anthropic tools + tool_choice; OpenAI tools + tool_choice "required").
    fn call_tool(&self, &[Message], tool: &Tool) -> Result<serde_json::Value>;
}

The edit flow — build the apply_edits tool schema, call it, parse Vec<Edit>, apply to the original (erroring on a missing old) — is provider-agnostic and written once, atop call_tool. No per-provider loop.
Both providers get edit mode; edit_mode = tool|rewrite stops being provider-gated.
A's backend.rs doc comment currently justifies Box<dyn> over an enum by calling edits "Anthropic-only." That's now wrong — both implement call_tool, so it's a uniform trait. Fixing that comment on A (the dyn-vs-enum call stands on plain extensibility, not capability split).

Noodle away on A. This is the shape I'll build B in once you're settled.

bddap-bot · 2026-06-02T21:41:13Z

Ran the debate club — three advocates (sans-io step-enum / minimal two-method / protocol-transport split). Synthesis + recommendation.

Verdict: protocol/transport split (true sans-io) + an `Outcome { ToolCall | Done }` sum type, scoped one-shot.

Your two words settled it:

"sans-io" is a term of art (h11, quinn): no IO in the protocol layer. Only the split delivers it — render(conv, tools) -> Request and parse(resp) -> Outcome are pure; a generic Transport does the HTTP. The rival step(conv,tools)->Step returns the right type but still opens the socket inside the method — sans-io in name only, and still un-unit-testable.
"tool-call or done as output" = enum Outcome { ToolCall { id, name, input }, Done(String) }. Illegal states gone. Rewrite = Done; edits = ToolCall. One path.

pub enum Outcome { ToolCall { id: String, input: ApplyEdits }, Done(String) }

pub trait Provider {
    fn render(&self, conv: &[Turn], tools: &[Tool]) -> HttpRequest; // pure
    fn parse(&self, status: u16, body: &[u8]) -> Result<Outcome>;   // pure
}
pub trait Transport { fn execute(&self, req: HttpRequest) -> Result<(u16, Vec<u8>)>; }
// one generic driver: render -> execute -> parse. function-call handling written ONCE.

Payoff: two providers with divergent, fiddly tool wire-formats (Anthropic tool_use blocks + x-api-key/version; OpenAI tool_calls + bearer) become testable with zero network — assert exact request JSON, feed canned response bytes to parse. And it's mostly finishing a pattern the repo already half-has: openai.rs already goes through Endpoint/Req (pure render) + Client (transport); anthropic.rs hand-rolls its own reqwest. The split generalizes Req→HttpRequest, promotes the response side to a typed parse, and deletes anthropic.rs's duplicated transport. Net less code, one consistent path.

Where I overrule the advocates

vs two-method (complete+call_tool): its YAGNI instinct is right but mis-aimed — it saves a ~15-line driver by burying IO in 2 methods × 2 providers = four copies of client/auth/timeout boilerplate, and it drops your "or done" output type (forced tool, no Done arm). Rejected on your own stated output shape.
But I keep its core discipline: no multi-turn agent loop. refac is one-shot. The driver does render→execute→parse→(ToolCall: apply edits, terminal / Done: return text). The function-call round-trip is written once, provider-agnostic ("don't write it twice" ✓) — without a turn-feedback engine wired to nothing.

Typed tool

refac has exactly one tool (apply_edits, known schema). Parse straight into typed ApplyEdits { edits: Vec<Edit { old, new }> } at the wire seam; keep serde_json::Value only as the transient decode buffer, never in the loop. Malformed call = typed error at the seam, not a panic in apply.

One decision for you — apply-error policy

When an edit's old isn't found verbatim (model paraphrased / whitespace drift):

(a) fail loudly, abort the transform — your original Function-call edits (structured replacements, not full rewrite) #31 choice; deterministic, single round-trip.
(b) feed the error back as a tool_result, let the model retry (bounded ≤N) — recovers gracefully, costs latency + a 2nd call.

The split supports either; (a) keeps refac's one-shot determinism, (b) needs the bounded loop. I lean (a) for a fast editor binding, option (b) held open. Your call.

Knock-on for PR A (#32)

The complete()->String trait there becomes a stepping stone — B reshapes Backend into this sans-io Provider. A still merges cleanly as the dispatch/key-resolution cleanup, or I fold it into B. Say which.

bddap-bot · 2026-06-02T21:54:38Z

Round 2 — open ideation (3 agents: question-the-premise / type-leverage / orchestration). It found real design-space we'd skipped. The headline: round 1 argued about the Backend trait; round 2 says the higher-leverage axes are the edit mechanism, the apply robustness, and making the seam data. The trait shape is downstream of those.

Three ideas worth more than the trait debate

1. Maybe don't use function-calling at all. Two alternatives to "model calls an apply_edits tool":

SEARCH/REPLACE blocks (aider-style) in the normal text channel: model emits <<<<<<< SEARCH / ======= / >>>>>>> REPLACE fenced pairs; refac parses + applies. Zero Backend/wire changes, works on every model including no-tool ones, and leans on the few-shot mechanism that already drives refac (the model imitates Sample.result verbatim — far more reliable than a JSON-schema description). Smallest possible diff to the actual goal ("stop re-emitting the blob").
Locate-then-edit (offsets). Call 1: model returns char/line ranges relevant to the transform. refac slices them (refac holds the authoritative offsets). Call 2: rewrite only those slices, splice back. This eliminates the entire "old substring didn't match" failure class — the model never reproduces existing text verbatim. Most robust, biggest token win; cost is 2 round-trips.

Both dodge the thing that makes tool-edits fragile: the model has to echo old byte-exact, and whitespace drift / paraphrase breaks it.

2. Correctness lives in the apply step, not the trait — and we all ignored it. Whatever produces {old,new}, the bug surface is: old not found / found multiple times / two edits overlap. Push that into a type:

struct Plan<'src> { src: &'src str, spans: Vec<(Range<usize>, String)> }
impl<'src> Plan<'src> {
    fn resolve(src: &'src str, edits: Vec<RawEdit>) -> Result<Plan, EditError>; // only constructor; fails on NotFound/Ambiguous/Overlap
    fn apply(self) -> String;   // total, no Result — illegal states already gone
}

Highest correctness-per-line, orthogonal to every transport decision. Adopt regardless.

3. The seam should be data, not a behavior — and the cleanest sans-io is a resumable state machine. Instead of render/parse/Transport (3 concepts) or a complete() method, a value you step:

enum Step { Call(HttpRequest), Done(String) }
impl Refac {
    fn start(sel, instr) -> (Self, Step);
    fn resume(self, resp: HttpResponse) -> Result<(Self, Step)>;
}
// driver: while let Step::Call(req) = step { step = m.resume(send(req)?)?.1 }

Single-shot today = one Call then Done. But "apply edits, and if an old missed, re-ask" becomes expressible without committing to a loop now, and record/replay testing falls out for free (feed canned HttpResponses, assert the HttpRequest sequence + final text — zero network). This generalizes round-1's split to N steps and supersedes it.

Two sharper consequences

Typed tools force enum-over-dyn. If we keep function-calling and make it typed (call<T: Tool> with schema derived from the Rust type via schemars — kills schema-vs-parse drift), generic methods aren't object-safe, so Box<dyn Backend> can't work — you need enum Backend { Anthropic, Openai }. That directly revises PR A's dyn choice.
Several existing few-shot samples (roast, advice, what does this do?) emit whole-file commentary, not localized edits. Any structured-edit mode MUST keep a whole-rewrite fallback or those core behaviors regress. (Outcome::Done / a WholeText format already covers this — but it means rewrite and edit must coexist, not replace.)
refac already logs every (selected, transform, output) as JSONL (main.rs). That's a ready-made eval corpus to A/B edit formats and capture req/resp cassettes — the empirical answer to "which edit language do models actually apply reliably."

Where this lands the recommendation

The round-1 verdict (sans-io + Outcome{ToolCall|Done}) survives but gets sharpened and de-prioritized: the trait/transport shape matters less than (a) picking the edit mechanism and (b) the validated-Plan apply step. My updated lean:

Make the seam data (resumable Step machine, idea 3) — keeps record/replay + future-retry without lock-in.
Adopt the validated Plan (idea 2) no matter what.
Prototype the mechanism question empirically rather than decide it in the abstract: SEARCH/REPLACE blocks vs typed tool-call, scored on the existing JSONL log corpus for apply-success rate. The few-shot-friendliness of SEARCH/REPLACE makes me suspect it beats tool-calls on reliability and is a smaller diff — but that's a measurement, not an opinion.

Want me to (i) build a quick apply-success bake-off (SEARCH/REPLACE vs tool-call) over the logs, or (ii) just commit to one mechanism and build it? And the apply-error (a)/(b) + PR-A-fold questions from the last comment still stand.

bddap-bot · 2026-06-02T21:56:58Z

Edit-tool design (stolen from opencode + claude-code)

Researched both. The split is clean: claude-code owns the tool contract (exact match, unique-or-error, replace_all), opencode owns the apply robustness (a fallback chain of progressively fuzzier matchers so the model's old doesn't have to be byte-perfect). refac wants both.

The contract (from claude-code's `str_replace` / Edit)

old must match the source exactly as written (whitespace + indentation included) — before fuzzy fallback.
old must resolve to exactly one location. Multiple → error, unless replace_all. The model disambiguates by including more surrounding context, exactly as claude-code trains it to.
new must differ from old (no-op = error). Empty new = deletion. Insertion = anchor on an existing substring and echo it back inside new (no separate insert op).
A hallucinated/edited old that doesn't match fails loudly — never silently mis-applies.

The apply algorithm (from opencode's `replace()`)

opencode tries a chain of "replacers," each a generator that yields candidate old strings to look for; the first candidate that lands a unique hit wins. Ordered exact → fuzzy so a precise match always beats a loose one:

Simple — old verbatim.
LineTrimmed — match line-by-line ignoring each line's leading/trailing whitespace; splice the original lines.
BlockAnchor — for 3+ line blocks, anchor on first & last line, accept the middle by Levenshtein similarity (≥0.3 when ambiguous). Survives a reworded interior.
WhitespaceNormalized — collapse all runs of whitespace to one space, then match.
IndentationFlexible — strip common leading indent from both sides, match dedented.
(opencode also has EscapeNormalized / TrimmedBoundary / ContextAware / MultiOccurrence — marginal for us; MultiOccurrence is just the replace_all path.)

Resolution per claude-code semantics: within the first replacer that yields any hit, require index == lastIndexOf (unique) else skip → next replacer; if a replacer yields a unique hit, apply and stop. Exhaust all → "could not find old" error.

For refac I'd ship the core 5 (1–5); they cover the real failure modes (indent drift, reflowed whitespace, reworded block) and skip the long tail.

Rust shape

/// A replacer yields candidate substrings of `src` to try matching `old` against,
/// fuzzy-to-exact. First candidate with a UNIQUE occurrence wins.
type Replacer = fn(src: &str, old: &str) -> Vec<String>;

const CHAIN: &[Replacer] = &[simple, line_trimmed, block_anchor, ws_normalized, indent_flexible];

fn find_unique<'s>(src: &'s str, old: &str) -> Result<Range<usize>, EditError> {
    for replacer in CHAIN {
        for cand in replacer(src, old) {
            let mut it = src.match_indices(&cand);
            if let Some((i, _)) = it.next() {
                if it.next().is_none() {                  // unique
                    return Ok(i..i + cand.len());
                }                                          // else ambiguous → keep trying
            }
        }
    }
    Err(EditError::NotFound(old.into()))
}

This is exactly the round-2 Plan idea with opencode's matchers as the resolver. replace_all swaps the unique check for "replace every occurrence of the matched candidate."

refac-specific calls (need your read)

Batch vs single. claude-code's Edit is one {old,new} per call; refac does one model round-trip, so I'd take a batch: apply_edits({ edits: [{old, new, replace_all?}] }) — all changes in one tool call.
Sequential vs offset-Plan. opencode applies edits sequentially (each sees the prior result) — simplest, matches model expectations, and lets a later edit target text an earlier one produced. The round-2 Plan (resolve all vs the original, require non-overlapping, apply by descending offset) instead catches conflicting/overlapping edits up front but forbids dependent edits. I lean sequential (opencode) for refac; it's what the models are tuned for. Overlap just surfaces as a later not-found.
Error policy = the open (a)/(b). After the chain exhausts: (a) abort loudly [orig Function-call edits (structured replacements, not full rewrite) #31] or (b) feed EditError back as a tool_result and let the model retry once. The fuzzy chain shrinks how often (b) is even needed.

Tool description (the prompt the model sees) — steal claude-code's precision

Replace exact substrings in the selected text. For each edit, old must appear in the text; copy it verbatim including whitespace and indentation. Make old long enough to be unique, or set replace_all. new is what replaces it — empty to delete; to insert, include surrounding text in both old and new. Edits apply in order.

Sources: opencode edit.ts, claude-code / Anthropic text-editor tool.

This composes straight onto the round-2 plan: the apply_edits tool's args parse into the typed Edits, find_unique is the validated-Plan resolver, and the whole thing is mechanism-agnostic — it works the same whether the {old,new} list arrives via a tool-call or SEARCH/REPLACE blocks. So we can lock the edit semantics now and still defer the tool-vs-blocks mechanism bake-off.

bddap-bot · 2026-06-02T23:39:03Z

PR B is up — #33 (tool/function-call edit mode), built on the design from this thread and the opencode/claude-code edit semantics. Stacks on A (#32). 38 tests, clippy clean, live-tested on Anthropic; two reviewer-loop rounds applied. Details in the PR.

bddap · 2026-06-19T18:49:08Z

Remind me why we have two prs who's descriptions seem to indicate they do the same thing. #33 and #31

bddap-bot · 2026-06-19T18:56:47Z

#33 supersedes this — they're a prototype and its finished version, not two of the same thing.

The lineage:

Add Anthropic (Claude) API backend #30 (merged) added the Anthropic backend.
Function-call edits (structured replacements, not full rewrite) #31 (this PR) was the first cut of function-call edits: the model calls one apply_edits tool returning a batch of {old, new} exact-substring replacements applied in order, with whole-text rewrite kept as a fallback. It was stacked on Add Anthropic (Claude) API backend #30 (now moot — Add Anthropic (Claude) API backend #30 merged) and has since gone unmergeable as main moved under it.
Tool/function-call edit mode #33 ("PR B of the plan in Function-call edits (structured replacements, not full rewrite) #31") is the rewrite that took the idea further: an iterative edit/view/reset/finish tool loop (apply → feed results back → repeat) with a fuzzy-match fallback chain (exact → line-trimmed → block-anchor → whitespace/indent-flexible, after opencode/claude-code) instead of exact-only; rewrite mode removed entirely (your call — personal tool); provider-agnostic loop with typed schemars schemas; 35 tests + live-tested; two review rounds. Self-contained and mergeable.

So #33 fully subsumes #31. Closing this in favor of #33 — reopen if you'd rather land the simpler batch-edit version instead.

bddap-bot and others added 4 commits May 29, 2026 10:44

bddap-bot mentioned this pull request Jun 2, 2026

Refactor provider dispatch behind a Backend trait #32

Closed

bddap-bot mentioned this pull request Jun 2, 2026

Tool/function-call edit mode #33

Merged

bddap-bot closed this Jun 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Function-call edits (structured replacements, not full rewrite)#31

Function-call edits (structured replacements, not full rewrite)#31
bddap-bot wants to merge 4 commits into
bddap:mainfrom
bddap-bot:tool-edits

bddap-bot commented May 29, 2026

Uh oh!

bddap commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap commented Jun 19, 2026

Uh oh!

bddap-bot commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bddap-bot commented May 29, 2026

How it works

Details

Testing (live, with the provided key)

Uh oh!

bddap commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap-bot commented Jun 2, 2026

Verdict: protocol/transport split (true sans-io) + an Outcome { ToolCall | Done } sum type, scoped one-shot.

Where I overrule the advocates

Typed tool

One decision for you — apply-error policy

Knock-on for PR A (#32)

Uh oh!

bddap-bot commented Jun 2, 2026

Three ideas worth more than the trait debate

Two sharper consequences

Where this lands the recommendation

Uh oh!

bddap-bot commented Jun 2, 2026

Edit-tool design (stolen from opencode + claude-code)

The contract (from claude-code's str_replace / Edit)

The apply algorithm (from opencode's replace())

Rust shape

refac-specific calls (need your read)

Tool description (the prompt the model sees) — steal claude-code's precision

Uh oh!

bddap-bot commented Jun 2, 2026

Uh oh!

bddap commented Jun 19, 2026

Uh oh!

bddap-bot commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Verdict: protocol/transport split (true sans-io) + an `Outcome { ToolCall | Done }` sum type, scoped one-shot.

The contract (from claude-code's `str_replace` / Edit)

The apply algorithm (from opencode's `replace()`)