Fix import merge bug and ship agent-friendly data-op tools by mkreyman · Pull Request #105 · mkreyman/loopctl

mkreyman · 2026-04-17T02:07:08Z

Summary

Fixes the merge=true import 422 (string-vs-integer epic number mismatch) and translates opaque Ecto unique-constraint errors into actionable domain messages with merge=true hints.
Adds create_story and backfill_story MCP tools (plus underlying endpoints) so agents don't have to wrap single-story additions in bulk-import payloads or invent hacks for work completed outside loopctl.
Exposes merge + payload_path on import_stories so agents can add to existing epics and avoid 129KB inline tool calls.
Lowers data-op role requirements (create/update/delete epics, stories, deps) from :user to :orchestrator so an autonomous orchestrator can recover from partial imports without human intervention. Project-level destructive ops stay at :user.
Clarifies in the /loopctl:orchestrate skill that metadata operations are allowed directly — sub-agents are only required for editing application code.

Why

Agents trying to import Epic 72 stories burned an entire session on paper cuts: the merge path was actually broken, errors were opaque (epics[0].tenant_id: has already been taken for this project), there was no per-story creation tool, no way to mark pre-loopctl work as verified, and orchestrators couldn't even delete their own partial imports to retry. Each of these got fixed at the root.

Tests

Reproducer tests for the merge upsert bug (both orchestrator-role and string-epic-number cases).
Full coverage on create_in_project (success, missing epic_number, not-found epic, orchestrator role).
Full coverage on backfill (success with provenance, missing/blank reason, already-verified idempotency, agent role forbidden).
2274 tests green, Dialyzer clean, Credo clean.

Test plan

CI green on branch
Run through /loopctl:orchestrate import all epic 72 stories again on a fresh tenant and confirm it takes one import_stories call with merge: true instead of ten turns of probing
mcp__loopctl__backfill_story marks a pre-existing story as verified with provenance captured in metadata.backfill
mcp__loopctl__create_story with epic_number (not UUID) creates a story in the correct epic
Error message for duplicate fresh import now says "use merge=true" rather than "tenant_id: has already been taken"

MCP server

Bumped to v2.1.0 — new tools (create_story, backfill_story) and new import_stories options (merge, payload_path). README updated. Consumers pull via npx loopctl-mcp-server.

🤖 Generated with Claude Code

Agents hit a series of paper cuts trying to import Epic 72 stories into an existing epic via loopctl. This addresses the root causes so the next import takes one tool call instead of ten turns of probing. ## Server fixes - **Import merge upsert bug**: `Map.get(epic_by_number, ...)` in `merge_import_project` missed when clients sent `"72"` (string) instead of `72` (integer) because DB epic numbers are integers. Lookup returned nil, insert branch ran, unique constraint fired with the opaque `epics[0].tenant_id: has already been taken for this project`. Fixed by normalizing epic numbers to integers and story numbers to strings at the top of both `import_project/4` and `merge_import_project/4`, before any DB lookups or validations. - **Domain error translation**: composite unique_constraint violations are reported on the FIRST field of the constraint (tenant_id), which reads nonsensically. `ImportExport.format_changeset_path_error/2` now translates `Epic`/`Story` unique-number violations into `Epic N already exists in this project. Use merge=true to update ...` - **Data-op role policy**: create/update/delete for epics, stories, and dependencies lowered from `:user` to `:orchestrator`. Project-level destructive ops (delete project, delete budget, cost anomaly resolve, article archive, tenant key rotate) stay at `:user`. CLAUDE.md Security section updated to document the split. - **Backfill endpoint**: `POST /stories/:id/backfill` marks a story verified when the work was done outside loopctl. Sets agent_status and verified_status in one shot, records provenance in `metadata.backfill` and an audit event tagged `backfilled_pre_loopctl`. Requires a `reason`; accepts optional `evidence_url` and `pr_number`. Role: `:orchestrator`. - **Single-story endpoint by epic number**: `POST /projects/:project_id/stories` accepts `epic_number` in the body, resolves to the epic UUID, and creates a story. Complements the existing `POST /epics/:epic_id/stories` which requires the UUID. ## MCP server v2.1.0 - `import_stories` now exposes `merge: boolean` and `payload_path` (absolute path to JSON file) so agents can merge into existing epics and avoid 129KB inline tool calls. - New `create_story` tool — wraps the new endpoint and accepts either `epic_id` or (`project_id` + `epic_number`). - New `backfill_story` tool — wraps the new endpoint. ## Skill update `skills/loopctl-orchestrate.md` now explicitly carves out "data operations" (imports, creates, backfills, dispatches, reads, verifications) as things the orchestrator does directly without spawning a sub-agent. Sub-agents are only required for editing application code. ## Tests - Reproducer tests for the merge upsert bug (both orchestrator-role + existing story AND string-typed epic number cases). - Full coverage on `create_in_project` (success, missing epic_number, not-found epic, orchestrator role). - Full coverage on `backfill` (success with provenance, missing reason, blank reason, already-verified idempotency guard, agent role forbidden). All 2274 tests green. Dialyzer clean. Credo clean.

Follow-up to silence "No operation spec defined" warnings from openapi_test.exs for the two new controller actions.

Team review flagged several concrete issues with the initial PR. The critical one is that backfill_story accepted ANY story and wrote both agent_status=:reported_done AND verified_status=:verified in one transaction — which lets a single orchestrator identity implement and verify its own work, bypassing chain-of-custody. ## Backfill hardening (lib/loopctl/progress.ex) - Replace `guard_not_verified/1` with `guard_backfillable/1`. Now rejects: - `verified_status == :verified` — idempotent no-op - `verified_status == :rejected` — investigate the rejection, don't paper over - `assigned_agent_id != nil` — story was dispatched through loopctl; must use the normal report/review/verify flow. This is the structural guard that makes backfill safe regardless of role. - Rename audit action from `backfilled_pre_loopctl` to `backfilled` and move the provenance reason into `new_state.source = "pre_loopctl"`. Matches existing taxonomy (`created`, `verified`, `rejected`) which names the TRANSITION, not the reason. - Emit `story.backfilled` webhook event. Subscribers watching for completion transitions now see backfills as a distinct event. Added to the Webhook `@valid_event_types` allowlist. - Update `@spec` to enumerate the new error atoms plus the `%Ecto.Changeset{}` shape that can propagate from the Multi story step. - Controller maps each new atom to an actionable 422 message (explains WHY the backfill was refused and what to do instead). ## Role policy revert on DELETE (CLAUDE.md compliance) The initial PR lowered create/update/DELETE for epics, stories, and dependencies to `:orchestrator`. CLAUDE.md's destructive-op rule says any DELETE stays at `:user`. Revert: - StoryController, EpicController, Story/EpicDependencyController: split the role plug — `:orchestrator` for create/update, `:user` for delete. - Update CLAUDE.md to match: constructive/metadata ops are orchestrator, any data removal is user. Rule of thumb restated. - Fix moduledoc drift: four controllers still said "user role" for create/update operations that moved to orchestrator. ## create_in_project error handling - Previously folded `Projects.get_project` and `Epics.get_epic_by_number` into one `{:error, :not_found}` handler that always said "Epic number N not found". If the project UUID was wrong, clients got an epic-focused error. - Tag the `with` steps (`{:project, ...}`, `{:epic, ...}`) and route each failure to the correct response: 404 for missing project (matches OpenAPI), 422 for missing epic with actionable hint. ## Domain error translation beyond import - FallbackController now translates Epic/Story unique-constraint errors into `"Epic 72 already exists in this project. Pick a different number, or use the import endpoint with `merge=true`..."` regardless of which controller surfaced the changeset. Direct POST /epics/:id/stories and POST /projects/:id/stories now get the same quality error as the import path. - `ImportExport.unique_number_violation?/1` rewritten to match on `constraint: :unique` in opts instead of the literal message string. Robust against future wording tweaks to the schema. ## Tests added - `refuses to backfill a story with dispatch lineage` — the chain-of-custody guard. - `refuses to backfill a rejected story` — state-machine guard. - `backfill is tenant-scoped (cannot reach another tenant's story)` — security boundary. - `returns 404 when story does not exist`. - `emits story.backfilled webhook event when a subscriber is configured` — webhook emission path verified end to end. Also updated the existing "success path" test to assert the new audit action name (`backfilled`) and the new `source: "pre_loopctl"` field. ## Type signatures - `Epics.get_epic_by_number/3` @SPEC widened to `integer() | String.t()` (it always accepted both). ## MCP tool docs - README: `backfill_story` now carries an explicit warning that it bypasses the review/verify chain and lists the refusal conditions. - `import_stories` tool description: documented payload/payload_path precedence (inline `payload` wins). 2279 tests green. Credo clean. Dialyzer clean.

Adversarial review found a critical bypass: force_unclaim clears assigned_agent_id but not implementer_dispatch_id, letting an orchestrator chain force_unclaim + backfill to "verify" dispatched work without a review. Also found resource-exhaustion and file-read primitive in the MCP payload_path. ## Chain-of-custody bypass closed (lib/loopctl/progress.ex) guard_backfillable/1 now refuses stories that have ANY of: - verified_status in [:verified, :rejected] - agent_status != :pending (catches :contracted, :assigned, :implementing, :reported_done -- the full dispatched set) - assigned_agent_id set - implementer_dispatch_id set (the ever-dispatched marker force_unclaim_story does NOT clear -- this was the bypass) - verifier_dispatch_id set (defense in depth) All five checks are structural -- the guard does not rely on any one field alone, so future changes to the lifecycle can't regress this protection by accident. ## Param validation at the context boundary backfill_story/4 now validates: - reason -- max 2000 chars - pr_number -- positive integer or numeric string; anything else rejected - evidence_url -- must be http(s)://, no user:pass@ userinfo (prevents leaking GitHub PATs into permanent audit records) Each failure maps to a specific 422 with an actionable message. The controller's branch explosion was extracted into backfill_error_message/1 to satisfy Credo complexity. ## Idempotency (retry safety) Previously, a client that timed out on a successful backfill got 422 on retry. Now: when guard_backfillable returns :already_verified, compare the incoming payload to metadata.backfill. If they match, return 200 with the story. If they differ, return 422 with a message that points at who verified and why. Tests cover both paths: identical retry -> 200, different payload -> 422. ## Webhook payload completeness story.backfilled webhook payload now includes actor_id, actor_label, and source: "pre_loopctl" -- matching what story.verified emits and what the audit row records. Subscribers can now attribute backfill events. ## MCP payload_path safety (mcp-server/index.js) resolvePayload previously did fs.readFile(path) with no validation. A prompt-injected agent could pass /dev/zero (DoS) or ~/.aws/credentials (local file read) and the MCP would happily process it. Now: - path.isAbsolute required - /proc, /dev, /sys prefixes rejected - stat first; refuse non-regular files - size cap of 5 MiB (server also enforces 2 MiB body limit) ## FallbackController precision unique_constraint_violation?/1 now matches on constraint_name containing "number", not any :unique constraint. When a future schema adds a second unique constraint (external_id, slug), it won't be misreported as "Epic X already exists". Same fix applied to ImportExport.unique_number_violation?/1. ## Discoverability route_discovery_controller.ex now lists the two new routes that were missing: - POST /api/v1/projects/:project_id/stories - POST /api/v1/stories/:id/backfill The previous PR tightened route discovery; this closes the gap that slipped in. ## Tests added - refuses backfill after force_unclaim clears assigned_agent_id -- the critical bypass - refuses backfill when agent_status is past :pending -- the contracted but unassigned case - idempotent retry with same payload returns 200 -- resilience - retry with DIFFERENT payload after success returns 422 -- distinguishes retry from conflict - rejects non-integer pr_number -- param validation - rejects evidence_url with credentials in userinfo -- token leak prevention - tenant isolation: cannot create a story in another tenant's project -- cross-tenant defense on create_in_project ## MCP tool description backfill_story description now lists all refusal conditions (not just in the README). Agents reading the tool schema see the same warnings as humans reading the README. 2286 tests green. Credo clean. Dialyzer clean.

Local Elixir 1.18 `mix format` wants the pattern broken across 3 lines; CI Elixir 1.19 wants 2 lines. Refactor into two sequential matches: one extracts the 16-byte UUID prefix, the second splits it into fields. Both formatters accept this shape unchanged.

PR #105 shipped new functionality but the public-facing docs still pointed at the old tool count and made no mention of the new tools. This follow-up fixes that so agents and humans hitting loopctl.com see accurate guidance. ## loopctl.com/docs (page_html/docs.html.heex) - "41 typed tools" -> "50 typed tools" (section intro) - "42 tools" -> "50 tools" (Other Notable Tools summary) - New "Work Breakdown Tools" subsection documenting import_stories (with merge + payload_path), create_story (with epic_number convenience), and backfill_story (with the dispatch-lineage refusal warning). Matches the pattern of the existing Knowledge Wiki Tools enumeration. ## Landing page (home.html.heex) - "42 typed tools" -> "50 typed tools" (two occurrences) - Getting-started step 3 now mentions ?merge=true for appending to an existing epic. ## Orchestration guide (docs/orchestration-guide.md) - Pattern 1 section now mentions merge=true for incremental adds, plus an inline create_story example. - New Pattern 3 "Per-story backfill with provenance" documenting the backfill_story tool, its structural guard (no dispatch lineage), its idempotent-retry semantics, and the audit/webhook it emits. Old Pattern 3 "Epic-wide verification" becomes Pattern 4. ## Chain-of-custody wiki article (docs/articles/chain-of-custody.md) - New "Pre-loopctl work: the backfill exception" section explaining why backfill is safe (structural dispatch-lineage refusal prevents force_unclaim + backfill from laundering dispatched work) and when to use it (honest onboarding of pre-existing work with provenance). ## Changelogs - mcp-server/CHANGELOG.md: new 2.1.0 entry for the MCP package (merge, payload_path, create_story, backfill_story, type tolerance, path-safety validation, domain error translation). - CHANGELOG.md: new [Unreleased] entry covering the server-side changes (new endpoints, role policy, security scoping of the unique-constraint translation). 2286 tests still pass. Credo + Dialyzer clean.

mkreyman added 5 commits April 16, 2026 20:04

Add OpenAPI specs for create_in_project and backfill

5acf230

Follow-up to silence "No operation spec defined" warnings from openapi_test.exs for the two new controller actions.

mkreyman merged commit 1944d7a into master Apr 17, 2026
6 checks passed

mkreyman deleted the fix/import-merge-and-agent-ergonomics branch April 17, 2026 03:08

mkreyman mentioned this pull request Apr 17, 2026

Document PR #105 in public docs, landing, guide, and changelogs #106

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix import merge bug and ship agent-friendly data-op tools#105

Fix import merge bug and ship agent-friendly data-op tools#105
mkreyman merged 5 commits intomasterfrom
fix/import-merge-and-agent-ergonomics

mkreyman commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mkreyman commented Apr 17, 2026

Summary

Why

Tests

Test plan

MCP server

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant