[infra P2] SKILL.md↔helper integration test layer for /idd-edit (gap exposed by #154 R1/R2)

## Problem

**Source**: surfaced during /idd-verify --pr 159 Round 2 (M-R2-4 + DA Probe 7 + Codex Bug 3 — independently flagged by 3 reviewers)

`/idd-edit` skill has 19→23 test fixtures (R1→R3 series of `#154`) but **all fixtures invoke the helper script directly**, bypassing the SKILL.md orchestration layer. The integration layer between SKILL.md (prose bash for AI to follow) and `idd-edit-helper.sh` (extracted parser/validator) has zero test coverage.

## Evidence

R2 verify findings that originated in SKILL.md↔helper integration layer (NOT helper standalone):
- **R1 B1**: `$APPEND_BODY` undefined — SKILL.md heredoc used wrong variable name vs helper's `BODY_INPUT` export. Fixtures didn't catch (helper alone is fine; integration was broken).
- **R1 B2**: `$REPO` vs `$GITHUB_REPO` — same pattern. Helper exports `REPO`; SKILL.md Step 6/7 referenced undefined `GITHUB_REPO`.
- **R2 H7**: SKILL.md Step 1 for-loop closed BEFORE Steps 1.5-7 → batch mode silently processed only LAST target. Pure SKILL.md prose structure bug; helper has nothing to do with it.

All 3 were "silently passing tests, broken in production" failures. The fixture gap is structural — current fixtures test helper in isolation, but the actual user-facing surface is helper-through-SKILL.md.

## Open questions

1. **What's the test layer?** SKILL.md is prose for AI to follow, not executable bash. Options:
   - (a) Extract SKILL.md orchestration into sourceable `idd-edit-orchestrate.sh` that AI invokes; helper subcommands become its primitives. Fixture tests orchestrate end-to-end with mock gh.
   - (b) Generate runnable bash from SKILL.md prose (parse fenced bash blocks, dry-run them with mocked gh/git).
   - (c) Property-based testing: spec the expected variable bindings + invariants; static check SKILL.md prose against spec.
   - (d) Live integration: a "smoke test" that runs `/idd-edit` end-to-end via Claude Code automation against a test repo. Most accurate but heaviest infrastructure.

2. **Which variables are part of the SKILL.md↔helper contract?**
   - Helper exports: MODE / SCOPE_FLAG / SECTION_FLAG / REASON / BODY_INPUT / BODY_FILE / REPO / CWD / LAST / OVERRIDE_USER_CONTENT / TARGETS / RESOLVED_COMMENT_IDS
   - SKILL.md consumes: same names (must stay in sync)
   - Need a contract spec doc to prevent drift

3. **Granularity**:
   - Per-step integration (Step 1 → Step 1.5 → Step 2 → ...)?
   - Per-mode integration (--append / --replace / --prepend-note end-to-end)?
   - Per-flag integration?

## Type

infrastructure / test framework

## Priority

**P2** — pattern proven 3x (R1 B1/B2 + R2 H7) and **#154 still has latent integration bugs likely** post-R3 since fixture gap unchanged. Until this lands, future SKILL.md edits remain vulnerable to "tests green but production broken" regression.

## Trigger conditions

- **Immediate**: as part of #156 (test framework generalization) decision OR as standalone P2 follow-up
- Or: next time SKILL.md↔helper integration breaks in production

## Related

- **#156** — IDD plugin test framework (P2; reuse same infrastructure)
- **#154** — root of R3 fix loop that exposed this gap
- **#155** — bash vs alternative layer (orthogonal but related: if #155 picks Rust/Python layer, integration test pattern changes anyway)

Refs #154 #156


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[infra P2] SKILL.md↔helper integration test layer for /idd-edit (gap exposed by #154 R1/R2) #163

Problem

Evidence

Open questions

Type

Priority

Trigger conditions

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[infra P2] SKILL.md↔helper integration test layer for /idd-edit (gap exposed by #154 R1/R2) #163

Description

Problem

Evidence

Open questions

Type

Priority

Trigger conditions

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions