Problem
Source: surfaced during /idd-verify --pr 159 Round 2 (M-R2-4 + DA Probe 7 + Codex Bug 3 — independently flagged by 3 reviewers)
/idd-edit skill has 19→23 test fixtures (R1→R3 series of #154) but all fixtures invoke the helper script directly, bypassing the SKILL.md orchestration layer. The integration layer between SKILL.md (prose bash for AI to follow) and idd-edit-helper.sh (extracted parser/validator) has zero test coverage.
Evidence
R2 verify findings that originated in SKILL.md↔helper integration layer (NOT helper standalone):
- R1 B1:
$APPEND_BODY undefined — SKILL.md heredoc used wrong variable name vs helper's BODY_INPUT export. Fixtures didn't catch (helper alone is fine; integration was broken).
- R1 B2:
$REPO vs $GITHUB_REPO — same pattern. Helper exports REPO; SKILL.md Step 6/7 referenced undefined GITHUB_REPO.
- R2 H7: SKILL.md Step 1 for-loop closed BEFORE Steps 1.5-7 → batch mode silently processed only LAST target. Pure SKILL.md prose structure bug; helper has nothing to do with it.
All 3 were "silently passing tests, broken in production" failures. The fixture gap is structural — current fixtures test helper in isolation, but the actual user-facing surface is helper-through-SKILL.md.
Open questions
-
What's the test layer? SKILL.md is prose for AI to follow, not executable bash. Options:
- (a) Extract SKILL.md orchestration into sourceable
idd-edit-orchestrate.sh that AI invokes; helper subcommands become its primitives. Fixture tests orchestrate end-to-end with mock gh.
- (b) Generate runnable bash from SKILL.md prose (parse fenced bash blocks, dry-run them with mocked gh/git).
- (c) Property-based testing: spec the expected variable bindings + invariants; static check SKILL.md prose against spec.
- (d) Live integration: a "smoke test" that runs
/idd-edit end-to-end via Claude Code automation against a test repo. Most accurate but heaviest infrastructure.
-
Which variables are part of the SKILL.md↔helper contract?
- Helper exports: MODE / SCOPE_FLAG / SECTION_FLAG / REASON / BODY_INPUT / BODY_FILE / REPO / CWD / LAST / OVERRIDE_USER_CONTENT / TARGETS / RESOLVED_COMMENT_IDS
- SKILL.md consumes: same names (must stay in sync)
- Need a contract spec doc to prevent drift
-
Granularity:
- Per-step integration (Step 1 → Step 1.5 → Step 2 → ...)?
- Per-mode integration (--append / --replace / --prepend-note end-to-end)?
- Per-flag integration?
Type
infrastructure / test framework
Priority
P2 — pattern proven 3x (R1 B1/B2 + R2 H7) and #154 still has latent integration bugs likely post-R3 since fixture gap unchanged. Until this lands, future SKILL.md edits remain vulnerable to "tests green but production broken" regression.
Trigger conditions
Related
Refs #154 #156
Problem
Source: surfaced during /idd-verify --pr 159 Round 2 (M-R2-4 + DA Probe 7 + Codex Bug 3 — independently flagged by 3 reviewers)
/idd-editskill has 19→23 test fixtures (R1→R3 series of#154) but all fixtures invoke the helper script directly, bypassing the SKILL.md orchestration layer. The integration layer between SKILL.md (prose bash for AI to follow) andidd-edit-helper.sh(extracted parser/validator) has zero test coverage.Evidence
R2 verify findings that originated in SKILL.md↔helper integration layer (NOT helper standalone):
$APPEND_BODYundefined — SKILL.md heredoc used wrong variable name vs helper'sBODY_INPUTexport. Fixtures didn't catch (helper alone is fine; integration was broken).$REPOvs$GITHUB_REPO— same pattern. Helper exportsREPO; SKILL.md Step 6/7 referenced undefinedGITHUB_REPO.All 3 were "silently passing tests, broken in production" failures. The fixture gap is structural — current fixtures test helper in isolation, but the actual user-facing surface is helper-through-SKILL.md.
Open questions
What's the test layer? SKILL.md is prose for AI to follow, not executable bash. Options:
idd-edit-orchestrate.shthat AI invokes; helper subcommands become its primitives. Fixture tests orchestrate end-to-end with mock gh./idd-editend-to-end via Claude Code automation against a test repo. Most accurate but heaviest infrastructure.Which variables are part of the SKILL.md↔helper contract?
Granularity:
Type
infrastructure / test framework
Priority
P2 — pattern proven 3x (R1 B1/B2 + R2 H7) and #154 still has latent integration bugs likely post-R3 since fixture gap unchanged. Until this lands, future SKILL.md edits remain vulnerable to "tests green but production broken" regression.
Trigger conditions
Related
Refs #154 #156