Prevent mitigation prompt injection by referencing mitigation item indexes by ganesh47 · Pull Request #62 · ganesh47/cstack

ganesh47 · 2026-04-23T04:49:35Z

The inspector previously concatenated untrusted mitigation text (recommended actions, summaries, GitHub blockers, etc.) directly into the prompt used to start follow-up workflows, enabling prompt-injection chains that could drive execution under the danger-full-access sandbox.
The goal is to stop embedding attacker-controlled strings in automatically launched prompts while preserving the mitigate UX and the ability to select specific mitigation items.

Change buildMitigationPrompt to accept a list of mitigation item indexes and build a Focus string that references item numbers (e.g., "recorded mitigation item Spec v0.1: cstack workflow wrapper for Codex CLI #1"), rather than embedding raw mitigation text.
Update startMitigationWorkflow to construct selectedActions as { index, action } entries and pass only the selected indexes into buildMitigationPrompt, while still presenting the selected action text in the inspector output.
Preserve existing flags/behavior for spawned runs (e.g., --safe, --allow-dirty, --exec, --release, --issue) so mitigation workflows behave the same except that prompts no longer contain attacker-controlled text.
Add a regression test that injects a malicious recommendedActions entry and asserts the spawned run summary references the mitigation item index (e.g., "recorded mitigation item Spec v0.1: cstack workflow wrapper for Codex CLI #1") and does not contain the malicious command text.
Modified files: src/inspector.ts, test/inspect.test.ts.

Ran the targeted unit test: npm test -- test/inspect.test.ts -t "can launch a mitigation workflow directly from a review inspection", and it passed.
Ran type checking with npm run typecheck (tsc --noEmit), and it passed.
The change is a minimal behavioral fix that keeps existing commands (mitigate, mitigate <n>, mitigate <workflow>) functional while removing untrusted strings from spawned workflow prompts.

Fix mitigation prompt injection from untrusted actions

925e99c

ganesh47 added aardvark codex labels Apr 23, 2026 — with ChatGPT Codex Connector

Provide feedback