Skip to content

Prevent mitigation prompt injection by referencing mitigation item indexes#62

Open
ganesh47 wants to merge 1 commit intomainfrom
codex/propose-fix-for-prompt-injection-vulnerability-3l4ccg
Open

Prevent mitigation prompt injection by referencing mitigation item indexes#62
ganesh47 wants to merge 1 commit intomainfrom
codex/propose-fix-for-prompt-injection-vulnerability-3l4ccg

Conversation

@ganesh47
Copy link
Copy Markdown
Owner

Motivation

  • The inspector previously concatenated untrusted mitigation text (recommended actions, summaries, GitHub blockers, etc.) directly into the prompt used to start follow-up workflows, enabling prompt-injection chains that could drive execution under the danger-full-access sandbox.
  • The goal is to stop embedding attacker-controlled strings in automatically launched prompts while preserving the mitigate UX and the ability to select specific mitigation items.

Description

  • Change buildMitigationPrompt to accept a list of mitigation item indexes and build a Focus string that references item numbers (e.g., "recorded mitigation item Spec v0.1: cstack workflow wrapper for Codex CLI #1"), rather than embedding raw mitigation text.
  • Update startMitigationWorkflow to construct selectedActions as { index, action } entries and pass only the selected indexes into buildMitigationPrompt, while still presenting the selected action text in the inspector output.
  • Preserve existing flags/behavior for spawned runs (e.g., --safe, --allow-dirty, --exec, --release, --issue) so mitigation workflows behave the same except that prompts no longer contain attacker-controlled text.
  • Add a regression test that injects a malicious recommendedActions entry and asserts the spawned run summary references the mitigation item index (e.g., "recorded mitigation item Spec v0.1: cstack workflow wrapper for Codex CLI #1") and does not contain the malicious command text.
  • Modified files: src/inspector.ts, test/inspect.test.ts.

Testing

  • Ran the targeted unit test: npm test -- test/inspect.test.ts -t "can launch a mitigation workflow directly from a review inspection", and it passed.
  • Ran type checking with npm run typecheck (tsc --noEmit), and it passed.
  • The change is a minimal behavioral fix that keeps existing commands (mitigate, mitigate <n>, mitigate <workflow>) functional while removing untrusted strings from spawned workflow prompts.

Codex Task

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant