396 changes: 396 additions & 0 deletions .project/phase-2-sweep-alignment-plan.md

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions README.md
@@ -357,7 +357,7 @@ When to use it:

Trade-off: token cost scales linearly with the number of files swept (one full agent session per file). A sweep of 10 high-risk files costs roughly as many tokens as 10 Phase 2 runs, and it produces overlapping findings that Phase 3 has to deduplicate. Always preview first with `--dry-run`.

How it works: the sweep runner reads `itemdb/notes/file-risk-index.yml` (written by Phase 1), selects all files at score 4 or above (or the files matched by `FILE=`), writes one prompt per file under `tmp/file-sweep-prompts/`, then invokes the `auditor` agent once per file in sequence.
How it works: the sweep runner reads `itemdb/notes/file-risk-index.yml` (written by Phase 1), selects all files at score 4 or above (or the files matched by `FILE=`), writes one prompt per file under `tmp/file-sweep-prompts/` (using `prompts/phase-2-sweep.md`), then invokes the `auditor` agent once per file in sequence. Each sweep run writes a Phase 2 summary at `runs/phase-2-summary-sweep-<slug>-YYYY-MM-DD-HHMMSS.md`.
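
The selection step described above can be sketched as follows. This is a minimal illustration, not the actual `tools/run-sweep.py` code: the in-memory index, the `path` and `score` field names, and the assumption that `FILE=` patterns take precedence over the score threshold are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical in-memory form of itemdb/notes/file-risk-index.yml.
# The real schema is not shown in this change; "path" and "score" are assumed names.
RISK_INDEX = [
    {"path": "src/app/controllers/upload.php", "score": 5},
    {"path": "src/app/models/user.php", "score": 4},
    {"path": "src/app/views/home.php", "score": 2},
]


def select_sweep_files(index, min_score=4, file_patterns=None):
    """Return paths to sweep: glob matches if patterns are given, else score >= min_score."""
    if file_patterns:
        # Assumed behaviour: explicit patterns replace the score-based selection.
        return [
            entry["path"]
            for entry in index
            if any(fnmatch(entry["path"], pattern) for pattern in file_patterns)
        ]
    return [entry["path"] for entry in index if entry["score"] >= min_score]


def estimated_sessions(paths):
    """One full agent session per file, so cost grows linearly with the selection."""
    return len(paths)


selected = select_sweep_files(RISK_INDEX)
by_glob = select_sweep_files(RISK_INDEX, file_patterns=["src/app/views/*"])
print(selected, estimated_sessions(selected))
print(by_glob)
```

Counting sessions before running anything mirrors the advice to preview with `--dry-run`: the size of the selection is the main driver of token cost.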

make list-risk-files # preview which files would be swept
python tools/run-sweep.py --dry-run # show selected files and prompts, no agent calls
@@ -477,15 +477,15 @@ CodeCome ships reusable phase prompts under `prompts/`:
prompts/phase-1b-recon.md
prompts/phase-1c-sandbox.md
prompts/phase-2-audit.md
prompts/phase-2-sweep.md
prompts/phase-2-sweep-summary.md
prompts/phase-3-review.md
prompts/phase-4-validate.md
prompts/phase-5-exploit.md
prompts/phase-6-report.md
prompts/sweep.md

### Wrapper environment variables

CODECOME_USE_WRAPPER=0 # bypass the styled wrapper
CODECOME_THINKING=1 # show model reasoning/thinking blocks in output
CODECOME_THINKING=0 # hide model reasoning/thinking blocks
CODECOME_RENDER_REASONING=0 # suppress on-screen Thinking panels (independent override)
@@ -497,7 +497,7 @@ CodeCome ships reusable phase prompts under `prompts/`:
CODECOME_BOOTSTRAP_DRY_RUN=1 # force --dry-run on sandbox apply/regenerate
CODECOME_BASH_SHIM_RENDER=0 # disable rtk/cat/head/tail/rg/ls/find/tree routing
CODECOME_BASH_SHIM_LS_STRIP_LONG_FORMAT=0
OPENCODE_ARGS='...' # extra flags for opencode run (forwarded directly when CODECOME_USE_WRAPPER=0; in wrapper mode only --model, --variant and --thinking are used)
OPENCODE_ARGS='...' # extra flags for opencode run (--model, --variant, --thinking)
CODECOME_MODEL=<id> # pin model per phase, e.g. anthropic/claude-opus-4-7
CODECOME_MODEL_VARIANT=<v> # pin model variant, e.g. high, max

16 changes: 13 additions & 3 deletions docs/file-risk-sweeps.md
@@ -32,9 +32,9 @@ Show only paths for scripting:

python tools/list-risk-files.py --format paths

## Run an optional Deep Sweep
## Run a Phase 2 Deep Sweep

While the global Phase 2 agent (`make phase-2`) focuses on macro-level architectural flaws and cross-component issues, you can run an optional **Deep Sweep** to perform exhaustive, line-by-line vulnerability hunting on specific high-risk files.
While the global Phase 2 agent (`make phase-2`) focuses on macro-level architectural flaws and cross-component issues, you can run an optional **Deep Sweep** (Phase 2 sweep mode) to perform exhaustive, line-by-line vulnerability hunting on specific high-risk files. Each sweep run creates Phase 2 candidate findings under `itemdb/findings/PENDING/` and writes a Phase 2 run summary.

Run a sweep on specific files (supports glob patterns):

@@ -49,12 +49,22 @@ Preview selected files and generated prompts without invoking OpenCode:

python tools/run-sweep.py --dry-run

The sweep runner is sequential by default. It invokes the normal `auditor` agent using a specialized prompt that forces the model to read related dependencies and imports to establish complete source-to-sink context.
The sweep runner is sequential by default. It invokes the normal `auditor` agent with a specialized prompt (`prompts/phase-2-sweep.md`) that forces the model to read related dependencies and imports to establish complete source-to-sink context.

Generated temporary prompts are written under:

tmp/file-sweep-prompts/

Each per-file sweep run writes a Phase 2 run summary at:

runs/phase-2-summary-sweep-<slug>-YYYY-MM-DD-HHMMSS.md

After all selected files complete, the runner invokes a final aggregate sweep summary using `prompts/phase-2-sweep-summary.md`. The aggregate step consolidates findings, open questions, and re-run hints from all per-file summaries and writes:

runs/sweep-summary-YYYY-MM-DD-HHMMSS.md

The aggregate summary is also printed to the screen. Use `make hints` to review it later — the `Sweep` block in `codecome hints` surfaces questions from the latest aggregate sweep rollup.
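
The two summary filename patterns above could be produced along these lines. This is a sketch under stated assumptions: the slug rule (collapsing every run of non-alphanumeric characters to `-`) and the helper names are hypothetical, and the use of UTC follows the instruction in `prompts/phase-2-sweep-summary.md` rather than anything confirmed about the runner itself.

```python
import re
from datetime import datetime, timezone


def slugify_path(path):
    """Hypothetical sanitiser: collapse every run of non-alphanumerics to '-'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-")


def per_file_summary_name(path, now):
    """Build the per-file Phase 2 sweep summary path for one swept file."""
    stamp = now.strftime("%Y-%m-%d-%H%M%S")
    return f"runs/phase-2-summary-sweep-{slugify_path(path)}-{stamp}.md"


def aggregate_summary_name(now):
    """Build the aggregate sweep summary path written after all files complete."""
    stamp = now.strftime("%Y-%m-%d-%H%M%S")
    return f"runs/sweep-summary-{stamp}.md"


# A fixed timestamp keeps the example deterministic; a real run would use
# datetime.now(timezone.utc).
now = datetime(2026, 6, 12, 14, 30, 22, tzinfo=timezone.utc)
per_file = per_file_summary_name("src/foo.php", now)
aggregate = aggregate_summary_name(now)
print(per_file)
print(aggregate)
```

Keeping the `phase-2-summary` prefix on per-file summaries means any check that looks for `runs/phase-2-summary*.md` also picks up sweep runs, while the aggregate rollup stays distinguishable by its `sweep-summary` prefix.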

## Relationship with normal Phase 2

Normal Phase 2 remains the default broad hypothesis generation pass:
3 changes: 1 addition & 2 deletions docs/workflow.md
@@ -454,9 +454,8 @@ All `make` targets that depend on Python tooling expect a repo-local `.venv/`. I

Wrapper controls:

CODECOME_USE_WRAPPER=0 # bypass wrapper and use raw opencode run
CODECOME_THINKING=1 # show model reasoning/thinking blocks in output
OPENCODE_ARGS='...' # extra flags for opencode run (forwarded directly when CODECOME_USE_WRAPPER=0; in wrapper mode only --model, --variant and --thinking are used)
OPENCODE_ARGS='...' # extra flags for opencode run (--model, --variant, --thinking)
CODECOME_MODEL=<id> # pin the model per phase
CODECOME_MODEL_VARIANT=<v> # pin the model variant

71 changes: 71 additions & 0 deletions prompts/phase-2-sweep-summary.md
@@ -0,0 +1,71 @@
# CodeCome Phase 2: Sweep Summary (Aggregate Rollup)

You are performing a consolidation pass — NOT a vulnerability hunting pass.

The per-file Phase 2 sweep runs have completed. Your job is to read all per-file sweep summaries, consolidate their findings, open questions, and re-run hints into one durable aggregate summary, and also print the same concise summary to the screen.

## Required reading

Read these files (all paths are relative to the project/workspace root):

- `AGENTS.md`
- `codecome.yml`
- ONLY the per-file sweep summaries listed in the `## Per-file sweep summaries` section of this prompt. Do NOT read unrelated historical sweep summaries — old summaries from previous sweeps are not part of this consolidation.
- Findings under `itemdb/findings/PENDING/` that were created or touched during the sweep (identifiable from the per-file summaries)

The `## Selected files` and `## Per-file sweep summaries` sections below this prompt body list exactly which files were swept and which per-file summaries were produced by this batch. Use that information, not a blind glob of all `runs/phase-2-summary-sweep-*.md`.
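
For illustration, the runner might append those two sections to the prompt body roughly as sketched here. The function names and the bullet formatting are assumptions; only the section headings come from this prompt.

```python
def build_batch_sections(selected_files, per_file_summaries):
    """Render the two sections the aggregate prompt expects below its body."""
    lines = ["## Selected files", ""]
    lines += [f"- `{path}`" for path in selected_files]
    lines += ["", "## Per-file sweep summaries", ""]
    lines += [f"- `{path}`" for path in per_file_summaries]
    return "\n".join(lines) + "\n"


def compose_prompt(prompt_body, selected_files, per_file_summaries):
    """Append the batch-specific sections to the static prompt body."""
    sections = build_batch_sections(selected_files, per_file_summaries)
    return prompt_body.rstrip("\n") + "\n\n" + sections


prompt = compose_prompt(
    "# CodeCome Phase 2: Sweep Summary (Aggregate Rollup)\n",
    ["src/foo.php"],
    ["runs/phase-2-summary-sweep-src-foo-php-2026-06-12-143022.md"],
)
print(prompt)
```

Passing an explicit list of summaries is what lets the consolidation pass ignore stale files left in `runs/` by earlier sweeps.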

## Forbidden actions

- **Do NOT create new findings.** The per-file sweep runs already did that.
- **Do NOT perform fresh vulnerability hunting.** This is a consolidation pass.
- **Do NOT modify existing findings.** You are summarizing, not re-auditing.
- **Do NOT move findings between status directories.**

## Required output

### 1. Durable aggregate summary

Write a run summary using the template at `templates/run-summary.md` to:

runs/sweep-summary-YYYY-MM-DD-HHMMSS.md

Use the current UTC date and time.

### 2. Screen output

Print the same concise summary to the screen before finishing. The operator should see the rollup immediately without opening the summary file. Format the screen output clearly:

- Files selected for the sweep
- Per-file sweep summaries considered
- Findings created or updated, grouped by likely theme or affected component
- Duplicate or overlapping finding candidates noticed across files
- Open questions consolidated across per-file summaries
- Re-run hints consolidated into concrete `PROMPT_EXTRA` or `PROMPT_EXTRA_FILE` suggestions
- Limitations (missing summaries, sweep failures, vague summaries that could not be consolidated)
- Recommended next step

### 3. Aggregate summary content

The durable summary must include:

- **Goal**: Explain this is a sweep consolidation rollup from per-file Phase 2 sweep runs.
- **Files processed**: List the files selected for the sweep, and which per-file summaries were found and read.
- **Findings summary**: Consolidate findings created or updated, grouped by likely theme, affected component, or security category. Flag duplicates or near-duplicates noticed across files.
- **Open questions for the user**: Deduplicate and consolidate open questions from all per-file summaries. Questions must be complete, self-contained sentences ending in `?`.
- **Re-run prompt hints**: Merge hints into concrete `PROMPT_EXTRA` or `PROMPT_EXTRA_FILE` snippets. Remove exact duplicates.
- **Limitations**: Note any missing per-file summaries, per-file runs that appear to have failed or produced low-quality output, summaries that were too vague to consolidate, and any assumptions made during consolidation.
- **Recommended next step**: Suggest the next action (e.g., run `make phase-3` for counter-analysis, re-run a specific per-file sweep with questions answered via `PROMPT_EXTRA`).
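
The deduplication rules for questions and hints can be pictured with a small sketch. It is only an illustration of the intent: the helper names, the case-insensitive matching for questions, and the sample `PROMPT_EXTRA` strings are hypothetical.

```python
def consolidate_questions(per_file_questions):
    """Deduplicate questions, keeping first-seen order and wording.

    Returns (merged, rejected). Rejected entries do not end in '?' and
    would be noted under Limitations rather than silently dropped.
    """
    seen = set()
    merged = []
    rejected = []
    for question in per_file_questions:
        text = " ".join(question.split())  # normalise whitespace
        if not text.endswith("?"):
            rejected.append(text)
            continue
        key = text.lower()  # assumed: case differences do not make a new question
        if key not in seen:
            seen.add(key)
            merged.append(text)
    return merged, rejected


def merge_hints(hints):
    """Remove exact duplicates only, preserving first-seen order."""
    return list(dict.fromkeys(hints))


questions = [
    "Is the upload directory web-accessible?",
    "is the upload directory  web-accessible?",
    "upload dir perms",
]
merged, rejected = consolidate_questions(questions)
hints = merge_hints([
    'PROMPT_EXTRA="Uploads are served from /static"',
    'PROMPT_EXTRA="Uploads are served from /static"',
    'PROMPT_EXTRA_FILE=notes/auth-model.md',
])
print(merged, rejected, hints)
```

Hints are matched exactly because two snippets that differ by even one word may steer a re-run differently, whereas questions that differ only in case or spacing ask the same thing.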

## Final response

At the end, summarize in your response:

- Number of per-file sweep summaries read
- Total findings identified across all summaries
- Key themes discovered
- Duplicates or overlaps noticed
- Files created or modified
- Open questions for the user
- Re-run prompt hints
- Recommended next step
14 changes: 9 additions & 5 deletions prompts/sweep.md → prompts/phase-2-sweep.md
@@ -1,8 +1,10 @@
# CodeCome Optional Deep-Dive Sweep
# CodeCome Phase 2: Sweep Mode

You are performing an optional CodeCome deep-dive sweep on a specific file.
You are performing CodeCome Phase 2 (hypothesis generation) in file-scoped sweep mode.

This mode is intentionally narrower than the normal global Phase 2. It is used to inspect high-risk files from `itemdb/notes/file-risk-index.yml` with intense focus, while still allowing you to read immediate dependencies needed to understand reachability and data flow.
This is the same Phase 2 described in the standard `prompts/phase-2-audit.md` prompt, but with a narrower scope: instead of hunting for macro-level architectural flaws across the entire codebase, you focus on a single high-risk file from `itemdb/notes/file-risk-index.yml` with intense, line-by-line analysis. Read immediate dependencies needed to establish reachability and data flow, but avoid expanding the run into a full-project audit.

All Phase 2 expectations apply: produce durable findings under `itemdb/findings/PENDING/`, follow frontmatter quality rules, deduplicate against existing findings, run `make frontmatter`, and write a Phase 2 run summary.

## Required reading

@@ -148,13 +150,15 @@ At the end, summarize in your response (or write a brief run summary under `runs
- re-run prompt hints (same content as in the run summary; use `PROMPT_EXTRA` / `PROMPT_EXTRA_FILE` snippets),
- files created or modified.

Print a concise end-of-run summary to the screen in addition to writing the durable run summary. The operator should see key results immediately without opening the summary file.

## Run summary

Write the run summary using the template at `templates/run-summary.md` to:

runs/sweep-<slug>-summary-YYYY-MM-DD-HHMMSS.md
runs/phase-2-summary-sweep-<slug>-YYYY-MM-DD-HHMMSS.md

Replace `<slug>` with a short sanitised version of the target file path
(e.g. `runs/sweep-src-app-controllers-upload-php-summary-2026-06-09-143022.md`).
(e.g. `runs/phase-2-summary-sweep-src-app-controllers-upload.php-2026-06-09-143022.md`).

You MUST fill in both sections. Questions must be complete, self-contained sentences ending in `?` — avoid terse noun phrases. Hints must use actual `PROMPT_EXTRA` or `PROMPT_EXTRA_FILE` snippets.
72 changes: 72 additions & 0 deletions tests/test_phases_completion.py
@@ -780,6 +780,78 @@ def test_phase6_passes_with_report_and_summary(self, tmp_path):
self._restore(completion_mod, originals)


class TestSweepCompletionGate:
    def test_sweep_accepts_phase2_summary_sweep(self, tmp_path):
        """Phase sweep should pass when a fresh sweep-named Phase 2 summary exists."""
        import os
        import phases.completion as completion_mod

        orig_root = completion_mod.ROOT
        orig_findings_root = completion_mod.FINDINGS_ROOT
        completion_mod.ROOT = tmp_path
        completion_mod.FINDINGS_ROOT = tmp_path / "itemdb" / "findings"
        (tmp_path / "runs").mkdir(parents=True, exist_ok=True)
        summary = tmp_path / "runs" / "phase-2-summary-sweep-src-foo-php-2026-06-12-143022.md"
        summary.write_text("", encoding="utf-8")
        run_start = time.time() - 60
        os.utime(summary, (run_start + 60, run_start + 60))

        try:
            ok, failures = completion_mod.check_phase_graceful_completion(
                "sweep", None, run_start
            )
            assert ok is True, f"Expected ok for sweep with fresh summary, got failures={failures!r}"
            assert failures == []
        finally:
            completion_mod.ROOT = orig_root
            completion_mod.FINDINGS_ROOT = orig_findings_root

    def test_sweep_failure_when_no_summary(self, tmp_path):
        """Phase sweep should report missing Phase 2 summary when nothing is freshened."""
        import phases.completion as completion_mod

        orig_root = completion_mod.ROOT
        orig_findings_root = completion_mod.FINDINGS_ROOT
        completion_mod.ROOT = tmp_path
        completion_mod.FINDINGS_ROOT = tmp_path / "itemdb" / "findings"
        (tmp_path / "runs").mkdir(parents=True, exist_ok=True)

        try:
            ok, failures = completion_mod.check_phase_graceful_completion(
                "sweep", None, time.time()
            )
            assert ok is False
            assert any("runs/phase-2-summary" in f for f in failures), (
                f"Expected failure detail to mention runs/phase-2-summary, got {failures!r}"
            )
        finally:
            completion_mod.ROOT = orig_root
            completion_mod.FINDINGS_ROOT = orig_findings_root

    def test_sweep_checklist_is_phase2_checklist(self):
        """Phase sweep checklist should be the same as Phase 2 checklist."""
        from phases.completion import phase_checklist_lines

        sweep_lines = phase_checklist_lines("sweep", None)
        phase2_lines = phase_checklist_lines("2", None)
        assert sweep_lines == phase2_lines, (
            f"Expected sweep checklist to equal Phase 2 checklist"
        )

    def test_sweep_resume_prompt_uses_phase2_gate(self):
        """Resume prompt for sweep should mention phase-2-summary like Phase 2."""
        from phases.completion import build_phase_resume_prompt

        prompt = build_phase_resume_prompt(
            "sweep", None, "stop", 1,
            failure_details=[
                "Missing: runs/phase-2-summary*.md — run summary was not created or updated",
            ],
        )
        assert "runs/phase-2-summary*.md" in prompt
        assert "Fix only these missing items." in prompt


class TestPhase3ChecklistMentionsRunSummary:
    def test_phase3_checklist_mentions_summary(self):
        from phases.completion import phase_checklist_lines