A 5-step code review pipeline for AI coding assistants. Treats review as a state machine: three independent passes per cycle, three consecutive clean cycles required, any finding resets the counter. The minimum path to a commit is 9 static review passes plus a runtime smoke test.
AI coding assistants ship code that compiles, runs, and looks right. Single-pass review (Copilot, Cursor, CodeRabbit, etc.) catches the obvious defects but misses two failure modes:
- Author and reviewer collapse. When the same model writes and reviews the change, it inherits its own blind spots. code-forge runs three independent review perspectives (qodo, expert, adversarial) and treats their findings as untrusted claims that must be reproduced before any fix.
- Self-claimed completion. Hooks that gate on "I finished" markers are bypassable by any agent that can write a string. code-forge gates on actual state: a real pre-commit hook running the test suite, a mutation runner proving the tests catch regressions, and a coverage heuristic detecting drift across components.
pip install code-review-forge
code-forge install-skill
For MCP server support (IDE integration):
pip install code-review-forge[mcp]
pip install code-review-forge installs the CLI (Python >=3.12); code-forge install-skill copies the 6 review skills into ~/.claude/skills/. Then in Claude Code, run the full pipeline:
/code-forge
Or invoke individual passes:
/qodo-review # change-aware pre-review (Pass 1)
/code-review-expert # SOLID, architecture, security (Pass 2)
/adversarial-qe # red-team QE, 12 attack dimensions (Pass 3)
/kernel-fp-verify # false-positive verification (Step 3.5)
/smoke-test # runtime verification (Step 4)
Other agent targets:
code-forge install-skill --target vscode # <cwd>/.claude/skills/
code-forge install-skill --target universal # <cwd>/.agents/skills/
code-forge install-skill --dest /path/to/dir # explicit location
code-forge install-skill --skill code-forge # one skill only
code-forge install-skill --force # overwrite existing
By default, code-forge uses the claude CLI in your PATH with the session model (no model pin). Three environment variables control the backend:
| Variable | Purpose | Default |
|---|---|---|
| FORGE_BACKEND | Select a named backend from gate.yaml | session-default |
| FORGE_OUTLET | Force outlet: subprocess, inline, or subagent | auto-detected |
| FORGE_LLM_MODEL | Override model for CLI backends | claude-sonnet-4-6 |
Quick examples:
# Use the default (claude CLI, session model)
code-forge review
# Pin a specific model for this run
FORGE_LLM_MODEL=claude-opus-4-5 code-forge review
# Use a named API backend from gate.yaml
FORGE_BACKEND=claude-api code-forge review
# Force inline outlet (no subprocess)
FORGE_OUTLET=inline code-forge review
Named backends (optional) are defined in the backends: key of .code-forge/gate.yaml (created by code-forge init):
backends:
  claude-api:
    type: api
    format: anthropic
    base_url: https://api.anthropic.com
    api_key_env: ANTHROPIC_API_KEY
    default: true
  openai-compatible:
    type: api
    format: openai
    base_url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
  local-claude:
    type: cli
    model: claude-opus-4-5
    command: claude
Full reference: docs/configuration.md
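To make the interaction between FORGE_BACKEND, FORGE_LLM_MODEL, and the backends: mapping concrete, here is a minimal sketch of one way a resolver could behave. The function name resolve_backend, the returned dict shape, and the precedence rules are assumptions for illustration, not code-forge's actual implementation; the sketch also ignores the default: true key.

```python
from typing import Mapping

# Mirrors part of the backends: example above, already parsed into a dict.
BACKENDS = {
    "claude-api": {
        "type": "api",
        "format": "anthropic",
        "base_url": "https://api.anthropic.com",
        "api_key_env": "ANTHROPIC_API_KEY",
    },
    "local-claude": {"type": "cli", "model": "claude-opus-4-5", "command": "claude"},
}


def resolve_backend(backends: dict, env: Mapping[str, str]) -> dict:
    """Hypothetical resolver: pick a backend from FORGE_BACKEND, failing closed."""
    name = env.get("FORGE_BACKEND", "session-default")
    if name == "session-default":
        # Assumption: no named backend means the claude CLI with the session model.
        return {"name": name, "type": "cli", "command": "claude"}
    if name not in backends:
        raise KeyError(f"unknown backend: {name}")
    backend = {"name": name, **backends[name]}
    if backend["type"] == "api" and not env.get(backend["api_key_env"]):
        # Fail closed rather than silently falling back to another backend.
        raise RuntimeError(f"{backend['api_key_env']} is not set")
    if backend["type"] == "cli" and "FORGE_LLM_MODEL" in env:
        backend["model"] = env["FORGE_LLM_MODEL"]  # per-run model override
    return backend
```

Passing the environment as an argument (for example resolve_backend(BACKENDS, os.environ)) keeps the function easy to exercise with a plain dict in tests.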
Editor setup guides:
- VS Code: docs/setup-vscode.md
- Cursor: docs/setup-cursor.md
- PyCharm: docs/setup-pycharm.md
code-forge-mcp is a local stdio MCP server that exposes forge as tools
callable from any MCP-capable editor (Claude Code, VS Code Copilot, Cursor,
PyCharm AI Assistant). Reviews route to the configured CN backend -- the
calling model never reviews its own code.
| Tool | Purpose |
|---|---|
| forge_review | Review the current git diff (inline if fast, job_id if slow) |
| forge_gate_check | Pre-commit gate on staged changes |
| forge_resolve_outlet | Show which backend forge will use (read-only) |
| forge_job_status | Poll a long-running review by job_id |
| forge_init | Create .code-forge/ in the workspace |
| forge_trust | Trust the gate.yaml backends |
Prerequisite: a configured backend with its API key in the server
environment. Without it, forge_review fails closed (same as the CLI).
Claude Code:
claude mcp add forge -- code-forge-mcp
Launch claude from the repo root so the server finds .code-forge/gate.yaml.
VS Code (1.102+, .vscode/mcp.json):
{
  "servers": {
    "forge": {
      "type": "stdio",
      "command": "code-forge-mcp",
      "cwd": "${workspaceFolder}"
    }
  }
}
Gotcha: GUI editors do not inherit your shell environment. Either wrap code-forge-mcp in a script that exports the API key, or set env in the server config. See the setup doc for a pass-based wrapper example.
Verify: call forge_resolve_outlet -- it should name a backend, not
"key not set". Then call forge_review on a real diff.
Code Change
|
v
[Step 0] Syntax (0a) + Lint (0b) + Non-ASCII (0c)
|
v
[Cycle 1] Pass 1: qodo-review
          Pass 2: code-review-expert
          Pass 3: adversarial-qe
|
| zero findings -> counter += 1
| any finding -> fix, counter = 0, restart Cycle 1
v
[Cycle 2] (same 3 passes)
|
v
[Cycle 3] (same 3 passes)
| counter = 3
v
[Step 3.5] kernel-fp-verify (if fixes were applied during cycles)
|
v
[Step 4] smoke-test (runtime verification)
|
v
[COMMIT GATE] # post-review-c3
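The counter logic in the diagram can be sketched as a small loop. This is a minimal illustration, assuming each pass is a callable that returns a list of findings and that a fix callable rewrites the diff; the names run_until_converged, fix, and max_cycles are hypothetical and not part of code-forge.

```python
from typing import Callable, Iterable, List

Pass = Callable[[str], List[str]]  # takes the diff, returns a list of findings


def run_until_converged(
    diff: str,
    passes: Iterable[Pass],
    fix: Callable[[str, List[str]], str],
    required_clean: int = 3,
    max_cycles: int = 50,  # safety bound; an assumption, not from the pipeline
) -> tuple:
    """Run review cycles until `required_clean` consecutive cycles find nothing.

    Returns (final_diff, cycles_run, fixes_applied).
    """
    passes = list(passes)
    clean = 0
    fixes_applied = False
    for cycle in range(1, max_cycles + 1):
        # One cycle runs every pass and pools the findings.
        findings = [f for review in passes for f in review(diff)]
        if findings:
            diff = fix(diff, findings)
            clean = 0  # any finding resets the counter
            fixes_applied = True
        else:
            clean += 1
        if clean == required_clean:
            return diff, cycle, fixes_applied
    raise RuntimeError("review did not converge")
```

With three passes and a diff that is clean from the start, the loop returns after 3 cycles, which is the 9-pass minimum path. The fixes_applied flag corresponds to the condition that decides whether Step 3.5 runs.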
| Skill | Step | Purpose |
|---|---|---|
| code-forge | Orchestrator | Runs the full 5-step pipeline |
| qodo-review | Pass 1 | Change-aware pre-review with feature-grouped walkthrough |
| code-review-expert | Pass 2 | SOLID, architecture, security analysis |
| adversarial-qe | Pass 3 | Red-team QE with 12 attack dimensions |
| kernel-fp-verify | Step 3.5 | 10-step false-positive verification protocol |
| smoke-test | Step 4 | Runtime verification with bash assertion primitives |
- Multi-pass convergence. Three consecutive clean cycles from three independent perspectives. Any finding resets the counter to zero. Copilot, CodeRabbit, Cursor, and Devin are single-pass.
- Anti-hallucination gates. code-forge treats LLM review output as untrusted claims. Parser-deterministic findings auto-confirm; LLM findings require falsification before disposition; Step 4 runs the actual code. Prompt-only mitigations cap at 15% hallucination reduction; tool grounding reaches 65-80% (CodeAnt and Suprmind data, 2026).
- Real commit gate (R1). A real .git/hooks/pre-commit that runs the test suite and blocks on NEW failures vs a baseline. Gates on diff content and test results, not a self-claimed marker. Closes the terminal-and-IDE bypass that PreToolUse hooks cannot reach.
- Mutation-gated review (R2). Diff-scoped mutation runs after static review and before the verdict. Each mutant introduced into the changed code is run against the test suite; a surviving mutant flags tests that cannot catch the change. Toothless tests block the same cycle that finds the defect.
- Cross-component coverage heuristic (R3). Detects diffs that span multiple source areas with a changed function signature. An opt-in components mapping raises an uncertain finding when a hub and a dependent both change in the same diff and no integration test under the dependent's paths matches the configured test patterns.
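To illustrate what a surviving mutant means in R2, here is a toy sketch. It uses naive textual operator swaps and exec on a single function; real mutation runners typically work at the AST level, and the names OPERATORS, make_mutants, and surviving_mutants are hypothetical, not code-forge's API.

```python
from typing import Callable, Dict, List

# Hypothetical textual mutation operators: (original token, replacement).
OPERATORS = [(" < ", " <= "), (" + ", " - "), (" == ", " != ")]


def make_mutants(source: str) -> List[str]:
    """Produce one mutant per operator that occurs in the changed code."""
    return [source.replace(old, new, 1) for old, new in OPERATORS if old in source]


def surviving_mutants(
    source: str, func_name: str, test: Callable[[Callable], None]
) -> List[str]:
    """Return the mutants that the test fails to kill (the test still passes)."""
    survivors = []
    for mutant in make_mutants(source):
        namespace: Dict[str, object] = {}
        exec(mutant, namespace)  # load the mutated function
        try:
            test(namespace[func_name])
        except Exception:
            continue  # killed: the test noticed the change
        survivors.append(mutant)
    return survivors


CHANGED = "def is_minor(age):\n    return age < 18\n"


def weak_test(f):
    assert f(5) is True  # never probes the boundary


def strong_test(f):
    assert f(5) is True
    assert f(18) is False  # boundary check kills the `<=` mutant
```

Here surviving_mutants(CHANGED, "is_minor", weak_test) returns one survivor (the `<=` variant), while strong_test leaves none. A non-empty result is the signal that the tests cannot catch a regression in the changed code.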
- No cross-repo impact. code-forge reviews a single repository. Multi-repo dependency analysis requires CodeRabbit-style tooling or Chromium's Cq-Depend.
- No feedback learning. code-forge does not adapt to dismissed findings or developer preferences. Each review is independent.
- No long-term maintainability scoring. code-forge does not assess technical debt accumulation. SonarQube's tech-debt tracking is the closest automated approximation.
- No performance regression suite. No benchmark harness equivalent to Rust's perf.rust-lang.org.
- R3 is artifact-presence, not coverage proof. The cross-component check confirms an integration test file exists under the expected path; it does not verify that the test exercises the specific code that changed. A present-but-stale test passes the gate.
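The artifact-presence limitation is easier to see in code. The sketch below is a rough approximation of the R3 idea under stated assumptions: the components mapping shape, its key names, and the function r3_finding are hypothetical, and the changed-signature condition is left out for brevity.

```python
from fnmatch import fnmatch
from typing import Dict, List, Optional

# Hypothetical components mapping; key names are assumptions, not the gate.yaml schema.
COMPONENTS = {
    "hub": {"paths": ["core/*"]},
    "dependent": {
        "paths": ["plugins/*"],
        "test_patterns": ["plugins/tests/test_integration_*.py"],
    },
}


def touches(files: List[str], patterns: List[str]) -> bool:
    """True if any file matches any glob pattern."""
    return any(fnmatch(path, pat) for path in files for pat in patterns)


def r3_finding(changed: List[str], repo_files: List[str], components: Dict) -> Optional[str]:
    """Flag hub + dependent changes when no integration test file is present."""
    hub, dep = components["hub"], components["dependent"]
    if not (touches(changed, hub["paths"]) and touches(changed, dep["paths"])):
        return None
    if touches(repo_files, dep["test_patterns"]):
        # A matching file exists; nothing here checks that it exercises the change.
        return None
    return "uncertain: hub and dependent changed without a matching integration test"
```

Note that the second branch returns None as soon as a file name matches, regardless of its contents. That is exactly why a present-but-stale test passes this kind of check.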
Static review (3-cycle convergence) is one layer. In code-forge's own Phase 2, 9 static passes and 639 mock tests missed 3 bugs that dynamic verification caught. Verification grounding (test suite + mutation + e2e coverage check) is the thesis -- not the pass count.
- Python 3.12 or newer
- jq for the bash smoke primitives
- Claude Code or a compatible AI coding assistant for skill invocation
- mcp Python package (optional, for code-forge-mcp): pip install code-review-forge[mcp]
git clone https://github.com/HouMinXi/forge.git
cd forge
./install.sh
This symlinks each of the 6 skills from ~/.claude/skills/<name> to this repo's skills/<name>. Hook installation is manual -- see hooks/README.md and hooks/settings-snippet.json.
install-skill and ./install.sh install the review skills only -- they
do not set up enforcement. The R1 pre-commit gate -- the un-fakeable layer that
runs the test suite on every commit and blocks on new failures, regardless of
what the in-editor review claims -- is a separate, manual step:
- Add a test: section to .code-forge/gate.yaml. Without it, gate-check exits with "gate.yaml must have a 'test' section":
      test:
        command: [pytest, -q]
        timeout_seconds: 900
  command[0] must be a known runner (python3, python, pytest, cargo, go, make, npm, npx, node); no shell metacharacters are allowed.
- Install the hook:
      code-forge install-hooks
  This writes .git/hooks/pre-commit, which runs code-forge verify (a receipt tamper check) and then code-forge gate-check (the test gate).
- If git config core.hooksPath is set, install-hooks refuses to write to a custom hooks path and prints a manual fallback. Add these two lines to your existing pre-commit hook by hand:
      code-forge verify --quiet 2>/dev/null || exit 1
      exec code-forge gate-check
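The rules for test.command in the first step can be sketched as a small validator. The runner list comes from the text above; the exact set of rejected shell metacharacters is an assumption, and validate_test_command is a hypothetical name rather than a code-forge function.

```python
from typing import List

KNOWN_RUNNERS = {"python3", "python", "pytest", "cargo", "go", "make", "npm", "npx", "node"}
# Assumption: the exact metacharacter set code-forge rejects is not documented here.
SHELL_METACHARACTERS = set(";&|<>`$(){}*?!\\\n\"'")


def validate_test_command(command: List[str]) -> None:
    """Raise ValueError if the command would not pass the rules described above."""
    if not command:
        raise ValueError("test.command must not be empty")
    if command[0] not in KNOWN_RUNNERS:
        raise ValueError(f"unknown runner: {command[0]!r}")
    for arg in command:
        bad = SHELL_METACHARACTERS.intersection(arg)
        if bad:
            raise ValueError(f"shell metacharacters {sorted(bad)} in {arg!r}")
```

Under these assumptions, ["pytest", "-q"] validates, while ["bash", "-c", "pytest"] and ["pytest", "-q; rm -rf /"] are both rejected. Keeping the command as an argument list and never passing it through a shell is what makes an allowlist like this meaningful.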
The skills give you the review passes; this gate is what makes a green verdict
mean the tests actually pass. Without it, an in-editor review that never ran can
still reach a commit. Commits that stage only non-code files (docs, config,
metadata such as .md, .yaml, .toml, LICENSE, README) are detected by
the hook and skip the gate automatically -- no receipts and no --no-verify
needed. Any staged file outside that set, including unknown extensions, re-arms
the gate for the whole commit.
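The two decisions described here, whether the gate arms at all and which failures block, can be sketched as follows. The suffix and name sets contain only the examples listed above, so the hook's real allowlist may be longer; all function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Assumption: only the examples named in the text; the hook's real list may differ.
NON_CODE_SUFFIXES = {".md", ".yaml", ".toml"}
NON_CODE_NAMES = {"LICENSE", "README"}


def is_non_code(path: str) -> bool:
    """True for docs, config, and metadata files on the allowlist."""
    p = PurePosixPath(path)
    return p.name in NON_CODE_NAMES or p.suffix in NON_CODE_SUFFIXES


def gate_required(staged: Iterable[str]) -> bool:
    """The gate is skipped only when every staged file is non-code."""
    return any(not is_non_code(path) for path in staged)


def new_failures(baseline: Set[str], current: Set[str]) -> Set[str]:
    """Failures present now but absent from the baseline; only these block."""
    return current - baseline
```

For example, gate_required(["README.md", "docs/setup.yaml"]) is False, while adding a single src/lib.rs or a file with an unknown extension makes it True for the whole commit. Comparing against a baseline means a test that was already failing before the change does not block the commit, but any newly failing test does.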
| Hook | Trigger | Purpose |
|---|---|---|
| check_worktree.sh | PreToolUse Edit/Write | Block edits in main worktree |
| check_non_ascii.sh | PreToolUse Write/Edit | Non-ASCII character detection |
| check_read_before_edit.sh | PreToolUse Edit | 1:1 read-before-edit ratio |
| check_review_tracker.sh | PostToolUse Bash | Review cycle state machine |
| check_git_commit_review.sh | PreToolUse Bash | Block unreviewed commits |
| check_git_push_review.sh | PreToolUse Bash | Block unreviewed pushes |
Some hooks contain environment-specific logic (Kerberos auth, pattern
matching) you will need to adapt. See hooks/README.md.
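When adapting a hook, it can help to separate the check itself from the hook plumbing. As an illustration of the kind of check check_non_ascii.sh performs, here is a minimal Python sketch of non-ASCII detection; it is not a port of the shipped bash hook, and find_non_ascii is a hypothetical name.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character, 1-indexed."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch))
    return hits
```

A hook wrapper would feed the proposed file content into a function like this and block the edit when the list is non-empty; reporting line and column makes the rejection actionable for the agent.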
skills/smoke-test/test-library/shell/ ships 19 reusable bash assertion
functions with no dependencies beyond jq:
- run_and_capture, run_concurrent, concurrent_wait
- assert_success, assert_failure, assert_exit_code
- assert_output_contains, assert_output_not_contains
- assert_stderr_contains, assert_stderr_empty
- assert_file_exists, assert_file_not_exists, assert_file_contains
- assert_json_valid
- assert_no_zombie, assert_temp_clean
- assert_no_command_exec, assert_no_command_exec_json, assert_no_path_traversal
A backward-compatible symlink at test-library/ points to
skills/smoke-test/test-library/ for users migrating from
bash-smoke-primitives.
- evidence/cross-model-complementarity.md -- why 3 different review passes
- evidence/design-iterations.md -- how the pipeline evolved
- evidence/ground-truth-verification.md -- why smoke tests must inject bugs
- evidence/shell-assertion-footguns.md -- 5 bash-specific traps
- evidence/v9-model-coverage-matrix.md -- 4-model coverage data
- hooks/README.md -- hook installation and adaptation guide
Issues and discussion: https://github.com/HouMinXi/forge/issues.
Apache-2.0