diff --git a/.gitignore b/.gitignore
index 13100d69..020a0ffb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -16,4 +16,6 @@ mess/
.coverage
.ralphify/
scripts/tui_dev/output/
+.cheese/
+.serena/
diff --git a/CHANGELOG.md b/CHANGELOG.md
index eb41309c..c944bd7e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,19 @@
All notable changes to ralphify are documented here.
+## Unreleased
+
+### Added
+
+- **First-class opencode adapter** — agents that take the prompt as a positional argument instead of stdin are now first-class. Set `agent: opencode run` and ralphify adds `--format json`, appends your prompt as a safe positional argument (no `bash -c` wrapper, no quoting hazards), and parses opencode's JSON event stream for live tool-use tracking. See [Using with Different Agents](https://ralphify.co/docs/agents/#opencode) for the permission setup opencode needs to run autonomously.
+- **`max_turns` enforcement** — the `max_turns` frontmatter field now actively caps tool-use events per iteration. Behavior depends on the adapter: streaming adapters that count tool uses (Claude, Codex, opencode) are sent SIGTERM at the limit, and the iteration is recorded as completed-at-cap. Blocking adapters (Copilot) cannot be preempted mid-run, so their tool uses are counted after the process exits, and the iteration is marked completed-at-cap if the count reached the limit. Adapters that emit no countable events (Crush) treat the field as a no-op.
+- **Agent lifecycle hooks** — new `hooks` frontmatter field plus an `AgentHook` Protocol (`ShellAgentHook`, `CombinedAgentHook`, `NoOpAgentHook`). Register shell commands to observe iteration start/end, prompt assembly, tool use, turn-approaching-limit, turn cap, and completion-signal events. Hooks are observers — a failing hook is logged but never aborts the run. See the new [Hooks](https://ralphify.co/docs/hooks/) page.
+- **Per-CLI soft wind-down** — when `max_turns_grace` is set, Claude's `PreToolUse` and Codex's `PostToolUse` hooks are installed into a per-iteration tempdir (`CLAUDE_CONFIG_DIR` / `CODEX_HOME` overrides, so the user's real config stays untouched) to nudge the agent toward wrapping up before the hard cap. Adapters without a hook system downgrade to hard-cap-only.
+
+### Changed
+
+- **Prompt delivery is now an adapter concern** — the agent execution layer asks each adapter where the prompt goes (stdin vs. a positional argument) via a new `deliver_prompt` step. Existing stdin agents (Claude, Codex, Copilot, generic, and `bash -c` wrappers) are unaffected; arg-delivery agents spawn with `stdin=DEVNULL` and no stdin writer thread.
+
## 0.4.0b3 — 2026-04-12
### Improved
diff --git a/README.md b/README.md
index e633cffd..2118042d 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,7 @@ Ralph loops give you:
| **grow-coverage** | Write tests for untested modules, one per iteration, until coverage hits the target |
| **security-audit** | Hunt for vulnerabilities — scan, find, fix, verify, repeat |
| **clear-backlog** | Work through a TODO list or issue tracker, one task per loop |
+| **promise-completion** | Work until a target is done, then emit a promise tag so the loop stops early |
| **write-docs** | Generate documentation for undocumented modules, one at a time |
| **improve-codebase** | Find and fix code smells, refactor patterns, modernize APIs |
| **migrate** | Incrementally migrate files from one framework or pattern to another |
@@ -95,11 +96,17 @@ Scaffold a ralph and start experimenting:
ralph scaffold my-ralph
```
+The scaffolded `RALPH.md` includes the normal command/arg template plus a commented promise-completion path you can enable if the agent should stop early by emitting a matching `...` tag.
+
+For a committed example, see [`examples/promise-completion/RALPH.md`](examples/promise-completion/RALPH.md) — it shows a loop that exits early once the requested target is complete.
+
Edit `my-ralph/RALPH.md`, then run it:
```bash
ralph run my-ralph # loops until Ctrl+C
ralph run my-ralph -n 5 # run 5 iterations then stop
+# from a repo checkout:
+ralph run examples/promise-completion -n 10 --target "stabilize the failing auth tests"
```
### What `ralph run` does
diff --git a/docs/agents.md b/docs/agents.md
index a14d9a04..cef4d6e8 100644
--- a/docs/agents.md
+++ b/docs/agents.md
@@ -15,11 +15,13 @@ This page shows how to configure the [`agent` frontmatter field](quick-reference
## Agent comparison
-| Agent | Stdin support | Streaming | Wrapper needed |
+| Agent | Prompt delivery | Streaming | Wrapper needed |
|---|---|---|---|
-| [Claude Code](#claude-code) | Native (`-p`) | Yes — real-time activity tracking | No |
+| [Claude Code](#claude-code) | Stdin (`-p`) | Yes — real-time activity tracking | No |
+| [opencode](#opencode) | Positional arg (`run ""`) | Yes — tool-use tracking | No |
| [Aider](#aider) | Via bash wrapper | No | Yes (`bash -c`) |
-| [Codex CLI](#codex-cli) | Native (`exec`) | No | No |
+| [Codex CLI](#codex-cli) | Stdin (`exec`) | No | No |
+| [Crush](#crush) | Stdin (`run`) | No | No |
| [Custom](#custom-wrapper-script) | You implement it | No | Yes (script) |
If you're not sure which to pick: **start with Claude Code.** It has the deepest integration, the best autonomous coding capabilities, and is the default.
@@ -36,9 +38,34 @@ Your agent must:
1. **Read a prompt from stdin** — the full assembled prompt is piped in
2. **Do work in the current directory** — edit files, run commands, make commits
-3. **Exit when done** — exit code 0 means success, non-zero means failure
+3. **Exit cleanly** — exit code `0` means the agent process succeeded; non-zero means failure
+4. **Optionally emit a completion signal** — set `completion_signal` in frontmatter (default inner text: `RALPH_PROMISE_COMPLETE`) if you want the agent to print an explicit `...` marker
-That's it. No special protocol, no API — just stdin in, work done, process exits.
+Normal exit codes still indicate process success or failure. They do **not** trigger promise completion by themselves.
+
+Ralphify only stops early on promise completion when both of these are true:
+
+- `stop_on_completion_signal: true`
+- the matching `...` tag is detected in agent output or captured result text
+
+`completion_signal` is the inner promise text. For example, `completion_signal: COMPLETE` means the agent must output `COMPLETE`.
+
+Ralphify still keeps its own command/prompt loop architecture. Only the promise tag format and matching align with Ralph-Wiggum.
+
+Minimal example:
+
+```markdown
+---
+agent: claude -p --dangerously-skip-permissions
+completion_signal: COMPLETE
+stop_on_completion_signal: true
+---
+
+Implement the next todo. When the work is fully complete, print
+COMPLETE and exit.
+```
+
+That's it. No API required — just stdin in, output out, process exits.
## Claude Code
@@ -74,6 +101,32 @@ This enables ralphify to:
- Track agent activity in real time
- Extract the final result text from the agent's response
+## opencode
+
+[opencode](https://opencode.ai) takes the prompt as a **positional argument** to its `run` subcommand rather than on stdin. Ralphify has a first-class adapter for it — no `bash -c` wrapper needed.
+
+```markdown
+---
+agent: opencode run --agent build
+---
+```
+
+| Flag | Purpose |
+|---|---|
+| `run` | Non-interactive mode — runs one prompt and exits |
+| `--agent build` | Selects an agent profile permissive enough to edit files autonomously (see the caveat below) |
+
+When ralphify detects that the agent command's binary is `opencode`, it automatically:
+
+- Adds `--format json` so opencode emits a parseable event stream.
+- Appends the assembled prompt as the final positional argument (no stdin, no shell — quotes, `$(...)`, and newlines in the prompt are passed through safely as a single argument).
+- Parses the JSON stream to track tool use in real time.
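
A minimal sketch of the argument-delivery idea in the list above, not ralphify's actual adapter code: the agent string is split once with `shlex.split`, and the prompt is appended as a single list element, so the child process receives it as one argument and no shell ever parses it. The helper name `build_opencode_argv` is hypothetical.

```python
import shlex


def build_opencode_argv(agent: str, prompt: str) -> list[str]:
    """Hypothetical helper: build an argv list with the prompt as the final argument."""
    argv = shlex.split(agent)  # e.g. ["opencode", "run", "--agent", "build"]
    if "--format" not in argv:
        argv += ["--format", "json"]  # request the JSON event stream
    # The prompt is one list element. When the list is passed to subprocess
    # without shell=True, quotes, $(...), and newlines reach the child unchanged.
    return [*argv, prompt]


prompt = 'Fix "auth" tests\nthen run $(make test)'
argv = build_opencode_argv("opencode run --agent build", prompt)
print(argv[:-1])
```

A list built this way could be handed to `subprocess.Popen(argv, stdin=subprocess.DEVNULL)`, which avoids the quoting hazards of a `bash -c` wrapper.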
+
+!!! warning "opencode refuses writes by default"
+    opencode's built-in agents start with restrictive `ask`/`deny` permission presets ([anomalyco/opencode #10411](https://github.com/anomalyco/opencode/issues/10411), [#13851](https://github.com/anomalyco/opencode/issues/13851)). An unconfigured `opencode run` will stall waiting for approval or refuse to edit files — there is no one to approve in an autonomous loop.
+
+    This is opencode-side configuration, not something ralphify can override. Before looping, set up an agent profile (or permission config) that allows the edits and commands your prompt needs — the opencode analogue of Claude Code's `--dangerously-skip-permissions`. See [opencode's permissions docs](https://opencode.ai/docs/permissions/) for the `--agent` profile and permission settings.
+
## Aider
[Aider](https://aider.chat) is an AI pair-programming tool that works with multiple LLM providers.
@@ -117,6 +170,28 @@ agent: codex exec --sandbox danger-full-access -
| `--sandbox danger-full-access` | Full filesystem access for autonomous operation |
| `-` | Read prompt from stdin |
+## Crush
+
+[Charm Crush](https://github.com/charmbracelet/crush) is TUI-first but supports non-interactive use via its `run` subcommand, which reads the prompt from stdin. Ralphify has a first-class adapter for it — no `bash -c` wrapper needed.
+
+```markdown
+---
+agent: crush run
+---
+```
+
+| Flag | Purpose |
+|---|---|
+| `run` | Non-interactive mode — runs one prompt from stdin and exits |
+
+When ralphify detects that the agent command's binary is `crush`, it automatically adds `--quiet` to hide the progress spinner. `crush run` auto-approves every permission request for the duration of the invocation, so no `--yolo`-style flag is needed to run autonomously.
+
+!!! info "Configure a provider first"
+    `crush run` exits with "no providers configured" if no model provider is set up. Configure one non-interactively before looping — e.g. export `ANTHROPIC_API_KEY` (or another provider's key) or commit a `crush.json`. Run `crush` once interactively if you prefer the guided setup.
+
+!!! warning "No structured output — turn capping unavailable"
+    Crush emits plain text only (no JSON/streaming-event mode), so ralphify runs it in [blocking mode](#blocking-mode-all-other-agents) and cannot count tool calls or enforce `max_turns` for it. Completion still works via the [completion promise tag](#what-ralphify-needs-from-an-agent) scanned from stdout. Use [`--timeout`](cli.md#ralph-run) as the safety net instead of a turn cap.
+
## Custom wrapper script
For full control, write a wrapper script that reads stdin and calls your agent however it needs to be called.
diff --git a/docs/api.md b/docs/api.md
index e87d3d37..32cbdb14 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -74,7 +74,8 @@ config = RunConfig(
|---|---|---|---|
| `agent` | `str` | -- | Full agent command string |
| `ralph_dir` | `Path` | -- | Path to the ralph directory |
-| `ralph_file` | `Path` | -- | Path to the RALPH.md file |
+| `ralph_file` | `Path | None` | `None` | Path to the RALPH.md file. Supply exactly one of `ralph_file` or `prompt`. |
+| `prompt` | `str | None` | `None` | In-memory prompt body (no frontmatter). Supply exactly one of `ralph_file` or `prompt`. See [Embedding](embedding.md#running-a-prompt-from-memory). |
| `commands` | `list[Command]` | `[]` | Commands to run each iteration |
| `args` | `dict[str, str]` | `{}` | User argument values |
| `max_iterations` | `int | None` | `None` | Max iterations (`None` = unlimited) |
@@ -387,6 +388,12 @@ When extra listeners are registered, events are broadcast to both the built-in q
| `resume_run(run_id)` | Resume a paused run. |
| `list_runs()` | Return a snapshot of all registered runs. |
| `get_run(run_id)` | Look up a run by ID. |
+| `wait_for_any(run_ids, timeout=None)` | Block until at least one of `run_ids` reaches a terminal status; returns the finished IDs (`[]` on timeout). |
+| `wait_for_all(run_ids, timeout=None)` | Block until all `run_ids` finish or `timeout` elapses; returns `True` iff all finished. |
+| `get_result(run_id)` | Snapshot the run's status and counts as a frozen `RunResult`. Raises `KeyError` if unknown. |
+| `shutdown(timeout=None)` | Request stop on every run and join their threads; returns `True` iff all joined in time. |
+
+For the create → start → wait → result → shutdown lifecycle and thread-safety notes, see [Embedding](embedding.md).
---
diff --git a/docs/cli.md b/docs/cli.md
index 408001ce..7d4b3669 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -113,6 +113,7 @@ The loop also stops automatically when:
- All `-n` iterations have completed
- `--stop-on-error` is set and the agent exits non-zero or times out
+- `stop_on_completion_signal: true` is set in frontmatter and the matching `...` tag is detected in agent output or captured result text
### Peeking at live agent output
@@ -152,7 +153,7 @@ ralph scaffold # Creates RALPH.md in the current directory
|---|---|---|
| `[NAME]` | none | Directory name. If omitted, creates RALPH.md in the current directory |
-The generated template includes an example command (`git-log`), an example arg (`focus`), and a prompt body with placeholders for both. Edit it, then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough.
+The generated template includes an example command (`git-log`), an example arg (`focus`), a prompt body with placeholders for both, and commented `completion_signal` / `stop_on_completion_signal` lines showing the promise-completion path. Uncomment them if you want the agent to stop early by emitting a matching `...` tag. Then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough.
Errors if `RALPH.md` already exists at the target location.
@@ -190,6 +191,10 @@ Your instructions here. Reference args with {{ args.dir }}.
| `commands` | list | no | Commands to run each iteration (each has `name` and `run`) |
| `args` | list of strings | no | Declared argument names for user arguments. Letters, digits, hyphens, and underscores only. |
| `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) |
+| `completion_signal` | string | no | Inner text for the completion promise tag. `COMPLETE` means the agent must emit `COMPLETE` (default inner text: `RALPH_PROMISE_COMPLETE`) |
+| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) |
+
+Exit code `0` still only means the agent process succeeded. Ralphify keeps its own loop architecture; only the promise tag format and matching align with Ralph-Wiggum.
### Commands
diff --git a/docs/contributing/codebase-map.md b/docs/contributing/codebase-map.md
index 65d178f1..aa240807 100644
--- a/docs/contributing/codebase-map.md
+++ b/docs/contributing/codebase-map.md
@@ -31,7 +31,17 @@ src/ralphify/ # All source code
├── _events.py # Event types, emitter protocol, and BoundEmitter convenience wrapper
├── _keypress.py # Cross-platform single-keypress listener (powers the `p` peek toggle)
├── _output.py # ProcessResult base class, subprocess constants (SESSION_KWARGS, SUBPROCESS_TEXT_KWARGS), format durations
-└── _brand.py # Brand color constants shared across CLI and console rendering
+├── _brand.py # Brand color constants shared across CLI and console rendering
+├── hooks.py # Agent lifecycle hooks — AgentHook Protocol, ShellAgentHook, CombinedAgentHook
+├── _wind_down_shim.py # Module invoked by per-CLI hooks to nudge agents toward max_turns wind-down
+└── adapters/ # Pluggable CLI adapter layer — one module per agent CLI
+    ├── _protocol.py # CLIAdapter Protocol, AdapterEvent, Invocation, stdin_invocation, ADAPTERS registry
+    ├── claude.py # Claude Code adapter (stdin, stream-json, PreToolUse wind-down)
+    ├── codex.py # Codex adapter (stdin, --json, PostToolUse wind-down)
+    ├── copilot.py # GitHub Copilot adapter (stdin, blocking)
+    ├── crush.py # Crush adapter (stdin, blocking, plain text — no tool-use counting)
+    ├── opencode.py # opencode adapter (prompt as positional arg, --format json)
+    └── _generic.py # Fallback adapter for unknown CLIs (stdin, no parsing)
tests/ # Pytest tests — one test file per module
docs/ # MkDocs site (Material theme) — user-facing documentation
diff --git a/docs/cookbook.md b/docs/cookbook.md
index e6455725..65b1ca89 100644
--- a/docs/cookbook.md
+++ b/docs/cookbook.md
@@ -1,13 +1,13 @@
---
title: Ralph Loop Recipes
-description: Copy-pasteable ralph loop setups for autonomous ML research, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement.
-keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research
+description: Copy-pasteable ralph loop setups for autonomous ML research, promise-based early exit, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement.
+keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, promise completion, early exit agent loop, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research
---
# Cookbook
!!! tldr "TL;DR"
- 8 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), and [test coverage](#test-coverage). Each is a real, runnable example from the `examples/` directory.
+ 9 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), [test coverage](#test-coverage), and [promise completion](#promise-completion). Each is a real, runnable example from the `examples/` directory.
Copy-pasteable setups for common autonomous workflows. Each recipe is a real, runnable ralph from the [`examples/`](https://github.com/computerlovetech/ralphify/tree/main/examples) directory.
@@ -269,6 +269,40 @@ The coverage report gives the agent a clear metric to improve and shows exactly
---
+## Stop the loop when the task is complete {: #promise-completion }
+
+A loop that uses ralphify's promise-completion path to stop before the iteration budget. The agent keeps working until the requested target is done, then emits `COMPLETE` so the run ends immediately instead of burning the remaining iterations.
+
+**`promise-completion/RALPH.md`**
+
+```markdown
+--8<-- "examples/promise-completion/RALPH.md"
+```
+
+```bash
+ralph run promise-completion -n 10 --target "stabilize the failing auth tests"
+```
+
+```text
+▶ Running: promise-completion
+ 3 commands · max 10 iterations
+
+── Iteration 1 ──
+ Commands: 3 ran
+✓ Iteration 1 completed (51.4s)
+
+── Iteration 2 ──
+ Commands: 3 ran
+✓ Iteration 2 completed via promise tag COMPLETE (43.2s)
+
+──────────────────────
+Done: 2 iterations — 2 succeeded
+```
+
+Set `completion_signal` to the inner promise text and `stop_on_completion_signal: true` to enable early exit. The agent must emit the matching `...` tag — bare text does not count.
+
+---
+
## Next steps
- [CLI Reference](cli.md) — all `ralph run` options (`--timeout`, `--stop-on-error`, `--delay`, user args)
diff --git a/docs/embedding.md b/docs/embedding.md
new file mode 100644
index 00000000..69449c98
--- /dev/null
+++ b/docs/embedding.md
@@ -0,0 +1,146 @@
+---
+title: Embedding Ralphify as a Headless Engine
+description: Drive ralphify from a long-lived Python process — headless import without rich/typer, in-memory prompts, concurrent run lifecycle, typed events, and a clean shutdown.
+keywords: embed ralphify, headless agent loop, ralphify library, RunManager lifecycle, in-memory prompt, typed events, shutdown RunManager, concurrent AI agent runs
+---
+
+# Embedding Ralphify
+
+!!! tldr "TL;DR"
+    `import ralphify` pulls in only the engine — no `rich` or `typer`. Drive runs from your own process with `RunManager`: `create_run` → `start_run` → `wait_for_any` / `wait_for_all` → `get_result` → `shutdown`. Run prompts straight from memory with `RunConfig(prompt=...)`, and annotate event handlers with the typed payloads.
+
+This page is for embedders driving ralphify as a library from a long-lived host process (a web server, a scheduler, another agent). For the full API reference, see the [Python API](api.md).
+
+## Headless import — no TUI dependencies
+
+The engine, manager, and event system are TUI-free. Install ralphify without the `[cli]` extra and the import chain pulls in neither `rich` nor `typer`:
+
+```bash
+pip install ralphify # engine only (pyyaml)
+pip install 'ralphify[cli]' # adds the `ralph` console script (rich + typer)
+```
+
+```python
+import sys
+
+import ralphify
+
+assert "rich" not in sys.modules
+assert "typer" not in sys.modules
+```
+
+The `ralph` console script stays registered either way. If it is invoked without the `[cli]` extra installed, it exits with an actionable message:
+
+```text
+The `ralph` CLI requires the [cli] extra: pip install 'ralphify[cli]'
+```
+
+`pyyaml` stays a core dependency — frontmatter parsing is part of the engine, not the TUI.
+
+## Running a prompt from memory
+
+Embedders that already hold the prompt body in memory don't need to write a throwaway `RALPH.md`. Set `RunConfig.prompt` instead of `ralph_file`:
+
+```python
+from pathlib import Path
+from ralphify import RunConfig, RunState, run_loop
+
+config = RunConfig(
+    agent="claude -p --dangerously-skip-permissions",
+    ralph_dir=Path("."),
+    prompt="Fix the failing test in {{ args.target }} and commit.",
+    args={"target": "tests/test_widget.py"},
+    max_iterations=3,
+)
+run_loop(config, RunState(run_id="in-memory"))
+```
+
+`prompt` is the prompt **body** — placeholders (`{{ commands.x }}`, `{{ args.x }}`, `{{ ralph.x }}`) are still resolved, but no frontmatter is parsed from it. Set `agent`, `commands`, and `args` on `RunConfig` directly.
+
+!!! note "Exactly one prompt source"
+    `RunConfig` requires exactly one of `prompt` or `ralph_file`. Passing both or neither raises `ValueError`.
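
The rule can be pictured as a small validation step. The sketch below uses a stand-in dataclass, `PromptSource`, which is hypothetical and only mirrors the two fields named above; it is not the real `RunConfig`, whose internals may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PromptSource:
    """Hypothetical stand-in for the two prompt fields on RunConfig."""

    ralph_file: Optional[Path] = None
    prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # Both set or both unset is an error: exactly one source is allowed.
        if (self.ralph_file is None) == (self.prompt is None):
            raise ValueError("supply exactly one of ralph_file or prompt")


in_memory = PromptSource(prompt="task A")  # accepted
from_file = PromptSource(ralph_file=Path("RALPH.md"))  # accepted
```

Comparing the two `is None` checks for equality catches both failure cases (both set, neither set) in a single condition.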
+
+## Typed event payloads
+
+`Event` is generic over its data payload. Annotate handlers with the concrete `TypedDict` and your type checker carries the payload type through — no `.get()` or `cast`:
+
+```python
+from ralphify import Event, EventType, IterationEndedData
+
+def on_iteration(event: Event[IterationEndedData]) -> None:
+    duration = event.data["duration_formatted"] # statically typed str
+    print(f"iteration {event.data['iteration']} took {duration}")
+```
+
+The exported payload types are `RunStartedData`, `RunStoppedData`, `IterationStartedData`, `IterationEndedData`, `CommandsStartedData`, `CommandsCompletedData`, `PromptAssembledData`, `AgentActivityData`, `AgentOutputLineData`, `ToolUseData`, `TurnApproachingLimitData`, `TurnCappedData`, and `LogMessageData`. `EventData` is the union of all of them; an emitter that handles arbitrary events receives `Event[EventData]` and narrows on `event.type`. `to_dict()` is unchanged — `TypedDict`s are plain dicts at runtime.
+
+## Thread-safety
+
+`RunManager` is thread-safe. Each run executes on its own daemon thread; the registry, control methods, and the wait primitives are guarded internally. You can call `start_run`, `stop_run`, `pause_run`, `resume_run`, `get_result`, and the wait methods from any thread.
+
+`RunState` exposes the same thread-safe control methods (`request_stop`, `request_pause`, `request_resume`) — safe to call while the run thread is mid-iteration; they take effect at the next iteration boundary.
+
+## The run lifecycle
+
+The supported lifecycle for a managed run is **create → start → wait → result → shutdown**:
+
+```python
+from pathlib import Path
+from ralphify import RunManager, RunConfig
+
+manager = RunManager()
+
+a = manager.create_run(RunConfig(
+    agent="claude -p --dangerously-skip-permissions",
+    ralph_dir=Path("."), prompt="task A", max_iterations=2,
+))
+b = manager.create_run(RunConfig(
+    agent="claude -p --dangerously-skip-permissions",
+    ralph_dir=Path("."), prompt="task B", max_iterations=2,
+))
+
+manager.start_run(a.state.run_id)
+manager.start_run(b.state.run_id)
+
+# Block until at least one finishes (returns the finished run IDs).
+done = manager.wait_for_any([a.state.run_id, b.state.run_id], timeout=300)
+
+# Block until all finish (returns True iff every run finished in time).
+all_done = manager.wait_for_all([a.state.run_id, b.state.run_id], timeout=300)
+
+# Snapshot the structured outcome.
+result = manager.get_result(a.state.run_id)
+print(result.status, result.completed, result.failed)
+
+# Request stop on every run and join their threads.
+manager.shutdown(timeout=30)
+```
+
+### Waiting for completion
+
+| Method | Returns | Blocks until |
+|---|---|---|
+| `wait_for_any(run_ids, timeout=None)` | `list[str]` — the finished run IDs (`[]` on timeout) | at least one of `run_ids` reaches a terminal status |
+| `wait_for_all(run_ids, timeout=None)` | `bool` — `True` iff all finished | every run in `run_ids` finishes, or `timeout` elapses |
+
+Both are backed by an internal condition notified when any run thread exits — no polling. A terminal status is `COMPLETED`, `STOPPED`, or `FAILED`. Unknown run IDs are ignored by `wait_for_any` and can never satisfy `wait_for_all`.
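
To illustrate what condition-backed waiting without polling can look like, here is a self-contained sketch built on `threading.Condition`. The `FinishTracker` class and its method bodies are hypothetical and are not ralphify code; the real `RunManager` may be structured differently.

```python
import threading
from typing import Optional


class FinishTracker:
    """Hypothetical illustration of condition-based waiting (not ralphify code)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._finished: set = set()

    def mark_finished(self, run_id: str) -> None:
        # Called by a run thread as it exits; wakes every waiter.
        with self._cond:
            self._finished.add(run_id)
            self._cond.notify_all()

    def wait_for_any(self, run_ids: list, timeout: Optional[float] = None) -> list:
        def done() -> list:
            return [r for r in run_ids if r in self._finished]

        with self._cond:
            # wait_for sleeps until notified and the predicate holds, or until timeout.
            self._cond.wait_for(lambda: bool(done()), timeout=timeout)
            return done()

    def wait_for_all(self, run_ids: list, timeout: Optional[float] = None) -> bool:
        with self._cond:
            return self._cond.wait_for(
                lambda: all(r in self._finished for r in run_ids), timeout=timeout
            )


tracker = FinishTracker()
threading.Timer(0.05, tracker.mark_finished, args=["a"]).start()
print(tracker.wait_for_any(["a", "b"], timeout=2))
print(tracker.wait_for_all(["a", "b"], timeout=0.1))
```

In this sketch an ID that never finishes is simply absent from the `wait_for_any` result and keeps `wait_for_all` from returning `True`, which mirrors the behavior described for unknown run IDs.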
+
+### Reading the result
+
+`get_result(run_id)` returns a frozen `RunResult` snapshot. It reports the **current** counts regardless of terminal state, so wait first (e.g. with `wait_for_all`) if you want the final outcome. It raises `KeyError` for an unknown run ID.
+
+| Field | Type | Description |
+|---|---|---|
+| `run_id` | `str` | The run's ID |
+| `status` | `RunStatus` | Current lifecycle status |
+| `total` | `int` | `completed + failed` |
+| `completed` | `int` | Successful iterations |
+| `failed` | `int` | Failed iterations (includes timed out) |
+| `timed_out_count` | `int` | Timed-out iterations (subset of `failed`) |
+
+### Shutting down
+
+`shutdown(timeout=None)` requests stop on every registered run and joins each live thread. It returns `True` iff all live threads joined within `timeout`; with `timeout=None` it blocks until every run thread exits. Unstarted runs are harmless — there is no thread to join. Call it once when the host process is winding down.
+
+## Next steps
+
+- [**Python API**](api.md) — full reference for every public type and method
+- [**How the loop works**](how-it-works.md) — the iteration cycle `run_loop` executes
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 306954ad..313f4781 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -50,9 +50,10 @@ ralph scaffold my-ralph
```text
Created my-ralph/RALPH.md
Edit the file, then run: ralph run my-ralph
+Optional early exit: uncomment completion_signal + stop_on_completion_signal and emit COMPLETE.
```
-This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, and prompt. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing.
+This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, prompt, and a commented promise-completion example. If you want the loop to stop before its iteration budget, uncomment `completion_signal` and `stop_on_completion_signal`, then tell the agent to emit the matching `...` tag when it is done. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing.
Or create the file manually as shown below.
@@ -251,6 +252,16 @@ The agent's output streams live to your terminal between the iteration markers
If the agent breaks a test, the next iteration sees the failure output via `{{ commands.tests }}` and fixes it automatically.
+!!! tip "Optional: stop when the task is fully complete"
+    Add these frontmatter fields if you want the loop to stop on an explicit completion marker:
+
+    ```yaml
+    completion_signal: COMPLETE
+    stop_on_completion_signal: true
+    ```
+
+    `completion_signal` is the inner promise text. With `completion_signal: COMPLETE`, the agent must emit `COMPLETE`. If you omit it, the default promise tag is `RALPH_PROMISE_COMPLETE`. The loop only exits early when `stop_on_completion_signal` is enabled and that tag is detected in agent output or captured result text. Exit code `0` still only means the agent process succeeded.
+
Once you're confident the loop works, drop the `-n 3` to let it run indefinitely. Press `Ctrl+C` to stop.
## Step 7: Steer while it runs
@@ -274,7 +285,7 @@ Read TODO.md and focus only on the API module.
This is the most powerful part of ralph loops — you're steering a running agent with a text file.
!!! warning "Frontmatter changes need a restart"
- Only the **prompt body** is re-read each iteration. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. If you add a new command or change the agent, stop the loop with `Ctrl+C` and restart it.
+ Only the **prompt body** is re-read each iteration. Frontmatter is parsed once at startup. If you add a new command, change the agent, or change completion settings, stop the loop with `Ctrl+C` and restart it.
## Next steps
diff --git a/docs/hooks.md b/docs/hooks.md
new file mode 100644
index 00000000..98f0d3dd
--- /dev/null
+++ b/docs/hooks.md
@@ -0,0 +1,123 @@
+---
+title: Lifecycle Hooks
+description: Subscribe to ralphify iteration boundaries, tool-use events, and turn-cap signals via shell commands declared in RALPH.md frontmatter or Python AgentHook implementations.
+keywords: ralphify hooks, shell hooks, agent lifecycle hooks, on_tool_use, on_iteration_completed, on_turn_capped, RALPH.md hooks field
+---
+
+# Lifecycle Hooks
+
+Hooks let you react to ralphify iteration boundaries, tool-use events, and turn-cap signals without modifying the engine. Two flavors exist:
+
+- **Shell hooks** — declared in `RALPH.md` frontmatter under the `hooks:` field. Each entry is a `{event, run}` pair; ralphify pipes the event payload as JSON to the command's stdin. No Python required.
+- **Python hooks** — classes implementing the `AgentHook` Protocol. Pass them via `RunConfig(hooks=[...])` to `run_loop`. Useful when embedding ralphify in a larger application.
+
+This page covers both.
+
+## Available events
+
+| Event | Fires when | Payload fields |
+|---|---|---|
+| `on_iteration_started` | Right before commands run for an iteration | `iteration` |
+| `on_commands_completed` | After all commands finish, before prompt assembly | `iteration`, `outputs` (`{name: stdout}`) |
+| `on_prompt_assembled` | After placeholder resolution, before agent spawn | `iteration`, `prompt` |
+| `on_tool_use` | Each time the adapter parses a `tool_use` event | `iteration`, `tool_name`, `count` |
+| `on_turn_approaching_limit` | When `count >= max_turns - max_turns_grace` | `iteration`, `count`, `max_turns` |
+| `on_turn_capped` | When `count >= max_turns` and the agent is about to be terminated | `iteration`, `count` |
+| `on_iteration_completed` | After the iteration's `AgentResult` is finalized | `iteration`, `result` (dict form) |
+| `on_completion_signal` | When the adapter detects a `...` tag | `iteration`, `signal` |
+
+Event names are validated when frontmatter loads — unknown names raise a clear error.
+
+The `on_tool_use`, `on_turn_approaching_limit`, and `on_turn_capped` events only fire for adapters that count tool uses (`counts_what == "tool_use"` — Claude, Codex, Copilot, opencode). Adapters like Crush that emit no structured events never produce these callbacks.
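
The two turn thresholds from the table can be written out as a small function. This is a sketch that only evaluates the table's conditions for a given count; `turn_conditions` is a hypothetical name, and whether ralphify fires each event once or on every matching tool use is not specified here.

```python
from typing import Optional


def turn_conditions(count: int, max_turns: int, max_turns_grace: Optional[int]) -> list:
    """Hypothetical helper: list the turn events whose table condition holds."""
    events = []
    # Assumption: the soft threshold only applies when max_turns_grace is set.
    if max_turns_grace is not None and count >= max_turns - max_turns_grace:
        events.append("on_turn_approaching_limit")
    if count >= max_turns:
        events.append("on_turn_capped")
    return events


for count in (16, 17, 20):
    print(count, turn_conditions(count, max_turns=20, max_turns_grace=3))
```

With `max_turns: 20` and a grace of 3, the soft condition first holds at the 17th tool use and the hard cap at the 20th.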
+
+## Shell hooks (RALPH.md)
+
+Add a `hooks:` list to your frontmatter. Each entry's `run` string is split with `shlex.split` and is not passed through a shell, so pipes, redirects, and other shell metacharacters are not interpreted. The event payload is JSON-encoded and written to the command's stdin.
+
+```markdown
+---
+agent: claude -p --dangerously-skip-permissions
+max_turns: 20
+hooks:
+  - event: on_iteration_started
+    run: ./scripts/notify-start.sh
+  - event: on_turn_approaching_limit
+    run: ./scripts/page-oncall.sh
+  - event: on_iteration_completed
+    run: ./scripts/log-result.py
+---
+```
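
Because `run` is split with `shlex.split`, operators such as `|` or `>` arrive at the command as literal arguments. A quick check with the standard library shows the effect; the command string here is a made-up example, not one taken from ralphify.

```python
import shlex

# Hypothetical `run` value containing quoting and a redirect.
run = "./scripts/page-oncall.sh --severity 'high priority' > out.log"

argv = shlex.split(run)
print(argv)
# ['./scripts/page-oncall.sh', '--severity', 'high priority', '>', 'out.log']
```

The quoted phrase stays one argument, while `>` and `out.log` are handed to the script as plain strings. If a hook needs redirection or a pipeline, that logic belongs inside the script itself.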
+
+A minimal `notify-start.sh`:
+
+```bash
+#!/bin/bash
+payload=$(cat -)
+echo "iteration starting: $payload" >> ralph_events.log
+```
+
+The payload for `on_iteration_started` is `{"iteration": 3}`.
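
The same kind of hook could be written in Python, since any executable that reads stdin will do. The sketch below is an assumption-laden example: `run_hook`, the log line format, and the log path are all hypothetical, and only the `iteration` field comes from the events table.

```python
import io
import json
import sys
from pathlib import Path
from typing import TextIO


def run_hook(stdin: TextIO, log_path: Path) -> str:
    """Read one JSON payload from *stdin* and append a summary line to *log_path*."""
    raw = stdin.read()
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Never crash on odd input; record what arrived instead.
        line = f"unparseable payload: {raw!r}"
    else:
        iteration = payload.get("iteration", "?") if isinstance(payload, dict) else "?"
        line = f"iteration {iteration}: {json.dumps(payload, sort_keys=True)}"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line


# Demonstration with an in-memory stream standing in for the real stdin.
demo_log = Path("ralph_events_demo.log")  # hypothetical file name
demo_line = run_hook(io.StringIO('{"iteration": 3}'), demo_log)
demo_log.unlink()  # clean up the demo file
```

A real script would end with `run_hook(sys.stdin, Path("ralph_events.log"))` instead of the demonstration lines.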
+
+### Failure handling
+
+- Hook stdout is captured to the run log.
+- A non-zero exit code is logged but does **not** abort the run — hooks are observers, not gatekeepers.
+- One misbehaving hook does not poison the others. Each hook is invoked independently with per-call exception isolation.
+
+### Multiple hooks for one event
+
+You can register multiple hooks for the same event. They run in declaration order:
+
+```markdown
+---
+hooks:
+  - event: on_iteration_completed
+    run: ./scripts/save-metrics.sh
+  - event: on_iteration_completed
+    run: ./scripts/notify-slack.sh
+---
+```
+
+## Python hooks (`AgentHook` Protocol)
+
+For tighter integration, implement the `AgentHook` Protocol directly. All hook methods take keyword-only arguments, so future field additions stay backward compatible.
+
+```python
+from pathlib import Path
+
+from ralphify import RunConfig, RunState, run_loop
+from ralphify.hooks import NoOpAgentHook
+
+
+class MetricsHook(NoOpAgentHook):
+    def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
+        print(f"iter {iteration}: {tool_name} (#{count})")
+
+    def on_turn_capped(self, *, iteration: int, count: int) -> None:
+        print(f"iter {iteration}: capped at {count} turns")
+
+
+config = RunConfig(
+    agent="claude -p --dangerously-skip-permissions",
+    ralph_dir=Path("."),
+    ralph_file=Path("RALPH.md"),
+    hooks=[MetricsHook()],
+)
+run_loop(config, RunState(run_id="my-run"))
+```
+
+`NoOpAgentHook` provides empty implementations of every method, so you only override the ones you care about. Pass any number of hooks via `RunConfig(hooks=[...])`; ralphify wraps them in a `CombinedAgentHook` that fans each event out to every hook with exception isolation.
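
Fan-out with exception isolation can be pictured with a small stand-in. The `IsolatingFanout` class below is hypothetical and is not ralphify's `CombinedAgentHook`; it only illustrates calling each hook independently so that one failure does not stop the rest. The `Broken` and `Recorder` hooks are likewise invented for the demonstration.

```python
from __future__ import annotations

import logging
from typing import Any

log = logging.getLogger("hook-demo")


class IsolatingFanout:
    """Hypothetical stand-in that forwards an event to every hook in order."""

    def __init__(self, hooks: list[Any]) -> None:
        self._hooks = list(hooks)

    def dispatch(self, event: str, **payload: Any) -> int:
        """Call *event* on each hook and return how many calls failed."""
        failures = 0
        for hook in self._hooks:
            method = getattr(hook, event, None)
            if method is None:
                continue  # this hook does not implement the event
            try:
                method(**payload)
            except Exception:
                # Log and keep going: one bad hook must not block the others.
                log.exception("hook %r failed on %s", hook, event)
                failures += 1
        return failures


class Broken:
    def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
        raise RuntimeError("boom")


class Recorder:
    def __init__(self) -> None:
        self.seen: list[tuple[int, str, int]] = []

    def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
        self.seen.append((iteration, tool_name, count))


recorder = Recorder()
fanout = IsolatingFanout([Broken(), recorder])
failed = fanout.dispatch("on_tool_use", iteration=1, tool_name="Bash", count=1)
```

Even though `Broken` raises first, `Recorder` still receives the event, and the failure is only counted and logged.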
+
+## When to use which
+
+- **Shell hooks** when the action is a small script, lives in your repo, and benefits from the agent's own exit-code semantics.
+- **Python hooks** when you're embedding ralphify, need access to the live `RunState`, or want type-safe payloads.
+
+Both kinds of hook are invoked synchronously from the engine loop, so long-running hook code blocks the loop. Keep heavy work asynchronous (fire-and-forget background scripts, queues, etc.).
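
One way to keep a Python hook from holding up the loop is to enqueue events and process them on a worker thread. The `QueueingHook` class below is a hypothetical sketch that does not import ralphify; it assumes only the keyword-only `on_tool_use` signature shown earlier. In a real integration it would presumably subclass `NoOpAgentHook`.

```python
from __future__ import annotations

import queue
import threading
from typing import Any


class QueueingHook:
    """Hypothetical hook that hands events to a worker thread and returns at once."""

    def __init__(self) -> None:
        self._events: queue.Queue[dict[str, Any] | None] = queue.Queue()
        self.processed: list[dict[str, Any]] = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
        # Enqueueing is cheap, so the caller is not held up by slow work.
        self._events.put(
            {"iteration": iteration, "tool_name": tool_name, "count": count}
        )

    def _drain(self) -> None:
        while True:
            event = self._events.get()
            if event is None:  # shutdown sentinel
                return
            # Slow work (HTTP calls, database writes) would go here.
            self.processed.append(event)

    def close(self) -> None:
        """Flush pending events and stop the worker."""
        self._events.put(None)
        self._worker.join()


hook = QueueingHook()
hook.on_tool_use(iteration=1, tool_name="Read", count=1)
hook.on_tool_use(iteration=1, tool_name="Edit", count=2)
hook.close()
```

Calling `close()` after `run_loop` returns gives the worker a chance to finish queued events before the program exits.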
+
+## Soft wind-down vs. hooks
+
+Don't confuse the `on_turn_approaching_limit` hook (yours, observer-only) with the per-CLI soft wind-down message injected directly into Claude or Codex (theirs, in-band). The wind-down message is a hint to the agent so it has time to hand off cleanly; your hook fires alongside it so external systems can observe the same threshold.
+
+## See also
+
+- [Quick Reference — frontmatter fields](quick-reference.md#frontmatter-fields)
+- [Using with Different Agents](agents.md)
diff --git a/docs/how-it-works.md b/docs/how-it-works.md
index 74d39e9d..f4be00a7 100644
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -19,7 +19,7 @@ Every iteration follows the same sequence. Here's what happens at each step.
The prompt body (everything below the frontmatter) is read from disk **every iteration**. This means you can edit the prompt text — add rules, change the task, adjust constraints — while the loop is running. Changes take effect on the next cycle.
-Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. To change those, restart the loop.
+Frontmatter settings are parsed once at startup. To change them, restart the loop.
### 2. Run commands and capture output
@@ -70,6 +70,8 @@ echo "" | claude -p --dangerously-skip-permissions
The agent reads the prompt, does work in the current directory (edits files, runs commands, makes commits), and exits. Ralphify waits for the agent process to finish.
+Exit codes still mean process success or failure. Promise completion is separate: `completion_signal` is the inner promise text (default: `RALPH_PROMISE_COMPLETE`), so ralphify only stops early when `stop_on_completion_signal` is `true` and the agent emits the matching `...` tag in output or captured result text. This aligns the promise format with Ralph-Wiggum, but ralphify still uses its own command/prompt loop architecture.
+
When the agent command starts with `claude`, ralphify automatically adds `--output-format stream-json --verbose` to enable structured streaming. This lets ralphify track agent activity in real time — you don't need to configure this yourself.
### 6. Loop back with fresh context
@@ -82,7 +84,7 @@ The loop starts the next iteration from step 1. The RALPH.md is re-read, command
|---|---|---|
| Prompt body | Every iteration | Edit the prompt while the loop runs — the next iteration follows your new instructions |
| Command output | Every iteration | The agent always sees fresh data (latest git log, current test status, etc.) |
-| Frontmatter (`agent`, `commands`, `args`) | Once at startup | Parsed when the loop starts. Restart to pick up changes. |
+| Frontmatter settings | Once at startup | Parsed when the loop starts. Restart to pick up changes. |
| User arguments | Once at startup | Passed via CLI flags, constant for the run |
## How broken code gets fixed automatically
@@ -179,6 +181,7 @@ The loop continues until one of these happens:
| `Ctrl+C` (first) | Gracefully finishes the current iteration, then stops the loop. The agent completes its work and the iteration result is recorded. |
| `Ctrl+C` (second) | Force-stops immediately — kills the agent process and exits. Use when you don't want to wait for the current iteration to finish. |
| `-n` limit reached | Loop stops after completing the specified number of iterations |
+| `stop_on_completion_signal: true` and matching `...` tag detected | Loop stops after the current iteration |
| `--stop-on-error` and agent exits non-zero or times out | Loop stops after the current iteration |
| `--timeout` exceeded | Agent process is killed, iteration is marked as timed out, loop continues (unless `--stop-on-error`) |
diff --git a/docs/index.md b/docs/index.md
index 7f284a07..ca17330a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -16,7 +16,7 @@ hide:
A **ralph** is a directory with a `RALPH.md` file — a skill-like format that bundles a prompt, the commands to run between iterations, and any files the agent needs. **Ralphify** is the CLI runtime that executes them.
-See [The Ralph Format](blog/posts/the-ralph-format.md) for the full spec.
+See [The Ralph Format](blog/posts/the-ralph-standard.md) for the full spec.
```
grow-coverage/
@@ -52,7 +52,7 @@ One directory. One command. Each iteration starts with fresh context and current
*Works with any agent CLI. Swap `claude -p` for Codex, Aider, or your own — just change the `agent` field.*
[Get Started](getting-started.md){ .md-button .md-button--primary }
-[Read the Format Spec](blog/posts/the-ralph-format.md){ .md-button }
+[Read the Format Spec](blog/posts/the-ralph-standard.md){ .md-button }
---
@@ -149,7 +149,7 @@ Ralphs are to the outer loop what [skills](https://agentskills.io/) are to the i
## Next steps
-- **[The Ralph Format](blog/posts/the-ralph-format.md)** — the full spec
+- **[The Ralph Format](blog/posts/the-ralph-standard.md)** — the full spec
- **[Getting Started](getting-started.md)** — from install to a running loop in 10 minutes
- **[How it Works](how-it-works.md)** — what happens inside each iteration
- **[Cookbook](cookbook.md)** — copy-pasteable ralphs for coding, docs, research, and more
diff --git a/docs/quick-reference.md b/docs/quick-reference.md
index 3fb7d880..5b0d80cb 100644
--- a/docs/quick-reference.md
+++ b/docs/quick-reference.md
@@ -78,6 +78,11 @@ Your instructions here. Use {{ args.dir }} for user arguments.
| `commands` | list | no | Commands to run each iteration |
| `args` | list | no | User argument names. Letters, digits, hyphens, and underscores only. |
| `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) |
+| `completion_signal` | string | no | Inner text for the completion promise tag. `COMPLETE` means the agent must emit `COMPLETE` (default inner text: `RALPH_PROMISE_COMPLETE`) |
+| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) |
+| `max_turns` | int | no | Hard cap on tool-use events per iteration. Streaming adapters (Claude, Codex, opencode) are SIGTERM'd at the limit; blocking adapters (Copilot) cannot be preempted mid-run, so their tool uses are counted post-hoc after the process exits and the iteration is marked completed-at-cap once the count reaches the limit. Missing disables the cap; adapters that count no tool uses treat it as a no-op. |
+| `max_turns_grace` | int | no | Tool-use count before `max_turns` at which a soft wind-down message is injected into Claude/Codex (default: `2`; `0` disables the wind-down). |
+| `hooks` | list | no | Shell commands run at lifecycle points. Each entry is `{event, run}`. See [Hooks](hooks.md). |
### Command fields
@@ -160,6 +165,7 @@ Each iteration:
| `P` (shift+p) | Open full-screen peek — scroll the entire activity buffer. `j/k` line, `space/b` page, `g/G` top/bottom, `q` or `P` exits |
| `-n` limit reached | Stops after the specified number of iterations |
| `--stop-on-error` | Stops if agent exits non-zero or times out |
+| matching `...` tag detected | Stops early only when `stop_on_completion_signal: true` and the configured promise tag is found in agent output/result |
## Live editing
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index a981fddc..52426260 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -522,7 +522,7 @@ For programmatic control over concurrent runs, use the [Python API's `RunManager
### Can I edit RALPH.md while the loop runs?
-Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup, so changing those requires restarting the loop.
+Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter settings are parsed once at startup, so changing them requires restarting the loop.
### How do I disable the co-author credit in commits?
diff --git a/examples/promise-completion/RALPH.md b/examples/promise-completion/RALPH.md
new file mode 100644
index 00000000..7302a110
--- /dev/null
+++ b/examples/promise-completion/RALPH.md
@@ -0,0 +1,49 @@
+---
+agent: claude -p --dangerously-skip-permissions
+commands:
+  - name: tests
+    run: uv run pytest -x
+  - name: lint
+    run: uv run ruff check .
+  - name: git-log
+    run: git log --oneline -10
+args:
+ - target
+completion_signal: COMPLETE
+stop_on_completion_signal: true
+---
+
+# Stop Early with a Promise Tag
+
+You are an autonomous coding agent running in a loop. Each iteration
+starts with a fresh context. Your progress lives in the code and git.
+
+## Recent commits
+
+{{ commands.git-log }}
+
+## Test results
+
+{{ commands.tests }}
+
+## Lint
+
+{{ commands.lint }}
+
+Fix any failing tests or lint violations above before doing anything else.
+
+## Task
+
+Get the requested target to a clean, shippable state.
+{{ args.target }}
+
+When the task is complete and no more changes are needed, print
+COMPLETE and exit so the loop stops early.
+
+## Rules
+
+- One fix or improvement per iteration
+- Keep the target scoped — do not drift into unrelated cleanup
+- If tests or lint are failing, fix them before new work
+- Only emit COMPLETE when the target is truly done
+- Commit with a descriptive message and push
diff --git a/mkdocs.yml b/mkdocs.yml
index 014f223b..bc3121ba 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -132,6 +132,8 @@ nav:
- Quick Reference: quick-reference.md
- CLI: cli.md
- Python API: api.md
+ - Hooks: hooks.md
+ - Embedding: embedding.md
- Blog: blog/index.md
- Help:
- Troubleshooting: troubleshooting.md
diff --git a/pyproject.toml b/pyproject.toml
index a424ab0d..2e459738 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -18,7 +18,10 @@ classifiers = [
"Topic :: Software Development :: Build Tools",
"Topic :: Software Development :: Quality Assurance",
]
-dependencies = ["typer>=0.9", "rich>=13.0", "pyyaml>=6.0"]
+dependencies = ["pyyaml>=6.0"]
+
+[project.optional-dependencies]
+cli = ["typer>=0.9", "rich>=13.0"]
[project.scripts]
@@ -26,6 +29,7 @@ ralph = "ralphify:main"
[dependency-groups]
dev = [
+ "ralphify[cli]",
"mkdocs>=1.6.1",
"mkdocs-material>=9.7.5",
"mkdocs-git-revision-date-localized-plugin>=1.2",
diff --git a/src/ralphify/__init__.py b/src/ralphify/__init__.py
index 13294123..5fed5abe 100644
--- a/src/ralphify/__init__.py
+++ b/src/ralphify/__init__.py
@@ -24,23 +24,59 @@
__version__ = "0.0.0"
from ralphify.engine import run_loop
-from ralphify._run_types import Command, RunConfig, RunState, RunStatus
+from ralphify._run_types import (
+ Command,
+ RunConfig,
+ RunResult,
+ RunState,
+ RunStatus,
+)
from ralphify._events import (
+ AgentActivityData,
+ AgentOutputLineData,
BoundEmitter,
+ CommandsCompletedData,
+ CommandsStartedData,
Event,
+ EventData,
EventEmitter,
EventType,
FanoutEmitter,
+ IterationEndedData,
+ IterationStartedData,
+ LogMessageData,
NullEmitter,
+ PromptAssembledData,
QueueEmitter,
+ RunStartedData,
+ RunStoppedData,
StopReason,
+ ToolUseData,
+ TurnApproachingLimitData,
+ TurnCappedData,
+)
+from ralphify.hooks import (
+ AgentHook,
+ CombinedAgentHook,
+ NoOpAgentHook,
+ ShellAgentHook,
)
from ralphify.manager import ManagedRun, RunManager
def main() -> None:
"""Entry point for the ``ralph`` CLI (called by the console script)."""
-    from ralphify.cli import app
+    try:
+        from ralphify.cli import app
+    except ModuleNotFoundError as exc:
+        # Only a genuinely-absent CLI dependency gets the install hint; any
+        # other missing module is a real bug inside ralphify.cli, so re-raise
+        # it rather than masking it behind the [cli]-extra message.
+        if exc.name in {"rich", "typer"}:
+            raise SystemExit(
+                "The `ralph` CLI requires the [cli] extra: pip install 'ralphify[cli]'"
+            ) from exc
+        raise
app()
@@ -51,9 +87,11 @@ def main() -> None:
"BoundEmitter",
"Command",
"RunConfig",
+ "RunResult",
"RunState",
"RunStatus",
"Event",
+ "EventData",
"EventEmitter",
"EventType",
"FanoutEmitter",
@@ -62,4 +100,23 @@ def main() -> None:
"StopReason",
"ManagedRun",
"RunManager",
+ # Lifecycle hooks
+ "AgentHook",
+ "CombinedAgentHook",
+ "NoOpAgentHook",
+ "ShellAgentHook",
+ # Typed event payloads
+ "AgentActivityData",
+ "AgentOutputLineData",
+ "CommandsCompletedData",
+ "CommandsStartedData",
+ "IterationEndedData",
+ "IterationStartedData",
+ "LogMessageData",
+ "PromptAssembledData",
+ "RunStartedData",
+ "RunStoppedData",
+ "ToolUseData",
+ "TurnApproachingLimitData",
+ "TurnCappedData",
]
diff --git a/src/ralphify/_agent.py b/src/ralphify/_agent.py
index af6fb972..ecd57a43 100644
--- a/src/ralphify/_agent.py
+++ b/src/ralphify/_agent.py
@@ -16,13 +16,17 @@
from __future__ import annotations
import json
+import logging
import os
import queue
+import shutil
import signal
import subprocess
+import tempfile
import threading
import time
from collections.abc import Callable
+from contextlib import suppress
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
@@ -37,6 +41,12 @@
collect_output,
warn,
)
+from ralphify.adapters import CLIAdapter, select_adapter
+
+_log = logging.getLogger(__name__)
+
+_counter_write_failure_logged = False
+"""Module-level latch so the wind-down counter write failure logs only once."""
# ── Callback type aliases ──────────────────────────────────────────────
# Used across the streaming and blocking execution paths for callbacks
@@ -48,20 +58,33 @@
OutputLineCallback = Callable[[str, OutputStream], None]
"""Receives raw output lines with their stream name ("stdout"/"stderr")."""
+ToolUseCallback = Callable[[str, int], None]
+"""Invoked by both the streaming path (as lines arrive) and the blocking
+path (after the subprocess exits, during post-hoc counting). Best-effort:
+exceptions are swallowed so a buggy subscriber cannot kill the agent loop.
+"""
+
+
+def _call_safely(callback: Callable[..., Any] | None, *args: Any) -> None:
+    """Invoke an observer *callback* with *args*, swallowing any exception.
+
+    Used for best-effort observer callbacks during output draining: a
+    raising callback must never stop the drain loop or leave the reader
+    thread hung.
+    """
+    if callback is None:
+        return
+    try:
+        callback(*args)
+    except Exception:
+        pass
+
+
# Typed constants for the OutputStream literal so the type checker enforces
# that only "stdout" / "stderr" ever reach ``on_output_line``.
_STDOUT: OutputStream = "stdout"
_STDERR: OutputStream = "stderr"
-# Agent binary name that supports --output-format stream-json.
-# Public because _console_emitter also needs this for display logic.
-CLAUDE_BINARY = "claude"
-
-# CLI flags appended when streaming mode is used.
-_OUTPUT_FORMAT_FLAG = "--output-format"
-_STREAM_FORMAT = "stream-json"
-_VERBOSE_FLAG = "--verbose"
-
# JSON stream event types and fields for result extraction.
_RESULT_EVENT_TYPE = "result"
_RESULT_FIELD = "result"
@@ -81,6 +104,45 @@
# Seconds to wait for the agent process to exit after a kill signal.
_PROCESS_WAIT_TIMEOUT = 5.0
+# Tempdir prefix for per-iteration wind-down hook config. Surfaces in
+# ``$TMPDIR`` listings so users can spot stale dirs after a crash.
+_HOOK_TEMPDIR_PREFIX = "ralphify-"
+
+# Counter filename when ``log_dir`` is unset and the file lives inside
+# the per-iteration tempdir. When ``log_dir`` is set, the counter lives
+# at ``/.turncount``.
+_COUNTER_FILENAME = "turncount"
+_COUNTER_LOG_SUFFIX = ".turncount"
+_COUNTER_PAD_WIDTH = _LOG_ITERATION_PAD_WIDTH
+
+
+def _extract_result_text_from_lines(lines: list[str] | None) -> str | None:
+    """Return the last string payload from any JSON ``result`` event in *lines*."""
+    if lines is None:
+        return None
+
+    result_text = None
+    for line in lines:
+        extracted = _extract_result_text_from_line(line)
+        if extracted is not None:
+            result_text = extracted
+    return result_text
+
+
+def _extract_result_text_from_line(line: str) -> str | None:
+    """Return the string payload from a single JSON ``result`` event line."""
+    try:
+        parsed = json.loads(line.strip())
+    except json.JSONDecodeError:
+        return None
+    if (
+        isinstance(parsed, dict)
+        and parsed.get("type") == _RESULT_EVENT_TYPE
+        and isinstance(parsed.get(_RESULT_FIELD), str)
+    ):
+        return parsed[_RESULT_FIELD]
+    return None
+
def _try_graceful_group_kill(proc: subprocess.Popen[Any]) -> bool:
"""Attempt to kill the process via its POSIX process group.
@@ -231,15 +293,45 @@ class AgentResult(ProcessResult):
result_text: str | None = None
captured_stdout: str | None = None
captured_stderr: str | None = None
+ # Tool-use events the adapter reported for this iteration; ``0`` when
+ # no adapter was active or the adapter's ``counts_what != "tool_use"``.
+ tool_use_count: int = 0
+ # True when the cap was reached and the agent process was terminated
+ # (streaming) or the cap was exceeded post-hoc (blocking path).
+ turn_capped: bool = False
+
+
+@dataclass(slots=True)
+class _WindDownContext:
+    """Per-iteration tempdir and counter state for soft wind-down hooks.
+
+    Built by :func:`_setup_wind_down` for adapters whose
+    ``supports_soft_wind_down`` is True, when a ``max_turns`` cap is
+    configured. Carries the env-var overrides the spawned agent needs,
+    plus the cleanup hook called from the streaming path's ``finally``.
+    """
+
+    tempdir: Path
+    counter_path: Path
+    env_overrides: dict[str, str]
+
+    def cleanup(self) -> None:
+        try:
+            self.counter_path.unlink()
+        except (FileNotFoundError, OSError):
+            pass
+        shutil.rmtree(self.tempdir, ignore_errors=True)
@dataclass(frozen=True, slots=True)
class _StreamResult:
"""Accumulated output from reading the agent's JSON stream."""
- stdout_lines: tuple[str, ...]
+ stdout_lines: tuple[str, ...] | None
result_text: str | None
timed_out: bool
+ tool_use_count: int = 0
+ turn_capped: bool = False
def _write_log(
@@ -260,19 +352,6 @@ def _write_log(
return log_file
-def _supports_stream_json(cmd: list[str]) -> bool:
- """Return True if the agent command supports ``--output-format stream-json``.
-
- Currently only Claude Code supports this protocol. To add streaming
- support for another agent, extend the check here — no other changes
- needed since :func:`_run_agent_streaming` handles the protocol generically.
- """
- if not cmd:
- return False
- binary = Path(cmd[0]).stem
- return binary == CLAUDE_BINARY
-
-
def _readline_pump(
stdout: IO[str],
line_queue: queue.Queue[str | None],
@@ -298,6 +377,11 @@ def _read_agent_stream(
deadline: float | None,
on_activity: ActivityCallback | None,
on_output_line: OutputLineCallback | None = None,
+ *,
+ capture_stdout: bool = True,
+ adapter: CLIAdapter | None = None,
+ max_turns: int | None = None,
+ on_tool_use: ToolUseCallback | None = None,
) -> _StreamResult:
"""Read the agent's JSON stream line-by-line until EOF or timeout.
@@ -322,23 +406,27 @@ def _read_agent_stream(
Returns early with ``timed_out=True`` when the deadline is exceeded,
leaving the caller responsible for killing the subprocess.
+
+ When *capture_stdout* is ``False``, stdout is still drained and parsed but
+ not retained in memory. This keeps the streaming path lightweight when no
+ later completion-signal parsing or log writing needs the raw bytes.
"""
- stdout_lines: list[str] = []
+ stdout_lines: list[str] | None = [] if capture_stdout else None
result_text: str | None = None
+ tool_use_count = 0
+ turn_capped = False
+ count_tool_use = adapter is not None and adapter.counts_what == "tool_use"
line_q: queue.Queue[str | None] = queue.Queue()
- reader = threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True)
- reader.start()
+ threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True).start()
while True:
# Compute how long we can wait for the next line.
if deadline is not None:
- remaining = deadline - time.monotonic()
- # Use max(remaining, 0) so that an already-expired deadline
- # still does a non-blocking drain of queued lines before
- # returning — lines the reader thread already buffered are
- # not silently lost.
- get_timeout: float | None = max(remaining, 0)
+ # Clamp to 0 so that an already-expired deadline still does a
+ # non-blocking drain of queued lines before returning — lines
+ # the reader thread already buffered are not silently lost.
+ get_timeout: float | None = max(deadline - time.monotonic(), 0)
else:
get_timeout = None
@@ -347,121 +435,175 @@ def _read_agent_stream(
except queue.Empty:
# Deadline expired while waiting for a line.
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=True,
+ tool_use_count=tool_use_count,
+ turn_capped=turn_capped,
)
if line is None: # EOF sentinel from reader thread
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=False,
+ tool_use_count=tool_use_count,
+ turn_capped=turn_capped,
)
- stdout_lines.append(line)
- if on_output_line is not None:
- try:
- on_output_line(line.rstrip("\r\n"), _STDOUT)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ if stdout_lines is not None:
+ stdout_lines.append(line)
+ _call_safely(on_output_line, line.rstrip("\r\n"), _STDOUT)
stripped = line.strip()
if stripped:
try:
parsed = json.loads(stripped)
except json.JSONDecodeError:
- parsed = None
- if isinstance(parsed, dict):
- if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance(
- parsed.get(_RESULT_FIELD), str
- ):
- result_text = parsed[_RESULT_FIELD]
- if on_activity is not None:
- try:
- on_activity(parsed)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ pass
+ else:
+ if isinstance(parsed, dict):
+ if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance(
+ parsed.get(_RESULT_FIELD), str
+ ):
+ result_text = parsed[_RESULT_FIELD]
+ _call_safely(on_activity, parsed)
+
+ if count_tool_use and adapter is not None:
+ event = adapter.parse_event(stripped)
+ if event is not None and event.kind == "tool_use":
+ tool_use_count += 1
+ _call_safely(on_tool_use, event.name or "", tool_use_count)
+ if max_turns is not None and tool_use_count >= max_turns:
+ return _StreamResult(
+ stdout_lines=(
+ tuple(stdout_lines)
+ if stdout_lines is not None
+ else None
+ ),
+ result_text=result_text,
+ timed_out=False,
+ tool_use_count=tool_use_count,
+ turn_capped=True,
+ )
# Also check deadline after processing — if the reader thread
# already queued many lines, this prevents unbounded processing
# past the deadline.
if deadline is not None and time.monotonic() > deadline:
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=True,
+ tool_use_count=tool_use_count,
+ turn_capped=turn_capped,
)
def _run_agent_streaming(
cmd: list[str],
- prompt: str,
+ stdin_text: str | None,
timeout: float | None,
log_dir: Path | None,
iteration: int,
on_activity: ActivityCallback | None = None,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool = False,
+ adapter: CLIAdapter | None = None,
+ max_turns: int | None = None,
+ on_tool_use: ToolUseCallback | None = None,
+ env: dict[str, str] | None = None,
) -> AgentResult:
"""Run the agent subprocess with line-by-line streaming of JSON output.
- Used for agents that support ``--output-format stream-json`` (e.g. Claude
- Code). Stream processing is delegated to :func:`_read_agent_stream`;
- this function owns the subprocess lifecycle (spawn, stdin delivery,
- timeout kill, and cleanup via ``try/finally``).
+ Used for adapters whose ``supports_streaming`` flag is True (e.g. Claude
+ Code's ``--output-format stream-json``, Codex's ``--json``). The command
+ list *must already include* any adapter-required flags —
+ :func:`execute_agent` calls ``adapter.build_command`` before dispatching.
+
+ *stdin_text* is the adapter-resolved prompt payload: a string when the
+ agent reads its prompt from stdin (the writer thread delivers it), or
+ ``None`` for arg-delivery agents whose prompt already lives in *cmd*.
+ When ``None``, stdin is wired to ``DEVNULL`` so the child gets immediate
+ EOF and no writer thread runs.
+
+ Stream processing is delegated to :func:`_read_agent_stream`; this
+ function owns the subprocess lifecycle (spawn, stdin delivery, timeout
+ kill, and cleanup via ``try/finally``).
stderr is drained concurrently on a background reader thread so large
stderr volume can't deadlock the child on a full OS pipe buffer while
the main thread is reading stdout.
"""
- stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG]
start = time.monotonic()
deadline = (start + timeout) if timeout is not None else None
+ capture_stdout_text = log_dir is not None or capture_stdout
+ pipe_stderr = log_dir is not None or on_output_line is not None
+ capture_stderr_text = log_dir is not None
+ pipe_stdin = stdin_text is not None
+
writer_thread: threading.Thread | None = None
- stderr_lines: list[str] = []
+ stderr_lines: list[str] | None = [] if capture_stderr_text else None
stderr_thread: threading.Thread | None = None
+ spawn_env = _build_spawn_env(env)
proc = subprocess.Popen(
- stream_cmd,
- stdin=subprocess.PIPE,
+ cmd,
+ stdin=subprocess.PIPE if pipe_stdin else subprocess.DEVNULL,
stdout=subprocess.PIPE,
- stderr=subprocess.PIPE,
+ stderr=subprocess.PIPE if pipe_stderr else None,
+ env=spawn_env,
**SUBPROCESS_TEXT_KWARGS,
**SESSION_KWARGS,
)
try:
# Popen with PIPE guarantees non-None streams; guard explicitly
# so the type checker narrows and -O mode cannot skip the check.
- if proc.stdin is None or proc.stdout is None or proc.stderr is None:
- raise RuntimeError("subprocess.Popen failed to create PIPE streams")
+ if proc.stdout is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stdout")
+ if pipe_stdin and proc.stdin is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stdin")
+ if pipe_stderr and proc.stderr is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stderr")
# Start the stderr pump BEFORE writing stdin so large prompts can't
# deadlock against an agent that writes substantial diagnostics to
# stderr while still reading its stdin.
- stderr_thread = _start_pump_thread(
- proc.stderr, stderr_lines, _STDERR, on_output_line
- )
+ if proc.stderr is not None:
+ stderr_thread = _start_pump_thread(
+ proc.stderr, stderr_lines, _STDERR, on_output_line
+ )
# Deliver the prompt on a background thread so that a blocked write
# (child not reading stdin, pipe buffer full) cannot prevent
# proc.wait / deadline checks from firing. Killing the process
# group unblocks the write with BrokenPipeError, which
- # _deliver_prompt already swallows.
- writer_thread = _start_writer_thread(proc, prompt)
-
- stream = _read_agent_stream(proc.stdout, deadline, on_activity, on_output_line)
+ # _deliver_prompt already swallows. Arg-delivery agents
+ # (stdin_text is None) skip this entirely.
+ if stdin_text is not None:
+ writer_thread = _start_writer_thread(proc, stdin_text)
+
+ stream = _read_agent_stream(
+ proc.stdout,
+ deadline,
+ on_activity,
+ on_output_line,
+ capture_stdout=capture_stdout_text,
+ adapter=adapter,
+ max_turns=max_turns,
+ on_tool_use=on_tool_use,
+ )
- if stream.timed_out:
+ if stream.timed_out or stream.turn_capped:
_kill_process_group(proc)
proc.wait()
finally:
_cleanup_agent(proc, stderr_thread, writer_thread)
- stdout = "".join(stream.stdout_lines)
- stderr = "".join(stderr_lines)
+ stdout = "".join(stream.stdout_lines) if stream.stdout_lines is not None else None
+ stderr = "".join(stderr_lines) if stderr_lines is not None else None
log_file = _write_log(log_dir, iteration, stdout, stderr)
@@ -471,8 +613,10 @@ def _run_agent_streaming(
log_file=log_file,
result_text=stream.result_text,
timed_out=stream.timed_out,
- captured_stdout=stdout if log_dir is not None else None,
- captured_stderr=stderr if log_dir is not None else None,
+ captured_stdout=stdout if capture_stdout_text else None,
+ captured_stderr=stderr if capture_stderr_text else None,
+ tool_use_count=stream.tool_use_count,
+ turn_capped=stream.turn_capped,
)
@@ -502,12 +646,7 @@ def _pump_stream(
for line in iter(stream.readline, ""):
if buffer is not None:
buffer.append(line)
- if on_output_line is not None:
- try:
- on_output_line(line.rstrip("\r\n"), stream_name)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ _call_safely(on_output_line, line.rstrip("\r\n"), stream_name)
except (ValueError, OSError):
# Pipe closed concurrently — exit cleanly so join() returns.
pass
@@ -596,14 +735,25 @@ def _cleanup_agent(
def _run_agent_blocking(
cmd: list[str],
- prompt: str,
+ stdin_text: str | None,
timeout: float | None,
log_dir: Path | None,
iteration: int,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool = False,
+ adapter: CLIAdapter | None = None,
+ max_turns: int | None = None,
+ on_tool_use: ToolUseCallback | None = None,
+ env: dict[str, str] | None = None,
) -> AgentResult:
"""Run the agent subprocess and return the result.
+ *stdin_text* is the adapter-resolved prompt payload: a string when the
+ agent reads its prompt from stdin (the writer thread delivers it), or
+ ``None`` for arg-delivery agents whose prompt already lives in *cmd*.
+ When ``None``, stdin is wired to ``DEVNULL`` and no writer thread runs.
+
Conditionally pipes stdout/stderr based on whether any subscriber
needs the output:
@@ -613,9 +763,9 @@ def _run_agent_blocking(
- **Callback only** (``on_output_line`` set, no log dir) — reader
threads forward lines to the callback without accumulating them,
avoiding unbounded memory growth.
- - **Log capture** (``log_dir`` set) — reader threads accumulate
- lines into lists for log writing; lines are also forwarded to the
- callback if provided.
+ - **Buffered capture** (``log_dir`` or ``capture_stdout`` set) —
+ reader threads accumulate lines for log writing or later completion
+ parsing; lines are also forwarded to the callback if provided.
The subprocess is started in its own process group so that on
``KeyboardInterrupt`` or timeout the entire child tree can be killed
@@ -625,7 +775,23 @@ def _run_agent_blocking(
Raises ``FileNotFoundError`` if the command binary does not exist.
"""
start = time.monotonic()
- capture = log_dir is not None or on_output_line is not None
+ # Blocking-path adapters count tool uses post-hoc by re-scanning the
+ # captured stdout (see _count_tool_uses_post_hoc), so a cap is only
+ # detectable when the bytes are buffered. Force buffering when a cap
+ # is set on a tool-use-counting adapter, otherwise tool_use_count would
+ # always be 0 and turn_capped would never fire.
+ needs_post_hoc_count = (
+ max_turns is not None
+ and adapter is not None
+ and adapter.counts_what == "tool_use"
+ )
+ capture_stdout_text = log_dir is not None or capture_stdout or needs_post_hoc_count
+ capture_stderr_text = log_dir is not None
+ pipe_stdout = (
+ capture_stdout_text or on_output_line is not None or capture_result_text
+ )
+ pipe_stderr = capture_stderr_text or on_output_line is not None
+ pipe_stdin = stdin_text is not None
# When no subscriber needs the bytes, stdout/stderr are left
# un-piped so the child writes directly to the terminal. When
@@ -638,32 +804,49 @@ def _run_agent_blocking(
writer_thread: threading.Thread | None = None
stdout_thread: threading.Thread | None = None
stderr_thread: threading.Thread | None = None
- stdout_lines: list[str] | None = [] if log_dir is not None else None
- stderr_lines: list[str] | None = [] if log_dir is not None else None
+ stdout_lines: list[str] | None = [] if capture_stdout_text else None
+ stderr_lines: list[str] | None = [] if capture_stderr_text else None
+ result_text: str | None = None
+
+ def _on_output_line(line: str, stream_name: OutputStream) -> None:
+ nonlocal result_text
+ if capture_result_text and stream_name == _STDOUT:
+ extracted = _extract_result_text_from_line(line)
+ if extracted is not None:
+ result_text = extracted
+ if on_output_line is not None:
+ on_output_line(line, stream_name)
- pipe = subprocess.PIPE if capture else None
+ spawn_env = _build_spawn_env(env)
proc = subprocess.Popen(
cmd,
- stdin=subprocess.PIPE,
- stdout=pipe,
- stderr=pipe,
+ stdin=subprocess.PIPE if pipe_stdin else subprocess.DEVNULL,
+ stdout=subprocess.PIPE if pipe_stdout else None,
+ stderr=subprocess.PIPE if pipe_stderr else None,
+ env=spawn_env,
**SUBPROCESS_TEXT_KWARGS,
**SESSION_KWARGS,
)
try:
- if proc.stdin is None:
+ if pipe_stdin and proc.stdin is None:
raise RuntimeError("subprocess.Popen failed to create PIPE stdin")
- if capture:
- if proc.stdout is None or proc.stderr is None:
- raise RuntimeError("subprocess.Popen failed to create PIPE streams")
+ if pipe_stdout:
+ if proc.stdout is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stdout")
stdout_thread = _start_pump_thread(
- proc.stdout, stdout_lines, _STDOUT, on_output_line
+ proc.stdout, stdout_lines, _STDOUT, _on_output_line
)
+ if pipe_stderr:
+ if proc.stderr is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stderr")
stderr_thread = _start_pump_thread(
- proc.stderr, stderr_lines, _STDERR, on_output_line
+ proc.stderr, stderr_lines, _STDERR, _on_output_line
)
- writer_thread = _start_writer_thread(proc, prompt)
+ # Arg-delivery agents (stdin_text is None) get DEVNULL stdin and no
+ # writer thread; the prompt already lives in the spawned argv.
+ if stdin_text is not None:
+ writer_thread = _start_writer_thread(proc, stdin_text)
try:
returncode = proc.wait(timeout=timeout)
@@ -677,13 +860,23 @@ def _run_agent_blocking(
stderr = "".join(stderr_lines) if stderr_lines is not None else None
log_file = _write_log(log_dir, iteration, stdout, stderr)
+ tool_use_count, turn_capped = _count_tool_uses_post_hoc(
+ adapter=adapter,
+ stdout_lines=stdout_lines,
+ max_turns=max_turns,
+ on_tool_use=on_tool_use,
+ )
+
return AgentResult(
returncode=None if timed_out else returncode,
elapsed=time.monotonic() - start,
log_file=log_file,
+ result_text=result_text or _extract_result_text_from_lines(stdout_lines),
timed_out=timed_out,
- captured_stdout=stdout,
- captured_stderr=stderr,
+ captured_stdout=stdout if capture_stdout_text else None,
+ captured_stderr=stderr if capture_stderr_text else None,
+ tool_use_count=tool_use_count,
+ turn_capped=turn_capped,
)
@@ -694,35 +887,249 @@ def execute_agent(
timeout: float | None,
log_dir: Path | None,
iteration: int,
+ adapter: CLIAdapter | None = None,
on_activity: ActivityCallback | None = None,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool | None = None,
+ max_turns: int | None = None,
+ max_turns_grace: int = 0,
+ on_tool_use: ToolUseCallback | None = None,
) -> AgentResult:
"""Run the agent subprocess, auto-selecting streaming or blocking mode.
- Uses streaming mode for agents that support ``--output-format stream-json``
- (e.g. Claude Code); all other agents use the blocking path that drains
- stdout and stderr via reader threads. The *on_activity* callback is
- only invoked in streaming mode; *on_output_line* fires for both modes
- as raw lines arrive.
+ The *adapter* argument (or :func:`select_adapter` when omitted) decides
+ which execution path runs: adapters whose ``supports_streaming`` flag is
+ True take the line-streaming path that drives ``on_activity`` callbacks;
+ all others take the blocking path with concurrent stdout/stderr drain.
+ ``adapter.build_command(cmd)`` is applied before spawning, so the CLI
+ receives any flags the adapter requires (e.g. Claude's
+ ``--output-format stream-json --verbose`` or Codex's ``--json``).
+
+ When *max_turns* is set, the streaming path counts adapter-reported
+ tool-use events and terminates the subprocess once the cap is reached.
+ The blocking path cannot preempt but records the post-hoc count.
+ *max_turns_grace* enables a soft wind-down: if the adapter supports it,
+ a per-iteration tempdir is set up with a counter file and environment
+ variables pointing the agent at ``_wind_down_shim``, which warns the
+ agent once the cap is *max_turns_grace* tool uses away.
This is the single entry point the engine should use — callers don't need
to know which execution mode is selected.
"""
- if _supports_stream_json(cmd):
- return _run_agent_streaming(
- cmd,
- prompt,
+ if adapter is None:
+ adapter = select_adapter(cmd)
+ cmd = adapter.build_command(cmd)
+ # Let the adapter decide where the prompt goes: stdin adapters return
+ # the command unchanged with ``stdin_text=prompt``; arg-delivery adapters
+ # (e.g. opencode) append the prompt to argv and return ``stdin_text=None``
+ # so the child is spawned with ``stdin=DEVNULL`` and no writer thread.
+ inv = adapter.deliver_prompt(cmd, prompt)
+ supports_streaming = adapter.supports_streaming
+ if capture_stdout is None:
+ capture_stdout = log_dir is not None or (
+ not supports_streaming and on_output_line is None and capture_result_text
+ )
+
+ wind_down = _setup_wind_down(
+ adapter=adapter,
+ max_turns=max_turns,
+ max_turns_grace=max_turns_grace,
+ log_dir=log_dir,
+ iteration=iteration,
+ )
+ wrapped_on_tool_use = _wrap_tool_use_with_counter(
+ on_tool_use=on_tool_use,
+ counter_path=wind_down.counter_path if wind_down is not None else None,
+ )
+ env = wind_down.env_overrides if wind_down is not None else None
+
+ try:
+ if supports_streaming:
+ return _run_agent_streaming(
+ inv.argv,
+ inv.stdin_text,
+ timeout,
+ log_dir,
+ iteration,
+ on_activity=on_activity,
+ on_output_line=on_output_line,
+ capture_result_text=capture_result_text,
+ capture_stdout=capture_stdout,
+ adapter=adapter,
+ max_turns=max_turns,
+ on_tool_use=wrapped_on_tool_use,
+ env=env,
+ )
+ return _run_agent_blocking(
+ inv.argv,
+ inv.stdin_text,
timeout,
log_dir,
iteration,
- on_activity=on_activity,
on_output_line=on_output_line,
+ capture_result_text=capture_result_text,
+ capture_stdout=capture_stdout,
+ adapter=adapter,
+ max_turns=max_turns,
+ on_tool_use=wrapped_on_tool_use,
+ env=env,
)
- return _run_agent_blocking(
- cmd,
- prompt,
- timeout,
- log_dir,
- iteration,
- on_output_line=on_output_line,
+ finally:
+ if wind_down is not None:
+ wind_down.cleanup()
+
+
+def _build_spawn_env(overrides: dict[str, str] | None) -> dict[str, str] | None:
+ """Compose a spawn environment merging *overrides* onto ``os.environ``.
+
+ Returns ``None`` when no overrides are requested so ``Popen`` inherits
+ the parent's environment directly (the common case).
+ """
+ if not overrides:
+ return None
+ merged = os.environ.copy()
+ merged.update(overrides)
+ return merged
+
+
+def _setup_wind_down(
+ *,
+ adapter: CLIAdapter,
+ max_turns: int | None,
+ max_turns_grace: int,
+ log_dir: Path | None,
+ iteration: int,
+) -> _WindDownContext | None:
+ """Prepare the per-iteration tempdir + counter for soft wind-down.
+
+ Returns ``None`` — skipping the hook wiring entirely — when any
+ precondition is not met:
+
+ - *max_turns* is unset (no cap = nothing to wind down toward).
+ - *max_turns_grace* is ``0`` or negative (user opted out of the warning window).
+ - The adapter's ``supports_soft_wind_down`` flag is ``False``.
+ - The adapter's ``install_wind_down_hook`` raises
+ ``NotImplementedError`` (e.g. Copilot has no hook system today).
+
+ The tempdir is created with a ``ralphify-`` prefix so stale dirs are
+ easy to spot in ``$TMPDIR`` after a crash. The counter file lives in
+ *log_dir* when logging is enabled (so it's co-located with the
+ iteration log) and inside the tempdir otherwise. Either way the
+ caller must invoke :meth:`_WindDownContext.cleanup` in its ``finally``.
+ """
+ if max_turns is None or max_turns_grace <= 0:
+ return None
+ if not adapter.supports_soft_wind_down:
+ return None
+ # Clamp the grace below the cap. A grace >= max_turns is reachable via
+ # the library API (RunConfig does not validate it the way the CLI does);
+ # left unclamped the shim's threshold (max(cap - grace, 0)) collapses to
+ # 0 and injects the wind-down nudge on the very first tool use.
+ effective_grace = min(max_turns_grace, max(max_turns - 1, 0))
+ tempdir = Path(tempfile.mkdtemp(prefix=_HOOK_TEMPDIR_PREFIX))
+ counter_path = _resolve_counter_path(log_dir, iteration, tempdir)
+ _atomic_write_counter(counter_path, 0)
+ try:
+ env_overrides = adapter.install_wind_down_hook(
+ tempdir=tempdir,
+ counter_path=counter_path,
+ cap=max_turns,
+ grace=effective_grace,
+ )
+ except NotImplementedError:
+ # counter_path may live in log_dir rather than the tempdir, so
+ # removing the tempdir alone would orphan the "0" counter file in
+ # the log directory. Remove it explicitly, best-effort.
+ with suppress(OSError):
+ counter_path.unlink(missing_ok=True)
+ shutil.rmtree(tempdir, ignore_errors=True)
+ return None
+ return _WindDownContext(
+ tempdir=tempdir,
+ counter_path=counter_path,
+ env_overrides=env_overrides,
)
+
+
+def _resolve_counter_path(
+ log_dir: Path | None,
+ iteration: int,
+ tempdir: Path,
+) -> Path:
+ """Pick the counter file location: co-located with the log or inside tempdir."""
+ if log_dir is not None:
+ filename = f"{iteration:0{_COUNTER_PAD_WIDTH}d}{_COUNTER_LOG_SUFFIX}"
+ return log_dir / filename
+ return tempdir / _COUNTER_FILENAME
+
+
+def _atomic_write_counter(counter_path: Path, value: int) -> None:
+ """Write *value* to *counter_path* via rename so readers never see a partial int.
+
+ Best-effort: on any I/O failure the caller proceeds without wind-down
+ state — the warning subscription is advisory, not required. The first
+ failure is logged once at WARNING so a silently-disabled wind-down is
+ greppable; subsequent failures stay quiet to avoid per-iteration spam.
+ """
+ global _counter_write_failure_logged
+ try:
+ tmp_path = counter_path.with_name(counter_path.name + ".tmp")
+ tmp_path.write_text(str(value), encoding="utf-8")
+ os.replace(tmp_path, counter_path)
+ except OSError as exc:
+ if not _counter_write_failure_logged:
+ _counter_write_failure_logged = True
+ _log.warning(
+ "soft wind-down counter write to %s failed (%s); "
+ "wind-down will not fire this run",
+ counter_path,
+ exc,
+ )
+
+
+def _wrap_tool_use_with_counter(
+ on_tool_use: ToolUseCallback | None,
+ counter_path: Path | None,
+) -> ToolUseCallback | None:
+ """Return a callback that atomically updates *counter_path* on each tool-use.
+
+ Returns *on_tool_use* unchanged when no counter file is in play, so the
+ streaming path only pays for the disk write when wind-down is active.
+ """
+ if counter_path is None:
+ return on_tool_use
+
+ def _wrapped(name: str, count: int) -> None:
+ _atomic_write_counter(counter_path, count)
+ _call_safely(on_tool_use, name, count)
+
+ return _wrapped
+
+
+def _count_tool_uses_post_hoc(
+ *,
+ adapter: CLIAdapter | None,
+ stdout_lines: list[str] | None,
+ max_turns: int | None,
+ on_tool_use: ToolUseCallback | None,
+) -> tuple[int, bool]:
+ """Re-scan captured stdout for tool-use events when the blocking path runs.
+
+ Blocking-path adapters (no structured event stream) cannot preempt the
+ subprocess, so the cap is reported — not enforced — by scanning the
+ accumulated stdout after the child exits. ``turn_capped`` is ``True``
+ when the post-hoc count reached *max_turns*, so the engine can emit the
+ same ``ITERATION_TURN_CAPPED`` event either way.
+ """
+ if adapter is None or adapter.counts_what != "tool_use" or not stdout_lines:
+ return 0, False
+ count = 0
+ for line in stdout_lines:
+ event = adapter.parse_event(line)
+ if event is None or event.kind != "tool_use":
+ continue
+ count += 1
+ _call_safely(on_tool_use, event.name or "", count)
+ turn_capped = max_turns is not None and count >= max_turns
+ return count, turn_capped
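
A minimal sketch of two pieces of the wind-down setup in the patch above, using only the standard library: the write-then-rename counter update and the grace clamp. The names `write_counter` and `clamp_grace` are illustrative and not part of ralphify; the real helpers also log the first write failure and pick the counter location from `log_dir`.

```python
import os
import tempfile
from pathlib import Path


def write_counter(counter_path: Path, value: int) -> None:
    """Write *value* to a sibling temp file, then rename it over the target.

    The rename is a single step on POSIX when both paths share a filesystem,
    so a concurrent reader sees the old or the new integer, not a partial one.
    """
    tmp_path = counter_path.with_name(counter_path.name + ".tmp")
    tmp_path.write_text(str(value), encoding="utf-8")
    os.replace(tmp_path, counter_path)


def clamp_grace(max_turns: int, max_turns_grace: int) -> int:
    """Keep the grace window strictly below the cap (same expression as the patch)."""
    return min(max_turns_grace, max(max_turns - 1, 0))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "tool_use_count"  # hypothetical file name
        for count in range(1, 4):
            write_counter(path, count)
        print(path.read_text(encoding="utf-8"))
    print(clamp_grace(10, 3), clamp_grace(5, 9))
```

With `max_turns=5` and a requested grace of 9, the clamp yields 4, so a threshold computed as `cap - grace` stays at 1 instead of collapsing to 0.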
diff --git a/src/ralphify/_console_emitter.py b/src/ralphify/_console_emitter.py
index c58d0224..11069dff 100644
--- a/src/ralphify/_console_emitter.py
+++ b/src/ralphify/_console_emitter.py
@@ -34,6 +34,7 @@
AgentOutputLineData,
CommandsCompletedData,
Event,
+ EventData,
EventType,
IterationEndedData,
IterationStartedData,
@@ -42,8 +43,8 @@
RunStoppedData,
)
from ralphify import _brand
-from ralphify._agent import CLAUDE_BINARY
from ralphify._output import format_count, format_duration
+from ralphify.adapters import select_adapter
_ICON_SUCCESS = "✓"
_ICON_FAILURE = "✗"
@@ -106,18 +107,22 @@
f"shift+{PEEK_TOGGLE_KEY} for full view[/]"
)
-# ── Claude binary detection ───────────────────────────────────────────
+# ── Adapter-driven structured-output detection ────────────────────────
-def _is_claude_command(agent: str) -> bool:
- """Return True if *agent* is a Claude Code command."""
+def _agent_renders_structured_peek(agent: str) -> bool:
+ """Return True if *agent*'s adapter feeds the structured peek panel.
+
+ Drives the ``ConsoleEmitter`` choice between :class:`_IterationPanel`
+ (structured) and :class:`_IterationSpinner` (raw). Delegates to
+ :func:`select_adapter` so adding a new CLI with peek-panel support
+ requires no edits here.
+ """
try:
- parts = shlex.split(agent)
+ cmd = shlex.split(agent)
except ValueError:
return False
- if not parts:
- return False
- return Path(parts[0]).stem == CLAUDE_BINARY
+ return select_adapter(cmd).renders_structured_peek
# ── Tool argument abbreviation ────────────────────────────────────────
@@ -169,7 +174,7 @@ def _format_params(tool_input: dict[str, Any], keys: list[str]) -> str:
val = tool_input.get(key)
if val is not None:
parts.append(f"{key}: {val}")
- return " · ".join(parts) if parts else ""
+ return " · ".join(parts)
def _extract_file_path(i: dict[str, Any]) -> str:
@@ -353,6 +358,11 @@ def freeze(self, outcome: str) -> None:
self._end = time.monotonic()
self._outcome = outcome
+ @property
+ def outcome(self) -> str | None:
+ """The frozen-iteration outcome label, or ``None`` while live."""
+ return self._outcome
+
# ── Scroll buffer management ─────────────────────────────────────
def add_scroll_line(self, markup: str) -> None:
@@ -407,8 +417,7 @@ def _build_body(self) -> Group:
"""Body group: scroll lines (or peek message) + spacer + footer."""
rows: list[Any] = []
if self._peek_visible:
- visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]
- for line in visible:
+ for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:
line.no_wrap = True
line.overflow = "ellipsis"
rows.append(line)
@@ -483,10 +492,8 @@ def apply(self, raw: dict[str, Any]) -> None:
)
def _apply_assistant(self, raw: dict[str, Any]) -> None:
- msg = raw.get("message", {})
-
# Update token counts from usage
- usage = msg.get("usage")
+ usage = raw.get("message", {}).get("usage")
if isinstance(usage, dict):
self._input_tokens = usage.get("input_tokens", self._input_tokens)
self._output_tokens = usage.get("output_tokens", self._output_tokens)
@@ -522,14 +529,14 @@ def _apply_assistant(self, raw: dict[str, Any]) -> None:
color, cat, arg = _tool_display(name, tool_input)
self._tool_categories[cat] = self._tool_categories.get(cat, 0) + 1
- # Pad short names to a fixed column so arguments line up;
- # longer names get a guaranteed two-space gap so the arg
- # never collides with the tool label.
- if len(name) < _TOOL_NAME_COL:
- name_col = f"{name:<{_TOOL_NAME_COL}}"
- else:
- name_col = f"{name} "
if arg:
+ # Pad short names to a fixed column so arguments line up;
+ # longer names get a guaranteed two-space gap so the arg
+ # never collides with the tool label.
+ if len(name) < _TOOL_NAME_COL:
+ name_col = f"{name:<{_TOOL_NAME_COL}}"
+ else:
+ name_col = f"{name} "
self.add_scroll_line(
f"[bold {color}]{escape_markup(name_col)}[/]"
f"[dim]{escape_markup(arg)}[/]"
@@ -550,16 +557,13 @@ def _apply_user(self, raw: dict[str, Any]) -> None:
def _format_tokens(self) -> str:
"""Format token counts as compact ctx/out string."""
parts: list[str] = []
- total_in = self._input_tokens
- if total_in > 0:
- parts.append(f"ctx {format_count(total_in)}")
+ if self._input_tokens > 0:
+ parts.append(f"ctx {format_count(self._input_tokens)}")
if self._output_tokens > 0:
parts.append(f"out {format_count(self._output_tokens)}")
return " · ".join(parts)
def _format_categories(self) -> str:
- if not self._tool_categories:
- return ""
parts = [f"{v} {k}" for k, v in self._tool_categories.items()]
return " · ".join(parts)
@@ -611,11 +615,10 @@ class _IterationSpinner(_LivePanelBase):
"""
def _build_footer(self) -> Table:
- line_count = len(self._scroll_lines)
summary = Text(no_wrap=True, overflow="ellipsis")
- if line_count > 0:
+ if self._scroll_lines:
summary.append(
- _plural(line_count, "line"),
+ _plural(len(self._scroll_lines), "line"),
style=f"bold {_brand.PURPLE}",
)
summary.append(" of agent output", style="dim")
@@ -632,6 +635,10 @@ def _build_footer(self) -> Table:
_FULLSCREEN_CHROME_ROWS = 2
_FULLSCREEN_MIN_VISIBLE = 3
+# Fallback terminal height used before the first render populates the
+# real value, and when ``Console.size.height`` access fails.
+_DEFAULT_CONSOLE_HEIGHT = 40
+
@dataclass(slots=True)
class _ScrollbarMetrics:
@@ -661,8 +668,8 @@ def _scrollbar_metrics(total: int, visible: int, offset: int) -> _ScrollbarMetri
if total <= visible:
return _ScrollbarMetrics(show=False, thumb_start=0, thumb_size=0)
thumb_size = max(1, visible * visible // total)
- max_off = max(total - visible, 1)
- frac = 1.0 - (offset / max_off)
+ # Safe: the early return above guarantees total > visible, so total - visible ≥ 1.
+ frac = 1.0 - (offset / (total - visible))
track_space = visible - thumb_size
thumb_start = int(frac * track_space)
return _ScrollbarMetrics(show=True, thumb_start=thumb_start, thumb_size=thumb_size)
@@ -763,9 +770,8 @@ def scroll_up(self, lines: int = 1) -> None:
def scroll_down(self, lines: int = 1) -> None:
"""Scroll toward newer lines (offset shrinks)."""
- new_offset = max(0, self._offset - lines)
- self._offset = new_offset
- if new_offset == 0:
+ self._offset = max(0, self._offset - lines)
+ if self._offset == 0:
self._auto_scroll = True
def scroll_to_top(self) -> None:
@@ -774,54 +780,47 @@ def scroll_to_top(self) -> None:
self._auto_scroll = False
def scroll_to_bottom(self) -> None:
+ """Snap to the newest line and re-enable follow mode."""
self._offset = 0
self._auto_scroll = True
# ── Iteration navigation ─────────────────────────────────────────
- def _reset_view(self) -> None:
- """Snap to bottom + follow when switching iterations."""
- self._offset = 0
- self._auto_scroll = True
+ def _step_iteration(self, direction: int) -> bool:
+ """Move *direction* iterations (-1 = prev, +1 = next).
- def prev_iteration(self) -> bool:
- """Move to the iteration before the current one. Returns ``True``
- if the view changed; ``False`` when there is no older iteration."""
+ Returns ``True`` when the view changed; ``False`` when already at
+ the boundary in the requested direction. When the current
+ iteration was evicted from the navigator, snaps to the oldest
+ (prev) or newest (next) entry instead of failing.
+ """
ids = self._navigator.iteration_ids()
if not ids:
return False
if self._iteration_id not in ids:
- # Current iteration was evicted — snap to oldest available.
- self._iteration_id = ids[0]
- self._reset_view()
+ self._iteration_id = ids[0] if direction < 0 else ids[-1]
+ self.scroll_to_bottom()
return True
- idx = ids.index(self._iteration_id)
- if idx == 0:
+ new_idx = ids.index(self._iteration_id) + direction
+ if not 0 <= new_idx < len(ids):
return False
- self._iteration_id = ids[idx - 1]
- self._reset_view()
+ self._iteration_id = ids[new_idx]
+ self.scroll_to_bottom()
return True
+ def prev_iteration(self) -> bool:
+ """Move to the iteration before the current one. Returns ``True``
+ if the view changed; ``False`` when there is no older iteration."""
+ return self._step_iteration(-1)
+
def next_iteration(self) -> bool:
"""Move to the iteration after the current one. Returns ``True``
if the view changed; ``False`` when already on the newest."""
- ids = self._navigator.iteration_ids()
- if not ids:
- return False
- if self._iteration_id not in ids:
- self._iteration_id = ids[-1]
- self._reset_view()
- return True
- idx = ids.index(self._iteration_id)
- if idx >= len(ids) - 1:
- return False
- self._iteration_id = ids[idx + 1]
- self._reset_view()
- return True
+ return self._step_iteration(+1)
# ── Rendering ────────────────────────────────────────────────────
- _console_height: int = 40 # updated on every render
+ _console_height: int = _DEFAULT_CONSOLE_HEIGHT # updated on every render
def _build_header(self, total: int, visible: int) -> Text:
header = Text(no_wrap=True, overflow="ellipsis")
@@ -840,12 +839,12 @@ def _build_header(self, total: int, visible: int) -> Text:
header.append("live", style=f"italic {_brand.GREEN}")
else:
source = self._source
- outcome = source._outcome if source is not None else None
+ outcome = source.outcome if source is not None else None
if outcome:
header.append(" · ", style="dim")
header.append(outcome, style=f"italic {_brand.LAVENDER}")
header.append(" · ", style="dim")
- header.append(f"{_plural(total, 'line')}", style="dim")
+ header.append(_plural(total, "line"), style="dim")
if self._auto_scroll:
header.append(" · ", style="dim")
header.append("following", style=f"italic {_brand.GREEN}")
@@ -974,10 +973,10 @@ def __init__(self, console: Console) -> None:
# receiving events). ``None`` between iterations.
self._current_iteration: int | None = None
# Bounded ring buffer of finished iteration panels, keyed by
- # iteration number. Insertion order is tracked separately so
- # eviction is O(1). Used by fullscreen peek for browsing.
+ # iteration number. Python dicts preserve insertion order, so
+ # the oldest entry is always first, which eviction relies on.
+ # The buffer backs fullscreen peek browsing.
self._iteration_history: dict[int, _LivePanelBase] = {}
- self._iteration_order: list[int] = []
# Fullscreen peek state — a second Live using Rich's alt-screen
# that shows an iteration's full activity buffer with scroll +
# iteration-navigation controls. While fullscreen is active the
@@ -1044,10 +1043,7 @@ def iteration_ids(self) -> list[int]:
def panel_for(self, iteration_id: int) -> _LivePanelBase | None:
"""Look up the panel for *iteration_id* in history or active state."""
- if (
- self._current_iteration == iteration_id
- and self._active_renderable is not None
- ):
+ if self.is_live(iteration_id):
return self._active_renderable
return self._iteration_history.get(iteration_id)
@@ -1109,28 +1105,27 @@ def _archive_current_iteration_unlocked(self, outcome: str) -> None:
iteration_id = self._current_iteration
panel.freeze(outcome)
# Record (or refresh order of) the iteration in history.
- if iteration_id in self._iteration_history:
- self._iteration_order.remove(iteration_id)
+ # Pop-then-insert moves an existing entry to the end of the dict's
+ # insertion order so eviction always drops the oldest first.
+ self._iteration_history.pop(iteration_id, None)
self._iteration_history[iteration_id] = panel
- self._iteration_order.append(iteration_id)
# Eviction: drop oldest until at or below the cap, but skip the
# iteration the user is currently viewing in fullscreen.
viewing = (
- self._fullscreen_view._iteration_id
+ self._fullscreen_view.iteration_id
if self._fullscreen_view is not None
else None
)
- while len(self._iteration_order) > _MAX_HISTORY_ITERATIONS:
+ while len(self._iteration_history) > _MAX_HISTORY_ITERATIONS:
candidate = next(
- (iid for iid in self._iteration_order if iid != viewing),
+ (iid for iid in self._iteration_history if iid != viewing),
None,
)
if candidate is None:
# All remaining entries are the viewed iteration (impossible
# with one viewer) — bail to avoid an infinite loop.
break
- self._iteration_order.remove(candidate)
- self._iteration_history.pop(candidate, None)
+ self._iteration_history.pop(candidate)
self._active_renderable = None
self._current_iteration = None
@@ -1203,14 +1198,18 @@ def _panel_for_event(self, iteration: int | None) -> _LivePanelBase | None:
return self._active_renderable
def _on_agent_output_line(self, data: AgentOutputLineData) -> None:
+ # When we have structured rendering, raw lines are redundant noise.
+ # ``_structured_agent`` is write-once (set in ``_on_run_started``
+ # before any iteration events flow), so the check is lock-free —
+ # same pattern as ``_on_agent_activity``. Skipping the lock also
+ # avoids contention on every stdout line when running Claude.
+ if self._structured_agent:
+ return
with self._console_lock:
- # When we have structured rendering, raw lines are redundant noise.
- if self._structured_agent:
- return
- line = escape_markup(data["line"])
target = self._panel_for_event(data["iteration"])
if not isinstance(target, _IterationSpinner):
return
+ line = escape_markup(data["line"])
target.add_scroll_line(f"[white]{line}[/]")
self._refresh_live_unlocked(target)
@@ -1240,7 +1239,7 @@ def _on_agent_activity(self, data: AgentActivityData) -> None:
"[dim]peek: live activity unavailable (continuing)[/]"
)
- def emit(self, event: Event) -> None:
+ def emit(self, event: Event[EventData]) -> None:
handler = self._handlers.get(event.type)
if handler is not None:
handler(event.data)
@@ -1248,7 +1247,7 @@ def emit(self, event: Event) -> None:
def _on_run_started(self, data: RunStartedData) -> None:
ralph_name = data["ralph_name"]
agent = data["agent"]
- self._structured_agent = _is_claude_command(agent)
+ self._structured_agent = _agent_renders_structured_peek(agent)
with self._console_lock:
self._console.print(
f"\n[bold {_brand.PURPLE}]{_ICON_PLAY} Running:[/] [bold]{escape_markup(ralph_name)}[/]"
@@ -1296,6 +1295,15 @@ def _start_compact_live_unlocked(self, renderable: _LivePanelBase) -> None:
)
self._live.start()
+ def _stop_compact_live_unlocked(self) -> None:
+ """Stop the compact Live region if active. No-op otherwise.
+
+ Caller must hold ``_console_lock``.
+ """
+ if self._live is not None:
+ self._live.stop()
+ self._live = None
+
def _stop_live_unlocked(self) -> None:
"""Tear down all Live regions and forget the active iteration.
@@ -1309,9 +1317,7 @@ def _stop_live_unlocked(self) -> None:
self._fullscreen_live.stop()
self._fullscreen_live = None
self._fullscreen_view = None
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._active_renderable = None
self._current_iteration = None
@@ -1336,8 +1342,8 @@ def enter_fullscreen(self) -> bool:
if self._fullscreen_view is not None:
return True # already active — no-op
initial_id: int | None = self._current_iteration
- if initial_id is None and self._iteration_order:
- initial_id = self._iteration_order[-1]
+ if initial_id is None:
+ initial_id = next(reversed(self._iteration_history), None)
if initial_id is None or self.panel_for(initial_id) is None:
self._console.print("[dim]Full peek: no iterations yet[/]")
return False
@@ -1345,9 +1351,7 @@ def enter_fullscreen(self) -> bool:
self._fullscreen_view = view
# Stop the compact Live before taking over the terminal so
# the two Rich renderers don't fight for the same console.
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._fullscreen_live = Live(
view,
console=self._console,
@@ -1418,7 +1422,7 @@ def _fullscreen_page_size(self) -> int:
try:
height = self._console.size.height
except Exception:
- height = 40
+ height = _DEFAULT_CONSOLE_HEIGHT
return max(1, height - _FULLSCREEN_CHROME_ROWS - 2)
def handle_key(self, key: str) -> None:
@@ -1447,12 +1451,11 @@ def _handle_fullscreen_key(self, key: str) -> None:
if view is None:
return # raced with exit
if key not in ("q", FULLSCREEN_PEEK_KEY):
- page = self._fullscreen_page_size()
actions: dict[str, Callable[[], object]] = {
"j": lambda: view.scroll_down(1),
"k": lambda: view.scroll_up(1),
- " ": lambda: view.scroll_down(page),
- "b": lambda: view.scroll_up(page),
+ " ": lambda: view.scroll_down(self._fullscreen_page_size()),
+ "b": lambda: view.scroll_up(self._fullscreen_page_size()),
"g": view.scroll_to_top,
"G": view.scroll_to_bottom,
PREV_ITERATION_KEY: view.prev_iteration,
@@ -1472,9 +1475,9 @@ def _on_iteration_started(self, data: IterationStartedData) -> None:
with self._console_lock:
self._peek_broken = False
# Defensive: if a previous iteration didn't archive (engine
- # error), evict it now so we don't leak panel state.
- if self._active_renderable is not None:
- self._archive_current_iteration_unlocked("interrupted")
+ # error), archive it now so we don't leak panel state. The
+ # archive call no-ops when nothing is active.
+ self._archive_current_iteration_unlocked("interrupted")
self._current_iteration = iteration
renderable = self._create_panel_unlocked()
@@ -1524,9 +1527,7 @@ def _on_iteration_ended(
# underlying panel is preserved in history for fullscreen
# browsing. When fullscreen is active there is no compact
# Live to stop; the panel was buffering events directly.
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._archive_current_iteration_unlocked(outcome)
def do_print() -> None:
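
The history change above drops the separate order list and relies on dict insertion order. A rough sketch of that pattern, with plain strings standing in for panels; `archive` and `newest` are hypothetical names for illustration, not the emitter's methods:

```python
from typing import Optional


def archive(
    history: dict,
    iteration_id: int,
    panel: str,
    cap: int,
    viewing: Optional[int] = None,
) -> None:
    """Record *panel* as the newest entry, then evict oldest-first down to *cap*."""
    # Pop-then-insert moves an existing key to the end of insertion order.
    history.pop(iteration_id, None)
    history[iteration_id] = panel
    while len(history) > cap:
        # Oldest key that is not the one currently being viewed.
        candidate = next((iid for iid in history if iid != viewing), None)
        if candidate is None:
            break  # only the viewed entry remains; avoid looping forever
        history.pop(candidate)


def newest(history: dict) -> Optional[int]:
    """Return the most recently archived key, or None when empty."""
    return next(reversed(history), None)


if __name__ == "__main__":
    history: dict = {}
    for i in (1, 2, 3, 4):
        archive(history, i, f"panel-{i}", cap=3, viewing=1)
    print(list(history), newest(history))
```

Because the viewed iteration is skipped during eviction, it can outlive newer entries; that is the trade-off the patch accepts to keep the fullscreen view stable.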
diff --git a/src/ralphify/_events.py b/src/ralphify/_events.py
index 2c80a53a..8bf912ec 100644
--- a/src/ralphify/_events.py
+++ b/src/ralphify/_events.py
@@ -12,11 +12,12 @@
from enum import Enum
from typing import (
Any,
+ Generic,
Literal,
NotRequired,
Protocol,
TypedDict,
- cast,
+ TypeVar,
runtime_checkable,
)
@@ -82,6 +83,11 @@ class EventType(Enum):
# ── Agent activity (live streaming) ─────────────────────────
AGENT_ACTIVITY = "agent_activity"
AGENT_OUTPUT_LINE = "agent_output_line"
+ TOOL_USE = "tool_use"
+
+ # ── Turn-cap enforcement ────────────────────────────────────
+ ITERATION_TURN_APPROACHING_LIMIT = "iteration_turn_approaching_limit"
+ ITERATION_TURN_CAPPED = "iteration_turn_capped"
# ── Other ───────────────────────────────────────────────────
LOG_MESSAGE = "log_message"
@@ -153,14 +159,36 @@ class AgentOutputLineData(TypedDict):
iteration: int
+class ToolUseData(TypedDict):
+ iteration: int
+ tool_name: str
+ count: int
+
+
+class TurnApproachingLimitData(TypedDict):
+ iteration: int
+ count: int
+ max_turns: int
+
+
+class TurnCappedData(TypedDict):
+ iteration: int
+ count: int
+
+
class LogMessageData(TypedDict):
message: str
level: LogLevel
traceback: NotRequired[str]
+class NoData(TypedDict):
+ """Empty payload for events that carry no data (e.g. ``RUN_PAUSED``)."""
+
+
EventData = (
- RunStartedData
+ NoData
+ | RunStartedData
| RunStoppedData
| IterationStartedData
| IterationEndedData
@@ -169,18 +197,34 @@ class LogMessageData(TypedDict):
| PromptAssembledData
| AgentActivityData
| AgentOutputLineData
+ | ToolUseData
+ | TurnApproachingLimitData
+ | TurnCappedData
| LogMessageData
)
"""Union of all typed event data payloads."""
+# Plain TypeVar (no PEP 696 default) — the Python floor is 3.11; the
+# ``default=`` arg needs 3.13+ or a runtime typing_extensions dep, which
+# would fight the pyyaml-only core. Bare ``Event`` references resolve
+# to the EventData bound.
+DataT = TypeVar("DataT", bound="EventData")
+
+
@dataclass(slots=True)
-class Event:
- """A structured event emitted by the run loop."""
+class Event(Generic[DataT]):
+ """A structured event emitted by the run loop.
+
+ Generic over its payload so embedders can annotate handlers with the
+ concrete data type (``def on(e: Event[IterationEndedData])``) without
+ casting at every access site. ``TypedDict``s are plain dicts at
+ runtime, so :meth:`to_dict` is unaffected.
+ """
type: EventType
run_id: str
- data: dict[str, Any] = field(default_factory=dict)
+ data: DataT = field(default_factory=dict) # empty dict for no-payload events
timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
def to_dict(self) -> dict[str, Any]:
@@ -197,7 +241,7 @@ def to_dict(self) -> dict[str, Any]:
class EventEmitter(Protocol):
"""Protocol for objects that receive run-loop events."""
- def emit(self, event: Event) -> None: ...
+ def emit(self, event: Event[EventData]) -> None: ...
def wants_agent_output_lines(self) -> bool:
"""Return True if this emitter will render AGENT_OUTPUT_LINE events.
@@ -212,7 +256,7 @@ def wants_agent_output_lines(self) -> bool:
class NullEmitter:
"""Discards all events silently."""
- def emit(self, event: Event) -> None:
+ def emit(self, event: Event[EventData]) -> None:
pass
def wants_agent_output_lines(self) -> bool:
@@ -222,10 +266,10 @@ def wants_agent_output_lines(self) -> bool:
class QueueEmitter:
"""Pushes events into a :class:`queue.Queue` for async consumption."""
- def __init__(self, q: queue.Queue[Event] | None = None) -> None:
- self.queue: queue.Queue[Event] = q or queue.Queue()
+ def __init__(self, q: queue.Queue[Event[EventData]] | None = None) -> None:
+ self.queue: queue.Queue[Event[EventData]] = q or queue.Queue()
- def emit(self, event: Event) -> None:
+ def emit(self, event: Event[EventData]) -> None:
self.queue.put(event)
def wants_agent_output_lines(self) -> bool:
@@ -238,7 +282,7 @@ class FanoutEmitter:
def __init__(self, emitters: list[EventEmitter]) -> None:
self._emitters = emitters
- def emit(self, event: Event) -> None:
+ def emit(self, event: Event[EventData]) -> None:
for e in self._emitters:
e.emit(event)
@@ -267,13 +311,12 @@ def __call__(
event_type: EventType,
data: EventData | None = None,
) -> None:
- self._emitter.emit(
- Event(
- type=event_type,
- run_id=self._run_id,
- data=cast(dict[str, Any], data) if data is not None else {},
- )
+ event: Event[EventData] = Event(
+ type=event_type,
+ run_id=self._run_id,
+ data=data if data is not None else NoData(),
)
+ self._emitter.emit(event)
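A minimal sketch of how an embedder might benefit from the payload-generic event type, assuming a reduced two-field stand-in. `SketchEvent`, `describe_tool_use`, and the unbounded `DataT` are illustrative names, not part of the patch; only `ToolUseData` and `NoData` mirror shapes shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Generic, TypedDict, TypeVar


class ToolUseData(TypedDict):
    iteration: int
    tool_name: str
    count: int


class NoData(TypedDict):
    """Empty payload for events that carry no data."""


# Unbounded here for brevity; the patch bounds its TypeVar to EventData.
DataT = TypeVar("DataT")


@dataclass
class SketchEvent(Generic[DataT]):
    """Hypothetical stand-in for Event, reduced to two fields."""

    type: str
    # TypedDicts are plain dicts at runtime, so an empty dict works as NoData.
    data: DataT = field(default_factory=dict)


def describe_tool_use(event: SketchEvent[ToolUseData]) -> str:
    # No cast needed: a type checker knows which keys event.data carries.
    data = event.data
    return f"iteration {data['iteration']}: {data['tool_name']} (#{data['count']})"


tool_event: SketchEvent[ToolUseData] = SketchEvent(
    type="tool_use",
    data=ToolUseData(iteration=1, tool_name="Bash", count=3),
)
paused_event: SketchEvent[NoData] = SketchEvent(type="run_paused")
```

The handler annotation carries the payload type, so key access is checked statically while the runtime object stays an ordinary dict.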
def log_info(self, message: str) -> None:
"""Emit a ``LOG_MESSAGE`` event at info level."""
diff --git a/src/ralphify/_frontmatter.py b/src/ralphify/_frontmatter.py
index a361c241..2a8337db 100644
--- a/src/ralphify/_frontmatter.py
+++ b/src/ralphify/_frontmatter.py
@@ -26,12 +26,27 @@
FIELD_ARGS = "args"
FIELD_CREDIT = "credit"
FIELD_RALPH = "ralph"
+# Promise config keeps the legacy key names. ``completion_signal`` stores the
+# inner promise text, not the surrounding ``<promise>...</promise>`` markup.
+FIELD_COMPLETION_SIGNAL = "completion_signal"
+FIELD_STOP_ON_COMPLETION_SIGNAL = "stop_on_completion_signal"
+
+# Per-iteration turn-cap configuration — see docs/specs/cli-adapter-layer.md.
+FIELD_MAX_TURNS = "max_turns"
+FIELD_MAX_TURNS_GRACE = "max_turns_grace"
+
+# User-subscribable lifecycle hooks (list of {event, run} mappings).
+FIELD_HOOKS = "hooks"
# Sub-field names within each command mapping.
CMD_FIELD_NAME = "name"
CMD_FIELD_RUN = "run"
CMD_FIELD_TIMEOUT = "timeout"
+# Sub-field names within each hook mapping.
+HOOK_FIELD_EVENT = "event"
+HOOK_FIELD_RUN = "run"
+
# YAML frontmatter delimiter line.
_FRONTMATTER_DELIMITER = "---"
@@ -108,8 +123,7 @@ def parse_frontmatter(text: str) -> tuple[dict[str, Any], str]:
Returns ``(frontmatter_dict, body_text)``.
"""
- if text.startswith(_UTF8_BOM):
- text = text.removeprefix(_UTF8_BOM)
+ text = text.removeprefix(_UTF8_BOM)
fm_raw, body = _extract_frontmatter_block(text)
if fm_raw:
try:
diff --git a/src/ralphify/_output.py b/src/ralphify/_output.py
index 766dfc33..9da59e6b 100644
--- a/src/ralphify/_output.py
+++ b/src/ralphify/_output.py
@@ -71,10 +71,9 @@ def collect_output(
parts: list[str] = []
for stream in (stdout, stderr):
if stream:
- text = ensure_str(stream)
if parts and not parts[-1].endswith("\n"):
parts.append("\n")
- parts.append(text)
+ parts.append(ensure_str(stream))
return "".join(parts)
@@ -130,8 +129,8 @@ def format_duration(seconds: float) -> str:
# latter silently drops 0.5s when the total is even (e.g. 90.5→90).
total = int(seconds + 0.5)
minutes = total // _SECONDS_PER_MINUTE
- secs = total % _SECONDS_PER_MINUTE
if minutes < _MINUTES_PER_HOUR:
+ secs = total % _SECONDS_PER_MINUTE
return f"{minutes}m {secs}s"
hours = minutes // _MINUTES_PER_HOUR
mins = minutes % _MINUTES_PER_HOUR
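The rounding comment in `format_duration` is easy to check in isolation. The sketch below assumes non-negative input and covers only the sub-hour branch; `round_half_up` and `format_minutes` are hypothetical helper names, not functions from the patch.

```python
SECONDS_PER_MINUTE = 60


def round_half_up(seconds: float) -> int:
    """Round to the nearest whole second, with .5 always rounding up."""
    return int(seconds + 0.5)


def format_minutes(seconds: float) -> str:
    """Hypothetical helper covering only the sub-hour branch."""
    total = round_half_up(seconds)
    minutes, secs = divmod(total, SECONDS_PER_MINUTE)
    return f"{minutes}m {secs}s"


# Built-in round() uses banker's rounding, so ties go to the even neighbour.
builtin_result = round(90.5)
half_up_result = round_half_up(90.5)
```

Here `builtin_result` is 90 while `half_up_result` is 91, which is the half-second the comment says would otherwise be dropped.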
diff --git a/src/ralphify/_promise.py b/src/ralphify/_promise.py
new file mode 100644
index 00000000..362b78fa
--- /dev/null
+++ b/src/ralphify/_promise.py
@@ -0,0 +1,27 @@
+"""Parse promise completion tags emitted by agents."""
+
+from __future__ import annotations
+
+import re
+
+_PROMISE_TAG_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)
+
+
+def _normalize_promise_text(text: str) -> str:
+ """Collapse internal whitespace so config and tag payloads compare consistently."""
+ return " ".join(text.split())
+
+
+def parse_promise_tags(text: str | None) -> list[str]:
+ """Return normalized inner text from all well-formed promise tags in *text*."""
+ if not text:
+ return []
+ return [
+ _normalize_promise_text(match.group(1))
+ for match in _PROMISE_TAG_RE.finditer(text)
+ ]
+
+
+def has_promise_completion(text: str | None, completion_signal: str) -> bool:
+ """Return True when *text* contains a matching promise completion tag."""
+ return _normalize_promise_text(completion_signal) in parse_promise_tags(text)
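A self-contained sketch of the matching behaviour described by these helpers, assuming completion is signalled with a `<promise>...</promise>` tag pair (the tag name is an assumption of this sketch). The function names mirror the module, but this is an illustration, not the shipped code.

```python
from __future__ import annotations

import re

# Assumed markup. Non-greedy so adjacent tags stay separate; DOTALL so a
# payload may span several lines.
PROMISE_TAG_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def parse_promise_tags(text: str | None) -> list[str]:
    """Return the normalized payload of every well-formed tag in *text*."""
    if not text:
        return []
    return [normalize(m.group(1)) for m in PROMISE_TAG_RE.finditer(text)]


def has_promise_completion(text: str | None, completion_signal: str) -> bool:
    """True when some tag payload equals the normalized signal."""
    return normalize(completion_signal) in parse_promise_tags(text)


agent_output = "All done.\n<promise>\n  RALPH_PROMISE_COMPLETE\n</promise>\n"
finished = has_promise_completion(agent_output, "RALPH_PROMISE_COMPLETE")
```

Normalizing both sides means a payload wrapped across lines by the agent still matches the configured signal, while an unclosed tag never does.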
diff --git a/src/ralphify/_resolver.py b/src/ralphify/_resolver.py
index 6281f5d6..2ec44762 100644
--- a/src/ralphify/_resolver.py
+++ b/src/ralphify/_resolver.py
@@ -35,11 +35,10 @@
def resolve_args(prompt: str, user_args: dict[str, str]) -> str:
"""Replace ``{{ args.name }}`` placeholders with user-supplied values.
- When *user_args* is empty, clears any remaining ``{{ args.* }}``
- placeholders so they don't leak into the assembled prompt.
+ Unknown names (and all placeholders when *user_args* is empty) resolve
+ to the empty string, so stray ``{{ args.* }}`` never leak into the
+ assembled prompt.
"""
- if not user_args:
- return _ARGS_RE.sub("", prompt)
def _replace(match: re.Match) -> str:
return user_args.get(match.group(1), "")
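A short sketch of the simplified `resolve_args` contract, where unknown names and an empty mapping both resolve to the empty string through a single substitution pass. The pattern below is an assumption for illustration; the real `_ARGS_RE` is not shown in this diff.

```python
import re

# Assumed placeholder pattern: {{ args.name }} with optional inner spaces.
ARGS_RE = re.compile(r"\{\{\s*args\.(\w+)\s*\}\}")


def resolve_args(prompt: str, user_args: dict[str, str]) -> str:
    """Replace each placeholder with its value, or '' when the name is unknown."""
    return ARGS_RE.sub(lambda m: user_args.get(m.group(1), ""), prompt)


resolved = resolve_args(
    "Fix {{ args.target }} in {{ args.missing }}!",
    {"target": "auth"},
)
```

Because `dict.get` already returns the fallback for a missing key, the empty-mapping case needs no separate branch.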
diff --git a/src/ralphify/_run_types.py b/src/ralphify/_run_types.py
index 411ec93b..7fe6602f 100644
--- a/src/ralphify/_run_types.py
+++ b/src/ralphify/_run_types.py
@@ -13,13 +13,21 @@
from datetime import datetime
from enum import Enum
from pathlib import Path
+from typing import TYPE_CHECKING
from ralphify._events import STOP_COMPLETED, STOP_ERROR, STOP_USER_REQUESTED, StopReason
+if TYPE_CHECKING:
+ from ralphify.hooks import AgentHook
+
+
DEFAULT_COMMAND_TIMEOUT: float = 60
"""Default timeout in seconds for commands defined in RALPH.md frontmatter."""
+DEFAULT_COMPLETION_SIGNAL = "RALPH_PROMISE_COMPLETE"
+"""Default inner ``<promise>...</promise>`` text that marks promise completion."""
+
RUN_ID_LENGTH: int = 12
"""Number of hex characters used for generated run IDs."""
@@ -87,7 +95,10 @@ class RunConfig:
agent: str
ralph_dir: Path
- ralph_file: Path
+ ralph_file: Path | None = None
+ # In-memory prompt *body* (no frontmatter). Mutually exclusive with
+ # ``ralph_file``: supply exactly one. Placeholders are still resolved.
+ prompt: str | None = None
commands: list[Command] = field(default_factory=list)
args: dict[str, str] = field(default_factory=dict)
max_iterations: int | None = None
@@ -97,6 +108,22 @@ class RunConfig:
log_dir: Path | None = None
project_root: Path = field(default=Path("."))
credit: bool = True
+ # Inner text expected inside ``<promise>...</promise>``.
+ completion_signal: str = DEFAULT_COMPLETION_SIGNAL
+ # Stop the run when the configured promise payload is observed.
+ stop_on_completion_signal: bool = False
+ # Per-iteration tool-use cap; None disables the cap.
+ max_turns: int | None = None
+ # Soft wind-down fires at ``max_turns - max_turns_grace``.
+ max_turns_grace: int = 2
+ # User-supplied lifecycle hooks from ``RALPH.md`` frontmatter.
+ hooks: list["AgentHook"] = field(default_factory=list)
+
+ def __post_init__(self) -> None:
+ if (self.prompt is None) == (self.ralph_file is None):
+ raise ValueError(
+ "RunConfig requires exactly one of `prompt` or `ralph_file`"
+ )
@dataclass(slots=True)
@@ -120,6 +147,7 @@ class RunState:
failed: int = 0
timed_out_count: int = 0
started_at: datetime | None = None
+ promise_completed: bool = False
_stop_event: threading.Event = field(
default_factory=threading.Event, init=False, repr=False, compare=False
@@ -134,27 +162,22 @@ def total(self) -> int:
return self.completed + self.failed
def __post_init__(self) -> None:
- # Set initially: the run starts in an unpaused (resumed) state.
self._resume_event.set()
def request_stop(self) -> None:
- """Signal the loop to stop after the current iteration."""
self._stop_event.set()
self._resume_event.set()
def request_pause(self) -> None:
- """Pause the loop between iterations until resumed."""
self.status = RunStatus.PAUSED
self._resume_event.clear()
def request_resume(self) -> None:
- """Resume a paused loop."""
self.status = RunStatus.RUNNING
self._resume_event.set()
@property
def stop_requested(self) -> bool:
- """Whether a stop has been requested."""
return self._stop_event.is_set()
def wait_for_stop(self, timeout: float | None = None) -> bool:
@@ -163,7 +186,6 @@ def wait_for_stop(self, timeout: float | None = None) -> bool:
@property
def paused(self) -> bool:
- """Whether the run is currently paused."""
return not self._resume_event.is_set()
def wait_for_unpause(self, timeout: float | None = None) -> bool:
@@ -171,14 +193,29 @@ def wait_for_unpause(self, timeout: float | None = None) -> bool:
return self._resume_event.wait(timeout=timeout)
def mark_completed(self) -> None:
- """Record a successful iteration."""
self.completed += 1
def mark_failed(self) -> None:
- """Record a failed iteration."""
self.failed += 1
def mark_timed_out(self) -> None:
"""Record a timed-out iteration (also counts as failed)."""
self.timed_out_count += 1
self.mark_failed()
+
+
+@dataclass(frozen=True, slots=True)
+class RunResult:
+ """Immutable snapshot of a run's outcome — status plus iteration counts.
+
+ Built by :meth:`RunManager.get_result` from a :class:`RunState`. The
+ ``timed_out_count`` is a subset of ``failed`` (see :class:`RunState`'s
+ counter invariant), and ``total == completed + failed``.
+ """
+
+ run_id: str
+ status: RunStatus
+ total: int
+ completed: int
+ failed: int
+ timed_out_count: int
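The `__post_init__` guard on `RunConfig` relies on comparing two `is None` checks, which is true exactly when both fields are set or both are unset. A reduced sketch, assuming a hypothetical `PromptSource` dataclass that keeps only the two mutually exclusive fields:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class PromptSource:
    """Hypothetical reduction of RunConfig to its two exclusive fields."""

    ralph_file: Path | None = None
    prompt: str | None = None

    def __post_init__(self) -> None:
        # Equal truth values mean "both given" or "neither given".
        if (self.prompt is None) == (self.ralph_file is None):
            raise ValueError("requires exactly one of `prompt` or `ralph_file`")


def is_valid(**kwargs: object) -> bool:
    """Return True when the arguments satisfy the exactly-one rule."""
    try:
        PromptSource(**kwargs)
    except ValueError:
        return False
    return True
```

For example, `is_valid(prompt="Fix the tests")` passes, while supplying both fields or neither is rejected at construction time.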
diff --git a/src/ralphify/_wind_down_shim.py b/src/ralphify/_wind_down_shim.py
new file mode 100644
index 00000000..02b01acb
--- /dev/null
+++ b/src/ralphify/_wind_down_shim.py
@@ -0,0 +1,134 @@
+"""Hook shim invoked by agent CLIs to inject a soft wind-down message.
+
+Claude (PreToolUse) and Codex (PostToolUse Bash matcher) both treat
+exit-code 0 + a JSON payload on stdout as the standard injection
+channel for hook output. The JSON shape differs per CLI, so this shim
+is dispatched by an ``agent`` argument.
+
+Invocation::
+
+ python -m ralphify._wind_down_shim <counter_path> <cap> <grace> <agent>
+
+The shim reads the running tool-use count from ``counter_path``
+(written by ``_agent.py`` after every parsed tool_use event) and emits
+the wind-down message only when ``count >= cap - grace``. Any failure
+(missing file, malformed integer, unknown agent, missing args) is
+treated as a no-op so a buggy hook can never break the agent loop.
+"""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+from typing import Any
+
+
+CLAUDE = "claude"
+CODEX = "codex"
+
+_VALID_AGENTS: frozenset[str] = frozenset({CLAUDE, CODEX})
+
+
+def _build_message(count: int, cap: int) -> str:
+ """Return the user-facing wind-down sentence.
+
+ Phrased as an instruction the agent can act on directly, so the next
+ one or two turns go to finishing up rather than to mid-task work that
+ would be SIGTERM'd anyway.
+ """
+ return (
+ f"You have used {count} of {cap} tool uses. "
+ "Wrap up your work in the next 1-2 turns."
+ )
+
+
+def _claude_payload(message: str) -> dict[str, Any]:
+ """Build the Claude PreToolUse ``additionalContext`` payload.
+
+ Matches the schema documented in the Claude Code hook reference: an
+ ``hookSpecificOutput`` object whose ``hookEventName`` is the source
+ event and whose ``additionalContext`` is appended to the agent's
+ context window before the tool runs.
+ """
+ return {
+ "hookSpecificOutput": {
+ "hookEventName": "PreToolUse",
+ "additionalContext": message,
+ }
+ }
+
+
+def _codex_payload(message: str) -> dict[str, Any]:
+ """Build the Codex PostToolUse ``systemMessage`` payload.
+
+ Codex's hook output schema treats ``systemMessage`` as a free-form
+ string injected into the conversation — the analogue of Claude's
+ ``additionalContext`` for the matched tool event.
+ """
+ return {"systemMessage": message}
+
+
+def _read_counter(path: Path) -> int:
+ """Return the integer stored in *path*, or ``0`` for any failure mode.
+
+ ``_agent.py`` writes ``"0"`` before spawn and atomic-replaces the
+ file on every tool_use event. A missing file just before the first
+ write or a partially-written file mid-rename both reasonably
+ represent ``count == 0``.
+ """
+ try:
+ text = path.read_text(encoding="utf-8").strip()
+ except (OSError, UnicodeDecodeError):
+ return 0
+ try:
+ return int(text or "0")
+ except ValueError:
+ return 0
+
+
+def _resolve_agent_payload(
+ agent: str,
+ message: str,
+) -> dict[str, Any] | None:
+ if agent == CLAUDE:
+ return _claude_payload(message)
+ if agent == CODEX:
+ return _codex_payload(message)
+ return None
+
+
+def main(argv: list[str]) -> int:
+ """Entry point for ``python -m ralphify._wind_down_shim``.
+
+ All failures return ``0`` (no-op + no output) so a misbehaving hook
+ is observably absent rather than disruptive — the worst case is the
+ soft wind-down does not fire and the hard SIGTERM cap takes over.
+ """
+ if len(argv) < 5:
+ return 0
+ counter_path = Path(argv[1])
+ try:
+ cap = int(argv[2])
+ grace = int(argv[3])
+ except ValueError:
+ return 0
+ agent = argv[4]
+ if agent not in _VALID_AGENTS:
+ return 0
+
+ count = _read_counter(counter_path)
+ threshold = max(cap - grace, 0)
+ if count < threshold:
+ return 0
+
+ payload = _resolve_agent_payload(agent, _build_message(count, cap))
+ if payload is None:
+ return 0
+ json.dump(payload, sys.stdout)
+ sys.stdout.write("\n")
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main(list(sys.argv)))
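A condensed restatement of the shim's decision rule, useful for checking the threshold arithmetic by hand. `wind_down_payload` is a hypothetical function that folds the counter value, the threshold test, and the per-agent payload shapes from this file into one call; it does not read a counter file.

```python
from __future__ import annotations

from typing import Any


def wind_down_payload(
    count: int, cap: int, grace: int, agent: str
) -> dict[str, Any] | None:
    """Return the hook payload, or None when the shim would stay silent."""
    # Clamp at zero so a grace larger than the cap fires from the first call.
    threshold = max(cap - grace, 0)
    if count < threshold:
        return None
    message = (
        f"You have used {count} of {cap} tool uses. "
        "Wrap up your work in the next 1-2 turns."
    )
    if agent == "claude":
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "additionalContext": message,
            }
        }
    if agent == "codex":
        return {"systemMessage": message}
    return None  # unknown agent: stay silent


early = wind_down_payload(count=7, cap=10, grace=2, agent="claude")
late = wind_down_payload(count=8, cap=10, grace=2, agent="codex")
```

With `cap=10` and `grace=2` the threshold is 8, so the seventh tool use produces nothing and the eighth produces the message.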
diff --git a/src/ralphify/adapters/__init__.py b/src/ralphify/adapters/__init__.py
new file mode 100644
index 00000000..6d0a5221
--- /dev/null
+++ b/src/ralphify/adapters/__init__.py
@@ -0,0 +1,76 @@
+"""Pluggable CLI adapter layer.
+
+Each supported agent CLI (Claude, Codex, Copilot, ...) implements the
+:class:`CLIAdapter` protocol in its own module under this package. The
+engine dispatches on :func:`select_adapter` at run time, so adding a new
+CLI means writing one file and registering it in :data:`ADAPTERS` — no
+edits to the engine, emitter, or subprocess machinery.
+
+Adapters translate the CLI's native output format to a common
+:class:`AdapterEvent` stream and advertise capability flags so the core
+loop can gracefully degrade when a CLI lacks structured output or hook
+injection. Process lifecycle (spawn, SIGTERM at cap, reap) stays in
+``_agent.py``; adapters only observe.
+
+The Protocol, :class:`AdapterEvent`, and :data:`ADAPTERS` live in
+:mod:`_protocol` so concrete adapter modules can depend on them
+without cycling through this package's ``__init__``.
+"""
+
+from __future__ import annotations
+
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ AdapterEventKind,
+ CLIAdapter,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+def select_adapter(cmd: list[str]) -> CLIAdapter:
+ """Return the first registered adapter that claims *cmd*.
+
+ Falls back to :class:`GenericAdapter` when nothing matches. Never
+ returns None — callers can always dispatch safely.
+ """
+ from ralphify.adapters._generic import GenericAdapter
+
+ for adapter in ADAPTERS:
+ if adapter.matches(cmd):
+ return adapter
+ return GenericAdapter()
+
+
+def _register_builtin_adapters() -> None:
+ """Import concrete adapter modules so their ``ADAPTERS.append`` runs.
+
+ Keeps the registry populated without forcing callers to import every
+ adapter module manually. Imports are deferred to the bottom of this
+ module (executed once at first package import) so cyclic-import risk
+ is contained.
+ """
+ from ralphify.adapters import ( # noqa: F401
+ claude,
+ codex,
+ copilot,
+ crush,
+ opencode,
+ )
+
+
+_register_builtin_adapters()
+
+
+__all__ = [
+ "ADAPTERS",
+ "AdapterEvent",
+ "AdapterEventKind",
+ "CLIAdapter",
+ "CountsWhat",
+ "Invocation",
+ "select_adapter",
+ "stdin_invocation",
+]
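The first-match dispatch in `select_adapter` can be sketched without the real adapter classes. `StemAdapter`, `FallbackAdapter`, `REGISTRY`, and `select` below are hypothetical stand-ins that model only `name` and `matches`, assuming adapters are matched on the binary stem as the Claude adapter does.

```python
from pathlib import Path


class StemAdapter:
    """Hypothetical adapter that claims commands by binary stem."""

    def __init__(self, name: str) -> None:
        self.name = name

    def matches(self, cmd: list[str]) -> bool:
        return bool(cmd) and Path(cmd[0]).stem == self.name


class FallbackAdapter:
    """Hypothetical catch-all; never claims a command itself."""

    name = "generic"

    def matches(self, cmd: list[str]) -> bool:
        return False


# Order matters: the first adapter whose matches() is True wins.
REGISTRY = [StemAdapter("claude"), StemAdapter("codex")]


def select(cmd: list[str]):
    for adapter in REGISTRY:
        if adapter.matches(cmd):
            return adapter
    return FallbackAdapter()  # never None, so callers can always dispatch


chosen = select(["/usr/local/bin/claude", "-p"]).name
```

Returning a fallback object instead of `None` keeps every call site free of null checks, at the cost of the fallback silently accepting unknown CLIs.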
diff --git a/src/ralphify/adapters/_generic.py b/src/ralphify/adapters/_generic.py
new file mode 100644
index 00000000..04193d0b
--- /dev/null
+++ b/src/ralphify/adapters/_generic.py
@@ -0,0 +1,79 @@
+"""Fallback adapter for CLIs with no dedicated implementation.
+
+Returned by :func:`ralphify.adapters.select_adapter` when no specific
+adapter's ``matches`` returns True. All capability flags are False, so
+the core loop treats sessions as blocking, untyped, and uncappable.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+class GenericAdapter:
+ """No-op adapter: pass commands through unchanged, parse nothing."""
+
+ name: str = "generic"
+ counts_what: CountsWhat = "none"
+ supports_streaming: bool = False
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = False
+ # Untyped agents have no streaming result event; the engine must keep
+ # the full stdout buffer if it wants promise detection.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ return False
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ return list(cmd)
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Unknown CLIs are assumed to read the prompt from stdin."""
+ return stdin_invocation(cmd, prompt)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the full stdout for the promise tag.
+
+ Unknown CLIs have no event schema to parse, so the whole-stdout
+ regex scan is the only reliable path. Matches the current
+ engine-side behavior so switching to adapter-owned detection does
+ not regress promise completion for untyped agents.
+
+ *result_text* is unused (the blocking path does not populate it
+ for unknown CLIs); the engine opts into
+ ``requires_full_stdout_for_completion`` to make sure *stdout* is
+ supplied when promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "GenericAdapter does not support soft wind-down; max_turns will hard-kill."
+ )
diff --git a/src/ralphify/adapters/_protocol.py b/src/ralphify/adapters/_protocol.py
new file mode 100644
index 00000000..3d2441f2
--- /dev/null
+++ b/src/ralphify/adapters/_protocol.py
@@ -0,0 +1,149 @@
+"""Adapter Protocol, event type, and registry.
+
+Concrete adapter modules (:mod:`claude`, :mod:`codex`, :mod:`copilot`,
+:mod:`_generic`) import from here rather than from the package
+``__init__``. The package ``__init__`` populates :data:`ADAPTERS` by
+importing concrete adapters, and those adapters need the Protocol
+before the ``__init__`` finishes executing — keeping the Protocol in a
+leaf module makes the import graph acyclic.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Literal, NamedTuple, Protocol, runtime_checkable
+
+
+AdapterEventKind = Literal["tool_use", "turn", "message", "result"]
+"""Categories of events an adapter can surface from a CLI's output stream."""
+
+CountsWhat = Literal["tool_use", "turn", "none"]
+"""What an adapter counts against ``max_turns`` — tool uses, turns, or nothing."""
+
+
+class AdapterEvent(NamedTuple):
+ """A single structured event parsed from a CLI's output stream.
+
+ ``kind`` is the event category; ``name`` carries a tool name for
+ ``tool_use`` events (``None`` otherwise); ``raw`` is the original
+ parsed JSON object so callers can inspect extra fields when needed.
+ """
+
+ kind: AdapterEventKind
+ name: str | None = None
+ raw: dict | None = None
+
+
+class Invocation(NamedTuple):
+ """How to launch an agent with a given prompt.
+
+ ``argv`` is the final spawn command; ``stdin_text`` is the payload to
+ write to the child's stdin, or ``None`` to signal that stdin must not
+ be piped (the caller uses ``DEVNULL`` so the child gets immediate EOF).
+ """
+
+ argv: list[str]
+ stdin_text: str | None
+
+
+def stdin_invocation(cmd: list[str], prompt: str) -> Invocation:
+ """Build the default stdin-delivery invocation for *cmd* and *prompt*.
+
+ Shared by every adapter that pipes the prompt to the child's stdin
+ (claude / codex / copilot / generic) so each does not hand-roll the
+ same ``Invocation(list(cmd), prompt)`` construction.
+ """
+ return Invocation(argv=list(cmd), stdin_text=prompt)
+
+
+@runtime_checkable
+class CLIAdapter(Protocol):
+ """Protocol every CLI adapter must satisfy.
+
+ Adapters are stateless singletons: the same instance is reused for
+ every iteration of every run. Any per-iteration state lives in the
+ caller (``_agent.py``) — adapters only translate.
+ """
+
+ name: str
+ counts_what: CountsWhat
+ supports_streaming: bool
+ renders_structured_peek: bool
+ supports_soft_wind_down: bool
+ requires_full_stdout_for_completion: bool
+
+ def matches(self, cmd: list[str]) -> bool:
+ """Return True if this adapter handles the given agent command."""
+ ...
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Return the command with any adapter-required flags appended.
+
+ Idempotent: calling twice returns the same command.
+ """
+ ...
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Return the final argv and the stdin payload for this prompt.
+
+ *cmd* is the already-flag-injected command (output of
+ :meth:`build_command`). stdin adapters return
+ ``Invocation(cmd, prompt)`` (via :func:`stdin_invocation`);
+ arg-delivery adapters append the prompt to argv and return
+ ``stdin_text=None`` so the caller pipes ``DEVNULL`` instead.
+ """
+ ...
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse one line of stdout into an :class:`AdapterEvent`.
+
+ Returns ``None`` for lines that are not recognised events.
+ MUST NOT raise on malformed input (per FR-8).
+ """
+ ...
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Return True if the agent's final output contains the completion signal.
+
+ The signal is wrapped in ``<promise>...</promise>`` markup; the
+ inner text equals ``user_signal``.
+
+ Adapters receive both the streaming-extracted *result_text* (the
+ terminal assistant message, when the streaming path could parse one)
+ and the full *stdout* buffer (only present when the engine chose to
+ capture it). Adapters with ``requires_full_stdout_for_completion``
+ set False MUST be able to detect completion from *result_text* alone;
+ engines may pass ``stdout=None`` to skip the memory cost.
+ """
+ ...
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ """Write hook config files into *tempdir* and return env-var overrides.
+
+ Only called when ``supports_soft_wind_down`` is True. Adapters that
+ set the flag False may leave this unimplemented (a ``NotImplementedError``
+ is acceptable and is treated as a runtime downgrade to hard-cap-only).
+ """
+ ...
+
+
+ADAPTERS: list[CLIAdapter] = []
+"""Adapter registry, populated at import time by concrete adapter modules.
+
+Ordering matters: :func:`ralphify.adapters.select_adapter` returns the
+first adapter whose ``matches`` method returns True, with
+:class:`ralphify.adapters._generic.GenericAdapter` as a final catch-all.
+Specific adapters go first, generic last.
+"""
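The two prompt-delivery modes described for `deliver_prompt` can be sketched as follows. `Invocation` and `stdin_invocation` mirror the definitions in this file; `arg_invocation` and `spawn_kwargs` are hypothetical helpers showing one way a caller might map an `Invocation` onto `subprocess.run` keyword arguments.

```python
from __future__ import annotations

import subprocess
from typing import Any, NamedTuple


class Invocation(NamedTuple):
    argv: list[str]
    stdin_text: str | None


def stdin_invocation(cmd: list[str], prompt: str) -> Invocation:
    """Deliver the prompt on the child's stdin."""
    return Invocation(argv=list(cmd), stdin_text=prompt)


def arg_invocation(cmd: list[str], prompt: str) -> Invocation:
    """Hypothetical arg-delivery variant: prompt becomes one argv element."""
    # Passing the prompt as a single list element avoids any shell quoting.
    return Invocation(argv=[*cmd, prompt], stdin_text=None)


def spawn_kwargs(inv: Invocation) -> dict[str, Any]:
    """Hypothetical mapping from an Invocation to subprocess.run kwargs."""
    if inv.stdin_text is None:
        # No payload: give the child immediate EOF instead of a pipe.
        return {"stdin": subprocess.DEVNULL}
    return {"input": inv.stdin_text, "text": True}


base_cmd = ["opencode", "run"]
positional = arg_invocation(base_cmd, "fix the failing test")
piped = stdin_invocation(["claude", "-p"], "fix the failing test")
```

Both helpers copy the command list, so the caller's `base_cmd` is left unmodified and can be reused on the next iteration.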
diff --git a/src/ralphify/adapters/claude.py b/src/ralphify/adapters/claude.py
new file mode 100644
index 00000000..18fba7c9
--- /dev/null
+++ b/src/ralphify/adapters/claude.py
@@ -0,0 +1,234 @@
+"""Claude Code adapter.
+
+Claude is the only CLI shipping a stable ``--output-format stream-json``
+protocol today; its structured events drive the peek panel and power
+per-event tool-use counting. Every Claude iteration emits:
+
+1. A ``system`` init event with the model name.
+2. Zero or more ``assistant`` messages whose ``content`` list may include
+ ``tool_use``, ``thinking``, and ``text`` blocks.
+3. Zero or more ``user`` messages (tool results echoed back).
+4. A terminal ``result`` event carrying the final assistant text.
+
+Tool-use counting is scoped to ``assistant`` messages; we ignore
+``tool_use`` blocks echoed back by ``user`` events so each invocation is
+counted exactly once.
+"""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+CLAUDE_BINARY_STEM = "claude"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the Claude CLI."""
+
+_OUTPUT_FORMAT_FLAG = "--output-format"
+_OUTPUT_FORMAT_VALUE = "stream-json"
+_VERBOSE_FLAG = "--verbose"
+
+_EVENT_TYPE_ASSISTANT = "assistant"
+_EVENT_TYPE_RESULT = "result"
+_BLOCK_TYPE_TOOL_USE = "tool_use"
+
+_SETTINGS_FILENAME = "settings.json"
+"""File Claude reads for hook configuration when ``CLAUDE_CONFIG_DIR`` is set."""
+
+_HOOK_EVENT = "PreToolUse"
+"""Claude hook stage that fires before each tool invocation."""
+
+
+class ClaudeAdapter:
+ """Parses Claude's stream-json output and supports soft wind-down."""
+
+ name: str = "claude"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = True
+ renders_structured_peek: bool = True
+ supports_soft_wind_down: bool = True
+ # Claude's final assistant text already arrives as ``agent.result_text``
+ # via the stream-json ``result`` event, so the engine does not need to
+ # buffer the full stdout to scan for the promise tag.
+ requires_full_stdout_for_completion: bool = False
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == CLAUDE_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Ensure ``--output-format stream-json --verbose`` is present.
+
+ Idempotent: running twice yields the same command. If the caller
+ already supplied ``--output-format`` with a different value, that
+ value is overwritten with ``stream-json``, because we cannot honor a
+ user-chosen format while still emitting a parseable event stream.
+ """
+ result = list(cmd)
+ try:
+ format_index = result.index(_OUTPUT_FORMAT_FLAG)
+ except ValueError:
+ result.extend([_OUTPUT_FORMAT_FLAG, _OUTPUT_FORMAT_VALUE])
+ else:
+ value_index = format_index + 1
+ if value_index < len(result):
+ result[value_index] = _OUTPUT_FORMAT_VALUE
+ else:
+ result.append(_OUTPUT_FORMAT_VALUE)
+ if _VERBOSE_FLAG not in result:
+ result.append(_VERBOSE_FLAG)
+ return result
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Claude reads the prompt from stdin (``-p`` non-interactive mode)."""
+ return stdin_invocation(cmd, prompt)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse one stream-json line into an :class:`AdapterEvent`.
+
+ Empty lines, non-JSON payloads, and non-dict JSON return ``None``.
+ ``result`` events return ``AdapterEvent(kind="result")``. An
+ ``assistant`` event whose content contains a ``tool_use`` block
+ returns the first such block as ``AdapterEvent(kind="tool_use")``;
+ Claude emits one tool_use block per assistant message, so
+ single-event dispatch matches the protocol. Every other parsed
+ event dict — including non-tool-use ``assistant`` messages —
+ returns ``AdapterEvent(kind="message")`` so callers can still
+ render them without counting against the turn cap.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = parsed.get("type")
+ if event_type == _EVENT_TYPE_RESULT:
+ return AdapterEvent(kind="result", raw=parsed)
+ if event_type != _EVENT_TYPE_ASSISTANT:
+ return AdapterEvent(kind="message", raw=parsed)
+
+ for block in _iter_content_blocks(parsed):
+ if block.get("type") == _BLOCK_TYPE_TOOL_USE:
+ name = block.get("name")
+ return AdapterEvent(
+ kind="tool_use",
+ name=name if isinstance(name, str) else None,
+ raw=parsed,
+ )
+ return AdapterEvent(kind="message", raw=parsed)
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ Scan the streaming-extracted result text for ``<promise>{signal}</promise>``.
+
+ Claude's terminal ``result`` event carries the last assistant
+ message as a plain string, which the streaming reader already
+ captures into :attr:`AgentResult.result_text`. Using *result_text*
+ directly avoids buffering the full stdout — large transcripts can
+ run into many megabytes per iteration.
+
+ Only the parsed result text is considered: raw JSON from
+ ``status`` or ``assistant`` messages can legitimately echo
+ ``<promise>...</promise>`` substrings that must not trigger
+ completion.
+
+ *stdout* is unused (Claude does not need a fallback because the
+ streaming path always populates *result_text* on a successful run);
+ it stays in the signature for protocol parity.
+ """
+ del stdout
+ if result_text is None:
+ return False
+ return has_promise_completion(result_text, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ """Write Claude's ``settings.json`` and return a ``CLAUDE_CONFIG_DIR`` override.
+
+ The settings file registers a ``PreToolUse`` hook that invokes
+ :mod:`ralphify._wind_down_shim` with the per-iteration counter
+ path. Spawning Claude with ``CLAUDE_CONFIG_DIR=<tempdir>``
+ isolates the hook from the user's real ``~/.claude`` config so a
+ crash leaves no global side effects.
+ """
+ settings_path = tempdir / _SETTINGS_FILENAME
+ command = _build_shim_command(counter_path, cap, grace)
+ settings_path.write_text(
+ json.dumps(_build_settings_payload(command), indent=2),
+ encoding="utf-8",
+ )
+ return {"CLAUDE_CONFIG_DIR": str(tempdir)}
+
+
+def _iter_content_blocks(raw: dict) -> list[dict]:
+ """Return the ``message.content`` list, filtered to dict blocks only."""
+ message = raw.get("message")
+ if not isinstance(message, dict):
+ return []
+ content = message.get("content")
+ if not isinstance(content, list):
+ return []
+ return [block for block in content if isinstance(block, dict)]
+
+
+def _build_shim_command(counter_path: Path, cap: int, grace: int) -> str:
+ """Return the shell command string Claude's hook runner will execute.
+
+ Uses ``sys.executable`` so the shim runs under the same Python that
+ spawned ralphify — avoids relying on a system ``python`` on PATH.
+ """
+ return (
+ f"{sys.executable} -m ralphify._wind_down_shim "
+ f"{counter_path} {cap} {grace} claude"
+ )
+
+
+def _build_settings_payload(command: str) -> dict:
+ """Return the JSON dict written to ``settings.json``.
+
+ The shape matches Claude Code's hook reference: the top-level
+ ``hooks`` mapping keys event names to a list of matcher groups, each
+ of which carries an inner ``hooks`` list of ``{type, command}``
+ entries.
+ """
+ return {
+ "hooks": {
+ _HOOK_EVENT: [
+ {
+ "matcher": "*", # fire on every tool
+ "hooks": [
+ {"type": "command", "command": command},
+ ],
+ }
+ ]
+ }
+ }
+
+
+ADAPTERS.append(ClaudeAdapter())
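
Review note (not part of the patch): `_build_shim_command` interpolates `sys.executable` and `counter_path` into the command string unquoted, so a path containing spaces would be split into several arguments if the hook runner passes the string to a shell. Below is a minimal sketch of one way to harden it, assuming a POSIX shell executes the string. `build_shim_command_quoted` and its `cli` parameter are hypothetical names for illustration, not ralphify's API.

```python
from __future__ import annotations

import shlex
import sys
from pathlib import Path


def build_shim_command_quoted(
    counter_path: Path, cap: int, grace: int, cli: str
) -> str:
    """Hypothetical variant of the shim-command builder with shell quoting."""
    argv = [
        sys.executable,
        "-m",
        "ralphify._wind_down_shim",
        str(counter_path),
        str(cap),
        str(grace),
        cli,
    ]
    # shlex.join quotes each argument for a POSIX shell (Python 3.8+).
    return shlex.join(argv)


if __name__ == "__main__":
    spaced = Path("/tmp/ralph run/counter")
    command = build_shim_command_quoted(spaced, 10, 2, "claude")
    # Splitting the string back recovers the path as a single argument.
    assert shlex.split(command)[3] == str(spaced)
```

`shlex` targets POSIX shells only, so a Windows hook runner would need a different quoting strategy.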
diff --git a/src/ralphify/adapters/codex.py b/src/ralphify/adapters/codex.py
new file mode 100644
index 00000000..efb3dfe3
--- /dev/null
+++ b/src/ralphify/adapters/codex.py
@@ -0,0 +1,259 @@
+"""Codex CLI adapter.
+
+Codex emits newline-delimited JSON with explicit event types:
+
+- ``TurnStarted`` / ``TurnCompleted`` — conversation turn boundaries.
+- ``CollabToolCall`` / ``McpToolCall`` — tool invocations initiated by
+ the agent.
+- ``CommandExecution`` — shell commands run inside the sandbox.
+
+We map every tool-call event to ``AdapterEvent(kind="tool_use", ...)``
+so the turn-cap counter uses the same user-facing metric across CLIs.
+``TurnStarted`` surfaces as ``kind="turn"`` and ``TurnCompleted`` as
+``kind="result"`` (the result check runs first). Neither counts against
+``max_turns``: ``counts_what`` is ``tool_use`` for a unified metric (spec Q13).
+"""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+CODEX_BINARY_STEM = "codex"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the Codex CLI."""
+
+_JSON_FLAG = "--json"
+"""Flag appended to request newline-delimited JSON output."""
+
+_TURN_EVENTS: frozenset[str] = frozenset({"TurnStarted", "TurnCompleted"})
+_TOOL_CALL_EVENTS: frozenset[str] = frozenset(
+ {"CollabToolCall", "McpToolCall", "CommandExecution"}
+)
+_RESULT_EVENTS: frozenset[str] = frozenset({"TaskComplete", "TurnCompleted"})
+
+_HOOKS_FILENAME = "hooks.json"
+"""File Codex reads from ``$CODEX_HOME`` to discover hook scripts."""
+
+_CONFIG_FILENAME = "config.toml"
+"""Codex config file; the hooks feature is enabled here via the ``[features]`` table."""
+
+_HOOK_EVENT = "PostToolUse"
+"""Codex hook stage that fires after each tool invocation — the earliest
+deterministic point where the counter file has an authoritative value."""
+
+_FEATURE_FLAG_TOML = "[features]\nhooks = true\n"
+"""Minimal config.toml body that enables the hook runner.
+
+Codex reads the hooks feature flag from the ``[features]`` table (maturity
+Stable, default ``true``); writing it explicitly keeps the per-iteration
+``CODEX_HOME`` self-contained regardless of the user's real config.
+"""
+
+
+class CodexAdapter:
+ """Parses Codex's ``--json`` event stream."""
+
+ name: str = "codex"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = True
+ # Codex emits structured JSON that the streaming execution path parses
+ # for activity callbacks, but the console peek panel only understands
+ # Claude's stream-json schema today. Keep peek in raw-line mode until
+ # the emitter can render Codex events directly.
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = True
+ # Codex's terminal text lives inside ``TaskComplete`` / ``TurnCompleted``
+ # events, which the streaming reader does not extract into
+ # ``agent.result_text``. The full stdout buffer is currently the only
+ # source for promise-tag scanning.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == CODEX_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Append ``--json`` to request structured output. Idempotent."""
+ result = list(cmd)
+ if _JSON_FLAG not in result:
+ result.append(_JSON_FLAG)
+ return result
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Codex reads the prompt from stdin (``codex exec -``)."""
+ return stdin_invocation(cmd, prompt)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Classify one JSONL line as turn / tool_use / message / result.
+
+ Unknown event types return ``AdapterEvent(kind="message", ...)`` so
+ callers can still render them (e.g. peek panel) without counting
+ them against the turn cap. Malformed lines return ``None``.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = _event_type(parsed)
+ if event_type in _TOOL_CALL_EVENTS:
+ return AdapterEvent(
+ kind="tool_use",
+ name=_tool_name(parsed, event_type),
+ raw=parsed,
+ )
+ if event_type in _RESULT_EVENTS:
+ return AdapterEvent(kind="result", raw=parsed)
+ if event_type in _TURN_EVENTS:
+ return AdapterEvent(kind="turn", raw=parsed)
+ return AdapterEvent(kind="message", raw=parsed)
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan every ``TurnCompleted`` / ``TaskComplete`` event for the promise tag.
+
+ Codex does not carry a single terminal ``result`` string the way
+ Claude does; completion may be spread across assistant text in
+ multiple events. Falling back to a whole-stdout scan is safe
+ because promise tags are explicit and non-ambiguous markup.
+
+ *result_text* is unused — Codex never populates it through the
+ streaming reader (no ``{"type":"result"}`` lines). The engine
+ opts into ``requires_full_stdout_for_completion`` to make sure
+ *stdout* is supplied when promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ if has_promise_completion(stdout, user_signal):
+ return True
+ for line in stdout.splitlines():
+ stripped = line.strip()
+ if not stripped:
+ continue
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ continue
+ if isinstance(parsed, dict) and _event_type(parsed) in _RESULT_EVENTS:
+ text = _event_text_payload(parsed)
+ if text and has_promise_completion(text, user_signal):
+ return True
+ return False
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ """Write Codex's ``hooks.json`` + ``config.toml`` and override ``CODEX_HOME``.
+
+ Codex's hook system is gated behind a feature flag in
+ ``config.toml``; this method writes both files into *tempdir*
+ (plain ``write_text`` calls, not an atomic pair) and points the CLI
+ at it via ``CODEX_HOME`` so the user's real ``~/.codex`` stays untouched.
+ """
+ command = _build_shim_command(counter_path, cap, grace)
+ (tempdir / _HOOKS_FILENAME).write_text(
+ json.dumps(_build_hooks_payload(command), indent=2),
+ encoding="utf-8",
+ )
+ (tempdir / _CONFIG_FILENAME).write_text(_FEATURE_FLAG_TOML, encoding="utf-8")
+ return {"CODEX_HOME": str(tempdir)}
+
+
+def _build_shim_command(counter_path: Path, cap: int, grace: int) -> str:
+ """Return the shell command Codex's hook runner executes."""
+ return (
+ f"{sys.executable} -m ralphify._wind_down_shim "
+ f"{counter_path} {cap} {grace} codex"
+ )
+
+
+def _build_hooks_payload(command: str) -> dict:
+ """Return the JSON dict written to ``hooks.json``."""
+ return {
+ _HOOK_EVENT: [
+ {
+ "matcher": "*", # fire on every tool
+ "hooks": [
+ {"type": "command", "command": command},
+ ],
+ }
+ ]
+ }
+
+
+def _event_type(parsed: dict) -> str | None:
+ """Return the Codex event type, whether top-level or nested under ``msg``."""
+ event_type = parsed.get("type") or parsed.get("kind")
+ if isinstance(event_type, str):
+ return event_type
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ nested = msg.get("type") or msg.get("kind")
+ if isinstance(nested, str):
+ return nested
+ return None
+
+
+def _tool_name(parsed: dict, event_type: str | None) -> str | None:
+ """Best-effort extraction of the tool name from a tool-call event.
+
+ Codex event shapes vary by tool type — ``CommandExecution`` carries a
+ command, ``CollabToolCall`` a tool name, ``McpToolCall`` a server +
+ tool. When no specific name is available, return the event type.
+ """
+ for key in ("name", "tool", "tool_name"):
+ value = parsed.get(key)
+ if isinstance(value, str):
+ return value
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ for key in ("name", "tool", "tool_name", "command"):
+ value = msg.get(key)
+ if isinstance(value, str):
+ return value
+ return event_type
+
+
+def _event_text_payload(parsed: dict) -> str | None:
+ """Extract any final-assistant text from a Codex result event."""
+ for key in ("result", "text", "content", "output"):
+ value = parsed.get(key)
+ if isinstance(value, str):
+ return value
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ for key in ("result", "text", "content", "output"):
+ value = msg.get(key)
+ if isinstance(value, str):
+ return value
+ return None
+
+
+ADAPTERS.append(CodexAdapter())
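
Review note (not part of the patch): the lookup order in `_event_type` and the precedence in `CodexAdapter.parse_event` can be easier to follow in isolation. The sketch below reproduces that order with plain strings instead of `AdapterEvent`. The sample JSON payloads in the usage lines are invented for illustration and are not verified Codex output.

```python
from __future__ import annotations

import json

TOOL_CALL_EVENTS = frozenset({"CollabToolCall", "McpToolCall", "CommandExecution"})
RESULT_EVENTS = frozenset({"TaskComplete", "TurnCompleted"})
TURN_EVENTS = frozenset({"TurnStarted", "TurnCompleted"})


def event_type(parsed: dict) -> str | None:
    """Top-level ``type``/``kind`` wins; otherwise look inside ``msg``."""
    top = parsed.get("type") or parsed.get("kind")
    if isinstance(top, str):
        return top
    msg = parsed.get("msg")
    if isinstance(msg, dict):
        nested = msg.get("type") or msg.get("kind")
        if isinstance(nested, str):
            return nested
    return None


def classify(line: str) -> str | None:
    """Return the event kind for one JSONL line, or None if it is malformed."""
    stripped = line.strip()
    if not stripped:
        return None
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    kind = event_type(parsed)
    if kind in TOOL_CALL_EVENTS:
        return "tool_use"
    if kind in RESULT_EVENTS:  # checked before TURN_EVENTS
        return "result"
    if kind in TURN_EVENTS:
        return "turn"
    return "message"


if __name__ == "__main__":
    print(classify('{"msg": {"type": "CommandExecution"}}'))
    print(classify('{"type": "TurnCompleted"}'))
    print(classify('{"type": "TurnStarted"}'))
```

Because `TurnCompleted` belongs to both the result set and the turn set, the order of the checks decides its kind: only `TurnStarted` ever reaches the turn branch.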
diff --git a/src/ralphify/adapters/copilot.py b/src/ralphify/adapters/copilot.py
new file mode 100644
index 00000000..2422c2d7
--- /dev/null
+++ b/src/ralphify/adapters/copilot.py
@@ -0,0 +1,161 @@
+"""GitHub Copilot CLI adapter (alpha).
+
+The standalone ``copilot`` binary (GA 2026-02-25) ships a
+``--output-format json`` mode that is **only loosely documented**. This
+adapter does best-effort counting based on the empirical shapes seen in
+the ralphify test corpus; unknown event types return ``None`` rather
+than crashing.
+
+Capability matrix:
+
+- ``counts_what = "tool_use"`` with an alpha caveat — counting accuracy
+ depends on ongoing schema discovery (see :file:`docs/agents.md`).
+- ``supports_streaming = False`` — event schema is unverified, so the
+ adapter falls through the blocking path and avoids per-line parsing.
+- ``renders_structured_peek = False`` — peek panel stays in raw-line mode.
+- ``supports_soft_wind_down = False`` — Copilot has no hook system as of
+ 2026-04, so ``install_wind_down_hook`` raises :class:`NotImplementedError`
+ (which the engine downgrades to hard-cap-only).
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+COPILOT_BINARY_STEM = "copilot"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the standalone Copilot CLI.
+
+Note: this is the GA ``copilot`` binary, NOT the ``gh copilot`` subcommand.
+The ``gh`` stem is deliberately excluded because ``gh`` hosts many other
+subcommands that have nothing to do with AI agents.
+"""
+
+_OUTPUT_FORMAT_FLAGS: tuple[str, ...] = ("--output-format", "json")
+
+_TOOL_USE_EVENT_TYPES: frozenset[str] = frozenset(
+ {"tool_use", "tool_call", "ToolCall", "ToolUse"}
+)
+_RESULT_EVENT_TYPES: frozenset[str] = frozenset(
+ {"result", "response", "final", "Final", "Complete"}
+)
+
+
+class CopilotAdapter:
+ """Best-effort adapter for the standalone Copilot CLI."""
+
+ name: str = "copilot"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = False
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = False
+ # Copilot runs on the blocking path with no stream parsing today; the
+ # promise tag must be located somewhere in the captured stdout.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == COPILOT_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Ensure ``--output-format json`` is present.
+
+ Idempotent: running twice yields the same command. If the caller
+ already supplied ``--output-format`` with another value, that value
+ is overwritten with ``json``, because a user-chosen format cannot be
+ honored while still emitting a parseable event stream.
+ """
+ result = list(cmd)
+ output_format_flag, output_format_value = _OUTPUT_FORMAT_FLAGS
+ try:
+ format_index = result.index(output_format_flag)
+ except ValueError:
+ result.extend(_OUTPUT_FORMAT_FLAGS)
+ else:
+ value_index = format_index + 1
+ if value_index < len(result):
+ result[value_index] = output_format_value
+ else:
+ result.append(output_format_value)
+ return result
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Copilot reads the prompt from stdin (blocking path)."""
+ return stdin_invocation(cmd, prompt)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse best-effort; return ``None`` for unknown shapes.
+
+ The Copilot event schema is ``[unverified]`` in the spec — this
+ method intentionally errs on the side of *not* inventing events
+ so the turn cap is never inflated by false positives.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = parsed.get("type") or parsed.get("event") or parsed.get("kind")
+ if not isinstance(event_type, str):
+ return None
+
+ if event_type in _TOOL_USE_EVENT_TYPES:
+ name = parsed.get("name") or parsed.get("tool")
+ return AdapterEvent(
+ kind="tool_use",
+ name=name if isinstance(name, str) else None,
+ raw=parsed,
+ )
+ if event_type in _RESULT_EVENT_TYPES:
+ return AdapterEvent(kind="result", raw=parsed)
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the entire stdout for the promise tag.
+
+ Without a verified event schema there is no reliable per-event
+ extraction path; the whole-stdout scan is the safest fallback.
+ *result_text* is unused — Copilot runs on the blocking path and
+ does not produce a streaming result event today.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "Copilot CLI has no hook system as of 2026-04; max_turns "
+ "will hard-kill without soft wind-down signal."
+ )
+
+
+ADAPTERS.append(CopilotAdapter())
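
Review note (not part of the patch): the "append or overwrite" behavior of `build_command` can be sketched as a standalone helper. `force_flag_value` is a hypothetical name used only here. The sketch handles the space-separated form (`--flag value`) only and, as an assumption worth checking against the real CLI, does not recognize `--flag=value`.

```python
from __future__ import annotations


def force_flag_value(cmd: list[str], flag: str, value: str) -> list[str]:
    """Return a copy of *cmd* in which *flag* is followed by *value*."""
    result = list(cmd)
    try:
        index = result.index(flag)
    except ValueError:
        # Flag absent: append both tokens.
        result.extend((flag, value))
        return result
    if index + 1 < len(result):
        # Flag present with a following token: overwrite that token.
        result[index + 1] = value
    else:
        # Flag is the last token: supply the missing value.
        result.append(value)
    return result


if __name__ == "__main__":
    once = force_flag_value(["copilot"], "--output-format", "json")
    twice = force_flag_value(once, "--output-format", "json")
    assert once == twice  # idempotent
    print(once)
```

One consequence of this shape: if the flag is followed by another flag rather than a value, that following token is overwritten.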
diff --git a/src/ralphify/adapters/crush.py b/src/ralphify/adapters/crush.py
new file mode 100644
index 00000000..e47ead68
--- /dev/null
+++ b/src/ralphify/adapters/crush.py
@@ -0,0 +1,130 @@
+"""Charm Crush CLI adapter.
+
+Crush (https://github.com/charmbracelet/crush) is TUI-first but ships a
+``crush run`` subcommand for non-interactive single-prompt use:
+
+ crush run "<prompt>" # prompt as positional args
+ echo "<prompt>" | crush run # prompt piped on stdin
+
+``crush run`` auto-approves every permission request for the duration of
+the invocation, so it runs fully autonomously in a loop without a
+``--yolo``-style flag. A provider must be configured first (via env vars
+such as ``ANTHROPIC_API_KEY`` or a ``crush.json``); otherwise ``run`` exits
+with "no providers configured".
+
+Capability matrix:
+
+- ``counts_what = "none"`` — crush emits **plain text / markdown only**.
+ There is no ``--json`` / ``--output-format`` / streaming-event mode, so
+ there are no tool-use or turn events to count against ``max_turns``.
+- ``supports_streaming = False`` — no parseable event stream; the adapter
+ runs on the blocking path and never parses per-line events.
+- ``renders_structured_peek = False`` — peek panel stays in raw-line mode.
+- ``supports_soft_wind_down = False`` — crush has no hook system, so
+ ``install_wind_down_hook`` raises :class:`NotImplementedError` (the
+ engine downgrades this to hard-cap-only).
+- ``requires_full_stdout_for_completion = True`` — with no streaming
+ result event, promise detection scans the full stdout buffer.
+
+Because crush gives ralphify no structured output, this adapter behaves
+like the generic stdin adapter; its job is to claim the ``crush`` binary
+stem, inject the headless ``--quiet`` flag, and provide a named home for a
+future JSON-output upgrade should Crush add one.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+ stdin_invocation,
+)
+
+
+CRUSH_BINARY_STEM = "crush"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the Crush CLI."""
+
+_QUIET_FLAGS: frozenset[str] = frozenset({"--quiet", "-q"})
+
+
+class CrushAdapter:
+ """Runs ``crush run`` on the blocking path; crush has no structured output."""
+
+ name: str = "crush"
+ counts_what: CountsWhat = "none"
+ supports_streaming: bool = False
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = False
+ # crush emits no streaming result event; the full stdout buffer is the
+ # only source for promise-tag scanning.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == CRUSH_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Append ``--quiet`` to hide crush's spinner in non-interactive runs.
+
+ Idempotent: skips injection when the caller already supplied
+ ``--quiet`` or its ``-q`` short form. ``--quiet`` is a ``run``
+ subcommand flag, so it is appended after any existing args (e.g.
+ ``crush run`` -> ``crush run --quiet``).
+ """
+ result = list(cmd)
+ if _QUIET_FLAGS.isdisjoint(result):
+ result.append("--quiet")
+ return result
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """crush reads the prompt from stdin (blocking path)."""
+ return stdin_invocation(cmd, prompt)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """crush has no structured event stream; nothing to parse.
+
+ Returns ``None`` unconditionally, like the generic adapter. Never
+ raises (per FR-8).
+ """
+ del line
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the full stdout for the ``<promise>{signal}</promise>`` tag.
+
+ *result_text* is unused — crush runs on the blocking path and emits
+ no streaming result event. The engine opts into
+ ``requires_full_stdout_for_completion`` so *stdout* is supplied when
+ promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "crush has no hook system; soft wind-down is unavailable and "
+ "max_turns will hard-kill without a wind-down signal."
+ )
+
+
+ADAPTERS.append(CrushAdapter())
diff --git a/src/ralphify/adapters/opencode.py b/src/ralphify/adapters/opencode.py
new file mode 100644
index 00000000..7176acfa
--- /dev/null
+++ b/src/ralphify/adapters/opencode.py
@@ -0,0 +1,189 @@
+"""opencode CLI adapter.
+
+opencode delivers the prompt as a positional argument (``opencode run
+"<prompt>"``) rather than on stdin, and emits newline-delimited JSON when
+invoked with ``--format json``. Each line looks like::
+
+ {"type": "<event-type>", "part": {...}}
+
+This adapter is the first *arg-delivery* adapter: :meth:`deliver_prompt`
+appends the prompt to argv and returns ``stdin_text=None`` so ``_agent.py``
+spawns the child with ``stdin=DEVNULL`` and runs no writer thread.
+
+Event mapping:
+
+- ``tool_use`` -> ``AdapterEvent(kind="tool_use", ...)`` (name best-effort
+ from ``part``).
+- ``step_finish`` -> ``AdapterEvent(kind="result", ...)`` (carries token /
+ cost data in ``part`` that this adapter does not surface).
+- ``step_start`` / ``text`` / ``reasoning`` / ``error`` ->
+ ``AdapterEvent(kind="message")`` so callers can render them without
+ counting against the turn cap.
+- unknown / malformed -> ``None`` (MUST NOT raise, for parity with the
+ other adapters).
+
+Completion detection mirrors :mod:`codex`: opencode has no terminal
+``{"type":"result"}`` line that the streaming reader extracts into
+``result_text``, so the adapter scans the full stdout buffer for the
+``<promise>...</promise>`` tag.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ CountsWhat,
+ Invocation,
+)
+
+
+OPENCODE_BINARY_STEM = "opencode"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the opencode CLI."""
+
+_FORMAT_FLAGS: tuple[str, ...] = ("--format", "json")
+
+_TOOL_USE_EVENT = "tool_use"
+_RESULT_EVENT = "step_finish"
+# opencode v1.15.x emits these informational events (verified against the
+# run.ts emit() call sites): they render in the peek panel without counting
+# against the turn cap. There is no ``tool_result`` event — tool output rides
+# on the ``tool_use`` event once the tool part reaches completed/error status.
+_MESSAGE_EVENTS: frozenset[str] = frozenset(
+ {"step_start", "text", "reasoning", "error"}
+)
+
+
+class OpenCodeAdapter:
+ """Parses opencode's ``--format json`` event stream; delivers prompt by arg."""
+
+ name: str = "opencode"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = True
+ # The console peek panel only understands Claude's stream-json schema
+ # today, so keep opencode in raw-line peek mode (as with codex).
+ renders_structured_peek: bool = False
+ # opencode has no hook system; soft wind-down is a Phase-3 stub anyway.
+ supports_soft_wind_down: bool = False
+ # opencode emits no terminal ``{"type":"result"}`` line that the
+ # streaming reader extracts into ``result_text``; the full stdout
+ # buffer is the only source for promise-tag scanning.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == OPENCODE_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Ensure ``--format json`` is present.
+
+ Idempotent: running twice yields the same command. If the caller
+ already supplied ``--format`` with another value, that value is
+ overwritten with ``json``, because a user-chosen format cannot be
+ honored while still emitting a parseable event stream.
+ """
+ result = list(cmd)
+ format_flag, format_value = _FORMAT_FLAGS
+ try:
+ format_index = result.index(format_flag)
+ except ValueError:
+ result.extend(_FORMAT_FLAGS)
+ else:
+ value_index = format_index + 1
+ if value_index < len(result):
+ result[value_index] = format_value
+ else:
+ result.append(format_value)
+ return result
+
+ def deliver_prompt(self, cmd: list[str], prompt: str) -> Invocation:
+ """Append the prompt as a positional arg; opencode does not read stdin."""
+ return Invocation(argv=[*cmd, prompt], stdin_text=None)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Classify one JSONL line as tool_use / result / message.
+
+ Empty lines, non-JSON payloads, and non-dict JSON return ``None``.
+ Never raises on garbage input (FR-5).
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = parsed.get("type")
+ if event_type == _TOOL_USE_EVENT:
+ return AdapterEvent(
+ kind="tool_use",
+ name=_tool_name(parsed),
+ raw=parsed,
+ )
+ if event_type == _RESULT_EVENT:
+ return AdapterEvent(kind="result", raw=parsed)
+ if event_type in _MESSAGE_EVENTS:
+ return AdapterEvent(kind="message", raw=parsed)
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the full stdout for the ``<promise>{signal}</promise>`` tag.
+
+ *result_text* is unused — opencode never populates it through the
+ streaming reader (no ``{"type":"result"}`` lines). The engine opts
+ into ``requires_full_stdout_for_completion`` to make sure *stdout*
+ is supplied when promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "opencode has no hook system; soft wind-down is scheduled for "
+ "Phase 3 of the CLI adapter layer spec."
+ )
+
+
+def _tool_name(parsed: dict) -> str | None:
+ """Best-effort extraction of the tool name from a ``tool_use`` event.
+
+ opencode nests event data under ``part``; the tool name may live there
+ or at the top level depending on the build. Returns ``None`` when no
+ string name is found.
+ """
+ for key in ("name", "tool", "tool_name"):
+ value = parsed.get(key)
+ if isinstance(value, str):
+ return value
+ part = parsed.get("part")
+ if isinstance(part, dict):
+ for key in ("name", "tool", "tool_name"):
+ value = part.get(key)
+ if isinstance(value, str):
+ return value
+ return None
+
+
+ADAPTERS.append(OpenCodeAdapter())
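
Review note (not part of the patch): the difference between stdin delivery and the new arg delivery can be sketched with a stand-in `Invocation`. The dataclass, `deliver_by_arg`, `deliver_by_stdin`, and `run` below are hypothetical illustrations, not ralphify's actual types or spawn logic. The child processes are small Python one-liners so the example runs without any agent CLI installed.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical stand-in: argv to spawn plus optional stdin text."""

    argv: list[str]
    stdin_text: str | None


def deliver_by_stdin(cmd: list[str], prompt: str) -> Invocation:
    return Invocation(argv=list(cmd), stdin_text=prompt)


def deliver_by_arg(cmd: list[str], prompt: str) -> Invocation:
    # The prompt becomes one argv element; no shell parses it.
    return Invocation(argv=[*cmd, prompt], stdin_text=None)


def run(invocation: Invocation) -> str:
    """Spawn the child; close stdin when there is nothing to write."""
    completed = subprocess.run(
        invocation.argv,
        input=invocation.stdin_text,
        stdin=subprocess.DEVNULL if invocation.stdin_text is None else None,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


ECHO_ARG = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]
ECHO_STDIN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

if __name__ == "__main__":
    prompt = "fix the 'parser' bug; echo $HOME"
    assert run(deliver_by_arg(ECHO_ARG, prompt)) == prompt
    assert run(deliver_by_stdin(ECHO_STDIN, prompt)) == prompt
```

Passing the prompt as a list element means quotes, semicolons, and `$` reach the child unchanged, which is the property the changelog describes as "no quoting hazards".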
diff --git a/src/ralphify/cli.py b/src/ralphify/cli.py
index 6652d535..3aee6d1f 100644
--- a/src/ralphify/cli.py
+++ b/src/ralphify/cli.py
@@ -27,11 +27,18 @@
CMD_FIELD_NAME,
CMD_FIELD_RUN,
CMD_FIELD_TIMEOUT,
+ HOOK_FIELD_EVENT,
+ HOOK_FIELD_RUN,
NAME_RE,
FIELD_AGENT,
FIELD_ARGS,
FIELD_COMMANDS,
FIELD_CREDIT,
+ FIELD_COMPLETION_SIGNAL,
+ FIELD_HOOKS,
+ FIELD_MAX_TURNS,
+ FIELD_MAX_TURNS_GRACE,
+ FIELD_STOP_ON_COMPLETION_SIGNAL,
RALPH_MARKER,
VALID_NAME_CHARS_MSG,
parse_frontmatter,
@@ -39,12 +46,14 @@
from ralphify._output import IS_WINDOWS
from ralphify._run_types import (
Command,
+ DEFAULT_COMPLETION_SIGNAL,
DEFAULT_COMMAND_TIMEOUT,
RunConfig,
RunState,
generate_run_id,
)
from ralphify.engine import run_loop
+from ralphify.hooks import HOOK_EVENT_NAMES, AgentHook, ShellAgentHook
if IS_WINDOWS:
sys.stdout.reconfigure(encoding="utf-8")
@@ -56,13 +65,11 @@
def _exit_error(msg: str) -> NoReturn:
- """Print an error in red and exit with code 1."""
_console.print(f"[red]{escape_markup(msg)}[/]")
raise typer.Exit(1)
def _is_nonempty_string(value: Any) -> bool:
- """Return True if *value* is a non-empty string (after stripping whitespace)."""
return isinstance(value, str) and bool(value.strip())
@@ -128,6 +135,10 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None:
run: git log --oneline -5
args:
- focus
+# Optional early exit: uncomment both lines and have the agent emit
+# <promise>COMPLETE</promise> when the task is done.
+# completion_signal: COMPLETE
+# stop_on_completion_signal: true
---
You are an autonomous coding agent running in a loop. Each iteration
@@ -146,6 +157,7 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None:
- Implement one thing per iteration
+- If you enable promise completion above, emit <promise>COMPLETE</promise> and exit
- Run tests and fix failures before committing
- Commit with a descriptive message and push
"""
@@ -202,7 +214,13 @@ def scaffold(
help="Directory name. If omitted, creates RALPH.md in the current directory.",
),
) -> None:
- """Scaffold a new ralph with a ready-to-customize template."""
+ """Scaffold a new ralph with a ready-to-customize template.
+
+ The template includes example commands and args plus an optional
+ completion-signal path you can enable with ``completion_signal`` and
+ ``stop_on_completion_signal`` when the agent should stop early by
+ emitting a matching ``<promise>...</promise>`` tag.
+ """
if name:
target_dir = Path.cwd() / name
target_dir.mkdir(exist_ok=True)
@@ -219,6 +237,13 @@ def scaffold(
_console.print(
f"[dim]Edit the file, then run:[/] ralph run {escape_markup(name or '.')}"
)
+ _console.print(
+ "[dim]Optional early exit:[/] uncomment "
+ f"{escape_markup(FIELD_COMPLETION_SIGNAL)} + "
+ f"{escape_markup(FIELD_STOP_ON_COMPLETION_SIGNAL)} "
+ "and emit "
+ f"{escape_markup('<promise>COMPLETE</promise>')}."
+ )
def _parse_user_args(
@@ -428,6 +453,107 @@ def _validate_credit(raw_credit: Any) -> bool:
return raw_credit
+def _validate_completion_signal(raw_signal: Any) -> str:
+ """Validate the inner ``<promise>...</promise>`` text from frontmatter."""
+ if raw_signal is None:
+ return DEFAULT_COMPLETION_SIGNAL
+ if not _is_nonempty_string(raw_signal):
+ _exit_error(f"'{FIELD_COMPLETION_SIGNAL}' must be a non-empty string.")
+ if raw_signal != raw_signal.strip():
+ _exit_error(
+ f"'{FIELD_COMPLETION_SIGNAL}' must not include leading or trailing whitespace."
+ )
+ if "<" in raw_signal or ">" in raw_signal:
+ _exit_error(
+ f"'{FIELD_COMPLETION_SIGNAL}' must be the text inside "
+ "<promise>...</promise>, not markup or a raw output fragment. "
+ "Example: completion_signal: COMPLETE"
+ )
+ return raw_signal
+
+
+def _validate_max_turns(raw: Any) -> int | None:
+ """Validate the ``max_turns`` frontmatter field.
+
+ Returns ``None`` when absent (no cap). Exits with an error when the
+ value is not a positive integer.
+ """
+ if raw is None:
+ return None
+ if isinstance(raw, bool) or not isinstance(raw, int) or raw < 1:
+ _exit_error(f"'{FIELD_MAX_TURNS}' must be a positive integer, got {raw!r}.")
+ return raw
+
+
+def _validate_max_turns_grace(raw: Any, max_turns: int | None) -> int:
+ """Validate the ``max_turns_grace`` field.
+
+ Defaults to ``2`` when absent. Must be a non-negative integer and
+ strictly less than *max_turns* (when *max_turns* is set).
+ """
+ if raw is None:
+ grace = 2
+ elif isinstance(raw, bool) or not isinstance(raw, int) or raw < 0:
+ _exit_error(
+ f"'{FIELD_MAX_TURNS_GRACE}' must be a non-negative integer, got {raw!r}."
+ )
+ else:
+ grace = raw
+ if max_turns is not None and grace >= max_turns:
+ _exit_error(
+ f"'{FIELD_MAX_TURNS_GRACE}' ({grace}) must be less than "
+ f"'{FIELD_MAX_TURNS}' ({max_turns})."
+ )
+ return grace
+
+
+def _validate_stop_on_completion_signal(raw_value: Any) -> bool:
+ """Validate the stop-on-completion-signal frontmatter field."""
+ if raw_value is None:
+ return False
+ if not isinstance(raw_value, bool):
+ _exit_error(
+ f"'{FIELD_STOP_ON_COMPLETION_SIGNAL}' must be true or false, got {raw_value!r}."
+ )
+ return raw_value
+
+
+def _validate_hooks(raw: Any) -> list[AgentHook]:
+ """Validate the ``hooks`` frontmatter field and return :class:`ShellAgentHook` list.
+
+ The field is a list of ``{event, run}`` mappings. Each ``event`` must
+ be one of the names in :data:`ralphify.hooks.HOOK_EVENT_NAMES`.
+ Returns an empty list when the field is absent.
+ """
+ if raw is None:
+ return []
+ if not isinstance(raw, list):
+ _exit_error(f"'{FIELD_HOOKS}' must be a list of {{event, run}} mappings.")
+ hooks: list[AgentHook] = []
+ for entry in raw:
+ if (
+ not isinstance(entry, dict)
+ or HOOK_FIELD_EVENT not in entry
+ or HOOK_FIELD_RUN not in entry
+ ):
+ _exit_error(
+ f"Each hook must have '{HOOK_FIELD_EVENT}' and '{HOOK_FIELD_RUN}' fields."
+ )
+ event = entry[HOOK_FIELD_EVENT]
+ command = entry[HOOK_FIELD_RUN]
+ if not _is_nonempty_string(event):
+ _exit_error(f"Hook '{HOOK_FIELD_EVENT}' must be a non-empty string.")
+ if event not in HOOK_EVENT_NAMES:
+ _exit_error(
+ f"Unknown hook event {event!r}. "
+ f"Valid events: {', '.join(sorted(HOOK_EVENT_NAMES))}."
+ )
+ if not _is_nonempty_string(command):
+ _exit_error(f"Hook '{HOOK_FIELD_RUN}' must be a non-empty string.")
+ hooks.append(ShellAgentHook(event=event, command=command))
+ return hooks
+
+
def _validate_run_options(
max_iterations: int | None,
delay: float,
@@ -471,6 +597,15 @@ def _build_run_config(
ralph_args = _parse_user_args(extra_args, declared_names)
credit = _validate_credit(fm.get(FIELD_CREDIT))
+ completion_signal = _validate_completion_signal(fm.get(FIELD_COMPLETION_SIGNAL))
+ stop_on_completion_signal = _validate_stop_on_completion_signal(
+ fm.get(FIELD_STOP_ON_COMPLETION_SIGNAL)
+ )
+ max_turns = _validate_max_turns(fm.get(FIELD_MAX_TURNS))
+ max_turns_grace = _validate_max_turns_grace(
+ fm.get(FIELD_MAX_TURNS_GRACE), max_turns
+ )
+ hooks = _validate_hooks(fm.get(FIELD_HOOKS))
return RunConfig(
agent=agent,
@@ -485,6 +620,11 @@ def _build_run_config(
log_dir=Path(log_dir) if log_dir else None,
project_root=Path.cwd(),
credit=credit,
+ completion_signal=completion_signal,
+ stop_on_completion_signal=stop_on_completion_signal,
+ max_turns=max_turns,
+ max_turns_grace=max_turns_grace,
+ hooks=hooks,
)
@@ -534,6 +674,10 @@ def run(
passed as user arguments. Use {{ args.name }} placeholders in
RALPH.md to reference them.
+ To stop before the iteration budget, set ``completion_signal`` and
+ ``stop_on_completion_signal`` in frontmatter and have the agent emit
+ the matching ``...`` tag.
+
Keybindings (interactive terminal):
p Toggle live peek of agent output (on by default)
P Enter full-screen peek — scroll the entire buffer
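The `max_turns` and `max_turns_grace` validators added earlier in this file's diff can be exercised in isolation. The following is a minimal sketch of the same rules, not the project's code: `check_turn_limits` and `is_rejected` are hypothetical names, and the sketch raises `ValueError` where the real validators call `_exit_error`. It shows why the explicit `bool` check is needed (`bool` is a subclass of `int`) and that the default grace of 2 is also subject to the "less than `max_turns`" rule.

```python
from typing import Any, Optional, Tuple


def check_turn_limits(raw_max: Any, raw_grace: Any) -> Tuple[Optional[int], int]:
    """Hypothetical stand-in for the two validators; raises instead of exiting."""
    if raw_max is not None:
        # bool is a subclass of int, so True/False must be rejected explicitly.
        if isinstance(raw_max, bool) or not isinstance(raw_max, int) or raw_max < 1:
            raise ValueError(f"max_turns must be a positive integer, got {raw_max!r}")

    if raw_grace is None:
        grace = 2  # default taken from the diff
    elif isinstance(raw_grace, bool) or not isinstance(raw_grace, int) or raw_grace < 0:
        raise ValueError(
            f"max_turns_grace must be a non-negative integer, got {raw_grace!r}"
        )
    else:
        grace = raw_grace

    # The comparison also applies to the default grace, not only to user input.
    if raw_max is not None and grace >= raw_max:
        raise ValueError(
            f"max_turns_grace ({grace}) must be less than max_turns ({raw_max})"
        )
    return raw_max, grace


def is_rejected(raw_max: Any, raw_grace: Any) -> bool:
    """Return True when the pair fails validation."""
    try:
        check_turn_limits(raw_max, raw_grace)
    except ValueError:
        return True
    return False
```

Under these rules `is_rejected(2, None)` is true: with no explicit grace, the default of 2 is not below a cap of 2, so a frontmatter that sets only `max_turns: 2` would need `max_turns_grace: 0` or `1` as well.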
diff --git a/src/ralphify/engine.py b/src/ralphify/engine.py
index 1072ee05..dcff2e14 100644
--- a/src/ralphify/engine.py
+++ b/src/ralphify/engine.py
@@ -13,6 +13,7 @@
import traceback
from datetime import datetime, timezone
from pathlib import Path
+from typing import Any
from ralphify._agent import execute_agent
from ralphify._events import (
@@ -29,6 +30,9 @@
PromptAssembledData,
RunStartedData,
RunStoppedData,
+ ToolUseData,
+ TurnApproachingLimitData,
+ TurnCappedData,
)
from ralphify._frontmatter import (
FIELD_AGENT,
@@ -45,6 +49,8 @@
)
from ralphify._resolver import resolve_all, resolve_args
from ralphify._runner import run_command
+from ralphify.adapters import select_adapter
+from ralphify.hooks import CombinedAgentHook
_PAUSE_POLL_INTERVAL = 0.25 # seconds between pause/resume checks
@@ -52,7 +58,6 @@
def _field_hint(field_name: str) -> str:
- """Return a user-facing hint pointing to a frontmatter field."""
return f"Check the '{field_name}' field in your {RALPH_MARKER} frontmatter."
@@ -103,8 +108,6 @@ def _run_commands(
quoted_args = {k: shlex.quote(v) for k, v in user_args.items()}
for cmd in commands:
run_str = resolve_args(cmd.run, quoted_args)
- # Determine working directory: if the command starts with ./ it's
- # relative to the ralph directory, otherwise use project root.
if run_str.lstrip().startswith(_RELATIVE_CMD_PREFIX):
cwd = ralph_dir
else:
@@ -150,11 +153,16 @@ def _assemble_prompt(
) -> str:
"""Build the full prompt for one iteration.
- Reads the RALPH.md body, resolves user args, command output, and
- context placeholders.
+ Uses ``config.prompt`` as the body when set (no file read, no
+ frontmatter parse); otherwise reads the RALPH.md body. Either way it
+ resolves user args, command output, and context placeholders.
"""
- raw = config.ralph_file.read_text(encoding="utf-8")
- _, prompt = parse_frontmatter(raw)
+ if config.prompt is not None:
+ prompt = config.prompt
+ else:
+ assert config.ralph_file is not None # __post_init__ guarantees one is set
+ raw = config.ralph_file.read_text(encoding="utf-8")
+ _, prompt = parse_frontmatter(raw)
ralph_context = _build_ralph_context(config, state)
prompt = resolve_all(prompt, command_outputs, config.args, ralph_context)
if config.credit:
@@ -167,10 +175,11 @@ def _run_agent_phase(
config: RunConfig,
state: RunState,
emit: BoundEmitter,
-) -> bool:
+ hooks: CombinedAgentHook | None,
+) -> tuple[bool, bool]:
"""Run the agent subprocess, update state counters, and emit the result event.
- Returns ``True`` when the agent exited successfully (code 0, no timeout).
+ Returns ``(agent_succeeded, stop_for_completion_signal)``.
"""
try:
cmd = shlex.split(config.agent)
@@ -179,9 +188,9 @@ def _run_agent_phase(
f"Invalid agent command syntax: {config.agent!r}. {_field_hint(FIELD_AGENT)}"
) from exc
- # Option C: recheck per-line so mid-iteration peek toggle takes effect.
- # When neither peek nor logging needs output, pass None so the blocking
- # path can inherit file descriptors (critical-01 contract).
+ adapter = select_adapter(cmd)
+ completion_signal = config.completion_signal
+
def _on_output_line(line: str, stream: OutputStream) -> None:
if emit.wants_agent_output_lines():
emit.agent_output_line(line, stream, state.iteration)
@@ -191,18 +200,46 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
else:
on_output_line = None
+ # Capture full stdout only when somebody downstream actually needs the
+ # bytes — log writing, or promise detection for adapters that cannot
+ # work from ``agent.result_text`` alone. Without this gate every
+ # iteration would buffer the entire transcript even for verbose
+ # streaming agents, regressing memory vs the prior tail-scan path.
+ capture_stdout_for_promise = (
+ config.stop_on_completion_signal and adapter.requires_full_stdout_for_completion
+ )
+ capture_stdout = config.log_dir is not None or capture_stdout_for_promise
+
+ on_tool_use = _build_tool_use_bridge(
+ state=state,
+ emit=emit,
+ hooks=hooks,
+ max_turns=config.max_turns,
+ max_turns_grace=config.max_turns_grace,
+ )
+
try:
+
+ def on_activity(data: dict[str, Any]) -> None:
+ emit(
+ EventType.AGENT_ACTIVITY,
+ AgentActivityData(raw=data, iteration=state.iteration),
+ )
+
agent = execute_agent(
cmd,
prompt,
timeout=config.timeout,
log_dir=config.log_dir,
iteration=state.iteration,
- on_activity=lambda data: emit(
- EventType.AGENT_ACTIVITY,
- AgentActivityData(raw=data, iteration=state.iteration),
- ),
+ adapter=adapter,
+ on_activity=on_activity,
on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=capture_stdout,
+ max_turns=config.max_turns,
+ max_turns_grace=config.max_turns_grace,
+ on_tool_use=on_tool_use,
)
except FileNotFoundError as exc:
raise FileNotFoundError(
@@ -210,15 +247,48 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
) from exc
duration = format_duration(agent.elapsed)
+ promise_completed = agent.success and adapter.extract_completion_signal(
+ result_text=agent.result_text,
+ stdout=agent.captured_stdout,
+ user_signal=completion_signal,
+ )
+ if promise_completed:
+ state.promise_completed = True
+
+ if agent.turn_capped:
+ emit(
+ EventType.ITERATION_TURN_CAPPED,
+ TurnCappedData(
+ iteration=state.iteration,
+ count=agent.tool_use_count,
+ ),
+ )
+ if hooks is not None:
+ hooks.on_turn_capped(
+ iteration=state.iteration,
+ count=agent.tool_use_count,
+ )
if agent.timed_out:
state.mark_timed_out()
event_type = EventType.ITERATION_TIMED_OUT
state_detail = f"timed out after {duration}"
+ elif agent.turn_capped:
+ state.mark_completed()
+ event_type = EventType.ITERATION_COMPLETED
+ state_detail = (
+ f"completed at turn cap ({agent.tool_use_count} tool uses, {duration})"
+ )
elif agent.success:
state.mark_completed()
event_type = EventType.ITERATION_COMPLETED
- state_detail = f"completed ({duration})"
+ if promise_completed:
+ state_detail = (
+ "completed via promise tag "
+ f"{completion_signal} ({duration})"
+ )
+ else:
+ state_detail = f"completed ({duration})"
else:
state.mark_failed()
event_type = EventType.ITERATION_FAILED
@@ -233,32 +303,127 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
log_file=str(agent.log_file) if agent.log_file else None,
result_text=agent.result_text,
)
- # When logging captured output and peek was off (lines were not rendered
- # live), include captured output so the emitter can echo it after
- # stopping the Live spinner. When peek was on, lines were already shown.
- if not emit.wants_agent_output_lines() and config.log_dir is not None:
- ended_data["echo_stdout"] = agent.captured_stdout
- ended_data["echo_stderr"] = agent.captured_stderr
+ if not emit.wants_agent_output_lines():
+ # When peek was off, echo any captured raw output after the spinner
+ # stops so blocking agents do not appear silent. Structured agents
+ # already surface their parsed result_text, so avoid echoing raw JSON
+ # unless we explicitly captured logs.
+ if config.log_dir is not None:
+ ended_data["echo_stdout"] = agent.captured_stdout
+ ended_data["echo_stderr"] = agent.captured_stderr
+ elif agent.result_text is None and agent.captured_stdout is not None:
+ ended_data["echo_stdout"] = agent.captured_stdout
emit(event_type, ended_data)
- return agent.success
+ if hooks is not None:
+ hooks.on_iteration_completed(
+ iteration=state.iteration,
+ result={
+ "returncode": agent.returncode,
+ "timed_out": agent.timed_out,
+ "turn_capped": agent.turn_capped,
+ "tool_use_count": agent.tool_use_count,
+ "duration": agent.elapsed,
+ "result_text": agent.result_text,
+ },
+ )
+ if promise_completed:
+ hooks.on_completion_signal(
+ iteration=state.iteration,
+ signal=completion_signal,
+ )
+ return agent.success, promise_completed and config.stop_on_completion_signal
+
+
+def _build_tool_use_bridge(
+ *,
+ state: RunState,
+ emit: BoundEmitter,
+ hooks: CombinedAgentHook | None,
+ max_turns: int | None,
+ max_turns_grace: int,
+):
+ """Return a ``ToolUseCallback`` that emits ``TOOL_USE`` and approaching-limit events.
+
+ Collapses the per-tool-use notification shape expected by ``_agent``
+ (``(tool_name, count)``) into structured events plus the hook
+ notifications. Returns ``None`` when ``max_turns`` is unset and no
+ hooks are configured; the streaming path then skips all per-line
+ overhead.
+ """
+ if max_turns is None and hooks is None:
+ return None
+
+ # Clamp the grace below the cap. ``RunConfig`` does not reject a grace
+ # >= max_turns the way the CLI does, so an unclamped value would make
+ # the threshold <= 0 and fire ITERATION_TURN_APPROACHING_LIMIT on the
+ # first tool use. Mirrors the wind-down shim's clamp.
+ approaching_threshold = (
+ (max_turns - min(max_turns_grace, max(max_turns - 1, 0)))
+ if max_turns is not None and max_turns_grace > 0
+ else None
+ )
+ approaching_fired = False
+
+ def _on_tool_use(tool_name: str, count: int) -> None:
+ nonlocal approaching_fired
+ emit(
+ EventType.TOOL_USE,
+ ToolUseData(
+ iteration=state.iteration,
+ tool_name=tool_name,
+ count=count,
+ ),
+ )
+ if hooks is not None:
+ hooks.on_tool_use(
+ iteration=state.iteration,
+ tool_name=tool_name,
+ count=count,
+ )
+ if (
+ not approaching_fired
+ and approaching_threshold is not None
+ and max_turns is not None
+ and count >= approaching_threshold
+ and count < max_turns
+ ):
+ approaching_fired = True
+ emit(
+ EventType.ITERATION_TURN_APPROACHING_LIMIT,
+ TurnApproachingLimitData(
+ iteration=state.iteration,
+ count=count,
+ max_turns=max_turns,
+ ),
+ )
+ if hooks is not None:
+ hooks.on_turn_approaching_limit(
+ iteration=state.iteration,
+ count=count,
+ max_turns=max_turns,
+ )
+
+ return _on_tool_use
def _run_iteration(
config: RunConfig,
state: RunState,
emit: BoundEmitter,
-) -> bool:
+ hooks: CombinedAgentHook | None,
+) -> tuple[bool, bool]:
"""Execute one iteration of the agent loop.
- Returns ``True`` if the loop should continue, ``False`` when
- ``--stop-on-error`` triggers.
+ Returns (should_continue, stop_for_completion_signal):
+ - should_continue: True if the loop should continue, False to break
+ - stop_for_completion_signal: True if a completion signal ended the run early
"""
iteration = state.iteration
emit(EventType.ITERATION_STARTED, IterationStartedData(iteration=iteration))
+ if hooks is not None:
+ hooks.on_iteration_started(iteration=iteration)
- # Run commands and collect outputs for placeholder resolution
command_outputs: dict[str, str] = {}
if config.commands:
emit(
@@ -278,23 +443,27 @@ def _run_iteration(
count=len(command_outputs),
),
)
+ if hooks is not None:
+ hooks.on_commands_completed(iteration=iteration, outputs=command_outputs)
- # Assemble prompt
prompt = _assemble_prompt(config, state, command_outputs)
emit(
EventType.PROMPT_ASSEMBLED,
PromptAssembledData(iteration=iteration, prompt_length=len(prompt)),
)
+ if hooks is not None:
+ hooks.on_prompt_assembled(iteration=iteration, prompt=prompt)
- # Run agent
- agent_succeeded = _run_agent_phase(prompt, config, state, emit)
+ agent_succeeded, stop_for_completion_signal = _run_agent_phase(
+ prompt, config, state, emit, hooks
+ )
if not agent_succeeded and config.stop_on_error:
state.status = RunStatus.FAILED
emit.log_error("Stopping due to --stop-on-error.")
- return False
+ return False, stop_for_completion_signal
- return True
+ return True, stop_for_completion_signal
def _delay_if_needed(config: RunConfig, state: RunState, emit: BoundEmitter) -> None:
@@ -326,6 +495,8 @@ def run_loop(
state.status = RunStatus.RUNNING
state.started_at = datetime.now(timezone.utc)
+ hooks = CombinedAgentHook(list(config.hooks)) if config.hooks else None
+
if config.log_dir:
config.log_dir.mkdir(parents=True, exist_ok=True)
@@ -346,14 +517,19 @@ def run_loop(
if not _handle_control_signals(state, emit):
break
- state.iteration += 1
if (
config.max_iterations is not None
- and state.iteration > config.max_iterations
+ and state.iteration >= config.max_iterations
):
break
+ state.iteration += 1
- should_continue = _run_iteration(config, state, emit)
+ should_continue, stop_for_completion_signal = _run_iteration(
+ config, state, emit, hooks
+ )
+ if stop_for_completion_signal:
+ state.status = RunStatus.COMPLETED
+ break
if not should_continue:
break
@@ -369,11 +545,10 @@ def run_loop(
if state.status == RunStatus.RUNNING:
state.status = RunStatus.COMPLETED
- reason = state.status.reason
emit(
EventType.RUN_STOPPED,
RunStoppedData(
- reason=reason,
+ reason=state.status.reason,
total=state.total,
completed=state.completed,
failed=state.failed,
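The approaching-limit arithmetic in `_build_tool_use_bridge` is easier to check with concrete numbers. The sketch below reproduces only the threshold expression and the one-shot firing condition from the diff; `approaching_threshold` and `counts_that_fire` are hypothetical helper names introduced for illustration and do not exist in the engine.

```python
from typing import List, Optional


def approaching_threshold(max_turns: Optional[int], grace: int) -> Optional[int]:
    """Tool-use count at which the approaching-limit event may first fire."""
    if max_turns is None or grace <= 0:
        return None
    # Clamp the grace below the cap so the threshold never drops to 0 or less.
    clamped = min(grace, max(max_turns - 1, 0))
    return max_turns - clamped


def counts_that_fire(
    max_turns: Optional[int], grace: int, total_tool_uses: int
) -> List[int]:
    """Simulate a run and list the counts at which the event fires."""
    threshold = approaching_threshold(max_turns, grace)
    fired = False
    fired_at: List[int] = []
    for count in range(1, total_tool_uses + 1):
        if (
            not fired
            and threshold is not None
            and max_turns is not None
            and threshold <= count < max_turns
        ):
            fired = True  # one-shot: later tool uses never fire again
            fired_at.append(count)
    return fired_at
```

With `max_turns=10` and a grace of 2 the event fires once, at the eighth tool use. With `max_turns=1` the clamp yields a threshold of 1, and because the condition also requires `count < max_turns`, the event never fires.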
diff --git a/src/ralphify/hooks.py b/src/ralphify/hooks.py
new file mode 100644
index 00000000..9daa8b2c
--- /dev/null
+++ b/src/ralphify/hooks.py
@@ -0,0 +1,235 @@
+"""User-subscribable agent lifecycle hook protocol.
+
+Hooks let downstream consumers (milknado orchestration, user shell
+scripts declared in ``RALPH.md``) react to iteration boundaries,
+tool-use events, and turn-cap signals without coupling to the engine
+internals.
+
+The :class:`AgentHook` Protocol defines keyword-only ``on_*`` callbacks.
+:class:`CombinedAgentHook` fans events out across a list of hooks with
+per-hook exception isolation — a single misbehaving hook script cannot
+take down the run. :class:`ShellAgentHook` is the concrete hook that
+backs the ``hooks:`` frontmatter field: it invokes a shell command with
+the event payload as JSON on stdin.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import shlex
+import subprocess
+from typing import Any, Protocol, runtime_checkable
+
+
+_log = logging.getLogger(__name__)
+
+
+HOOK_EVENT_NAMES: frozenset[str] = frozenset(
+ {
+ "on_iteration_started",
+ "on_commands_completed",
+ "on_prompt_assembled",
+ "on_tool_use",
+ "on_turn_approaching_limit",
+ "on_turn_capped",
+ "on_iteration_completed",
+ "on_completion_signal",
+ }
+)
+"""Valid event names for the ``hooks:`` frontmatter field.
+
+Kept as a frozenset so :mod:`ralphify._frontmatter` can validate user
+configuration without importing :class:`AgentHook` directly.
+"""
+
+
+@runtime_checkable
+class AgentHook(Protocol):
+ """Receive structured notifications at iteration boundaries.
+
+ All methods accept keyword arguments only so adding new fields in
+ future versions stays backward compatible for implementers. Hooks
+ MUST NOT raise. :class:`CombinedAgentHook` catches and logs anything
+ that escapes, but only as a one-line warning, so handle errors inside
+ the hook and log them with enough context to debug.
+ """
+
+ def on_iteration_started(self, *, iteration: int) -> None: ...
+
+ def on_commands_completed(
+ self, *, iteration: int, outputs: dict[str, str]
+ ) -> None: ...
+
+ def on_prompt_assembled(self, *, iteration: int, prompt: str) -> None: ...
+
+ def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None: ...
+
+ def on_turn_approaching_limit(
+ self, *, iteration: int, count: int, max_turns: int
+ ) -> None: ...
+
+ def on_turn_capped(self, *, iteration: int, count: int) -> None: ...
+
+ def on_iteration_completed(
+ self, *, iteration: int, result: dict[str, Any]
+ ) -> None: ...
+
+ def on_completion_signal(self, *, iteration: int, signal: str) -> None: ...
+
+
+class NoOpAgentHook:
+ """Default :class:`AgentHook` that discards every event silently.
+
+ Useful as a base class for hooks that only care about a subset of
+ events — override the methods you need, inherit the rest.
+ """
+
+ def on_iteration_started(self, *, iteration: int) -> None:
+ pass
+
+ def on_commands_completed(self, *, iteration: int, outputs: dict[str, str]) -> None:
+ pass
+
+ def on_prompt_assembled(self, *, iteration: int, prompt: str) -> None:
+ pass
+
+ def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
+ pass
+
+ def on_turn_approaching_limit(
+ self, *, iteration: int, count: int, max_turns: int
+ ) -> None:
+ pass
+
+ def on_turn_capped(self, *, iteration: int, count: int) -> None:
+ pass
+
+ def on_iteration_completed(self, *, iteration: int, result: dict[str, Any]) -> None:
+ pass
+
+ def on_completion_signal(self, *, iteration: int, signal: str) -> None:
+ pass
+
+
+class CombinedAgentHook:
+ """Fan each callback across a list of hooks with exception isolation.
+
+ One misbehaving hook cannot poison the others: exceptions are logged
+ at warning level and the fanout continues.
+ """
+
+ def __init__(self, hooks: list[AgentHook]) -> None:
+ self._hooks = hooks
+
+ def _fanout(self, event_name: str, **kwargs: Any) -> None:
+ for hook in self._hooks:
+ method = getattr(hook, event_name, None)
+ if method is None:
+ continue
+ try:
+ method(**kwargs)
+ except Exception as exc:
+ _log.warning(
+ "hook %r raised in %s: %s",
+ type(hook).__name__,
+ event_name,
+ exc,
+ )
+
+ def on_iteration_started(self, *, iteration: int) -> None:
+ self._fanout("on_iteration_started", iteration=iteration)
+
+ def on_commands_completed(self, *, iteration: int, outputs: dict[str, str]) -> None:
+ self._fanout("on_commands_completed", iteration=iteration, outputs=outputs)
+
+ def on_prompt_assembled(self, *, iteration: int, prompt: str) -> None:
+ self._fanout("on_prompt_assembled", iteration=iteration, prompt=prompt)
+
+ def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
+ self._fanout(
+ "on_tool_use",
+ iteration=iteration,
+ tool_name=tool_name,
+ count=count,
+ )
+
+ def on_turn_approaching_limit(
+ self, *, iteration: int, count: int, max_turns: int
+ ) -> None:
+ self._fanout(
+ "on_turn_approaching_limit",
+ iteration=iteration,
+ count=count,
+ max_turns=max_turns,
+ )
+
+ def on_turn_capped(self, *, iteration: int, count: int) -> None:
+ self._fanout("on_turn_capped", iteration=iteration, count=count)
+
+ def on_iteration_completed(self, *, iteration: int, result: dict[str, Any]) -> None:
+ self._fanout("on_iteration_completed", iteration=iteration, result=result)
+
+ def on_completion_signal(self, *, iteration: int, signal: str) -> None:
+ self._fanout("on_completion_signal", iteration=iteration, signal=signal)
+
+
+class ShellAgentHook(NoOpAgentHook):
+ """Invoke a shell command for one lifecycle event.
+
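+ The event payload is serialized to JSON and written to the command's
+ stdin. The first 500 characters of stdout are logged at info level; a
+ non-zero exit is logged as a warning with the first 200 characters of
+ stderr but does NOT abort the run (per FR-9). The command is parsed
+ with :func:`shlex.split` and run without a shell, so pipes, redirects,
+ globs, and variable expansion are not interpreted.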
+ """
+
+ def __init__(self, event: str, command: str) -> None:
+ if event not in HOOK_EVENT_NAMES:
+ raise ValueError(
+ f"unknown hook event {event!r}; "
+ f"expected one of {sorted(HOOK_EVENT_NAMES)}"
+ )
+ self._event = event
+ self._command = command
+ setattr(self, event, self._invoke)
+
+ def _invoke(self, **payload: Any) -> None:
+ try:
+ data = json.dumps(payload, default=str)
+ except (TypeError, ValueError) as exc:
+ _log.warning("hook %r: failed to serialize payload: %s", self._event, exc)
+ return
+ try:
+ proc = subprocess.run(
+ shlex.split(self._command),
+ input=data,
+ capture_output=True,
+ text=True,
+ check=False,
+ )
+ except (OSError, subprocess.SubprocessError) as exc:
+ _log.warning(
+ "hook %r: command %r failed to start: %s",
+ self._event,
+ self._command,
+ exc,
+ )
+ return
+ if proc.returncode != 0:
+ _log.warning(
+ "hook %r: command %r exited %d (stderr=%r)",
+ self._event,
+ self._command,
+ proc.returncode,
+ proc.stderr[:200],
+ )
+ if proc.stdout:
+ _log.info("hook %r stdout: %s", self._event, proc.stdout[:500])
+
+
+__all__ = [
+ "AgentHook",
+ "CombinedAgentHook",
+ "HOOK_EVENT_NAMES",
+ "NoOpAgentHook",
+ "ShellAgentHook",
+]
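Since `ShellAgentHook._invoke` writes the callback's keyword arguments to the command's stdin as one JSON object, a hook command can be a small script that reads and decodes stdin. The sketch below is one possible consumer, not something shipped with ralphify: `summarize` and `main` are hypothetical names, and the only payload keys it relies on are the ones visible in the `AgentHook` signatures (`iteration`, `tool_name`, `count`).

```python
import json
import sys
from typing import TextIO


def summarize(raw: str) -> str:
    """Turn one JSON hook payload into a single log line."""
    payload = json.loads(raw)
    iteration = payload.get("iteration")
    if "tool_name" in payload:
        # Shape of an on_tool_use payload: iteration, tool_name, count.
        return (
            f"iteration {iteration}: tool use #{payload.get('count')} "
            f"({payload['tool_name']})"
        )
    # Any other event: just list which keys arrived.
    return f"iteration {iteration}: {sorted(payload)}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Entry point for use as a hook command; call it from a __main__ guard."""
    stdout.write(summarize(stdin.read()) + "\n")
```

Saved as a script that calls `main()` and registered for `on_tool_use`, this would print one line per tool use. Because the hook's stdout is captured rather than shown, that line would surface in the log at info level instead of in the terminal.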
diff --git a/src/ralphify/manager.py b/src/ralphify/manager.py
index 3eee5ae6..0be3b367 100644
--- a/src/ralphify/manager.py
+++ b/src/ralphify/manager.py
@@ -7,13 +7,26 @@
from __future__ import annotations
import threading
+import time
+from collections.abc import Sequence
from dataclasses import dataclass, field
from ralphify._events import EventEmitter, FanoutEmitter, QueueEmitter
-from ralphify._run_types import RunConfig, RunState, generate_run_id
+from ralphify._run_types import (
+ RunConfig,
+ RunResult,
+ RunState,
+ RunStatus,
+ generate_run_id,
+)
from ralphify.engine import run_loop
+_TERMINAL_STATUSES = frozenset(
+ {RunStatus.COMPLETED, RunStatus.STOPPED, RunStatus.FAILED}
+)
+
+
@dataclass(slots=True)
class ManagedRun:
"""A run bundled with its background thread and event queue.
@@ -59,6 +72,9 @@ class RunManager:
def __init__(self) -> None:
self._runs: dict[str, ManagedRun] = {}
self._lock = threading.Lock()
+ # Notified once whenever any run thread exits, so waiters can wake
+ # and re-check terminal status without polling.
+ self._done = threading.Condition()
def _lookup(self, run_id: str) -> ManagedRun:
"""Look up a run by ID. Caller must hold ``_lock``."""
@@ -103,9 +119,18 @@ def start_run(self, run_id: str) -> None:
if managed.thread is not None:
raise RuntimeError(f"Run '{run_id}' has already been started")
emitter = managed.build_emitter()
+ config = managed.config
+ state = managed.state
+
+ def target() -> None:
+ try:
+ run_loop(config, state, emitter)
+ finally:
+ with self._done:
+ self._done.notify_all()
+
thread = threading.Thread(
- target=run_loop,
- args=(managed.config, managed.state, emitter),
+ target=target,
daemon=True,
name=f"run-{run_id}",
)
@@ -133,3 +158,107 @@ def get_run(self, run_id: str) -> ManagedRun | None:
"""Look up a run by ID, returning ``None`` if not found."""
with self._lock:
return self._runs.get(run_id)
+
+ def _finished_ids(self, run_ids: Sequence[str]) -> list[str]:
+ """Return the subset of *run_ids* whose status is terminal."""
+ with self._lock:
+ return [
+ run_id
+ for run_id in run_ids
+ if run_id in self._runs
+ and self._runs[run_id].state.status in _TERMINAL_STATUSES
+ ]
+
+ def wait_for_any(
+ self, run_ids: Sequence[str], timeout: float | None = None
+ ) -> list[str]:
+ """Block until at least one of *run_ids* reaches a terminal status.
+
+ Returns the finished run IDs. Returns ``[]`` if *timeout* elapses
+ before any finish. Unknown IDs are ignored (never reported as
+ finished); when *run_ids* is empty or none are registered, returns
+ ``[]`` immediately rather than blocking forever (nothing can wake it).
+ """
+ deadline = None if timeout is None else time.monotonic() + timeout
+ with self._lock:
+ if not any(run_id in self._runs for run_id in run_ids):
+ return []
+ with self._done:
+ while True:
+ finished = self._finished_ids(run_ids)
+ if finished:
+ return finished
+ remaining = None
+ if deadline is not None:
+ remaining = deadline - time.monotonic()
+ if remaining <= 0:
+ return []
+ self._done.wait(timeout=remaining)
+
+ def wait_for_all(
+ self, run_ids: Sequence[str], timeout: float | None = None
+ ) -> bool:
+ """Block until every run in *run_ids* finishes or *timeout* elapses.
+
+ Returns ``True`` iff all finished. Unknown IDs can never finish, so
+ if any ID in *run_ids* is not registered this returns ``False``
+ immediately rather than blocking forever. An empty *run_ids* is
+ vacuously satisfied and returns ``True``.
+ """
+ deadline = None if timeout is None else time.monotonic() + timeout
+ target = set(run_ids)
+ with self._lock:
+ if not all(run_id in self._runs for run_id in run_ids):
+ return False
+ with self._done:
+ while True:
+ if set(self._finished_ids(run_ids)) >= target:
+ return True
+ remaining = None
+ if deadline is not None:
+ remaining = deadline - time.monotonic()
+ if remaining <= 0:
+ return False
+ self._done.wait(timeout=remaining)
+
+ def get_result(self, run_id: str) -> RunResult:
+ """Snapshot the run's status and iteration counts.
+
+ Returns current counts regardless of terminal state — wait first
+ (e.g. via :meth:`wait_for_all`) for a final result. Raises
+ ``KeyError`` if the run ID is unknown.
+ """
+ state = self._require_run(run_id).state
+ return RunResult(
+ run_id=state.run_id,
+ status=state.status,
+ total=state.total,
+ completed=state.completed,
+ failed=state.failed,
+ timed_out_count=state.timed_out_count,
+ )
+
+ def shutdown(self, timeout: float | None = None) -> bool:
+ """Request stop on every run and join its thread.
+
+ Returns ``True`` iff all live threads joined within *timeout*.
+ With ``timeout=None`` this blocks until every run thread exits.
+ """
+ with self._lock:
+ runs = list(self._runs.values())
+ for managed in runs:
+ managed.state.request_stop()
+
+ deadline = None if timeout is None else time.monotonic() + timeout
+ all_joined = True
+ for managed in runs:
+ thread = managed.thread
+ if thread is None:
+ continue
+ remaining = None
+ if deadline is not None:
+ remaining = max(0.0, deadline - time.monotonic())
+ thread.join(timeout=remaining)
+ if thread.is_alive():
+ all_joined = False
+ return all_joined
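The waiting methods added to `RunManager` follow a common pattern: compute a monotonic deadline once, re-check the predicate in a loop while holding a `threading.Condition`, and pass the remaining time to each `wait`. The sketch below isolates that pattern. It is a simplified illustration, not the `RunManager` API: `DoneTracker`, `register`, `mark_finished`, and `demo` are hypothetical names, and it uses a single condition where the real class pairs a lock with a condition.

```python
import threading
import time
from typing import Dict, Iterable, Optional


class DoneTracker:
    """Hypothetical, simplified tracker of which named jobs have finished."""

    def __init__(self) -> None:
        self._finished: Dict[str, bool] = {}
        self._done = threading.Condition()

    def register(self, name: str) -> None:
        with self._done:
            self._finished[name] = False

    def mark_finished(self, name: str) -> None:
        with self._done:
            self._finished[name] = True
            self._done.notify_all()  # wake every waiter so it can re-check

    def wait_for_all(
        self, names: Iterable[str], timeout: Optional[float] = None
    ) -> bool:
        wanted = list(names)
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._done:
            # Unknown names can never finish, so fail fast instead of blocking.
            if any(name not in self._finished for name in wanted):
                return False
            while not all(self._finished[name] for name in wanted):
                remaining = None
                if deadline is not None:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False
                self._done.wait(timeout=remaining)
            return True


def demo() -> bool:
    """Start two short workers and wait for both to finish."""
    tracker = DoneTracker()

    def worker(name: str) -> None:
        time.sleep(0.05)
        tracker.mark_finished(name)

    for name in ("a", "b"):
        tracker.register(name)
        threading.Thread(target=worker, args=(name,), daemon=True).start()
    return tracker.wait_for_all(["a", "b"], timeout=5.0)
```

Recomputing `remaining` on every pass matters because a waiter can wake for a job it is not interested in. Without the recomputation, each such wake-up would restart the full timeout.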
diff --git a/tests/conftest.py b/tests/conftest.py
index ce9e46ab..4d999714 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -2,8 +2,17 @@
import pytest
+from ralphify.adapters import ADAPTERS
+
@pytest.fixture(autouse=True)
def _disable_streaming(monkeypatch):
- """Disable the Popen-based streaming path in all tests."""
- monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: False)
+ """Force the blocking path and raw peek on every registered adapter.
+
+ Tests that explicitly need the Popen streaming path or structured
+ peek rendering re-enable the relevant flag on the specific adapter
+ they exercise.
+ """
+ for adapter in ADAPTERS:
+ monkeypatch.setattr(adapter, "supports_streaming", False)
+ monkeypatch.setattr(adapter, "renders_structured_peek", False)
diff --git a/tests/fixtures/adapters/claude_basic_run.jsonl b/tests/fixtures/adapters/claude_basic_run.jsonl
new file mode 100644
index 00000000..8b91f08c
--- /dev/null
+++ b/tests/fixtures/adapters/claude_basic_run.jsonl
@@ -0,0 +1,8 @@
+{"type":"system","subtype":"init","session_id":"fixture-001","model":"claude-opus-4-7"}
+{"type":"assistant","message":{"content":[{"type":"thinking","thinking":"Plan: read RALPH.md then edit."}]}}
+{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"/work/RALPH.md"}}]}}
+{"type":"user","message":{"content":[{"type":"tool_result","content":"# prompt body"}]}}
+{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Edit","input":{"file_path":"/work/RALPH.md","old_string":"todo","new_string":"done"}}]}}
+{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{"command":"uv run pytest -x"}}]}}
+{"type":"assistant","message":{"content":[{"type":"text","text":"All tests pass."}]}}
+{"type":"result","subtype":"success","result":"All tests pass. COMPLETE"}
diff --git a/tests/fixtures/adapters/codex_basic_run.jsonl b/tests/fixtures/adapters/codex_basic_run.jsonl
new file mode 100644
index 00000000..d06f983d
--- /dev/null
+++ b/tests/fixtures/adapters/codex_basic_run.jsonl
@@ -0,0 +1,7 @@
+{"type":"TurnStarted"}
+{"type":"CommandExecution","command":"ls -la"}
+{"type":"CollabToolCall","name":"Read"}
+{"msg":{"type":"McpToolCall","tool":"git_status"}}
+{"type":"CommandExecution","command":"uv run pytest"}
+{"type":"TurnCompleted"}
+{"type":"TaskComplete","text":"Done. COMPLETE"}
diff --git a/tests/fixtures/adapters/copilot_basic_run.jsonl b/tests/fixtures/adapters/copilot_basic_run.jsonl
new file mode 100644
index 00000000..4459999c
--- /dev/null
+++ b/tests/fixtures/adapters/copilot_basic_run.jsonl
@@ -0,0 +1,5 @@
+{"type":"tool_use","name":"Read"}
+{"type":"tool_call","name":"Edit"}
+{"type":"ToolCall","tool":"Bash"}
+{"type":"progress","message":"thinking..."}
+{"type":"result","text":"done COMPLETE"}
diff --git a/tests/fixtures/adapters/opencode_basic_run.jsonl b/tests/fixtures/adapters/opencode_basic_run.jsonl
new file mode 100644
index 00000000..4751dad6
--- /dev/null
+++ b/tests/fixtures/adapters/opencode_basic_run.jsonl
@@ -0,0 +1,7 @@
+{"type":"step_start","part":{}}
+{"type":"tool_use","part":{"tool":"read"}}
+{"type":"text","part":{"text":"reading files"}}
+{"type":"tool_use","part":{"tool":"edit"}}
+{"type":"tool_use","name":"bash"}
+{"type":"step_finish","part":{"tokens":123}}
+{"type":"text","part":{"text":"done COMPLETE"}}
diff --git a/tests/test_adapters_claude.py b/tests/test_adapters_claude.py
new file mode 100644
index 00000000..95d3958a
--- /dev/null
+++ b/tests/test_adapters_claude.py
@@ -0,0 +1,199 @@
+"""Tests for the Claude stream-json adapter."""
+
+from __future__ import annotations
+
+import json
+
+from ralphify.adapters import Invocation, select_adapter
+from ralphify.adapters.claude import ClaudeAdapter
+
+
+def _assistant_event(*blocks: dict) -> str:
+ """Build one JSON line matching Claude's assistant message schema."""
+ return json.dumps(
+ {
+ "type": "assistant",
+ "message": {"content": list(blocks)},
+ }
+ )
+
+
+def _result_event(result_text: str) -> str:
+ return json.dumps({"type": "result", "result": result_text})
+
+
+def test_matches_claude_binary_stem() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.matches(["claude"]) is True
+ assert adapter.matches(["/usr/local/bin/claude"]) is True
+ assert adapter.matches(["claude", "--print"]) is True
+ assert adapter.matches(["codex"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_stream_flags() -> None:
+ adapter = ClaudeAdapter()
+ result = adapter.build_command(["claude"])
+ assert result == ["claude", "--output-format", "stream-json", "--verbose"]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = ClaudeAdapter()
+ once = adapter.build_command(["claude"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_build_command_preserves_user_flags() -> None:
+ adapter = ClaudeAdapter()
+ result = adapter.build_command(["claude", "--print", "-p"])
+ assert result[:3] == ["claude", "--print", "-p"]
+ assert "--output-format" in result
+ assert "stream-json" in result
+
+
+def test_deliver_prompt_uses_stdin() -> None:
+ adapter = ClaudeAdapter()
+ cmd = ["claude", "--output-format", "stream-json", "--verbose"]
+ inv = adapter.deliver_prompt(cmd, "p")
+ assert inv == Invocation(cmd, "p")
+ assert inv.stdin_text == "p"
+
+
+def test_parse_tool_use_event() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "tool_use", "name": "Bash", "input": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Bash"
+
+
+def test_parse_result_event() -> None:
+ adapter = ClaudeAdapter()
+ event = adapter.parse_event(_result_event("done"))
+ assert event is not None
+ assert event.kind == "result"
+
+
+def test_parse_ignores_thinking_blocks() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "thinking", "thinking": "planning..."})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "message"
+
+
+def test_parse_malformed_json_returns_none() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+ assert adapter.parse_event(" \n") is None
+
+
+def test_parse_non_dict_json_returns_none() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.parse_event("[1, 2, 3]") is None
+ assert adapter.parse_event('"just a string"') is None
+
+
+def test_parse_tool_use_with_missing_name() -> None:
+ """Defensive: tool_use blocks with a missing name must not raise."""
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "tool_use", "input": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name is None
+
+
+def test_parse_skips_first_non_tool_use_block() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event(
+ {"type": "text", "text": "thinking out loud"},
+ {"type": "tool_use", "name": "Edit", "input": {}},
+ )
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Edit"
+
+
+def test_extract_completion_signal_from_result_text() -> None:
+ adapter = ClaudeAdapter()
+ result_text = "DONE"
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="DONE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_ignores_raw_stdout() -> None:
+ """ClaudeAdapter inspects only ``result_text``, which the streaming
+ reader fills with the terminal assistant message. Promise tags embedded
+ in raw stdout (e.g. ``status`` or ``assistant`` JSON) must not trigger
+ completion."""
+ adapter = ClaudeAdapter()
+ stdout = "raw text MARKER trailing"
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stdout, user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_handles_missing_result_text() -> None:
+ adapter = ClaudeAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text="", stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_writes_settings(tmp_path) -> None:
+ adapter = ClaudeAdapter()
+ counter = tmp_path / "counter"
+ env = adapter.install_wind_down_hook(tmp_path, counter, 10, 2)
+ assert env["CLAUDE_CONFIG_DIR"] == str(tmp_path)
+ settings = json.loads((tmp_path / "settings.json").read_text(encoding="utf-8"))
+ hooks = settings["hooks"]["PreToolUse"]
+ assert hooks[0]["matcher"] == "*"
+ command = hooks[0]["hooks"][0]["command"]
+ assert "ralphify._wind_down_shim" in command
+ assert str(counter) in command
+ assert " 10 " in command
+ assert " 2 " in command
+ assert command.rstrip().endswith("claude")
+
+
+def test_capability_flags() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.name == "claude"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is True
+ assert adapter.renders_structured_peek is True
+ assert adapter.supports_soft_wind_down is True
+ assert adapter.requires_full_stdout_for_completion is False
+
+
+def test_registered_in_adapters_registry() -> None:
+ """Import side-effect registration should hand back the Claude adapter."""
+ selected = select_adapter(["claude"])
+ assert isinstance(selected, ClaudeAdapter)
diff --git a/tests/test_adapters_codex.py b/tests/test_adapters_codex.py
new file mode 100644
index 00000000..f7af107e
--- /dev/null
+++ b/tests/test_adapters_codex.py
@@ -0,0 +1,179 @@
+"""Tests for the Codex CLI adapter."""
+
+from __future__ import annotations
+
+import json
+
+from ralphify.adapters import Invocation, select_adapter
+from ralphify.adapters.codex import CodexAdapter
+
+
+def test_matches_codex_binary_stem() -> None:
+ adapter = CodexAdapter()
+ assert adapter.matches(["codex"]) is True
+ assert adapter.matches(["/usr/local/bin/codex"]) is True
+ assert adapter.matches(["codex", "exec", "--sandbox"]) is True
+ assert adapter.matches(["claude"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_json_flag() -> None:
+ adapter = CodexAdapter()
+ assert adapter.build_command(["codex"]) == ["codex", "--json"]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = CodexAdapter()
+ once = adapter.build_command(["codex"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_deliver_prompt_uses_stdin() -> None:
+ adapter = CodexAdapter()
+ cmd = ["codex", "exec", "--json"]
+ inv = adapter.deliver_prompt(cmd, "p")
+ assert inv == Invocation(cmd, "p")
+ assert inv.stdin_text == "p"
+
+
+def test_parse_tool_call_events() -> None:
+ adapter = CodexAdapter()
+ for event_type, expected_name in [
+ ("CollabToolCall", "Edit"),
+ ("McpToolCall", "Edit"),
+ ("CommandExecution", "Edit"),
+ ]:
+ line = json.dumps({"type": event_type, "name": "Edit"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == expected_name
+
+
+def test_parse_turn_events() -> None:
+ adapter = CodexAdapter()
+ for event_type in ("TurnStarted", "TurnCompleted"):
+ # TurnCompleted is a result *and* turn event; result wins.
+ line = json.dumps({"type": event_type})
+ event = adapter.parse_event(line)
+ assert event is not None
+ if event_type == "TurnCompleted":
+ assert event.kind == "result"
+ else:
+ assert event.kind == "turn"
+
+
+def test_parse_unknown_events_become_message() -> None:
+ adapter = CodexAdapter()
+ event = adapter.parse_event(json.dumps({"type": "SomethingNew"}))
+ assert event is not None
+ assert event.kind == "message"
+
+
+def test_parse_malformed_returns_none() -> None:
+ adapter = CodexAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+ assert adapter.parse_event("42") is None
+
+
+def test_parse_tool_call_nested_under_msg() -> None:
+ """Some Codex builds wrap event data under a ``msg`` key."""
+ adapter = CodexAdapter()
+ line = json.dumps({"msg": {"type": "CommandExecution", "command": "git status"}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "git status"
+
+
+def test_parse_falls_back_to_event_type_for_name() -> None:
+ adapter = CodexAdapter()
+ line = json.dumps({"type": "CollabToolCall"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.name == "CollabToolCall"
+
+
+def test_extract_completion_signal_from_stream() -> None:
+ adapter = CodexAdapter()
+ stream = "\n".join(
+ [
+ json.dumps({"type": "TurnStarted"}),
+ json.dumps({"type": "CommandExecution", "command": "ls"}),
+ json.dumps({"type": "TaskComplete", "text": "DONE"}),
+ ]
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stream, user_signal="DONE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stream, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_scans_plain_output() -> None:
+ adapter = CodexAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None,
+ stdout="some HI text",
+ user_signal="HI",
+ )
+ is True
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ """When the engine elects not to capture stdout, Codex cannot detect
+ completion — the streaming reader does not populate ``result_text``
+ for Codex's ``TaskComplete`` event shape."""
+ adapter = CodexAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_writes_hooks_json(tmp_path) -> None:
+ adapter = CodexAdapter()
+ counter = tmp_path / "counter"
+ env = adapter.install_wind_down_hook(tmp_path, counter, 10, 2)
+ assert env["CODEX_HOME"] == str(tmp_path)
+ hooks = json.loads((tmp_path / "hooks.json").read_text(encoding="utf-8"))
+ entries = hooks["PostToolUse"]
+ assert entries[0]["matcher"] == "*"
+ command = entries[0]["hooks"][0]["command"]
+ assert "ralphify._wind_down_shim" in command
+ assert str(counter) in command
+ assert command.rstrip().endswith("codex")
+ config = (tmp_path / "config.toml").read_text(encoding="utf-8")
+ # Codex reads the hooks feature flag from the [features] table (Stable,
+ # default-on); an [experimental] table would be ignored.
+ assert "[features]" in config
+ assert "[experimental]" not in config
+ assert "hooks = true" in config
+
+
+def test_capability_flags() -> None:
+ adapter = CodexAdapter()
+ assert adapter.name == "codex"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is True
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is True
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["codex"])
+ assert isinstance(selected, CodexAdapter)
diff --git a/tests/test_adapters_copilot.py b/tests/test_adapters_copilot.py
new file mode 100644
index 00000000..73c9ab2d
--- /dev/null
+++ b/tests/test_adapters_copilot.py
@@ -0,0 +1,137 @@
+"""Tests for the Copilot CLI adapter (alpha)."""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from ralphify.adapters import Invocation, select_adapter
+from ralphify.adapters.copilot import CopilotAdapter
+
+
+def test_matches_copilot_binary_stem() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.matches(["copilot"]) is True
+ assert adapter.matches(["/opt/copilot/bin/copilot"]) is True
+ # Deliberately does NOT match ``gh`` (i.e. the ``gh copilot`` subcommand).
+ assert adapter.matches(["gh"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_json_flags() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.build_command(["copilot"]) == [
+ "copilot",
+ "--output-format",
+ "json",
+ ]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = CopilotAdapter()
+ once = adapter.build_command(["copilot"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_deliver_prompt_uses_stdin() -> None:
+ adapter = CopilotAdapter()
+ cmd = ["copilot", "--output-format", "json"]
+ inv = adapter.deliver_prompt(cmd, "p")
+ assert inv == Invocation(cmd, "p")
+ assert inv.stdin_text == "p"
+
+
+def test_parse_tool_use_variants() -> None:
+ adapter = CopilotAdapter()
+ for event_type in ("tool_use", "tool_call", "ToolCall", "ToolUse"):
+ line = json.dumps({"type": event_type, "name": "Edit"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Edit"
+
+
+def test_parse_result_variants() -> None:
+ adapter = CopilotAdapter()
+ for event_type in ("result", "response", "final", "Complete"):
+ event = adapter.parse_event(json.dumps({"type": event_type}))
+ assert event is not None
+ assert event.kind == "result"
+
+
+def test_parse_unknown_returns_none() -> None:
+ """Unknown event types must NOT count against the turn cap."""
+ adapter = CopilotAdapter()
+ assert adapter.parse_event(json.dumps({"type": "SomethingElse"})) is None
+ assert adapter.parse_event(json.dumps({"type": "progress"})) is None
+
+
+def test_parse_missing_type_returns_none() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.parse_event(json.dumps({"name": "Edit"})) is None
+
+
+def test_parse_malformed_returns_none() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+
+
+def test_parse_event_with_alternate_key_names() -> None:
+ """Covers the ``event`` alternative type key (``kind`` is not exercised here)."""
+ adapter = CopilotAdapter()
+ event = adapter.parse_event(json.dumps({"event": "tool_use", "name": "Bash"}))
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Bash"
+
+
+def test_extract_completion_signal_scans_stdout() -> None:
+ adapter = CopilotAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None,
+ stdout="chat chatter MARKER more text",
+ user_signal="MARKER",
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout="no marker here", user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ adapter = CopilotAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = CopilotAdapter()
+ with pytest.raises(NotImplementedError, match="no hook system"):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.name == "copilot"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is False
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is False
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["copilot"])
+ assert isinstance(selected, CopilotAdapter)
diff --git a/tests/test_adapters_crush.py b/tests/test_adapters_crush.py
new file mode 100644
index 00000000..4f31ff81
--- /dev/null
+++ b/tests/test_adapters_crush.py
@@ -0,0 +1,118 @@
+"""Tests for the Charm Crush CLI adapter."""
+
+from __future__ import annotations
+
+import pytest
+
+from ralphify.adapters import Invocation, select_adapter
+from ralphify.adapters.crush import CrushAdapter
+
+
+def test_matches_crush_binary_stem() -> None:
+ adapter = CrushAdapter()
+ assert adapter.matches(["crush"]) is True
+ assert adapter.matches(["crush", "run"]) is True
+ assert adapter.matches(["/opt/homebrew/bin/crush", "run"]) is True
+ assert adapter.matches(["claude"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_quiet() -> None:
+ adapter = CrushAdapter()
+ assert adapter.build_command(["crush", "run"]) == ["crush", "run", "--quiet"]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = CrushAdapter()
+ once = adapter.build_command(["crush", "run"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_build_command_respects_existing_quiet_flags() -> None:
+ adapter = CrushAdapter()
+ assert adapter.build_command(["crush", "run", "--quiet"]) == [
+ "crush",
+ "run",
+ "--quiet",
+ ]
+ assert adapter.build_command(["crush", "run", "-q"]) == ["crush", "run", "-q"]
+
+
+def test_deliver_prompt_uses_stdin() -> None:
+ adapter = CrushAdapter()
+ cmd = ["crush", "run", "--quiet"]
+ inv = adapter.deliver_prompt(cmd, "p")
+ assert inv == Invocation(cmd, "p")
+ assert inv.stdin_text == "p"
+
+
+def test_parse_event_always_returns_none() -> None:
+ """crush has no structured stream; nothing parses, and nothing raises."""
+ adapter = CrushAdapter()
+ assert adapter.parse_event('{"type": "tool_use"}') is None
+ assert adapter.parse_event("plain prose line") is None
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+
+
+def test_extract_completion_signal_scans_stdout() -> None:
+ adapter = CrushAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None,
+ stdout="chatter MARKER more text",
+ user_signal="MARKER",
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout="no marker here", user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_ignores_result_text() -> None:
+ """result_text is never populated for crush; only stdout counts."""
+ adapter = CrushAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text="MARKER",
+ stdout="no marker in stdout",
+ user_signal="MARKER",
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ adapter = CrushAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = CrushAdapter()
+ with pytest.raises(NotImplementedError, match="no hook system"):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = CrushAdapter()
+ assert adapter.name == "crush"
+ assert adapter.counts_what == "none"
+ assert adapter.supports_streaming is False
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is False
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["crush", "run"])
+ assert isinstance(selected, CrushAdapter)
diff --git a/tests/test_adapters_golden.py b/tests/test_adapters_golden.py
new file mode 100644
index 00000000..972e7643
--- /dev/null
+++ b/tests/test_adapters_golden.py
@@ -0,0 +1,173 @@
+"""Golden-file regression tests for adapter parsing.
+
+Each fixture is a captured (or close-to-captured) stream from one of
+the supported CLIs. The tests walk every line and assert on the
+resulting :class:`AdapterEvent` sequence and completion-signal scan.
+When a CLI's schema changes, these tests fail first and point the
+maintainer at the fixture that needs updating.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from ralphify.adapters.claude import ClaudeAdapter
+from ralphify.adapters.codex import CodexAdapter
+from ralphify.adapters.copilot import CopilotAdapter
+from ralphify.adapters.crush import CrushAdapter
+from ralphify.adapters.opencode import OpenCodeAdapter
+
+
+FIXTURES_DIR = Path(__file__).parent / "fixtures" / "adapters"
+
+
+def _parse_kinds(adapter, text: str) -> list[str]:
+ """Return the ``kind`` of every non-None event in the fixture."""
+ kinds: list[str] = []
+ for line in text.splitlines():
+ event = adapter.parse_event(line)
+ if event is not None:
+ kinds.append(event.kind)
+ return kinds
+
+
+def test_claude_golden_stream() -> None:
+ text = (FIXTURES_DIR / "claude_basic_run.jsonl").read_text()
+ adapter = ClaudeAdapter()
+
+ kinds = _parse_kinds(adapter, text)
+ # 3 tool_use blocks + 1 result; other assistant messages become
+ # ``message`` events that the turn counter ignores.
+ assert kinds.count("tool_use") == 3
+ assert kinds.count("result") == 1
+
+ # Claude reads completion solely from the streaming-extracted result_text.
+ result_text = _last_result_text(text)
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="COMPLETE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_codex_golden_stream() -> None:
+ text = (FIXTURES_DIR / "codex_basic_run.jsonl").read_text()
+ adapter = CodexAdapter()
+
+ kinds = _parse_kinds(adapter, text)
+ # 2 CommandExecution + CollabToolCall + msg-nested McpToolCall = 4 tool_use.
+ assert kinds.count("tool_use") == 4
+ # TaskComplete *and* TurnCompleted both count as result.
+ assert kinds.count("result") == 2
+ assert kinds.count("turn") == 1 # TurnStarted only; TurnCompleted wins as result
+
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="COMPLETE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="MISSING"
+ )
+ is False
+ )
+
+
+def test_copilot_golden_stream() -> None:
+ text = (FIXTURES_DIR / "copilot_basic_run.jsonl").read_text()
+ adapter = CopilotAdapter()
+
+ kinds = _parse_kinds(adapter, text)
+ # 3 canonical-type tool uses; ``progress`` is unknown and dropped.
+ assert kinds.count("tool_use") == 3
+ assert kinds.count("result") == 1
+
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="COMPLETE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="NOPE"
+ )
+ is False
+ )
+
+
+def test_opencode_golden_stream() -> None:
+ text = (FIXTURES_DIR / "opencode_basic_run.jsonl").read_text()
+ adapter = OpenCodeAdapter()
+
+ kinds = _parse_kinds(adapter, text)
+ # 3 tool_use parts + 1 step_finish (result); step_start / text are
+ # message events that do not count against the turn cap.
+ assert kinds.count("tool_use") == 3
+ assert kinds.count("result") == 1
+ # opencode counts tool uses, so these events feed the turn cap.
+ assert adapter.counts_what == "tool_use"
+
+ # opencode's step_finish event carries no result text, so completion is
+ # scanned from the full stdout buffer.
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="COMPLETE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="ABSENT"
+ )
+ is False
+ )
+
+
+def test_crush_golden_stream() -> None:
+ # crush emits plain text/markdown only — no structured events to parse,
+ # so nothing counts against max_turns (counts_what == "none").
+ adapter = CrushAdapter()
+ text = "Did the work.\nCOMPLETE\n"
+
+ assert _parse_kinds(adapter, text) == []
+ assert adapter.counts_what == "none"
+
+ # Completion is still scanned from the full stdout buffer.
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=text, user_signal="COMPLETE"
+ )
+ is True
+ )
+
+
+def _last_result_text(text: str) -> str | None:
+ """Return the last ``result`` event's payload from a Claude stream."""
+ import json
+
+ latest: str | None = None
+ for line in text.splitlines():
+ stripped = line.strip()
+ if not stripped:
+ continue
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ continue
+ if (
+ isinstance(parsed, dict)
+ and parsed.get("type") == "result"
+ and isinstance(parsed.get("result"), str)
+ ):
+ latest = parsed["result"]
+ return latest
diff --git a/tests/test_adapters_opencode.py b/tests/test_adapters_opencode.py
new file mode 100644
index 00000000..9884e795
--- /dev/null
+++ b/tests/test_adapters_opencode.py
@@ -0,0 +1,188 @@
+"""Tests for the opencode CLI adapter (arg-delivery)."""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from ralphify.adapters import CLIAdapter, Invocation, select_adapter
+from ralphify.adapters.opencode import OpenCodeAdapter
+
+
+def test_matches_opencode_binary_stem() -> None:
+ adapter = OpenCodeAdapter()
+ assert adapter.matches(["opencode"]) is True
+ assert adapter.matches(["/usr/local/bin/opencode"]) is True
+ assert adapter.matches(["opencode", "run"]) is True
+ assert adapter.matches(["claude"]) is False
+ assert adapter.matches(["codex"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_format_json() -> None:
+ adapter = OpenCodeAdapter()
+ assert adapter.build_command(["opencode", "run"]) == [
+ "opencode",
+ "run",
+ "--format",
+ "json",
+ ]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = OpenCodeAdapter()
+ once = adapter.build_command(["opencode", "run"])
+ twice = adapter.build_command(once)
+ assert once == twice == ["opencode", "run", "--format", "json"]
+
+
+def test_build_command_overwrites_other_format_value() -> None:
+ adapter = OpenCodeAdapter()
+ result = adapter.build_command(["opencode", "run", "--format", "text"])
+ assert result == ["opencode", "run", "--format", "json"]
+
+
+def test_build_command_appends_value_when_format_flag_is_last() -> None:
+ """A dangling ``--format`` with no value gets ``json`` appended; no crash."""
+ adapter = OpenCodeAdapter()
+ result = adapter.build_command(["opencode", "run", "--format"])
+ assert result == ["opencode", "run", "--format", "json"]
+
+
+def test_deliver_prompt_appends_prompt_as_arg_with_no_stdin() -> None:
+ adapter = OpenCodeAdapter()
+ inv = adapter.deliver_prompt(["opencode", "run", "--format", "json"], "hello")
+ assert inv == Invocation(["opencode", "run", "--format", "json", "hello"], None)
+ assert inv.stdin_text is None
+
+
+def test_deliver_prompt_preserves_special_characters() -> None:
+ """No shell involved — quotes / $() / newlines pass through as one argv element."""
+ adapter = OpenCodeAdapter()
+ prompt = 'fix "$(rm -rf /)" and\nmove on'
+ inv = adapter.deliver_prompt(["opencode", "run"], prompt)
+ assert inv.argv == ["opencode", "run", prompt]
+ assert inv.stdin_text is None
+
+
+def test_parse_tool_use_event_with_name() -> None:
+ adapter = OpenCodeAdapter()
+ line = json.dumps({"type": "tool_use", "part": {"name": "Edit"}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Edit"
+
+
+def test_parse_tool_use_event_top_level_name() -> None:
+ adapter = OpenCodeAdapter()
+ line = json.dumps({"type": "tool_use", "name": "Bash", "part": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Bash"
+
+
+def test_parse_tool_use_event_without_name() -> None:
+ adapter = OpenCodeAdapter()
+ line = json.dumps({"type": "tool_use", "part": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name is None
+
+
+def test_parse_step_finish_is_result() -> None:
+ adapter = OpenCodeAdapter()
+ line = json.dumps({"type": "step_finish", "part": {"tokens": 42, "cost": 0.01}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "result"
+ # Token / cost data is captured in raw but not surfaced as fields.
+ assert event.raw == {"type": "step_finish", "part": {"tokens": 42, "cost": 0.01}}
+
+
+@pytest.mark.parametrize("event_type", ["step_start", "text", "reasoning", "error"])
+def test_parse_message_events(event_type: str) -> None:
+ adapter = OpenCodeAdapter()
+ line = json.dumps({"type": event_type, "part": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "message"
+
+
+def test_parse_unknown_event_returns_none() -> None:
+ adapter = OpenCodeAdapter()
+ assert adapter.parse_event(json.dumps({"type": "something_new"})) is None
+
+
+def test_parse_malformed_never_raises() -> None:
+ adapter = OpenCodeAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+ assert adapter.parse_event(" \n") is None
+ assert adapter.parse_event("42") is None
+ assert adapter.parse_event("[1, 2, 3]") is None
+ assert adapter.parse_event('"just a string"') is None
+ assert adapter.parse_event("{") is None
+ # No "type" key at all.
+ assert adapter.parse_event(json.dumps({"part": {"name": "x"}})) is None
+
+
+def test_extract_completion_signal_from_stdout() -> None:
+ adapter = OpenCodeAdapter()
+ stdout = "\n".join(
+ [
+ json.dumps({"type": "step_start", "part": {}}),
+ json.dumps({"type": "tool_use", "part": {"name": "Edit"}}),
+ "DONE",
+ ]
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stdout, user_signal="DONE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stdout, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ adapter = OpenCodeAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text="DONE", stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = OpenCodeAdapter()
+ with pytest.raises(NotImplementedError):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = OpenCodeAdapter()
+ assert adapter.name == "opencode"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is True
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is False
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_satisfies_protocol() -> None:
+ assert isinstance(OpenCodeAdapter(), CLIAdapter)
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["opencode", "run"])
+ assert isinstance(selected, OpenCodeAdapter)
diff --git a/tests/test_adapters_registry.py b/tests/test_adapters_registry.py
new file mode 100644
index 00000000..9ff7916e
--- /dev/null
+++ b/tests/test_adapters_registry.py
@@ -0,0 +1,57 @@
+"""Tests for the adapter registry and first-match dispatch."""
+
+from __future__ import annotations
+
+from ralphify.adapters import ADAPTERS, CLIAdapter, Invocation, select_adapter
+from ralphify.adapters._generic import GenericAdapter
+from ralphify.adapters.claude import ClaudeAdapter
+from ralphify.adapters.codex import CodexAdapter
+from ralphify.adapters.copilot import CopilotAdapter
+from ralphify.adapters.crush import CrushAdapter
+from ralphify.adapters.opencode import OpenCodeAdapter
+
+
+def test_registry_contains_builtin_adapters() -> None:
+ types = {type(a) for a in ADAPTERS}
+ assert ClaudeAdapter in types
+ assert CodexAdapter in types
+ assert CopilotAdapter in types
+ assert CrushAdapter in types
+ assert OpenCodeAdapter in types
+
+
+def test_select_adapter_dispatches_by_binary_stem() -> None:
+ assert isinstance(select_adapter(["claude"]), ClaudeAdapter)
+ assert isinstance(select_adapter(["codex", "exec"]), CodexAdapter)
+ assert isinstance(select_adapter(["copilot"]), CopilotAdapter)
+ assert isinstance(select_adapter(["crush", "run"]), CrushAdapter)
+ assert isinstance(select_adapter(["opencode", "run"]), OpenCodeAdapter)
+
+
+def test_select_adapter_falls_back_to_generic() -> None:
+ selected = select_adapter(["aider", "--model", "claude-4"])
+ assert isinstance(selected, GenericAdapter)
+
+
+def test_select_adapter_handles_empty_cmd() -> None:
+ assert isinstance(select_adapter([]), GenericAdapter)
+
+
+def test_generic_adapter_parse_never_raises() -> None:
+ generic = GenericAdapter()
+ assert generic.parse_event("garbage") is None
+ assert generic.parse_event("") is None
+
+
+def test_generic_adapter_deliver_prompt_uses_stdin() -> None:
+ generic = GenericAdapter()
+ cmd = ["aider", "--model", "claude-4"]
+ inv = generic.deliver_prompt(cmd, "p")
+ assert inv == Invocation(cmd, "p")
+ assert inv.stdin_text == "p"
+
+
+def test_all_adapters_satisfy_protocol() -> None:
+ """Runtime Protocol check catches shape regressions in any adapter."""
+ for adapter in ADAPTERS:
+ assert isinstance(adapter, CLIAdapter)
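
Reviewer note: the registry tests above pin down first-match dispatch on the binary stem, with a generic fallback for unknown or empty commands. A minimal sketch of that behaviour follows, assuming a stem comparison via `pathlib`. `StemAdapter`, `FallbackAdapter`, `REGISTRY`, and `pick_adapter` are hypothetical names for illustration, not ralphify's actual classes or its `select_adapter` implementation.

```python
from __future__ import annotations

from pathlib import PurePath


class StemAdapter:
    """Hypothetical adapter that claims a command by its binary stem."""

    def __init__(self, stem: str) -> None:
        self.stem = stem

    def matches(self, cmd: list[str]) -> bool:
        # PurePath(...).stem drops the directory and one extension, so
        # "/usr/local/bin/claude" and "claude.cmd" both reduce to "claude".
        return bool(cmd) and PurePath(cmd[0]).stem == self.stem


class FallbackAdapter:
    """Hypothetical catch-all used when nothing in the registry matches."""


# Registry order is the tie-breaker: the first matching entry wins.
REGISTRY = [StemAdapter("claude"), StemAdapter("codex"), StemAdapter("opencode")]


def pick_adapter(cmd: list[str]) -> StemAdapter | FallbackAdapter:
    for adapter in REGISTRY:
        if adapter.matches(cmd):
            return adapter
    # A fresh fallback per call, so callers cannot rely on object identity.
    return FallbackAdapter()
```

Matching on the exact stem rather than a substring is what keeps a command such as `aider --model claude-4` out of the Claude path in this sketch: only the first element of the command is inspected.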
diff --git a/tests/test_agent.py b/tests/test_agent.py
index f3358448..7828abb8 100644
--- a/tests/test_agent.py
+++ b/tests/test_agent.py
@@ -7,7 +7,7 @@
import subprocess
import sys
import time
-from unittest.mock import MagicMock, patch
+from unittest.mock import ANY, MagicMock, patch
import pytest
from helpers import MOCK_SUBPROCESS, fail_proc, make_mock_popen, ok_proc, timeout_proc
@@ -19,35 +19,11 @@
_read_agent_stream,
_run_agent_blocking,
_run_agent_streaming,
- _supports_stream_json,
_write_log,
execute_agent,
)
-
-
-class TestSupportsStreamJson:
- def test_claude_binary(self):
- assert _supports_stream_json(["claude", "-p"]) is True
-
- def test_claude_absolute_path(self):
- assert _supports_stream_json(["/usr/local/bin/claude", "-p"]) is True
-
- def test_non_claude_binary(self):
- assert _supports_stream_json(["aider", "--yes"]) is False
-
- def test_empty_command(self):
- assert _supports_stream_json([]) is False
-
- def test_claude_like_name(self):
- assert _supports_stream_json(["claude-code"]) is False
-
- def test_claude_with_cmd_extension(self):
- """On Windows, npm installs claude as claude.cmd — streaming must
- still be detected."""
- assert _supports_stream_json(["claude.cmd", "-p"]) is True
-
- def test_claude_with_exe_extension(self):
- assert _supports_stream_json(["claude.exe", "-p"]) is True
+from ralphify.adapters import select_adapter
+from ralphify.adapters.claude import ClaudeAdapter
class TestWriteLog:
@@ -365,6 +341,32 @@ def test_timeout_captures_partial_output(self, mock_popen, tmp_path):
assert result.captured_stdout == "partial stdout"
assert result.captured_stderr == "partial stderr"
+ @patch(MOCK_SUBPROCESS)
+ def test_capture_result_text_does_not_buffer_blocking_output_without_log_dir(
+ self, mock_popen
+ ):
+ mock_popen.return_value = ok_proc(
+ stdout_text="done\n",
+ stderr_text="some stderr\n",
+ )
+ result = _run_agent_blocking(
+ ["echo"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ capture_result_text=True,
+ )
+
+ assert result.captured_stdout is None
+ assert result.captured_stderr is None
+ assert result.result_text is None
+
+ call_kwargs = mock_popen.call_args[1]
+ assert call_kwargs.get("stdin") == subprocess.PIPE
+ assert call_kwargs.get("stdout") == subprocess.PIPE
+ assert call_kwargs.get("stderr") is None
+
@patch(MOCK_SUBPROCESS, side_effect=ok_proc)
def test_no_log_when_dir_not_set(self, mock_popen):
result = execute_agent(
@@ -400,19 +402,22 @@ def test_timed_out(self):
assert result.timed_out is True
assert result.returncode is None
- def test_success_when_zero_exit(self):
- result = AgentResult(returncode=0, elapsed=1.0, log_file=None)
- assert result.success is True
-
- def test_not_success_when_nonzero_exit(self):
- result = AgentResult(returncode=1, elapsed=1.0, log_file=None)
- assert result.success is False
-
- def test_not_success_when_timed_out(self):
- result = AgentResult(
- returncode=None, elapsed=5.0, log_file=None, timed_out=True
- )
- assert result.success is False
+ @pytest.mark.parametrize(
+ ("result", "expected"),
+ [
+ (AgentResult(returncode=0, elapsed=1.0, log_file=None), True),
+ (AgentResult(returncode=1, elapsed=1.0, log_file=None), False),
+ (
+ AgentResult(
+ returncode=None, elapsed=5.0, log_file=None, timed_out=True
+ ),
+ False,
+ ),
+ ],
+ ids=["zero-exit", "nonzero-exit", "timed-out"],
+ )
+ def test_success(self, result, expected):
+ assert result.success is expected
class TestExecuteAgentDispatch:
@@ -420,8 +425,11 @@ class TestExecuteAgentDispatch:
@patch(MOCK_SUBPROCESS)
def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch):
- """execute_agent uses the streaming path when the agent supports it."""
- monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: True)
+ """execute_agent uses the streaming path when the adapter reports
+ ``supports_streaming``."""
+ claude_adapter = select_adapter(["claude"])
+ assert isinstance(claude_adapter, ClaudeAdapter)
+ monkeypatch.setattr(claude_adapter, "supports_streaming", True)
mock_popen.return_value = make_mock_popen(
stdout_lines='{"type": "result", "result": "done"}\n',
returncode=0,
@@ -438,10 +446,158 @@ def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch):
assert result.result_text == "done"
mock_popen.assert_called_once()
+ def test_execute_agent_passes_capture_result_text_to_streaming_helper(
+ self, monkeypatch
+ ):
+ on_activity = MagicMock()
+ on_output_line = MagicMock()
+ fake_streaming = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01))
+
+ claude_adapter = select_adapter(["claude"])
+ assert isinstance(claude_adapter, ClaudeAdapter)
+ monkeypatch.setattr(claude_adapter, "supports_streaming", True)
+ monkeypatch.setattr("ralphify._agent._run_agent_streaming", fake_streaming)
+
+ execute_agent(
+ ["claude", "-p"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ on_activity=on_activity,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ )
+
+ fake_streaming.assert_called_once_with(
+ claude_adapter.build_command(["claude", "-p"]),
+ "prompt",
+ None,
+ None,
+ 1,
+ on_activity=on_activity,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=False,
+ adapter=claude_adapter,
+ max_turns=None,
+ on_tool_use=None,
+ env=None,
+ )
+
+ def test_execute_agent_passes_capture_result_text_to_blocking_helper(
+ self, monkeypatch
+ ):
+ on_output_line = MagicMock()
+ fake_blocking = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01))
+
+ # The autouse conftest fixture already forces every adapter's
+ # supports_streaming flag to False, so ``echo`` falls into the
+ # blocking path through the GenericAdapter fallback. The fallback
+ # is constructed fresh per ``select_adapter`` call, so the dispatch
+ # assertion matches the adapter arg with ``ANY``.
+ monkeypatch.setattr("ralphify._agent._run_agent_blocking", fake_blocking)
+
+ execute_agent(
+ ["echo"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ )
+
+ fake_blocking.assert_called_once_with(
+ ["echo"],
+ "prompt",
+ None,
+ None,
+ 1,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=False,
+ adapter=ANY,
+ max_turns=None,
+ on_tool_use=None,
+ env=None,
+ )
+
class TestExecuteAgentStreaming:
"""Tests for the streaming execution path (_run_agent_streaming)."""
+ @patch(MOCK_SUBPROCESS)
+ def test_streaming_result_event_populates_result_text(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "early done"}\n',
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text == "early done"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_blocking_result_event_populates_result_text_when_captured(
+ self, mock_popen, tmp_path
+ ):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "early done"}\n',
+ returncode=0,
+ )
+ result = _run_agent_blocking(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=tmp_path,
+ iteration=1,
+ )
+
+ assert result.result_text == "early done"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_result_text_absent_when_no_result_event(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines="status: working\n",
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text is None
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_last_result_event_wins(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "first"}\n{"type": "result", "result": "second"}\n',
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text == "second"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
@patch(MOCK_SUBPROCESS)
def test_success(self, mock_popen):
mock_popen.return_value = make_mock_popen(
@@ -507,10 +663,12 @@ def test_sends_prompt_to_stdin(self, mock_popen):
proc.stdin.close.assert_called_once()
@patch(MOCK_SUBPROCESS)
- def test_adds_stream_json_flags(self, mock_popen):
+ def test_passes_cmd_verbatim_to_popen(self, mock_popen):
+ """_run_agent_streaming no longer appends its own flags — the caller
+ (via ``adapter.build_command``) owns CLI flags now."""
mock_popen.return_value = make_mock_popen(returncode=0)
_run_agent_streaming(
- ["claude", "-p"],
+ ["claude", "-p", "--output-format", "stream-json", "--verbose"],
"prompt",
timeout=None,
log_dir=None,
@@ -519,9 +677,13 @@ def test_adds_stream_json_flags(self, mock_popen):
call_args = mock_popen.call_args
cmd = call_args[0][0]
- assert "--output-format" in cmd
- assert "stream-json" in cmd
- assert "--verbose" in cmd
+ assert cmd == [
+ "claude",
+ "-p",
+ "--output-format",
+ "stream-json",
+ "--verbose",
+ ]
@patch(MOCK_SUBPROCESS)
def test_writes_log_on_success(self, mock_popen, tmp_path):
@@ -566,6 +728,35 @@ def test_captured_output_set_when_logging(self, mock_popen, tmp_path):
assert result.captured_stdout == "agent output\n"
assert result.captured_stderr == "some stderr\n"
+ @patch(MOCK_SUBPROCESS)
+ def test_capture_result_text_does_not_buffer_stream_output_without_log_dir(
+ self, mock_popen
+ ):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines="done\n",
+ stderr_text="some stderr\n",
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ capture_result_text=True,
+ )
+
+ assert result.captured_stdout is None
+ assert result.captured_stderr is None
+ assert result.result_text is None
+
+ call_args = mock_popen.call_args
+ assert call_args.args[0] == ["claude", "-p"]
+ call_kwargs = call_args[1]
+ assert call_kwargs.get("stdin") == subprocess.PIPE
+ assert call_kwargs.get("stdout") == subprocess.PIPE
+ assert call_kwargs.get("stderr") is None
+
@patch(MOCK_SUBPROCESS)
def test_no_log_when_dir_not_set(self, mock_popen):
mock_popen.return_value = make_mock_popen(returncode=0)
@@ -763,7 +954,7 @@ def test_raises_when_stdin_is_none(self, mock_popen):
proc.poll.return_value = None # process still running for finally cleanup
mock_popen.return_value = proc
- with pytest.raises(RuntimeError, match="PIPE streams"):
+ with pytest.raises(RuntimeError, match="PIPE stdin"):
_run_agent_streaming(
["claude", "-p"],
"prompt",
@@ -820,7 +1011,7 @@ def test_on_output_line_receives_lines_in_order(self, tmp_path):
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=tmp_path,
iteration=1,
@@ -849,7 +1040,7 @@ def test_on_output_line_captures_stderr_separately(self, tmp_path):
_run_agent_blocking(
[sys.executable, "-u", "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=tmp_path,
iteration=1,
@@ -869,7 +1060,7 @@ def test_stdin_prompt_delivered_to_subprocess(self, tmp_path):
_run_agent_blocking(
[sys.executable, "-c", script],
- prompt="hello-from-prompt\n",
+ stdin_text="hello-from-prompt\n",
timeout=10,
log_dir=tmp_path,
iteration=1,
@@ -901,7 +1092,7 @@ def test_large_prompt_with_concurrent_stderr_does_not_deadlock(self, tmp_path):
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt=large_prompt,
+ stdin_text=large_prompt,
timeout=15,
log_dir=tmp_path,
iteration=1,
@@ -920,7 +1111,7 @@ def test_early_exit_with_large_prompt_does_not_crash(self, tmp_path):
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt=large_prompt,
+ stdin_text=large_prompt,
timeout=10,
log_dir=tmp_path,
iteration=1,
@@ -947,7 +1138,7 @@ def test_streaming_large_stderr_drained_concurrently(self, tmp_path):
result = _run_agent_streaming(
[sys.executable, "-c", script],
- prompt="hi",
+ stdin_text="hi",
timeout=15,
log_dir=tmp_path,
iteration=1,
@@ -977,7 +1168,7 @@ def test_timeout_enforced_when_agent_does_not_read_stdin(self, tmp_path):
start = time.monotonic()
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt=large_prompt,
+ stdin_text=large_prompt,
timeout=2.0,
log_dir=tmp_path,
iteration=1,
@@ -1013,7 +1204,7 @@ def test_streaming_timeout_enforced_on_silent_agent(self, tmp_path):
start = time.monotonic()
result = _run_agent_streaming(
[sys.executable, "-u", "-c", script],
- prompt="go",
+ stdin_text="go",
timeout=1.0,
log_dir=tmp_path,
iteration=1,
@@ -1049,7 +1240,7 @@ def on_line(line: str, stream: str) -> None:
result = _run_agent_streaming(
[sys.executable, "-u", "-c", script],
- prompt="go",
+ stdin_text="go",
timeout=15,
log_dir=tmp_path,
iteration=1,
@@ -1145,7 +1336,7 @@ def test_inherit_path_shows_output(self, capfd):
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=None,
iteration=1,
@@ -1166,7 +1357,7 @@ def test_callback_only_does_not_buffer(self, tmp_path):
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=None,
iteration=1,
@@ -1199,7 +1390,7 @@ def raising_callback(line, stream):
result = _run_agent_blocking(
[sys.executable, "-u", "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=None,
iteration=1,
@@ -1221,7 +1412,7 @@ def always_raises(line, stream):
result = _run_agent_blocking(
[sys.executable, "-u", "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=tmp_path,
iteration=1,
@@ -1349,7 +1540,7 @@ def test_grandchild_inheriting_stdout_does_not_hang(self, tmp_path):
start = time.monotonic()
result = _run_agent_blocking(
[sys.executable, "-c", script],
- prompt="",
+ stdin_text="",
timeout=15,
log_dir=tmp_path,
iteration=1,
@@ -1427,7 +1618,7 @@ def tracking_start_pump(stream, buffer, stream_name, on_output_line):
):
_run_agent_blocking(
[sys.executable, "-c", script],
- prompt="",
+ stdin_text="",
timeout=10,
log_dir=None,
iteration=1,
@@ -1446,3 +1637,154 @@ def tracking_start_pump(stream, buffer, stream_name, on_output_line):
f"{name} reader thread still alive after exception — "
"joins are not in the finally block"
)
+
+
+class TestArgDeliveryStdin:
+ """Tests for the arg-delivery stdin decision threaded through _agent.
+
+ When an adapter's ``deliver_prompt`` returns ``stdin_text=None`` the
+ spawn must use ``stdin=DEVNULL`` and start no writer thread; when it
+ returns a string the prior stdin=PIPE + writer-thread path is preserved
+ byte-for-byte.
+ """
+
+ @patch("ralphify._agent._start_writer_thread")
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_streaming_arg_delivery_uses_devnull_and_no_writer(
+ self, mock_popen, mock_writer
+ ):
+ _run_agent_streaming(
+ ["opencode", "run", "--format", "json", "hi"],
+ None,
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ )
+
+ assert mock_popen.call_args.kwargs["stdin"] == subprocess.DEVNULL
+ mock_writer.assert_not_called()
+
+ @patch("ralphify._agent._start_writer_thread")
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_streaming_stdin_delivery_pipes_and_writes(self, mock_popen, mock_writer):
+ _run_agent_streaming(
+ ["claude", "-p"],
+ "the prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ )
+
+ assert mock_popen.call_args.kwargs["stdin"] == subprocess.PIPE
+ mock_writer.assert_called_once()
+
+ @patch("ralphify._agent._start_writer_thread")
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_blocking_arg_delivery_uses_devnull_and_no_writer(
+ self, mock_popen, mock_writer
+ ):
+ _run_agent_blocking(
+ ["aider", "the prompt"],
+ None,
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ )
+
+ assert mock_popen.call_args.kwargs["stdin"] == subprocess.DEVNULL
+ mock_writer.assert_not_called()
+
+ @patch("ralphify._agent._start_writer_thread")
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_blocking_stdin_delivery_pipes_and_writes(self, mock_popen, mock_writer):
+ _run_agent_blocking(
+ ["aider"],
+ "the prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ )
+
+ assert mock_popen.call_args.kwargs["stdin"] == subprocess.PIPE
+ mock_writer.assert_called_once()
+
+ def test_execute_agent_threads_arg_delivery_through_opencode(self, tmp_path):
+ """execute_agent must route the opencode adapter to DEVNULL stdin.
+
+ The opencode adapter appends the prompt to argv and reports
+ ``stdin_text=None``; execute_agent must spawn with stdin=DEVNULL.
+ """
+ with patch(MOCK_SUBPROCESS, side_effect=ok_proc) as mock_popen:
+ execute_agent(
+ ["opencode", "run"],
+ "do the work",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ )
+
+ spawn_cmd = mock_popen.call_args.args[0]
+ assert spawn_cmd == ["opencode", "run", "--format", "json", "do the work"]
+ assert mock_popen.call_args.kwargs["stdin"] == subprocess.DEVNULL
+
+ def test_arg_delivery_does_not_hang_when_child_ignores_stdin(self, tmp_path):
+ """Real subprocess: an arg-delivery agent that never reads stdin must
+ still complete and have its stdout parsed.
+
+ With ``stdin=DEVNULL`` the child gets immediate EOF; nothing writes
+ to a pipe, so there is no deadlock and the result event is parsed.
+ """
+ # Child writes a JSON result line and exits, never touching stdin.
+ script = (
+ 'import sys; sys.stdout.write(\'{"type": "result", "result": "ok"}\\n\')'
+ )
+
+ start = time.monotonic()
+ result = _run_agent_streaming(
+ [sys.executable, "-u", "-c", script],
+ None,
+ timeout=10,
+ log_dir=tmp_path,
+ iteration=1,
+ )
+ elapsed = time.monotonic() - start
+
+ assert result.returncode == 0
+ assert result.timed_out is False
+ assert result.result_text == "ok"
+ assert elapsed < 9.0
+
+ def test_arg_delivery_devnull_gives_child_eof_when_it_reads_stdin(self, tmp_path):
+ """Real subprocess: a child that *blocks reading stdin* must still
+ finish under arg delivery.
+
+ This is the hardening guard for the DEVNULL choice. The child does
+ ``sys.stdin.read()`` (blocks until EOF) before emitting its result.
+ ``stdin=DEVNULL`` delivers immediate EOF so ``read()`` returns ``""``
+ and the child proceeds. If the spawn ever regressed to leaving stdin
+ as an unwritten open pipe, ``read()`` would block forever and the run
+ would hit the timeout — so a deadline-driven completion here would be
+ a failure, not a pass.
+ """
+ # Child blocks on stdin.read(), then reports what it received.
+ script = (
+ "import sys, json\n"
+ "data = sys.stdin.read()\n"
+ 'sys.stdout.write(json.dumps({"type": "result", "result": data}) + "\\n")\n'
+ )
+
+ start = time.monotonic()
+ result = _run_agent_streaming(
+ [sys.executable, "-u", "-c", script],
+ None,
+ timeout=10,
+ log_dir=tmp_path,
+ iteration=1,
+ )
+ elapsed = time.monotonic() - start
+
+ assert result.timed_out is False, "child blocked on stdin — DEVNULL gave no EOF"
+ assert result.returncode == 0
+ # EOF on DEVNULL means read() returned the empty string, not a hang.
+ assert result.result_text == ""
+ assert elapsed < 9.0
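
Reviewer note: the `TestArgDeliveryStdin` cases above rest on one decision: when there is no stdin text, the child gets `DEVNULL` and so sees EOF at once; otherwise the text is written to a pipe. A minimal standalone sketch of that decision follows, using only `subprocess.run` from the standard library. `run_child` and `ECHO_STDIN` are hypothetical names, and the sketch does not reproduce ralphify's writer thread or streaming reader.

```python
from __future__ import annotations

import subprocess
import sys


def run_child(cmd: list[str], stdin_text: str | None, timeout: float = 10.0) -> str:
    """Run ``cmd`` and return its stdout (hypothetical helper)."""
    if stdin_text is None:
        # Arg delivery: there is nothing to write, so hand the child the
        # null device. Any read() in the child returns EOF immediately.
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    else:
        # Stdin delivery: ``input=`` makes subprocess.run open a pipe,
        # write the text, and close it so the child sees EOF afterwards.
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.stdout


# A child that blocks on stdin.read() and echoes back what it received.
ECHO_STDIN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(repr(sys.stdin.read()))",
]
```

Calling `run_child(ECHO_STDIN, None)` exercises the same property as the last test above: the child's `read()` returns an empty string instead of blocking on an open, unwritten pipe.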
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 79e224ea..b9fb5c28 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -1,6 +1,7 @@
"""Tests for the CLI."""
import importlib
+import re
import signal
from unittest.mock import patch, MagicMock
@@ -24,6 +25,31 @@
runner = CliRunner()
+def _flatten_help(output: str) -> str:
+ no_ansi = re.sub(r"\x1b\[[0-9;]*[a-zA-Z]", "", output)
+ return re.sub(r"\s+", " ", no_ansi)
+
+
+class TestHelp:
+ def test_scaffold_help_mentions_promise_completion(self):
+ result = runner.invoke(app, ["scaffold", "--help"])
+
+ assert result.exit_code == 0
+ flat = _flatten_help(result.output)
+ assert "completion_signal" in flat
+ assert "stop_on_completion_signal" in flat
+ assert "..." in flat
+
+ def test_run_help_mentions_promise_completion(self):
+ result = runner.invoke(app, ["run", "--help"])
+
+ assert result.exit_code == 0
+ flat = _flatten_help(result.output)
+ assert "completion_signal" in flat
+ assert "stop_on_completion_signal" in flat
+ assert "..." in flat
+
+
class TestVersion:
@pytest.mark.parametrize("flag", ["--version", "-V"])
def test_version_flag(self, flag):
@@ -95,6 +121,103 @@ def test_errors_with_malformed_agent_field(self, mock_which, tmp_path, monkeypat
assert result.exit_code == 1
assert "malformed" in result.output.lower()
+ def test_run_uses_default_completion_signal_config(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = make_ralph(tmp_path)
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "3"])
+
+ assert result.exit_code == 0
+ mock_run_loop.assert_called_once()
+ config = mock_run_loop.call_args.args[0]
+ assert config.completion_signal == "RALPH_PROMISE_COMPLETE"
+ assert config.stop_on_completion_signal is False
+ assert config.max_iterations == 3
+
+ def test_run_passes_completion_signal_frontmatter_to_config(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ "completion_signal: CUSTOM_DONE\n"
+ "stop_on_completion_signal: true\n"
+ "---\n"
+ "go"
+ )
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "2"])
+
+ assert result.exit_code == 0
+ mock_run_loop.assert_called_once()
+ config = mock_run_loop.call_args.args[0]
+ assert config.completion_signal == "CUSTOM_DONE"
+ assert config.stop_on_completion_signal is True
+
+ @pytest.mark.parametrize(
+ ("frontmatter_line", "expected_error"),
+ [
+ ("completion_signal: 0", "must be a non-empty string"),
+ (
+ 'completion_signal: " CUSTOM_DONE "',
+ "must not include leading or trailing whitespace",
+ ),
+ (
+ 'completion_signal: "CUSTOM_DONE"',
+ "must be the text inside ...",
+ ),
+ ],
+ ids=["wrong-type", "surrounding-whitespace", "markup-instead-of-text"],
+ )
+ def test_run_rejects_invalid_completion_signal_frontmatter(
+ self, mock_which, tmp_path, monkeypatch, frontmatter_line, expected_error
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ f"{frontmatter_line}\n"
+ "---\n"
+ "go"
+ )
+
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"])
+
+ assert result.exit_code == 1
+ assert "completion_signal" in result.output.lower()
+ assert expected_error in result.output.lower()
+ mock_run_loop.assert_not_called()
+
+ def test_run_rejects_non_boolean_stop_on_completion_signal(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ 'stop_on_completion_signal: "maybe"\n'
+ "---\n"
+ "go"
+ )
+
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"])
+
+ assert result.exit_code == 1
+ assert "stop_on_completion_signal" in result.output.lower()
+ assert "must be true or false" in result.output.lower()
+ mock_run_loop.assert_not_called()
+
@pytest.mark.parametrize(
"frontmatter, expected_error",
[
@@ -517,6 +640,16 @@ def test_creates_ralph_with_name(self, tmp_path, monkeypatch):
assert ralph_file.exists()
assert "Created" in result.output
+ def test_scaffold_output_mentions_promise_completion(self, tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+
+ result = runner.invoke(app, ["scaffold", "my-task"])
+
+ assert result.exit_code == 0
+ assert "completion_signal" in result.output
+ assert "stop_on_completion_signal" in result.output
+ assert "COMPLETE" in result.output
+
def test_creates_ralph_in_cwd(self, tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
result = runner.invoke(app, ["scaffold"])
@@ -556,6 +689,16 @@ def test_template_has_valid_frontmatter(self, tmp_path, monkeypatch):
assert "{{ commands.git-log }}" in body
assert "{{ args.focus }}" in body
+ def test_template_mentions_promise_completion_path(self, tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+
+ runner.invoke(app, ["scaffold", "my-task"])
+ content = (tmp_path / "my-task" / RALPH_MARKER).read_text()
+
+ assert "# completion_signal: COMPLETE" in content
+ assert "# stop_on_completion_signal: true" in content
+ assert "COMPLETE" in content
+
class TestParseUserArgs:
def test_named_flag(self):
@@ -980,8 +1123,14 @@ def test_run_resolves_installed_ralph_by_name(
installed = tmp_path / ".agents" / "ralphs" / "my-tool"
installed.mkdir(parents=True)
(installed / RALPH_MARKER).write_text("---\nagent: claude -p\n---\ngo")
- result = runner.invoke(app, ["run", "my-tool", "-n", "1"])
- # Should attempt to run (may fail at agent exec, but should NOT error on path resolution)
+ # Mock the subprocess so name resolution is exercised without spawning
+ # a real `claude` agent. Without this, the test makes a live agent call
+ # whose non-deterministic output can contain the asserted substrings by
+ # chance (the assertions check that the resolution-failure message is
+ # absent, not anything about the agent's response).
+ with patch(MOCK_SUBPROCESS, side_effect=ok_proc):
+ result = runner.invoke(app, ["run", "my-tool", "-n", "1"])
+ # Resolution should succeed (it must NOT print the path-resolution error).
assert "not a directory" not in result.output.lower()
assert "installed ralph" not in result.output.lower()
diff --git a/tests/test_cli_frontmatter_fields.py b/tests/test_cli_frontmatter_fields.py
new file mode 100644
index 00000000..984d5da6
--- /dev/null
+++ b/tests/test_cli_frontmatter_fields.py
@@ -0,0 +1,236 @@
+"""Tests for the max_turns, max_turns_grace, and hooks frontmatter fields."""
+
+from __future__ import annotations
+
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+import typer
+
+from helpers import MOCK_WHICH
+from ralphify.cli import (
+ _build_run_config,
+ _validate_hooks,
+ _validate_max_turns,
+ _validate_max_turns_grace,
+)
+from ralphify.hooks import ShellAgentHook
+
+
+class TestValidateMaxTurns:
+ def test_absent_returns_none(self) -> None:
+ assert _validate_max_turns(None) is None
+
+ def test_positive_int_passes(self) -> None:
+ assert _validate_max_turns(5) == 5
+
+ def test_zero_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(0)
+
+ def test_negative_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(-1)
+
+ def test_bool_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(True)
+
+ def test_string_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns("5")
+
+ def test_float_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(5.0)
+
+
+class TestValidateMaxTurnsGrace:
+ def test_absent_defaults_to_two(self) -> None:
+ assert _validate_max_turns_grace(None, max_turns=None) == 2
+
+ def test_absent_with_cap_defaults_to_two(self) -> None:
+ assert _validate_max_turns_grace(None, max_turns=10) == 2
+
+ def test_zero_allowed(self) -> None:
+ assert _validate_max_turns_grace(0, max_turns=10) == 0
+
+ def test_negative_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(-1, max_turns=10)
+
+ def test_bool_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(True, max_turns=10)
+
+ def test_float_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(1.5, max_turns=10)
+
+ def test_grace_equal_to_cap_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(10, max_turns=10)
+
+ def test_grace_above_cap_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(11, max_turns=10)
+
+ def test_grace_below_cap_allowed(self) -> None:
+ assert _validate_max_turns_grace(3, max_turns=10) == 3
+
+ def test_grace_without_cap_any_non_negative_allowed(self) -> None:
+ # No cap means no upper bound on grace; the value is still retained.
+ assert _validate_max_turns_grace(99, max_turns=None) == 99
+
+
+@patch(MOCK_WHICH, return_value="/usr/bin/claude")
+class TestBuildRunConfigTurnCapFields:
+ def _write_ralph(self, tmp_path: Path, body: str) -> Path:
+ ralph = tmp_path / "RALPH.md"
+ ralph.write_text(body, encoding="utf-8")
+ return ralph
+
+ def test_defaults_when_absent(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\n---\nhello\n",
+ )
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert config.max_turns is None
+ assert config.max_turns_grace == 2
+
+ def test_values_threaded_through(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 20\nmax_turns_grace: 5\n---\nhello\n",
+ )
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert config.max_turns == 20
+ assert config.max_turns_grace == 5
+
+ def test_invalid_max_turns_exits(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 0\n---\nhello\n",
+ )
+ with pytest.raises(typer.Exit):
+ _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+
+ def test_grace_at_or_above_cap_exits(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 5\nmax_turns_grace: 5\n---\nhello\n",
+ )
+ with pytest.raises(typer.Exit):
+ _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+
+
+class TestValidateHooks:
+ def test_absent_returns_empty_list(self) -> None:
+ assert _validate_hooks(None) == []
+
+ def test_valid_hook_builds_shell_agent_hook(self) -> None:
+ hooks = _validate_hooks(
+ [{"event": "on_iteration_started", "run": "./notify.sh"}]
+ )
+ assert len(hooks) == 1
+ assert isinstance(hooks[0], ShellAgentHook)
+
+ def test_multiple_hooks_preserve_order(self) -> None:
+ hooks = _validate_hooks(
+ [
+ {"event": "on_turn_capped", "run": "a"},
+ {"event": "on_completion_signal", "run": "b"},
+ ]
+ )
+ assert len(hooks) == 2
+
+ def test_non_list_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_hooks({"event": "on_tool_use", "run": "x"})
+
+ def test_missing_event_field_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_hooks([{"run": "x"}])
+
+ def test_missing_run_field_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_hooks([{"event": "on_tool_use"}])
+
+ def test_unknown_event_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_hooks([{"event": "on_nonsense", "run": "x"}])
+
+ def test_empty_command_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_hooks([{"event": "on_tool_use", "run": ""}])
+
+
+@patch(MOCK_WHICH, return_value="/usr/bin/claude")
+class TestBuildRunConfigHooks:
+ def _write_ralph(self, tmp_path: Path, body: str) -> Path:
+ ralph = tmp_path / "RALPH.md"
+ ralph.write_text(body, encoding="utf-8")
+ return ralph
+
+ def test_hooks_absent_yields_empty_list(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(tmp_path, "---\nagent: claude -p\n---\nhello\n")
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert config.hooks == []
+
+ def test_hooks_threaded_through(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\n"
+ "agent: claude -p\n"
+ "hooks:\n"
+ " - event: on_turn_capped\n"
+ " run: ./warn.sh\n"
+ "---\nhello\n",
+ )
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert len(config.hooks) == 1
+ assert isinstance(config.hooks[0], ShellAgentHook)
diff --git a/tests/test_console_emitter.py b/tests/test_console_emitter.py
index b6476d46..730b4edd 100644
--- a/tests/test_console_emitter.py
+++ b/tests/test_console_emitter.py
@@ -15,9 +15,9 @@
_IterationPanel,
_IterationSpinner,
_SinglePanelNavigator,
+ _agent_renders_structured_peek,
_format_run_info,
_format_summary,
- _is_claude_command,
_scrollbar_metrics,
_shorten_path,
)
@@ -332,7 +332,11 @@ def test_startup_hint_shown_when_peek_on_by_default(self):
output = console.export_text()
assert "press p to hide" in output
- def test_startup_hint_structured_for_claude(self):
+ def test_startup_hint_structured_for_claude(self, monkeypatch):
+ from ralphify.adapters import select_adapter
+
+ claude_adapter = select_adapter(["claude"])
+ monkeypatch.setattr(claude_adapter, "renders_structured_peek", True)
emitter, console = _capture_emitter()
emitter._peek_enabled = True
emitter.emit(
@@ -1499,24 +1503,33 @@ def test_peek_message_shown_in_spinner(self):
assert "live output on" in output
-class TestIsClaudeCommand:
- def test_claude_binary(self):
- assert _is_claude_command("claude") is True
+class TestAgentRendersStructuredPeek:
+ """Verify the emitter routes through :func:`select_adapter`."""
- def test_claude_with_flags(self):
- assert _is_claude_command("claude --dangerously-skip-permissions") is True
+ def test_claude_adapter_reports_structured_peek(self, monkeypatch):
+ from ralphify.adapters import select_adapter
+ from ralphify.adapters.claude import ClaudeAdapter
- def test_claude_full_path(self):
- assert _is_claude_command("/usr/local/bin/claude -p") is True
+ adapter = select_adapter(["claude"])
+ assert isinstance(adapter, ClaudeAdapter)
+ monkeypatch.setattr(adapter, "renders_structured_peek", True)
+ assert _agent_renders_structured_peek("claude") is True
+ assert (
+ _agent_renders_structured_peek("claude --dangerously-skip-permissions")
+ is True
+ )
+ assert _agent_renders_structured_peek("/usr/local/bin/claude -p") is True
- def test_not_claude(self):
- assert _is_claude_command("aider --yes") is False
+ def test_unknown_agent_falls_back_to_generic(self):
+ # conftest forces every adapter's renders_structured_peek to False;
+ # the GenericAdapter fallback also reports False.
+ assert _agent_renders_structured_peek("aider --yes") is False
- def test_empty(self):
- assert _is_claude_command("") is False
+ def test_empty_string_is_not_structured(self):
+ assert _agent_renders_structured_peek("") is False
- def test_invalid_shlex(self):
- assert _is_claude_command("claude 'unterminated") is False
+ def test_invalid_shlex_is_not_structured(self):
+ assert _agent_renders_structured_peek("claude 'unterminated") is False
def _populate_buffer(spinner, count: int, prefix: str = "line") -> None:
@@ -1858,7 +1871,7 @@ def test_history_eviction_protects_viewed_iteration(self):
) if i > 2 else None
emitter.emit(_make_event(EventType.ITERATION_STARTED, iteration=i))
assert 1 in emitter._iteration_history
- assert len(emitter._iteration_order) <= _MAX_HISTORY_ITERATIONS
+ assert len(emitter._iteration_history) <= _MAX_HISTORY_ITERATIONS
finally:
emitter._stop_live()
diff --git a/tests/test_engine.py b/tests/test_engine.py
index b05f8d64..ce4aa305 100644
--- a/tests/test_engine.py
+++ b/tests/test_engine.py
@@ -1,7 +1,9 @@
"""Tests for the run engine."""
+import sys
import threading
import time
+from pathlib import Path
from unittest.mock import patch
import pytest
@@ -20,10 +22,12 @@
)
from rich.console import Console
+from ralphify._agent import AgentResult
from ralphify._console_emitter import ConsoleEmitter
from ralphify._events import BoundEmitter, EventType, NullEmitter, QueueEmitter
-from ralphify._run_types import Command, RunStatus
+from ralphify._run_types import Command, RunConfig, RunStatus
from ralphify._runner import RunResult
+from ralphify.adapters import select_adapter
from ralphify.engine import (
_assemble_prompt,
_delay_if_needed,
@@ -126,6 +130,320 @@ def test_log_dir_creates_files(self, mock_run, tmp_path):
assert log_files[1].name.startswith("002_")
+class TestPromiseCompletionSignals:
+ @patch("ralphify.engine.execute_agent")
+ def test_tagged_promise_does_not_stop_by_default(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Without ``stop_on_completion_signal`` the loop must run all iterations.
+
+ A non-streaming adapter (here ``echo`` → GenericAdapter) only sees
+ promise completion when the engine elects to capture stdout, which
+ in turn requires either logging or the explicit opt-in. Skipping
+ the buffer is the whole point of the gating, so
+ ``state.promise_completed`` legitimately stays False here.
+ """
+ config = make_config(tmp_path, max_iterations=3)
+ state = make_state()
+ emitter = NullEmitter()
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout=None,
+ )
+
+ run_loop(config, state, emitter)
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+ assert [
+ call.kwargs["on_output_line"] for call in mock_execute_agent.call_args_list
+ ] == [None, None, None]
+ assert [
+ call.kwargs["capture_result_text"]
+ for call in mock_execute_agent.call_args_list
+ ] == [True, True, True]
+ # Generic adapter requires full stdout, but the user didn't opt in
+ # and no log dir is set — capture should stay off to avoid
+ # buffering verbose agent output.
+ assert [
+ call.kwargs["capture_stdout"] for call in mock_execute_agent.call_args_list
+ ] == [False, False, False]
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_off_for_streaming_adapter_without_logging(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Claude exposes ``result_text`` directly; the engine must not
+ force-buffer the full stdout transcript even when the user opts
+ into ``stop_on_completion_signal``."""
+ config = make_config(
+ tmp_path,
+ agent="claude",
+ max_iterations=1,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="<promise>RALPH_PROMISE_COMPLETE</promise>",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is False
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_on_for_blocking_adapter_with_opt_in(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Generic / Copilot adapters need the full stdout buffer to
+ scan for the promise tag — engine must opt in when the user
+ opts into completion signalling."""
+ config = make_config(
+ tmp_path,
+ max_iterations=1,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="<promise>RALPH_PROMISE_COMPLETE</promise>\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_on_when_log_dir_set(self, mock_execute_agent, tmp_path):
+ """Logging always needs the buffer regardless of completion signal."""
+ log_dir = tmp_path / "logs"
+ config = make_config(tmp_path, max_iterations=1, log_dir=log_dir)
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="anything\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_tagged_promise_stops_early_when_enabled(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=5,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+ emitter = QueueEmitter()
+ emitter.wants_agent_output_lines = lambda: True
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="<promise>RALPH_PROMISE_COMPLETE</promise>\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ mock_execute_agent.assert_called_once()
+ assert mock_execute_agent.call_args.kwargs["capture_result_text"] is True
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.total == 1
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ events = drain_events(emitter)
+ completed_events = events_of_type(events, EventType.ITERATION_COMPLETED)
+ assert len(completed_events) == 1
+ stop_event = events_of_type(events, EventType.RUN_STOPPED)[0]
+ assert stop_event.data["reason"] == "completed"
+ assert stop_event.data["total"] == 1
+ assert stop_event.data["completed"] == 1
+ assert stop_event.data["failed"] == 0
+ assert stop_event.data["timed_out_count"] == 0
+
+ @patch("ralphify.engine.execute_agent")
+ def test_custom_promise_text_matches_inner_tag_text(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=4,
+ completion_signal="CUSTOM_DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+ emitter = QueueEmitter()
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="<promise>CUSTOM_DONE</promise>\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ mock_execute_agent.assert_called_once()
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ events = drain_events(emitter)
+ stop_event = events_of_type(events, EventType.RUN_STOPPED)[0]
+ assert stop_event.data["reason"] == "completed"
+ assert stop_event.data["total"] == 1
+ assert stop_event.data["completed"] == 1
+
+ @patch("ralphify.engine.execute_agent")
+ def test_promise_tag_normalizes_inner_whitespace_before_matching(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=4,
+ completion_signal="CUSTOM DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="<promise>\n CUSTOM\tDONE \n</promise>\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 1
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.total == 1
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_untagged_raw_text_does_not_match_completion_signal(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=3,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="done RALPH_PROMISE_COMPLETE without promise tags",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_different_tagged_promise_text_does_not_match(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=3,
+ completion_signal="CUSTOM_DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="<promise>CUSTOM_DONE_NOW</promise>",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_structured_agents_ignore_raw_stdout_for_promise_detection(
+ self, mock_execute_agent, tmp_path
+ ):
+ """ClaudeAdapter only looks at ``result`` events — embedded
+ promise tags inside ``status`` or ``assistant`` JSON messages
+ must not trigger early completion."""
+ config = make_config(
+ tmp_path,
+ agent="claude",
+ max_iterations=2,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="done without promise tag",
+ captured_stdout='{"type":"status","message":"<promise>RALPH_PROMISE_COMPLETE</promise>"}\n',
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 2
+ assert state.completed == 2
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_blocking_captured_stdout_is_echoed_when_peek_is_off(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(tmp_path, max_iterations=1)
+ state = make_state()
+ emitter = QueueEmitter()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="plain blocking output\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ completed_event = events_of_type(
+ drain_events(emitter), EventType.ITERATION_COMPLETED
+ )[0]
+ assert completed_event.data["echo_stdout"] == "plain blocking output\n"
+
+
class TestRunLoopDefaults:
@patch(MOCK_SUBPROCESS, side_effect=ok_proc)
def test_runs_without_emitter(self, mock_run, tmp_path):
@@ -1077,6 +1395,61 @@ def test_ralph_name_is_ralph_dir_name(self, tmp_path):
assert result == "Name: my-ralph"
+class TestInMemoryPrompt:
+ """Curd 3 — RunConfig(prompt=...) runs the body without a file read."""
+
+ def test_assemble_uses_prompt_body_without_reading_file(self, tmp_path):
+ config = RunConfig(
+ agent="echo",
+ ralph_dir=tmp_path,
+ prompt="Search {{ args.dir }} now",
+ args={"dir": "./src"},
+ max_iterations=1,
+ credit=False,
+ )
+ state = make_state()
+ state.iteration = 1
+
+ with patch("pathlib.Path.read_text") as mock_read:
+ result = _assemble_prompt(config, state, {})
+
+ mock_read.assert_not_called()
+ assert result == "Search ./src now"
+
+ def test_prompt_body_is_not_frontmatter_parsed(self, tmp_path):
+ # A leading '---' block stays verbatim — it is the body, not frontmatter.
+ body = "---\nnot: parsed\n---\nreal prompt"
+ config = RunConfig(
+ agent="echo",
+ ralph_dir=tmp_path,
+ prompt=body,
+ max_iterations=1,
+ credit=False,
+ )
+ state = make_state()
+ state.iteration = 1
+
+ assert _assemble_prompt(config, state, {}) == body
+
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_run_loop_with_in_memory_prompt(self, mock_run, tmp_path):
+ config = RunConfig(
+ agent="echo",
+ ralph_dir=tmp_path,
+ prompt="do work",
+ max_iterations=1,
+ )
+ state = make_state()
+ q = QueueEmitter()
+
+ with patch("pathlib.Path.read_text") as mock_read:
+ run_loop(config, state, q)
+
+ mock_read.assert_not_called()
+ assert state.status == RunStatus.COMPLETED
+ assert state.completed == 1
+
+
class TestCreditInLoop:
@patch(MOCK_SUBPROCESS)
def test_credit_instruction_in_agent_input(self, mock_run, tmp_path):
@@ -1202,3 +1575,62 @@ def popen_with_toggle(*args, **kwargs):
assert output.count("first") == 0
assert output.count("second") == 0
assert output.count("third") == 0
+
+
+def _write_opencode_stub(tmp_path: Path) -> Path:
+ """Write an executable ``opencode`` stub emitting opencode-shaped JSON.
+
+ The stub ignores stdin entirely (arg-delivery), prints a tool_use and a
+ step_finish event followed by a promise completion tag, then exits 0.
+ Named ``opencode`` so ``select_adapter`` dispatches to OpenCodeAdapter.
+ """
+ script = tmp_path / "opencode"
+ script.write_text(
+ f"#!{sys.executable}\n"
+ "import sys\n"
+ 'print(\'{"type": "step_start", "part": {}}\', flush=True)\n'
+ 'print(\'{"type": "tool_use", "part": {"name": "Edit"}}\', flush=True)\n'
+ 'print(\'{"type": "step_finish", "part": {"tokens": 10}}\', flush=True)\n'
+ "print('<promise>RALPH_PROMISE_COMPLETE</promise>', flush=True)\n"
+ )
+ script.chmod(0o755)
+ return script
+
+
+class TestOpenCodeEndToEnd:
+ """End-to-end run_loop with a real arg-delivery opencode stub (FR-9)."""
+
+ def test_opencode_stub_runs_and_completes_on_promise(self, tmp_path, monkeypatch):
+ # The autouse _disable_streaming fixture forces blocking mode on every
+ # adapter; re-enable streaming on the registered opencode instance so
+ # this test exercises the real JSON-parsing activity path.
+ opencode_adapter = select_adapter(["opencode", "run"])
+ monkeypatch.setattr(opencode_adapter, "supports_streaming", True)
+
+ stub = _write_opencode_stub(tmp_path)
+ config = make_config(
+ tmp_path,
+ agent=f"{stub} run",
+ max_iterations=3,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+ emitter = QueueEmitter()
+
+ run_loop(config, state, emitter)
+
+ # Promise tag in stdout stops the loop after the first iteration.
+ assert state.promise_completed is True
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+
+ events = drain_events(emitter)
+ activity = events_of_type(events, EventType.AGENT_ACTIVITY)
+ kinds = [e.data["raw"].get("type") for e in activity]
+ assert "tool_use" in kinds
+ assert "step_finish" in kinds
+
+ stop_event = events_of_type(events, EventType.RUN_STOPPED)[0]
+ assert stop_event.data["reason"] == "completed"
diff --git a/tests/test_hooks.py b/tests/test_hooks.py
new file mode 100644
index 00000000..126a4020
--- /dev/null
+++ b/tests/test_hooks.py
@@ -0,0 +1,124 @@
+"""Tests for the :mod:`ralphify.hooks` lifecycle hook protocol."""
+
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+
+from ralphify.hooks import (
+ AgentHook,
+ CombinedAgentHook,
+ HOOK_EVENT_NAMES,
+ NoOpAgentHook,
+ ShellAgentHook,
+)
+
+
+class _RecordingHook(NoOpAgentHook):
+ """Records every call for assertions."""
+
+ def __init__(self) -> None:
+ self.calls: list[tuple[str, dict[str, Any]]] = []
+
+ def on_iteration_started(self, *, iteration: int) -> None:
+ self.calls.append(("on_iteration_started", {"iteration": iteration}))
+
+ def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
+ self.calls.append(
+ (
+ "on_tool_use",
+ {"iteration": iteration, "tool_name": tool_name, "count": count},
+ )
+ )
+
+ def on_turn_capped(self, *, iteration: int, count: int) -> None:
+ self.calls.append(("on_turn_capped", {"iteration": iteration, "count": count}))
+
+
+class _RaisingHook(NoOpAgentHook):
+ """Always raises — used to verify fanout isolation."""
+
+ def on_iteration_started(self, *, iteration: int) -> None:
+ raise RuntimeError("boom")
+
+
+def test_noop_hook_satisfies_protocol() -> None:
+ hook = NoOpAgentHook()
+ assert isinstance(hook, AgentHook)
+
+
+def test_combined_fanout_delivers_to_all_hooks() -> None:
+ h1 = _RecordingHook()
+ h2 = _RecordingHook()
+ combined = CombinedAgentHook([h1, h2])
+
+ combined.on_iteration_started(iteration=3)
+ combined.on_tool_use(iteration=3, tool_name="Bash", count=1)
+
+ assert h1.calls == [
+ ("on_iteration_started", {"iteration": 3}),
+ ("on_tool_use", {"iteration": 3, "tool_name": "Bash", "count": 1}),
+ ]
+ assert h2.calls == h1.calls
+
+
+def test_combined_fanout_isolates_exceptions() -> None:
+ raising = _RaisingHook()
+ recording = _RecordingHook()
+ combined = CombinedAgentHook([raising, recording])
+
+ combined.on_iteration_started(iteration=1)
+
+ assert recording.calls == [("on_iteration_started", {"iteration": 1})]
+
+
+def test_combined_fanout_skips_missing_methods() -> None:
+ class _PartialHook:
+ def on_iteration_started(self, *, iteration: int) -> None:
+ pass
+
+ combined = CombinedAgentHook([_PartialHook()])
+ combined.on_turn_capped(iteration=1, count=10)
+
+
+def test_shell_hook_rejects_unknown_event() -> None:
+ with pytest.raises(ValueError, match="unknown hook event"):
+ ShellAgentHook("on_nonexistent_event", "true")
+
+
+def test_shell_hook_swallows_nonzero_exit(caplog: pytest.LogCaptureFixture) -> None:
+ hook = ShellAgentHook("on_iteration_started", "false")
+ with caplog.at_level("WARNING", logger="ralphify.hooks"):
+ hook.on_iteration_started(iteration=1)
+ assert any("exited 1" in record.message for record in caplog.records)
+
+
+def test_shell_hook_swallows_missing_binary(caplog: pytest.LogCaptureFixture) -> None:
+ hook = ShellAgentHook(
+ "on_iteration_started", "/nonexistent/command/that/does/not/exist"
+ )
+ with caplog.at_level("WARNING", logger="ralphify.hooks"):
+ hook.on_iteration_started(iteration=1)
+ assert any("failed to start" in record.message for record in caplog.records)
+
+
+def test_shell_hook_pipes_payload_to_stdin(tmp_path: Any) -> None:
+ out = tmp_path / "payload.json"
+ hook = ShellAgentHook(
+ "on_iteration_started",
+ f"sh -c 'cat > {out}'",
+ )
+ hook.on_iteration_started(iteration=7)
+ assert out.exists()
+ assert '"iteration": 7' in out.read_text()
+
+
+def test_hook_event_names_cover_protocol_methods() -> None:
+ # Sanity: HOOK_EVENT_NAMES should equal the AgentHook method surface.
+ expected = {
+ name
+ for name in dir(NoOpAgentHook)
+ if name.startswith("on_") and not name.startswith("_")
+ }
+ assert HOOK_EVENT_NAMES == expected
diff --git a/tests/test_init.py b/tests/test_init.py
index b0b1caa4..b8bd575a 100644
--- a/tests/test_init.py
+++ b/tests/test_init.py
@@ -1,5 +1,9 @@
"""Tests for ralphify.__init__ — version fallback and main() entry point."""
+import builtins
+
+import pytest
+
from unittest.mock import patch, MagicMock
@@ -46,3 +50,56 @@ def test_main_is_callable(self):
from ralphify import main
assert callable(main)
+
+ def test_main_raises_actionable_error_without_cli_extra(self):
+ """When rich/typer are absent, importing the CLI fails and main() exits
+ with a message pointing at the [cli] extra."""
+ import sys
+ from ralphify import main
+
+ real_import = builtins.__import__
+
+ def fake_import(name, *args, **kwargs):
+ # Emulate an absent CLI dependency: Python reports the missing
+ # *top-level* package, so name= is "typer"/"rich" even for submodules.
+ top = name.split(".")[0]
+ if top in {"typer", "rich"}:
+ raise ModuleNotFoundError(f"No module named {top!r}", name=top)
+ return real_import(name, *args, **kwargs)
+
+ # Drop any cached CLI module so the import is re-attempted.
+ saved = {
+ k: sys.modules.pop(k) for k in list(sys.modules) if k == "ralphify.cli"
+ }
+ try:
+ with patch.object(builtins, "__import__", side_effect=fake_import):
+ with pytest.raises(SystemExit, match=r"ralphify\[cli\]"):
+ main()
+ finally:
+ sys.modules.update(saved)
+
+ def test_main_reraises_unrelated_import_error(self):
+ """A real import bug inside ralphify.cli (not a missing CLI dep) must
+ propagate, not be masked behind the [cli]-extra hint."""
+ import sys
+ from ralphify import main
+
+ real_import = builtins.__import__
+
+ def fake_import(name, *args, **kwargs):
+ if name == "ralphify.cli":
+ raise ModuleNotFoundError(
+ "No module named 'ralphify._does_not_exist'",
+ name="ralphify._does_not_exist",
+ )
+ return real_import(name, *args, **kwargs)
+
+ saved = {
+ k: sys.modules.pop(k) for k in list(sys.modules) if k == "ralphify.cli"
+ }
+ try:
+ with patch.object(builtins, "__import__", side_effect=fake_import):
+ with pytest.raises(ModuleNotFoundError, match="_does_not_exist"):
+ main()
+ finally:
+ sys.modules.update(saved)
diff --git a/tests/test_manager.py b/tests/test_manager.py
index 31362636..1e1df748 100644
--- a/tests/test_manager.py
+++ b/tests/test_manager.py
@@ -9,10 +9,25 @@
from helpers import MOCK_SUBPROCESS, drain_events, event_types, make_config, ok_proc
from ralphify._events import EventType, FanoutEmitter, QueueEmitter
-from ralphify._run_types import RUN_ID_LENGTH, RunStatus
+from ralphify._run_types import RUN_ID_LENGTH, RunResult, RunStatus
from ralphify.manager import ManagedRun, RunManager
+def _returns_without_blocking(fn, timeout=2.0):
+ """Run *fn* in a watchdog thread; fail if it doesn't return in *timeout*.
+
+ Lets us assert the wait helpers never block on empty/unknown run_ids
+ with ``timeout=None`` (the infinite-hang regression) without risking a
+ hung test run if the guard is ever removed.
+ """
+ box = {}
+ thread = threading.Thread(target=lambda: box.update(result=fn()), daemon=True)
+ thread.start()
+ thread.join(timeout)
+ assert not thread.is_alive(), "wait helper blocked instead of returning"
+ return box["result"]
+
+
class TestRunManagerCreateRun:
def test_create_run_returns_managed_run(self, tmp_path):
manager = RunManager()
@@ -256,3 +271,178 @@ def test_extra_listeners_receive_events(self, mock_run, tmp_path):
assert len(primary_events) > 0
assert len(extra_events) == len(primary_events)
+
+
+class TestRunManagerWaitForAny:
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_wait_for_any_returns_first_finisher(self, mock_run, tmp_path):
+ manager = RunManager()
+ # Run A finishes after one iteration; run B runs long with a delay.
+ fast = manager.create_run(make_config(tmp_path, max_iterations=1))
+ slow = manager.create_run(make_config(tmp_path, max_iterations=100, delay=10))
+
+ manager.start_run(fast.state.run_id)
+ manager.start_run(slow.state.run_id)
+
+ finished = manager.wait_for_any(
+ [fast.state.run_id, slow.state.run_id], timeout=5
+ )
+
+ assert fast.state.run_id in finished
+ assert slow.state.run_id not in finished
+
+ manager.shutdown(timeout=5)
+
+ def test_wait_for_any_times_out_to_empty_list(self, tmp_path):
+ manager = RunManager()
+ # Never started — never finishes.
+ managed = manager.create_run(make_config(tmp_path))
+ assert manager.wait_for_any([managed.state.run_id], timeout=0.05) == []
+
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_wait_for_any_ignores_unknown_ids(self, mock_run, tmp_path):
+ # Docstring contract: unknown IDs are never reported as finished,
+ # only the real run that completes is returned.
+ manager = RunManager()
+ real = manager.create_run(make_config(tmp_path, max_iterations=1))
+ manager.start_run(real.state.run_id)
+
+ finished = manager.wait_for_any([real.state.run_id, "ghost"], timeout=5)
+
+ assert finished == [real.state.run_id]
+ assert "ghost" not in finished
+
+ def test_wait_for_any_empty_run_ids_times_out(self):
+ # No IDs can ever finish, so this can only time out to [].
+ manager = RunManager()
+ assert manager.wait_for_any([], timeout=0.05) == []
+
+ def test_wait_for_any_empty_run_ids_no_timeout_returns_immediately(self):
+ # Regression: empty run_ids with timeout=None must NOT block forever.
+ # Nothing can ever notify the condition for an empty set, so the only
+ # honest answer is an immediate [].
+ manager = RunManager()
+ assert _returns_without_blocking(lambda: manager.wait_for_any([])) == []
+
+ def test_wait_for_any_all_unknown_no_timeout_returns_immediately(self):
+ # Regression: all-unknown IDs with timeout=None must not hang.
+ manager = RunManager()
+ assert (
+ _returns_without_blocking(
+ lambda: manager.wait_for_any(["ghost1", "ghost2"])
+ )
+ == []
+ )
+
+
+class TestRunManagerWaitForAll:
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_wait_for_all_returns_true_when_all_finish(self, mock_run, tmp_path):
+ manager = RunManager()
+ a = manager.create_run(make_config(tmp_path, max_iterations=1))
+ b = manager.create_run(make_config(tmp_path, max_iterations=1))
+
+ manager.start_run(a.state.run_id)
+ manager.start_run(b.state.run_id)
+
+ assert manager.wait_for_all([a.state.run_id, b.state.run_id], timeout=5) is True
+ assert a.state.status == RunStatus.COMPLETED
+ assert b.state.status == RunStatus.COMPLETED
+
+ def test_wait_for_all_times_out_to_false(self, tmp_path):
+ manager = RunManager()
+ managed = manager.create_run(make_config(tmp_path))
+ assert manager.wait_for_all([managed.state.run_id], timeout=0.05) is False
+
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_wait_for_all_false_when_an_id_is_unknown(self, mock_run, tmp_path):
+ # Docstring contract: an unknown ID can never finish, so even when
+ # the real run completes the whole set never resolves -> times out.
+ manager = RunManager()
+ real = manager.create_run(make_config(tmp_path, max_iterations=1))
+ manager.start_run(real.state.run_id)
+ manager.wait_for_all([real.state.run_id], timeout=5) # let real finish
+
+ assert manager.wait_for_all([real.state.run_id, "ghost"], timeout=0.05) is False
+
+ def test_wait_for_all_empty_run_ids_is_true(self):
+ # Vacuously satisfied: no runs to wait on, so all (zero) are finished.
+ manager = RunManager()
+ assert manager.wait_for_all([], timeout=0.05) is True
+
+ def test_wait_for_all_unknown_id_no_timeout_returns_immediately(self):
+ # Regression: an unknown ID can never finish, so wait_for_all with
+ # timeout=None must return False immediately instead of blocking forever.
+ manager = RunManager()
+ assert (
+ _returns_without_blocking(lambda: manager.wait_for_all(["ghost"])) is False
+ )
+
+
+class TestRunManagerGetResult:
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_get_result_matches_run_state_counts(self, mock_run, tmp_path):
+ manager = RunManager()
+ managed = manager.create_run(make_config(tmp_path, max_iterations=3))
+ run_id = managed.state.run_id
+
+ manager.start_run(run_id)
+ assert manager.wait_for_all([run_id], timeout=5) is True
+
+ result = manager.get_result(run_id)
+ state = managed.state
+ assert isinstance(result, RunResult)
+ assert result.run_id == run_id
+ assert result.status == state.status
+ assert result.total == state.total
+ assert result.completed == state.completed
+ assert result.failed == state.failed
+ assert result.timed_out_count == state.timed_out_count
+ assert result.completed == 3
+
+ def test_get_result_raises_key_error_for_unknown_id(self):
+ manager = RunManager()
+ with pytest.raises(KeyError, match="No run with ID 'nope'"):
+ manager.get_result("nope")
+
+ def test_get_result_snapshots_non_terminal_run(self, tmp_path):
+ # Docstring contract: returns current counts "regardless of terminal
+ # state". An unstarted run is PENDING with zeroed counters.
+ manager = RunManager()
+ managed = manager.create_run(make_config(tmp_path, max_iterations=3))
+
+ result = manager.get_result(managed.state.run_id)
+
+ assert result.status == RunStatus.PENDING
+ assert result.total == 0
+ assert result.completed == 0
+ assert result.failed == 0
+ assert result.timed_out_count == 0
+
+
+class TestRunManagerShutdown:
+ @patch(MOCK_SUBPROCESS, side_effect=ok_proc)
+ def test_shutdown_stops_and_joins_live_runs(self, mock_run, tmp_path):
+ manager = RunManager()
+ a = manager.create_run(make_config(tmp_path, max_iterations=100, delay=10))
+ b = manager.create_run(make_config(tmp_path, max_iterations=100, delay=10))
+
+ manager.start_run(a.state.run_id)
+ manager.start_run(b.state.run_id)
+ time.sleep(0.05)
+
+ assert manager.shutdown(timeout=5) is True
+ assert a.thread is not None and not a.thread.is_alive()
+ assert b.thread is not None and not b.thread.is_alive()
+ assert a.state.status == RunStatus.STOPPED
+ assert b.state.status == RunStatus.STOPPED
+
+ def test_shutdown_with_no_runs_returns_true(self):
+ manager = RunManager()
+ assert manager.shutdown(timeout=1) is True
+
+ def test_shutdown_ignores_unstarted_runs(self, tmp_path):
+ manager = RunManager()
+ manager.create_run(make_config(tmp_path))
+ # No thread to join; request_stop is harmless.
+ assert manager.shutdown(timeout=1) is True
diff --git a/tests/test_promise.py b/tests/test_promise.py
new file mode 100644
index 00000000..55125e34
--- /dev/null
+++ b/tests/test_promise.py
@@ -0,0 +1,47 @@
+"""Tests for strict promise-tag parsing."""
+
+import pytest
+
+from ralphify._promise import has_promise_completion, parse_promise_tags
+
+
+class TestParsePromiseTags:
+ @pytest.mark.parametrize(
+ "raw_text", [None, "", "plain text", "<promise>missing close"]
+ )
+ def test_parse_promise_tags_invalid_input_returns_empty_list(self, raw_text):
+ assert parse_promise_tags(raw_text) == []
+
+ def test_parse_promise_tags_normalizes_whitespace_and_preserves_unicode(self):
+ text = (
+ "before "
+ "<promise>\n CUSTOM\tDONE \n</promise> "
+ "middle "
+ "<promise>✅ shipped</promise>"
+ )
+
+ assert parse_promise_tags(text) == ["CUSTOM DONE", "✅ shipped"]
+
+
+class TestHasPromiseCompletion:
+ def test_has_promise_completion_matches_only_exact_tag_payload(self):
+ text = (
+ "raw CUSTOM_DONE text "
+ "<promise>CUSTOM_DONE_NOW</promise> "
+ "<promise>CUSTOM_DONE</promise>"
+ )
+
+ assert has_promise_completion(text, "CUSTOM_DONE") is True
+ assert has_promise_completion(text, "CUSTOM_DONE_NOW") is True
+ assert has_promise_completion(text, "CUSTOM") is False
+
+ def test_has_promise_completion_ignores_wrong_case_and_malformed_tags(self):
+ text = "CUSTOM_DONECUSTOM_DONE"
+
+ assert has_promise_completion(text, "CUSTOM_DONE") is False
+
+ def test_has_promise_completion_normalizes_completion_signal_whitespace(self):
+ text = "<promise>\n CUSTOM\tDONE \n</promise>"
+
+ assert has_promise_completion(text, "CUSTOM DONE") is True
+ assert has_promise_completion(text, "CUSTOM\tDONE") is True
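Reviewer note: the tests above pin the contract of `parse_promise_tags` and `has_promise_completion`, but `ralphify/_promise.py` itself is not part of this excerpt. Below is a minimal sketch of a parser that would satisfy those assertions. It assumes the tag is literally `<promise>...</promise>` and is matched case-sensitively; the regex and the `*_sketch` names are hypothetical and are not the shipped implementation.

```python
import re

# Assumption: the tag is lowercase <promise>...</promise>; any other casing
# or a missing closing tag is treated as "no tag at all".
_PROMISE_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)


def _normalize(text):
    """Collapse every run of whitespace to a single space and trim the ends."""
    return " ".join(text.split())


def parse_promise_tags_sketch(raw_text):
    """Return the normalized payload of every well-formed promise tag."""
    if not raw_text:
        return []
    return [_normalize(payload) for payload in _PROMISE_RE.findall(raw_text)]


def has_promise_completion_sketch(raw_text, completion_signal):
    """True only when some tag payload equals the normalized signal exactly."""
    return _normalize(completion_signal) in parse_promise_tags_sketch(raw_text)
```

Matching on the whole payload rather than on a substring is what keeps a bare `CUSTOM_DONE` in prose, or a longer payload such as `CUSTOM_DONE_NOW`, from counting as the `CUSTOM_DONE` signal.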
diff --git a/tests/test_run_types.py b/tests/test_run_types.py
index f7aa9bdd..075f49a8 100644
--- a/tests/test_run_types.py
+++ b/tests/test_run_types.py
@@ -11,6 +11,7 @@
RUN_ID_LENGTH,
Command,
RunConfig,
+ RunResult,
RunState,
RunStatus,
generate_run_id,
@@ -64,6 +65,59 @@ def test_defaults(self, tmp_path):
assert config.stop_on_error is False
assert config.log_dir is None
assert config.credit is True
+ assert config.prompt is None
+
+ def test_prompt_body_instead_of_ralph_file(self, tmp_path):
+ config = RunConfig(
+ agent="echo",
+ ralph_dir=tmp_path,
+ prompt="do work",
+ )
+ assert config.prompt == "do work"
+ assert config.ralph_file is None
+
+ def test_requires_prompt_or_ralph_file(self, tmp_path):
+ with pytest.raises(ValueError, match="exactly one of `prompt` or `ralph_file`"):
+ RunConfig(agent="echo", ralph_dir=tmp_path)
+
+ def test_rejects_both_prompt_and_ralph_file(self, tmp_path):
+ with pytest.raises(ValueError, match="exactly one of `prompt` or `ralph_file`"):
+ RunConfig(
+ agent="echo",
+ ralph_dir=tmp_path,
+ ralph_file=tmp_path / RALPH_MARKER,
+ prompt="do work",
+ )
+
+
+class TestRunResult:
+ def test_holds_status_and_counts(self):
+ result = RunResult(
+ run_id="r1",
+ status=RunStatus.COMPLETED,
+ total=3,
+ completed=2,
+ failed=1,
+ timed_out_count=0,
+ )
+ assert result.run_id == "r1"
+ assert result.status == RunStatus.COMPLETED
+ assert result.total == 3
+ assert result.completed == 2
+ assert result.failed == 1
+ assert result.timed_out_count == 0
+
+ def test_is_frozen(self):
+ result = RunResult(
+ run_id="r1",
+ status=RunStatus.COMPLETED,
+ total=1,
+ completed=1,
+ failed=0,
+ timed_out_count=0,
+ )
+ with pytest.raises(AttributeError):
+ result.completed = 5 # type: ignore[misc]
class TestRunState:
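Reviewer note: `test_is_frozen` expects `AttributeError`, and the two `RunConfig` tests expect a `ValueError` mentioning "exactly one of `prompt` or `ralph_file`". A small sketch of one way to get both behaviours, assuming (this is not confirmed by the diff) that the real types are dataclasses. `ResultSketch` and `ConfigSketch` are hypothetical stand-ins, not the real `RunResult` / `RunConfig`.

```python
import dataclasses
from pathlib import Path
from typing import Optional


@dataclasses.dataclass(frozen=True)
class ResultSketch:
    """Immutable snapshot; assigning to a field raises FrozenInstanceError."""

    run_id: str
    completed: int


@dataclasses.dataclass
class ConfigSketch:
    """Config that accepts a prompt body or a ralph file, but not both."""

    agent: str
    ralph_file: Optional[Path] = None
    prompt: Optional[str] = None

    def __post_init__(self):
        # Both None or both set are equally invalid, so compare the two
        # "is None" flags: equal flags mean zero or two sources were given.
        if (self.prompt is None) == (self.ralph_file is None):
            raise ValueError(
                "exactly one of `prompt` or `ralph_file` must be provided"
            )
```

`dataclasses.FrozenInstanceError` is a subclass of `AttributeError`, which is why a test written against `AttributeError` passes for a frozen dataclass.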
diff --git a/tests/test_turn_cap.py b/tests/test_turn_cap.py
new file mode 100644
index 00000000..06d72f08
--- /dev/null
+++ b/tests/test_turn_cap.py
@@ -0,0 +1,376 @@
+"""Turn-cap enforcement and soft wind-down across adapters.
+
+Covers the behaviours that distinguish how each adapter participates in
+``max_turns``:
+
+- Streaming adapters that count tool uses (claude / codex / opencode) are
+ preempted at the cap.
+- Adapters that count nothing (crush) treat ``max_turns`` as a no-op.
+- Adapters with no hook system (copilot / crush / opencode / generic)
+ downgrade soft wind-down to hard-cap-only (``_setup_wind_down`` returns ``None``).
+- The engine emits ``ITERATION_TURN_CAPPED`` and fans the signal to hooks.
+"""
+
+from __future__ import annotations
+
+import io
+import json
+import sys
+from unittest.mock import patch
+
+import ralphify._agent as agent_mod
+from helpers import drain_events, event_types, make_config, make_state
+
+from ralphify._agent import (
+ AgentResult,
+ _atomic_write_counter,
+ _count_tool_uses_post_hoc,
+ _read_agent_stream,
+ _run_agent_blocking,
+ _setup_wind_down,
+ _wrap_tool_use_with_counter,
+)
+from ralphify._events import EventType, QueueEmitter
+from ralphify.adapters import select_adapter
+from ralphify.adapters.claude import ClaudeAdapter
+from ralphify.adapters.codex import CodexAdapter
+from ralphify.adapters.copilot import CopilotAdapter
+from ralphify.adapters.crush import CrushAdapter
+from ralphify.adapters.opencode import OpenCodeAdapter
+from ralphify.engine import run_loop
+from ralphify.hooks import NoOpAgentHook
+
+
+class _RecordingHook(NoOpAgentHook):
+ """Hook that records the turn-cap callbacks it receives."""
+
+ def __init__(self) -> None:
+ self.capped: list[int] = []
+ self.tool_uses: list[tuple[str, int]] = []
+
+ def on_turn_capped(self, *, iteration: int, count: int) -> None:
+ self.capped.append(count)
+
+ def on_tool_use(self, *, iteration: int, tool_name: str, count: int) -> None:
+ self.tool_uses.append((tool_name, count))
+
+
+# ── opencode: counts_what == "tool_use" feeds the cap ──────────────────
+
+
+def test_opencode_tool_use_events_count_toward_cap() -> None:
+ """opencode tool_use events are counted and preempt at the cap."""
+ adapter = OpenCodeAdapter()
+ stream = io.StringIO(
+ '{"type":"tool_use","name":"read"}\n'
+ '{"type":"text","part":{"text":"thinking"}}\n'
+ '{"type":"tool_use","name":"edit"}\n'
+ '{"type":"tool_use","name":"bash"}\n'
+ )
+ seen: list[tuple[str, int]] = []
+
+ result = _read_agent_stream(
+ stream,
+ deadline=None,
+ on_activity=None,
+ adapter=adapter,
+ max_turns=2,
+ on_tool_use=lambda name, count: seen.append((name, count)),
+ )
+
+ # The third tool_use is never reached: the cap fires at 2.
+ assert result.turn_capped is True
+ assert result.tool_use_count == 2
+ assert seen == [("read", 1), ("edit", 2)]
+
+
+def test_opencode_counts_without_cap_do_not_trip() -> None:
+ """With no cap, opencode counts every tool use and never caps."""
+ adapter = OpenCodeAdapter()
+ stream = io.StringIO(
+ '{"type":"tool_use","name":"read"}\n'
+ '{"type":"tool_use","name":"edit"}\n'
+ '{"type":"tool_use","name":"bash"}\n'
+ )
+
+ result = _read_agent_stream(
+ stream, deadline=None, on_activity=None, adapter=adapter, max_turns=None
+ )
+
+ assert result.turn_capped is False
+ assert result.tool_use_count == 3
+
+
+# ── crush: counts_what == "none" makes max_turns a no-op ───────────────
+
+
+def test_crush_max_turns_is_graceful_noop() -> None:
+ """crush emits no countable events, so the cap can never fire."""
+ adapter = CrushAdapter()
+ stdout_lines = [
+ "Did some work.\n",
+ "COMPLETE\n",
+ ]
+
+ count, capped = _count_tool_uses_post_hoc(
+ adapter=adapter,
+ stdout_lines=stdout_lines,
+ max_turns=1,
+ on_tool_use=lambda name, c: None,
+ )
+
+ assert count == 0
+ assert capped is False
+
+
+# ── soft wind-down downgrade to hard-cap-only ──────────────────────────
+
+
+def test_wind_down_downgrades_for_adapters_without_hooks(tmp_path) -> None:
+ """copilot / crush / opencode / generic have no hook system → no setup."""
+ for adapter in (
+ CopilotAdapter(),
+ CrushAdapter(),
+ OpenCodeAdapter(),
+ select_adapter(["echo"]), # GenericAdapter fallback
+ ):
+ ctx = _setup_wind_down(
+ adapter=adapter,
+ max_turns=5,
+ max_turns_grace=2,
+ log_dir=None,
+ iteration=1,
+ )
+ assert ctx is None, f"{adapter.name} should not set up wind-down"
+
+
+def test_wind_down_setup_for_supporting_adapters() -> None:
+ """claude / codex write hook config and return an env override."""
+ for adapter in (ClaudeAdapter(), CodexAdapter()):
+ ctx = _setup_wind_down(
+ adapter=adapter,
+ max_turns=5,
+ max_turns_grace=2,
+ log_dir=None,
+ iteration=1,
+ )
+ assert ctx is not None, f"{adapter.name} should set up wind-down"
+ try:
+ assert ctx.counter_path.read_text(encoding="utf-8") == "0"
+ assert ctx.env_overrides # non-empty config-dir override
+ finally:
+ ctx.cleanup()
+ assert not ctx.tempdir.exists()
+
+
+def test_wind_down_skipped_when_grace_zero() -> None:
+ """Grace of 0 opts out of the warning window even for claude."""
+ ctx = _setup_wind_down(
+ adapter=ClaudeAdapter(),
+ max_turns=5,
+ max_turns_grace=0,
+ log_dir=None,
+ iteration=1,
+ )
+ assert ctx is None
+
+
+# ── engine surfaces the cap as an event + hook callback ────────────────
+
+
+def test_engine_emits_turn_capped_event_and_fans_to_hook(tmp_path) -> None:
+ """A capped iteration emits ITERATION_TURN_CAPPED and notifies the hook."""
+ hook = _RecordingHook()
+ config = make_config(tmp_path, max_turns=3, max_iterations=1, hooks=[hook])
+ state = make_state()
+ emitter = QueueEmitter()
+
+ with patch("ralphify.engine.execute_agent") as mock_execute:
+ mock_execute.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ tool_use_count=3,
+ turn_capped=True,
+ )
+ run_loop(config, state, emitter)
+
+ types = event_types(drain_events(emitter))
+ assert EventType.ITERATION_TURN_CAPPED in types
+ assert hook.capped == [3]
+ # A capped iteration counts as completed, not failed.
+ assert state.completed == 1
+ assert state.failed == 0
+
+
+# ── blocking path forces buffering so the post-hoc cap can count ───────
+
+
+def test_blocking_path_forces_buffering_for_post_hoc_cap(tmp_path) -> None:
+ """A blocking adapter with max_turns must buffer stdout to count tool uses.
+
+ Regression: post-hoc counting re-scans ``stdout_lines``, which is
+ ``None`` unless ``log_dir``/``capture_stdout`` forces buffering. Before
+ the fix, a blocking adapter (Copilot) with ``max_turns`` set but no log
+ dir reported ``tool_use_count == 0`` and never set ``turn_capped`` — the
+ cap was a silent no-op. Buffering must now be forced whenever a cap is
+ set on a tool-use-counting adapter.
+ """
+ script = (
+ 'print(\'{"type":"tool_use","name":"read"}\'); '
+ 'print(\'{"type":"tool_use","name":"edit"}\'); '
+ 'print(\'{"type":"tool_use","name":"bash"}\')'
+ )
+
+ result = _run_agent_blocking(
+ [sys.executable, "-c", script],
+ stdin_text="",
+ timeout=10,
+ log_dir=None, # no log buffering
+ iteration=1,
+ capture_stdout=False, # caller did not request capture
+ adapter=CopilotAdapter(),
+ max_turns=2,
+ )
+
+ # All three tool uses are counted post-hoc; 3 >= cap of 2 → capped.
+ assert result.tool_use_count == 3
+ assert result.turn_capped is True
+
+
+def test_blocking_path_no_buffering_without_cap(tmp_path) -> None:
+ """Without a cap, the blocking path stays unbuffered and counts nothing."""
+ script = 'print(\'{"type":"tool_use","name":"read"}\')'
+
+ result = _run_agent_blocking(
+ [sys.executable, "-c", script],
+ stdin_text="",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ capture_stdout=False,
+ adapter=CopilotAdapter(),
+ max_turns=None, # no cap → no forced buffering, no post-hoc count
+ )
+
+ assert result.tool_use_count == 0
+ assert result.turn_capped is False
+
+
+# ── counter callback isolates a raising subscriber ─────────────────────
+
+
+def test_wrap_counter_swallows_subscriber_exception(tmp_path) -> None:
+ """A raising on_tool_use subscriber must not crash the wrapped callback.
+
+ Regression: the wrapped counter callback invoked the downstream
+ subscriber directly, bypassing _call_safely and contradicting the
+ ToolUseCallback swallow-exceptions contract.
+ """
+ counter_path = tmp_path / "counter"
+
+ def boom(name: str, count: int) -> None:
+ raise RuntimeError("subscriber blew up")
+
+ wrapped = _wrap_tool_use_with_counter(boom, counter_path)
+ assert wrapped is not None
+
+ # Must not propagate, and the counter write must still have happened.
+ wrapped("read", 1)
+ assert counter_path.read_text(encoding="utf-8") == "1"
+
+
+# ── wind-down grace is clamped below the cap ───────────────────────────
+
+
+def test_wind_down_grace_clamped_below_cap() -> None:
+ """grace >= max_turns is clamped so the shim threshold can't collapse to 0.
+
+ Regression: an unclamped grace (reachable via the Python API, which
+ does not validate it the way the CLI does) was passed straight to the
+ shim, whose threshold ``max(cap - grace, 0)`` then fired the wind-down
+ nudge on the very first tool use.
+ """
+ ctx = _setup_wind_down(
+ adapter=ClaudeAdapter(),
+ max_turns=3,
+ max_turns_grace=10, # exceeds the cap
+ log_dir=None,
+ iteration=1,
+ )
+ assert ctx is not None
+ try:
+ settings = json.loads(
+ (ctx.tempdir / "settings.json").read_text(encoding="utf-8")
+ )
+ command = settings["hooks"]["PreToolUse"][0]["hooks"][0]["command"]
+ # Tail of the shim command is "... claude".
+ parts = command.split()
+ assert parts[-1] == "claude"
+ cap_arg, grace_arg = int(parts[-3]), int(parts[-2])
+ assert cap_arg == 3
+ assert grace_arg == 2 # clamped from 10 to max(cap - 1, 0)
+ finally:
+ ctx.cleanup()
+
+
+# ── orphaned counter file is cleaned up on NotImplementedError ─────────
+
+
+class _RaisingHookAdapter:
+ """Minimal adapter that claims wind-down support but raises on install.
+
+ Exercises the defensive ``NotImplementedError`` branch of
+ ``_setup_wind_down`` — no shipped adapter currently reaches it, since
+ non-supporting adapters bail at the capability-flag check first.
+ """
+
+ name = "raiser"
+ supports_soft_wind_down = True
+
+ def install_wind_down_hook(self, *, tempdir, counter_path, cap, grace):
+ raise NotImplementedError
+
+
+def test_counter_removed_when_install_raises(tmp_path) -> None:
+ """The log_dir counter file must not be orphaned if install raises.
+
+ Regression: the NotImplementedError branch removed only the tempdir,
+ leaving the "0" counter file behind when it lived in log_dir.
+ """
+ log_dir = tmp_path / "logs"
+ log_dir.mkdir()
+
+ ctx = _setup_wind_down(
+ adapter=_RaisingHookAdapter(),
+ max_turns=3,
+ max_turns_grace=2,
+ log_dir=log_dir,
+ iteration=1,
+ )
+
+ assert ctx is None
+ leftover = list(log_dir.iterdir())
+ assert leftover == [], f"counter file orphaned in log_dir: {leftover}"
+
+
+# ── counter-write failure is logged once, not silently swallowed ───────
+
+
+def test_counter_write_failure_logs_once(tmp_path, caplog) -> None:
+ """A failing counter write logs one WARNING; repeats stay quiet.
+
+ Regression: _atomic_write_counter swallowed all OSError silently, so a
+ persistently-broken wind-down left no operator signal.
+ """
+ bad_path = tmp_path / "missing-dir" / "counter" # parent does not exist
+ original_latch = agent_mod._counter_write_failure_logged
+ agent_mod._counter_write_failure_logged = False
+ try:
+ with caplog.at_level("WARNING", logger="ralphify._agent"):
+ _atomic_write_counter(bad_path, 1)
+ _atomic_write_counter(bad_path, 2) # second failure must be quiet
+ warnings = [r for r in caplog.records if r.levelname == "WARNING"]
+ assert len(warnings) == 1
+ assert "wind-down" in warnings[0].getMessage().lower()
+ finally:
+ agent_mod._counter_write_failure_logged = original_latch
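Reviewer note: the tests above distinguish two counting modes. The streaming path stops reading at the cap, so later tool uses are never seen. The blocking path counts everything after the process exits and marks the iteration capped once the total reaches the limit. A minimal sketch of that difference follows. It assumes every adapter emits one JSON object per line with `"type": "tool_use"`, which is the shape the test fixtures use; real adapters may parse differently, and the function names here are hypothetical.

```python
import json


def _tool_name(line):
    """Return the tool name if `line` is a tool_use JSON event, else None."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # plain-text output never counts as a tool use
    if isinstance(event, dict) and event.get("type") == "tool_use":
        return event.get("name", "")
    return None


def count_streaming(lines, max_turns=None):
    """Count tool uses as they arrive and stop reading once the cap is hit."""
    count = 0
    for line in lines:
        if _tool_name(line) is None:
            continue
        count += 1
        if max_turns is not None and count >= max_turns:
            return count, True  # preempted: remaining lines are never read
    return count, False


def count_post_hoc(lines, max_turns=None):
    """Count every tool use after the fact; the cap only labels the result."""
    count = sum(1 for line in lines if _tool_name(line) is not None)
    return count, max_turns is not None and count >= max_turns
```

With three tool uses and a cap of 2, the streaming sketch reports a count of 2 and the post-hoc sketch reports 3, matching `test_opencode_tool_use_events_count_toward_cap` and `test_blocking_path_forces_buffering_for_post_hoc_cap` respectively.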
diff --git a/tests/test_wind_down_shim.py b/tests/test_wind_down_shim.py
new file mode 100644
index 00000000..048038b5
--- /dev/null
+++ b/tests/test_wind_down_shim.py
@@ -0,0 +1,92 @@
+"""Tests for the wind-down shim invoked by Claude/Codex hook configs."""
+
+from __future__ import annotations
+
+import json
+
+from ralphify import _wind_down_shim as shim
+
+
+def test_emits_claude_payload_when_threshold_reached(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("8")
+ rc = shim.main(["prog", str(counter), "10", "2", shim.CLAUDE])
+ assert rc == 0
+ captured = capsys.readouterr()
+ payload = json.loads(captured.out)
+ assert payload["hookSpecificOutput"]["hookEventName"] == "PreToolUse"
+ msg = payload["hookSpecificOutput"]["additionalContext"]
+ assert "8 of 10" in msg
+ assert "Wrap up" in msg
+
+
+def test_emits_codex_payload_when_threshold_reached(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("5")
+ rc = shim.main(["prog", str(counter), "6", "1", shim.CODEX])
+ assert rc == 0
+ captured = capsys.readouterr()
+ payload = json.loads(captured.out)
+ assert "systemMessage" in payload
+ assert "5 of 6" in payload["systemMessage"]
+
+
+def test_no_output_when_below_threshold(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("3")
+ rc = shim.main(["prog", str(counter), "10", "2", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_threshold_clamped_to_zero_fires_at_count_zero(tmp_path, capsys) -> None:
+ """grace > cap clamps threshold to 0; count == 0 satisfies the >= check."""
+ counter = tmp_path / "turncount"
+ counter.write_text("0")
+ rc = shim.main(["prog", str(counter), "3", "5", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out != ""
+
+
+def test_missing_counter_treated_as_zero(tmp_path, capsys) -> None:
+ rc = shim.main(["prog", str(tmp_path / "missing"), "10", "2", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_unknown_agent_is_noop(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("99")
+ rc = shim.main(["prog", str(counter), "10", "2", "copilot"])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_too_few_args_is_noop(capsys) -> None:
+ rc = shim.main(["prog", "only", "two"])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_non_integer_cap_is_noop(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("5")
+ rc = shim.main(["prog", str(counter), "not-a-number", "2", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_corrupt_counter_treated_as_zero(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("not-a-number\n")
+ rc = shim.main(["prog", str(counter), "10", "2", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
+
+
+def test_blank_counter_treated_as_zero(tmp_path, capsys) -> None:
+ counter = tmp_path / "turncount"
+ counter.write_text("")
+ rc = shim.main(["prog", str(counter), "10", "2", shim.CLAUDE])
+ assert rc == 0
+ assert capsys.readouterr().out == ""
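Reviewer note: taken together, the shim tests describe a small decision procedure. Read the counter (treating a missing, blank, or corrupt file as 0), compare it with `max(cap - grace, 0)`, and emit an agent-specific JSON payload only when the threshold is reached. The sketch below reproduces that procedure under stated assumptions: the message wording and the value `"codex"` for `shim.CODEX` are guesses, while `"claude"` matches the command tail asserted in `test_wind_down_grace_clamped_below_cap`. It is not the shipped `_wind_down_shim` module.

```python
import json
import sys
from pathlib import Path

CLAUDE = "claude"
CODEX = "codex"  # assumed value of shim.CODEX


def read_count(path):
    """Read the tool-use counter; any unreadable or non-integer file is 0."""
    try:
        return int(Path(path).read_text(encoding="utf-8").strip())
    except (OSError, ValueError):
        return 0


def build_payload(count, cap, grace, agent):
    """Return the hook payload, or None when no nudge should be sent."""
    if count < max(cap - grace, 0):
        return None
    # Hypothetical wording; the tests only require "<count> of <cap>".
    message = f"You have used {count} of {cap} tool calls. Wrap up now."
    if agent == CLAUDE:
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "additionalContext": message,
            }
        }
    if agent == CODEX:
        return {"systemMessage": message}
    return None  # unknown agents are a silent no-op


def main_sketch(argv):
    """Always exit 0 so a misconfigured hook can never fail the agent run."""
    if len(argv) < 5:
        return 0
    _, counter_path, cap_raw, grace_raw, agent = argv[:5]
    try:
        cap, grace = int(cap_raw), int(grace_raw)
    except ValueError:
        return 0
    payload = build_payload(read_count(counter_path), cap, grace, agent)
    if payload is not None:
        sys.stdout.write(json.dumps(payload))
    return 0
```

Because the threshold is `max(cap - grace, 0)`, a grace larger than the cap makes the nudge fire at a count of 0, which is the case `test_wind_down_grace_clamped_below_cap` guards against on the caller side.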
diff --git a/uv.lock b/uv.lock
index 4ff37b5e..60ad1000 100644
--- a/uv.lock
+++ b/uv.lock
@@ -686,6 +686,10 @@ version = "0.4.0b3"
source = { editable = "." }
dependencies = [
{ name = "pyyaml" },
+]
+
+[package.optional-dependencies]
+cli = [
{ name = "rich" },
{ name = "typer" },
]
@@ -697,6 +701,7 @@ dev = [
{ name = "mkdocs-material" },
{ name = "pytest" },
{ name = "pytest-cov" },
+ { name = "ralphify", extra = ["cli"] },
{ name = "ruff" },
{ name = "ty" },
]
@@ -704,9 +709,10 @@ dev = [
[package.metadata]
requires-dist = [
{ name = "pyyaml", specifier = ">=6.0" },
- { name = "rich", specifier = ">=13.0" },
- { name = "typer", specifier = ">=0.9" },
+ { name = "rich", marker = "extra == 'cli'", specifier = ">=13.0" },
+ { name = "typer", marker = "extra == 'cli'", specifier = ">=0.9" },
]
+provides-extras = ["cli"]
[package.metadata.requires-dev]
dev = [
@@ -715,6 +721,7 @@ dev = [
{ name = "mkdocs-material", specifier = ">=9.7.5" },
{ name = "pytest", specifier = ">=8.0" },
{ name = "pytest-cov", specifier = ">=6.0" },
+ { name = "ralphify", extras = ["cli"] },
{ name = "ruff", specifier = ">=0.6" },
{ name = "ty", specifier = ">=0.0.14" },
]
diff --git a/workspace/ralphs/improve-codebase/PLAN.md b/workspace/ralphs/improve-codebase/PLAN.md
new file mode 100644
index 00000000..f6a3fcaf
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/PLAN.md
@@ -0,0 +1,56 @@
+# Improve Codebase — Plan
+
+Ralphify is a small, well-tested Python CLI (~4.7k LOC src, 628 tests passing,
+ruff + ty clean). Recent commits have been steady refactor work targeting
+`_console_emitter.py` (the biggest file at ~1.6k LOC). Continue that thread:
+squeeze out duplication and complexity in the hottest files first, then fan
+out to smaller polish.
+
+## Phases
+
+1. **Dead code / unused symbols** — private helpers with no callers, stale
+ constants, dead branches. Verify with grep + tests.
+2. **Duplication** — copy-pasted blocks (especially in `_console_emitter.py`
+ and `_agent.py`). Extract helpers where extraction does not hurt clarity.
+3. **Magic values & local constants** — stringly/numeric literals that repeat
+ inside a single module; lift to a named constant near the top.
+4. **Complex conditionals & long functions** — split `_console_emitter.py`
+ functions that juggle many states; extract pure helpers.
+5. **Naming & structure** — vague names, misplaced helpers, module-level
+ imports that could collapse.
+6. **Tests** — tighten unclear test names, merge duplicated fixtures.
+
+Each iteration must preserve behavior. If an "improvement" changes observable
+behavior (even by one log line), skip it and leave a note in backlog.md.
+
+## Current phase
+
+**Phase 4 — complex conditionals & long functions.** Phase 3 (magic values)
+is now essentially drained: every module-level scan for numeric literals
+≥ 10 across `src/ralphify/` turns up only named constants. Phase 1 (dead
+code) and Phase 2 (duplication) stay open for anything spotted in passing.
+Move on to simplifying local control flow where a variable is computed
+unconditionally but only used on one branch, or where a helper can tighten
+a nested conditional without losing clarity. The 134078d `name_col`
+scope narrowing is a representative Phase 4 move: same output, dead work
+gone, clearer scope.
+
+## Priorities (tailored to this repo)
+
+- The largest module is `_console_emitter.py`; every iteration there should
+ leave the module smaller *or* clearer; do not attempt both in one iteration.
+- `_agent.py` has two execution paths (streaming / blocking) that historically
+ drift apart — watch for duplication.
+- Constants like `_MAX_VISIBLE_SCROLL`, `_MAX_SCROLL_LINES`,
+ `_SIGTERM_GRACE_PERIOD` live where they're used. Don't centralize them
+ unless two modules need the same value.
+- Do not churn public API: `src/ralphify/__init__.py` re-exports, CLI
+ commands, and event payload types.
+- Do not change docs wording in this ralph; that's for other ralphs.
+
+## Out of scope
+
+- New features or behavior changes
+- Dependency upgrades
+- Docs content (beyond fixing stale contributor notes encountered in passing)
+- Release tooling
diff --git a/workspace/ralphs/improve-codebase/backlog.md b/workspace/ralphs/improve-codebase/backlog.md
new file mode 100644
index 00000000..5c647856
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/backlog.md
@@ -0,0 +1,118 @@
+# Backlog
+
+Ordered roughly by phase, then by expected payoff. Add items freely; remove
+only when they land in a commit.
+
+## Phase 1 — dead code
+
+- Audit `_console_emitter.py` for unused private helpers / constants (grep
+ each `_foo` name for other references inside the module and tests).
+- Audit `_agent.py` for parallel streaming/blocking helpers that reference
+ the same constants but define their own copies. (cb61477 — extracted
+ `_call_safely` for the 3× best-effort observer-callback pattern; no
+ remaining obvious dup after that pass. Streaming's `_readline_pump` and
+ blocking's `_pump_stream` look similar but do genuinely different work:
+ the queue-based pump feeds a main-thread loop that parses JSON, while
+ the list-based pump does its callback work inline on its own thread.)
+- Check `cli.py` validators for unreachable error branches after recent
+ TypedDict refactors.
+- Confirm every `from typing import ...` import in `src/ralphify/` is used.
+ (Checked 4ccfa9a — all six modules import only what they use.)
+- vulture flags (at 60% confidence) that were verified as live: `clear_scroll`,
+ `_SinglePanelNavigator`, `_stop_live`, `serialize_frontmatter`,
+ `to_dict`, `_atexit_hook`, RunManager public methods — all used in tests,
+ docs, or scripts/. TypedDict field "unused" warnings are spurious.
+- Consider inlining `_validate_name` into `_check_unique_name` in `cli.py`
+ (the former has exactly one caller). Tradeoff: the split docstrings
+ document the two concerns (format vs uniqueness) cleanly.
+- `_is_claude_command` (`_console_emitter.py`) and `_supports_stream_json`
+ (`_agent.py`) both check `Path(parts[0]).stem == CLAUDE_BINARY` but on
+ different inputs (string vs list). Consolidating would cross module
+ boundaries for modest payoff — revisit only if a third caller appears.
+
+## Phase 2 — duplication
+
+- Look for repeated `console.print(...)` formatting patterns in
+ `_console_emitter.py`.
+- Look for repeated dict/TypedDict key access patterns in the event handlers.
+- The fullscreen-Live teardown (`self._fullscreen_live.stop()` then `self._fullscreen_live = None`)
+ appears in `_stop_live_unlocked` and `_teardown_fullscreen_unlocked` —
+ only two call sites and each is adjacent to other state mutations, so
+ extracting right now would just add indirection. Revisit if a third
+ caller appears.
+- The `try: fn(); except Exception: pass` pattern appears in
+ `_print_or_defer_unlocked`, `_flush_deferred_unlocked` (loop body),
+ and around `handle_key`'s body. Could extract a tiny `_safe_call`
+ but each site is one line; not worth the indirection unless a fourth
+ appears.
+- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer`
+ both create `summary = Text(no_wrap=True, overflow="ellipsis")` then
+ branch on count > 0 vs "waiting…". Two subclasses only — already
+ noted in coverage as not-worth-extracting.
+- (01f2f1c — dropped `_FullscreenPeek._reset_view` which had the same body
+ as `scroll_to_bottom`.) No other near-duplicate scroll helpers spotted
+ in that class; `scroll_up` / `scroll_down` / `scroll_to_top` each touch
+ `_auto_scroll` under different conditions. (b19625e — dropped the
+ `new_offset` alias in `scroll_down`; `scroll_up` keeps its local
+ because it compares old vs new before the assignment.)
+
+## Phase 3 — magic values
+
+Essentially drained. Latest full scan (at 134078d): every bare integer
+≥ 10 across `src/ralphify/` already resolves to a named constant, and
+the handful of remaining single-site `2`s are flagged below with
+"only if a second site appears".
+
+- Scan each module's numeric literals (especially timeouts, widths, retry
+ counts) and promote to module constants when reused. (1d7251f —
+ promoted `40` → `_DEFAULT_CONSOLE_HEIGHT` for the two fallback-height
+ sites in `_console_emitter.py`.)
+- `_console_emitter.py:_fullscreen_page_size` uses a bare `2` as the
+ "page overlap lines" magic — only one site, but could be named
+ `_PAGE_OVERLAP` for symmetry with `_FULLSCREEN_CHROME_ROWS` if a
+ second page-size helper ever appears.
+- `_keypress.py` has `_POLL_INTERVAL`, `_WIN_POLL_INTERVAL`,
+ `_THREAD_JOIN_TIMEOUT` already at module top. No obvious leftover
+ literals worth promoting.
+
+## Phase 4 — complex conditionals & long functions
+
+- (134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant`
+ so the padded name column is only computed on the branch that renders
+ it.)
+- (7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` in
+ `_output.py:format_duration` into the `if minutes < _MINUTES_PER_HOUR:`
+ branch. Saved a modulo on every duration ≥ 1h and co-located the
+ local with its only use site.)
+- (ce487d3 — inlined `text = ensure_str(stream)` in
+ `_output.py:collect_output`. Same alias-inline shape as fc5e1cb /
+ 497c028 / 52e0272. Helper name `ensure_str` already documents the
+ decode step, so the intermediate binding added no clarity.)
+- (d0060b3 — exposed `_LivePanelBase.outcome` as a public property and
+ switched `_FullscreenPeek._build_header`'s `source._outcome` read to
+ go through it. Mirror of ef9a178's `iteration_id` cleanup.) Two
+ private-attr cross-class reads remain — both of `source._scroll_lines`
+ in `_FullscreenPeek._max_offset` and `__rich_console__`. Those touch a
+ mutable list the class itself appends to, so a read-only property
+ would hide the mutation asymmetry; defer unless a clearer abstraction
+ emerges (e.g., a "get a snapshot of visible lines" helper).
+- `_IterationPanel._apply_assistant` still juggles three block types
+ (`thinking` / `text` / `tool_use`) in one ~50-line method. Splitting
+ into `_render_thinking_block` / `_render_text_block` / `_render_tool_use_block`
+ would shorten the outer loop but each helper is short enough that the
+ indirection may not pay off — revisit only if a fourth block type lands.
+- `cli.py:_parse_user_args` is 55 lines of token-by-token iteration with
+ two nested branches and a while-loop that skips already-filled declared
+ names. Could be split into `_consume_flag` / `_consume_positional`
+ helpers without changing any error message. Medium payoff, medium
+ churn — land only once behavior is fully pinned by tests (which it is).
+
+## Notes / ideas to triage
+
+- `scripts/tui_dev/` has its own fixtures; out of scope unless it blocks a
+ src/ralphify/ change.
+- `_IterationPanel._cache_read_tokens` is captured from usage but never
+ read in production — only the regression test
+ `test_format_tokens_does_not_double_count_cached_input` reads it. The
+ capture protects against a hypothetical future display, so not strictly
+ dead, but worth revisiting when token rendering changes.
diff --git a/workspace/ralphs/improve-codebase/conventions.md b/workspace/ralphs/improve-codebase/conventions.md
new file mode 100644
index 00000000..4a16e49e
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/conventions.md
@@ -0,0 +1,3 @@
+# Conventions learned in this codebase
+
+Record repo-wide patterns worth preserving. One bullet per pattern.
diff --git a/workspace/ralphs/improve-codebase/coverage/.gitkeep b/workspace/ralphs/improve-codebase/coverage/.gitkeep
new file mode 100644
index 00000000..e69de29b
diff --git a/workspace/ralphs/improve-codebase/coverage/_agent.md b/workspace/ralphs/improve-codebase/coverage/_agent.md
new file mode 100644
index 00000000..898b2029
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_agent.md
@@ -0,0 +1,136 @@
+# `_agent.py` coverage
+
+Valid at: 7402f04
+
+## Recent changes
+
+- 7402f04 — inlined the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG,
+ _STREAM_FORMAT, _VERBOSE_FLAG]` local in `_run_agent_streaming`.
+ The binding was consumed exactly once on the very next statement
+ (the first positional arg to `subprocess.Popen(...)`); no other
+ references exist in src/ or tests/ (grep confirmed). The three
+ appended tokens are already named constants
+ (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the
+ "extended command for streaming mode" intent reads cleanly at the
+ call site without the intermediate name. Same Phase 4 inline-alias
+ shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0
+ (`visible`), and e1ad87a (`binary`). Behavior preserved —
+ subprocess.Popen still receives the same list; pinned by the
+ streaming-path test coverage in `tests/test_agent.py`.
+- 66d6c60 — inlined the `remaining = deadline - time.monotonic()` local
+ in `_read_agent_stream`'s per-iteration timeout calc. The alias was
+ read exactly once on the next line as `max(remaining, 0)`; collapsing
+ to `max(deadline - time.monotonic(), 0)` matches the inline-alias
+ style from e1ad87a / 497c028 / 52e0272 / ce487d3. The adjacent
+ comment was updated ("max(remaining, 0)" → "clamp to 0") since the
+ name no longer exists. Behavior preserved — the clamped timeout
+ still reaches `line_q.get(timeout=...)`, so the non-blocking drain
+ on an already-expired deadline still fires and deadline enforcement
+ is unchanged. Pinned by the streaming-path agent tests. No other
+ `remaining` locals remain in the module (grep confirmed).
+- b24accf — inlined the `reader` thread handle in `_read_agent_stream`.
+ The local served only to call `.start()`; the thread is never joined
+ explicitly (termination is signalled through the queue's `None`
+ sentinel produced by `_readline_pump`'s `finally` and through the
+ daemon flag). Collapsing into the fluent
+ `threading.Thread(target=_readline_pump, args=(stdout, line_q),
+ daemon=True).start()` drops an unused binding and matches the
+ fire-and-forget intent. Python keeps live threads reachable via
+ `threading._active`, so no GC risk. Side effects preserved: the
+ reader still closes cleanly on `_close_pipes` (OSError in
+ `readline`), and the main loop still relies on `line_q.get` for
+ deadline enforcement. Pinned by the full `tests/test_agent.py`
+ suite (streaming-path coverage). This is the same alias/handle-drop
+ shape as e1ad87a / 497c028 / b19625e, specialised to a Thread —
+  a `Thread` object isn't special; it is just another handle
+  whose only use was `.start()`.
+- e1ad87a — inlined the `binary = Path(cmd[0]).stem` local in
+ `_supports_stream_json`. The alias was read exactly once on the
+ following line as `binary == CLAUDE_BINARY`. Collapsing to
+ `return Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline
+ sibling check in `_console_emitter.py:_is_claude_command`
+ (`return Path(parts[0]).stem == CLAUDE_BINARY`). Same Phase 4
+ inline-alias shape as ce487d3 / 52e0272 / 497c028 / fc5e1cb. Empty-cmd
+ short-circuit (`if not cmd: return False`) preserved so
+  `cmd[0]` never indexes into an empty list. Backlog note
+ about consolidating `_is_claude_command` and `_supports_stream_json`
+ across modules is unchanged — still deferred until a third caller
+ appears.
+- cf72fd9 — replaced the `parsed = None` sentinel in `_read_agent_stream`
+ with a `try/except/else` block. The old code set `parsed = None` in
+ the JSON-decode-except branch solely so the next line's
+ `if isinstance(parsed, dict):` would fall through; restructuring with
+ `try: ... except: pass; else: if isinstance(...):` makes the "only
+ forward when parsing succeeded" intent structural instead of encoded
+ through a sentinel value. The error path now skips the isinstance
+  check entirely (previously wasted work), and the success path is unchanged.
+ `parsed` is no longer bound when the except clause runs, which matches
+ Python convention — the value was always meant to be ignored there.
+ Pinned by `tests/test_agent.py::test_ignores_non_json_lines` and the
+ broader stream-JSON coverage in that file.
+- d8d5592 — gated the `"".join(...)` of `stream.stdout_lines` and
+ `stderr_lines` at the tail of `_run_agent_streaming` on
+ `log_dir is not None`. The joined strings were only consumed by
+ `_write_log` (which short-circuits when log_dir is None) and by the
+ `captured_stdout` / `captured_stderr` AgentResult fields, both of
+ which previously discarded the joined string with
+ `... if log_dir is not None else None`. Now matches the
+ already-lazy `"".join(x) if x is not None else None` idiom in
+ `_run_agent_blocking`'s tail, and the duplicated ternary on each
+ AgentResult field collapses to a bare `stdout` / `stderr`. Same
+ observable behavior — pinned by `test_captured_output_set_when_logging`
+ and `test_no_log_when_dir_not_set` in tests/test_agent.py.
+- cb61477 — added `_call_safely(callback, *args)` helper next to the
+ callback type aliases. Replaces three copies of the
+ `if cb is not None: try: cb(...); except Exception: pass` pattern
+ (two in `_read_agent_stream`, one in `_pump_stream`) with single-line
+ calls. Behavior preserved — identical None guard, identical broad
+ `Exception` suppression, identical argument-once semantics.
+
+## Shape of the module
+
+- Two execution paths: `_run_agent_streaming` (JSON line stream, used for
+ `claude`) and `_run_agent_blocking` (subprocess.Popen with optional
+ capture, used for all other agents).
+- `execute_agent` is the single public entry point; selects mode via
+ `_supports_stream_json(cmd)` (checks `Path(cmd[0]).stem == CLAUDE_BINARY`).
+- Shared shutdown sequence is centralized in `_cleanup_agent`:
+ 1. `_ensure_process_dead` (SIGTERM → SIGKILL via `_try_graceful_group_kill`,
+ then `proc.kill()`).
+ 2. `_close_pipes` (raw `os.close` on stdout/stderr fds to unblock readers).
+ 3. `_drain_readers` (bounded join on reader/writer threads).
+ 4. `_finalize_pipes` (Python-level `pipe.close()` for GC hygiene).
+- Thread spawning uses `_start_writer_thread` / `_start_pump_thread` to
+ centralize the `Thread(..., daemon=True); .start()` boilerplate.
+
+## Verified live (grepped, confirmed used)
+
+- `CLAUDE_BINARY` — public; imported by `_console_emitter.py` for display
+ logic (see backlog note about consolidating `_is_claude_command` /
+ `_supports_stream_json`; deferred until a third caller appears).
+- `_STDOUT`, `_STDERR` — used in `_run_agent_streaming` /
+ `_run_agent_blocking` stderr pump calls and inside `_read_agent_stream`.
+- `_SIGTERM_GRACE_PERIOD`, `_THREAD_JOIN_TIMEOUT`, `_PROCESS_WAIT_TIMEOUT`
+ — each referenced exactly once; constants kept near usage as the
+ project convention prefers.
+- `AgentResult`, `_StreamResult` — returned from streaming/blocking paths
+ and consumed by `engine.py`.
+
+## Potential future wins (not yet taken)
+
+- `_run_agent_streaming` and `_run_agent_blocking` both finish with the
+ same "`stdout = "".join(...) if … else None; stderr = "".join(...)
+ if … else None; log_file = _write_log(...); return AgentResult(...)`"
+ tail. After d8d5592 the conditional-join idiom is now identical
+ across both paths (gated on `log_dir is not None` for streaming,
+ on `stdout_lines is not None` for blocking — but `stdout_lines` is
+ itself `[] if log_dir is not None else None`, so the conditions are
+ equivalent). The intermediate state still differs (`stream.stdout_lines`
+ tuple vs `stdout_lines` list|None), so extracting a shared helper
+ would mostly move arguments around. Revisit only if a third
+ execution path appears.
+- The two `if proc.stdin/stdout/stderr is None: raise RuntimeError(...)`
+ guards just after `Popen` could use a single helper, but `subprocess`
+ guarantees these are non-None when `PIPE` is passed — the guards exist
+ mainly to narrow for the type checker, and a helper would make the
+ narrow less explicit. Leave as-is.
diff --git a/workspace/ralphs/improve-codebase/coverage/_console_emitter.md b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md
new file mode 100644
index 00000000..c12834d9
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md
@@ -0,0 +1,205 @@
+# `_console_emitter.py` coverage
+
+Valid at: 2fda4f0
+
+## Recent changes
+
+- 2fda4f0 — inlined the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]`
+ local in `_LivePanelBase._build_body`. The alias was read exactly
+ once, as the iterable of the very next `for line in visible:` loop.
+ Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:`
+ matches the inline-alias pattern from 497c028 (`agent`), fc5e1cb
+ (`total_in`), 52e0272 (`msg`), ce487d3 (`text`), and e1ad87a
+ (`binary`). Behavior unchanged — each iteration still mutates the
+ Text in place (`no_wrap` / `overflow`) before appending to `rows`,
+ and the slice still materializes the last `_MAX_VISIBLE_SCROLL`
+ items. No other `visible` locals remain in the class; the name is
+ reused elsewhere in the module (fullscreen viewport height, scrollbar
+ geometry) but all in unrelated scopes.
+- 3823019 — narrowed `line = escape_markup(data["line"])` scope in
+ `_on_agent_output_line` past the
+ `if not isinstance(target, _IterationSpinner): return` guard. The
+ escape_markup call was wasted work on the early-return path (target
+ None or wrong-type panel); moving the binding after the guard keeps
+ the _IterationSpinner branch behavior identical and drops the wasted
+ work on the other branch. Same shape as 134078d's `name_col`
+ narrowing — unconditional compute that only one branch consumes.
+ Note: the `_structured_agent` short-circuit earlier in the method
+ already skips this path for Claude runs (ad7523e), so this narrowing
+ only affects the raw-stdout path.
+- d0060b3 — added a public `outcome` property on `_LivePanelBase` and
+ replaced `source._outcome` in `_FullscreenPeek._build_header` with
+ `source.outcome`. Mirrors ef9a178's `iteration_id` cleanup — both
+ commits expose an existing private attribute through a getter so the
+ cross-class read doesn't have to dip into private state. The
+ `_outcome` attribute is still the single source of truth (written
+ only in `freeze`), and tests that read `_outcome` directly
+ (test_console_emitter.py:1766) keep working. Two private-attr
+ cross-class reads remain in the module (`source._scroll_lines` at
+ lines 750 and 872); those touch a mutable list that the class itself
+ appends to, so a read-only property would paper over the mutation
+ asymmetry — not taking until a real need appears.
+- 3a8908d — replaced the `if initial_id is None and self._iteration_history:`
+ guard in `enter_fullscreen` with `next(reversed(self._iteration_history), None)`.
+ The compound condition was doing two jobs at once: pick the fallback only
+ when nothing is live, *and* sidestep `next(reversed({}))` raising
+ StopIteration on the empty dict. The `next(it, default)` form moves the
+ empty-handling into the standard library idiom so the outer `if` reads
+ as a single concern. Same observable behavior — the immediately-following
+ `if initial_id is None or self.panel_for(initial_id) is None:` branch
+ still prints "Full peek: no iterations yet" and returns False when the
+ fallback yielded nothing. Pinned by `test_enter_without_iteration_prints_hint`.
+- 59b0e34 — inlined `self._fullscreen_page_size()` into the space/b
+ action lambdas in `_handle_fullscreen_key`. The `page` local was
+ computed unconditionally in the non-exit branch but only consumed by
+ the page-down (" ") and page-up ("b") lambdas — j/k/g/G/[/] now skip
+ the call entirely. Space/b still compute it exactly once per keypress,
+ now at action-invocation time (under the same `_console_lock`), so
+ behavior is unchanged. `_fullscreen_page_size()` is a pure read of
+ `self._console.size.height` in a try/except, so deferring evaluation
+ has no observable effect — the dict build and action invocation happen
+ back-to-back inside the lock. Same Phase 4 shape as 134078d / ef176bf
+ / b19625e.
+- 52e0272 — inlined the `msg = raw.get("message", {})` local in
+ `_IterationPanel._apply_assistant`. The alias was read exactly once on
+ the next line as `msg.get("usage")`. Collapsing to
+ `usage = raw.get("message", {}).get("usage")` matches the chained-get
+ style already used by `_iter_content_blocks` two functions above
+ (`raw.get("message", {}).get("content", [])`), and the same
+ inline-alias pattern as 497c028 (`agent`) and fc5e1cb (`total_in`).
+ No other reference to `msg` exists in the function — verified by grep.
+- b19625e — dropped the `new_offset` alias in `_FullscreenPeek.scroll_down`.
+ The local was assigned directly to `self._offset`, then the
+ follow-mode check re-read it as `new_offset == 0` — identical to
+ `self._offset == 0` after the assignment. Sibling `scroll_up` keeps
+ its local because it compares old vs new *before* assigning (needed
+ to conditionally disable auto-scroll on a real move); `scroll_down`
+ has no such comparison, so the alias was dead. Same Phase 4 shape
+ as ef176bf (`line_count`) and 134078d (`name_col`).
+- 497c028 — inlined the `agent = data["agent"]` local in `_on_run_started`.
+ The alias was read exactly once, immediately below, as the arg to
+ `_is_claude_command(agent)`. Reading `data["agent"]` directly matches
+ the style established by fc5e1cb (inlined `total_in`). `ralph_name`
+ was preserved — it's used inside an f-string where `data['ralph_name']`
+ would be awkward. Same Phase 4 shape as fc5e1cb.
+- ef176bf — dropped the `line_count = len(self._scroll_lines)` alias in
+ `_IterationSpinner._build_footer`. The local served dual duty as a
+ predicate (`if line_count > 0`) and as the `_plural` arg, but both
+ uses were on the same truthy branch. Replaced the predicate with
+ `if self._scroll_lines:` (idiomatic list truthiness) and moved the
+ `len()` call inside the branch that needs it. This matches the
+ sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:`
+ inline with no local alias. Same Phase 4 shape as 134078d.
+- ad7523e — moved the `if self._structured_agent: return` short-circuit
+ in `_on_agent_output_line` from inside `_console_lock` to before
+ acquisition. The flag is write-once (set in `_on_run_started` before
+ any iteration events can flow) and already read lock-free in
+ `_on_agent_activity` — now both structured/raw output handlers use the
+ same pattern. Bonus: avoids a lock acquisition per stdout line under
+ Claude, where every line short-circuits anyway. Added a comment
+ explaining the write-once invariant so the lock-free read doesn't look
+ accidental.
+- bcadee1 — dropped the `if self._active_renderable is not None:` guard
+ wrapping `_archive_current_iteration_unlocked("interrupted")` in
+ `_on_iteration_started`. The archive helper already no-ops when
+ nothing is active (docstring: "No-op when no iteration is active"),
+ so the outer guard was mechanically redundant. Updated the
+ surrounding comment to note the no-op behavior so the call's intent
+ still reads as defensive. Same shape as 5337d88 / 4ccfa9a / 8cb0d47.
+- 134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant`'s
+ tool_use branch. The padded name column was computed unconditionally
+ but only rendered when `arg` was truthy (the `else` branch uses raw
+ `name` without padding). Moved the pad-to-column if/else inside
+ `if arg:` so the helper variable lives only where it's used. Same
+ output in both branches — only the dead formatting work is gone.
+- 1d7251f — promoted the `40` fallback-terminal-height literal to a named
+ module constant `_DEFAULT_CONSOLE_HEIGHT` near the other fullscreen
+ constants (`_FULLSCREEN_CHROME_ROWS`, `_FULLSCREEN_MIN_VISIBLE`). Two
+ use-sites both meant "reasonable default terminal height when the
+ real value isn't available": `_FullscreenPeek._console_height` (class-
+ attribute default used before the first `__rich_console__` call) and
+ `ConsoleEmitter._fullscreen_page_size`'s except-branch fallback. The
+ constant keeps them in lockstep. No other bare `40`s remain in the
+ module (grep confirmed).
+- d34e957 — dropped redundant `f"{_plural(total, 'line')}"` wrap in
+  `_FullscreenPeek._build_header`. `_plural` already returns a `str`,
+  so the f-string only copied it into an equal string. The call now
+  matches the surrounding `header.append(literal, style=...)` calls. No other
+ `f"{_plural(...)}"` wraps remain in the module (checked with grep).
+- fc5e1cb — inlined `total_in = self._input_tokens` alias in
+  `_IterationPanel._format_tokens`. The alias name hinted at a "total
+ input" aggregate that no longer exists (cache-read tokens are
+ intentionally excluded from ctx); reading `self._input_tokens`
+ directly matches what the value actually is. Matches the existing
+ style in the sibling `if self._output_tokens > 0` branch.
+- 3838006 — rewrote `ConsoleEmitter.panel_for` to call `self.is_live(...)`
+ for its guard instead of re-stating the
+ `cur_iter == id and active is not None` expression. Same behavior;
+ one source of truth for "this is the active iteration" check. Type
+ checker is happy: returning `self._active_renderable` (typed
+ `_LivePanelBase | None`) matches the declared return type even though
+ the runtime invariant guarantees non-None whenever `is_live` is True.
+- 01f2f1c — dropped `_FullscreenPeek._reset_view`. Its body
+ (`self._offset = 0; self._auto_scroll = True`) was byte-for-byte identical
+ to `scroll_to_bottom`. The two call sites in `_step_iteration` now call
+ `scroll_to_bottom()` directly; the "snap to newest line + follow" intent
+ moved into a docstring on the surviving method. No other scroll-reset
+ duplication remains.
+- ef9a178 — replaced the single cross-class `_fullscreen_view._iteration_id`
+ access in `_archive_current_iteration_unlocked` with the public
+ `iteration_id` property on `_FullscreenPeek`. No behavior change —
+  `_FullscreenPeek` already exposes this via an `@property` (lines 739-741);
+ the private-attribute shortcut was an oversight from earlier iterations.
+ Now `_iteration_id` is only read from within `_FullscreenPeek` itself.
+ (d0060b3 applied the same pattern to `_LivePanelBase._outcome`.)
+- c4469a1 — extracted `_FullscreenPeek._step_iteration(direction)` from
+ `prev_iteration` / `next_iteration`. The two methods were 12-line
+ mirror images differing only in step direction (-1 vs +1) and
+ eviction-fallback (`ids[0]` vs `ids[-1]`). Combined boundary check
+ uses `0 <= new_idx < len(ids)` which collapses both `idx == 0` (prev)
+ and `idx >= len(ids) - 1` (next) into one expression.
+- 5337d88 — dropped `if not self._tool_categories: return ""` early
+ return in `_IterationPanel._format_categories`. Empty dict yields an
+ empty list comprehension which `" · ".join` turns into `""`, so the
+ guard was dead — same shape as 4ccfa9a's `_format_params` cleanup.
+ No other empty-collection-then-join pattern remains in the module
+ (`_format_tokens` builds its `parts` list with conditional appends, so
+ it has no comprehension to short-circuit).
+- 8cb0d47 — dropped the `max(total - visible, 1)` guard in
+ `_scrollbar_metrics`. The early return `if total <= visible` already
+ guarantees `total - visible ≥ 1`, so the `max(..., 1)` floor was dead
+ defensive code. Inlined the subtraction directly into the `frac`
+ calculation and added a comment noting the invariant.
+- 0900aad — dropped `_iteration_order` list; `_iteration_history` dict
+ preserves insertion order by itself. Archive now pops-then-inserts to
+ move existing entries to the end; eviction iterates the dict
+ (oldest-first); `enter_fullscreen` uses `next(reversed(...))` for the
+ most recent finished iteration. Updated the single direct-field
+ reference in `tests/test_console_emitter.py`.
+- 3e9627b — extracted `_stop_compact_live_unlocked` helper to dedupe the
+ `if self._live is not None: self._live.stop(); self._live = None` pattern
+ across `_stop_live_unlocked`, `enter_fullscreen`, and `_on_iteration_ended`.
+- 4ccfa9a — dropped `if parts else ""` branch in `_format_params`.
+ `" · ".join([])` returns `""`, so the guard was dead.
+
+## Verified live (grepped, confirmed used)
+
+Private helpers and constants that look unused but are legitimately used:
+
+- `_ICON_SUCCESS`, `_ICON_FAILURE`, `_ICON_TIMEOUT`, `_ICON_ARROW`,
+ `_ICON_DASH`, `_ICON_PLAY` — all referenced in handler print strings.
+- `clear_scroll` — used by test_console_emitter tests.
+- `_SinglePanelNavigator` — used by tests and `scripts/tui_dev/snapshot.py`.
+- `_stop_live` (the locked wrapper) — used only in tests for cleanup
+ between test cases. Production code uses `_stop_live_unlocked` inside
+ an existing lock.
+- `_format_params`, `_extract_file_path`, `_extract_key`, `_extract_params`
+ — all referenced in the `_TOOL_REGISTRY` table (`"Read"`, `"Glob"`,
+ `"Grep"`, `"Edit"`, `"Write"`, `"Bash"`, `"WebFetch"`, `"WebSearch"`, etc.).
+
+## Potential future wins (not yet taken)
+
+- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer` both
+ start with `Text(no_wrap=True, overflow="ellipsis")` and use
+ `_footer_grid(summary)` — the `Text(...)` construction repeats, but only
+ twice. Not worth extracting unless a third subclass appears.
diff --git a/workspace/ralphs/improve-codebase/coverage/_frontmatter.md b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md
new file mode 100644
index 00000000..3f4faeb2
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md
@@ -0,0 +1,52 @@
+# `_frontmatter.py` coverage
+
+Valid at: a6f4c47
+
+## Recent changes
+
+- a6f4c47 — dropped the `if text.startswith(_UTF8_BOM):` guard in
+ `parse_frontmatter` before the `text = text.removeprefix(_UTF8_BOM)`
+  call. `str.removeprefix` already returns the string unchanged when
+  the prefix is absent, so the guard was redundant rather than
+  protective. Behavior preserved:
+ - BOM-prefixed input still gets stripped (pinned by
+ `test_utf8_bom_does_not_break_frontmatter` in
+ `tests/test_frontmatter.py`).
+ - Non-BOM input is passed through unchanged (exercised by every
+ other parse test in the file).
+  - CPython returns the original string object itself when the prefix
+    does not match, so there's no allocation overhead either.
+
+## Shape of the module
+
+- `parse_frontmatter(text)` — public entry point. Strips optional
+ UTF-8 BOM, splits on `---` delimiters via `_extract_frontmatter_block`,
+ runs `yaml.safe_load`, strips HTML comments from the body with
+ `_strip_html_comments`, returns `(dict, body)`.
+- `serialize_frontmatter(frontmatter, body)` — inverse; emits
+ `---`-delimited blocks only when the frontmatter is non-empty *or*
+ the body would otherwise be mis-parsed as frontmatter.
+- Constants: `RALPH_MARKER`, `FIELD_*`, `CMD_FIELD_*`, `NAME_RE`,
+ `VALID_NAME_CHARS_MSG`. All imported by `cli.py` and
+ `_resolver.py`; each one is reused across modules so centralisation
+ is justified.
+
+## Verified live (grepped, confirmed used)
+
+- `_FRONTMATTER_DELIMITER` — used 4× in `serialize_frontmatter` (plus
+ 2× in `_extract_frontmatter_block`).
+- `_FENCE_OR_COMMENT_RE` — used in `_strip_html_comments`.
+- `_UTF8_BOM` — used in `parse_frontmatter` (BOM strip).
+
+## Potential future wins (not yet taken)
+
+- `_extract_frontmatter_block` splits `text` on `"\n"` up front, then
+ re-joins slices for the body — the body slice is re-joined even when
+ the input is tiny. Could stream with `str.find` + `str.index` to
+ avoid the list allocation, but this function runs once per
+ iteration and the input is ~1 KB in practice; not worth the churn.
+- The `serialize_frontmatter` `needs_delimiters` expression uses
+ `body.lstrip().startswith(...)` which allocates a new stripped
+ string just to check a prefix. Could collapse via
+ `re.match(r"\s*---", body)` but the current form reads cleanly;
+ revisit only if this becomes hot.
diff --git a/workspace/ralphs/improve-codebase/coverage/_output.md b/workspace/ralphs/improve-codebase/coverage/_output.md
new file mode 100644
index 00000000..e0a9dfbb
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_output.md
@@ -0,0 +1,45 @@
+# `_output.py` coverage
+
+Valid at: ce487d3
+
+## Recent changes
+
+- ce487d3 — inlined `text = ensure_str(stream)` in `collect_output`.
+ The local was assigned then read exactly once on the next line as
+ `parts.append(text)`. The `ensure_str` helper name already
+ documents the decode step, so the intermediate binding was pure
+ noise. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272.
+- 7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` scope in
+ `format_duration`. The local was computed unconditionally between
+ the `total = int(seconds + 0.5)` / `minutes = total // 60` setup and
+ the `if minutes < _MINUTES_PER_HOUR:` branch, but only consumed by
+ the `f"{minutes}m {secs}s"` return on the truthy branch. The hours
+ branch (`hours = minutes // 60; mins = minutes % 60`) never touches
+ `secs`, so for any duration ≥ 1h the modulo was wasted work. Moved
+ the assignment inside the if to co-locate with its only use site.
+ Same Phase 4 shape as 134078d / ef176bf / 59b0e34 / b19625e.
+
+## Layout snapshot (at 7730dd4)
+
+- Module is 139 lines — small, mostly format helpers.
+- Top: `IS_WINDOWS`, `SUBPROCESS_TEXT_KWARGS`, `SESSION_KWARGS` —
+ shared subprocess kwargs imported by `_agent.py` and `_runner.py`.
+- `ProcessResult` — base dataclass for `RunResult` / `AgentResult`,
+ with the shared `success` property.
+- Format helpers: `ensure_str`, `collect_output`, `warn`, `format_count`,
+ `format_duration`.
+- Module-level constants `_COUNT_THOUSANDS`, `_COUNT_MILLIONS`,
+ `_SECONDS_PER_MINUTE`, `_MINUTES_PER_HOUR` are used only by the two
+ `format_*` functions below them.
+
+## Potential future wins (not yet taken)
+
+- `format_count` repeats `f"{n / _COUNT_MILLIONS:.1f}M"` in two
+ branches (the bare ≥1M return and the rounded-cross-into-M guard).
+ Could lift to a `_format_millions(n)` helper, but each site is one
+ line and the duplication is intentional (the rounding guard explains
+ the second site). Skip unless a third user appears.
+- The two `format_*` functions both have a "rounded value crossed into
+ next unit" guard (59.95s → "1m 0s"; 999_950 → "1.0M") with parallel
+ comments referencing each other. Already kept in lockstep — no
+ refactor needed.
diff --git a/workspace/ralphs/improve-codebase/coverage/_resolver.md b/workspace/ralphs/improve-codebase/coverage/_resolver.md
new file mode 100644
index 00000000..db553415
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_resolver.md
@@ -0,0 +1,20 @@
+# `_resolver.py` coverage
+
+Valid at: 6227863
+
+## Recent changes
+
+- 6227863 — dropped `if not user_args: return _ARGS_RE.sub("", prompt)`
+ early return in `resolve_args`. `_ARGS_RE.sub` with the callable
+ `lambda m: user_args.get(m.group(1), "")` already resolves every match
+ to `""` when the dict is empty, so the fast path produced byte-for-byte
+ identical output to the general path. Test
+ `test_empty_args_clears_placeholders` still covers the empty-dict
+ behavior.
+
+## Potential future wins (not yet taken)
+
+- None obvious; the module is 80 lines and both `resolve_args` /
+ `resolve_all` share the substitution shape but operate on different
+ regexes (single-kind vs multi-kind), so extracting a helper would add
+ indirection for two very short call sites.
diff --git a/workspace/ralphs/improve-codebase/coverage/engine.md b/workspace/ralphs/improve-codebase/coverage/engine.md
new file mode 100644
index 00000000..274137c1
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/engine.md
@@ -0,0 +1,34 @@
+# `engine.py` coverage
+
+Valid at: c5ce11d
+
+## Recent changes
+
+- c5ce11d — inlined the `reason = state.status.reason` local in
+ `run_loop`. The alias was read exactly once on the next line as the
+ `reason=` kwarg to `RunStoppedData(...)`. Inlined to
+ `reason=state.status.reason` to match the chained-read style elsewhere
+ (e.g. `_IterationPanel._apply_assistant`'s `raw.get("message", {}).get("usage")`
+ after 52e0272). Same Phase 4 inline-alias shape as ce487d3 (`text`),
+ 52e0272 (`msg`), 497c028 (`agent`), fc5e1cb (`total_in`).
+ Safe: the immediately-preceding `if state.status == RunStatus.RUNNING:
+ state.status = RunStatus.COMPLETED` normalizes the status to a terminal
+ value, so `RunStatus.reason`'s ValueError guard (non-terminal) cannot
+ fire here.
+
+## Structure notes
+
+`run_loop` is the main loop orchestrator. Its control-flow helpers
+(`_handle_control_signals`, `_wait_for_resume`, `_run_iteration`,
+`_delay_if_needed`) are each short and single-purpose. `_run_iteration`
+splits into `_run_commands` → `_assemble_prompt` → `_run_agent_phase`.
+`_run_agent_phase` is the longest at ~80 lines but its branches map 1:1
+to the agent outcome tri-state (timed_out / success / failure) — no
+obvious extraction candidate without inventing synthetic abstractions.
+
+## Potential future wins (not yet taken)
+
+- None spotted. The locals that survive in `_run_agent_phase`
+ (`duration`, `event_type`, `state_detail`, `ended_data`) all have
+ multiple uses and good names. `_run_commands`'s `output` is mutated
+ before use. `_run_iteration`'s `iteration` alias is used 4 times.
diff --git a/workspace/ralphs/improve-codebase/iterations.md b/workspace/ralphs/improve-codebase/iterations.md
new file mode 100644
index 00000000..c742b441
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/iterations.md
@@ -0,0 +1,58 @@
+# Iterations
+
+One line per iteration: `<sha> <summary>`.
+
+7402f04 refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` — the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG]` local was consumed exactly once on the very next statement (the first positional arg to `subprocess.Popen(...)`), and grep confirmed no other references in src/ or tests/. The three appended tokens are already named constants, so the intent reads cleanly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), e1ad87a (`binary`), and 52e0272 (`msg`). Behavior preserved — subprocess.Popen still receives the same list, pinned by the full streaming-path test coverage in `tests/test_agent.py`.
+
+66d6c60 refactor: inline `remaining` alias in `_read_agent_stream` timeout calc — the `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias pattern from e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). The adjacent comment was updated to refer to "clamp to 0" since `remaining` no longer exists. Behavior preserved — the same clamped value reaches `line_q.get(timeout=...)` on every iteration of the read loop, so deadline enforcement and the non-blocking drain on an already-expired deadline both behave identically. Pinned by the streaming-path tests in `tests/test_agent.py`.
+
+b24accf refactor: inline `reader` thread handle in `_read_agent_stream` — the local served only to call `.start()`; the thread is never joined because termination flows through the queue's None sentinel plus the daemon flag, so the handle had no further role. Collapsing into the fluent `threading.Thread(..., daemon=True).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime — verified by the full agent test suite (628 passed). Same alias/handle-drop shape as 2fda4f0 / e1ad87a / 497c028 / b19625e, specialised to a Thread.
+
+2fda4f0 refactor: inline `visible` alias in `_LivePanelBase._build_body` — the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once, as the iterable of the very next `for line in visible:` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a. Behavior unchanged — the slice still materializes the last `_MAX_VISIBLE_SCROLL` items, and each loop iteration still mutates the Text in-place (`no_wrap` / `overflow`) before appending to `rows`. Pinned by the broad `test_console_emitter.py` suite (peek-visible rendering paths).
+
+e1ad87a refactor: inline `binary` alias in `_supports_stream_json` — the local was assigned to `Path(cmd[0]).stem` and then read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to the chained form `Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline sibling check in `_console_emitter.py:_is_claude_command` (`return Path(parts[0]).stem == CLAUDE_BINARY`) and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Same behavior — `_supports_stream_json` still short-circuits on empty `cmd` first, and the final boolean result is unchanged. Pinned by `tests/test_agent.py::test_streaming_mode_used_for_claude` and the broader agent-selection tests.
+
+c5ce11d refactor: inline `reason = state.status.reason` alias in `run_loop`'s `RUN_STOPPED` emit. The local was read exactly once as the `reason=` kwarg to `RunStoppedData(...)` on the next line; collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). Safe at this call site — the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` normalizes status to a terminal value (STOPPED / COMPLETED / FAILED), so the `RunStatus.reason` property's "non-terminal raises ValueError" guard cannot fire.
+
+3823019 refactor: narrow `line = escape_markup(data["line"])` scope past the early-return guard in `_on_agent_output_line` — the local was computed unconditionally inside `_console_lock` but only used on the `_IterationSpinner` branch; the `if not isinstance(target, _IterationSpinner): return` early-return path would have thrown the value away. Moving the binding below the guard preserves exactly the same output (same escape_markup call, same f-string, same add_scroll_line) while skipping the wasted work whenever target is None or an _IterationPanel. Same Phase 4 shape as 134078d (`name_col`), 7730dd4 (`secs`), ef176bf (`line_count`), 59b0e34 (`page`). Pinned by the raw-line handling tests in test_console_emitter.py.
+
+ce487d3 refactor: inline `text` alias in `_output.py:collect_output` — the local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272 — the `ensure_str` helper name already documents the decode step, so the intermediate binding was pure noise. Tests in `test_output.py` cover the str, bytes, and stdout+stderr-newline-join paths.
+
+7730dd4 refactor: narrow `secs = total % _SECONDS_PER_MINUTE` scope into the `if minutes < _MINUTES_PER_HOUR:` branch in `_output.py:format_duration` — the local was computed unconditionally but only consumed by `f"{minutes}m {secs}s"` on the truthy branch. The hours branch derives `hours`/`mins` independently and never references `secs`, so the modulo was wasted work for any duration ≥ 1h. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Tests in test_output.py cover both branches (sub-hour: lines 110–137; hour+: lines 140–144), so the move is pinned in either direction.
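The scope narrowing described in 7730dd4 might look roughly like the sketch below. Only the `f"{minutes}m {secs}s"` branch and the constant names come from the entry; the function name `format_duration_sketch`, the sub-minute branch, and the hour-branch output format are assumptions made for illustration, not the real `_output.py` code.

```python
_SECONDS_PER_MINUTE = 60
_MINUTES_PER_HOUR = 60


def format_duration_sketch(total_seconds: float) -> str:
    """Hypothetical stand-in for format_duration with `secs` narrowed."""
    total = int(total_seconds)
    if total < _SECONDS_PER_MINUTE:
        # Assumed sub-minute format; not taken from the source.
        return f"{total}s"
    minutes = total // _SECONDS_PER_MINUTE
    if minutes < _MINUTES_PER_HOUR:
        # `secs` is bound only on the branch that consumes it.
        secs = total % _SECONDS_PER_MINUTE
        return f"{minutes}m {secs}s"
    # The hour branch never references `secs`, so no modulo is computed here.
    hours, mins = divmod(minutes, _MINUTES_PER_HOUR)
    return f"{hours}h {mins}m"


sub_hour = format_duration_sketch(125)
hour_plus = format_duration_sketch(3725)
```

Both branches return the same strings as they would with an unconditional `secs` binding, which is why tests on either side of the one-hour boundary pin the move.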
+
+d0060b3 refactor: replace `source._outcome` cross-class access with public `outcome` property on `_LivePanelBase` — `_FullscreenPeek._build_header` was reading the private attribute directly on its `source` panel. Added an `@property outcome` that returns `self._outcome`, matching the `iteration_id` property pattern from ef9a178 (which cleaned up the parallel `_fullscreen_view._iteration_id` cross-class access). The two remaining cross-class private reads on `_LivePanelBase` (`source._scroll_lines` at lines 750 and 872) read a mutable list that's also appended to from within the class, so a read-only property would mask the asymmetry — leaving those alone until a real need appears. Same observable behavior — `_outcome` is still the single source of truth written by `freeze`, and all existing readers get the same value. The direct `panel_before._outcome == "completed"` assertion in test_console_emitter.py:1766 still passes (underlying attribute is unchanged); fullscreen-header rendering paths exercise the new property transitively.
+
+cf72fd9 refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` — the sentinel only existed so that `isinstance(parsed, dict)` on the next line would fall through on JSON-parse failure. Moving the dict-handling into the try/else branch expresses "only run when parse succeeded" structurally and drops both the sentinel assignment and the redundant isinstance check on the error path. Behavior preserved — valid-but-non-dict JSON (lists, numbers, strings) still skips forwarding via the inner isinstance guard. Same sentinel→structure shape as prior Phase 4 scope narrowings, applied one level up.
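As a rough illustration of the sentinel-to-structure change in cf72fd9, the sketch below forwards only lines that parse as JSON objects. The helper name `forward_json_lines`, its signature, and the choice to simply skip unparseable lines are assumptions; the real `_read_agent_stream` may handle the error path differently.

```python
import json
from typing import Callable


def forward_json_lines(lines: list[str], on_event: Callable[[dict], None]) -> None:
    """Hypothetical helper: forward each line that parses to a JSON object."""
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            # Parse failed: no `parsed = None` sentinel is needed because
            # the else branch below is never reached on this path.
            continue
        else:
            # Runs only when parsing succeeded. Valid non-dict JSON
            # (lists, numbers, strings) is still filtered out here.
            if isinstance(parsed, dict):
                on_event(parsed)


events: list[dict] = []
forward_json_lines(['{"type": "tool_use"}', "not json", "[1, 2]", "42"], events.append)
```

After the call, `events` holds only the single dict event; the malformed line and the two valid-but-non-dict values are dropped.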
+
+a6f4c47 refactor: drop redundant BOM-startswith guard before `removeprefix` in `parse_frontmatter` — `str.removeprefix` already returns the string unchanged when the prefix is absent, so the `if text.startswith(_UTF8_BOM):` wrapper was dead defensive code. Behavior preserved — `test_utf8_bom_does_not_break_frontmatter` still pins the BOM-stripping path and every other test exercises the no-BOM path. Same "drop the guard when the operation already handles the no-match case" shape as 5337d88 (empty dict → `" · ".join([])`), 4ccfa9a (empty list → `" · ".join([])`), and 8cb0d47 (redundant `max(..., 1)` floor).
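A minimal check of the claim behind a6f4c47, assuming Python 3.9+ (where `str.removeprefix` exists). The function names and sample strings are hypothetical; only the guard-versus-no-guard comparison reflects the entry.

```python
UTF8_BOM = "\ufeff"


def strip_bom_guarded(text: str) -> str:
    """Old shape: explicit startswith guard before removeprefix."""
    if text.startswith(UTF8_BOM):
        return text.removeprefix(UTF8_BOM)
    return text


def strip_bom(text: str) -> str:
    """New shape: removeprefix returns the string unchanged when no BOM is present."""
    return text.removeprefix(UTF8_BOM)


samples = ["\ufeff---\nagent: claude\n---", "---\nagent: claude\n---", ""]
results = [(strip_bom_guarded(s), strip_bom(s)) for s in samples]
```

Every pair in `results` is equal, covering the BOM, no-BOM, and empty-string inputs.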
+
+3a8908d refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback — the compound `if initial_id is None and self._iteration_history:` guard mixed two concerns (only fall back when nothing is live, *and* avoid `next(reversed({}))` raising StopIteration). Switching to the `next(it, default)` sentinel idiom lets the default argument handle the empty-dict case, so the outer `if` only encodes the "fall back when nothing is live" intent. Behavior preserved — the subsequent `if initial_id is None or panel_for(...) is None:` branch still prints "no iterations yet" and returns False, covered by `test_enter_without_iteration_prints_hint`.
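The `next(it, default)` idiom from 3a8908d can be sketched as follows. The helper `latest_iteration_id` and the shape of the history dict are assumptions for illustration; `reversed()` on a dict requires Python 3.8+.

```python
from typing import Optional


def latest_iteration_id(history: dict[int, str]) -> Optional[int]:
    """Hypothetical helper: most recently inserted key, or None when empty."""
    # The default argument absorbs the empty-dict case, so callers need
    # no separate truthiness check to avoid StopIteration.
    return next(reversed(history), None)


empty_result = latest_iteration_id({})
latest_result = latest_iteration_id({1: "first", 2: "second", 3: "third"})
```

With an empty history the result is `None`, which lets a later `is None` check handle the "nothing to show" path.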
+
+59b0e34 refactor: inline `_fullscreen_page_size()` into the space/b lambdas in `_handle_fullscreen_key` — the `page` local was computed unconditionally in the non-exit branch but only consumed by the page-down (" ") and page-up ("b") action lambdas. j/k/g/G/[/] now skip the call entirely; space/b still compute it exactly once per keypress, now at action-invocation time under the same lock. Same Phase 4 "narrow the scope of a one-branch local" shape as 134078d / ef176bf / b19625e.
+
+52e0272 refactor: inline `msg` alias in `_apply_assistant` — the `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Collapsing into `raw.get("message", {}).get("usage")` matches the chained-get style already used by `_iter_content_blocks` (which independently does `raw.get("message", {}).get("content", [])`) and the inline-alias pattern from 497c028 / fc5e1cb.
+
+b19625e refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` — the local was assigned to `self._offset` and then re-read only as `new_offset == 0`, which is identical to `self._offset == 0` after the assignment. Sibling `scroll_up` needs its local because it compares old vs new before the assignment; `scroll_down` has no such comparison, so the alias was dead. Same shape as ef176bf and 134078d.
+
+497c028 refactor: inline `agent` alias in `_on_run_started` — the local served a single use as the arg to `_is_claude_command(...)`. Reading `data["agent"]` directly matches the sibling style from fc5e1cb (`total_in` inline). Preserved `ralph_name` — it's used inside an f-string on line 1247 where the alias still helps readability.
+
+ef176bf refactor: drop `line_count` alias in `_IterationSpinner._build_footer` — local was computed unconditionally but only used on the truthy branch (both as predicate `> 0` and as `_plural` arg). Inlined the truthy check on `self._scroll_lines` and moved the `len()` call inside the branch that uses it. Matches the style of sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:` with no local alias. Same Phase 4 shape as 134078d's `name_col` narrowing.
+
+d8d5592 refactor: gate `"".join(stdout/stderr_lines)` in `_run_agent_streaming` on `log_dir is not None` — the joined strings were only consumed by `_write_log` and the `captured_*` AgentResult fields, both of which discarded the result when log_dir was None. Mirrors the already-lazy idiom in `_run_agent_blocking` and drops the duplicated `... if log_dir is not None else None` ternary on each AgentResult field.
+ad7523e refactor: move `_structured_agent` short-circuit out of `_console_lock` in `_on_agent_output_line` — flag is write-once (set in `_on_run_started`), matches `_on_agent_activity`'s pattern, avoids lock acquisition per stdout line under Claude
+bcadee1 refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` — archive call already no-ops when nothing is active
+134078d refactor: narrow `name_col` scope into `if arg:` in `_apply_assistant` — padded variant was computed but unused when arg falsy
+1d7251f refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` — two sites used the literal 40 for "default terminal height when unknown"
+6227863 refactor: drop redundant empty-user_args branch in `resolve_args` — callable path already yields "" for every match when dict is empty
+d34e957 refactor: drop redundant f-string wrap around `_plural(total, 'line')` in fullscreen header
+fc5e1cb refactor: inline `total_in` alias in `_format_tokens` — direct read of `self._input_tokens` clarifies intent
+3838006 refactor: defer `panel_for` guard to `is_live` to dedupe identical condition
+01f2f1c refactor: drop `_reset_view` in `_FullscreenPeek` — identical body to `scroll_to_bottom`
+ef9a178 refactor: replace `_fullscreen_view._iteration_id` cross-class private access with public `iteration_id` property
+c4469a1 refactor: extract `_step_iteration` to dedupe prev/next iteration browsing in `_FullscreenPeek`
+5337d88 refactor: drop redundant empty-dict guard in `_format_categories` — `" · ".join([])` already returns `""`
+8cb0d47 refactor: drop redundant `max(total - visible, 1)` guard in `_scrollbar_metrics` — early return makes `total - visible ≥ 1`
+cb61477 refactor: extract `_call_safely` helper — dedupes 3× best-effort callback guards in `_agent.py`
+4ccfa9a refactor: drop redundant `if parts else ""` in `_format_params` (empty join already returns "")
+3e9627b refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown across 3 call sites
+0900aad refactor: drop redundant `_iteration_order` list — dict insertion order suffices
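The reasoning behind 0900aad rests on dicts preserving insertion order (a language guarantee since Python 3.7). A small sketch, with the `history` dict and panel strings as hypothetical stand-ins for the real iteration state:

```python
history: dict[int, str] = {}
for iteration_id in (3, 1, 2):
    history[iteration_id] = f"panel-{iteration_id}"

# Iterating the dict yields keys in insertion order, so a parallel
# ordering list carries no extra information.
order = list(history)

# Overwriting an existing key keeps its original position.
history[3] = "panel-3-updated"
order_after_update = list(history)
```

Both `order` and `order_after_update` come out as `[3, 1, 2]`, so a separate list would only duplicate what the dict already tracks.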