From e380f967993906685a0f6bc8e8c4e46f133e0567 Mon Sep 17 00:00:00 2001
From: Michael Bodnarchuk <davert.ua@gmail.com>
Date: Sun, 24 May 2026 00:50:52 +0300
Subject: [PATCH 1/5] Add `explorbot navigate` CLI command

Exposes the AI Navigator as a one-shot CLI command. Exits 0 when the
target URL is reached and 1 otherwise, so it can be used as a
reachability probe in CI. Inherits --session and all common options,
making it the canonical way to capture an authenticated session for
downstream agents in a single command.
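
As a rough illustration of the exit-code contract described above, the
sketch below wraps an arbitrary navigation step and maps success or a
thrown error to 0 or 1. `exitCodeFor`, `reachable` and `unreachable`
are hypothetical names used only for this example; they are not part
of the Explorbot source, and the real command exits the process rather
than returning a number.

```typescript
// Hypothetical helper: mirrors the try/catch shape of the CLI action,
// but returns the exit code instead of terminating the process.
async function exitCodeFor(navigate: () => Promise<void>): Promise<number> {
  try {
    await navigate();
    return 0; // target reached
  } catch (error) {
    console.error('Failed:', error instanceof Error ? error.message : 'Unknown error');
    return 1; // navigation failed
  }
}

// Stand-ins for a real Navigator run (assumptions for the demo only).
const reachable = async (): Promise<void> => {};
const unreachable = async (): Promise<void> => {
  throw new Error('connection refused');
};

exitCodeFor(reachable).then((code) => console.log(`reachable exit code: ${code}`));
exitCodeFor(unreachable).then((code) => console.log(`unreachable exit code: ${code}`));
```

Because any thrown error maps to 1, a CI step can treat the command as
a plain pass/fail probe without parsing its output.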

Also restructures docs/commands.md to treat CLI as a first-class
surface alongside TUI: a comprehensive reference table, per-command
sections showing both invocations side by side, and coverage of the
previously-undocumented CLI-only commands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md         |  10 ++
 bin/explorbot-cli.ts |  16 ++
 docs/commands.md     | 366 ++++++++++++++++++++++++++++++-------------
 3 files changed, 287 insertions(+), 105 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 96ed339..527901d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # Changelog
 
+## 2026-05-24
+
+### New CLI Options
+- **`explorbot navigate <url>`** — Drive the AI Navigator to a URL from the shell. Exits `0` when the page is reached, `1` when navigation fails (unreachable URL, unresolved redirect, connection refused, etc.). Inherits all common options including `--session`, so the canonical "probe a URL and capture an authenticated session for downstream agents" runs as a single command. The Navigator handles redirects and login walls — it is not a plain `I.amOnPage`.
+  ```bash
+  explorbot navigate /login --session              # probe + save session to output/session.json
+  explorbot navigate /dashboard --session auth.json
+  explorbot navigate /unreachable && echo ok       # exit code reflects reachability
+  ```
+
 ## 2026-05-11
 
 ### New CLI Options
diff --git a/bin/explorbot-cli.ts b/bin/explorbot-cli.ts
index cdab5ca..3a37406 100755
--- a/bin/explorbot-cli.ts
+++ b/bin/explorbot-cli.ts
@@ -599,6 +599,22 @@ addCommonOptions(program.command('research <url>').description('Research a page
   }
 );
 
+addCommonOptions(program.command('navigate <url>').description('Navigate to a URL using the AI Navigator. Exits 0 if reachable, 1 otherwise.')).action(async (url, options) => {
+  try {
+    const explorBot = new ExplorBot(buildExplorBotOptions(url, options));
+    await explorBot.start();
+
+    const { NavigateCommand } = await import('../src/commands/navigate-command.js');
+    await new NavigateCommand(explorBot).execute(url);
+
+    await explorBot.stop();
+    await showStatsAndExit(0);
+  } catch (error) {
+    console.error('Failed:', error instanceof Error ? error.message : 'Unknown error');
+    await showStatsAndExit(1);
+  }
+});
+
 addCommonOptions(
   program.command('drill <url>').alias('driller').description('Drill all components on a page to learn interactions').option('--knowledge <path>', 'Save learned interactions to knowledge file at this URL path').option('--max-components <count>', 'Maximum number of components to drill')
 ).action(async (url, options) => {
diff --git a/docs/commands.md b/docs/commands.md
index bb4f79c..22d9a4d 100644
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -1,29 +1,37 @@
 # Terminal Commands Reference
 
-## TUI and CLI Commands
+Explorbot exposes the same commands through two surfaces:
 
-Explorbot has two types of commands:
+- **CLI** — run from your shell. Each command launches a browser, executes the task, prints output, and exits with `0` on success or `1` on failure. Suitable for CI, scripting, and chaining commands together.
+- **TUI** — interactive terminal UI launched by `explorbot start`. The same commands are available as slash commands inside the session, where you can chain multiple actions against a long-lived browser.
 
-- **TUI commands** — slash commands available inside the terminal UI launched by `explorbot start`
-- **CLI commands** — run directly from your shell without launching TUI
+Both surfaces are backed by the same command classes in `src/commands/`, so behavior and options match.
 
-Some commands work in both modes. Where a CLI equivalent exists, it is noted below.
+## Command Reference
 
-| TUI Command | CLI Equivalent |
-|-------------|---------------|
-| `/explore [url]` | `explorbot explore [path]` |
-| `/research [url]` | `explorbot research <url>` |
-| `/plan [--focus <feature>]` | `explorbot plan <path> [--focus <feature>]` |
-| `/drill` | `explorbot drill <url>` |
-| `/learn [note]` | `explorbot learn [url] [description]` |
-| `/runs [file]` | `explorbot runs [file]` |
-| `/rerun <file> [index]` | `explorbot rerun <file> [index]` |
-
-CLI commands run headless by default, execute the task, and exit. TUI commands run inside an interactive session where you can chain multiple actions.
-
-## Common Options
-
-These options are available on all CLI commands (`start`, `explore`, `plan`, `drill`, `research`, `context`, `docs collect`):
+| Capability | CLI | TUI | Notes |
+|---|---|---|---|
+| Start interactive session | `explorbot start [path]` | — | Boots the TUI |
+| Autonomous exploration | `explorbot explore <path>` | `/explore [url]` | Full research → plan → test cycle |
+| Research a page | `explorbot research <url>` | `/research [url]` | UI analysis only |
+| Generate test plan | `explorbot plan <path>` | `/plan [--focus <feature>]` | Writes plan markdown |
+| Navigate to a URL | `explorbot navigate <url>` | `/navigate <target>` | Reachability probe + session capture |
+| Drill page components | `explorbot drill <url>` | `/drill` | Learn interactions |
+| Execute plan tests | `explorbot test <planfile> [index]` | `/test [scenario\|number\|*]` | Run scenarios |
+| Re-run generated tests | `explorbot rerun <file> [index]` | `/rerun <file> [index]` | With AI auto-healing |
+| List generated tests | `explorbot runs [file]` | `/runs [file]` | Index + dry-run |
+| Store domain knowledge | `explorbot learn [url] [note]` | `/learn [note]` | Persisted to `knowledge/` |
+| Execute CodeceptJS command | `explorbot shell <url> <command>` | `I.click(...)` etc. inline | One-shot vs interactive |
+| Load saved plan | `explorbot plan:load <file> [index]` | `/plan:load <file>` | Preview a plan |
+| Collect documentation | `explorbot docs collect <path-or-url>` | — | See [doc-collector](./doc-collector.md) |
+| Extract built-in rules | `explorbot extract-rules <agent>` | — | Customizable rules to `rules/` |
+| Manage persistent browser | `explorbot browser {start\|stop\|status}` | — | Share browser across runs |
+| Initialize project | `explorbot init` | — | Generates `explorbot.config.*` |
+| Clean output | `explorbot clean [--type ...]` | `/clean` | CLI removes generated artifacts; TUI clears the Captain agent's conversation history |
+
+## Common CLI Options
+
+These options are available on every CLI command that drives a browser (`start`, `explore`, `plan`, `navigate`, `drill`, `research`, `test`, `rerun`, `shell`, `context`, `docs collect`):
 
 | Option | Description |
 |--------|-------------|
@@ -38,18 +46,20 @@ These options are available on all CLI commands (`start`, `explore`, `plan`, `dr
 
 ### `--session`
 
-Persists browser state (cookies, localStorage, sessionStorage) to a JSON file. On next run, the session is restored automatically, skipping login or setup steps.
+Persists browser state (cookies, localStorage, sessionStorage) to a JSON file. On the next run, the session is restored automatically, skipping login or setup steps.
 
 ```bash
-explorbot start /login --session                # uses default output/session.json
+explorbot start /login --session                # default output/session.json
 explorbot start /dashboard --session auth.json  # custom session file
+explorbot navigate /login --session             # probe + capture auth in one shot
+explorbot research /dashboard --session auth.json   # reuse captured auth
 ```
 
 When the flag is provided without a file path, defaults to `output/session.json`.
 
 ## Persistent Browser
 
-By default, every CLI command that needs a browser (`start`, `explore`, `plan`, `drill`, `research`, `context`) launches a fresh Chromium process and shuts it down when done. This is slow during development when you restart explorbot frequently.
+By default, every CLI command that needs a browser (`start`, `explore`, `plan`, `navigate`, `drill`, `research`, `context`) launches a fresh Chromium process and shuts it down when done. This is slow during development when you restart explorbot frequently.
 
 The `explorbot browser` command lets you run a persistent browser server that survives across explorbot sessions. Any CLI command that launches a browser will automatically detect the running server and connect to it instead of starting a new one.
 
@@ -88,6 +98,7 @@ explorbot browser status
 explorbot browser start --show
 
 # Terminal 2: run commands — they reuse the same browser
+explorbot navigate /login --session
 explorbot research /login
 explorbot plan /login --focus authentication
 explorbot start /dashboard
@@ -106,20 +117,60 @@ explorbot browser stop
 | `-c, --config <path>` | Path to configuration file |
 | `-p, --path <path>` | Working directory path |
 
-## Exploration Commands
+## Navigation
+
+### navigate
+
+Drive the AI Navigator to a URL. The Navigator handles redirects, login walls, and recoverable errors — it does not just call `I.amOnPage`.
+
+```bash
+# CLI — exits 0 if reachable, 1 otherwise
+explorbot navigate /settings
+explorbot navigate /login --session             # capture session into output/session.json
+explorbot navigate /dashboard --session auth.json
+```
+
+```
+# TUI
+/navigate /settings
+/navigate login page
+/navigate back to dashboard
+```
+
+**CLI exit code:** `0` when the Navigator confirms the page was reached, `1` when navigation failed (unreachable URL, unresolved redirect, connection refused, etc.).
+
+**Session capture:** combined with `--session`, this is the canonical way to capture an authenticated session for downstream agents. A typical CI pattern:
+
+```bash
+# 1. Establish authenticated session, fail fast if the app is down
+explorbot navigate /login --session ./auth.json || exit 1
+
+# 2. Reuse the captured session in subsequent commands
+explorbot research /dashboard --session ./auth.json
+explorbot explore /reports --session ./auth.json --max-tests 10
+```
+
+The TUI form accepts looser targets (state descriptions like "back to dashboard"); the CLI form expects a URL or path.
+
+## Exploration
 
-### `/explore [url]`
+### explore
 
-Start full exploration cycle: research → plan → test.
+Start a full exploration cycle: research → plan → test.
+
+```bash
+# CLI
+explorbot explore /dashboard
+explorbot explore /checkout --max-tests 10 --focus checkout
+```
 
 ```
+# TUI
 /explore
 /explore /dashboard
 ```
 
-If a URL is provided, navigates there first. After completion, use `/navigate` or `/explore` again to continue.
-
-**CLI equivalent:** `explorbot explore [path]` — runs the full cycle and exits.
+If a URL is provided, Explorbot navigates there first. In the TUI, use `/navigate` or `/explore` again after completion to continue.
 
 #### Options
 
@@ -221,26 +272,44 @@ For new tests, the planner generates freely (the loaded plan is registered for s
 - [Test Plans](./test-plans.md) — markdown format for saved plans
 - [Planner](./planner.md) — how new test scenarios are generated
 
-### `/research [url] [--data]`
+### research
+
+Analyze a page using the Researcher agent.
 
-Analyze the current page using the Researcher agent.
+```bash
+# CLI
+explorbot research /settings
+explorbot research /dashboard --data --deep
+```
 
 ```
+# TUI
 /research
 /research /settings
 /research --data
 ```
 
-- If URL provided, navigates there first
-- `--data` flag extracts structured data from the page
+If a URL is provided, navigates there first.
 
-**CLI equivalent:** `explorbot research <url>` — researches the page and exits.
+| Option | Description |
+|---|---|
+| `--data` | Extract structured data from the page |
+| `--deep` | Enable deep analysis (expand hidden elements) |
+| `--no-fix` | Skip locator fix cycle (for debugging) |
+
+### plan
 
-### `/plan [--focus <feature>]`
+Generate test scenarios using the Planner agent.
 
-Generate test scenarios for the current page using the Planner agent.
+```bash
+# CLI
+explorbot plan /login
+explorbot plan /login --focus authentication
+explorbot plan /checkout --append --style curious
+```
 
 ```
+# TUI
 /plan
 /plan --focus login
 /plan --focus "checkout flow"
@@ -248,13 +317,27 @@ Generate test scenarios for the current page using the Planner agent.
 
 The `--focus` flag narrows the scope of generated tests to a specific feature area.
 
-**CLI equivalent:** `explorbot plan <path> [--focus <feature>]` — generates a plan and exits.
+| Option | Description |
+|---|---|
+| `-a, --append` | Add tests to existing plan file |
+| `--style <name>` | Planning style: `normal`, `curious`, `psycho` |
+| `--focus <feature>` | Focus area for test planning |
 
-### `/test [scenario|number|*]`
+### test
 
 Execute test scenarios using the Tester agent.
 
+```bash
+# CLI
+explorbot test output/plans/login.md          # run all enabled tests
+explorbot test output/plans/login.md 3        # run test #3
+explorbot test output/plans/login.md 1-5      # range
+explorbot test output/plans/login.md 1,3,7    # selection
+explorbot test output/plans/login.md --grep authentication
 ```
+
+```
+# TUI
 /test              # Run next pending test
 /test *            # Run all pending tests
 /test 2            # Run test #2 from plan
@@ -262,81 +345,136 @@ Execute test scenarios using the Tester agent.
 /test User can logout successfully   # Create and run ad-hoc test
 ```
 
-### `/navigate <target>`
+| Option | Description |
+|---|---|
+| `--grep <pattern>` | Run only tests whose scenario matches the pattern |
+
+### drill
+
+Drill all components on a page to learn interactions.
 
-Navigate to a URI or state using AI assistance.
+```bash
+# CLI
+explorbot drill /components
+explorbot drill /components --max-components 10
+explorbot drill /login --knowledge /login
+```
 
 ```
-/navigate /settings
-/navigate login page
-/navigate back to dashboard
+# TUI
+/drill
 ```
 
-The Navigator agent figures out how to reach the destination.
+| Option | Description |
+|---|---|
+| `--knowledge <path>` | Save learned interactions to a knowledge file at this URL path |
+| `--max-components <count>` | Maximum number of components to drill |
 
-## Documentation Collection
+## Test Rerun
 
-### `explorbot docs collect <path-or-url>`
+### runs
 
-Crawl pages and generate a documentation spec with `Purpose`, `User Can`, and `User Might` sections for each documented page.
+List generated test files or dry-run a specific file to preview steps.
 
 ```bash
-explorbot docs collect /users/sign_in
-explorbot docs collect /docs/openapi#tag/project-analytics-tags --max-pages 20
-explorbot docs collect https://teleportal.ua/ua/serials/stb/kod --path explorbot-testing --show --session --max-pages 20
+# CLI
+explorbot runs
+explorbot runs output/tests/suite.js
 ```
 
-Output is written to:
+```
+# TUI
+/runs
+/runs output/tests/suite.js
+```
 
-- `output/docs/spec.md`
-- `output/docs/pages/*.md`
+Each test is numbered so you can reference it with `rerun`.
 
-Use `docbot.config.*` to control crawl scope, path filters, dynamic-page collapsing, and low-signal page skipping.
+### rerun
 
-See [Documentation Collection](./doc-collector.md) for full configuration, crawl modes, and examples.
+Re-run generated tests with AI-powered auto-healing. When a step fails, the Rerunner agent diagnoses the issue and executes a fix.
 
-### `explorbot docs init`
+```bash
+# CLI
+explorbot rerun output/tests/suite.js
+explorbot rerun output/tests/suite.js 3
+explorbot rerun output/tests/suite.js 1-5
+explorbot rerun output/tests/suite.js 1,3,7
+explorbot rerun output/tests/suite.js --session
+```
 
-Create a starter `docbot.config.ts` file.
+```
+# TUI
+/rerun output/tests/suite.js
+/rerun output/tests/suite.js 3
+/rerun output/tests/suite.js 1-5
+/rerun output/tests/suite.js 1,3,7
+```
+
+Tests without assertions (`I.see`, `I.seeElement`, etc.) are automatically skipped.
+
+See [Rerunning Tests](./rerun.md) for the full workflow and healing configuration.
+
+## Knowledge Management
+
+### knows (TUI)
+
+List all knowledge or show matching knowledge for a URL.
 
-```bash
-explorbot docs init
-explorbot docs init --path explorbot-testing
+```
+/knows
+/knows /login
 ```
 
-## Test Rerun
+### learn
 
-### `/runs [file]`
+Store knowledge about the current page for future reference.
 
-List generated test files or dry-run a specific file to preview steps.
+```bash
+# CLI
+explorbot learn                             # interactive mode
+explorbot learn /login "Use admin credentials"
+```
 
 ```
-/runs                                    # list all test files with indices
-/runs output/tests/suite.js              # dry-run: show steps without executing
+# TUI
+/learn
+/learn Test user credentials: test@example.com / test123
 ```
 
-Each test is numbered so you can reference it with `/rerun`.
+Without arguments, opens an interactive editor. Knowledge is saved to `./knowledge/` and used by agents during exploration.
 
-**CLI equivalent:** `explorbot runs [file]` — lists tests without starting a browser.
+## Documentation Collection (CLI only)
 
-### `/rerun <file> [index]`
+### `explorbot docs collect <path-or-url>`
 
-Re-run generated tests with AI-powered auto-healing. When a step fails, the Rerunner agent diagnoses the issue and executes a fix.
+Crawl pages and generate a documentation spec with `Purpose`, `User Can`, and `User Might` sections for each documented page.
 
-```
-/rerun output/tests/suite.js             # run all tests in file
-/rerun output/tests/suite.js 3           # run test #3 only
-/rerun output/tests/suite.js 1-5         # run tests 1 through 5
-/rerun output/tests/suite.js 1,3,7       # run specific tests
+```bash
+explorbot docs collect /users/sign_in
+explorbot docs collect /docs/openapi#tag/project-analytics-tags --max-pages 20
+explorbot docs collect https://teleportal.ua/ua/serials/stb/kod --path explorbot-testing --show --session --max-pages 20
 ```
 
-Tests without assertions (`I.see`, `I.seeElement`, etc.) are automatically skipped.
+Output is written to:
 
-**CLI equivalent:** `explorbot rerun <file> [index]` — runs with healing and exits.
+- `output/docs/spec.md`
+- `output/docs/pages/*.md`
 
-See [Rerunning Tests](./rerun.md) for the full workflow and healing configuration.
+Use `docbot.config.*` to control crawl scope, path filters, dynamic-page collapsing, and low-signal page skipping.
+
+See [Documentation Collection](./doc-collector.md) for full configuration, crawl modes, and examples.
+
+### `explorbot docs init`
 
-## Plan Management
+Create a starter `docbot.config.ts` file.
+
+```bash
+explorbot docs init
+explorbot docs init --path explorbot-testing
+```
+
+## Plan Management (TUI)
 
 ### `/plan:save [filename]`
 
@@ -357,7 +495,13 @@ Load a previously saved plan.
 /plan:load output/plans/checkout-plan.md
 ```
 
-## Page Inspection
+The CLI form `explorbot plan:load <file> [index]` previews a plan file from the shell — including details for a specific test when an index is given.
+
+### `/plan:reload`
+
+Reload the current plan file from disk after editing it externally.
+
+## Page Inspection (TUI)
 
 ### `/aria [--short]`
 
@@ -392,29 +536,11 @@ Extract structured data (tables, lists) from the current page.
 
 Uses AI to identify and format data on the page.
 
-## Knowledge Management
-
-### `/knows [url]`
-
-List all knowledge or show matching knowledge for a URL.
-
-```
-/knows
-/knows /login
-```
-
-### `/learn [note]`
+### `/context`, `/context:aria`, `/context:html`, `/context:data`, `/context:knowledge`, `/context:experience`
 
-Store knowledge about the current page for future reference.
+Print the agent-facing context for the current page in its various forms — combined snapshot, ARIA only, HTML only, data only, applicable knowledge, or stored experience.
 
-```
-/learn
-/learn Test user credentials: test@example.com / test123
-```
-
-Without arguments, opens an interactive editor. Knowledge is saved to `./knowledge/` and used by agents during exploration.
-
-## Session Commands
+## Session Commands (TUI)
 
 ### `/clean`
 
@@ -426,6 +552,8 @@ Clear the Captain agent's conversation history.
 
 Useful when the agent context becomes too large or confused.
 
+The CLI counterpart `explorbot clean [--type <kind>]` removes generated artifacts (experiences, plans, tests) from disk — different scope entirely.
+
 ### `/exit`
 
 Exit the application gracefully.
@@ -435,7 +563,35 @@ Exit the application gracefully.
 /quit
 ```
 
-## Rules & Styles
+## Other CLI Commands
+
+### `explorbot init`
+
+Initialize project configuration.
+
+```bash
+explorbot init
+explorbot init --config-path ./explorbot.config.js
+explorbot init --force
+```
+
+### `explorbot clean`
+
+Clean generated files.
+
+```bash
+explorbot clean                  # artifacts only
+explorbot clean --type experience
+explorbot clean --type all
+```
+
+### `explorbot shell <url> <command>`
+
+Execute a single CodeceptJS command on a page and exit. Handy for quick checks from a script.
+
+```bash
+explorbot shell /login "I.see('Sign in')"
+```
 
 ### `explorbot extract-rules <agent>`
 
@@ -449,9 +605,9 @@ explorbot extract-rules planner -d ./my-rules  # custom directory
 
 After extraction, edit the markdown files to customize how the agent behaves. See [Configuration: Rules](./configuration.md#rules) for details.
 
-## Direct Browser Control
+## Direct Browser Control (TUI)
 
-In addition to slash commands, you can execute CodeceptJS commands directly:
+In addition to slash commands, you can execute CodeceptJS commands directly inside the TUI:
 
 ```
 I.amOnPage('/login')
@@ -461,9 +617,9 @@ I.see('Welcome')
 I.waitForElement('.modal', 5)
 ```
 
-All [CodeceptJS Playwright helpers](https://codecept.io/helpers/Playwright/) are available.
+All [CodeceptJS Playwright helpers](https://codecept.io/helpers/Playwright/) are available. For a one-shot equivalent from the shell, use `explorbot shell <url> <command>`.
 
-## Keyboard Shortcuts
+## Keyboard Shortcuts (TUI)
 
 | Key | Action |
 |-----|--------|

From 38f9a8fc50372857508016e52231b3e0cd27cc3a Mon Sep 17 00:00:00 2001
From: Michael Bodnarchuk <davert.ua@gmail.com>
Date: Sun, 24 May 2026 21:48:27 +0300
Subject: [PATCH 2/5] Feed back why URL didn't change after a failed Navigator
 submit

When a click succeeds but the URL stays put, Navigator now diffs the
page and extracts any new alert/status/alertdialog messages, then
includes them (plus the ARIA changes) in the next retry prompt. This
breaks the loop where Navigator would re-fire 9 syntactic locator
variants against a form that was actually being rejected by the
server. The retry prompt now tells the model to re-examine credentials
or input data before changing locators when the page reacted.
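
A minimal sketch of the feedback step, for illustration only: it
filters out alerts that were already on the page before the click and
renders one failure entry the way the retry prompt does. The field
names follow this patch; the `BatchFailure` interface name, the
`newAlertsSince` and `renderFailure` helpers, and the sample values
are assumptions made for the example.

```typescript
// Shape of one recorded failure. Field names follow the patch;
// the interface name itself is hypothetical.
interface BatchFailure {
  code: string;
  error: string;
  alerts?: string[];
  ariaChanges?: string | null;
}

// Alerts present after the click that were not on the page before it.
function newAlertsSince(before: string[], after: string[]): string[] {
  return after.filter((a) => !before.includes(a));
}

// One bullet per failure, with the page reaction appended when there is one.
function renderFailure(f: BatchFailure): string {
  const head = `- \`${f.code.split('\n')[0]}\` → ${f.error}`;
  if (!f.alerts?.length && !f.ariaChanges) return head;
  const parts = [head];
  if (f.alerts?.length) parts.push(`  • Page now shows: ${f.alerts.map((a) => `"${a}"`).join(', ')}`);
  if (f.ariaChanges) {
    // Keep the prompt small: only the first 8 lines of the ARIA diff.
    const trimmed = f.ariaChanges.split('\n').slice(0, 8).join('\n    ');
    parts.push(`  • ARIA changes after click:\n    ${trimmed}`);
  }
  return parts.join('\n');
}

// Sample data (invented): a cookie notice was already visible,
// the login error appeared only after the submit.
const failure: BatchFailure = {
  code: "I.click('Sign in')",
  error: 'URL did not change (still /login)',
  alerts: newAlertsSince(['Cookies accepted'], ['Cookies accepted', 'Invalid email or password']),
  ariaChanges: null,
};

console.log(renderFailure(failure));
```

Filtering against the pre-click alerts keeps banners that were already
visible from being reported as a reaction to the submit.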

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md        |  3 +++
 src/ai/navigator.ts | 54 ++++++++++++++++++++++++++++++++++++++++++---
 src/utils/aria.ts   | 36 ++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 527901d..d3a7e6f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,9 @@
   explorbot navigate /unreachable && echo ok       # exit code reflects reachability
   ```
 
+### Changes
+- [Navigator] When a click succeeds but the URL does not change to the expected target, the page's reaction is now captured and fed back to the AI: any alert/status messages that appeared (e.g. "Invalid email or password") and the ARIA diff are included in the next retry prompt. This breaks the "9-attempt syntactic-variant loop" that used to happen when a form submit was rejected by the server — the AI now sees *why* the submission failed and is instructed to re-examine credentials or input data before changing locators, rather than blaming the locator.
+
 ## 2026-05-11
 
 ### New CLI Options
diff --git a/src/ai/navigator.ts b/src/ai/navigator.ts
index f21461a..99201c9 100644
--- a/src/ai/navigator.ts
+++ b/src/ai/navigator.ts
@@ -5,6 +5,7 @@ import { ExperienceTracker, renderExperienceToc } from '../experience-tracker.js
 import Explorer from '../explorer.ts';
 import { KnowledgeTracker } from '../knowledge-tracker.js';
 import { type WebPageState, normalizeUrl } from '../state-manager.js';
+import { extractAlerts } from '../utils/aria.ts';
 import { extractCodeBlocks } from '../utils/code-extractor.js';
 import { HooksRunner } from '../utils/hooks-runner.ts';
 import { createDebug, pluralize, tag } from '../utils/logger.js';
@@ -245,7 +246,7 @@ class Navigator implements Agent {
     let codeBlockIndex = 0;
     let totalAttempts = 0;
     const progressBlocks: string[] = [];
-    const batchFailures: Array<{ code: string; error: string }> = [];
+    const batchFailures: Array<{ code: string; error: string; alerts?: string[]; ariaChanges?: string | null; urlAfter?: string }> = [];
 
     let resolved = false;
     await loop(
@@ -274,13 +275,37 @@ class Navigator implements Agent {
           tag('substep').log('Feeding failures back to AI for a new batch...');
           let contextMsg = 'Previous solutions did not work. Analyze the failures and try DIFFERENT strategies (not syntactic variants of the same locator).\n\n';
           if (batchFailures.length > 0) {
-            const lines = batchFailures.map((f) => `- \`${f.code.split('\n')[0]}\` → ${f.error}`).join('\n');
+            const lines = batchFailures
+              .map((f) => {
+                const head = `- \`${f.code.split('\n')[0]}\` → ${f.error}`;
+                if (!f.alerts?.length && !f.ariaChanges) return head;
+                const parts = [head];
+                if (f.alerts?.length) parts.push(`  • Page now shows: ${f.alerts.map((a) => `"${a}"`).join(', ')}`);
+                if (f.ariaChanges) {
+                  const trimmed = f.ariaChanges.split('\n').slice(0, 8).join('\n    ');
+                  parts.push(`  • ARIA changes after click:\n    ${trimmed}`);
+                }
+                return parts.join('\n');
+              })
+              .join('\n');
             contextMsg += `<previous_failures>\n${lines}\n</previous_failures>\n\n`;
           }
           if (!htmlContextAdded) {
             htmlContextAdded = true;
             contextMsg += `Full HTML context:\n\n<page_html>\n${await actionResult.combinedHtml()}\n</page_html>\n\n`;
           }
+          const rejectedByApp = batchFailures.some((f) => (f.alerts && f.alerts.length > 0) || f.ariaChanges);
+          if (rejectedByApp) {
+            contextMsg += dedent`
+              Some submits did not throw an error, but the URL did not change and the page reacted (see alerts / ARIA changes above).
+              This means the submit was REJECTED by the application — invalid input, bad credentials, validation error, or missing required field — NOT that the locator was wrong.
+              Before changing locators, re-examine the data you submitted:
+              - Use values from the knowledge/hint context literally; do not abbreviate or guess them.
+              - Address any validation message shown above before retrying the same submit.
+              Only change locators if the page did NOT react at all to your click (no alert, no ARIA change) — that suggests the click missed its target.
+
+            `;
+          }
           contextMsg += 'Propose new solutions. If errors mention "intercepts pointer events" or timeouts on visible elements, an overlay is blocking — dismiss it first (Escape, click outside, Close button) before retrying the original action.';
           conversation.addUserText(contextMsg);
           codeBlocks = [];
@@ -292,7 +317,9 @@ class Navigator implements Agent {
 
         await this.explorer.switchToMainFrame();
 
-        const prevHash = action.actionResult?.getStateHash() ?? actionResult.getStateHash();
+        const prevActionResult = action.actionResult ?? actionResult;
+        const prevHash = prevActionResult.getStateHash();
+        const prevAlerts = extractAlerts(prevActionResult.ariaSnapshot);
 
         debugLog(`Attempting resolution: ${codeBlock}`);
         const attemptOk = await action.attempt(codeBlock, message);
@@ -328,7 +355,28 @@ class Navigator implements Agent {
           resolved = urlMatches && stateChanged;
 
           if (!resolved && attemptOk) {
+            const newAlerts = extractAlerts(freshState.ariaSnapshot).filter((a) => !prevAlerts.includes(a));
+            let ariaChanges: string | null = null;
+            if (freshState.getStateHash() !== prevHash) {
+              try {
+                const diff = await freshState.diff(prevActionResult);
+                await diff.calculate();
+                ariaChanges = diff.ariaChanged;
+              } catch (err) {
+                debugLog('Failed to compute pageDiff for failed URL verification:', err);
+              }
+            }
+            batchFailures.push({
+              code: codeBlock,
+              error: `URL did not change (still ${freshState.url})`,
+              alerts: newAlerts,
+              ariaChanges,
+              urlAfter: freshState.url,
+            });
             tag('warning').log(`URL verification failed: expected ${expectedUrl}, got ${freshState.url}`);
+            if (newAlerts.length > 0) {
+              tag('warning').log(`Page now shows: ${newAlerts.map((a) => `"${a}"`).join(', ')}`);
+            }
           }
           if (freshState.getStateHash() !== prevHash && (attemptOk || urlMatches)) {
             progressBlocks.push(codeBlock);
diff --git a/src/utils/aria.ts b/src/utils/aria.ts
index a699eb5..7d5baee 100644
--- a/src/utils/aria.ts
+++ b/src/utils/aria.ts
@@ -513,6 +513,42 @@ export function parseAriaLocator(ariaStr: string): { role: string; text: string
   return { role: match[1], text: match[2] };
 }
 
+const ALERT_ROLES = new Set(['alert', 'alertdialog', 'status']);
+
+export function extractAlerts(ariaSnapshot: string | null): string[] {
+  if (!ariaSnapshot) return [];
+  const lines = ariaSnapshot.split('\n');
+  const alerts: string[] = [];
+
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i];
+    const header = line.match(/^(\s*)-\s*(\w+)\b/);
+    if (!header) continue;
+    if (!ALERT_ROLES.has(header[2])) continue;
+
+    const inline = line.match(/^\s*-\s*\w+\s*:?\s*"([^"]+)"/);
+    if (inline) {
+      const text = inline[1].trim();
+      if (text && text !== '-') alerts.push(text);
+      continue;
+    }
+
+    const indent = header[1].length;
+    const collected: string[] = [];
+    for (let j = i + 1; j < lines.length; j++) {
+      const child = lines[j];
+      const childIndent = child.match(/^(\s*)\S/);
+      if (!childIndent) continue;
+      if (childIndent[1].length <= indent) break;
+      const quoted = child.match(/"([^"]+)"/);
+      if (quoted) collected.push(quoted[1].trim());
+    }
+    if (collected.length > 0) alerts.push(collected.join(' '));
+  }
+
+  return Array.from(new Set(alerts.filter(Boolean)));
+}
+
 // ─────────────────────────────────────────────────────────────────
 // Types
 // ─────────────────────────────────────────────────────────────────

From f3937fdf90376ae183bd5f1b8231652dc20ee8f0 Mon Sep 17 00:00:00 2001
From: Michael Bodnarchuk <davert.ua@gmail.com>
Date: Mon, 25 May 2026 01:38:40 +0300
Subject: [PATCH 3/5] Give Navigator a stop tool + fix contradicting retry
 prompt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Navigator's resolveState loop was passing tools=undefined to the AI,
so the model had no way to signal "this is hopeless, the page is
rejecting the submit." The retry prompt made this worse: on
app-rejection it told the model both "do not change the locator" AND
"propose new solutions" in the same turn, so the model spent its full
retry budget mutating locators that were already correct.

This adds a stop(reason) tool the AI can call when no locator change
will help (wrong credentials, missing knowledge, captcha, blocking
error). When called, the reason is logged at error level and surfaced
in the existing interactive failure prompt so the user knows what to
fix.
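
A minimal sketch of the stop-signal pattern described above. It is
dependency-free on purpose: the StopTool interface, createStopSignal()
and failurePrompt() are hypothetical stand-ins for illustration, while
the real change builds the tool with tool() from 'ai' and a zod schema.

```typescript
// Hypothetical, simplified tool shape (assumption, not the 'ai' package type).
interface StopToolResult {
  success: boolean;
  message: string;
}

interface StopTool {
  description: string;
  execute(args: { reason: string }): Promise<StopToolResult>;
}

// The tool only records the reason in a closure variable; the caller
// checks it after each model turn and decides to end the loop.
function createStopSignal(): { tool: StopTool; getReason: () => string | null } {
  let stopReason: string | null = null;
  const tool: StopTool = {
    description: 'Stop the navigation because no locator change can resolve the goal.',
    execute: async ({ reason }) => {
      stopReason = reason;
      return { success: true, message: 'Recorded. Navigator will stop and surface the reason.' };
    },
  };
  return { tool, getReason: () => stopReason };
}

// Prefixes the recorded reason to the interactive failure prompt, if any.
function failurePrompt(stopReason: string | null, currentUrl: string, targetUrl: string | null): string {
  const stopLine = stopReason ? `Navigator stopped: ${stopReason}\n` : '';
  return `${stopLine}Navigator failed to resolve. Current: ${currentUrl}\nTarget: ${targetUrl ?? '(none)'}`;
}
```

Keeping execute() free of side effects beyond recording the reason
leaves the decision to end the loop with the caller.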

The retry prompt is restructured so the two paths are mutually
exclusive:
- if the page reacted (alerts / ARIA changes): two clear choices —
  call stop(reason), or correct the submitted data using known
  knowledge. Do not change the locator.
- if the page did not react: propose new locator strategies (the old
  behaviour).
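
The branching above can be sketched as a single either/or choice. The
two prompt constants are shortened, hypothetical stand-ins for the
real prompt text, and retryPrompt() is an illustrative helper rather
than a function in navigator.ts.

```typescript
// Shortened stand-ins for the real prompt wording (assumption).
const REJECTED_PROMPT =
  'The page reacted but the URL did not change. Call stop(reason) or correct the submitted data. Do not change the locator.';
const MISSED_PROMPT = 'Propose new solutions.';

interface BatchFailure {
  code: string;
  error: string;
  alerts?: string[];
  ariaChanges?: string | null;
}

// Exactly one instruction is returned, so the model never sees
// "do not change the locator" and "propose new solutions" together.
function retryPrompt(failures: BatchFailure[]): string {
  const rejectedByApp = failures.some((f) => (f.alerts?.length ?? 0) > 0 || Boolean(f.ariaChanges));
  return rejectedByApp ? REJECTED_PROMPT : MISSED_PROMPT;
}
```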

No control-flow rewrite — the model can still emit code blocks in
text; the tool is an optional escape valve, mirroring how Tester uses
its stop() tool at tester.ts:938.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md        |  5 +++++
 src/ai/navigator.ts | 47 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 28821be..c2770a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,10 @@
 # Changelog
 
+## 2026-05-25
+
+### Changes
+- [Navigator] Can now stop on its own when the page reacts to a submit but the cause is data, not the locator. A new `stop(reason)` tool is exposed to the Navigator's AI; the model is instructed to call it when an alert/validation message indicates the user must fix something (wrong credentials, missing knowledge, captcha). Until now the retry prompt told the model both "this is not a locator issue" AND "propose new solutions" in the same turn, so Navigator burned its full retry budget mutating locators that were already correct. The retry prompt is now branched: app-side rejection → `stop(reason)` or correct the submitted data; click that missed entirely → propose new locator strategies. When the model calls `stop()`, the reason is logged and surfaced in the interactive failure prompt so the user knows what to fix.
+
 ## 2026-05-24
 
 ### New CLI Options
diff --git a/src/ai/navigator.ts b/src/ai/navigator.ts
index 99201c9..a81fbcd 100644
--- a/src/ai/navigator.ts
+++ b/src/ai/navigator.ts
@@ -1,4 +1,6 @@
+import { tool } from 'ai';
 import dedent from 'dedent';
+import { z } from 'zod';
 import { ActionResult } from '../action-result.js';
 import type Action from '../action.ts';
 import { ExperienceTracker, renderExperienceToc } from '../experience-tracker.js';
@@ -239,7 +241,25 @@ class Navigator implements Agent {
     const conversation = this.provider.startConversation(this.systemPrompt, 'navigator');
     conversation.addUserText(prompt);
 
-    const tools = undefined;
+    let stopReason: string | null = null;
+    const tools = {
+      stop: tool({
+        description: dedent`
+          Stop the navigation because no locator change can resolve the goal.
+          Use this when the application rejected the submission (wrong credentials, missing CSRF,
+          captcha, validation failure you cannot satisfy from available data), required knowledge
+          is missing, or the page shows a blocking error you cannot dismiss.
+          Do NOT use this for locator or strategy problems — for those, emit new code blocks instead.
+        `,
+        inputSchema: z.object({
+          reason: z.string().describe('Short user-facing explanation. Quote the alert / validation text you saw and name what data or knowledge is missing.'),
+        }),
+        execute: async ({ reason }) => {
+          stopReason = reason;
+          return { success: true, message: 'Recorded. Navigator will stop and surface the reason.' };
+        },
+      }),
+    };
 
     let codeBlocks: string[] = [];
     let htmlContextAdded = false;
@@ -254,6 +274,12 @@ class Navigator implements Agent {
         if (codeBlocks.length === 0) {
           const result = await this.provider.invokeConversation(conversation, tools);
           if (!result) return;
+          if (stopReason) {
+            tag('error').log(`Navigator stopped: ${stopReason}`);
+            resolved = false;
+            stop();
+            return;
+          }
           const aiResponse = result?.response?.text;
           debugLog('AI:', aiResponse?.split('\n')[0]);
           debugLog('Received AI response:', aiResponse?.length ?? 0, 'characters');
@@ -299,14 +325,16 @@ class Navigator implements Agent {
             contextMsg += dedent`
               Some submits did not throw an error, but the URL did not change and the page reacted (see alerts / ARIA changes above).
               This means the submit was REJECTED by the application — invalid input, bad credentials, validation error, or missing required field — NOT that the locator was wrong.
-              Before changing locators, re-examine the data you submitted:
-              - Use values from the knowledge/hint context literally; do not abbreviate or guess them.
-              - Address any validation message shown above before retrying the same submit.
-              Only change locators if the page did NOT react at all to your click (no alert, no ARIA change) — that suggests the click missed its target.
 
+              You have two choices, and ONLY these two:
+              1. If the rejection can only be fixed by the user (wrong credentials, missing data, captcha, knowledge-file gap) — call the stop() tool with the alert text and what is needed. Do NOT propose more code blocks.
+              2. If you can correct the SUBMITTED DATA (not the locator) using values present in the knowledge / hint context above — emit corrected code blocks. Do not change the locator.
+
+              Only change locators if the page did NOT react at all to your click (no alert, no ARIA change) — that suggests the click missed its target.
             `;
+          } else {
+            contextMsg += 'Propose new solutions. If errors mention "intercepts pointer events" or timeouts on visible elements, an overlay is blocking — dismiss it first (Escape, click outside, Close button) before retrying the original action.';
           }
-          contextMsg += 'Propose new solutions. If errors mention "intercepts pointer events" or timeouts on visible elements, an overlay is blocking — dismiss it first (Escape, click outside, Close button) before retrying the original action.';
           conversation.addUserText(contextMsg);
           codeBlocks = [];
           batchFailures.length = 0;
@@ -428,12 +456,15 @@ class Navigator implements Agent {
       }
     }
 
-    if (!resolved && totalAttempts > 0) {
+    if (!resolved && stopReason) {
+      tag('error').log(`Navigator stopped: ${stopReason}`);
+    } else if (!resolved && totalAttempts > 0) {
       tag('error').log(`Navigation failed after ${totalAttempts} attempts`);
     }
 
     if (!resolved && isInteractive()) {
-      const userInput = await pause(`Navigator failed to resolve. Current: ${action.stateManager.getCurrentState()?.url}\n` + `Target: ${expectedUrl ?? '(none)'}\nEnter CodeceptJS commands (or press Enter to skip):`);
+      const stopLine = stopReason ? `Navigator stopped: ${stopReason}\n` : '';
+      const userInput = await pause(`${stopLine}Navigator failed to resolve. Current: ${action.stateManager.getCurrentState()?.url}\n` + `Target: ${expectedUrl ?? '(none)'}\nEnter CodeceptJS commands (or press Enter to skip):`);
 
       if (userInput?.trim()) {
         resolved = await action.attempt(userInput, message);

From 0e589870e7fe4fe8b4c0967c6a4c152b12c14512 Mon Sep 17 00:00:00 2001
From: Michael Bodnarchuk <davert.ua@gmail.com>
Date: Mon, 25 May 2026 10:44:34 +0300
Subject: [PATCH 4/5] Drop extractAlerts regex helper; let the prompt do the
 judgement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CLAUDE.md ("Prompts & Rules — General, Not Example-Driven") forbids
adding programmatic detectors that target a semantic judgement the
model should make from a prompt. The extractAlerts() helper I added
to utils/aria.ts violated that — it was a regex over the ARIA snapshot
looking for alert/alertdialog/status roles to feed to the model as a
pre-digested "Page now shows: ..." line.

Remove it. The ARIA diff produced by the existing Diff infrastructure
is fed to the model as-is, and the retry prompt now instructs the
model to read the diff and judge for itself whether the application
rejected the action, looking for any new role or text that signals
rejection. The prompt lists a few example words but tells the model
not to look for a specific phrase, because different sites express
rejection differently.

The stop() tool path and the prompt branching are unchanged in
structure; only the language is now "read the ARIA diff above" instead
of "see alerts above", and the pageReacted gate now depends only on
whether a failure carries an ARIA diff, no longer on whether the
regex matched. The diff excerpt shown per failure also grows from 8
to 12 lines.
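
As a rough illustration of the simplified flow, the sketch below
formats one failure entry and computes the gate from the diff alone.
formatFailure(), pageReacted() and the maxDiffLines parameter are
hypothetical names used here for clarity; in navigator.ts the same
logic is written inline.

```typescript
interface BatchFailure {
  code: string;
  error: string;
  ariaChanges?: string | null;
  urlAfter?: string;
}

// One bullet per failed step: first line of the code, the error, and,
// when the page changed, the first maxDiffLines lines of the ARIA diff.
function formatFailure(f: BatchFailure, maxDiffLines = 12): string {
  const head = `- \`${f.code.split('\n')[0]}\` → ${f.error}`;
  if (!f.ariaChanges) return head;
  const trimmed = f.ariaChanges.split('\n').slice(0, maxDiffLines).join('\n    ');
  return `${head}\n  • ARIA changes after the action:\n    ${trimmed}`;
}

// The gate is true as soon as any failure carries a non-empty diff.
function pageReacted(failures: BatchFailure[]): boolean {
  return failures.some((f) => Boolean(f.ariaChanges));
}
```

No role names or phrases are inspected here; interpreting the diff is
left entirely to the model.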

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md        |  2 +-
 src/ai/navigator.ts | 36 ++++++++++++------------------------
 src/utils/aria.ts   | 36 ------------------------------------
 3 files changed, 13 insertions(+), 61 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c2770a5..eef1013 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,7 +14,7 @@
   explorbot navigate /dashboard --session auth.json
   explorbot navigate /unreachable && echo ok       # exit code reflects reachability
   ```
-- [Navigator] When a click succeeds but the URL does not change to the expected target, the page's reaction is now captured and fed back to the AI: any alert/status messages that appeared (e.g. "Invalid email or password") and the ARIA diff are included in the next retry prompt. This breaks the "9-attempt syntactic-variant loop" that used to happen when a form submit was rejected by the server — the AI now sees *why* the submission failed and is instructed to re-examine credentials or input data before changing locators, rather than blaming the locator.
+- [Navigator] When a click succeeds but the URL does not change to the expected target, the ARIA diff between the pre-click and post-click page is now included in the next retry prompt. The AI is instructed to read the diff and decide whether the application rejected the submit (in which case it should fix the submitted data, not the locator) or the click simply missed its target. This breaks the "9-attempt syntactic-variant loop" that used to happen when a form submit was rejected by the server — the model now has the evidence to tell the two cases apart.
 
 
 
diff --git a/src/ai/navigator.ts b/src/ai/navigator.ts
index a81fbcd..05bee28 100644
--- a/src/ai/navigator.ts
+++ b/src/ai/navigator.ts
@@ -7,7 +7,6 @@ import { ExperienceTracker, renderExperienceToc } from '../experience-tracker.js
 import Explorer from '../explorer.ts';
 import { KnowledgeTracker } from '../knowledge-tracker.js';
 import { type WebPageState, normalizeUrl } from '../state-manager.js';
-import { extractAlerts } from '../utils/aria.ts';
 import { extractCodeBlocks } from '../utils/code-extractor.js';
 import { HooksRunner } from '../utils/hooks-runner.ts';
 import { createDebug, pluralize, tag } from '../utils/logger.js';
@@ -266,7 +265,7 @@ class Navigator implements Agent {
     let codeBlockIndex = 0;
     let totalAttempts = 0;
     const progressBlocks: string[] = [];
-    const batchFailures: Array<{ code: string; error: string; alerts?: string[]; ariaChanges?: string | null; urlAfter?: string }> = [];
+    const batchFailures: Array<{ code: string; error: string; ariaChanges?: string | null; urlAfter?: string }> = [];
 
     let resolved = false;
     await loop(
@@ -304,14 +303,9 @@ class Navigator implements Agent {
             const lines = batchFailures
               .map((f) => {
                 const head = `- \`${f.code.split('\n')[0]}\` → ${f.error}`;
-                if (!f.alerts?.length && !f.ariaChanges) return head;
-                const parts = [head];
-                if (f.alerts?.length) parts.push(`  • Page now shows: ${f.alerts.map((a) => `"${a}"`).join(', ')}`);
-                if (f.ariaChanges) {
-                  const trimmed = f.ariaChanges.split('\n').slice(0, 8).join('\n    ');
-                  parts.push(`  • ARIA changes after click:\n    ${trimmed}`);
-                }
-                return parts.join('\n');
+                if (!f.ariaChanges) return head;
+                const trimmed = f.ariaChanges.split('\n').slice(0, 12).join('\n    ');
+                return `${head}\n  • ARIA changes after the action:\n    ${trimmed}`;
               })
               .join('\n');
             contextMsg += `<previous_failures>\n${lines}\n</previous_failures>\n\n`;
@@ -320,17 +314,17 @@ class Navigator implements Agent {
             htmlContextAdded = true;
             contextMsg += `Full HTML context:\n\n<page_html>\n${await actionResult.combinedHtml()}\n</page_html>\n\n`;
           }
-          const rejectedByApp = batchFailures.some((f) => (f.alerts && f.alerts.length > 0) || f.ariaChanges);
-          if (rejectedByApp) {
+          const pageReacted = batchFailures.some((f) => f.ariaChanges);
+          if (pageReacted) {
             contextMsg += dedent`
-              Some submits did not throw an error, but the URL did not change and the page reacted (see alerts / ARIA changes above).
-              This means the submit was REJECTED by the application — invalid input, bad credentials, validation error, or missing required field — NOT that the locator was wrong.
+              Some actions did not throw, but the URL did not change to the expected target and the page changed in other ways (see the ARIA changes listed above).
+              Read the ARIA diff above and judge what happened. Look for any new role that conveys a server response — e.g. an alert, alertdialog, status, validation message, banner, or text that names a problem ("invalid", "required", "expired", "incorrect", "denied", "captcha", "verify"). Different sites express rejection differently; do not look for a specific phrase, read what is there.
 
-              You have two choices, and ONLY these two:
-              1. If the rejection can only be fixed by the user (wrong credentials, missing data, captcha, knowledge-file gap) — call the stop() tool with the alert text and what is needed. Do NOT propose more code blocks.
-              2. If you can correct the SUBMITTED DATA (not the locator) using values present in the knowledge / hint context above — emit corrected code blocks. Do not change the locator.
+              Decide between exactly two paths:
+              1. The diff shows the application rejected the action and the fix is something only the user can provide (wrong credentials, missing data, captcha, knowledge-file gap) — call the stop() tool and quote what you saw in the diff and what is needed.
+              2. The diff shows the application rejected the action but you can correct the SUBMITTED DATA using values present in the knowledge / hint context above — emit corrected code blocks. Do not change the locator.
 
-              Only change locators if the page did NOT react at all to your click (no alert, no ARIA change) — that suggests the click missed its target.
+              Only change locators if the diff shows NOTHING relevant happened in response to your click — that is the only signal that the click missed its target.
             `;
           } else {
             contextMsg += 'Propose new solutions. If errors mention "intercepts pointer events" or timeouts on visible elements, an overlay is blocking — dismiss it first (Escape, click outside, Close button) before retrying the original action.';
@@ -347,7 +341,6 @@ class Navigator implements Agent {
 
         const prevActionResult = action.actionResult ?? actionResult;
         const prevHash = prevActionResult.getStateHash();
-        const prevAlerts = extractAlerts(prevActionResult.ariaSnapshot);
 
         debugLog(`Attempting resolution: ${codeBlock}`);
         const attemptOk = await action.attempt(codeBlock, message);
@@ -383,7 +376,6 @@ class Navigator implements Agent {
           resolved = urlMatches && stateChanged;
 
           if (!resolved && attemptOk) {
-            const newAlerts = extractAlerts(freshState.ariaSnapshot).filter((a) => !prevAlerts.includes(a));
             let ariaChanges: string | null = null;
             if (freshState.getStateHash() !== prevHash) {
               try {
@@ -397,14 +389,10 @@ class Navigator implements Agent {
             batchFailures.push({
               code: codeBlock,
               error: `URL did not change (still ${freshState.url})`,
-              alerts: newAlerts,
               ariaChanges,
               urlAfter: freshState.url,
             });
             tag('warning').log(`URL verification failed: expected ${expectedUrl}, got ${freshState.url}`);
-            if (newAlerts.length > 0) {
-              tag('warning').log(`Page now shows: ${newAlerts.map((a) => `"${a}"`).join(', ')}`);
-            }
           }
           if (freshState.getStateHash() !== prevHash && (attemptOk || urlMatches)) {
             progressBlocks.push(codeBlock);
diff --git a/src/utils/aria.ts b/src/utils/aria.ts
index 7d5baee..a699eb5 100644
--- a/src/utils/aria.ts
+++ b/src/utils/aria.ts
@@ -513,42 +513,6 @@ export function parseAriaLocator(ariaStr: string): { role: string; text: string
   return { role: match[1], text: match[2] };
 }
 
-const ALERT_ROLES = new Set(['alert', 'alertdialog', 'status']);
-
-export function extractAlerts(ariaSnapshot: string | null): string[] {
-  if (!ariaSnapshot) return [];
-  const lines = ariaSnapshot.split('\n');
-  const alerts: string[] = [];
-
-  for (let i = 0; i < lines.length; i++) {
-    const line = lines[i];
-    const header = line.match(/^(\s*)-\s*(\w+)\b/);
-    if (!header) continue;
-    if (!ALERT_ROLES.has(header[2])) continue;
-
-    const inline = line.match(/^\s*-\s*\w+\s*:?\s*"([^"]+)"/);
-    if (inline) {
-      const text = inline[1].trim();
-      if (text && text !== '-') alerts.push(text);
-      continue;
-    }
-
-    const indent = header[1].length;
-    const collected: string[] = [];
-    for (let j = i + 1; j < lines.length; j++) {
-      const child = lines[j];
-      const childIndent = child.match(/^(\s*)\S/);
-      if (!childIndent) continue;
-      if (childIndent[1].length <= indent) break;
-      const quoted = child.match(/"([^"]+)"/);
-      if (quoted) collected.push(quoted[1].trim());
-    }
-    if (collected.length > 0) alerts.push(collected.join(' '));
-  }
-
-  return Array.from(new Set(alerts.filter(Boolean)));
-}
-
 // ─────────────────────────────────────────────────────────────────
 // Types
 // ─────────────────────────────────────────────────────────────────

From 89d35939ab6ad54addf799d739ee885be257db72 Mon Sep 17 00:00:00 2001
From: Michael Bodnarchuk <davert.ua@gmail.com>
Date: Mon, 25 May 2026 15:53:48 +0300
Subject: [PATCH 5/5] Rewrite Navigator retry-and-stop prompt to be
 action-agnostic
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Earlier wording baked the form-submit case into the prompt structure:
"correct the SUBMITTED DATA", "wrong credentials, missing CSRF,
captcha", etc. The same retry path runs for ANY action whose
post-state shows the page changed but the URL did not go where
expected: clicks that opened modals, tabs that switched, accordions
that expanded, wizard steps that did not advance.

The retry prompt now describes shapes of outcome ("the application
requires something only the user can supply", "you can resolve this
from existing context", "the diff is empty or unrelated") with plural
examples spanning multiple action types, instead of branching on the
canonical login flow. The stop() tool description got the same
treatment — no more "rejected the submission" / "CSRF" wording.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md        |  2 +-
 src/ai/navigator.ts | 27 ++++++++++++++++-----------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index eef1013..d08f12e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 ## 2026-05-25
 
 ### Changes
-- [Navigator] Can now stop on its own when the page reacts to a submit but the cause is data, not the locator. A new `stop(reason)` tool is exposed to the Navigator's AI; the model is instructed to call it when an alert/validation message indicates the user must fix something (wrong credentials, missing knowledge, captcha). Until now the retry prompt told the model both "this is not a locator issue" AND "propose new solutions" in the same turn, so Navigator burned its full retry budget mutating locators that were already correct. The retry prompt is now branched: app-side rejection → `stop(reason)` or correct the submitted data; click that missed entirely → propose new locator strategies. When the model calls `stop()`, the reason is logged and surfaced in the interactive failure prompt so the user knows what to fix.
+- [Navigator] Can now stop on its own when reaching the goal requires something only the user can supply. A new `stop(reason)` tool is exposed to the Navigator's AI; the model is instructed to call it when the ARIA diff after an action indicates the application needs something the test cannot provide — for example an authentication failure, captcha, a permission the test cannot satisfy, or knowledge missing from the provided context. Until now the retry prompt told the model both "this is not a locator issue" AND "propose new solutions" in the same turn, so Navigator burned its full retry budget mutating locators that were already correct. The retry prompt is now branched into three explicit paths: the page reacted in a way only the user can resolve → call `stop()`; the page reacted in a way the AI can resolve from existing knowledge → emit the next step; the diff is empty or unrelated → propose a different locator strategy. When `stop()` is called, the reason is logged and surfaced in the interactive failure prompt so the user knows what to fix.
 
 ## 2026-05-24
 
diff --git a/src/ai/navigator.ts b/src/ai/navigator.ts
index 05bee28..c3e2863 100644
--- a/src/ai/navigator.ts
+++ b/src/ai/navigator.ts
@@ -244,14 +244,16 @@ class Navigator implements Agent {
     const tools = {
       stop: tool({
         description: dedent`
-          Stop the navigation because no locator change can resolve the goal.
-          Use this when the application rejected the submission (wrong credentials, missing CSRF,
-          captcha, validation failure you cannot satisfy from available data), required knowledge
-          is missing, or the page shows a blocking error you cannot dismiss.
+          Stop the navigation because no locator or strategy change can reach the goal.
+          Use this when reaching the goal requires something only the user can supply or that the
+          page cannot grant from the current state — for example: an authentication failure you
+          cannot guess past, a captcha or human-verification step, a permission the test cannot
+          satisfy, a piece of data not present in the available knowledge / hint context, or a
+          blocking error or dialog you cannot dismiss.
           Do NOT use this for locator or strategy problems — for those, emit new code blocks instead.
         `,
         inputSchema: z.object({
-          reason: z.string().describe('Short user-facing explanation. Quote the alert / validation text you saw and name what data or knowledge is missing.'),
+          reason: z.string().describe('Short user-facing explanation. Quote what you observed (alert text, dialog title, status message, validation note) and name what is missing or required.'),
         }),
         execute: async ({ reason }) => {
           stopReason = reason;
@@ -317,14 +319,17 @@ class Navigator implements Agent {
           const pageReacted = batchFailures.some((f) => f.ariaChanges);
           if (pageReacted) {
             contextMsg += dedent`
-              Some actions did not throw, but the URL did not change to the expected target and the page changed in other ways (see the ARIA changes listed above).
-              Read the ARIA diff above and judge what happened. Look for any new role that conveys a server response — e.g. an alert, alertdialog, status, validation message, banner, or text that names a problem ("invalid", "required", "expired", "incorrect", "denied", "captcha", "verify"). Different sites express rejection differently; do not look for a specific phrase, read what is there.
+              Some steps in the previous batch did not throw, but the URL did not change to the expected target and the page changed in other ways — the ARIA diff for each such step is listed in <previous_failures> above.
 
-              Decide between exactly two paths:
-              1. The diff shows the application rejected the action and the fix is something only the user can provide (wrong credentials, missing data, captcha, knowledge-file gap) — call the stop() tool and quote what you saw in the diff and what is needed.
-              2. The diff shows the application rejected the action but you can correct the SUBMITTED DATA using values present in the knowledge / hint context above — emit corrected code blocks. Do not change the locator.
+              Read those diffs and judge what each step actually triggered. Different action types produce different reactions; the diff is your only evidence of what happened. A diff might show, for example: a new alert / alertdialog / status / validation message appearing near a field or at page level; a modal, dialog, or wizard step opening; a banner, toast, or notification region appearing; a section expanding or collapsing; a tab or accordion switching content. A diff might also be empty or unrelated to the step — that is also a signal.
 
-              Only change locators if the diff shows NOTHING relevant happened in response to your click — that is the only signal that the click missed its target.
+              Choose exactly ONE path based on what the diffs actually show — do not assume the previous step submitted any particular kind of data:
+
+              A. The diff indicates the application requires something only the user can supply — for example: an authentication failure you cannot guess past, a captcha, a permission the test cannot satisfy, or knowledge that is not present in the provided context. Call the stop() tool and quote what you saw in the diff and what is needed.
+
+              B. The diff indicates the next step is something you can perform from the existing knowledge / hint context — for example: re-emit a step with a value that exists in the knowledge but was used incorrectly; dismiss an unexpected modal; accept a confirmation; take a follow-up step the page now requires. Emit code blocks for that next step. Do NOT change the locator of a step that already produced a reaction.
+
+              C. The diff is empty or unrelated to your step — the action likely missed its target. Propose a different locator strategy.
             `;
           } else {
             contextMsg += 'Propose new solutions. If errors mention "intercepts pointer events" or timeouts on visible elements, an overlay is blocking — dismiss it first (Escape, click outside, Close button) before retrying the original action.';