tsotchke · tsotchke · May 21, 2026 · May 21, 2026
diff --git a/README.md b/README.md
@@ -384,6 +384,13 @@ Regenerate the evidence-backed capability/functionality report with:
 python3 scripts/build_assistant_capability_report.py
 ```
 
+Regenerate the retrieval-only recall benchmark after `ASSIST.EXE --recall-probe`
+has produced `qemu/evidence/assistant_recall_486.log`:
+
+```sh
+python3 scripts/benchmark_assistant_recall.py
+```
+
 Run the non-greedy sampling matrix with:
 
 ```sh
@@ -466,7 +473,7 @@ physical 486-class DOS machine. Pentium timing is useful scaling evidence, but
 it is not a blocker for the solid 486-focused release. The hardware ladder is
 tracked in [`docs/hardware-validation.md`](docs/hardware-validation.md), with a
 DOS capture batch under `hardware/HWVALID.BAT` that writes `QUAL.LOG`,
-`PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, and `ASSISTC.LOG`, strict host
+`PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ARECALL.LOG`, and `ASSISTC.LOG`, strict host
 verification through `scripts/verify_hardware_capture.py --require-filled-notes`,
 and release evidence staging through
 `scripts/stage_hardware_capture_evidence.py`. The physical assistant gate now

diff --git a/docs/assistant-intelligence-roadmap.md b/docs/assistant-intelligence-roadmap.md
@@ -57,12 +57,14 @@ falling back to the full KDB.
 - Lightweight domain pack without retraining: `PORTABLE` ships portable
   intelligence notes generated from `data/assistant_pack_notes/portable` and
   shares the CHAT model.
+- Retrieval-only recall probe: `ASSIST.EXE --recall-probe` measures the KB2/KDB
+  recall path across every shipped pack without model generation.
 
 ## Next Milestones
 
 - Add more domain packs for hardware repair, programming, and offline reference
   manuals using the same generated KDB/KB2 contract.
-- Measure binary KDB scan time in QEMU and on real hardware, then decide
+- Compare recall-probe timing on QEMU and physical hardware, then decide
   whether the next storage step should be topic shards or offset tables.
 - Add persistent memory slots beyond name, goal, style, and problem.
 - Add a measured recall benchmark in QEMU and on physical 486 hardware.
diff --git a/docs/hardware-validation.md b/docs/hardware-validation.md
@@ -9,7 +9,7 @@ the core release gate.
 | Tier | Status | Hardware | Release Role | Required Logs |
 |---|---|---|---|---|
 | 0 | Complete | QEMU 486 profiles | Preview release gate | compile, quality, perf, assistant, vectors |
-| 1 | Next gate | Any working 486-class DOS PC with 32-64 MB RAM | Solid release baseline | `QUAL.LOG`, `PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ASSISTC.LOG` |
+| 1 | Next gate | Any working 486-class DOS PC with 32-64 MB RAM | Solid release baseline | `QUAL.LOG`, `PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ARECALL.LOG`, `ASSISTC.LOG` |
 | 2 | Useful | Faster 486DX2/DX4 or comparable late 486 board | Performance confidence | repeated `PERF.LOG`, optional kernel perf |
 | 3 | Optional | Pentium 60/90/133+ | Scaling comparison only | `PERF.LOG`, optional quality confirmation |
 | 4 | Optional | 386 or 486SX no-FPU class system | Compatibility stress test | quality and perf if memory allows |
@@ -57,6 +57,7 @@ GPT2.EXE --quality-all > QUAL.LOG
 GPT2.EXE --perf > PERF.LOG
 ASSIST.EXE --scripted > ASSIST.LOG
 ASSIST.EXE --stress-probe > ASTRESS.LOG
+ASSIST.EXE --recall-probe > ARECALL.LOG
 ```
 
 Also keep the assistant compile log when building on the target:
@@ -112,6 +113,8 @@ verifies the paired `hardware_<machine>_manifest.md` checksum table.
 - `ASTRESS.LOG` includes `ASSIST_END|suite=stress-probe|packs=5`, exactly 50
   `ASSIST_REPLY|` rows, no `status=model_unavailable` rows, and records for
   CHAT, DOSHELP, OFFICE, DEV, and PORTABLE.
+- `ARECALL.LOG` includes `ASSIST_END|suite=recall-probe|packs=5`, exactly 42
+  `ASSIST_RECALL|` rows, and validated KB2/KDB recall answers for every pack.
 - `ASSISTC.LOG` includes `ASSIST_COMPILE_OK` when target-side compilation is
   attempted.
 - The hardware notes identify machine key, CPU, clock, RAM, DOS version,
@@ -127,6 +130,7 @@ qemu/evidence/hardware_<machine>_quality.log
 qemu/evidence/hardware_<machine>_perf.log
 qemu/evidence/hardware_<machine>_assistant.log
 qemu/evidence/hardware_<machine>_assistant_stress.log
+qemu/evidence/hardware_<machine>_assistant_recall.log
 qemu/evidence/hardware_<machine>_assistant_compile.log
 qemu/evidence/hardware_<machine>_notes.md
 qemu/evidence/hardware_<machine>_manifest.md

diff --git a/docs/releases/v0.1.0-preview.md b/docs/releases/v0.1.0-preview.md
@@ -174,10 +174,12 @@ python3 scripts/evaluate_assistant_pack_retrieval.py
 python3 scripts/evaluate_assistant_kdb_index.py
 python3 scripts/evaluate_assistant_kdb_binary.py
 python3 scripts/evaluate_assistant_kdb_term_index.py
+python3 scripts/benchmark_assistant_recall.py
 python3 scripts/import_assistant_notes.py --self-test
 python3 scripts/evaluate_assistant_consistency.py
 python3 scripts/build_assistant_capability_report.py
 QEMU_TIMEOUT_SECONDS=240 bash qemu/run_assistant_stress_486.sh
+QEMU_TIMEOUT_SECONDS=240 bash qemu/run_assistant_recall_486.sh
 python3 scripts/stress_assistant_behavior.py --log qemu/evidence/assistant_stress_486.log
 python3 scripts/verify_workspace_tracking.py
 python3 scripts/build_preview_release.py --force

diff --git a/hardware/HWVALID.BAT b/hardware/HWVALID.BAT
@@ -6,6 +6,7 @@ if exist QUAL.LOG del QUAL.LOG
 if exist PERF.LOG del PERF.LOG
 if exist ASSIST.LOG del ASSIST.LOG
 if exist ASTRESS.LOG del ASTRESS.LOG
+if exist ARECALL.LOG del ARECALL.LOG
 if exist ASSISTC.LOG del ASSISTC.LOG
 
 echo HW_CAPTURE_BEGIN>HWVALID.LOG
@@ -50,6 +51,9 @@ ASSIST.EXE --scripted > ASSIST.LOG
 echo Running ASSIST.EXE --stress-probe...
 echo HW_STEP^|assistant_stress>>HWVALID.LOG
 ASSIST.EXE --stress-probe > ASTRESS.LOG
+echo Running ASSIST.EXE --recall-probe...
+echo HW_STEP^|assistant_recall>>HWVALID.LOG
+ASSIST.EXE --recall-probe > ARECALL.LOG
 goto done
 
 :missing_exe
@@ -66,5 +70,5 @@ goto done
 echo HW_CAPTURE_END>>HWVALID.LOG
 echo.
 echo Hardware validation capture complete.
-echo Copy HWVALID.LOG, QUAL.LOG, PERF.LOG, ASSIST.LOG, ASTRESS.LOG, ASSISTC.LOG,
-echo and HWNOTES.TXT back to the host for verification.
+echo Copy HWVALID.LOG, QUAL.LOG, PERF.LOG, ASSIST.LOG, ASTRESS.LOG, ARECALL.LOG,
+echo ASSISTC.LOG, and HWNOTES.TXT back to the host for verification.
diff --git a/hardware/README.md b/hardware/README.md
@@ -47,12 +47,14 @@ QUAL.LOG
 PERF.LOG
 ASSIST.LOG
 ASTRESS.LOG
+ARECALL.LOG
 ASSISTC.LOG
 ```
 
 `ASSIST.LOG` is the five-pack scripted assistant proof. `ASTRESS.LOG` is the
 50-reply stress probe for CHAT, DOSHELP, OFFICE, DEV, and PORTABLE on the same
-machine.
+machine. `ARECALL.LOG` is the retrieval-only recall benchmark for the same pack
+set.
 
 Fill in `HWNOTES.TXT` with CPU, clock, RAM, DOS version, storage, cache/turbo
 state, FreeBASIC version, and any setup notes.

diff --git a/qemu/evidence/assistant_capability_functionality_report.md b/qemu/evidence/assistant_capability_functionality_report.md
@@ -32,6 +32,10 @@ This report is generated from repository evidence files by `scripts/build_assist
 - Term-index recall evaluation: `PASS 42/42`.
 - Term-index candidate row scan ratio: `0.145`.
 - Term-index candidate byte ratio: `0.315`.
+- QEMU recall benchmark: `PASS 42 cases`.
+- QEMU recall average retrieval time: `61 ms`.
+- QEMU recall max retrieval time: `110 ms`.
+- QEMU recall modes: `kb2_term=42`.
 
 ## Language Coverage
 
@@ -43,6 +47,7 @@ This report is generated from repository evidence files by `scripts/build_assist
 - KDB text index gate: `PASS 42/42`.
 - KDB binary gate: `PASS 42/42`.
 - KDB term-index gate: `PASS 42/42`.
+- DOS recall benchmark gate: `PASS 42 cases`.
 
 Covered categories include general chat, identity, local inference, offline limits, prompt repair, repeated-answer recovery, troubleshooting, DOS setup, office writing, developer pack authoring, and portable-intelligence concepts.
 
@@ -67,6 +72,8 @@ Usefulness workflows currently cover operator prompts, trust/offline limits, DOS
 - Hardware-capture stress source mix: `golden=26 retrieval=16 model=0 fallback=0 memory=8`.
 - Hardware-capture average total reply time: `28 ms`.
 - Hardware-capture average retrieval time: `24 ms`.
+- Hardware-capture recall benchmark: `PASS 42 cases`.
+- Hardware-capture recall average retrieval time: `82 ms`.
 - Physical machine capture status: PENDING: no staged physical `hardware_<machine>_manifest.md` capture is present yet.
 
 ## Authoring And Import

diff --git a/qemu/evidence/assistant_recall_486.log b/qemu/evidence/assistant_recall_486.log
@@ -0,0 +1,181 @@
++------------------------------------------------------------+
+
+| GPT2-BASIC Assistant Shell                                 |
+
+| Pack-driven text UI; VGA sprite/icon slots are pack assets. |
+
++------------------------------------------------------------+
+
+
+
+ASSIST_BEGIN|suite=recall-probe|version=1
+
+Available packs:
+
+  CHAT - Conversation Pack
+
+  DOSHELP - DOS Help Assistant
+
+  OFFICE - Office Assistant
+
+  DEV - Developer Pack
+
+  PORTABLE - Portable Intelligence
+
+
+
+Pack : CHAT - Conversation Pack
+
+Model: PACKS\CHAT\MODEL
+
+Usage: /about
+
+Sprite asset: PACKS\CHAT\CHAT.SPR
+
+Icon asset  : PACKS\CHAT\CHAT.ICN
+
+ASSIST_PACK|id=CHAT|title=Conversation Pack|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN
+
+
+
+ASSIST_RECALL|pack=CHAT|query=how can i ask better questions|recall=kb2_term|recall_score=33|t_retrieve_ms=110|answer=Better prompts: Say the goal, give one detail, and ask for the next useful step.
+
+ASSIST_RECALL|pack=CHAT|query=what makes this intelligent on a small computer|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Small-computer usefulness: A tiny local model becomes more useful with retrieval, memory, and quick focused help without a network.
+
+ASSIST_RECALL|pack=CHAT|query=which pack should i use for writing|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Pack switching: Use CHAT for conversation, DOSHELP for DOS setup, and OFFICE for writing tasks.
+
+ASSIST_RECALL|pack=CHAT|query=can this work without the internet|recall=kb2_term|recall_score=33|t_retrieve_ms=60|answer=Network limit: I cannot browse the internet from DOS; I answer from local model weights and pack files.
+
+ASSIST_RECALL|pack=CHAT|query=how do i recover from a bad answer|recall=kb2_term|recall_score=33|t_retrieve_ms=110|answer=Mistake recovery: If an answer is wrong, ask a shorter question, switch packs, or give the exact error.
+
+ASSIST_RECALL|pack=CHAT|query=what proof helps me trust this|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Trust evidence: Trust proof comes from visible files, local weights, reproducible tests, and QEMU or hardware logs.
+
+ASSIST_RECALL|pack=CHAT|query=how should i compare options|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Compare options: Name the options, list one tradeoff for each, then choose the practical next step.
+
+ASSIST_RECALL|pack=CHAT|query=help me plan work in small steps|recall=kb2_term|recall_score=66|t_retrieve_ms=110|answer=Planning work: Break the job into small steps, do the blocking step first, and verify each result.
+
+ASSIST_RECALL|pack=CHAT|query=what should a useful answer look like|recall=kb2_term|recall_score=42|t_retrieve_ms=110|answer=Useful answer: A useful answer should be brief, concrete, honest about limits, and easy to act on.
+
+ASSIST_RECALL|pack=CHAT|query=can you explain something simply|recall=kb2_term|recall_score=24|t_retrieve_ms=50|answer=Simple explanation: Use plain words, one example, and a short answer that fits the prompt.
+
+ASSIST_RECALL|pack=CHAT|query=what can you know without web access|recall=kb2_term|recall_score=51|t_retrieve_ms=60|answer=No web access: Without internet, I cannot fetch news or live facts; use local notes or give the facts in the prompt.
+
+ASSIST_RECALL|pack=CHAT|query=how do i show confidence in an answer|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Answer confidence: Say what is known from local files, what is inferred, and what remains uncertain.
+
+Pack : DOSHELP - DOS Help Assistant
+
+Model: PACKS\DOSHELP\MODEL
+
+Usage: /about
+
+Sprite asset: PACKS\DOSHELP\DOSHELP.SPR
+
+Icon asset  : PACKS\DOSHELP\DOSHELP.ICN
+
+ASSIST_PACK|id=DOSHELP|title=DOS Help Assistant|model=PACKS\DOSHELP\MODEL|sprite=PACKS\DOSHELP\DOSHELP.SPR|icons=PACKS\DOSHELP\DOSHELP.ICN
+
+
+
+ASSIST_RECALL|pack=DOSHELP|query=what happens before autoexec bat runs|recall=kb2_term|recall_score=57|t_retrieve_ms=110|answer=AUTOEXEC.BAT hygiene: CONFIG.SYS loads drivers first, then AUTOEXEC.BAT runs commands; keep PATH short and trim resident tools.
+
+ASSIST_RECALL|pack=DOSHELP|query=why use 8.3 filenames in batches|recall=kb2_term|recall_score=24|t_retrieve_ms=60|answer=DOS filenames: Use 8.3 filenames for maximum DOS compatibility and predictable batch files.
+
+ASSIST_RECALL|pack=DOSHELP|query=how should i prepare files for real hardware|recall=kb2_term|recall_score=48|t_retrieve_ms=50|answer=Hardware copy: Copy GPT2, MODEL, PACKS, CWSDPMI, and batch files together before testing on real DOS.
+
+ASSIST_RECALL|pack=DOSHELP|query=what should i do when cwsdpmi is missing|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Missing CWSDPMI: If a protected-mode program fails to start, copy CWSDPMI.EXE beside it and rerun the command.
+
+ASSIST_RECALL|pack=DOSHELP|query=how do i mount the dosbox bundle|recall=kb2_term|recall_score=54|t_retrieve_ms=50|answer=DOSBox mount: Mount the bundle directory as C:, change to C:\GPT2, then run the batch file for the desired profile.
+
+ASSIST_RECALL|pack=DOSHELP|query=what if the fat image is full|recall=kb2_term|recall_score=60|t_retrieve_ms=60|answer=FAT image full: Remove host-only training files or grow the disk image when FAT image assembly runs out of space.
+
+ASSIST_RECALL|pack=DOSHELP|query=what logs matter from qemu|recall=kb2_term|recall_score=39|t_retrieve_ms=50|answer=QEMU logs: Capture compile logs, run logs, and copied evidence files before trusting an emulator result.
+
+ASSIST_RECALL|pack=DOSHELP|query=how do i handle a dos memory error|recall=kb2_term|recall_score=57|t_retrieve_ms=60|answer=DOS memory error: Free conventional memory by unloading TSRs, loading drivers high, or using a smaller profile.
+
+ASSIST_RECALL|pack=DOSHELP|query=how should a batch menu work|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Batch menu: Offer numbered choices, validate the input, and keep each branch short and reversible.
+
+Pack : OFFICE - Office Assistant
+
+Model: PACKS\OFFICE\MODEL
+
+Usage: /about
+
+Sprite asset: PACKS\OFFICE\OFFICE.SPR
+
+Icon asset  : PACKS\OFFICE\OFFICE.ICN
+
+ASSIST_PACK|id=OFFICE|title=Office Assistant|model=PACKS\OFFICE\MODEL|sprite=PACKS\OFFICE\OFFICE.SPR|icons=PACKS\OFFICE\OFFICE.ICN
+
+
+
+ASSIST_RECALL|pack=OFFICE|query=how should i write a handoff note|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Handoff note: Say what is done, what remains, where evidence lives, and who owns the next action.
+
+ASSIST_RECALL|pack=OFFICE|query=what belongs in a bug report|recall=kb2_term|recall_score=36|t_retrieve_ms=0|answer=Bug report shape: Include expected behavior, actual behavior, reproduction steps, logs, and the suspected area.
+
+ASSIST_RECALL|pack=OFFICE|query=make a compact release note|recall=kb2_term|recall_score=36|t_retrieve_ms=110|answer=Release note shape: Lead with what changed, list proof, then state any known limits plainly.
+
+ASSIST_RECALL|pack=OFFICE|query=what should meeting notes capture|recall=kb2_term|recall_score=51|t_retrieve_ms=0|answer=Meeting notes: Capture decisions, owners, dates, open questions, and follow-up actions.
+
+ASSIST_RECALL|pack=OFFICE|query=help me write a project plan|recall=kb2_term|recall_score=36|t_retrieve_ms=110|answer=Project plan: List the goal, milestones, owners, risks, and the next checkpoint.
+
+ASSIST_RECALL|pack=OFFICE|query=how do i track risks|recall=kb2_term|recall_score=24|t_retrieve_ms=60|answer=Risk register: For each risk, record impact, likelihood, mitigation, owner, and review date.
+
+ASSIST_RECALL|pack=OFFICE|query=what is a useful test plan|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Test plan: Define scope, cases, expected results, evidence files, and pass or fail criteria.
+
+ASSIST_RECALL|pack=OFFICE|query=how should i reply to a customer|recall=kb2_term|recall_score=36|t_retrieve_ms=60|answer=Customer reply: Acknowledge the issue, give the current status, state the next action, and avoid overpromising.
+
+ASSIST_RECALL|pack=OFFICE|query=how do i write user docs|recall=kb2_term|recall_score=51|t_retrieve_ms=50|answer=User docs: Write the task goal, prerequisites, exact steps, expected result, and troubleshooting note.
+
+Pack : DEV - Developer Pack
+
+Model: PACKS\CHAT\MODEL
+
+Usage: /about
+
+Sprite asset: PACKS\CHAT\CHAT.SPR
+
+Icon asset  : PACKS\CHAT\CHAT.ICN
+
+ASSIST_PACK|id=DEV|title=Developer Pack|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN
+
+
+
+ASSIST_RECALL|pack=DEV|query=how can this feel modern on a 486|recall=kb2_term|recall_score=36|t_retrieve_ms=60|answer=Modern 486 LLM path: Use small hot-loaded weights, compact retrieval databases, persistent memory, and short synthesis replies.
+
+ASSIST_RECALL|pack=DEV|query=what does retrieval first mean|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Retrieval first: Answer from KDB, USER notes, memory, and golden rows before asking the small model to synthesize.
+
+ASSIST_RECALL|pack=DEV|query=how do i author a pack|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Pack authoring: Write HELP and KNOW rows, rebuild KDB, run the authoring validator, then run retrieval and QEMU gates.
+
+ASSIST_RECALL|pack=DEV|query=what should i check before release|recall=kb2_term|recall_score=42|t_retrieve_ms=110|answer=Release check: Verify tests, logs, artifact names, checksums, release notes, and the target tag.
+
+ASSIST_RECALL|pack=DEV|query=how should we store fast recall data|recall=kb2_term|recall_score=45|t_retrieve_ms=50|answer=High velocity recall: Compile notes into compact keyword rows so DOS scans less text and reaches the answer faster.
+
+ASSIST_RECALL|pack=DEV|query=what should a failure record include|recall=kb2_term|recall_score=39|t_retrieve_ms=50|answer=Failure record: Record the command, input, expected result, actual result, log path, and next experiment.
+
+Pack : PORTABLE - Portable Intelligence
+
+Model: PACKS\CHAT\MODEL
+
+Usage: /about
+
+Sprite asset: PACKS\CHAT\CHAT.SPR
+
+Icon asset  : PACKS\CHAT\CHAT.ICN
+
+ASSIST_PACK|id=PORTABLE|title=Portable Intelligence|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN
+
+
+
+ASSIST_RECALL|pack=PORTABLE|query=what does portable intelligence mean|recall=kb2_term|recall_score=57|t_retrieve_ms=60|answer=portable meaning: Portable intelligence means small local model weights, retrieval, and memory can run on old machines without a network.
+
+ASSIST_RECALL|pack=PORTABLE|query=why is basic useful for teaching ai|recall=kb2_term|recall_score=57|t_retrieve_ms=50|answer=basic teaching: BASIC is useful for teaching machine intelligence because plain arrays, files, and integer arithmetic make the mechanism inspectable.
+
+ASSIST_RECALL|pack=PORTABLE|query=how could this move to c or assembly|recall=kb2_term|recall_score=15|t_retrieve_ms=60|answer=runtime ports: The same assistant contract can be reimplemented in C, assembly, Eshkol, or calculator BASIC when files, arrays, and loops exist.
+
+ASSIST_RECALL|pack=PORTABLE|query=why do hot swappable weights matter|recall=kb2_term|recall_score=45|t_retrieve_ms=50|answer=domain weight loading: Hot swappable weights load domain behavior into a tiny resident shell without rebuilding the whole runtime.
+
+ASSIST_RECALL|pack=PORTABLE|query=how should tiny machines store recall|recall=kb2_term|recall_score=72|t_retrieve_ms=60|answer=tiny machine recall: Tiny machines should store recall as compact indexed rows so slow processors scan fewer bytes before answering.
+
+ASSIST_RECALL|pack=PORTABLE|query=what proof shows this works on old hardware|recall=kb2_term|recall_score=63|t_retrieve_ms=0|answer=old hardware proof: Proof for old hardware needs local logs, repeatable tests, QEMU or hardware captures, and visible source files.
+
+ASSIST_END|suite=recall-probe|packs=5