Merged
9 changes: 8 additions & 1 deletion README.md
@@ -384,6 +384,13 @@ Regenerate the evidence-backed capability/functionality report with:
python3 scripts/build_assistant_capability_report.py
```

Regenerate the retrieval-only recall benchmark after `ASSIST.EXE --recall-probe`
has produced `qemu/evidence/assistant_recall_486.log`:

```sh
python3 scripts/benchmark_assistant_recall.py
```

Run the non-greedy sampling matrix with:

```sh
@@ -466,7 +473,7 @@ physical 486-class DOS machine. Pentium timing is useful scaling evidence, but
it is not a blocker for the solid 486-focused release. The hardware ladder is
tracked in [`docs/hardware-validation.md`](docs/hardware-validation.md), with a
DOS capture batch under `hardware/HWVALID.BAT` that writes `QUAL.LOG`,
-`PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, and `ASSISTC.LOG`, strict host
+`PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ARECALL.LOG`, and `ASSISTC.LOG`, strict host
verification through `scripts/verify_hardware_capture.py --require-filled-notes`,
and release evidence staging through
`scripts/stage_hardware_capture_evidence.py`. The physical assistant gate now
4 changes: 3 additions & 1 deletion docs/assistant-intelligence-roadmap.md
@@ -57,12 +57,14 @@ falling back to the full KDB.
- Lightweight domain pack without retraining: `PORTABLE` ships portable
intelligence notes generated from `data/assistant_pack_notes/portable` and
shares the CHAT model.
- Retrieval-only recall probe: `ASSIST.EXE --recall-probe` measures the KB2/KDB
recall path across every shipped pack without model generation.

## Next Milestones

- Add more domain packs for hardware repair, programming, and offline reference
manuals using the same generated KDB/KB2 contract.
-- Measure binary KDB scan time in QEMU and on real hardware, then decide
+- Compare recall-probe timing on QEMU and physical hardware, then decide
whether the next storage step should be topic shards or offset tables.
- Add persistent memory slots beyond name, goal, style, and problem.
- Add a measured recall benchmark in QEMU and on physical 486 hardware.
6 changes: 5 additions & 1 deletion docs/hardware-validation.md
@@ -9,7 +9,7 @@ the core release gate.
| Tier | Status | Hardware | Release Role | Required Logs |
|---|---|---|---|---|
| 0 | Complete | QEMU 486 profiles | Preview release gate | compile, quality, perf, assistant, vectors |
-| 1 | Next gate | Any working 486-class DOS PC with 32-64 MB RAM | Solid release baseline | `QUAL.LOG`, `PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ASSISTC.LOG` |
+| 1 | Next gate | Any working 486-class DOS PC with 32-64 MB RAM | Solid release baseline | `QUAL.LOG`, `PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ARECALL.LOG`, `ASSISTC.LOG` |
| 2 | Useful | Faster 486DX2/DX4 or comparable late 486 board | Performance confidence | repeated `PERF.LOG`, optional kernel perf |
| 3 | Optional | Pentium 60/90/133+ | Scaling comparison only | `PERF.LOG`, optional quality confirmation |
| 4 | Optional | 386 or 486SX no-FPU class system | Compatibility stress test | quality and perf if memory allows |
@@ -57,6 +57,7 @@ GPT2.EXE --quality-all > QUAL.LOG
GPT2.EXE --perf > PERF.LOG
ASSIST.EXE --scripted > ASSIST.LOG
ASSIST.EXE --stress-probe > ASTRESS.LOG
ASSIST.EXE --recall-probe > ARECALL.LOG
```

Also keep the assistant compile log when building on the target:
@@ -112,6 +113,8 @@ verifies the paired `hardware_<machine>_manifest.md` checksum table.
- `ASTRESS.LOG` includes `ASSIST_END|suite=stress-probe|packs=5`, exactly 50
`ASSIST_REPLY|` rows, no `status=model_unavailable` rows, and records for
CHAT, DOSHELP, OFFICE, DEV, and PORTABLE.
- `ARECALL.LOG` includes `ASSIST_END|suite=recall-probe|packs=5`, exactly 42
`ASSIST_RECALL|` rows, and validated KB2/KDB recall answers for every pack.
- `ASSISTC.LOG` includes `ASSIST_COMPILE_OK` when target-side compilation is
attempted.
- The hardware notes identify machine key, CPU, clock, RAM, DOS version,
@@ -127,6 +130,7 @@ qemu/evidence/hardware_<machine>_quality.log
qemu/evidence/hardware_<machine>_perf.log
qemu/evidence/hardware_<machine>_assistant.log
qemu/evidence/hardware_<machine>_assistant_stress.log
qemu/evidence/hardware_<machine>_assistant_recall.log
qemu/evidence/hardware_<machine>_assistant_compile.log
qemu/evidence/hardware_<machine>_notes.md
qemu/evidence/hardware_<machine>_manifest.md
2 changes: 2 additions & 0 deletions docs/releases/v0.1.0-preview.md
@@ -174,10 +174,12 @@ python3 scripts/evaluate_assistant_pack_retrieval.py
python3 scripts/evaluate_assistant_kdb_index.py
python3 scripts/evaluate_assistant_kdb_binary.py
python3 scripts/evaluate_assistant_kdb_term_index.py
python3 scripts/benchmark_assistant_recall.py
python3 scripts/import_assistant_notes.py --self-test
python3 scripts/evaluate_assistant_consistency.py
python3 scripts/build_assistant_capability_report.py
QEMU_TIMEOUT_SECONDS=240 bash qemu/run_assistant_stress_486.sh
QEMU_TIMEOUT_SECONDS=240 bash qemu/run_assistant_recall_486.sh
python3 scripts/stress_assistant_behavior.py --log qemu/evidence/assistant_stress_486.log
python3 scripts/verify_workspace_tracking.py
python3 scripts/build_preview_release.py --force
8 changes: 6 additions & 2 deletions hardware/HWVALID.BAT
@@ -6,6 +6,7 @@ if exist QUAL.LOG del QUAL.LOG
if exist PERF.LOG del PERF.LOG
if exist ASSIST.LOG del ASSIST.LOG
if exist ASTRESS.LOG del ASTRESS.LOG
if exist ARECALL.LOG del ARECALL.LOG
if exist ASSISTC.LOG del ASSISTC.LOG

echo HW_CAPTURE_BEGIN>HWVALID.LOG
@@ -50,6 +51,9 @@ ASSIST.EXE --scripted > ASSIST.LOG
echo Running ASSIST.EXE --stress-probe...
echo HW_STEP^|assistant_stress>>HWVALID.LOG
ASSIST.EXE --stress-probe > ASTRESS.LOG
echo Running ASSIST.EXE --recall-probe...
echo HW_STEP^|assistant_recall>>HWVALID.LOG
ASSIST.EXE --recall-probe > ARECALL.LOG
goto done

:missing_exe
@@ -66,5 +70,5 @@ goto done
echo HW_CAPTURE_END>>HWVALID.LOG
echo.
echo Hardware validation capture complete.
-echo Copy HWVALID.LOG, QUAL.LOG, PERF.LOG, ASSIST.LOG, ASTRESS.LOG, ASSISTC.LOG,
-echo and HWNOTES.TXT back to the host for verification.
+echo Copy HWVALID.LOG, QUAL.LOG, PERF.LOG, ASSIST.LOG, ASTRESS.LOG, ARECALL.LOG,
+echo ASSISTC.LOG, and HWNOTES.TXT back to the host for verification.
4 changes: 3 additions & 1 deletion hardware/README.md
@@ -47,12 +47,14 @@ QUAL.LOG
PERF.LOG
ASSIST.LOG
ASTRESS.LOG
ARECALL.LOG
ASSISTC.LOG
```

`ASSIST.LOG` is the five-pack scripted assistant proof. `ASTRESS.LOG` is the
50-reply stress probe for CHAT, DOSHELP, OFFICE, DEV, and PORTABLE on the same
-machine.
+machine. `ARECALL.LOG` is the retrieval-only recall benchmark for the same pack
+set.

Fill in `HWNOTES.TXT` with CPU, clock, RAM, DOS version, storage, cache/turbo
state, FreeBASIC version, and any setup notes.
7 changes: 7 additions & 0 deletions qemu/evidence/assistant_capability_functionality_report.md
@@ -32,6 +32,10 @@ This report is generated from repository evidence files by `scripts/build_assist
- Term-index recall evaluation: `PASS 42/42`.
- Term-index candidate row scan ratio: `0.145`.
- Term-index candidate byte ratio: `0.315`.
- QEMU recall benchmark: `PASS 42 cases`.
- QEMU recall average retrieval time: `61 ms`.
- QEMU recall max retrieval time: `110 ms`.
- QEMU recall modes: `kb2_term=42`.
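
The four recall figures above can be reproduced from the `ASSIST_RECALL|` rows in `qemu/evidence/assistant_recall_486.log`. The following is a minimal sketch of that arithmetic, not the logic of `scripts/benchmark_assistant_recall.py`, which this diff does not show. The helper names `parse_recall_row` and `summarize` are hypothetical, and the sketch assumes that field values never contain a `|` character.

```python
from collections import Counter

# Three rows taken from assistant_recall_486.log, with answers trimmed
# to their titles to keep the sample short.
SAMPLE = [
    "ASSIST_RECALL|pack=CHAT|query=how can i ask better questions"
    "|recall=kb2_term|recall_score=33|t_retrieve_ms=110|answer=Better prompts",
    "ASSIST_RECALL|pack=CHAT|query=what makes this intelligent on a small computer"
    "|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Small-computer usefulness",
    "ASSIST_RECALL|pack=CHAT|query=which pack should i use for writing"
    "|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Pack switching",
]


def parse_recall_row(line):
    """Return the key=value fields of one ASSIST_RECALL row, or None."""
    line = line.strip()
    if not line.startswith("ASSIST_RECALL|"):
        return None
    fields = {}
    for part in line.split("|")[1:]:
        # Split on the first '=' only, so answers may contain '='.
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize(lines):
    """Count rows and aggregate retrieval time and recall mode."""
    rows = [row for row in map(parse_recall_row, lines) if row is not None]
    times = [int(row["t_retrieve_ms"]) for row in rows]
    return {
        "cases": len(rows),
        "avg_ms": sum(times) / len(times) if times else 0.0,
        "max_ms": max(times, default=0),
        "modes": dict(Counter(row["recall"] for row in rows)),
    }


print(summarize(SAMPLE))
```

Summing `t_retrieve_ms` over the 42 rows in the committed log gives 2570 ms, or about 61 ms per case, with a maximum of 110 ms and every row in `kb2_term` mode, which is consistent with the report lines above. How the report script rounds the average is an assumption here.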

## Language Coverage

@@ -43,6 +47,7 @@ This report is generated from repository evidence files by `scripts/build_assist
- KDB text index gate: `PASS 42/42`.
- KDB binary gate: `PASS 42/42`.
- KDB term-index gate: `PASS 42/42`.
- DOS recall benchmark gate: `PASS 42 cases`.

Covered categories include general chat, identity, local inference, offline limits, prompt repair, repeated-answer recovery, troubleshooting, DOS setup, office writing, developer pack authoring, and portable-intelligence concepts.

@@ -67,6 +72,8 @@ Usefulness workflows currently cover operator prompts, trust/offline limits, DOS
- Hardware-capture stress source mix: `golden=26 retrieval=16 model=0 fallback=0 memory=8`.
- Hardware-capture average total reply time: `28 ms`.
- Hardware-capture average retrieval time: `24 ms`.
- Hardware-capture recall benchmark: `PASS 42 cases`.
- Hardware-capture recall average retrieval time: `82 ms`.
- Physical machine capture status: PENDING: no staged physical `hardware_<machine>_manifest.md` capture is present yet.

## Authoring And Import
181 changes: 181 additions & 0 deletions qemu/evidence/assistant_recall_486.log
@@ -0,0 +1,181 @@
+------------------------------------------------------------+

| GPT2-BASIC Assistant Shell |

| Pack-driven text UI; VGA sprite/icon slots are pack assets. |

+------------------------------------------------------------+



ASSIST_BEGIN|suite=recall-probe|version=1

Available packs:

CHAT - Conversation Pack

DOSHELP - DOS Help Assistant

OFFICE - Office Assistant

DEV - Developer Pack

PORTABLE - Portable Intelligence



Pack : CHAT - Conversation Pack

Model: PACKS\CHAT\MODEL

Usage: /about

Sprite asset: PACKS\CHAT\CHAT.SPR

Icon asset : PACKS\CHAT\CHAT.ICN

ASSIST_PACK|id=CHAT|title=Conversation Pack|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN



ASSIST_RECALL|pack=CHAT|query=how can i ask better questions|recall=kb2_term|recall_score=33|t_retrieve_ms=110|answer=Better prompts: Say the goal, give one detail, and ask for the next useful step.

ASSIST_RECALL|pack=CHAT|query=what makes this intelligent on a small computer|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Small-computer usefulness: A tiny local model becomes more useful with retrieval, memory, and quick focused help without a network.

ASSIST_RECALL|pack=CHAT|query=which pack should i use for writing|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Pack switching: Use CHAT for conversation, DOSHELP for DOS setup, and OFFICE for writing tasks.

ASSIST_RECALL|pack=CHAT|query=can this work without the internet|recall=kb2_term|recall_score=33|t_retrieve_ms=60|answer=Network limit: I cannot browse the internet from DOS; I answer from local model weights and pack files.

ASSIST_RECALL|pack=CHAT|query=how do i recover from a bad answer|recall=kb2_term|recall_score=33|t_retrieve_ms=110|answer=Mistake recovery: If an answer is wrong, ask a shorter question, switch packs, or give the exact error.

ASSIST_RECALL|pack=CHAT|query=what proof helps me trust this|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Trust evidence: Trust proof comes from visible files, local weights, reproducible tests, and QEMU or hardware logs.

ASSIST_RECALL|pack=CHAT|query=how should i compare options|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Compare options: Name the options, list one tradeoff for each, then choose the practical next step.

ASSIST_RECALL|pack=CHAT|query=help me plan work in small steps|recall=kb2_term|recall_score=66|t_retrieve_ms=110|answer=Planning work: Break the job into small steps, do the blocking step first, and verify each result.

ASSIST_RECALL|pack=CHAT|query=what should a useful answer look like|recall=kb2_term|recall_score=42|t_retrieve_ms=110|answer=Useful answer: A useful answer should be brief, concrete, honest about limits, and easy to act on.

ASSIST_RECALL|pack=CHAT|query=can you explain something simply|recall=kb2_term|recall_score=24|t_retrieve_ms=50|answer=Simple explanation: Use plain words, one example, and a short answer that fits the prompt.

ASSIST_RECALL|pack=CHAT|query=what can you know without web access|recall=kb2_term|recall_score=51|t_retrieve_ms=60|answer=No web access: Without internet, I cannot fetch news or live facts; use local notes or give the facts in the prompt.

ASSIST_RECALL|pack=CHAT|query=how do i show confidence in an answer|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Answer confidence: Say what is known from local files, what is inferred, and what remains uncertain.

Pack : DOSHELP - DOS Help Assistant

Model: PACKS\DOSHELP\MODEL

Usage: /about

Sprite asset: PACKS\DOSHELP\DOSHELP.SPR

Icon asset : PACKS\DOSHELP\DOSHELP.ICN

ASSIST_PACK|id=DOSHELP|title=DOS Help Assistant|model=PACKS\DOSHELP\MODEL|sprite=PACKS\DOSHELP\DOSHELP.SPR|icons=PACKS\DOSHELP\DOSHELP.ICN



ASSIST_RECALL|pack=DOSHELP|query=what happens before autoexec bat runs|recall=kb2_term|recall_score=57|t_retrieve_ms=110|answer=AUTOEXEC.BAT hygiene: CONFIG.SYS loads drivers first, then AUTOEXEC.BAT runs commands; keep PATH short and trim resident tools.

ASSIST_RECALL|pack=DOSHELP|query=why use 8.3 filenames in batches|recall=kb2_term|recall_score=24|t_retrieve_ms=60|answer=DOS filenames: Use 8.3 filenames for maximum DOS compatibility and predictable batch files.

ASSIST_RECALL|pack=DOSHELP|query=how should i prepare files for real hardware|recall=kb2_term|recall_score=48|t_retrieve_ms=50|answer=Hardware copy: Copy GPT2, MODEL, PACKS, CWSDPMI, and batch files together before testing on real DOS.

ASSIST_RECALL|pack=DOSHELP|query=what should i do when cwsdpmi is missing|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Missing CWSDPMI: If a protected-mode program fails to start, copy CWSDPMI.EXE beside it and rerun the command.

ASSIST_RECALL|pack=DOSHELP|query=how do i mount the dosbox bundle|recall=kb2_term|recall_score=54|t_retrieve_ms=50|answer=DOSBox mount: Mount the bundle directory as C:, change to C:\GPT2, then run the batch file for the desired profile.

ASSIST_RECALL|pack=DOSHELP|query=what if the fat image is full|recall=kb2_term|recall_score=60|t_retrieve_ms=60|answer=FAT image full: Remove host-only training files or grow the disk image when FAT image assembly runs out of space.

ASSIST_RECALL|pack=DOSHELP|query=what logs matter from qemu|recall=kb2_term|recall_score=39|t_retrieve_ms=50|answer=QEMU logs: Capture compile logs, run logs, and copied evidence files before trusting an emulator result.

ASSIST_RECALL|pack=DOSHELP|query=how do i handle a dos memory error|recall=kb2_term|recall_score=57|t_retrieve_ms=60|answer=DOS memory error: Free conventional memory by unloading TSRs, loading drivers high, or using a smaller profile.

ASSIST_RECALL|pack=DOSHELP|query=how should a batch menu work|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Batch menu: Offer numbered choices, validate the input, and keep each branch short and reversible.

Pack : OFFICE - Office Assistant

Model: PACKS\OFFICE\MODEL

Usage: /about

Sprite asset: PACKS\OFFICE\OFFICE.SPR

Icon asset : PACKS\OFFICE\OFFICE.ICN

ASSIST_PACK|id=OFFICE|title=Office Assistant|model=PACKS\OFFICE\MODEL|sprite=PACKS\OFFICE\OFFICE.SPR|icons=PACKS\OFFICE\OFFICE.ICN



ASSIST_RECALL|pack=OFFICE|query=how should i write a handoff note|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Handoff note: Say what is done, what remains, where evidence lives, and who owns the next action.

ASSIST_RECALL|pack=OFFICE|query=what belongs in a bug report|recall=kb2_term|recall_score=36|t_retrieve_ms=0|answer=Bug report shape: Include expected behavior, actual behavior, reproduction steps, logs, and the suspected area.

ASSIST_RECALL|pack=OFFICE|query=make a compact release note|recall=kb2_term|recall_score=36|t_retrieve_ms=110|answer=Release note shape: Lead with what changed, list proof, then state any known limits plainly.

ASSIST_RECALL|pack=OFFICE|query=what should meeting notes capture|recall=kb2_term|recall_score=51|t_retrieve_ms=0|answer=Meeting notes: Capture decisions, owners, dates, open questions, and follow-up actions.

ASSIST_RECALL|pack=OFFICE|query=help me write a project plan|recall=kb2_term|recall_score=36|t_retrieve_ms=110|answer=Project plan: List the goal, milestones, owners, risks, and the next checkpoint.

ASSIST_RECALL|pack=OFFICE|query=how do i track risks|recall=kb2_term|recall_score=24|t_retrieve_ms=60|answer=Risk register: For each risk, record impact, likelihood, mitigation, owner, and review date.

ASSIST_RECALL|pack=OFFICE|query=what is a useful test plan|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Test plan: Define scope, cases, expected results, evidence files, and pass or fail criteria.

ASSIST_RECALL|pack=OFFICE|query=how should i reply to a customer|recall=kb2_term|recall_score=36|t_retrieve_ms=60|answer=Customer reply: Acknowledge the issue, give the current status, state the next action, and avoid overpromising.

ASSIST_RECALL|pack=OFFICE|query=how do i write user docs|recall=kb2_term|recall_score=51|t_retrieve_ms=50|answer=User docs: Write the task goal, prerequisites, exact steps, expected result, and troubleshooting note.

Pack : DEV - Developer Pack

Model: PACKS\CHAT\MODEL

Usage: /about

Sprite asset: PACKS\CHAT\CHAT.SPR

Icon asset : PACKS\CHAT\CHAT.ICN

ASSIST_PACK|id=DEV|title=Developer Pack|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN



ASSIST_RECALL|pack=DEV|query=how can this feel modern on a 486|recall=kb2_term|recall_score=36|t_retrieve_ms=60|answer=Modern 486 LLM path: Use small hot-loaded weights, compact retrieval databases, persistent memory, and short synthesis replies.

ASSIST_RECALL|pack=DEV|query=what does retrieval first mean|recall=kb2_term|recall_score=36|t_retrieve_ms=50|answer=Retrieval first: Answer from KDB, USER notes, memory, and golden rows before asking the small model to synthesize.

ASSIST_RECALL|pack=DEV|query=how do i author a pack|recall=kb2_term|recall_score=39|t_retrieve_ms=60|answer=Pack authoring: Write HELP and KNOW rows, rebuild KDB, run the authoring validator, then run retrieval and QEMU gates.

ASSIST_RECALL|pack=DEV|query=what should i check before release|recall=kb2_term|recall_score=42|t_retrieve_ms=110|answer=Release check: Verify tests, logs, artifact names, checksums, release notes, and the target tag.

ASSIST_RECALL|pack=DEV|query=how should we store fast recall data|recall=kb2_term|recall_score=45|t_retrieve_ms=50|answer=High velocity recall: Compile notes into compact keyword rows so DOS scans less text and reaches the answer faster.

ASSIST_RECALL|pack=DEV|query=what should a failure record include|recall=kb2_term|recall_score=39|t_retrieve_ms=50|answer=Failure record: Record the command, input, expected result, actual result, log path, and next experiment.

Pack : PORTABLE - Portable Intelligence

Model: PACKS\CHAT\MODEL

Usage: /about

Sprite asset: PACKS\CHAT\CHAT.SPR

Icon asset : PACKS\CHAT\CHAT.ICN

ASSIST_PACK|id=PORTABLE|title=Portable Intelligence|model=PACKS\CHAT\MODEL|sprite=PACKS\CHAT\CHAT.SPR|icons=PACKS\CHAT\CHAT.ICN



ASSIST_RECALL|pack=PORTABLE|query=what does portable intelligence mean|recall=kb2_term|recall_score=57|t_retrieve_ms=60|answer=portable meaning: Portable intelligence means small local model weights, retrieval, and memory can run on old machines without a network.

ASSIST_RECALL|pack=PORTABLE|query=why is basic useful for teaching ai|recall=kb2_term|recall_score=57|t_retrieve_ms=50|answer=basic teaching: BASIC is useful for teaching machine intelligence because plain arrays, files, and integer arithmetic make the mechanism inspectable.

ASSIST_RECALL|pack=PORTABLE|query=how could this move to c or assembly|recall=kb2_term|recall_score=15|t_retrieve_ms=60|answer=runtime ports: The same assistant contract can be reimplemented in C, assembly, Eshkol, or calculator BASIC when files, arrays, and loops exist.

ASSIST_RECALL|pack=PORTABLE|query=why do hot swappable weights matter|recall=kb2_term|recall_score=45|t_retrieve_ms=50|answer=domain weight loading: Hot swappable weights load domain behavior into a tiny resident shell without rebuilding the whole runtime.

ASSIST_RECALL|pack=PORTABLE|query=how should tiny machines store recall|recall=kb2_term|recall_score=72|t_retrieve_ms=60|answer=tiny machine recall: Tiny machines should store recall as compact indexed rows so slow processors scan fewer bytes before answering.

ASSIST_RECALL|pack=PORTABLE|query=what proof shows this works on old hardware|recall=kb2_term|recall_score=63|t_retrieve_ms=0|answer=old hardware proof: Proof for old hardware needs local logs, repeatable tests, QEMU or hardware captures, and visible source files.

ASSIST_END|suite=recall-probe|packs=5
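
The `ARECALL.LOG` criteria added to `docs/hardware-validation.md` in this change (the `ASSIST_END|suite=recall-probe|packs=5` marker, exactly 42 `ASSIST_RECALL|` rows, and rows for every pack) can be checked structurally against a log like the one above. This is a rough sketch under stated assumptions: `check_recall_log` is a hypothetical helper, it is not the code in `scripts/verify_hardware_capture.py`, and it does not validate the recall answers themselves.

```python
def check_recall_log(
    lines,
    packs=("CHAT", "DOSHELP", "OFFICE", "DEV", "PORTABLE"),
    expected_rows=42,
):
    """Return a list of problems; an empty list means these checks passed."""
    stripped = [line.strip() for line in lines]
    problems = []

    # The end marker records how many packs the probe visited.
    end_marker = f"ASSIST_END|suite=recall-probe|packs={len(packs)}"
    if end_marker not in stripped:
        problems.append(f"missing end marker: {end_marker}")

    rows = [line for line in stripped if line.startswith("ASSIST_RECALL|")]
    if len(rows) != expected_rows:
        problems.append(
            f"expected {expected_rows} ASSIST_RECALL rows, found {len(rows)}"
        )

    # Collect the pack id from the first pack= field of each row.
    seen = set()
    for row in rows:
        for field in row.split("|"):
            if field.startswith("pack="):
                seen.add(field[len("pack="):])
                break
    for pack in packs:
        if pack not in seen:
            problems.append(f"no ASSIST_RECALL rows for pack {pack}")

    return problems


# A tiny made-up log with two packs, used only to exercise the checks.
DEMO = [
    "ASSIST_BEGIN|suite=recall-probe|version=1",
    "ASSIST_RECALL|pack=CHAT|query=q1|recall=kb2_term|recall_score=33"
    "|t_retrieve_ms=110|answer=a1",
    "",
    "ASSIST_RECALL|pack=DEV|query=q2|recall=kb2_term|recall_score=36"
    "|t_retrieve_ms=50|answer=a2",
    "ASSIST_END|suite=recall-probe|packs=2",
]

print(check_recall_log(DEMO, packs=("CHAT", "DEV"), expected_rows=2))
```

With the defaults, the function expects the five shipped packs and 42 rows, matching the counts in the log above (12 CHAT, 9 DOSHELP, 9 OFFICE, 6 DEV, 6 PORTABLE). Blank lines between rows are ignored because only lines with the `ASSIST_RECALL|` prefix are counted.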