5 changes: 5 additions & 0 deletions README.md
@@ -378,6 +378,11 @@ KEY=VALUE`, and `/forget` in the shell. `SPRITE=` and `ICONS=` fields are
reserved for Clippy-style
artwork; the current renderer is a text-mode bubble/action UI so it works
without VGA.
Regenerate the evidence-backed capability/functionality report with:

```sh
python3 scripts/build_assistant_capability_report.py
```

Run the non-greedy sampling matrix with:

1 change: 1 addition & 0 deletions docs/releases/v0.1.0-preview.md
@@ -176,6 +176,7 @@ python3 scripts/evaluate_assistant_kdb_binary.py
python3 scripts/evaluate_assistant_kdb_term_index.py
python3 scripts/import_assistant_notes.py --self-test
python3 scripts/evaluate_assistant_consistency.py
python3 scripts/build_assistant_capability_report.py
QEMU_TIMEOUT_SECONDS=240 bash qemu/run_assistant_stress_486.sh
python3 scripts/stress_assistant_behavior.py --log qemu/evidence/assistant_stress_486.log
python3 scripts/verify_workspace_tracking.py
52 changes: 25 additions & 27 deletions qemu/evidence/assistant_capability_functionality_report.md
@@ -3,22 +3,23 @@
Date: 2026-05-21
Status: `PASS`

This report is generated from repository evidence files by `scripts/build_assistant_capability_report.py`.

## Runtime Capability

- Runs under FreeDOS/QEMU 486 with five assistant packs: `CHAT`, `DOSHELP`, `OFFICE`, `DEV`, and `PORTABLE`.
- Runs under FreeDOS/QEMU 486 with 5 assistant packs: `CHAT`, `DOSHELP`, `OFFICE`, `DEV`, `PORTABLE`.
- Supports hot pack switching through `PACKS.TXT` and each pack's `PACK.INI`.
- Supports pack-local model paths, pack-local art assets, pack-local golden rows, pack-local help/knowledge rows, and editable `USER.TXT` notes.
- Uses retrieval-first answering before model synthesis: golden rows, compiled knowledge recall, session memory, and fallback checks are all explicit in `ASSIST_REPLY`.
- Uses retrieval-first answering before model synthesis: golden rows, compiled knowledge recall, session memory, and fallback checks are explicit in `ASSIST_REPLY`.
- Reports structured provenance and timing for every reply: `source`, `recall`, `recall_score`, `t_retrieve_ms`, `t_golden_ms`, `t_memory_ms`, `t_model_ms`, and `t_total_ms`.
- Interactive shell exposes `/capabilities`, `/limits`, `/sources`, `/status`, `/about`, and `/pack`.
- The answer display now includes a compact source line such as `Source: golden / kb2_term ( 60 ms)`.
- Interactive shell exposes `/capabilities`, `/limits`, `/sources`, `/status`, `/about`, `/pack`, `/memory`, `/remember KEY=VALUE`, and `/forget`.

## Recall And Storage

- Text KDB remains the readable source/fallback format: `KDB.TXT`, `KDBIDX.TXT`, and `KDB?.TXT`.
- New compiled KB2 recall is shipped for each pack: `KB2ALL.BIN`, `KB2IDX.TXT`, `KB2?.BIN`, and `KB2TERM.TXT`.
- Compiled KB2 recall ships for each pack: `KB2ALL.BIN`, `KB2IDX.TXT`, `KB2?.BIN`, and `KB2TERM.TXT`.
- KB2 files use fixed-width records for 486-friendly sequential reads and avoid reparsing large text rows during recall.
- `KB2TERM.TXT` is a compact per-pack inverted term index. The DOS runtime uses it to score likely row IDs first, then falls back to binary buckets and finally text KDB recall.
- `KB2TERM.TXT` is a compact per-pack inverted term index. The DOS runtime scores likely row IDs first, then falls back to binary buckets and finally text KDB recall.
- Current compiled KB2 payload sizes:
- `CHAT`: 78 rows, 23 buckets, 159616 binary bytes, 4280 term-index bytes.
- `DOSHELP`: 26 rows, 21 buckets, 55488 binary bytes, 2193 term-index bytes.
@@ -27,7 +28,7 @@ Status: `PASS`
- `PORTABLE`: 11 rows, 16 buckets, 23968 binary bytes, 1292 term-index bytes.
- Binary recall evaluation: `PASS 42/42`.
- Binary candidate row scan ratio: `0.531`.
- Binary candidate byte ratio: `0.689`.
- Binary candidate byte ratio: `0.688`.
- Term-index recall evaluation: `PASS 42/42`.
- Term-index candidate row scan ratio: `0.145`.
- Term-index candidate byte ratio: `0.315`.
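
The recall order described above (term index first, then binary buckets, then text KDB) can be sketched as a chain of lookups that stops at the first hit. This is a minimal illustration in Python, not the DOS runtime's code: the in-memory index, the scoring rule (count of matching query terms), and the `kdb_text` mode name are assumptions made for the example. Only the `kb2_term`, `kb2_bucket`, and `none` mode names come from the report.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A tier takes a query and returns (row_id, score), or None on a miss.
Tier = Callable[[str], Optional[Tuple[int, int]]]


def make_term_tier(index: Dict[str, List[int]]) -> Tier:
    """Build a lookup over a hypothetical in-memory inverted term index."""

    def lookup(query: str) -> Optional[Tuple[int, int]]:
        scores: Dict[int, int] = {}
        for term in query.lower().split():
            for row_id in index.get(term, ()):
                scores[row_id] = scores.get(row_id, 0) + 1
        if not scores:
            return None
        # Highest score wins; ties go to the lowest row ID.
        best = max(scores, key=lambda r: (scores[r], -r))
        return best, scores[best]

    return lookup


def recall(query: str, tiers: List[Tuple[str, Tier]]) -> dict:
    """Try each tier in order and report which one answered."""
    for name, lookup in tiers:
        hit = lookup(query)
        if hit is not None:
            row_id, score = hit
            return {"recall": name, "row_id": row_id, "recall_score": score}
    return {"recall": "none", "row_id": None, "recall_score": 0}


# Toy data: the bucket and text tiers are stand-ins for the real readers.
tiers = [
    ("kb2_term", make_term_tier({"config": [3, 7], "sys": [3], "dpmi": [9]})),
    ("kb2_bucket", lambda q: None),
    ("kdb_text", lambda q: (1, 1) if "hello" in q else None),  # assumed name
]

print(recall("config sys", tiers))
print(recall("hello there", tiers))
print(recall("unknown words", tiers))
```

Keeping each tier behind the same small interface means the cheapest index is always consulted first, and the slower text scan only runs when the compiled paths miss.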
@@ -43,19 +44,9 @@ Status: `PASS`
- KDB binary gate: `PASS 42/42`.
- KDB term-index gate: `PASS 42/42`.

Covered categories include:

- General chat, identity, local inference, local limits, offline/no-web behavior, prompt quality, repeated-answer recovery, confidence framing, simple explanation, and lightweight planning.
- Troubleshooting, debugging, release checks, DPMI/CWSDPMI, CONFIG.SYS, AUTOEXEC.BAT, FAT image limits, QEMU logs, and real-hardware copy preparation.
- Rewriting, summarizing, shortening, release notes, status updates, handoff notes, bug reports, meeting notes, risk registers, project plans, customer replies, and user docs.
- Developer-pack guidance for retrieval-first design, authoring packs, fast recall storage, release checks, failure records, and modern 486 assistant architecture.
- Portable-intelligence guidance for BASIC teaching, C/assembly/Eshkol ports,
hot-swappable weights, compact recall, and old-hardware proof.
Covered categories include general chat, identity, local inference, offline limits, prompt repair, repeated-answer recovery, troubleshooting, DOS setup, office writing, developer pack authoring, and portable-intelligence concepts.

Usefulness workflows currently cover operator prompts, trust/offline limits, DOS
setup and repair, hardware transfer and emulator evidence, office handoffs,
planning and risk, developer pack authoring, fast local recall architecture,
and portable intelligence.
Usefulness workflows currently cover operator prompts, trust/offline limits, DOS setup and repair, hardware transfer and emulator evidence, office handoffs, planning and risk, developer pack authoring, fast local recall architecture, and portable intelligence.

## DOS/QEMU Stress Result

@@ -65,27 +56,33 @@ and portable intelligence.
- Stress source mix: `golden=26 retrieval=16 model=0 fallback=0 memory=8`.
- Average total reply time in the stress report: `134 ms`.
- Average retrieval time in the stress report: `80 ms`.
- Recall modes in the stress report: `kb2_term=46 kb2_bucket=3 none=1`.
- Recall modes in the stress report: `kb2_bucket=3 kb2_term=46 none=1`.
- Visible-answer validation: `PASS`.
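
As a quick consistency check on the figures above, the source mix and the recall-mode breakdown should account for the same number of replies. The sketch below parses the two `key=value` strings exactly as they appear in the report and compares their totals. The helper name `parse_counts` is made up for this example and is not part of the repository's scripts.

```python
from typing import Dict


def parse_counts(mix: str) -> Dict[str, int]:
    """Parse a space-separated 'key=value' list into integer counts."""
    return {key: int(value) for key, value in
            (item.split("=", 1) for item in mix.split())}


source_mix = parse_counts("golden=26 retrieval=16 model=0 fallback=0 memory=8")
recall_modes = parse_counts("kb2_bucket=3 kb2_term=46 none=1")

source_total = sum(source_mix.values())   # 26 + 16 + 0 + 0 + 8
recall_total = sum(recall_modes.values())  # 3 + 46 + 1

print(source_total, recall_total, source_total == recall_total)
```

Both breakdowns sum to 50, so every reply in the stress run is covered by exactly one source and one recall mode.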

## Hardware-Capture Rehearsal

- QEMU rehearses the physical `C:\GPT2\HWVALID.BAT` path before real transfer.
- Hardware-capture rehearsal: `PASS`.
- Hardware-capture assistant stress replies: `50`.
- Hardware-capture stress source mix: `golden=26 retrieval=16 model=0 fallback=0 memory=8`.
- Hardware-capture average total reply time: `28 ms`.
- Hardware-capture average retrieval time: `24 ms`.
- Physical machine capture status: PENDING: no staged physical `hardware_<machine>_manifest.md` capture is present yet.

## Authoring And Import

- `scripts/import_assistant_notes.py` can import ASCII notes into `USER.TXT` or `KNOW.TXT`.
- Import is dry-run by default.
- `--target user` writes machine-local notes without changing bundled pack knowledge.
- `--target know --rebuild-kdb` updates bundled pack knowledge and regenerates KDB/KB2 artifacts.
- `scripts/create_assistant_pack.py` can create a complete lightweight pack
from a folder of ASCII notes, sharing `PACKS\CHAT\MODEL` by default.
- The pack generator writes `PACK.INI`, authoring files, `USER.TXT`,
`USAGE.TXT`, generated KDB buckets, compiled KB2 pages, and `KB2TERM.TXT`.
- `scripts/create_assistant_pack.py` can create a complete lightweight pack from a folder of ASCII notes, sharing `PACKS\CHAT\MODEL` by default.
- The pack generator writes `PACK.INI`, authoring files, `USER.TXT`, `USAGE.TXT`, generated KDB buckets, compiled KB2 pages, and `KB2TERM.TXT`.
- Authoring validator checks required pack files, source rows, generated text KDB, generated binary KDB, and model references.

## Release Payload

- Preview package manifest: `included`.
- Preview release tracked-input gate: `PASS`.
- Preview artifact verifier: `PASS`.
- DOSBox zip unzip test: `PASS`.
- Launch-kit zip unzip test: `PASS`.
- Release sidecar hashes: `PASS`.
- Runtime bundles exclude host-only `TRAIN.TXT` and `TOKBASE.TXT`.

@@ -96,6 +93,7 @@ and portable intelligence.
- Long, ambiguous, or out-of-domain prompts should be shortened or moved into an appropriate pack.
- No live web, news, package registry, or network lookup is available inside DOS.
- Current 486 stress replies did not require raw model generation; that is intentional for reliability and speed on this hardware class.
- Physical 486-class board evidence is still pending until real hardware returns the `HWVALID.LOG`, `QUAL.LOG`, `PERF.LOG`, `ASSIST.LOG`, `ASTRESS.LOG`, `ASSISTC.LOG`, and `HWNOTES.TXT` set.

## Next Production Targets

2 changes: 1 addition & 1 deletion qemu/evidence/preview_release_manifest.md
@@ -5,7 +5,7 @@ Generated: `2026-05-12`
Package tree: `gpt2-basic-preview`
Package zip: `gpt2-basic-preview.zip`
Package checksums: `SHA256SUMS.txt`; zip sidecar: `gpt2-basic-preview.zip.sha256`
Package status: `581 files, 119,878,941 bytes`
Package status: `583 files, 119,897,176 bytes`

This is an iterative preview payload. It ships only strict-quality release models and assistant packs; rejected repair attempts and old candidates remain repo evidence only.
