Add assistant recall benchmark gate #30

Merged
tsotchke merged 1 commit into main from assistant-recall-benchmark
May 21, 2026
@tsotchke

Owner

Summary

  • add ASSIST.EXE --recall-probe for retrieval-only KB2/KDB timing across all shipped packs
  • add a host benchmark parser/report and wire it into assistant, hardware-capture, capability-report, and preview-release gates
  • refresh QEMU hardware-capture evidence with ARECALL.LOG and a recall benchmark report
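To illustrate the kind of host-side parsing the second bullet describes, here is a minimal sketch. The log line format (`RECALL <pack> <ok|miss> <ms> <mode>`), the pack names in the sample, and the names `RecallSummary` and `summarize_recall` are all assumptions made for illustration; the actual ARECALL.LOG format and the report script are not shown in this description.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RecallSummary:
    hits: int
    total: int
    avg_ms: int
    max_ms: int
    modes: Counter


def summarize_recall(log_text: str) -> RecallSummary:
    """Summarize probe lines of the ASSUMED form: RECALL <pack> <ok|miss> <ms> <mode>."""
    hits = 0
    total = 0
    timings = []
    modes = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        # Skip anything that is not a probe record (banners, blank lines).
        if len(fields) != 5 or fields[0] != "RECALL":
            continue
        _, _pack, status, ms, mode = fields
        total += 1
        timings.append(int(ms))
        if status == "ok":
            hits += 1
            modes[mode] += 1  # count recall modes for successful probes only
    if total == 0:
        raise ValueError("no RECALL records found in log")
    # Integer average, since the probe timings are whole milliseconds.
    return RecallSummary(hits, total, sum(timings) // total, max(timings), modes)


# Hypothetical sample log; pack names and timings are made up.
sample = """\
ASSIST recall probe
RECALL DOS ok 50 kb2_term
RECALL NET ok 110 kb2_term
RECALL GFX miss 20 none
"""
summary = summarize_recall(sample)
print(f"{summary.hits}/{summary.total}, avg {summary.avg_ms} ms, max {summary.max_ms} ms")
```

Raising on an empty log, rather than reporting 0/0, is a deliberate choice in this sketch: a gate should fail loudly when the probe produced no records.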

Verification

  • .venv-torch311/bin/python -m unittest discover tests
  • QEMU_TIMEOUT_SECONDS=300 bash qemu/run_assistant_recall_486.sh
  • QEMU_TIMEOUT_SECONDS=420 bash qemu/run_hardware_capture_486.sh
  • python3 scripts/verify_assistant_packs.py
  • python3 scripts/verify_hardware_capture.py --capture-dir qemu/evidence/hardware_capture_486_qemu
  • .venv-torch311/bin/python scripts/build_preview_release.py --self-test
  • .venv-torch311/bin/python scripts/verify_preview_artifacts.py
  • LC_ALL=C LANG=C shasum -a 256 -c release sidecars

Evidence

  • standalone QEMU recall: 42/42, avg 61 ms, max 110 ms, recall modes kb2_term=42
  • hardware-capture rehearsal recall: 42/42, avg 82 ms, max 170 ms, recall modes kb2_term=42
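A gate built on figures like these could be sketched as follows. The function name `recall_gate` and both latency limits are hypothetical placeholders; this description does not state the pass criteria the real gates apply.

```python
def recall_gate(hits, total, avg_ms, max_ms, *, avg_limit_ms, max_limit_ms):
    """Return a list of failure reasons; an empty list means the gate passes."""
    failures = []
    if hits != total:
        failures.append(f"recall {hits}/{total} is incomplete")
    if avg_ms > avg_limit_ms:
        failures.append(f"avg {avg_ms} ms exceeds {avg_limit_ms} ms")
    if max_ms > max_limit_ms:
        failures.append(f"max {max_ms} ms exceeds {max_limit_ms} ms")
    return failures


# Figures copied from the Evidence list; the limits are assumed, not from the PR.
runs = {
    "standalone QEMU": (42, 42, 61, 110),
    "hardware-capture rehearsal": (42, 42, 82, 170),
}
for name, stats in runs.items():
    failures = recall_gate(*stats, avg_limit_ms=250, max_limit_ms=500)
    print(name, "PASS" if not failures else failures)
```

Returning every failure reason, instead of stopping at the first, keeps a single report useful when several limits are breached at once.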

tsotchke merged commit 952a399 into main on May 21, 2026
1 check passed
tsotchke deleted the assistant-recall-benchmark branch on May 21, 2026 at 17:52
1 participant