Skip to content

Add I/O mmap loading strategy#83

Merged
gabewillen merged 22 commits into
mainfrom
gsd/issue-60-io-boundary-milestone
May 5, 2026
Merged

Add I/O mmap loading strategy#83
gabewillen merged 22 commits into
mainfrom
gsd/issue-60-io-boundary-milestone

Conversation

@gabewillen

@gabewillen gabewillen commented May 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add the dedicated src/emel/io/mmap Stateforward.SML actor with validation, platform gating, mapped descriptor publication, bounded slot ownership, and explicit release handling
  • wire model/tensor to request and release mmap-backed loads through public events while keeping residency ownership in tensor
  • update maintained tool reporting, docs, snapshots, architecture docs, planning archive, and v1.24 milestone audit

Validation

  • scripts/check_domain_boundaries.sh exit 0
  • node .codex/get-shit-done/bin/gsd-tools.cjs validate consistency passed with one informational warning about active Phase 211 cleanup/archive state
  • v1.24 milestone audit passed: 13/13 requirements, no source-backed maintained-path contradictions
  • Phase 210 closeout evidence: EMEL_QUALITY_GATES_SCOPE=full scripts/quality_gates.sh exit 0 with no benchmark-regression override

Notes

  • left local .atmux/, .claude/, .codex/ runtime files and untracked large tests/models/reference/*.gguf LFS objects out of the PR

Note

Medium Risk
Adds a new memory-mapped I/O strategy and wires it into model/tensor, introducing platform-specific mapping/unmapping paths and new lifecycle states that can affect model loading and resource cleanup across OSes.

Overview
Adds a new src/emel/io/mmap Stateforward.SML strategy actor that validates mmap requests via explicit guard/state transitions, performs platform-selected open/mmap/munmap work behind compile-time gating, and manages a bounded mapping slot pool with an actor-owned event::release_mapping unmap surface.

Extends model/tensor to request and release mmap-backed loads through new public events (e.g., request_mapped_load/release_mapped_load), adds an mmap_resident lifecycle path, and tightens storage bind guards to prevent invalid binds when mmap-resident.

Updates behavior tests, boundary guardrail scripts, architecture mermaid/docs, benchmark/lint snapshots, and planning/milestone archives to document and validate the v1.24 mmap strategy closeout.

Reviewed by Cursor Bugbot for commit 111cd6c. Bugbot is set up for automated code reviews on this repo. Configure here.

Copilot AI review requested due to automatic review settings May 4, 2026 23:25

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e64d62125

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/model/tensor/guards.hpp
Comment thread src/emel/io/mmap/actions.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a dedicated mmap-based I/O loading strategy under src/emel/io/mmap, wires model/tensor to request/release mapped tensor loads via public events (while keeping residency ownership in tensor), and updates tooling, tests, scripts, docs, and planning artifacts to reflect and guard the new ownership split.

Changes:

  • Added emel::io::mmap Stateforward.SML actor (validation, platform gating, slot pool, map/release events, OS-specific mapping implementation).
  • Extended model/tensor public event surface and lifecycle to support mmap-resident tensors and tensor-owned mapped load orchestration.
  • Updated maintained tools, quality gates, domain boundary checks, docs, snapshots, and planning archives for v1.23/v1.24 closeout.

Reviewed changes

Copilot reviewed 151 out of 151 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tools/paritychecker/parity_engines.cpp Updates mmap-usage reporting to query tensor lifecycle via public capture_tensor_state.
tools/embedded_size/emel_probe/main.cpp Updates mmap-usage reporting to query tensor lifecycle via public capture_tensor_state.
tools/bench/whisper_emel_parity_runner.cpp Switches model file loading to read-into-memory and removes deprecated weights_mapped usage.
tools/bench/generation_bench.cpp Adds io_strategy_unavailable name mapping and updates mmap-usage reporting via tensor lifecycle capture.
tools/bench/diarization/sortformer_fixture.hpp Updates mmap-usage reporting to query tensor lifecycle via public capture_tensor_state.
tests/speech/encoder/whisper/lifecycle_tests.cpp Removes usage of deprecated weights_mapped field.
tests/speech/decoder/whisper/lifecycle_tests.cpp Removes usage of deprecated weights_mapped field.
tests/embeddings/te_fixture_data.hpp Removes usage of deprecated weights_mapped field.
src/emel/text/encoders/sm.hpp Reformats/updates embedded architecture design comment block.
src/emel/model/tensor/events.hpp Adds mmap-resident lifecycle value and public events for requesting/releasing mapped loads; extends effect request fields.
src/emel/model/tensor/errors.hpp Adds tensor-side error categories for mmap integration paths.
src/emel/model/tensor/detail.hpp Adds internal status carriers for mapped load/release flows and stores active tensor extent in storage.
src/emel/model/tensor/context.hpp Adds injected emel::io::mmap::sm* pointer to tensor context for cross-actor dispatch.
src/emel/model/loader/guards.hpp Adds loader guards for IO strategy presence and IO phase outcome classification.
src/emel/model/loader/events.hpp Adds IO loader pointer + strategy policy to load request and introduces IO phase event aggregation types.
src/emel/model/loader/errors.hpp Adds io_strategy_unavailable loader error category.
src/emel/model/data.hpp Removes deprecated weights_mapped field from model data.
src/emel/machines.hpp Publishes top-level machine aliases for IO loader and IO mmap actors.
src/emel/io/sm.hpp Adds emel::io::sm alias for the IO loader boundary machine.
src/emel/io/mmap/events.hpp Defines public mmap map/release events and outcome event payloads.
src/emel/io/mmap/errors.hpp Defines mmap error taxonomy, platform support macro, and mapping/slot constants.
src/emel/io/mmap/detail.hpp Defines mmap per-dispatch runtime status carriers.
src/emel/io/mmap/context.hpp Implements fixed-capacity mapping slot pool context and free-stack.
src/emel/io/mmap/actions.cpp Implements OS-specific open/map/unmap/close primitives for mmap strategy.
src/emel/io/loader/sm.hpp Adds IO loader boundary state machine (strategy router scaffold with explicit unsupported strategy handling).
src/emel/io/loader/guards.hpp Adds IO loader guards for basic request validation and callback presence.
src/emel/io/loader/events.hpp Defines IO loader strategy kinds and public load request/result event payloads.
src/emel/io/loader/errors.hpp Defines IO loader error taxonomy.
src/emel/io/loader/detail.hpp Defines IO loader per-dispatch runtime carrier types.
src/emel/io/loader/context.hpp Adds IO loader action context placeholder.
src/emel/io/loader/actions.hpp Implements IO loader boundary actions for recording/publishing done/error outcomes.
snapshots/quality_gates/timing.txt Updates recorded timing snapshot for quality gates.
snapshots/lint/clang_format.txt Updates clang-format snapshot file list to reflect new/removed files.
snapshots/bench/benchmarks.txt Updates benchmark snapshot output formatting/values for impacted suites.
scripts/test_with_coverage.sh Adds tests/io shard selection and changed-file inference for IO sources.
scripts/quality_gates.sh Adds IO test shard inference and fixes bash array initialization/expansion for benchmark env.
scripts/check_domain_boundaries.sh Adds additional domain boundary checks/regexes for IO strategy scoping and residency ownership.
README.md Documents IO vs tensor vs loader ownership split and references new IO architecture docs.
docs/templates/README.md.j2 Mirrors README ownership split documentation in template.
docs/roadmap.md Updates roadmap text to reflect mmap strategy implementation and deferred strategies.
CMakeLists.txt Adds mmap actions TU and includes new IO tests + IO shard configuration.
.planning/ROADMAP.md Archives v1.23/v1.24 as shipped and updates phase/milestone tracking tables.
.planning/phases/211-phase-verification-artifact-backfill/211-VALIDATION.md Adds Phase 211 validation artifact (backfill documentation phase).
.planning/phases/211-phase-verification-artifact-backfill/211-01-SUMMARY.md Adds Phase 211 summary/frontmatter (backfill documentation phase).
.planning/milestones/v1.24-REQUIREMENTS.md Adds archived v1.24 requirements document.
.planning/milestones/v1.24-phases/210-publication-and-maintained-artifact-updates/210-VERIFICATION.md Adds Phase 210 verification artifact.
.planning/milestones/v1.24-phases/210-publication-and-maintained-artifact-updates/210-VALIDATION.md Adds Phase 210 validation artifact.
.planning/milestones/v1.24-phases/210-publication-and-maintained-artifact-updates/210-SUMMARY.md Adds Phase 210 summary artifact.
.planning/milestones/v1.24-phases/210-publication-and-maintained-artifact-updates/210-CONTEXT.md Adds Phase 210 context artifact.
.planning/milestones/v1.24-phases/210-publication-and-maintained-artifact-updates/210-01-PLAN.md Adds Phase 210 plan artifact.
.planning/milestones/v1.24-phases/209-behavior-tests-and-scope-guardrails/209-VERIFICATION.md Adds Phase 209 verification artifact.
.planning/milestones/v1.24-phases/209-behavior-tests-and-scope-guardrails/209-CONTEXT.md Adds Phase 209 context artifact.
.planning/milestones/v1.24-phases/209-behavior-tests-and-scope-guardrails/209-01-SUMMARY.md Adds Phase 209 summary artifact.
.planning/milestones/v1.24-phases/209-behavior-tests-and-scope-guardrails/209-01-PLAN.md Adds Phase 209 plan artifact.
.planning/milestones/v1.24-phases/208-public-runtime-and-evidence-surfaces/208-VERIFICATION.md Adds Phase 208 verification artifact.
.planning/milestones/v1.24-phases/208-public-runtime-and-evidence-surfaces/208-VALIDATION.md Adds Phase 208 validation artifact.
.planning/milestones/v1.24-phases/208-public-runtime-and-evidence-surfaces/208-CONTEXT.md Adds Phase 208 context artifact.
.planning/milestones/v1.24-phases/208-public-runtime-and-evidence-surfaces/208-01-PLAN.md Adds Phase 208 plan artifact.
.planning/milestones/v1.24-phases/207-tensor-owned-mmap-integration/207-VALIDATION.md Adds Phase 207 validation artifact.
.planning/milestones/v1.24-phases/207-tensor-owned-mmap-integration/207-CONTEXT.md Adds Phase 207 context artifact.
.planning/milestones/v1.24-phases/207-tensor-owned-mmap-integration/207-01-SUMMARY.md Adds Phase 207 summary artifact.
.planning/milestones/v1.24-phases/206-mapped-descriptor-errors-and-lifetime/206-VALIDATION.md Adds Phase 206 validation artifact.
.planning/milestones/v1.24-phases/206-mapped-descriptor-errors-and-lifetime/206-CONTEXT.md Adds Phase 206 context artifact.
.planning/milestones/v1.24-phases/206-mapped-descriptor-errors-and-lifetime/206-01-SUMMARY.md Adds Phase 206 summary artifact.
.planning/milestones/v1.24-phases/205-mmap-validation-platform-gating/205-VERIFICATION.md Adds Phase 205 verification artifact.
.planning/milestones/v1.24-phases/205-mmap-validation-platform-gating/205-VALIDATION.md Adds Phase 205 validation artifact.
.planning/milestones/v1.24-phases/205-mmap-validation-platform-gating/205-CONTEXT.md Adds Phase 205 context artifact.
.planning/milestones/v1.24-phases/205-mmap-validation-platform-gating/205-01-SUMMARY.md Adds Phase 205 summary artifact.
.planning/milestones/v1.24-phases/204-mmap-strategy-component-boundary/204-VERIFICATION.md Adds Phase 204 verification artifact.
.planning/milestones/v1.24-phases/204-mmap-strategy-component-boundary/204-VALIDATION.md Adds Phase 204 validation artifact.
.planning/milestones/v1.24-phases/204-mmap-strategy-component-boundary/204-CONTEXT.md Adds Phase 204 context artifact.
.planning/milestones/v1.24-phases/204-mmap-strategy-component-boundary/204-01-SUMMARY.md Adds Phase 204 summary artifact.
.planning/milestones/v1.24-phases/204-mmap-strategy-component-boundary/204-01-PLAN.md Adds Phase 204 plan artifact.
.planning/milestones/v1.23-REQUIREMENTS.md Adds archived v1.23 requirements document.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-VERIFICATION.md Adds v1.23 Phase 203 verification artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-VALIDATION.md Adds v1.23 Phase 203 validation artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-SUMMARY.md Adds v1.23 Phase 203 summary artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-REVIEW.md Adds v1.23 Phase 203 review artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-REVIEW-FIX.md Adds v1.23 Phase 203 review fix artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-CONTEXT.md Adds v1.23 Phase 203 context artifact.
.planning/milestones/v1.23-phases/203-closeout-state-and-rule-debt-cleanup/203-01-PLAN.md Adds v1.23 Phase 203 plan artifact.
.planning/milestones/v1.23-phases/202-closeout-proof-repair/202-VERIFICATION.md Adds v1.23 Phase 202 verification artifact.
.planning/milestones/v1.23-phases/202-closeout-proof-repair/202-VALIDATION.md Adds v1.23 Phase 202 validation artifact.
.planning/milestones/v1.23-phases/202-closeout-proof-repair/202-CONTEXT.md Adds v1.23 Phase 202 context artifact.
.planning/milestones/v1.23-phases/202-closeout-proof-repair/202-01-SUMMARY.md Adds v1.23 Phase 202 summary artifact.
.planning/milestones/v1.23-phases/202-closeout-proof-repair/202-01-PLAN.md Adds v1.23 Phase 202 plan artifact.
.planning/milestones/v1.23-phases/201-guardrails-docs-and-closeout-proof/201-VERIFICATION.md Adds v1.23 Phase 201 verification artifact (marked superseded).
.planning/milestones/v1.23-phases/201-guardrails-docs-and-closeout-proof/201-VALIDATION.md Adds v1.23 Phase 201 validation artifact (marked superseded).
.planning/milestones/v1.23-phases/201-guardrails-docs-and-closeout-proof/201-CONTEXT.md Adds v1.23 Phase 201 context artifact (marked superseded).
.planning/milestones/v1.23-phases/201-guardrails-docs-and-closeout-proof/201-01-SUMMARY.md Adds v1.23 Phase 201 summary artifact (marked superseded).
.planning/milestones/v1.23-phases/201-guardrails-docs-and-closeout-proof/201-01-PLAN.md Adds v1.23 Phase 201 plan artifact (marked superseded).
.planning/milestones/v1.23-phases/200-loader-and-maintained-lane-integration/200-VERIFICATION.md Adds v1.23 Phase 200 verification artifact.
.planning/milestones/v1.23-phases/200-loader-and-maintained-lane-integration/200-VALIDATION.md Adds v1.23 Phase 200 validation artifact.
.planning/milestones/v1.23-phases/200-loader-and-maintained-lane-integration/200-CONTEXT.md Adds v1.23 Phase 200 context artifact.
.planning/milestones/v1.23-phases/200-loader-and-maintained-lane-integration/200-01-SUMMARY.md Adds v1.23 Phase 200 summary artifact.
.planning/milestones/v1.23-phases/200-loader-and-maintained-lane-integration/200-01-PLAN.md Adds v1.23 Phase 200 plan artifact.
.planning/milestones/v1.23-phases/199-strategy-policy-boundary/199-VERIFICATION.md Adds v1.23 Phase 199 verification artifact.
.planning/milestones/v1.23-phases/199-strategy-policy-boundary/199-VALIDATION.md Adds v1.23 Phase 199 validation artifact.
.planning/milestones/v1.23-phases/199-strategy-policy-boundary/199-CONTEXT.md Adds v1.23 Phase 199 context artifact.
.planning/milestones/v1.23-phases/199-strategy-policy-boundary/199-01-SUMMARY.md Adds v1.23 Phase 199 summary artifact.
.planning/milestones/v1.23-phases/199-strategy-policy-boundary/199-01-PLAN.md Adds v1.23 Phase 199 plan artifact.
.planning/milestones/v1.23-phases/198-tensor-to-i-o-event-contract/198-VERIFICATION.md Adds v1.23 Phase 198 verification artifact.
.planning/milestones/v1.23-phases/198-tensor-to-i-o-event-contract/198-VALIDATION.md Adds v1.23 Phase 198 validation artifact.
.planning/milestones/v1.23-phases/198-tensor-to-i-o-event-contract/198-CONTEXT.md Adds v1.23 Phase 198 context artifact.
.planning/milestones/v1.23-phases/198-tensor-to-i-o-event-contract/198-01-SUMMARY.md Adds v1.23 Phase 198 summary artifact.
.planning/milestones/v1.23-phases/198-tensor-to-i-o-event-contract/198-01-PLAN.md Adds v1.23 Phase 198 plan artifact.
.planning/milestones/v1.23-phases/197-i-o-module-skeleton-and-ownership-contract/197-VERIFICATION.md Adds v1.23 Phase 197 verification artifact.
.planning/milestones/v1.23-phases/197-i-o-module-skeleton-and-ownership-contract/197-VALIDATION.md Adds v1.23 Phase 197 validation artifact.
.planning/milestones/v1.23-phases/197-i-o-module-skeleton-and-ownership-contract/197-CONTEXT.md Adds v1.23 Phase 197 context artifact.
.planning/milestones/v1.23-phases/197-i-o-module-skeleton-and-ownership-contract/197-01-SUMMARY.md Adds v1.23 Phase 197 summary artifact.
.planning/milestones/v1.23-phases/197-i-o-module-skeleton-and-ownership-contract/197-01-PLAN.md Adds v1.23 Phase 197 plan artifact.
.planning/MILESTONES.md Updates milestone ledger entries for v1.23/v1.24 (including Phase 211 backfill record).
.planning/architecture/mermaid/model_loader.mmd Updates generated model_loader diagram to include IO phase routing and io_strategy_unavailable error path.
.planning/architecture/mermaid/io_loader.mmd Adds generated IO loader state diagram snapshot.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/check_domain_boundaries.sh Outdated
Comment thread src/emel/io/mmap/errors.hpp
Comment thread src/emel/io/mmap/actions.cpp
Comment thread src/emel/io/mmap/events.hpp
Comment thread src/emel/model/tensor/events.hpp
Comment thread src/emel/text/encoders/sm.hpp
Comment thread .planning/MILESTONES.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0774dfb5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/io/mmap/actions.cpp
Comment thread .planning/architecture/mermaid/io_mmap.mmd Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12e8f1fc06

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/model/tensor/sm.hpp Outdated
Comment thread src/emel/io/mmap/context.hpp
Comment thread src/emel/model/tensor/sm.hpp Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70c622b8c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/io/mmap/sm.hpp Outdated
Comment thread src/emel/io/mmap/guards.hpp Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 278ccf58b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/io/mmap/actions.hpp Outdated
Comment thread src/emel/model/tensor/actions.hpp
…y evict of mmap-resident tensors

Closes PR #83 P1 review threads:
- PRRT_kwDORRHzJs5_hhbx (src/emel/io/mmap/actions.hpp:209-225): on unmap
  failure the OS mapping/file descriptor may still be live, so
  effect_mark_unmap_failed_and_release_slot must keep the slot owned
  (in_use/base/mapped_bytes/os_resource intact) and not push the handle
  back onto free_stack. The caller can then retry release and a later
  map_tensor cannot reuse the slot while the prior mapping leaks.
- PRRT_kwDORRHzJs5_hhby (src/emel/model/tensor/guards.hpp): the legacy
  evict_tensor path nulled tensor pointers without releasing the mmap
  mapping, leaking the slot and OS mapping. evict_tensor_request_valid
  now rejects mmap_resident lifecycle so the legacy evict routes to
  errored. Mapped tensors must be released through release_mapped_load.

Failing unit coverage added first per AGENTS.md:
- tests/io/mmap/lifecycle_tests.cpp 'io mmap unmap failure keeps mapping
  slot owned for retry' drives effect_mark_unmap_failed_and_release_slot
  against a constructed live-slot context and asserts in_use stays true,
  base/bytes/os_resource preserved, and the handle does not appear in
  free_stack.
- tests/model/tensor/lifecycle_tests.cpp
  'model_tensor_evict_tensor_rejects_mmap_resident_tensors' brings a
  tensor to mmap_resident via request_mapped_load, dispatches legacy
  evict_tensor, asserts CHECK_FALSE on the dispatch and
  invalid_request error_out, confirms lifecycle/buffer/handle survive,
  and proves the proper release_mapped_load path still works after the
  rejected evict.

snapshots/lint/clang_format.txt baseline refreshed via maintained
scripts/lint_snapshot.sh --update to register the now-clang-formatted
tests/io/mmap/lifecycle_tests.cpp (single-line addition).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ba0155e3f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/model/tensor/actions.hpp Outdated
Comment thread src/emel/io/mmap/actions.cpp Outdated
@gabewillen

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

binding::reset_storage_binding(ctx);

P1 Badge Release active mmap tensors before rebinding storage

Calling bind_storage unconditionally resets tensor lifecycle/buffer state, but it never dispatches release_mapped_load/io::mmap::release_mapping for tensors currently in mmap_resident. After this reset, release_mapped_load is blocked by the mmap_resident guard, so any already-issued mapping handles become unreleasable through the tensor actor and the underlying io-mmap slots remain in use until process teardown, eventually exhausting k_max_mappings in long-lived runs that rebind models.


binding::reset_storage_binding(ctx);

P1 Badge Do not clear existing bindings on invalid bind requests

The invalid-bind path also calls reset_storage_binding, so a malformed bind_storage request can drop all current tensor lifecycle state immediately, including mmap_resident bookkeeping, without releasing the live mappings. This creates the same unrecoverable-handle leak as above but on an error path: a rejected bind can silently orphan active mmap slots and prevent later release_mapped_load from succeeding.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gabewillen

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7de907a4ba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/emel/model/tensor/sm.hpp
@gabewillen

Copy link
Copy Markdown
Contributor Author

@codex review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 172df14. Configure here.

Comment thread .planning/STATE.md Outdated
Comment thread .planning/STATE.md Outdated
@gabewillen

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gabewillen gabewillen merged commit b483bb2 into main May 5, 2026
1 check passed
gabewillen added a commit that referenced this pull request May 7, 2026
* docs: start milestone v1.23 io boundary

* feat: add io loading boundary

* fix: close io boundary review gaps

* fix: reset tensor loader after io failure

* docs(roadmap): add gap closure phase 211

* docs(211): execute phase 211 verification artifact backfill

* docs(audit): re-audit v1.24 after phase 211 backfill (passed)

* docs(211): add wave field to phase plan frontmatter (linter)

* chore: archive v1.24 milestone files

* chore: remove REQUIREMENTS.md for v1.24 milestone

* feat: add mmap loading strategy

* fix: address io mmap PR review comments

* fix: reject mmap spans beyond eof

* fix: close mmap review follow-ups

* fix: simplify mapped load done transition

* fix: require mmap map completion callback

* fix(io/mmap,model/tensor): retain slot on unmap failure; reject legacy evict of mmap-resident tensors

Closes PR #83 P1 review threads:
- PRRT_kwDORRHzJs5_hhbx (src/emel/io/mmap/actions.hpp:209-225): on unmap
  failure the OS mapping/file descriptor may still be live, so
  effect_mark_unmap_failed_and_release_slot must keep the slot owned
  (in_use/base/mapped_bytes/os_resource intact) and not push the handle
  back onto free_stack. The caller can then retry release and a later
  map_tensor cannot reuse the slot while the prior mapping leaks.
- PRRT_kwDORRHzJs5_hhby (src/emel/model/tensor/guards.hpp): the legacy
  evict_tensor path nulled tensor pointers without releasing the mmap
  mapping, leaking the slot and OS mapping. evict_tensor_request_valid
  now rejects mmap_resident lifecycle so the legacy evict routes to
  errored. Mapped tensors must be released through release_mapped_load.

Failing unit coverage added first per AGENTS.md:
- tests/io/mmap/lifecycle_tests.cpp 'io mmap unmap failure keeps mapping
  slot owned for retry' drives effect_mark_unmap_failed_and_release_slot
  against a constructed live-slot context and asserts in_use stays true,
  base/bytes/os_resource preserved, and the handle does not appear in
  free_stack.
- tests/model/tensor/lifecycle_tests.cpp
  'model_tensor_evict_tensor_rejects_mmap_resident_tensors' brings a
  tensor to mmap_resident via request_mapped_load, dispatches legacy
  evict_tensor, asserts CHECK_FALSE on the dispatch and
  invalid_request error_out, confirms lifecycle/buffer/handle survive,
  and proves the proper release_mapped_load path still works after the
  rejected evict.

snapshots/lint/clang_format.txt baseline refreshed via maintained
scripts/lint_snapshot.sh --update to register the now-clang-formatted
tests/io/mmap/lifecycle_tests.cpp (single-line addition).

* fix: preserve mmap metadata and partial cleanup state

* fix: preserve mmap bindings on storage rebind errors

* fix: reject bind tensor on mmap resident tensors

* fix: align v1.24 planning state counters

* docs: start milestone v1.25 io read loading strategy

* docs(v1.25): scaffold roadmap phase details for phases 212-218

Adds the missing `#### Phase NNN:` detail blocks (Goal, Depends on,
Requirements, Success Criteria, Plans) inside the active v1.25 milestone
section of `.planning/ROADMAP.md`, derived from `.planning/REQUIREMENTS.md`
traceability and the canonical structure used in the v1.24 archived
roadmap. Required so `gsd-tools.cjs roadmap analyze` returns
`phase_count: 7` instead of `0` and `gsd-plan-phase` has goals/criteria
to plan against. Planning-artifact only — no source, test, snapshot,
benchmark, model, or quality-gate behavior changes.

* feat(io): add read strategy boundary actor

* feat(io): add read validation gates

* feat(io): execute read strategy attempts

* test(225-01): add failing io read batch doctests

- Cover public read_tensor_batch copied-byte success and first-error reporting
- Assert ready-state recovery after each batch dispatch

* feat(225-01): implement io read batch copy path

- Add shared tensor_load_span and public read_tensor_batch events
- Route batch validation through explicit guards and copy selected spans in io/read

* docs(225-01): complete io read batch copy plan

- Add Plan 01 execution summary and self-check
- Advance Phase 225 state, roadmap progress, and requirements traceability

* test(225-02): add failing io loader batch doctests

- add public batch success and missing-read-actor failure coverage
- add source guardrail for one read batch child dispatch

* feat(225-02): implement io loader batch read copy route

- add load_tensor_batch request and batch result events
- route one selected read_copy batch through io/read with stack status callbacks
- cover batch success, missing actor, invalid request, and read failure paths

* docs(225-02): complete io loader batch route plan

- add Plan 02 execution summary and self-check
- advance Phase 225 state, roadmap progress, and requirements

* test(225-03): add failing model loader batch doctests

- add read_copy batch success and missing scratch regression coverage
- tighten model-loader io boundary guardrail against per-tensor child dispatch

* feat(225-03): route model loader io through batch dispatch

- add request-owned io load spans and batch result carriers
- replace per-tensor io loader dispatch with one load_tensor_batch request
- keep read_copy used-strategy evidence behind batch success

* docs(225-03): complete model loader batch dispatch plan

- add Plan 03 execution summary and self-check
- advance Phase 225 state and roadmap progress

* feat(225-04): wire maintained callers to io load spans

- add request-lifetime io_load_spans storage in maintained EMEL lanes
- pass batch scratch spans into model loader load requests

* test(225-04): require io load spans in maintained caller guardrails

- assert maintained EMEL lanes provide model-loader io_load_spans scratch
- keep public source loading and no direct io/read reach-through checks

* docs(225-04): complete maintained caller batch scratch plan

- add Plan 04 execution summary and validation evidence
- advance Phase 225 state and roadmap progress

* docs(225-05): update active phase traceability

- keep reopened Phase 225 requirements pending until validation evidence
- remove stale state claims that v1.25 gap closure is already validated

* docs(225-05): correct archived closeout paths

- point v1.25 roadmap closeout artifacts at archived milestone files
- clarify archived requirements versus reopened active requirements

* docs(225-05): complete closeout traceability plan

- add Plan 05 execution summary with validation evidence
- advance state and roadmap progress to Plan 06

* docs(225-06): record current validation evidence

- publish dyld launch blocker output and source-backed fallback
- mark Phase 225 validation compliant from scoped gate evidence

* docs(225-06): publish closeout audit evidence

- align active and archived v1.25 audits with Phase 225 evidence
- summarize dyld fallback and validation outcomes

* docs(225-06): complete validation publication plan

- update state, roadmap, and requirements after Plan 06 evidence
- record final task hashes and self-check in summary

* fix(225): CR-01 guard missing parse callback

* fix(225): WR-01 classify unknown read errors

* style(225): format read error regression test

* style(225): prefix model loader parse guard

* chore: archive v1.25 milestone

* chore(v1.25): close read loading audit cleanup

* docs(state): record v1.25 PR

* chore: refresh quality gate timing after merge

* fix: address PR review comments

* fix: address read-copy review feedback

* fix: address benchmark and source review feedback

* fix: require explicit tensor read targets
gabewillen added a commit that referenced this pull request May 9, 2026
* docs: start milestone v1.23 io boundary

* feat: add io loading boundary

* fix: close io boundary review gaps

* fix: reset tensor loader after io failure

* docs(roadmap): add gap closure phase 211

* docs(211): execute phase 211 verification artifact backfill

* docs(audit): re-audit v1.24 after phase 211 backfill (passed)

* docs(211): add wave field to phase plan frontmatter (linter)

* chore: archive v1.24 milestone files

* chore: remove REQUIREMENTS.md for v1.24 milestone

* feat: add mmap loading strategy

* fix: address io mmap PR review comments

* fix: reject mmap spans beyond eof

* fix: close mmap review follow-ups

* fix: simplify mapped load done transition

* fix: require mmap map completion callback

* fix(io/mmap,model/tensor): retain slot on unmap failure; reject legacy evict of mmap-resident tensors

Closes PR #83 P1 review threads:
- PRRT_kwDORRHzJs5_hhbx (src/emel/io/mmap/actions.hpp:209-225): on unmap
  failure the OS mapping/file descriptor may still be live, so
  effect_mark_unmap_failed_and_release_slot must keep the slot owned
  (in_use/base/mapped_bytes/os_resource intact) and not push the handle
  back onto free_stack. The caller can then retry release and a later
  map_tensor cannot reuse the slot while the prior mapping leaks.
- PRRT_kwDORRHzJs5_hhby (src/emel/model/tensor/guards.hpp): the legacy
  evict_tensor path nulled tensor pointers without releasing the mmap
  mapping, leaking the slot and OS mapping. evict_tensor_request_valid
  now rejects mmap_resident lifecycle so the legacy evict routes to
  errored. Mapped tensors must be released through release_mapped_load.

Failing unit coverage added first per AGENTS.md:
- tests/io/mmap/lifecycle_tests.cpp 'io mmap unmap failure keeps mapping
  slot owned for retry' drives effect_mark_unmap_failed_and_release_slot
  against a constructed live-slot context and asserts in_use stays true,
  base/bytes/os_resource preserved, and the handle does not appear in
  free_stack.
- tests/model/tensor/lifecycle_tests.cpp
  'model_tensor_evict_tensor_rejects_mmap_resident_tensors' brings a
  tensor to mmap_resident via request_mapped_load, dispatches legacy
  evict_tensor, asserts CHECK_FALSE on the dispatch and
  invalid_request error_out, confirms lifecycle/buffer/handle survive,
  and proves the proper release_mapped_load path still works after the
  rejected evict.

snapshots/lint/clang_format.txt baseline refreshed via maintained
scripts/lint_snapshot.sh --update to register the now-clang-formatted
tests/io/mmap/lifecycle_tests.cpp (single-line addition).

* fix: preserve mmap metadata and partial cleanup state

* fix: preserve mmap bindings on storage rebind errors

* fix: reject bind tensor on mmap resident tensors

* fix: align v1.24 planning state counters

* docs: start milestone v1.25 io read loading strategy

* docs(v1.25): scaffold roadmap phase details for phases 212-218

Adds the missing `#### Phase NNN:` detail blocks (Goal, Depends on,
Requirements, Success Criteria, Plans) inside the active v1.25 milestone
section of `.planning/ROADMAP.md`, derived from `.planning/REQUIREMENTS.md`
traceability and the canonical structure used in the v1.24 archived
roadmap. Required so `gsd-tools.cjs roadmap analyze` returns
`phase_count: 7` instead of `0` and `gsd-plan-phase` has goals/criteria
to plan against. Planning-artifact only — no source, test, snapshot,
benchmark, model, or quality-gate behavior changes.

* feat(io): add read strategy boundary actor

* feat(io): add read validation gates

* feat(io): execute read strategy attempts

* test(225-01): add failing io read batch doctests

- Cover public read_tensor_batch copied-byte success and first-error reporting
- Assert ready-state recovery after each batch dispatch

* feat(225-01): implement io read batch copy path

- Add shared tensor_load_span and public read_tensor_batch events
- Route batch validation through explicit guards and copy selected spans in io/read

* docs(225-01): complete io read batch copy plan

- Add Plan 01 execution summary and self-check
- Advance Phase 225 state, roadmap progress, and requirements traceability

* test(225-02): add failing io loader batch doctests

- add public batch success and missing-read-actor failure coverage
- add source guardrail for one read batch child dispatch

* feat(225-02): implement io loader batch read copy route

- add load_tensor_batch request and batch result events
- route one selected read_copy batch through io/read with stack status callbacks
- cover batch success, missing actor, invalid request, and read failure paths

* docs(225-02): complete io loader batch route plan

- add Plan 02 execution summary and self-check
- advance Phase 225 state, roadmap progress, and requirements

* test(225-03): add failing model loader batch doctests

- add read_copy batch success and missing scratch regression coverage
- tighten model-loader io boundary guardrail against per-tensor child dispatch

* feat(225-03): route model loader io through batch dispatch

- add request-owned io load spans and batch result carriers
- replace per-tensor io loader dispatch with one load_tensor_batch request
- keep read_copy used-strategy evidence behind batch success

* docs(225-03): complete model loader batch dispatch plan

- add Plan 03 execution summary and self-check
- advance Phase 225 state and roadmap progress

* feat(225-04): wire maintained callers to io load spans

- add request-lifetime io_load_spans storage in maintained EMEL lanes
- pass batch scratch spans into model loader load requests

* test(225-04): require io load spans in maintained caller guardrails

- assert maintained EMEL lanes provide model-loader io_load_spans scratch
- keep public source loading and no direct io/read reach-through checks

* docs(225-04): complete maintained caller batch scratch plan

- add Plan 04 execution summary and validation evidence
- advance Phase 225 state and roadmap progress

* docs(225-05): update active phase traceability

- keep reopened Phase 225 requirements pending until validation evidence
- remove stale state claims that v1.25 gap closure is already validated

* docs(225-05): correct archived closeout paths

- point v1.25 roadmap closeout artifacts at archived milestone files
- clarify archived requirements versus reopened active requirements

* docs(225-05): complete closeout traceability plan

- add Plan 05 execution summary with validation evidence
- advance state and roadmap progress to Plan 06

* docs(225-06): record current validation evidence

- publish dyld launch blocker output and source-backed fallback
- mark Phase 225 validation compliant from scoped gate evidence

* docs(225-06): publish closeout audit evidence

- align active and archived v1.25 audits with Phase 225 evidence
- summarize dyld fallback and validation outcomes

* docs(225-06): complete validation publication plan

- update state, roadmap, and requirements after Plan 06 evidence
- record final task hashes and self-check in summary

* fix(225): CR-01 guard missing parse callback

* fix(225): WR-01 classify unknown read errors

* style(225): format read error regression test

* style(225): prefix model loader parse guard

* chore: archive v1.25 milestone

* chore(v1.25): close read loading audit cleanup

* docs(state): record v1.25 PR

* chore: refresh quality gate timing after merge

* fix: address PR review comments

* fix: address read-copy review feedback

* fix: address benchmark and source review feedback

* fix: require explicit tensor read targets

* Add staged read loading strategy

Implements the bounded staged-read loading path through public I/O and tensor loader surfaces, with source-backed tests, guardrails, snapshots, and milestone audit evidence.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Remove local agent artifacts

Keep workspace-local agent cache and Cursor rule files out of the milestone PR payload.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix staged read PR review feedback

Preserve accurate staged batch failure indexes, route staged chunk sizing through policy instead of hard-coded tensor sizes, and restore representative quality-gate timing evidence.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants