Skip to content

fix(uffs-core): drop per-row format!() allocations in path-resolver + result-conversion hot paths (Phase 6d, refs #193)#283

Merged
githubrobbi merged 2 commits into
mainfrom
chore/phase-6d-format-cleanup
May 19, 2026
Merged

fix(uffs-core): drop per-row format!() allocations in path-resolver + result-conversion hot paths (Phase 6d, refs #193)#283
githubrobbi merged 2 commits into
mainfrom
chore/phase-6d-format-cleanup

Conversation

@githubrobbi
Copy link
Copy Markdown
Collaborator

Phase 6 sub-phase 6d deliverable per phase_6_ownership_borrowing_allocation_implementation_plan.md §5d + §3.4 (local plan, gitignored under /docs/dev/).

What this changes

Two surgical hot-path category-δ fixes inside uffs-core:

Site File Refactor
Per-row path assembly crates/uffs-core/src/path_resolver/fast.rs:add_path_column_with_dir_suffix Replace two per-row format!("{parent_path}{file_name}") / format!("{parent_path}\\{file_name}") with in-place push_str on the already-owned parent_path: String
Per-row drive label crates/uffs-core/src/search/dataframe_convert.rs:display_rows_to_dataframe Replace per-row format!("{}:", row.drive) (Vec<String>) with a static [&str; 26] table indexed by DriveLetter::alphabet_index() (Vec<&'static str>)

Both run inside per-row map closures over search-result rows; each row previously incurred an unconditional small String allocation that the new code avoids.

Why this is the right shape

Phase 6d — hot-path format! / to_string audit — sweeps uffs-mft::io::parser, uffs-mft::parse, uffs-core::search, uffs-core::path_resolver, and uffs-core::aggregate for the per-record / per-row formatting hot spots flagged in plan §1.2 (HIGH priority).

The audit produces a small surgical-fix surface:

  • uffs-mft::io::parserzero format! / .to_string() calls in the parse hot path (columns.rs, merger.rs, zero_alloc.rs, dataframe_build.rs). The 205 format!() count from the 6a baseline lives in cold paths (parse-error variants, log messages, USN status, IOCP error context). Plan §0.2 risk-register row predicted this.
  • uffs-core::path_resolver::fast.rs — two per-row format! calls in add_path_column_with_dir_suffix, fixed by this PR.
  • uffs-core::search::dataframe_convert.rs — one per-row format! for drive label, fixed by this PR.
  • uffs-core::path_resolver::legacy.rs:104format!("{}:\\{}", drive, components.join("\\")) is irreducible (must allocate one owned String result per resolve); cat-β KEEP.
  • uffs-core::search::query::mod.rs, backend.rspattern.to_owned() / needle.to_owned() sites are once-per-query (query prep), cat-β KEEP. Candidate for §6e Cow<'_, str> expansion.
  • uffs-core::aggregate::finalize.rs:675-705format_field called per (bucket × sample size), not per record. Aggregation framework only samples top-K records per bucket. Marginal cat-γ, KEEP.
  • <UNKNOWN_0x...> / <DELETED:...> / <EXT:...> / <CORRUPT:...> placeholders in parse/{direct_index,direct_index_extension,forensic/base,forensic/extension,forensic,full,placeholders}.rs — conditional formatting that fires only on unknown / deleted / extension / corrupt / placeholder records. None fire on every record in the parse hot path. Cat-γ KEEP.

Verification

  • just lint-pre-push ✅ (96s): every gate green — fmt, file-size, gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift, commit-subjects, vet, vet-audit-discipline, machete, typos, reuse, cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, deny, lint-ci-windows.
  • cargo nextest run -p uffs-core ✅ 817 passed, 3 skipped, 0 failed — same numbers as pre-fix.
  • Affected test paths exercised: add_path_column_with_dir_suffix_*, add_path_column_*, add_path_column_parallel_* (path-resolver suite); display_rows_to_dataframe_*, results_to_dataframe_* (result-conversion suite). All pass — observable behavior is byte-identical.

Behavior & contracts

  • Public API unchanged: add_path_column_with_dir_suffix and display_rows_to_dataframe keep their existing signatures and return types.
  • Observable behavior preserved:
    • Paths render with the same separator handling (test corpus covers parent_path.ends_with('\\') true and false branches, plus the file_name == "." root-of-volume case).
    • Drive labels render exactly as <letter>: for A..Z — the previous format! produced identical output for the construction-valid range.
  • For the unreachable fallback path (a hypothetical out-of-range DriveLetter, which cannot occur per parse() invariants), the new code yields "?:" versus the previous code's <letter>:. Since this branch is unreachable by construction, the user-visible difference is zero.

Strict-lint posture

  • No suppression hacks introduced. Both edits replace formatting allocations with idiomatic Rust patterns (push_str mutation; static-table lookup via get().copied()).
  • The .unwrap_or("?:") fallback in display_rows_to_dataframe satisfies clippy::indexing_slicing = "deny" without needing a per-site #[expect].
  • Cross-references to Phase 6d sub-phase added in inline doc comments.

Allocation savings

Per N-row search result (typical N = 1k–100k):

  • Site 1 (add_path_column_with_dir_suffix): N small String allocations + N drops (replaced by N in-place push_str operations that reuse the parent_path buffer).
  • Site 2 (display_rows_to_dataframe): N tiny (2-byte) String allocations + N drops + 1 Vec<String> allocation (replaced by 1 Vec<&str> allocation; labels live in rodata).

Wall-clock impact will be quantified by sub-phase 6g (bench refresh on mft_read, query, search_benchmarks).

refs #193

…paths (Phase 6d)

Two surgical hot-path category-δ fixes per
`docs/dev/architecture/code_clean/phase_6_ownership_borrowing_allocation_implementation_plan.md`
§3.4.  Both run inside per-row `map` closures over search result
rows; each row previously incurred an unconditional small `String`
allocation that the new code avoids.

1) `crates/uffs-core/src/path_resolver/fast.rs:add_path_column_with_dir_suffix`
   ----------------------------------------------------------------

   Replaced the per-row `format!("{parent_path}\\{file_name}")` /
   `format!("{parent_path}{file_name}")` pair with an in-place
   mutation of the already-owned `parent_path: String` (the result
   of `self.resolve(frs_val)`).  The new shape:

       let mut path = parent_path;
       if file_name != "." {
           if !path.ends_with('\\') {
               path.push('\\');
           }
           path.push_str(file_name);
       }

   reuses `parent_path`'s heap buffer (`push_str` grows in-place when
   capacity permits) instead of allocating a fresh `String` and
   dropping the old one each row.  Typical Win32 paths have ample
   headroom on the resolver's String capacity so re-allocation is
   rare in practice.

2) `crates/uffs-core/src/search/dataframe_convert.rs:display_rows_to_dataframe`
   ----------------------------------------------------------------

   Replaced `let drives: Vec<String> = rows.iter().map(|row|
   format!("{}:", row.drive)).collect()` with a `Vec<&'static str>`
   sourced from a 26-entry const table indexed by
   `DriveLetter::alphabet_index`:

       const DRIVE_LABELS: [&str; 26] = ["A:", "B:", ..., "Z:"];

       let drives: Vec<&str> = rows.iter()
           .map(|row| {
               DRIVE_LABELS
                   .get(row.drive.alphabet_index())
                   .copied()
                   .unwrap_or("?:")
           })
           .collect();

   Each label is a `&'static str` carved into rodata; the `Vec<&str>`
   carries no heap copies.  Polars'/Arrow's column-builder still
   copies into a contiguous string buffer downstream, but the upstream
   per-row `String` allocation + free pair is eliminated.  The new
   pattern matches the existing `Vec<&str>` shape already used for
   the `name`, `path`, and `path_only` columns three lines above.

   The `.unwrap_or("?:")` fallback is unreachable in practice
   (`DriveLetter::alphabet_index` returns 0..=25 by construction —
   parse() validates A-Z) but satisfies the workspace
   `clippy::indexing_slicing = "deny"` lint without needing a
   per-site `#[expect]`.

Verification
------------

- `cargo check -p uffs-core` ✅
- `cargo clippy -p uffs-core --all-targets` ✅ (no new lint findings)
- `cargo nextest run -p uffs-core` ✅ 817/817 passed, 3 skipped
  (same numbers as pre-fix; identical behavior)
- All path-resolver tests (`add_path_column_with_dir_suffix_*`,
  `add_path_column_*`, `add_path_column_parallel_*`) and result
  conversion tests (`display_rows_to_dataframe_*`,
  `results_to_dataframe_*`) pass — observable behavior is
  byte-identical.

Strict-lint posture
-------------------

No suppression hacks introduced.  Both edits replace formatting
allocations with idiomatic Rust patterns (`push_str` mutation;
static-table lookup via `get().copied()`).  Cross-references to
Phase 6d sub-phase added in inline doc comments.

Behavior & contracts
--------------------

- Public API unchanged: `add_path_column_with_dir_suffix` and
  `display_rows_to_dataframe` keep their existing signatures and
  return types.
- Observable behavior preserved: paths render with the same
  separator handling, drive labels render exactly as `<letter>:` for
  A..Z (the previous `format!` would produce identical output for
  the construction-valid range).
- For the fallback path (a hypothetical out-of-range
  `DriveLetter`, which cannot occur per `parse()` invariants), the
  new code yields `"?:"` versus the previous code's `<letter>:`.
  Since this branch is unreachable, the user-visible difference is
  zero.

Allocation savings
------------------

Per N-row search result (typical N = 1k-100k):
- Site 1: N small `String` allocations + N drops (replaced by N
  in-place `push_str` operations that reuse the `parent_path`
  buffer).
- Site 2: N tiny (2-byte) `String` allocations + N drops + 1
  `Vec<String>` allocation (replaced by 1 `Vec<&str>` allocation;
  the labels live in rodata).

Wall-clock impact will be quantified by sub-phase 6g (bench
refresh on `mft_read`, `query`, `search_benchmarks`).

refs #193
@githubrobbi githubrobbi enabled auto-merge (squash) May 19, 2026 14:51
@githubrobbi githubrobbi merged commit fcea061 into main May 19, 2026
26 checks passed
@githubrobbi githubrobbi deleted the chore/phase-6d-format-cleanup branch May 19, 2026 15:06
githubrobbi added a commit that referenced this pull request May 19, 2026
…s-link (Phase 6f, refs #193) (#285)

Adds the workspace allocation contract as a standalone document at
`docs/architecture/code-quality/allocation_policy.md` and a §Allocation
policy section in `CONTRIBUTING.md` summarising the rule and linking
to the new doc.

Mirrors the Phase 5e `panic_policy.md` pattern (PR #278): a single
contract page captures the rule, the five-category decision tree
(α / β / γ / δ / ε), the lint posture, the per-site annotation
contract, the audit cadence, the workspace cross-references, and the
append-only decisions log.

Phase 6 sub-phase 6f deliverable per the phase-6 ownership /
borrowing / allocation plan §5f.

Contents
--------

The new `allocation_policy.md` documents:

1. The one-line rule: hot paths never allocate defensively; cold
   paths may; every prod `.clone()` / `format!()` / `to_owned()`
   must fit one of five blessed categories.
2. The five categories with patterns, verdicts, and worked
   examples from the workspace:
   - **α — Arc clone**:    KEEP (explicit `Arc::clone(&x)` form)
   - **β — Ownership fence**: KEEP (caller has `&T`, API needs `T`)
   - **γ — Error / log context**: KEEP (cold path)
   - **δ — Hot-path anti-pattern**: FIX (refactor, never suppress)
   - **ε — Test helper**:   KEEP (out of scope)
3. The lint posture: 5 clone-family lints at `deny` level
   (`redundant_clone`, `clone_on_ref_ptr`, `cloned_instead_of_copied`,
   `inefficient_to_string`, `unnecessary_to_owned`) plus the 7
   companion lints at `warn`.
4. The per-site annotation contract: every prod allocation site
   must satisfy one of three justification shapes (self-evident
   α, β/γ reason text, or δ refactor comment).
5. The audit cadence: `bash scripts/dev/clone_alloc_audit.sh` (the
   helper shipped in PR #281) produces the workspace inventory;
   absolute counts must not regress.
6. Workspace cross-references between this doc, `panic_policy.md`,
   `lint-posture.md`, `Cargo.toml [workspace.lints]`, and
   `clippy.toml`.
7. Append-only decisions log capturing every cat-δ refactor that
   has landed in Phase 6 so future contributors can trace the
   policy's evolution.

`CONTRIBUTING.md` gains a §Allocation policy section directly below
the existing §Panic policy section, summarising the rule + five
categories in roughly 200 words and linking to the full doc.

Verification
------------

- `just lint-pre-push` ✅ (95s): every gate green (fmt, file-size,
  gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift,
  commit-subjects, vet, vet-audit-discipline, machete, typos, reuse,
  cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests,
  tests, smoke, deny, lint-ci-windows).
- Markdown links validated by the `typos` + `reuse` gates plus a
  manual check against `docs/architecture/code-quality/` (existing
  `panic_policy.md` and `lint-posture.md` cross-links resolve).
- Doc structure mirrors `panic_policy.md` 1:1 so contributors who
  know one know the other.

Behavior & contracts
--------------------

- No code change.  Documentation-only.
- The lint posture described in §2 is already in force as of
  Phase 6a's `Cargo.toml` (5 deny-level + 7 warn-level
  clone-family lints).  This commit codifies the *contract* behind
  the lints, not their values.
- The decisions log in §7 records PRs #281 / #282 / #283 / #284 as
  the cat-δ refactors that have landed; future entries follow the
  same append-only convention as `panic_policy.md` §7.

Strict-lint posture
-------------------

No code change; lint posture unchanged.

refs #193
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant