fix(uffs-core): drop per-row format!() allocations in path-resolver + result-conversion hot paths (Phase 6d, refs #193)#283
Merged
Conversation
…paths (Phase 6d)
Two surgical hot-path category-δ fixes per
`docs/dev/architecture/code_clean/phase_6_ownership_borrowing_allocation_implementation_plan.md`
§3.4. Both run inside per-row `map` closures over search result
rows; each row previously incurred an unconditional small `String`
allocation that the new code avoids.
1) `crates/uffs-core/src/path_resolver/fast.rs:add_path_column_with_dir_suffix`
----------------------------------------------------------------
Replaced the per-row `format!("{parent_path}\\{file_name}")` /
`format!("{parent_path}{file_name}")` pair with an in-place
mutation of the already-owned `parent_path: String` (the result
of `self.resolve(frs_val)`). The new shape:
let mut path = parent_path;
if file_name != "." {
if !path.ends_with('\\') {
path.push('\\');
}
path.push_str(file_name);
}
reuses `parent_path`'s heap buffer (`push_str` grows in-place when
capacity permits) instead of allocating a fresh `String` and
dropping the old one each row. Typical Win32 paths have ample
headroom on the resolver's String capacity so re-allocation is
rare in practice.
2) `crates/uffs-core/src/search/dataframe_convert.rs:display_rows_to_dataframe`
----------------------------------------------------------------
Replaced `let drives: Vec<String> = rows.iter().map(|row|
format!("{}:", row.drive)).collect()` with a `Vec<&'static str>`
sourced from a 26-entry const table indexed by
`DriveLetter::alphabet_index`:
const DRIVE_LABELS: [&str; 26] = ["A:", "B:", ..., "Z:"];
let drives: Vec<&str> = rows.iter()
.map(|row| {
DRIVE_LABELS
.get(row.drive.alphabet_index())
.copied()
.unwrap_or("?:")
})
.collect();
Each label is a `&'static str` carved into rodata; the `Vec<&str>`
carries no heap copies. Polars'/Arrow's column-builder still
copies into a contiguous string buffer downstream, but the upstream
per-row `String` allocation + free pair is eliminated. The new
pattern matches the existing `Vec<&str>` shape already used for
the `name`, `path`, and `path_only` columns three lines above.
The `.unwrap_or("?:")` fallback is unreachable in practice
(`DriveLetter::alphabet_index` returns 0..=25 by construction —
parse() validates A-Z) but satisfies the workspace
`clippy::indexing_slicing = "deny"` lint without needing a
per-site `#[expect]`.
Verification
------------
- `cargo check -p uffs-core` ✅
- `cargo clippy -p uffs-core --all-targets` ✅ (no new lint findings)
- `cargo nextest run -p uffs-core` ✅ 817/817 passed, 3 skipped
(same numbers as pre-fix; identical behavior)
- All path-resolver tests (`add_path_column_with_dir_suffix_*`,
`add_path_column_*`, `add_path_column_parallel_*`) and result
conversion tests (`display_rows_to_dataframe_*`,
`results_to_dataframe_*`) pass — observable behavior is
byte-identical.
Strict-lint posture
-------------------
No suppression hacks introduced. Both edits replace formatting
allocations with idiomatic Rust patterns (`push_str` mutation;
static-table lookup via `get().copied()`). Cross-references to
Phase 6d sub-phase added in inline doc comments.
Behavior & contracts
--------------------
- Public API unchanged: `add_path_column_with_dir_suffix` and
`display_rows_to_dataframe` keep their existing signatures and
return types.
- Observable behavior preserved: paths render with the same
separator handling, drive labels render exactly as `<letter>:` for
A..Z (the previous `format!` would produce identical output for
the construction-valid range).
- For the fallback path (a hypothetical out-of-range
`DriveLetter`, which cannot occur per `parse()` invariants), the
new code yields `"?:"` versus the previous code's `<letter>:`.
Since this branch is unreachable, the user-visible difference is
zero.
Allocation savings
------------------
Per N-row search result (typical N = 1k-100k):
- Site 1: N small `String` allocations + N drops (replaced by N
in-place `push_str` operations that reuse the `parent_path`
buffer).
- Site 2: N tiny (2-byte) `String` allocations + N drops + 1
`Vec<String>` allocation (replaced by 1 `Vec<&str>` allocation;
the labels live in rodata).
Wall-clock impact will be quantified by sub-phase 6g (bench
refresh on `mft_read`, `query`, `search_benchmarks`).
refs #193
githubrobbi
added a commit
that referenced
this pull request
May 19, 2026
…s-link (Phase 6f, refs #193) (#285) Adds the workspace allocation contract as a standalone document at `docs/architecture/code-quality/allocation_policy.md` and a §Allocation policy section in `CONTRIBUTING.md` summarising the rule and linking to the new doc. Mirrors the Phase 5e `panic_policy.md` pattern (PR #278): a single contract page captures the rule, the five-category decision tree (α / β / γ / δ / ε), the lint posture, the per-site annotation contract, the audit cadence, the workspace cross-references, and the append-only decisions log. Phase 6 sub-phase 6f deliverable per the phase-6 ownership / borrowing / allocation plan §5f. Contents -------- The new `allocation_policy.md` documents: 1. The one-line rule: hot paths never allocate defensively; cold paths may; every prod `.clone()` / `format!()` / `to_owned()` must fit one of five blessed categories. 2. The five categories with patterns, verdicts, and worked examples from the workspace: - **α — Arc clone**: KEEP (explicit `Arc::clone(&x)` form) - **β — Ownership fence**: KEEP (caller has `&T`, API needs `T`) - **γ — Error / log context**: KEEP (cold path) - **δ — Hot-path anti-pattern**: FIX (refactor, never suppress) - **ε — Test helper**: KEEP (out of scope) 3. The lint posture: 5 clone-family lints at `deny` level (`redundant_clone`, `clone_on_ref_ptr`, `cloned_instead_of_copied`, `inefficient_to_string`, `unnecessary_to_owned`) plus the 7 companion lints at `warn`. 4. The per-site annotation contract: every prod allocation site must satisfy one of three justification shapes (self-evident α, β/γ reason text, or δ refactor comment). 5. The audit cadence: `bash scripts/dev/clone_alloc_audit.sh` (the helper shipped in PR #281) produces the workspace inventory; absolute counts must not regress. 6. Workspace cross-references between this doc, `panic_policy.md`, `lint-posture.md`, `Cargo.toml [workspace.lints]`, and `clippy.toml`. 7. Append-only decisions log capturing every cat-δ refactor that has landed in Phase 6 so future contributors can trace the policy's evolution. `CONTRIBUTING.md` gains a §Allocation policy section directly below the existing §Panic policy section, summarising the rule + five categories in roughly 200 words and linking to the full doc. Verification ------------ - `just lint-pre-push` ✅ (95s): every gate green (fmt, file-size, gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift, commit-subjects, vet, vet-audit-discipline, machete, typos, reuse, cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, deny, lint-ci-windows). - Markdown links validated by the `typos` + `reuse` gates plus a manual check against `docs/architecture/code-quality/` (existing `panic_policy.md` and `lint-posture.md` cross-links resolve). - Doc structure mirrors `panic_policy.md` 1:1 so contributors who know one know the other. Behavior & contracts -------------------- - No code change. Documentation-only. - The lint posture described in §2 is already in force as of Phase 6a's `Cargo.toml` (5 deny-level + 7 warn-level clone-family lints). This commit codifies the *contract* behind the lints, not their values. - The decisions log in §7 records PRs #281 / #282 / #283 / #284 as the cat-δ refactors that have landed; future entries follow the same append-only convention as `panic_policy.md` §7. Strict-lint posture ------------------- No code change; lint posture unchanged. refs #193
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase 6 sub-phase 6d deliverable per
phase_6_ownership_borrowing_allocation_implementation_plan.md§5d + §3.4 (local plan, gitignored under/docs/dev/).What this changes
Two surgical hot-path category-δ fixes inside
uffs-core:crates/uffs-core/src/path_resolver/fast.rs:add_path_column_with_dir_suffixformat!("{parent_path}{file_name}")/format!("{parent_path}\\{file_name}")with in-placepush_stron the already-ownedparent_path: Stringcrates/uffs-core/src/search/dataframe_convert.rs:display_rows_to_dataframeformat!("{}:", row.drive)(Vec<String>) with a static[&str; 26]table indexed byDriveLetter::alphabet_index()(Vec<&'static str>)Both run inside per-row
mapclosures over search-result rows; each row previously incurred an unconditional smallStringallocation that the new code avoids.Why this is the right shape
Phase 6d — hot-path
format!/to_stringaudit — sweepsuffs-mft::io::parser,uffs-mft::parse,uffs-core::search,uffs-core::path_resolver, anduffs-core::aggregatefor the per-record / per-row formatting hot spots flagged in plan §1.2 (HIGH priority).The audit produces a small surgical-fix surface:
uffs-mft::io::parser— zeroformat!/.to_string()calls in the parse hot path (columns.rs,merger.rs,zero_alloc.rs,dataframe_build.rs). The 205format!()count from the 6a baseline lives in cold paths (parse-error variants, log messages, USN status, IOCP error context). Plan §0.2 risk-register row predicted this.uffs-core::path_resolver::fast.rs— two per-rowformat!calls inadd_path_column_with_dir_suffix, fixed by this PR.uffs-core::search::dataframe_convert.rs— one per-rowformat!for drive label, fixed by this PR.uffs-core::path_resolver::legacy.rs:104—format!("{}:\\{}", drive, components.join("\\"))is irreducible (must allocate one owned String result per resolve); cat-β KEEP.uffs-core::search::query::mod.rs,backend.rs—pattern.to_owned()/needle.to_owned()sites are once-per-query (query prep), cat-β KEEP. Candidate for §6eCow<'_, str>expansion.uffs-core::aggregate::finalize.rs:675-705—format_fieldcalled per (bucket × sample size), not per record. Aggregation framework only samples top-K records per bucket. Marginal cat-γ, KEEP.<UNKNOWN_0x...>/<DELETED:...>/<EXT:...>/<CORRUPT:...>placeholders inparse/{direct_index,direct_index_extension,forensic/base,forensic/extension,forensic,full,placeholders}.rs— conditional formatting that fires only on unknown / deleted / extension / corrupt / placeholder records. None fire on every record in the parse hot path. Cat-γ KEEP.Verification
just lint-pre-push✅ (96s): every gate green — fmt, file-size, gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift, commit-subjects, vet, vet-audit-discipline, machete, typos, reuse, cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, deny, lint-ci-windows.cargo nextest run -p uffs-core✅ 817 passed, 3 skipped, 0 failed — same numbers as pre-fix.add_path_column_with_dir_suffix_*,add_path_column_*,add_path_column_parallel_*(path-resolver suite);display_rows_to_dataframe_*,results_to_dataframe_*(result-conversion suite). All pass — observable behavior is byte-identical.Behavior & contracts
add_path_column_with_dir_suffixanddisplay_rows_to_dataframekeep their existing signatures and return types.parent_path.ends_with('\\')true and false branches, plus thefile_name == "."root-of-volume case).<letter>:for A..Z — the previousformat!produced identical output for the construction-valid range.DriveLetter, which cannot occur perparse()invariants), the new code yields"?:"versus the previous code's<letter>:. Since this branch is unreachable by construction, the user-visible difference is zero.Strict-lint posture
push_strmutation; static-table lookup viaget().copied())..unwrap_or("?:")fallback indisplay_rows_to_dataframesatisfiesclippy::indexing_slicing = "deny"without needing a per-site#[expect].Allocation savings
Per N-row search result (typical N = 1k–100k):
add_path_column_with_dir_suffix): N smallStringallocations + N drops (replaced by N in-placepush_stroperations that reuse theparent_pathbuffer).display_rows_to_dataframe): N tiny (2-byte)Stringallocations + N drops + 1Vec<String>allocation (replaced by 1Vec<&str>allocation; labels live in rodata).Wall-clock impact will be quantified by sub-phase 6g (bench refresh on
mft_read,query,search_benchmarks).refs #193