Skip to content

refactor(uffs-core): extract fold_needle helper returning Cow<'_, str> (Phase 6e, refs #193)#284

Merged
githubrobbi merged 2 commits into
mainfrom
chore/phase-6e-cow-needle
May 19, 2026
Merged

refactor(uffs-core): extract fold_needle helper returning Cow<'_, str> (Phase 6e, refs #193)#284
githubrobbi merged 2 commits into
mainfrom
chore/phase-6e-cow-needle

Conversation

@githubrobbi
Copy link
Copy Markdown
Collaborator

Phase 6 sub-phase 6e deliverable per the phase-6 ownership/borrowing/allocation plan §5e + §3.5 (local plan, gitignored under /docs/dev/).

What this changes

Deduplicates the per-query needle-fold block previously duplicated between MultiDriveBackend::search (line ~571) and the free search_index function (line ~839) in crates/uffs-core/src/search/backend.rs. The new helper super::dispatch::fold_needle returns Cow<'_, str> so the case-sensitive path avoids the per-query String allocation:

Case-sensitivity Result Allocation
case_sensitive == true Cow::Borrowed(pattern) zero — reuses the caller's &str
case_sensitive == false Cow::Owned(fold.fold_into(pattern, &mut buf).to_owned()) one — irreducible (folded data must be owned)

Downstream consumers (is_path_pattern, strip_prefix, search_compact_drive, search_compact_drive_tree, search_compact_drive_regex) already accept &str and rely on Cow's Deref<Target = str> impl for transparent coercion — no call-site changes needed beyond the binding site.

Why this is the right shape

Phase 6e plan rule (§3.5): "Introduce Cow<'_, str> ONLY where genuinely ambiguous ownership exists (e.g. function takes &str but sometimes needs an owned variant for case-folding)."

This is the textbook case:

  • The function takes pattern: &str (borrowed).
  • The case-sensitive branch can keep the borrow.
  • The case-insensitive branch must own the fold buffer's contents.
  • Downstream call sites only ever read &str.

The 6d audit surfaced this as the highest-value cat-β → Cow conversion opportunity in the uffs-core::search hot path (per plan §3.4). Two other candidate sites (extract_trigram_needle in query/mod.rs:223, format_field in aggregate/finalize.rs:668) were considered but rejected:

  • extract_trigram_needle already returns the value via String storage in the caller's per-query scratch — Cow gains nothing since the caller stores it owned regardless.
  • format_field returns String for aggregation bucket sample formatting; the static-literal branches ("directory", "file", "true", "false") could be Cow but the savings are bucket-count × sample-size, not per-record. Cat-γ, KEEP.

Verification

  • just lint-pre-push ✅ (87s): every gate green — fmt, file-size, gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift, commit-subjects, vet, vet-audit-discipline, machete, typos, reuse, cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, deny, lint-ci-windows.
  • cargo nextest run -p uffs-core ✅ 817 passed, 3 skipped, 0 failed — same numbers as pre-fix.
  • Search-suite coverage: exercises both MultiDriveBackend::search (interactive TUI path) and search_index (daemon path) across case-sensitive, case-insensitive, glob, regex, tree, and match-all branches. All pass — Cow's Deref<Target = str> impl gives byte-identical behavior to the prior String shape.

Behavior & contracts

  • Public API unchanged: both search and search_index keep their existing signatures.
  • Observable behavior preserved: needle produces byte-identical content for every (case_sensitive, pattern) input. All downstream needle.strip_prefix, &needle slice borrows, and pattern matches retain their semantics under Cow's Deref<Target = str> impl.
  • Helper is pub(super) — internal to the search module tree, not part of the public surface.

Strict-lint posture

  • No suppression hacks introduced.
  • Helper uses idiomatic Cow pattern with elided lifetimes (per clippy::elidable_lifetime_names pedantic guidance).
  • alloc::borrow::Cow import matches the workspace convention (compact_cache.rs:28 precedent) and satisfies clippy::std_instead_of_alloc = "deny".
  • No new #[allow] / #[expect] reasons added anywhere.

Allocation savings

One String allocation (≈ 10–100 bytes typical pattern) per case-sensitive search query. Per-query overhead is small but real, and the helper now centralises the fold logic so any future refinement (e.g. taking ownership of the fold buffer directly via String::from_utf8 to skip the second copy in the case-insensitive path) lands in one place rather than two.

Wall-clock impact will be quantified by sub-phase 6g (bench refresh on mft_read, query, search_benchmarks).

refs #193

…> (Phase 6e)

Deduplicates the per-query needle-fold block previously duplicated
between `MultiDriveBackend::search` (line ~571) and `search_index`
(line ~839) in `crates/uffs-core/src/search/backend.rs`.  The new
helper `super::dispatch::fold_needle` returns
`Cow<'_, str>` so the case-sensitive path avoids the per-query
`String` allocation:

  - case-sensitive   → `Cow::Borrowed(pattern)`  (zero alloc)
  - case-insensitive → `Cow::Owned(fold.fold_into(pattern, &mut buf).to_owned())`
                       (one alloc for the folded result — irreducible)

Downstream consumers (`is_path_pattern`, `strip_prefix`,
`search_compact_drive`, `search_compact_drive_tree`,
`search_compact_drive_regex`) already accept `&str` and rely on
[`Cow`]'s `Deref<Target = str>` impl for transparent coercion — no
call-site changes needed beyond the binding site.

Phase 6 sub-phase 6e — Cow<'_, str> expansion per the phase-6
ownership/borrowing/allocation plan §3.5.  Plan rule: introduce
`Cow<'_, str>` only where genuinely ambiguous ownership exists
(i.e. the API takes `&str` but sometimes needs an owned variant for
case-folding).  This is the textbook case.

Allocation savings
------------------

One `String` allocation per case-sensitive search query (~10–100
bytes typical pattern).  Per-query overhead is small but real, and
the helper now centralises the fold logic so any future refinement
(e.g. taking ownership of the fold buffer directly via
`String::from_utf8` to skip the second copy in the case-insensitive
path) lands in one place rather than two.

Verification
------------

- `cargo check -p uffs-core` ✅
- `cargo clippy -p uffs-core --all-targets` ✅ (strict-lint posture
  unchanged — `alloc::borrow::Cow` over `std::borrow::Cow` per the
  workspace's `std_instead_of_alloc = "deny"` rule; lifetime elided
  per `elidable_lifetime_names` pedantic guidance)
- `cargo fmt -p uffs-core` ✅
- `cargo nextest run -p uffs-core` ✅ 817/817 passed, 3 skipped
  (identical numbers to pre-fix)
- Search-suite coverage exercises both `MultiDriveBackend::search`
  (interactive TUI path) and `search_index` (daemon path) across
  case-sensitive, case-insensitive, glob, regex, tree, and match-all
  branches.  All pass.

Strict-lint posture
-------------------

No suppression hacks.  Helper uses idiomatic `Cow` pattern with
elided lifetimes; `alloc` import matches the workspace convention
(see `compact_cache.rs:28` for the existing precedent).

Behavior & contracts
--------------------

- Public API unchanged: both `search` and `search_index` keep their
  existing signatures.
- Observable behavior preserved: `needle` produces byte-identical
  output for every (case_sensitive, pattern) input.  All downstream
  `needle.strip_prefix`, `&needle` slice borrows, and pattern matches
  retain their semantics under `Cow`'s `Deref<Target = str>` impl.
- Helper is `pub(super)` — internal to the `search` module tree.

refs #193
@githubrobbi githubrobbi merged commit 67bde8e into main May 19, 2026
26 checks passed
@githubrobbi githubrobbi deleted the chore/phase-6e-cow-needle branch May 19, 2026 15:27
githubrobbi added a commit that referenced this pull request May 19, 2026
…s-link (Phase 6f, refs #193) (#285)

Adds the workspace allocation contract as a standalone document at
`docs/architecture/code-quality/allocation_policy.md` and a §Allocation
policy section in `CONTRIBUTING.md` summarising the rule and linking
to the new doc.

Mirrors the Phase 5e `panic_policy.md` pattern (PR #278): a single
contract page captures the rule, the five-category decision tree
(α / β / γ / δ / ε), the lint posture, the per-site annotation
contract, the audit cadence, the workspace cross-references, and the
append-only decisions log.

Phase 6 sub-phase 6f deliverable per the phase-6 ownership /
borrowing / allocation plan §5f.

Contents
--------

The new `allocation_policy.md` documents:

1. The one-line rule: hot paths never allocate defensively; cold
   paths may; every prod `.clone()` / `format!()` / `to_owned()`
   must fit one of five blessed categories.
2. The five categories with patterns, verdicts, and worked
   examples from the workspace:
   - **α — Arc clone**:    KEEP (explicit `Arc::clone(&x)` form)
   - **β — Ownership fence**: KEEP (caller has `&T`, API needs `T`)
   - **γ — Error / log context**: KEEP (cold path)
   - **δ — Hot-path anti-pattern**: FIX (refactor, never suppress)
   - **ε — Test helper**:   KEEP (out of scope)
3. The lint posture: 5 clone-family lints at `deny` level
   (`redundant_clone`, `clone_on_ref_ptr`, `cloned_instead_of_copied`,
   `inefficient_to_string`, `unnecessary_to_owned`) plus the 7
   companion lints at `warn`.
4. The per-site annotation contract: every prod allocation site
   must satisfy one of three justification shapes (self-evident
   α, β/γ reason text, or δ refactor comment).
5. The audit cadence: `bash scripts/dev/clone_alloc_audit.sh` (the
   helper shipped in PR #281) produces the workspace inventory;
   absolute counts must not regress.
6. Workspace cross-references between this doc, `panic_policy.md`,
   `lint-posture.md`, `Cargo.toml [workspace.lints]`, and
   `clippy.toml`.
7. Append-only decisions log capturing every cat-δ refactor that
   has landed in Phase 6 so future contributors can trace the
   policy's evolution.

`CONTRIBUTING.md` gains a §Allocation policy section directly below
the existing §Panic policy section, summarising the rule + five
categories in roughly 200 words and linking to the full doc.

Verification
------------

- `just lint-pre-push` ✅ (95s): every gate green (fmt, file-size,
  gates-drift, hooks-drift, workflow-drift, fast-drift, manifest-drift,
  commit-subjects, vet, vet-audit-discipline, machete, typos, reuse,
  cargo-check, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests,
  tests, smoke, deny, lint-ci-windows).
- Markdown links validated by the `typos` + `reuse` gates plus a
  manual check against `docs/architecture/code-quality/` (existing
  `panic_policy.md` and `lint-posture.md` cross-links resolve).
- Doc structure mirrors `panic_policy.md` 1:1 so contributors who
  know one know the other.

Behavior & contracts
--------------------

- No code change.  Documentation-only.
- The lint posture described in §2 is already in force as of
  Phase 6a's `Cargo.toml` (5 deny-level + 7 warn-level
  clone-family lints).  This commit codifies the *contract* behind
  the lints, not their values.
- The decisions log in §7 records PRs #281 / #282 / #283 / #284 as
  the cat-δ refactors that have landed; future entries follow the
  same append-only convention as `panic_policy.md` §7.

Strict-lint posture
-------------------

No code change; lint posture unchanged.

refs #193
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant