chore(scripts): add concurrency_audit.sh — Phase 10a baseline tool #303

Merged
githubrobbi merged 1 commit into main from chore/concurrency-audit-phase-10a on May 20, 2026

@githubrobbi
Collaborator

Summary

Phase 10a of the playbook Phase-10 effort (issue #302). Adds scripts/dev/concurrency_audit.sh — the workspace concurrency / async / shared-state baseline tool. Mirrors the shape of scripts/dev/build_codegen_audit.sh (Phase 9a).

What it does

Walks every workspace member and emits, per crate, the 7-dimension async/concurrency inventory called out by playbook §1082-1146 and Phase 10 plan §7:

  1. tokio::spawn / detached tasks — every call-site for Phase-10c hand-audit.
  2. Locks held across .await — literal .read/.write/.lock().await sites for Phase-10b hand-audit.
  3. Blocking IO inside async — files containing both async fn AND std::fs::* / std::thread::sleep (Phase-10f candidates).
  4. Arc<Mutex<…>> patterns — flat + multi-layer nesting (Arc<Mutex<Arc<…>>> separately flagged).
  5. Missing timeouts — IO/network/IPC await sites (.connect/.read_exact/.write_all/.recv/.accept/...) that need a tokio::time::timeout enclosure.
  6. Missing cancellation handling — spawn sites whose closure body (next 50 lines) lacks select! / CancellationToken / .cancelled() keywords.
  7. Unbounded channels — every unbounded_channel() / broadcast::channel(...) site for Phase-10d backpressure audit.
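Dimension 3 is a file-level intersection, which can be sketched with plain grep. This is a minimal illustration on a throwaway fixture, not the script's actual helper: the file names, the temp directory, and the use of grep in place of rg are all assumptions.

```shell
set -uo pipefail

# Hypothetical fixture: three tiny Rust files in a temp directory.
tmp=$(mktemp -d)
printf 'async fn load() { let _ = std::fs::read("a"); }\n' > "$tmp/mixed.rs"
printf 'async fn ping() {}\n'                              > "$tmp/async_only.rs"
printf 'fn load() { let _ = std::fs::read("a"); }\n'       > "$tmp/sync_only.rs"

# Step 1: files that contain `async fn`.
async_files=$(grep -lE 'async fn' "$tmp"/*.rs)

# Step 2: of those, files that also mention std::fs:: or std::thread::sleep.
candidates=$(printf '%s\n' "$async_files" | xargs grep -lE 'std::fs::|std::thread::sleep')

# Count non-empty lines so an empty result is reported as 0.
candidate_count=$(printf '%s\n' "$candidates" | grep -c '^.')
echo "blocking-IO-in-async candidate files: $candidate_count"

rm -rf "$tmp"
```

Only mixed.rs survives both filters. Because the match is per file rather than per function, a hit marks the file as a candidate for hand-audit and does not prove that the blocking call sits inside an async body.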

Default mode runs in ~6 s (pure rg + awk; no cargo invocation). --with-cargo mode also runs cargo build --workspace --tests + cargo clippy --workspace --tests -- -W clippy::await_holding_lock as a Phase-10b enforcement-mode preview.

Baseline at this SHA (prod-only)

| Dimension | Count | Phase that consumes it |
|---|---|---|
| async fn + async blocks | 278 | (info) |
| tokio::spawn( call sites | 27 | 10c |
| spawn_blocking call sites | 53 | 10f (verification) |
| std::sync::Mutex/RwLock | 22 | (info) |
| tokio::sync::* async locks | 6 | 10b |
| Arc<Mutex<…>> / Arc<RwLock<…>> | 1 | (info; flat, no smell) |
| Bounded channels | 5 | (info) |
| Unbounded channels | 4 | 10d |
| tokio::time::timeout( sites | 10 | 10e |
| Lock-across-await candidate sites | 36 | 10b (primary target) |
| Blocking-IO-in-async candidate files | 14 | 10f |
| Spawn sites lacking nearby cancellation kw | 24 | 10c |
| #[tokio::test] sites (test code) | 127 | (info) |

The headline finding is 36 lock-across-await candidate sites in uffs-daemon (34) + uffs-mcp (2) — significantly more than the initial 7-site estimate in the plan recon. Each will be hand-audited in Phase 10b for "guard held across an inner .await" (the hazard) vs "guard acquired with .await and dropped before the next .await" (legitimate).
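To see why every candidate needs a hand-audit, the sketch below runs a literal regex of the kind described over a small fixture. The regex, the fixture functions (hazard, fine, other, other_with) and the use of grep are illustrative assumptions; the script's real pattern may differ.

```shell
set -uo pipefail

tmp=$(mktemp -d)
# Hypothetical Rust fixture, used only as text to search (never compiled).
cat > "$tmp/locks.rs" <<'EOF'
async fn hazard(state: &tokio::sync::RwLock<u32>) {
    let guard = state.write().await;   // guard stays alive ...
    other().await;                     // ... across this inner await
    drop(guard);
}
async fn fine(state: &tokio::sync::Mutex<u32>) {
    let value = *state.lock().await;   // temporary guard dropped at end of statement
    other_with(value).await;
}
EOF

# Literal single-line match on `.read().await`, `.write().await`, `.lock().await`.
candidate_sites=$(grep -cE '\.(read|write|lock)\(\)\.await' "$tmp/locks.rs")
echo "lock-across-await candidate sites: $candidate_sites"

rm -rf "$tmp"
```

Both functions produce one match each, so the count is 2 even though only hazard holds its guard across another await. A line-oriented regex cannot tell the two apart, which is consistent with treating the 36 sites as candidates rather than confirmed defects.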

Detection caveats (documented in the script preamble + per-helper comments)

  • Lock-across-await uses literal regex. Multi-line guard-then-await patterns (let g = lock(); g.foo(); other.await; drop(g);) require Phase-10b hand-audit to confirm.
  • Cancellation-keyword regex uses word boundaries. Bare cancel would match cancel_tx false-positively; we require CancellationToken / cancellation_token / .cancelled() / is_cancelled / select! / abort_signal / recv_cancel.
  • Arc<Mutex<…>> output filters doc-comment lines. Rustdoc prose referencing the pattern doesn't inflate the count.
  • Counter uses grep -c '^.' (not grep -c .) so echo '' doesn't falsely count as 1 (this was caught + fixed during the script's own smoke-test).
  • Glob excludes use !**/tests/** (not bare !tests/**) — UFFS has in-tree test modules under src/.../tests/ which the bare pattern would not exclude.
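The empty-result counting pitfall can be reproduced in a few lines. This sketch compares grep -c '^.' against wc -l, chosen here as an assumed stand-in for a naive line counter because its behaviour on empty input is easy to verify; it does not claim to reproduce the exact bug found during the script's smoke test.

```shell
# Stand-in for a search that matched nothing (hypothetical variable name).
empty_result=''

# echo always appends a newline, so a plain line count sees one (empty) line.
naive=$(echo "$empty_result" | wc -l | tr -d ' ')

# grep -c '^.' counts only lines that contain at least one character.
robust=$(echo "$empty_result" | grep -c '^.')

# Sanity check: real hits are still counted one per line.
two=$(printf 'a.rs:1:hit\nb.rs:9:hit\n' | grep -c '^.')

echo "wc -l: $naive, grep -c '^.': $robust, two hits: $two"
```

One side effect worth knowing: grep -c exits with status 1 when the count is 0, so a counter built this way would abort a script running under set -e unless the status is handled.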

Why this is Phase 10a (audit-tool first)

Mirroring the established Phase 6a / 7a / 8a / 9a cadence:

  1. Build the audit tool first.
  2. Run it to capture the precise baseline (output saved locally to docs/dev/baseline/2026-05-19/phase_10_concurrency_baseline.md).
  3. Use the baseline numbers to seed the per-sub-phase hand-audits (10b lock-across-await, 10c task ownership, 10d backpressure, 10e timeout coverage, 10f blocking IO).
  4. Phase 10g consumes all hand-audit outputs into concurrency_policy.md + per-crate # Concurrency rustdoc.

Rule-1 adherence

Zero #[allow] introductions. The script sets set -uo pipefail, matching the sibling audit scripts. The one shellcheck info-level note (SC2016, "expressions don't expand in single quotes") is intentional: the affected lines emit Markdown backticks literally, not shell variables.

Cross-references

  • Issue: [playbook-phase-10] Async, concurrency, and shared state discipline #302
  • Plan (local): docs/dev/architecture/code_clean/phase_10_async_concurrency_shared_state_implementation_plan.md
  • Sibling audit scripts: scripts/dev/build_codegen_audit.sh (Phase 9a — same shape), scripts/dev/feature_dep_audit.sh (Phase 8a), scripts/dev/trait_generic_audit.sh (Phase 7a), scripts/dev/clone_alloc_audit.sh (Phase 6a).
  • Playbook source: world_class_rust_workspace_refactor_playbook.md §1082-1146 (local-only).

Verification

  • bash -n scripts/dev/concurrency_audit.sh → SYNTAX OK
  • shellcheck scripts/dev/concurrency_audit.sh → only SC2016 info (intentional, see above)
  • scripts/dev/concurrency_audit.sh > /tmp/baseline.md → 316-line Markdown report in ~6 s
  • All 13 pre-push gates green (file-size, typos, reuse, cargo-check, lint-ci, lint-ci-no-default, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, lint-ci-windows).

Next steps (queued)

  • 10b — hand-audit the 36 lock-across-await candidate sites.
  • 10c — hand-audit the 24 spawn-sites-without-cancellation-keywords.
  • 10d — justify or convert the 4 unbounded channels.
  • 10e — timeout coverage map.
  • 10f — blocking-IO-in-async hand-audit (14 candidate files).
  • 10g — concurrency_policy.md + per-crate # Concurrency rustdoc.
  • 10h — CONTRIBUTING cross-link + final report + close [playbook-phase-10] Async, concurrency, and shared state discipline #302.

Walks every workspace member and emits, per crate, the 7-dimension
async/concurrency inventory called out by playbook §1082-1146 and
Phase 10 plan §7:

  1. tokio::spawn / detached tasks — every call-site for hand-audit.
  2. Locks held across .await — literal .read/.write/.lock().await
     sites for Phase-10b hand-audit.
  3. Blocking IO inside async — files containing both async fn and
     std::fs::* / std::thread::sleep (Phase-10f candidates).
  4. Arc<Mutex<...>> patterns — flat + multi-layer nesting.
  5. Missing timeouts — IO/network/IPC await sites that need a
     tokio::time::timeout enclosure.
  6. Missing cancellation handling — spawn sites whose closure body
     (next 50 lines) lacks select! / CancellationToken / cancelled()
     keywords.
  7. Unbounded channels — every unbounded_channel() / broadcast::channel()
     site for Phase-10d backpressure audit.
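The forward-window check in item 6 can be sketched with awk. This is an illustration under assumptions, not the script's code: the fixture, the variable names, and the three-line window (shortened from the 50 lines described above so the fixture stays small) are all hypothetical.

```shell
set -uo pipefail

tmp=$(mktemp -d)
# Hypothetical Rust fixture, used only as text to scan (never compiled).
cat > "$tmp/tasks.rs" <<'EOF'
fn start(token: Token) {
    tokio::spawn(async move {
        tokio::select! {
            _ = token.cancelled() => {}
            _ = work() => {}
        }
    });
    tokio::spawn(async move {
        work().await;
    });
}
EOF

window=3   # shortened for the fixture; the description above uses 50

# For each spawn line, scan that line plus the next $window lines for a keyword.
# Print the line number of every spawn site where none is found.
missing=$(awk -v window="$window" '
  { line[NR] = $0 }
  END {
    for (i = 1; i <= NR; i++) {
      if (line[i] !~ /tokio::spawn\(/) continue
      found = 0
      for (j = i; j <= i + window && j <= NR; j++)
        if (line[j] ~ /select!|CancellationToken|\.cancelled\(\)/) found = 1
      if (!found) print i
    }
  }' "$tmp/tasks.rs")

missing_count=$(printf '%s\n' "$missing" | grep -c '^.')
echo "spawn sites lacking nearby cancellation keywords: $missing_count (line $missing)"

rm -rf "$tmp"
```

The first spawn (line 2) is cleared by the select! on line 3; the second (line 8) is flagged. A fixed forward window has a known blind spot: keywords belonging to a later spawn inside the same window can clear an earlier one, so the output is a list of candidates for hand-audit.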

Mirrors the shape of scripts/dev/build_codegen_audit.sh (Phase 9a):
shebang + SPDX header, --with-cargo flag (runs cargo build --tests +
cargo clippy -W clippy::await_holding_lock), workspace-root detection,
RG_PROD_GLOBS filter, Markdown report to stdout.

Prod-only filter excludes test code (tests/, benches/, examples/,
tests.rs / *_tests.rs / *_test.rs / test_*.rs).  Note: the **/
recursive prefix is required for directory excludes because UFFS has
in-tree test modules under src/.../tests/ (the canonical Rust pattern)
in addition to top-level crates/*/tests/.  Without the prefix,
!tests/** would only match the top-level path.

Detection caveats are documented inline:
  * Lock-across-await uses literal regex; multi-line guard-then-await
    patterns require Phase-10b hand-audit.
  * Cancellation-keyword regex uses word boundaries (e.g. requires
    CancellationToken / cancelled() / abort_signal, NOT bare 'cancel'
    which would match cancel_tx false-positively).
  * Arc<Mutex<...>> output filters doc-comment / block-comment lines
    so rustdoc prose doesn't inflate the count.
  * Counter uses grep -c '^.' (not grep -c .) so echo '' doesn't
    falsely count as 1.
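The cancel_tx false positive is easy to demonstrate. The pattern below is a sketch assembled from the keywords listed in the PR description, not the script's exact regex, and it assumes a grep whose -E syntax accepts \b (GNU grep does); the two sample lines are hypothetical.

```shell
# Bare keyword versus a word-bounded keyword list.
bare='cancel'
strict='\b(CancellationToken|cancellation_token|is_cancelled|abort_signal|recv_cancel)\b|\.cancelled\(\)|select!'

# A channel-naming line that has nothing to do with cancellation handling ...
noise='let (cancel_tx, cancel_rx) = mpsc::channel(1);'
# ... and a line that really checks for cancellation.
real='if token.is_cancelled() { return; }'

bare_on_noise=$(printf '%s\n' "$noise" | grep -cE "$bare")
strict_on_noise=$(printf '%s\n' "$noise" | grep -cE "$strict")
strict_on_real=$(printf '%s\n' "$real" | grep -cE "$strict")

echo "bare/noise=$bare_on_noise strict/noise=$strict_on_noise strict/real=$strict_on_real"
```

The bare pattern counts the channel line as a hit, while the stricter list ignores it and still matches the real check.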

Baseline at SHA ff8b897 (Phase 10 entry — prod-only counts):
  * 278 async fn + blocks across 5 crates (daemon=132, mft=74,
    mcp=46, client=25, core=1).
  * 27 tokio::spawn( sites (daemon=23, mft=3, client=1).
  * 53 spawn_blocking sites (daemon=36, mft=15, core=1, client=1).
  * 22 std::sync::Mutex/RwLock + 6 tokio::sync::* + 1 Arc<Mutex<>>.
  * 36 lock-across-await candidate sites (daemon=34, mcp=2) —
    primary Phase-10b audit target.
  * 5 bounded + 4 unbounded channels — Phase-10d target.
  * 10 prod-code tokio::time::timeout sites — Phase-10e target.
  * 24 of 27 spawn sites lack nearby cancellation keywords —
    Phase-10c target.
  * 14 blocking-IO-in-async candidate files — Phase-10f target.

Runs in ~6 s (pure rg + awk; no cargo invocation in default mode).

Refs #302.
@githubrobbi githubrobbi merged commit 2144a58 into main May 20, 2026
19 checks passed
@githubrobbi githubrobbi deleted the chore/concurrency-audit-phase-10a branch May 20, 2026 01:10

Development

Successfully merging this pull request may close these issues.

[playbook-phase-10] Async, concurrency, and shared state discipline

1 participant