Avoid queued maintenance prune locks #349
📝 Walkthrough
This PR extends the queue-storage receipt-plane with per-claim-slot rescue cursors for both stale and deadline-based rescue scanning. It introduces migrations v033–v035 to install cursor columns and materialize terminal-delete closure evidence, refactors stale-receipt rescue from full-table scans to cursor-driven per-slot scanning with configurable batch limits, replaces …
Changes: Receipt-Plane Cursor & Maintenance Locks
5k/s offered-load check after rebase on #350
Run:
Shape:
Results:
Wait-event shape during
Interpretation:
This is evidence that #349 fixes the queued-prune-lock failure mode, but it is not evidence that a single logical queue can sustainably absorb 5k/s end-to-end on this machine. The next throughput lever is still queue fanout/striping or a deeper claim/receipt batching redesign that reduces per-completion durable writes.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4373e0e8e
ORDER BY claims.claimed_at, claims.job_id, claims.run_lease
LIMIT $3
Avoid stranding stale claims behind a fresh blocker
When a healthy long-running receipt claim sits just after the rescue cursor, this `LIMIT $3` confines every rescue pass to the same first 10k rows, because the cursor advancement below stops before the first non-advanceable row. Any stale open claims beyond that window are never considered until the healthy blocker closes or becomes stale, so large claim partitions can leave jobs stuck in `running` even though maintenance keeps firing. The new test only covers a stale tail within the limited window.
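The stranding scenario described above can be modeled in a few lines. This is a simplified in-memory sketch, not the SQL in `queue_storage.rs`: `Claim`, `rescue_pass`, and `stale_remaining` are hypothetical names, and the window size stands in for the `LIMIT $3` bound.

```rust
/// Simplified model of one receipt claim, ordered by claim time.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Claim {
    /// Closed claim: the cursor may move past it.
    Closed,
    /// Open and stale: rescue closes it, after which the cursor may pass.
    Stale,
    /// Open and healthy (long-running): the cursor must stop before it.
    Fresh,
}

/// One rescue pass (hypothetical model): inspect up to `limit` rows starting
/// at `cursor`, rescue stale rows inside that window, then advance the cursor
/// only up to the first row that cannot be passed. Returns the new cursor.
fn rescue_pass(claims: &mut [Claim], cursor: usize, limit: usize) -> usize {
    let end = (cursor + limit).min(claims.len());
    for claim in &mut claims[cursor..end] {
        if *claim == Claim::Stale {
            *claim = Claim::Closed;
        }
    }
    let mut next = cursor;
    while next < end && claims[next] != Claim::Fresh {
        next += 1;
    }
    next
}

/// Number of stale claims that are still open.
fn stale_remaining(claims: &[Claim]) -> usize {
    claims.iter().filter(|c| **c == Claim::Stale).count()
}
```

With a fresh claim at position 0 and a window of 3, repeated passes never reach stale claims at positions 3 and 4, because the cursor cannot move and the window never shifts. Widening the window, or scanning past the blocker without advancing the cursor, lets the same pass rescue them.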
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
awa-model/src/queue_storage.rs (1)
11664-11673: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Only map actual NOWAIT lock contention to `Blocked`.
These three branches currently treat any `LOCK TABLE ... NOWAIT` failure as contention. That masks real database errors behind the maintenance backoff path, so the runtime can keep reporting `Blocked` instead of surfacing a real failure.
- `awa-model/src/queue_storage.rs#L11664-L11673`: match the lock error and return `PruneOutcome::Blocked` only for `55P03`; propagate other errors with `map_sqlx_error`.
- `awa-model/src/queue_storage.rs#L11999-L12008`: apply the same `is_lock_contention_error` check before converting the failure to `Blocked`.
- `awa-model/src/queue_storage.rs#L12227-L12236`: keep the claim-prune NOWAIT semantics, but do not swallow non-lock failures into `Blocked`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8203-8213: The loop that rescues stale receipts (using
claim_slot_count() and rescue_stale_receipt_claims_for_slot_tx) can exhaust the
global remaining batch on low-numbered slots and never reach the slot that
prune_oldest_claims is waiting on; change the algorithm to identify the
prune-blocking slot first (e.g. by calling the existing prune_oldest_claims() or
the routine that returns the oldest-generation slot) and attempt rescue for that
slot before iterating other slots, subtracting its rescued count from remaining,
then iterate the remaining slots (skipping the already-handled prune-blocking
slot) so the global remaining budget cannot be fully consumed by other slots
before the prune-blocking slot is serviced; ensure you still break when
remaining==0 and preserve error handling around
rescue_stale_receipt_claims_for_slot_tx.
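The ordering change requested above can be sketched independently of the database. This is a minimal model under stated assumptions, not the repository code: `slot_visit_order` and `rescue_with_budget` are hypothetical helpers, and `backlog[slot]` stands in for the number of rescuable rows a real per-slot query would find.

```rust
/// Visit order for one rescue pass: the prune-blocking slot first (if it is
/// a valid slot index), then every other slot in numeric order.
fn slot_visit_order(slot_count: usize, preferred: Option<usize>) -> Vec<usize> {
    let preferred = preferred.filter(|slot| *slot < slot_count);
    preferred
        .into_iter()
        .chain((0..slot_count).filter(|slot| Some(*slot) != preferred))
        .collect()
}

/// Spend a global rescue budget across slots in the given order.
/// Returns how many rows were rescued from each slot.
fn rescue_with_budget(backlog: &[usize], order: &[usize], budget: usize) -> Vec<usize> {
    let mut rescued = vec![0; backlog.len()];
    let mut remaining = budget;
    for &slot in order {
        if remaining == 0 {
            break; // global budget exhausted, stop visiting slots
        }
        let taken = backlog[slot].min(remaining);
        rescued[slot] = taken;
        remaining -= taken;
    }
    rescued
}
```

For a backlog of `[10, 10, 5, 0]` and a budget of 12, numeric order spends everything on slots 0 and 1 and never reaches slot 2. Visiting slot 2 first drains its 5 rows before the rest of the budget goes to slot 0.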
---
Outside diff comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 11664-11673: awa-model/src/queue_storage.rs:11664-11673 — change
the current unconditional match-on-lock failure to inspect the sqlx error and
only return PruneOutcome::Blocked when the DB error code is the Postgres lock
contention code "55P03"; for any other error propagate it via map_sqlx_error (or
return Err(map_sqlx_error(err))) instead of swallowing it into Blocked.
awa-model/src/queue_storage.rs:11999-12008 — apply the same selective check:
treat the NOWAIT lock failure as Blocked only when the error is "55P03",
otherwise propagate the error with map_sqlx_error.
awa-model/src/queue_storage.rs:12227-12236 — for the claim-prune NOWAIT path
keep NOWAIT semantics but do not convert non-lock errors to
PruneOutcome::Blocked; detect "55P03" and return Blocked only for that code, and
for other sqlx errors propagate them using map_sqlx_error. Use a small helper or
inline check on err.as_database_error().and_then(|d| d.code()) == Some("55P03")
to identify lock contention.
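A minimal sketch of the selective mapping follows. It uses a stand-in error type so it compiles without `sqlx`: `DbError`, `classify_lock_result`, and the `Pruned` variant are hypothetical, while `55P03` is PostgreSQL's `lock_not_available` SQLSTATE. In the real code the SQLSTATE would come from the database error's `code()` as the comment suggests.

```rust
/// SQLSTATE raised when `LOCK TABLE ... NOWAIT` cannot acquire the lock.
const LOCK_NOT_AVAILABLE: &str = "55P03";

/// Outcome of a best-effort prune attempt. `Pruned` is a hypothetical
/// success variant used only for this sketch.
#[derive(Debug, PartialEq)]
enum PruneOutcome {
    Pruned,
    Blocked,
}

/// Stand-in for a database error that carries an optional SQLSTATE.
#[derive(Debug, PartialEq)]
struct DbError {
    sqlstate: Option<String>,
    message: String,
}

/// True only for genuine NOWAIT lock contention.
fn is_lock_contention_error(err: &DbError) -> bool {
    err.sqlstate.as_deref() == Some(LOCK_NOT_AVAILABLE)
}

/// Map the result of the lock statement: contention becomes `Blocked`,
/// every other failure is propagated to the caller unchanged.
fn classify_lock_result(result: Result<(), DbError>) -> Result<PruneOutcome, DbError> {
    match result {
        Ok(()) => Ok(PruneOutcome::Pruned),
        Err(err) if is_lock_contention_error(&err) => Ok(PruneOutcome::Blocked),
        Err(err) => Err(err),
    }
}
```

Keeping the check in one helper means all three prune paths share the same definition of contention, so a non-lock failure such as a missing relation surfaces as an error instead of feeding the maintenance backoff loop.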
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
correctness/storage/AwaStorageLockOrder.tla (1)
14-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Clarify NOWAIT semantics in the module header comment.
Line 14 says `ACCESS EXCLUSIVE NOWAIT`, but Line 15 says prune “waits.” With NOWAIT, the maintenance path should fail fast/back off rather than queue behind in-flight writers, so this wording is contradictory.
Suggested wording fix:
-\* - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
-\*   waits for in-flight claim/complete writes to commit before
-\*   truncating).
+\* - LOCK TABLE ACCESS EXCLUSIVE NOWAIT on the partition child (so prune
+\*   fails fast under contention instead of queueing ahead of worker
+\*   traffic before truncating).
♻️ Duplicate comments (1)
awa-model/src/queue_storage.rs (1)
8552-8567: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Prioritize the prune-blocking slot in deadline rescue too.
This loop still spends the global deadline-rescue budget in numeric slot order. Under sustained expired-deadline backlog on lower-numbered slots, `remaining` can hit zero before the oldest initialized slot is visited, which leaves `prune_oldest_claims()` stuck behind rescuable open claims even after the stale-receipt path was fixed.
♻️ Minimal fix sketch:
 async fn rescue_expired_receipt_deadlines_tx<'a>(
     &self,
     tx: &mut sqlx::Transaction<'a, sqlx::Postgres>,
 ) -> Result<Vec<DeletedLeaseRow>, AwaError> {
+    let schema = self.schema();
     let mut rescued = Vec::new();
     let mut remaining = RECEIPT_RESCUE_BATCH_LIMIT;
+    let preferred_slot: Option<i32> = sqlx::query_as::<_, (i32, i64, i32)>(&format!(
+        r#"
+        SELECT current_slot, generation, slot_count
+        FROM {schema}.claim_ring_state
+        WHERE singleton = TRUE
+        "#
+    ))
+    .fetch_optional(tx.as_mut())
+    .await
+    .map_err(map_sqlx_error)?
+    .and_then(|(current_slot, generation, slot_count)| {
+        oldest_initialized_ring_slot(current_slot, generation, slot_count)
+            .map(|(slot, _)| slot)
+            .filter(|slot| *slot >= 0 && (*slot as usize) < self.claim_slot_count())
+    });
+
+    if let Some(slot) = preferred_slot {
+        let mut slot_rescued = self
+            .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
+            .await?;
+        remaining = remaining.saturating_sub(slot_rescued.len() as i64);
+        rescued.append(&mut slot_rescued);
+        if remaining == 0 {
+            return Ok(rescued);
+        }
+    }
+
     for slot in 0..self.claim_slot_count() {
+        let slot = slot as i32;
+        if Some(slot) == preferred_slot {
+            continue;
+        }
         let mut slot_rescued = self
-            .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot as i32, remaining)
+            .rescue_expired_receipt_deadlines_for_slot_tx(tx, slot, remaining)
             .await?;
🧹 Nitpick comments (1)
awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql (1)
57-63: 💤 Low value
Lease-claim-receipts detection heuristic is fragile.
The detection relies on string matching within the function body (`position(... IN v_claim_runtime_def)`). If the function text changes format (e.g., different quoting, whitespace, or schema aliasing), this could produce false negatives. However, the `v_schema = 'awa'` fallback ensures the default schema always works, and the heuristic is only used for custom schemas where the function was explicitly defined.
This is acceptable for a migration that runs once per schema upgrade.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@awa/tests/queue_storage_runtime_test.rs`:
- Around line 8255-8256: The test currently sets idle_threshold to 120 seconds
but only ages last_claimed_at by 90 seconds, which keeps the lease on the fresh
path and prevents the stale-lease branch from being exercised. To fix this,
increase the duration used to age last_claimed_at (currently 90 seconds) to
exceed the idle_threshold value (120 seconds), so the stale-lease code path is
properly tested and the final assertion comparing after_stale and after_fresh
can hold true. This change is needed both where last_claimed_at is initially
aged and in any related assertion or verification sections.
In `@correctness/storage/MAPPING.md`:
- Line 67: The PruneLeaseSegment mapping row on line 67 of the MAPPING.md file
describes the prune operation but omits the ACCESS EXCLUSIVE NOWAIT lock
acquisition detail that is part of the current PR's prune semantics. Update the
PruneLeaseSegment row to explicitly mention the NOWAIT behavior in the lock
acquisition specification, making it consistent with the adjacent mapping rows
on lines 66 and 69 and aligned with the lock-order specification narrative.
- Line 195: In the ready-prune race note at line 195 in the MAPPING.md file,
replace the reference to `PruneLeaseSegment` with `PruneReadySegment` because
the paragraph is describing the ready-partition `prune_oldest` concern, not the
lease segment concern. This ensures the documentation correctly identifies which
spec transition addresses the specific issue being discussed and avoids
cross-spec confusion.
In `@correctness/storage/README.md`:
- Around line 150-151: The README.md description of AwaStorageLockOrder.cfg at
lines 150-151 enumerates the modeled flows but omits terminal-delete from the
list. Since AwaStorageLockOrder.tla now includes TerminalDeletePlan and
StartTerminalDelete, you need to add terminal-delete to the comma-separated list
of modeled flows in the AwaStorageLockOrder.cfg description to keep the
configuration summary complete and accurate.
---
Outside diff comments:
In `@correctness/storage/AwaStorageLockOrder.tla`:
- Around line 14-16: The module header comment in AwaStorageLockOrder.tla
contains contradictory semantics regarding lock behavior in lines 14-16. The
comment states ACCESS EXCLUSIVE NOWAIT (which means fail immediately if the lock
cannot be acquired) but then says prune "waits" for in-flight writes (which
contradicts NOWAIT behavior). Revise the comment to accurately reflect the
intended lock semantics: if using NOWAIT, the language should indicate the
operation fails fast or backs off rather than waits; if the intention is to
wait, remove NOWAIT from the description or adjust the lock specification
accordingly to ensure consistency between the lock mode and the described
behavior.
---
Duplicate comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 8552-8567: The loop that rescues expired receipt deadlines by
iterating through slots in numeric order (0..self.claim_slot_count()) can
exhaust the remaining budget before reaching the oldest initialized slot, which
blocks prune_oldest_claims() from progressing. Refactor the loop to prioritize
the oldest initialized slot (the prune-blocking slot) first by visiting it
before iterating through other slots in order. This ensures that critical
pruning operations are not blocked by backlog on lower-numbered slots when the
deadline-rescue budget is limited.
---
Nitpick comments:
In `@awa-model/migrations/v035_receipt_deadline_rescue_cursors.sql`:
- Around line 57-63: The lease-claim-receipts detection uses string matching on
the function body via the position function to detect INSERT statements, which
is fragile to formatting changes. While the comment acknowledges this is
acceptable for a one-time migration due to the v_schema = 'awa' fallback, add a
code comment in the migration explaining the heuristic's limitations and why it
is acceptable (single-run migration, explicit function definition in custom
schemas, fallback for default schema). This documents the technical debt and
helps future maintainers understand the tradeoff without refactoring the
detection logic itself.
5k/s offered retest after receipt deadline cursor work
Run id:
Command:
uv run bench run --systems awa \
--producer-rate 5000 \
--producer-mode fixed \
--worker-count 64 \
--awa-completion-batch-size 1024 \
--sample-every 10 \
--wait-event-sample-every 2 \
--phase warmup=warmup:2m \
--phase clean_5k=clean:20m \
--phase recovery_5k=recovery:5m
Compared against previous same-shape run
Throughput
So this change is throughput-neutral for this 5k/s offered single-queue shape. It bounds/removes one class of repeated deadline-rescue scan risk, but it does not by itself move the e2e drain ceiling.
Storage/dead-tuple shape
The append-only/tombstone storage shape held under load:
Instrumentation notes
I captured
My read: this PR is still worth keeping because it gives a bounded deadline-rescue cursor and keeps the storage invariants clean, but the next meaningful performance work should target slot-prune/live-claim proof so maintenance can answer from a cursor/ledger rather than repeatedly reproving over receipt claim history.
Summary
- … `ACCESS EXCLUSIVE NOWAIT` so best-effort maintenance backs off instead of queueing relation locks ahead of workers.
- … `250ms` to `1000ms`; faster rotation was only mutating the one-row lease-ring metadata more often under pinned MVCC horizons.
- … `done_entries` as durable closure evidence, non-success paths keep explicit `lease_claim_closures`, and terminal-delete paths materialize a `terminal_removed` closure before removing terminal evidence.
- … `v034_receipt_terminal_delete_closures.sql` and set `CURRENT_VERSION = 34`.
Why
The #169 long-horizon gate run against main at `7be5bd8` failed with a maintenance lock convoy. During the pinned-MVCC phase, diagnostics caught `LOCK TABLE awa.leases_N IN ACCESS EXCLUSIVE MODE` waiting behind the pinned transaction. Because queued `ACCESS EXCLUSIVE` locks sit ahead of later relation lockers, best-effort maintenance could stall worker traffic even though the operation was supposed to back off quickly.
After the NOWAIT fix removed that convoy, later verification still found a smaller late-idle slowdown. The remaining pressure was not ready/done segment churn: those partitions stayed append-only with zero dead tuples. The residual shape was metadata/receipt coordination work: receipt rescue scans over growing closed-claim evidence, and frequent lease-ring singleton updates under a pinned snapshot. This PR follows that evidence instead of tuning around it.
The receipt-evidence refinement removes the duplicate success-path closure write without weakening safety. Successful receipt completion already writes the durable terminal fact synchronously; claim prune can treat that terminal row as closure evidence. Queue prune now refuses to delete terminal evidence while same-segment claim rows still depend on it, and cold terminal-delete paths first materialize an explicit closure if they remove the terminal fact before claim prune.
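The evidence rule in this paragraph can be stated as a small invariant. The sketch below is an illustrative in-memory model under stated assumptions, not the storage code: `ClaimState`, `closure_evidence`, `claim_is_prunable`, and `delete_terminal_row` are hypothetical names, and the two booleans stand in for a terminal row in `done_entries` and an explicit closure row.

```rust
/// Kinds of evidence that a claim is closed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ClosureEvidence {
    /// Success path: the durable terminal row doubles as closure evidence.
    TerminalRow,
    /// Non-success or terminal-delete path: an explicit closure row.
    ExplicitClosure,
}

/// Hypothetical per-claim view of which evidence currently exists.
#[derive(Clone, Copy, Debug)]
struct ClaimState {
    has_terminal_row: bool,
    has_explicit_closure: bool,
}

/// Which evidence, if any, proves the claim is closed.
fn closure_evidence(claim: ClaimState) -> Option<ClosureEvidence> {
    if claim.has_explicit_closure {
        Some(ClosureEvidence::ExplicitClosure)
    } else if claim.has_terminal_row {
        Some(ClosureEvidence::TerminalRow)
    } else {
        None
    }
}

/// Claim prune may drop a claim row only when some evidence exists.
fn claim_is_prunable(claim: ClaimState) -> bool {
    closure_evidence(claim).is_some()
}

/// Removing the terminal row first materializes an explicit closure when the
/// terminal row was the only evidence, so the claim never becomes "open".
fn delete_terminal_row(claim: &mut ClaimState) {
    if claim.has_terminal_row && !claim.has_explicit_closure {
        claim.has_explicit_closure = true; // models a `terminal_removed` closure
    }
    claim.has_terminal_row = false;
}
```

The point of the ordering is that a claim with a terminal row stays prunable across terminal deletion: evidence is converted before it is removed, never dropped.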
Long-Horizon Evidence
Shape: 1 Awa replica, 64 workers, fixed `800/s` offered rate, completion batch `1024`, phases `warmup=5m -> clean_1=20m -> idle_1=idle-in-tx:60m -> recovery_1=10m`, 30s metric samples, 5s wait-event samples, Postgres `18.3-alpine`.
Failed Main Run
Result dir: `results/custom-20260612T090023Z-412cc6`, Awa `7be5bd8`.
[Per-phase results table for `clean_1`, `idle_1`, `recovery_1`; values not preserved]
NOWAIT-Only Branch
Result dir: `results/custom-20260612T112048Z-c860b0`, Awa `ba8a4cf`.
[Per-phase results table for `clean_1`, `idle_1`, `recovery_1`; values not preserved]
NOWAIT fixed the relation-lock convoy, but it was not the whole #169 mechanism. Late-idle backlog was still visible.
Final Stacked Branch
Result dir: `results/custom-20260612T195938Z-837e7d`, Awa `d1e1744` before the rebase onto #350. The rebased branch preserves the same #349 maintenance changes, then adds the receipt-evidence refinement and v034 compatibility migration.
[Per-phase results table for `clean_1`, `idle_1`, `recovery_1`; values not preserved]
The pinned-reader stress condition was active: `oldest_idle_in_tx_age_s` climbed to about `3,584s` and `snapshot_xmin` stayed flat through `idle_1`.
Interpretation
This branch fixes the original named mechanism: maintenance relation-lock convoy. It also fixes the later receipt-rescue / lease-ring metadata drag enough that the 60-minute pinned-reader phase sustains the offered `800/s` with bounded depth and sub-second p99 pickup.
The append-only segment contract held. In the final SQL snapshot, `ready_entries_*`, `done_entries_*`, `ready_tombstones_*`, `queue_terminal_count_deltas_*`, and receipt closure partitions had zero dead tuples. Residual dead tuples were in small mutable coordination/counter tables, mainly `lease_ring_state`, `queue_terminal_live_counts`, `claim_ring_slots`, and ring state rows. During `idle_1`, raw samples peaked at `lease_ring_state=3,584`, `queue_terminal_live_counts=3,328`, and `claim_ring_slots=343` dead tuples.
`pg_stat_statements` still identifies `claim_ready_runtime` as the load-bearing query: final snapshot showed about `235k` calls, `4.56M` claimed rows, and `17.2ms` mean time. Completion was much cheaper: about `181k` batched completion calls, `4.56M` completed rows, and `1.7ms` mean. The remaining claim cost is consistent with bounded receipt/coordination checks, not ready/done heap churn. WAL stayed around `2.3KB` per completed job and about `20` WAL records per completed job during the pinned-reader phase.
This is a good #169 gate result for the `800/s` long-horizon shape. It is not evidence for the `5k/s` or `10k/s` ceiling; those still need separate offered-rate runs after this lands or while CI runs.
Local Verification
- `cargo fmt --all -- --check`
- `cargo check --package awa-model --all-features`
- `cargo clippy --package awa-model --all-targets --all-features -- -D warnings`
- `DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test migration_test`: 42 passed
- `DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa-model --test sql_only_storage_upgrade_test`: 2 passed
- `DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test`: 99 passed
- `DATABASE_URL=postgres://bench:bench@localhost:15555/awa_test cargo test -p awa --test queue_storage_runtime_test prune`: 14 passed
TLC:
- `AwaSegmentedStorage.cfg`, `AwaSegmentedStorageInterleavings.cfg`
- `AwaDeadTupleContract.cfg`
- `AwaStorageLockOrder.cfg`
- `AwaSegmentedStorageRacesSafe.cfg`, `AwaSegmentedStorageRacesMultiWorker.cfg`
- `AwaShardedPrune.cfg`
- `AwaPartitionedQueueRouting.cfg`
- `AwaStorageTransition.cfg`
- *TraceIncomplete violation. NoDeadlock as expected.
Review Notes
- `326cdd6`: prune `TRUNCATE` failures now return `Blocked` only for lock contention and propagate other database errors.