test: Parquet merge pipeline verification suite by g-talbot · Pull Request #6369 · quickwit-oss/quickwit

g-talbot · 2026-05-02T01:53:40Z

Summary

End-to-end verification stack for the Parquet merge pipeline. Closes the
loop from formal spec → exhaustive model → production runtime checks →
trace-conformance proof, so a divergence between production and the
verified model surfaces as a test failure or a Datadog metric, not a
silent corruption.

Verification pyramid (single source of truth)

TLA+ specs (docs/internals/specs/tla/)
        │ mirrors
Shared invariants (quickwit-dst/src/invariants/merge_pipeline.rs)  ← single Rust module
        │ used by                          │ called by
Stateright models                Production check_invariant! sites
        │                                  │
        └────── Trace conformance test ────┘

The Stateright model and production code evaluate the same Rust
predicate functions on the same state struct. Drift impossible by
construction.

What landed

TLA+ — MergePipelineShutdown.tla
Extended from the original orderly-shutdown spec to cover crash and
restart at every transition. CompleteMerge split into UploadMergeOutput

PublishMergeAndFeedback so a crash between upload and metastore
replace is reachable (orphan_outputs failure mode). Multi-lifetime
support (Restart can fire after graceful shutdown, modeling fresh
process invocations). New invariants:

RowsConserved (MP-4): sum of rows in published splits = total ingested rows
LeakIsObjectStoreOnly (MP-10): orphans never become durable data
BoundedWriteAmp (MP-5), MP1_LevelHomogeneity, NoOrphanInPlanner (MP-8), NoOrphanWhenConnected (MP-9)
Action property RestartReSeedsAllImmature (MP-11): every Restart re-seeds correctly
Liveness NoPersistentOrphan: orphans always eventually clear

Two TLC configs: primary (multi-lifetime, ~16K states, 1s) and chains
(deep merge chain, ~218K states, 12s). 11 invariants + 3 temporal
properties verified.

Shared invariants — quickwit-dst/src/invariants/merge_pipeline.rs
New MergePipelineState struct mirrors the TLA+ VARIABLES block
literally; used directly by the Stateright model (no parallel
definition). 9 pure-function predicates (MP-1, MP-4..MP-11) with unit
tests for passing and failing paths.

Stateright — quickwit-dst/src/models/merge_pipeline.rs
Refactored to use the shared state and predicates. Mirrors the TLA+
multi-phase merge and multi-lifetime semantics. Three configs (small,
multi_lifetime, deep_chains) all pass.

Production runtime checks — quickwit-indexing/.../metrics_pipeline/
Eight new check_invariant! call sites, all forwarding to the existing
DogStatsD recorder (pomsky.invariant.{checked,violated}{invariant=MP-X}):

ParquetMergeExecutor — MP-1 (level homogeneity), MP-2 (≥2 inputs), MP-3 (scope), MP-4 (input rows == output rows)
ParquetMergePlanner::new — MP-11 (post-restart re-seed completeness, the formal "fetch_immature_splits is correct" claim)

Trace conformance — parquet_merge_pipeline_trace_conformance_test.rs
Production emits a MergePipelineEvent stream at every modeled
transition (Restart, PlanMerge, UploadMergeOutput,
PublishMergeAndFeedback, DisconnectMergePlanner,
RunFinalizeAndQuit, etc.). The test replays the captured events
through the same predicates Stateright verifies. Two scenarios pass:

Normal path: 4 splits → cascade of merges → mature output
Crash mid-cascade: publish failure forces respawn; production re-seeds
cleanly across the crash boundary; rows conserved end-to-end

A divergence between production and the verified model would surface
here as a predicate failure with the offending event in the panic
message.

Original verification suite (still here)
Property test, sketch integration test, crash/restart integration test,
multi-round merge integration test — unchanged.

What's deferred

CS-3 (no compaction before threshold): the TLA+ model has it, but
production ParquetMergePolicyConfig doesn't expose
compaction_start_time yet. TODO in
parquet_merge_planner::record_splits_if_necessary so the check lands
alongside the feature.
MP-5..MP-10 lifted to runtime: currently checked only by Stateright +
trace test; would need a periodic background task to materialize the
full MergePipelineState for runtime evaluation. Not added because
the trace-conformance proof already covers them.

Coverage table (23 invariants in registry)

13/23 runtime-checked in production via check_invariant!
6/23 (MP-5..MP-10) verified by Stateright + trace conformance
4/23 (SS-3, SS-4, DM-4, DM-5) by construction (unfalsifiable runtime check)

Sesh-mode rule

Added a strong rule to .claude/skills/sesh-mode/SKILL.md: when a
verification check fails, never silently weaken or remove the invariant.
Almost always either (a) the implementation has a real bug, or (b) the
property is over-strong — and the failing trace is revealing the
weaker safety property the design does guarantee.

Test plan

TLC: MergePipelineShutdown.cfg passes (~16K states, 1s); _chains.cfg passes (~218K states, 12s); 11 invariants + 3 temporal properties
cargo nextest run -p quickwit-dst --features model-checking — 49 tests pass + 1 ignored deep_chains passes when run
cargo nextest run -p quickwit-indexing --features metrics metrics_pipeline — 54 tests pass (52 prior + 2 new trace conformance)
Clippy clean (-Dwarnings)
Nightly fmt clean
cargo doc --no-deps -p quickwit-indexing --features metrics clean
cargo machete clean

🤖 Generated with Claude Code

Property test verifies that no merge task ever contains a split that should have been filtered: mature by ops (>= max_merge_ops), mature by size (>= target), time-matured (created_at + maturation_period <= now), or missing a window. Generates random splits across the maturity boundary and tests at the actor level. Also makes test helpers pub(super) so sibling test modules (sketch, crash, multi-round) can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verifies that sketch splits dispatch to the correct metastore RPCs (stage_sketch_splits, publish_sketch_splits) and that the merged output has ParquetSplitKind::Sketches with correct metadata. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Crash/restart test: - Injects publish failure on 2nd call → pipeline detects failure, kills actors, respawns - Verifies list_metrics_splits called on respawn (re-seeding) - Verifies pipeline generation >= 2 (respawn occurred) - Verifies original splits eventually replaced (no data loss) Multi-round merge test: - 4 input splits → 2 round-1 merges → 1 round-2 merge - Verifies num_merge_ops=2 on final output - Verifies all original + intermediate splits replaced - Verifies MC-1: total rows preserved across lifecycle Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Models the two-phase shutdown: DisconnectMergePlanner breaks feedback, RunFinalizeMergePolicyAndQuit drains cold windows, then in-flight merges complete. Safety invariants: - NoSplitLoss: every merge input is published or in-flight - NoDuplicateMerge: no split in multiple concurrent merges - FinalizeWithinBound: at most MaxFinalizeOps finalize operations - ShutdownOnlyWhenDrained: shutdown requires empty in-flight set Liveness: - ShutdownEventuallyCompletes under weak fairness Two configs: _small (hundreds of states) and full (larger space). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Model checks 5 properties across ingest, merge, crash, and restart: - MC-1 lifecycle: total rows preserved (no loss/duplication) - Bounded WA: merge_ops never exceeds max_merge_ops - No duplicate merge: no split in multiple concurrent merges - No orphan after restart: all immature splits re-seeded - MP-1: level homogeneity (by construction) Small model (~instant), full model (~seconds, gated behind #[ignore]). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Document the actual tla2tools.jar path from brew cask tla+-toolbox - Add "Run All Specs" one-liner for quick verification - Add spec catalog with state counts and invariant names - Remove stale references to brew formula (doesn't exist) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI clippy lint (`clippy::question-mark`, -D warnings) rejected the explicit match-and-return-None pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ifetime Phase A of the verification-gap closure (PR #6369). The original spec only modeled orderly two-phase shutdown — no Crash action, no Restart, no row-conservation claim. The four invariants were either trivially true by construction or static set-membership, so the spec proved nothing about the failure modes that motivated it. This rewrite: - Adds Crash and Restart actions. Crash invalidates in-memory state at any point in the pipeline; uploaded-but-unpublished merge outputs become orphan_outputs. Restart re-seeds cold_windows from durable published_splits (models fetch_immature_splits). - Splits CompleteMerge into UploadMergeOutput then PublishMergeAndFeedback so a Crash between the two phases is reachable. This is what catches the "leaked but invisible" failure mode — output blob in S3 with no metastore reference. The model proves these are object-store-only garbage (LeakIsObjectStoreOnly), not durable data loss. - Adds split_rows / split_merge_ops / split_window ghost functions so RowsConserved expresses the strong "no data loss, no duplication" property: total ingested rows equal sum of rows in published_splits in every reachable state. - Adds NoOrphanWhenConnected (state invariant), RestartReSeedsAllImmature (action property) and NoPersistentOrphan (liveness leads-to). Together these capture: while connected the planner sees every immature split; every Restart correctly recovers all immature splits; in any run with restart budget remaining, orphans always eventually clear. - Extends Restart to fire after graceful shutdown too (multi-lifetime), so the cross-process recovery claim is explicit. Bounded by MaxRestarts. The replaced NoOrphanAfterRestart invariant fired during initial TLC runs — the failing trace was a *legitimate* state in production (publish during shutdown disconnect leaves the output untracked by the planner until the next process invocation). The fix wasn't to remove the invariant but to split the safety claim from the recovery claim: NoOrphanWhenConnected for the steady state, RestartReSeedsAllImmature for the recovery transition, NoPersistentOrphan for cross-lifetime liveness. Recorded as a sesh-mode rule: never silently weaken an invariant that produced a counter-example. Single config drops the small/full split — full config now runs in ~12s on a workstation (217,854 distinct states, 11 invariants + 3 temporal properties verified). The states/ directory dropped by TLC is now gitignored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The primary `MergePipelineShutdown.cfg` is now optimized for multi-lifetime exercise (MaxIngests=3, MaxRestarts=2) — 15,732 states in ~1s — covering the cross-process recovery claim added in the previous commit. The new `MergePipelineShutdown_chains.cfg` keeps MaxIngests=4 with MaxRestarts=1 for deeper merge-chain coverage (217,854 states, ~10s): exercises level-0 → level-1 → level-2-mature compaction so BoundedWriteAmp is checked across the full chain and concurrent in-flight merges interact. Combined run is ~11s. Both share the same invariant + property set; only the constants differ. The TLA+ run-all loop in CLAUDE.md now globs every .cfg and resolves the matching .tla, supporting any number of configs per spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ariants Phase B of the verification-gap closure (PR #6369). Eliminates drift between the merge-pipeline model and any production-side checks by making them evaluate the *same* Rust functions on the *same* state. quickwit-dst/src/invariants/merge_pipeline.rs (new): - MergePipelineState struct mirroring the TLA+ VARIABLES block (planner_alive, in_flight_merges, cold_windows, published_splits, splits, orphan_outputs, lifecycle counters, total_ingested_rows ghost) - Pure-function predicates corresponding 1:1 to TLA+ invariants: rows_conserved, bounded_write_amp, no_split_loss, no_duplicate_merge, no_orphan_in_planner, no_orphan_when_connected, leak_is_object_store_only, mp1_level_homogeneity, restart_re_seeds_all_immature - Helper orphan_set mirrors the TLA+ OrphanSet operator - 17 unit tests covering passing/failing paths for each predicate quickwit-dst/src/invariants/registry.rs: - Adds MP4..MP11 invariant IDs (preserved short-code naming convention) - Each maps to a TLA+ invariant from MergePipelineShutdown.tla quickwit-dst/src/models/merge_pipeline.rs: - State type is now invariants::merge_pipeline::MergePipelineState (literal sharing — no parallel definition, no conversion layer) - CompleteMerge split into UploadMergeOutput + PublishMergeAndFeedback to match the TLA+ multi-phase merge completion. A Crash between phases orphans the upload (orphan_outputs) without losing data. - Restart re-seeds cold_windows from durable published_splits (mirrors fetch_immature_splits) and resets shutdown_complete / finalize state, modelling a fresh process invocation. Bounded by max_restarts. - DrainComplete added so graceful-shutdown is a distinct terminal action (matching TLA+). - properties() calls the shared predicate functions instead of inline closures — model and runtime checks evaluate identical Rust code. - Three configs: small (fast iteration), multi_lifetime (3 lifetimes, matches MergePipelineShutdown.cfg), deep_chains (level 0→1→2 mature, matches MergePipelineShutdown_chains.cfg). - Restart's next_state debug_asserts MP-11 (restart_re_seeds_all_immature) on its post-state, encoding the action property as a runtime check. All 50 tests pass under `cargo nextest run -p quickwit-dst --features model-checking` (33 invariant unit tests + 4 merge-pipeline model configs + 13 other models). Multi-lifetime checks in 0.1s, deep-chains in 0.6s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase C of the verification-gap closure (PR #6369). Production code now evaluates the *same* invariant predicates that the Stateright model and TLA+ spec verify, with results forwarded to the existing DogStatsD recorder (already wired in quickwit-cli/src/logger.rs). Every check emits `pomsky.invariant.checked{invariant=MP-X}` and, on failure, `pomsky.invariant.violated{invariant=MP-X}` — usable directly as Datadog alert criteria. ParquetMergeExecutor (post-merge, pre-uploader): - MP-2 (HasMinimumSplits): merge has >= 2 inputs - MP-1 (LevelHomogeneity): all inputs share num_merge_ops level - MP-3 (ScopeCompatibility): all inputs share sort_fields and window - MP-4 (RowsConserved): sum of input num_rows == sum of output num_rows. Empty-output path checks input rows are all zero (otherwise data was silently dropped). The non-empty path checks the strong row-preservation property — the same MC-1 / RowsConserved invariant from the TLA+ spec. ParquetMergePlanner::new (post-restart re-seed): - MP-11 (RestartReSeedsAllImmature): every split the merge policy classifies as still-immature (and which has a window for compaction) is recorded in scoped_young_splits and known_split_ids. Filters out mature splits, time-expired splits, and pre-Phase-31 splits before comparing — these are intentional drops, not the failure mode MP-11 protects against. Cargo.toml: quickwit-indexing now depends on quickwit-dst for the shared invariant module and check_invariant! macro. Also fixes a pre-existing rustdoc warning: the parquet_merge_planner docs referenced `[`MergePlanner`]` as an intra-doc link, but the type lives in another crate and can't resolve. Changed to plain backticks. All 52 metrics_pipeline tests pass; clippy + nightly fmt + machete + cargo doc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… model Phase D of the verification-gap closure (PR #6369). The TLA+ spec and Stateright model are now backed by *production code that actually emits events at every modeled transition*, and a test replays those events through the same predicates the model verifies. A divergence between production and the model surfaces here as a predicate failure on real production state. This closes the loop: TLA+ → Stateright (Phase A+B) → production checks (Phase C) → trace conformance (Phase D). quickwit-dst/src/events/merge_pipeline.rs (new): - MergePipelineEvent enum with one variant per modeled action, using production types (String split IDs, Range<i64> windows). Notably no Crash variant — process death cannot emit anything; crashes are inferred during replay by the absence of events between actions and a subsequent Restart. - Pluggable observer (fn pointer) following the existing set_invariant_recorder pattern. No-op single-atomic-load when no observer is installed; production hot path stays cheap. Production emission sites (always-on, no feature flag): - ParquetMergePipeline::spawn_pipeline → Restart, after fetch_immature_splits - ParquetMergePipeline::FinishPendingMergesAndShutdownPipeline → DisconnectMergePlanner and RunFinalizeAndQuit - ParquetMergePlanner::send_merge_ops → PlanMerge - ParquetUploader::handle (post-upload, pre-publish) → UploadMergeOutput for merge outputs - Publisher::handle (Parquet publish path) → IngestSplit (when no replaced_split_ids) or PublishMergeAndFeedback (when replacing) quickwit-indexing/.../parquet_merge_pipeline_trace_conformance_test.rs (new): - Two scenarios: 1. Normal happy path: 4 splits → 2 merges → 1 mature output. Replay verifies MP-1, MP-4..MP-10 hold at every state. ~0.6s. 2. Crash mid-cascade: publish failure injected on the 2nd merge forces respawn. Trace covers Restart → PlanMerge → Upload → crash → Restart → re-seed → PlanMerge → Upload → Publish. ~1.1s. - StateMirror reconstructs MergePipelineState from events using a production-id → model-u32 interner, applies the same predicate functions used by the Stateright model. - The mock metastore tracks `staged` and `published` separately so list_metrics_splits returns only what real metastores would (not staged-but-failed). Earlier test-side bug here would have masked divergences; the trace test caught it immediately. What this catches that prior tests do not: - The crash test only verified re-seeding *happened*. The trace test verifies that the post-restart state preserves all formal safety invariants (no orphans the planner can never reclaim, no row drift, no level mixing on re-merge, etc.). - A divergence between production order/atomicity and the model's atomic actions surfaces as either a state-mirror error ("event can't be applied") or a predicate failure with the offending event in the panic message. Both tests pass with the current production code: production behavior matches the TLA+ spec for the scenarios exercised. The test infrastructure is now ready to surface real divergences if they exist, and to be extended with adversarial scenarios (multiple back-to-back crashes, finalize-during-crash, concurrency stress). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CS-3 ("splits before compaction_start_time are never compacted") is defined in the TLA+ model and the InvariantId registry, but the production `ParquetMergePolicyConfig` does not yet expose a `compaction_start_time` field. Adding a runtime check now would have nothing to filter against. Records the gap as a planner-level TODO so the check lands alongside the feature when it's implemented (filter splits in `record_splits_if_necessary` + verify via `check_invariant!`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

g-talbot and others added 7 commits May 1, 2026 21:53

fix: nightly rustfmt import ordering across all new test files

217b241

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

g-talbot requested a review from mattmkim May 2, 2026 02:04

g-talbot and others added 8 commits May 4, 2026 07:04

fix: replace match with ? operator in CompleteMerge handler

0c1907d

CI clippy lint (`clippy::question-mark`, -D warnings) rejected the explicit match-and-return-None pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'main' into gtt/parquet-merge-verification

9cc2713

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: Parquet merge pipeline verification suite#6369

test: Parquet merge pipeline verification suite#6369
g-talbot wants to merge 15 commits intomainfrom
gtt/parquet-merge-verification

g-talbot commented May 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

g-talbot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification pyramid (single source of truth)

What landed

What's deferred

Coverage table (23 invariants in registry)

Sesh-mode rule

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

g-talbot commented May 2, 2026 •

edited

Loading