
Priority updates #920

Draft
dvush wants to merge 13 commits into develop from priority_updates

Conversation

Contributor

dvush commented Apr 27, 2026

📝 Summary

WIP

💡 Motivation and Context


✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

Contributor

claude Bot commented Apr 27, 2026

Claude finished @dvush's task in 7m 39s


Claude PR Review

  • Read priority_update files
  • Review PU integration in builders
  • Review live_builder PU classifier wiring
  • Post inline comments

Summary

The new priority-update pipeline is plumbed through ingress → classifier → PUR sim → per-worker overlays → builder commits in a way that mostly hangs together, but a few real correctness issues showed up. Inline comments are posted on the specific lines.

Bugs

  1. Conflict resolver drops PU profit. parallel_builder/conflict_resolvers.rs only adds commit_result.order's coinbase profit into total_profit; the commit_result.priority_updates profits are ignored. This biases group selection in the assembler and makes simulation_cache::CachedSimulationState::total_profit inconsistent with the cached BundleState.

  2. Non-deterministic PU commit order. PendingStateDb::into_used_pu_slots returns HashMap::into_values(), so the slot order — and therefore the order in which PUs are committed in front of a regular order — varies between runs/threads. Two builders/resolvers can produce different state and total_profit for the same group of orders. Sort by OrderId (or use a BTreeMap).

  3. RoutingSink::remove_order always returns None for PU pool removals, even when something was evicted. Either return the evicted PU or document the trait change.

  4. PUR Insert send error swallowed. PURCommandClassifier returns true and removes the order from the main pipeline regardless of whether the PUR worker is alive. If run_pur_sim_worker returns early (e.g. parent state lookup fails), every PU-classified order is silently dropped.

Other issues / smells

  1. PU back-pressure can stall the PUR thread. apply_update does sub.send(cmd).await over a bounded 10k channel for every subscriber. One slow sim worker (holding pool_arc.lock() during simulate_order) can stall PU classification and cancellation for everyone. Consider try_send + drop counter.

  2. PUs don't compose during simulation. simulate_priority_update runs against parent state with no PU overlay, so the pu_data.changeset stored in PendingUpdates reflects on-chain values, not values another already-pending PU would have produced. At commit time the actual sequence may diverge from this simulated changeset, which is also what's used for conflict eviction. Document or fix.

  3. Per-worker Arc<Mutex<PriorityUpdatePool>> is uncontended by construction in sim_worker.rs, but the lock is held across the whole simulate_order call. Document the invariant or drop the Mutex.

  4. Memory: PUData::changeset is held twice — once in PendingUpdates::orders and once in SimulatedOrder.pu_data. Wrap in Arc or move out.

  5. apply_update's error! on pu_data == None is unreachable by every current caller (they pre-check). Make it a debug_assert! or invariant comment.

Pre-existing

  • block_building_result_assembler.rs still has the panic!("TODO: next_journal_sequence_number not set ...") and a hardcoded 0 in BlockBuildingHelperFromProvider::new. This predates the PR but build_new_block cannot succeed at runtime — worth fixing alongside the new force-TOB plumbing it now contains.

The PR description says WIP and the PR checklist (make lint, make test, tests added) is unchecked, so some of these may be known. Flagging them here so they aren't lost.

Comment on the commit_order call site in parallel_builder/conflict_resolvers.rs

Bug: PU profit is not accumulated into total_profit.

partial_block.commit_order may execute one or more priority-update orders in front of sim_order, and each successful PU contributes coinbase profit (see commit_order_inner in building/mod.rs, where self.coinbase_profit += priority_update.coinbase_profit). Below, only res.coinbase_profit (the main order) feeds handle_successful_commit's total_profit. The PU profits in commit_result.priority_updates are dropped on the floor.

Consequences:

  • ResolutionResult::total_profit underestimates each group's profit, biasing the assembler's "sort groups by total_profit" against groups whose orders trigger PUs.
  • simulation_cache::CachedSimulationState::total_profit becomes inconsistent with the cached BundleState, so resumed runs will produce different totals depending on whether they hit the cache.

Sum the Ok entries in commit_result.priority_updates into total_profit (and probably into per_order_profits as well, with the PU's OrderId).
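
A minimal sketch of that accumulation; OrderId, PuResult, CommitResult and the u128 profit type are placeholders standing in for the project's actual types, only the loop over the Ok entries is the point:

```rust
// Placeholder types standing in for the project's OrderId / commit result types.
type OrderId = u64;

struct PuResult {
    order_id: OrderId,
    coinbase_profit: u128,
}

struct CommitResult {
    // Profit of the main order; this is what the current code already sums.
    order_profit: u128,
    // Results of the priority updates executed in front of the order.
    priority_updates: Vec<Result<PuResult, String>>,
}

fn accumulate_profit(
    commit_result: &CommitResult,
    total_profit: &mut u128,
    per_order_profits: &mut Vec<(OrderId, u128)>,
) {
    *total_profit += commit_result.order_profit;
    // Suggested fix: also count every successfully committed PU,
    // attributed to the PU's own OrderId.
    for pu in commit_result.priority_updates.iter().flatten() {
        *total_profit += pu.coinbase_profit;
        per_order_profits.push((pu.order_id, pu.coinbase_profit));
    }
}
```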

Comment on lines +99 to +101
pub fn into_used_pu_slots(self) -> Vec<SlotKey> {
self.used_pu_slots.into_values().collect()
}

Non-determinism in priority-update commit order.

used_pu_slots is a HashMap<OrderId, SlotKey>, and into_values() returns its values in unspecified order. The result is fed via simulated_order.used_priority_updates into PriorityUpdatePool::get_updates, which preserves the input slot order when building its returned Vec<&Order>. So the order in which priority updates are committed in front of a regular order is non-deterministic between runs / between threads.

Two builders or two parallel resolvers can produce different state for the same group of orders depending on hash randomization, which feeds straight into total_profit and the cached BundleState. Sort the result (e.g. by OrderId) before returning, or store in a BTreeMap, to make commit order deterministic.
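
A minimal sketch of the sort-before-return option, with placeholder OrderId/SlotKey types standing in for the real ones:

```rust
use std::collections::HashMap;

// Placeholders for the project's OrderId / SlotKey.
type OrderId = u64;
type SlotKey = (u64, u64);

fn into_used_pu_slots(used_pu_slots: HashMap<OrderId, SlotKey>) -> Vec<SlotKey> {
    let mut entries: Vec<(OrderId, SlotKey)> = used_pu_slots.into_iter().collect();
    // Deterministic commit order: sort by the OrderId that produced the slot.
    entries.sort_by_key(|(id, _)| *id);
    entries.into_iter().map(|(_, slot)| slot).collect()
}
```

Switching used_pu_slots to a BTreeMap<OrderId, SlotKey> gives the same ordering without the explicit sort.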

Comment on lines +79 to +82
fn remove_order(&mut self, id: OrderId) -> Option<Arc<SimulatedOrder>> {
self.pool.apply_remove(&id);
self.store.remove_order(id)
}

PriorityUpdatePool::apply_remove returns nothing, and store.remove_order(id) returns None for any id that lives only in the PU pool. So RoutingSink::remove_order always returns None for PU-pool removals, even when the order was actually evicted. Callers using the return value (e.g. OrderIntakeConsumer::remove_orders chains in ordering_builder.rs) cannot detect that a PU was removed.

If the trait contract is "return Some(order) iff something was removed", track and return the evicted PU; otherwise document that PU removals are silent.
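
A hedged sketch of the first option, assuming apply_remove is changed to return the evicted PU (today it returns nothing); the other names mirror the snippet above:

```rust
fn remove_order(&mut self, id: OrderId) -> Option<Arc<SimulatedOrder>> {
    // Hypothetical: apply_remove now reports what it evicted from the PU pool.
    let evicted_pu = self.pool.apply_remove(&id);
    // Return whichever side actually removed something, so callers can tell
    // that a PU-pool order was evicted.
    self.store.remove_order(id).or(evicted_pu)
}
```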

Comment on lines +60 to +85
impl PURCommandClassifier {
pub fn try_consuming_new_order_command(&self, cmd: &OrderPoolCommand) -> bool {
match cmd {
OrderPoolCommand::Insert(order) => {
if order.metadata().priority_update_data.is_none() {
return false;
}
self.inner.tracked_orders.lock().insert(order.id());
let _ = self
.inner
.cmd_sender
.send(OrderPoolCommand::Insert(Arc::clone(order)));
true
}
OrderPoolCommand::Remove(id) => {
let known = self.inner.tracked_orders.lock().remove(id);
if known {
let _ = self.inner.cmd_sender.send(OrderPoolCommand::Remove(*id));
true
} else {
false
}
}
}
}
}

A few classification edge cases to consider:

  1. try_consuming_new_order_command ignores send errors with let _ = ...send(...). For an unbounded_channel the only failure is "receiver dropped" (i.e. the PUR worker has exited). The classifier still returns true, swallowing the order from the main pipeline — so when the PUR thread dies (e.g. parent state lookup failed, see run_pur_sim_worker early-return), every PU-classified order is silently dropped instead of falling back to the regular pipeline. Worth at least a metric / warn (see the sketch after this list).

  2. Remove only forwards if the id was previously seen by Insert (tracked_orders membership). If a Remove arrives for a PU-classified id from a source that classifies differently than this side did (e.g. classifier was added/removed, or the classification rules changed), the cancellation gets routed to the regular pipeline, where it will be a no-op — and the PUR pool still holds the order.

  3. Holding a parking_lot::Mutex is overkill if try_consuming_new_order_command is only ever called from SimulationJob::process_new_commands (a single async task). Consider documenting the threading expectation, or using a RefCell if not, or removing the lock entirely.
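
On the first point, a small fallback sketch; forward_insert and the warning text are hypothetical names, but the send error on a tokio unbounded channel really does mean the receiver was dropped:

```rust
use tokio::sync::mpsc::UnboundedSender;
use tracing::warn;

// Returning false tells the classifier's caller to keep the order in the
// regular pipeline instead of silently dropping it.
fn forward_insert<T>(cmd_sender: &UnboundedSender<T>, cmd: T) -> bool {
    match cmd_sender.send(cmd) {
        Ok(()) => true,
        Err(_) => {
            // Only failure mode: the PUR worker's receiver was dropped,
            // i.e. run_pur_sim_worker has exited.
            warn!("PUR worker channel closed; not consuming PU order");
            false
        }
    }
}
```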

Comment on lines +63 to +75
let mut changeset = bundle_state.to_plain_state(OriginalValuesKnown::Yes);

if !changeset.contracts.is_empty() {
debug!(
reason = "changeset contains contracts",
"priority update discarded"
);
return Ok(None);
}

changeset
.storage
.retain(|s| s.address != coinbase && !senders.contains(&s.address));

Two correctness questions worth a comment in the code:

  1. The PU is simulated against the parent state with no PU overlay. So PUs are not composed — if PUA writes slot X and PUB reads slot X to decide what to write, PUB's simulation here uses the on-chain value, not PUA's overlay. At commit time (partial_block.commit_order), they may execute against a different sequence of slot values, potentially diverging from the simulated changeset that's stored in pu_data.changeset and used for conflict eviction in PendingUpdates::add_new_simulated_update. Worth being explicit about this assumption.

  2. changeset.storage.retain(|s| s.address != coinbase && !senders.contains(&s.address)) filters by sender/coinbase address, but a PU bundle may include several txs whose internal calls touch storage on contracts that happen to live at a sender's address (rare but possible). More importantly, the inverse: writes the PU performs to contract storage are kept, which is what we want — so this is fine. Just make sure the filter is intentional re: senders specifically.

Comment on lines +47 to +68
pub fn apply_update(&mut self, sim_order: Arc<SimulatedOrder>) -> Vec<OrderId> {
let Some(pu_data) = sim_order.pu_data.clone() else {
error!(order_id = ?sim_order.id(), "apply_update called with non-PU simulated order");
return Vec::new();
};
let order_id = sim_order.id();
let evicted = self
.pending
.add_new_simulated_update(order_id, pu_data.changeset);
for id in &evicted {
self.orders.remove(id);
self.force_top_of_block.remove(id);
}
if matches!(
sim_order.order.metadata().priority_update_data,
Some(PriorityUpdateClass::ForceTopOfBlock)
) {
self.force_top_of_block
.insert(order_id, Arc::clone(&sim_order));
}
self.orders.insert(order_id, sim_order);
evicted

Minor:

  • sim_order.pu_data.clone() clones the entire Vec<PlainStorageChangeset>, then it's moved into pending in add_new_simulated_update. Meanwhile the same changeset is also kept inside sim_order (which is then Arc::cloned into self.orders). So every PU stores its changeset twice. With many PUs this is non-trivial memory; consider wrapping PUData::changeset in Arc<...> or moving it out of SimulatedOrder entirely once it's been "consumed" into PendingUpdates (see the Arc sketch after this list).

  • error! on pu_data being None is reasonable, but the only callers (PUSimWorkerOrderpool::consume_updates and apply_update from the PUR worker) already check sim_order.pu_data.is_some(). So this branch should be unreachable — make it a debug_assert! or document the invariant.
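
On the first bullet, a minimal sketch of the Arc-sharing shape; the Changeset alias is a placeholder for the real PlainStorageChangeset list:

```rust
use std::sync::Arc;

// Placeholder for the real changeset type held in PUData.
type Changeset = Vec<u8>;

struct PUData {
    // One allocation, shared by PendingUpdates and SimulatedOrder, instead of
    // each side holding a full copy of the changeset.
    changeset: Arc<Changeset>,
}

fn main() {
    let data = PUData { changeset: Arc::new(vec![1, 2, 3]) };
    // Both consumers clone the Arc handle, not the underlying Vec.
    let for_pending_updates = Arc::clone(&data.changeset);
    let for_sim_order = Arc::clone(&data.changeset);
    assert_eq!(for_pending_updates.len(), for_sim_order.len());
}
```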

Comment on lines +152 to +166
impl PUSimulationWorkerState {
async fn apply_update(&self, sim_order: Arc<SimulatedOrder>) -> Vec<OrderId> {
// Sync critical section: mutate pool, prune closed subs, snapshot subs.
let (evicted, subs) = {
let mut g = self.inner.lock();
let evicted = g.pool.apply_update(Arc::clone(&sim_order));
g.subscribers.retain(|s| !s.is_closed());
(evicted, g.subscribers.clone())
};
let cmd = SimulatedOrderCommand::Simulation(sim_order);
for sub in subs {
let _ = sub.send(cmd.clone()).await;
}
evicted
}

apply_update clones g.subscribers while holding the lock, then awaits sub.send(cmd.clone()).await for each subscriber outside the lock. The subscriber channel is bounded (PU_SUBSCRIBER_CHANNEL_CAPACITY = 10_000), and send().await blocks the PUR worker's task if any subscriber is full.

If one sim worker stalls (e.g. an expensive simulate_order call holding pool_arc.lock() so consume_updates() can't drain), the PUR thread back-pressures all PU classification, including the Cancellation path. That stops new PU updates from reaching even healthy sim workers and the assembler/builder.

Consider try_send with a counter on drops, or break out the slow-subscriber detection so a single stuck worker doesn't take down the whole PU pipeline. At minimum, log when a send actually has to wait.
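
A hedged sketch of the try_send variant; fan_out and dropped_updates are hypothetical names, while the tokio try_send error variants are real:

```rust
use tokio::sync::mpsc::{error::TrySendError, Sender};

fn fan_out<T: Clone>(subscribers: &[Sender<T>], cmd: &T, dropped_updates: &mut u64) {
    for sub in subscribers {
        match sub.try_send(cmd.clone()) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => {
                // Subscriber's bounded channel is full: count the drop and move
                // on instead of awaiting and stalling the PUR worker.
                *dropped_updates += 1;
            }
            Err(TrySendError::Closed(_)) => {
                // Closed subscribers get pruned on the next apply_update call.
            }
        }
    }
}
```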

Comment on lines +73 to +92
pu_orderpool.consume_updates();

let order_id = task.order.id();
let start_time = Instant::now();
let mut block_state = BlockState::new_arc(state_provider.clone());
let sim_result = simulate_order(
task.parents.clone(),
task.order,
&current_sim_context.block_ctx,
&mut local_ctx,
&mut block_state,
);
let sim_result = {
let cached = CachedDB::new(
parent_state.clone(),
current_sim_context.block_ctx.shared_cached_reads.clone(),
);
let pool_arc = pu_orderpool.pool();
let pool_guard = pool_arc.lock();
simulate_order(
task.parents.clone(),
task.order,
&current_sim_context.block_ctx,
&mut local_ctx,
&pool_guard,
cached,
)
};

pool_arc.lock() is held for the entire duration of simulate_order (which can run a full bundle EVM). consume_updates() also locks the same mutex. Today this is fine because each worker has its own per-worker PUSimWorkerOrderpool and is single-threaded — the lock is uncontended.

But if a future refactor shares the worker pool across tasks, or if consume_updates ever runs concurrently (e.g. driven from a fan-out task), this serializes everything on the slowest sim. Worth either documenting "this lock is uncontended by construction" or removing the Mutex indirection entirely (the worker owns the pool).

dvush force-pushed the priority_updates branch from 13f45ec to 19c887b on April 28, 2026 16:30
Contributor

claude Bot commented Apr 28, 2026

Claude finished @dvush's task in 5m 53s


Claude PR Review (update)

  • Check what changed since last review
  • Review priority_update modules
  • Check if previous review's bugs are still present
  • Review new force-TOB code
  • Post inline comments

Summary

This update reviews commits since the last review (d81389d..19c887b), which add force-TOB classification (PriorityUpdateClass::ForceTopOfBlock), a top-of-block commit pass, and a "skip PU if used slot is overwritten in bundle" guard. Most of the previous review still applies — flagging which findings persist and what's new from the recent commits.

New issues introduced by the latest commits

  1. build_new_block panics regardless of new code. block_building_result_assembler.rs:298 still has panic!("TODO: next_journal_sequence_number not set ...") before the Ok(...) return, so the new force-TOB plumbing added at lines 200–218 is dead at runtime in the live path. Inline comment posted. The same code in build_backtest_block is exercisable, the live path isn't. (Pre-existing TODO, but the PR is now adding correctness-relevant logic into a function that cannot complete.)

  2. Force-TOB orders may be double-executed via PU overlay. commit_force_top_of_block_orders in block_building_helper.rs:56 iterates force-TOB orders and calls commit_order on each. commit_order_inner then pulls in PUs from priority_update_pool.get_updates, which can include other force-TOB orders (they live in the same pool). If A's used_priority_updates references B's slots, A's commit pre-executes B; the loop then attempts B at top level, which fails (nonce reused) and traces success=false. Inline comment posted.

  3. Resolver/live state divergence around force-TOB. Conflict resolution computes total_profit and a cached BundleState against parent state with no force-TOB pre-committed; build_new_block commits force-TOB first and then applies the resolved groups. The cached state may not match live state at the boundary, biasing the assembler's profit-sort and risking divergence in SharedSimulationCache. Inline comment posted.

  4. slot_overwritten_in_bundle (priority_update/mod.rs:120) edge cases. Relies on StorageSlot::is_changed() which is present != original; a write that restores the slot to its original value will report unchanged, so the PU is re-applied on top. Selfdestruct/wipe is also not specially handled. Inline comment posted.

Previous review's findings — still present (not flagged inline this run; see prior review for context)

  • PU profit dropped from total_profit in parallel_builder/conflict_resolvers.rs:226 (only res.coinbase_profit is summed; commit_result.priority_updates profits are ignored).
  • Non-deterministic PU commit order via PendingStateDb::into_used_pu_slots returning HashMap::into_values().
  • RoutingSink::remove_order returns None for PU-pool removals even when the PU was evicted.
  • PURCommandClassifier::try_consuming_new_order_command swallows Insert send errors with let _ = ...send(...). If run_pur_sim_worker early-returns (e.g. parent state lookup fails), every PU-classified order is silently dropped.
  • PUSimulationWorkerState::apply_update does sub.send(cmd).await on a bounded 10k channel for each subscriber — one stalled sim worker back-pressures the entire PUR pipeline (including cancellations).
  • simulate_priority_update runs PUs against parent state with no PU overlay, so PUs don't compose; the stored pu_data.changeset may not match what the same PU produces at commit time.
  • Per-worker Arc<Mutex<PriorityUpdatePool>> lock is held across simulate_order (full EVM run) — uncontended today, but a footgun for any future fan-out.
  • PUData::changeset is held twice (PendingUpdates::orders + SimulatedOrder.pu_data).
  • apply_update's error! on pu_data == None is unreachable from current callers.

The PR is still marked WIP and the author's checklist (lint/test/added tests) is unchecked, so a few of these may already be on the to-do list.

Comment on lines +200 to +218
for (sim_order, commit_result, order_commit_time) in block_building_helper
.commit_force_top_of_block_orders(
&mut self.local_ctx,
&self.priority_update_pool.read(),
)?
{
let (success, gas_used, execution_error) = match commit_result.order {
Ok(res) => (true, res.space_used.gas, None),
Err(err) => (false, 0, Some(err)),
};
trace!(
order_id = ?sim_order.id(),
success,
order_commit_time_mus = order_commit_time.as_micros(),
gas_used,
?execution_error,
"Executed force-TOB order"
);
}

build_new_block still panics at line 298 (panic!("TODO: next_journal_sequence_number not set ...")), so this new force-TOB block is dead code at runtime. The function is called from the live build path and aborts the builder thread before the Ok(...) return. The same block added in build_backtest_block is exercisable, but the live path isn't. Either fix the TODO in this PR, or hold this plumbing until the underlying BlockBuildingHelperFromProvider::new issue is resolved — right now the new logic added here cannot be exercised live and isn't covered by the test suite either.

Comment on lines +56 to +71
fn commit_force_top_of_block_orders(
&mut self,
local_ctx: &mut ThreadBlockBuildingContext,
priority_update_pool: &PriorityUpdatePool,
) -> Result<Vec<(Arc<SimulatedOrder>, OrderCommitResult, Duration)>, CriticalCommitOrderError>
{
let force_orders = priority_update_pool.force_top_of_block_orders();
let mut results = Vec::with_capacity(force_orders.len());
for sim_order in force_orders {
let start = Instant::now();
let commit_result =
self.commit_order(local_ctx, &sim_order, priority_update_pool, &|_| Ok(()))?;
results.push((sim_order, commit_result, start.elapsed()));
}
Ok(results)
}

Force-TOB orders may be double-executed (and counted-then-failed) when they read each other's slots.

This loop iterates force-TOB orders sorted by OrderId and commits each via commit_order, which goes through commit_order_inner and calls priority_update_pool.get_updates(state, &order.used_priority_updates). used_priority_updates records every PU slot the order's simulation read — which includes slots written by other force-TOB PUs (they live in the same pool).

So if force-TOB order A is committed first and its used_priority_updates references a slot owned by force-TOB order B:

  1. commit_order for A will pre-execute B as a PU sub-step (consuming B's nonces, contributing B's profit to coinbase_profit).
  2. The loop then reaches B and calls commit_order(B, ...). B's top-level execution fails (nonce too low / already-included), and the trace logs success = false for B.

Net effect: B's profit is captured (good) but the visible status is misleading, and any downstream check on the per-order OrderCommitResult will treat B as failed.

Also: the regular-PU side of the same call may pull in non-force-TOB PUs, which were not intended to run at TOB. If that's the design, a comment makes it explicit; otherwise consider a dedicated commit_order_no_pu_overlay to prevent the cascade.

Comment on lines +77 to +85
/// Orders that must be committed at the top of every built block, sorted
/// by [`OrderId`] for deterministic inclusion order across builders. The
/// builder iterates this list once at the start of `build_block` and
/// commits each before the regular order loop runs.
pub fn force_top_of_block_orders(&self) -> Vec<Arc<SimulatedOrder>> {
let mut orders: Vec<_> = self.force_top_of_block.values().cloned().collect();
orders.sort_by_key(|sim| sim.id());
orders
}

Two follow-ups on the new force-TOB bucket:

  1. The conflict resolver (conflict_resolvers.rs::run_conflict_task) builds each group's total_profit against parent state with no force-TOB orders pre-committed. Then build_new_block / build_backtest_block commit force-TOB at the very top before appending the resolved groups. If a group order's used_priority_updates references a slot owned by a force-TOB order, the resolver still pre-executes that PU via commit_order_inner, so the value the order observes matches between resolver and live build — but the actual gas / refund metering during the resolver's PU sub-step could differ from the live slot_overwritten_in_bundle path (which now skips that PU). The cached total_profit and BundleState in simulation_cache::CachedSimulationState will then be inconsistent with the post-force-TOB live state. Worth either pre-committing force-TOB inside the resolver too, or invalidating the simulation cache when the force-TOB set changes.

  2. Sorting force-TOB by OrderId is deterministic, but OrderId order is essentially arbitrary from the user's perspective. If two orders both want the genuine top of block, a user can't predict who wins. If a tiebreaker exists in product semantics (e.g. who paid more, who arrived first), that should drive ordering rather than the hash.

Comment on lines +120 to +130
fn slot_overwritten_in_bundle<DB>(state: &BlockState<DB>, slot: &SlotKey) -> bool {
let Some(account) = state.bundle_state().state.get(&slot.address) else {
return false;
};
let key = U256::from_be_bytes(slot.key.0);
account
.storage
.get(&key)
.map(|s| s.is_changed())
.unwrap_or(false)
}

slot_overwritten_in_bundle will return false for slots an earlier order deleted/zeroed in this build.

StorageSlot::is_changed() returns present_value != original_value. If a previously-committed order writes a slot from 42 back to 42 (or the underlying revm bundle representation reports it as unchanged), this returns false — and the PU is then re-applied on top, even though the bundle state has effectively touched that slot already.

More importantly, this only handles the storage dimension. The PU's read may be of an account that another order already destroyed (selfdestruct → wipe_storage on the bundle). In that case account.storage doesn't contain the key; the function returns false and the PU is re-attempted.

These are edge cases, but the PR description ("skip PU if used slot is overwritten") implies safety here — worth a unit test covering: (a) revert-to-original write, (b) selfdestruct of the contract, (c) account never-touched-but-PU-pool-has-overlay (which works correctly, returns false, PU runs).
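
For the selfdestruct case, a hedged sketch of one possible tightening, mirroring the snippet above; it assumes revm's BundleAccount exposes was_destroyed(), and the revert-to-original case would still need a separate decision:

```rust
fn slot_overwritten_in_bundle<DB>(state: &BlockState<DB>, slot: &SlotKey) -> bool {
    let Some(account) = state.bundle_state().state.get(&slot.address) else {
        return false;
    };
    // Assumption: an account wiped by selfdestruct counts as "overwritten" so
    // the PU is not re-applied on top of storage that no longer exists.
    if account.was_destroyed() {
        return true;
    }
    let key = U256::from_be_bytes(slot.key.0);
    account
        .storage
        .get(&key)
        .map(|s| s.is_changed())
        .unwrap_or(false)
}
```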
