Claude PR Review: Claude finished @dvush's task in 7m 39s
Summary

The new priority-update pipeline is plumbed through ingress → classifier → PUR sim → per-worker overlays → builder commits in a way that mostly hangs together, but a few real correctness issues showed up. Inline comments are posted on the specific lines.

Bugs
Other issues / smells
Pre-existing
The PR description says WIP and the PR checklist (lint / test / added tests) is unchecked, so some of these may already be on the author's to-do list.
_Quoted diff context (fragmentary): the commit call that passes `&priority_update_pool.read()` and the `&|_| Ok(())` callback._
Bug: PU profit is not accumulated into total_profit.
partial_block.commit_order may execute one or more priority-update orders in front of sim_order, and each successful PU contributes coinbase profit (see commit_order_inner in building/mod.rs, where self.coinbase_profit += priority_update.coinbase_profit). Below, only res.coinbase_profit (the main order) feeds handle_successful_commit → total_profit. The PU profits in commit_result.priority_updates are dropped on the floor.
Consequences:
- `ResolutionResult::total_profit` underestimates each group's profit, biasing the assembler's "sort groups by total_profit" against groups whose orders trigger PUs.
- `simulation_cache::CachedSimulationState::total_profit` becomes inconsistent with the cached `BundleState`, so resumed runs will produce different totals depending on whether they hit the cache.
Sum the Ok entries in commit_result.priority_updates into total_profit (and probably into per_order_profits as well, with the PU's OrderId).
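A minimal sketch of that accumulation, using hypothetical stand-in types for the resolver's commit result and profit map (the real structs differ); the point is only that successful PU sub-commits feed the same totals the main order does:

```rust
use std::collections::HashMap;

// Stand-ins for the real resolver types (hypothetical shapes).
type OrderId = u64;
type Profit = u128;

struct PuCommit {
    order_id: OrderId,
    coinbase_profit: Profit,
}

struct CommitResult {
    coinbase_profit: Profit,                     // main order's profit (res.coinbase_profit)
    priority_updates: Vec<Result<PuCommit, ()>>, // PU sub-steps executed in front of it
}

fn accumulate_profits(
    commit_result: &CommitResult,
    total_profit: &mut Profit,
    per_order_profits: &mut HashMap<OrderId, Profit>,
) {
    // Main order profit, as today.
    *total_profit += commit_result.coinbase_profit;
    // Fold in every successful PU sub-commit so the group's total matches what
    // commit_order_inner actually added to coinbase_profit.
    for pu in commit_result.priority_updates.iter().flatten() {
        *total_profit += pu.coinbase_profit;
        *per_order_profits
            .entry(pu.order_id)
            .and_modify(|p| *p += pu.coinbase_profit)
            .or_insert(pu.coinbase_profit);
    }
}
```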
```rust
pub fn into_used_pu_slots(self) -> Vec<SlotKey> {
    self.used_pu_slots.into_values().collect()
}
```
Non-determinism in priority-update commit order.
used_pu_slots is a HashMap<OrderId, SlotKey>, and into_values() returns its values in unspecified order. The result is fed via simulated_order.used_priority_updates into PriorityUpdatePool::get_updates, which preserves the input slot order when building its returned Vec<&Order>. So the order in which priority updates are committed in front of a regular order is non-deterministic between runs / between threads.
Two builders or two parallel resolvers can produce different state for the same group of orders depending on hash randomization, which feeds straight into total_profit and the cached BundleState. Sort the result (e.g. by OrderId) before returning, or store in a BTreeMap, to make commit order deterministic.
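A sketch of the deterministic variant with stand-in `OrderId`/`SlotKey` types (the real definitions live in the order pool); keeping the `HashMap` but sorting by the owning `OrderId` on the way out fixes the commit order:

```rust
use std::collections::HashMap;

// Stand-ins for the real types (hypothetical shapes).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct OrderId(u64);

#[derive(Clone)]
struct SlotKey {
    address: [u8; 20],
    key: [u8; 32],
}

struct UsedPuSlots {
    used_pu_slots: HashMap<OrderId, SlotKey>,
}

impl UsedPuSlots {
    /// Return slots ordered by the OrderId that produced them, so downstream
    /// PU commits happen in the same order on every run.
    pub fn into_used_pu_slots(self) -> Vec<SlotKey> {
        let mut entries: Vec<_> = self.used_pu_slots.into_iter().collect();
        entries.sort_by_key(|(id, _)| *id);
        entries.into_iter().map(|(_, slot)| slot).collect()
    }
}
```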
```rust
fn remove_order(&mut self, id: OrderId) -> Option<Arc<SimulatedOrder>> {
    self.pool.apply_remove(&id);
    self.store.remove_order(id)
}
```
PriorityUpdatePool::apply_remove returns nothing, and store.remove_order(id) returns None for any id that lives only in the PU pool. So RoutingSink::remove_order always returns None for PU-pool removals, even when the order was actually evicted. Callers using the return value (e.g. OrderIntakeConsumer::remove_orders chains in ordering_builder.rs) cannot detect that a PU was removed.
If the trait contract is "return Some(order) iff something was removed", track and return the evicted PU; otherwise document that PU removals are silent.
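If the stricter contract is wanted, a sketch with hypothetical simplified types, assuming `apply_remove` can be changed to return the evicted order:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical simplified stand-ins for the real pool/store/order types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct OrderId(u64);
struct SimulatedOrder;

struct PriorityUpdatePool {
    orders: HashMap<OrderId, Arc<SimulatedOrder>>,
}

impl PriorityUpdatePool {
    /// Changed to return the evicted order so callers can observe PU removals.
    fn apply_remove(&mut self, id: &OrderId) -> Option<Arc<SimulatedOrder>> {
        self.orders.remove(id)
    }
}

struct Store {
    orders: HashMap<OrderId, Arc<SimulatedOrder>>,
}

impl Store {
    fn remove_order(&mut self, id: OrderId) -> Option<Arc<SimulatedOrder>> {
        self.orders.remove(&id)
    }
}

struct RoutingSink {
    pool: PriorityUpdatePool,
    store: Store,
}

impl RoutingSink {
    /// Honors "Some(order) iff something was removed" for both sides.
    fn remove_order(&mut self, id: OrderId) -> Option<Arc<SimulatedOrder>> {
        let from_pu_pool = self.pool.apply_remove(&id);
        self.store.remove_order(id).or(from_pu_pool)
    }
}
```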
```rust
impl PURCommandClassifier {
    pub fn try_consuming_new_order_command(&self, cmd: &OrderPoolCommand) -> bool {
        match cmd {
            OrderPoolCommand::Insert(order) => {
                if order.metadata().priority_update_data.is_none() {
                    return false;
                }
                self.inner.tracked_orders.lock().insert(order.id());
                let _ = self
                    .inner
                    .cmd_sender
                    .send(OrderPoolCommand::Insert(Arc::clone(order)));
                true
            }
            OrderPoolCommand::Remove(id) => {
                let known = self.inner.tracked_orders.lock().remove(id);
                if known {
                    let _ = self.inner.cmd_sender.send(OrderPoolCommand::Remove(*id));
                    true
                } else {
                    false
                }
            }
        }
    }
}
```
A few classification edge cases to consider:
- `try_consuming_new_order_command` ignores send errors with `let _ = ...send(...)`. For an `unbounded_channel` the only failure is "receiver dropped" (i.e. the PUR worker has exited). The classifier still returns `true`, swallowing the order from the main pipeline — so when the PUR thread dies (e.g. parent state lookup failed, see the `run_pur_sim_worker` early-return), every PU-classified order is silently dropped instead of falling back to the regular pipeline. Worth at least a metric / warn (see the sketch after this list).
- `Remove` only forwards if the id was previously seen by `Insert` (`tracked_orders` membership). If a `Remove` arrives for a PU-classified id from a source that classifies differently than this side did (e.g. the classifier was added/removed, or the classification rules changed), the cancellation gets routed to the regular pipeline, where it will be a no-op — and the PUR pool still holds the order.
- Holding a `parking_lot::Mutex` is overkill if `try_consuming_new_order_command` is only ever called from `SimulationJob::process_new_commands` (a single async task). Consider documenting the threading expectation, switching to a `RefCell` to make the single-task assumption explicit, or removing the lock entirely.
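For the first bullet, a hedged sketch of the fallback behavior, assuming the command channel is a tokio unbounded sender (the real type may differ): on send failure, warn and report the order as not consumed so it stays in the regular pipeline. In the real classifier the `tracked_orders` insert would also need to be rolled back on that path.

```rust
use tokio::sync::mpsc::UnboundedSender;
use tracing::warn;

/// Forward a PU-classified command to the PUR worker; if the worker has exited
/// (receiver dropped), report the order as *not* consumed so it falls back to
/// the regular pipeline instead of being silently dropped.
fn forward_to_pur_worker<T>(cmd_sender: &UnboundedSender<T>, cmd: T) -> bool {
    match cmd_sender.send(cmd) {
        Ok(()) => true,
        Err(_) => {
            warn!("PUR worker channel closed; routing PU order to regular pipeline");
            false
        }
    }
}
```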
```rust
let mut changeset = bundle_state.to_plain_state(OriginalValuesKnown::Yes);

if !changeset.contracts.is_empty() {
    debug!(
        reason = "changeset contains contracts",
        "priority update discarded"
    );
    return Ok(None);
}

changeset
    .storage
    .retain(|s| s.address != coinbase && !senders.contains(&s.address));
```
Two correctness questions worth a comment in the code:
- The PU is simulated against the parent state with no PU overlay, so PUs are not composed — if PU A writes slot X and PU B reads slot X to decide what to write, PU B's simulation here uses the on-chain value, not PU A's overlay. At commit time (`partial_block.commit_order`), they may execute against a different sequence of slot values, potentially diverging from the simulated `changeset` that's stored in `pu_data.changeset` and used for conflict eviction in `PendingUpdates::add_new_simulated_update`. Worth being explicit about this assumption.
- `changeset.storage.retain(|s| s.address != coinbase && !senders.contains(&s.address))` filters by sender/coinbase address, but a PU bundle may include several txs whose internal calls touch storage on contracts that happen to live at a sender's address (rare but possible). Conversely, writes the PU performs to other contracts' storage are kept, which is what we want — so that part is fine. Just make sure the sender-specific filter is intentional.
```rust
pub fn apply_update(&mut self, sim_order: Arc<SimulatedOrder>) -> Vec<OrderId> {
    let Some(pu_data) = sim_order.pu_data.clone() else {
        error!(order_id = ?sim_order.id(), "apply_update called with non-PU simulated order");
        return Vec::new();
    };
    let order_id = sim_order.id();
    let evicted = self
        .pending
        .add_new_simulated_update(order_id, pu_data.changeset);
    for id in &evicted {
        self.orders.remove(id);
        self.force_top_of_block.remove(id);
    }
    if matches!(
        sim_order.order.metadata().priority_update_data,
        Some(PriorityUpdateClass::ForceTopOfBlock)
    ) {
        self.force_top_of_block
            .insert(order_id, Arc::clone(&sim_order));
    }
    self.orders.insert(order_id, sim_order);
    evicted
}
```
Minor:
- `sim_order.pu_data.clone()` clones the entire `Vec<PlainStorageChangeset>`, which is then moved into `pending` in `add_new_simulated_update`. Meanwhile the same changeset is also kept inside `sim_order` (which is then `Arc::clone`d into `self.orders`). So every PU stores its changeset twice. With many PUs this is non-trivial memory; consider wrapping `PUData::changeset` in `Arc<...>` (see the sketch after this list) or moving it out of `SimulatedOrder` entirely once it has been "consumed" into `PendingUpdates`.
- `error!` on `pu_data` being `None` is reasonable, but the only callers (`PUSimWorkerOrderpool::consume_updates` and `apply_update` from the PUR worker) already check `sim_order.pu_data.is_some()`, so this branch should be unreachable — make it a `debug_assert!` or document the invariant.
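A sketch of the `Arc`-wrapping idea from the first bullet, with a stand-in changeset type (hypothetical here); `PendingUpdates` and the `SimulatedOrder` then share one allocation instead of each holding a copy:

```rust
use std::sync::Arc;

// Stand-in for the real changeset type (hypothetical here).
#[derive(Clone)]
struct PlainStorageChangeset;

struct PUData {
    // Shared instead of duplicated between SimulatedOrder and PendingUpdates.
    changeset: Arc<Vec<PlainStorageChangeset>>,
}

#[derive(Default)]
struct PendingUpdates {
    changesets: Vec<Arc<Vec<PlainStorageChangeset>>>,
}

impl PendingUpdates {
    fn add_new_simulated_update(&mut self, changeset: Arc<Vec<PlainStorageChangeset>>) {
        // Cheap pointer copy; the underlying Vec exists once.
        self.changesets.push(changeset);
    }
}

fn apply_update(pending: &mut PendingUpdates, pu_data: &PUData) {
    pending.add_new_simulated_update(Arc::clone(&pu_data.changeset));
}
```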
```rust
impl PUSimulationWorkerState {
    async fn apply_update(&self, sim_order: Arc<SimulatedOrder>) -> Vec<OrderId> {
        // Sync critical section: mutate pool, prune closed subs, snapshot subs.
        let (evicted, subs) = {
            let mut g = self.inner.lock();
            let evicted = g.pool.apply_update(Arc::clone(&sim_order));
            g.subscribers.retain(|s| !s.is_closed());
            (evicted, g.subscribers.clone())
        };
        let cmd = SimulatedOrderCommand::Simulation(sim_order);
        for sub in subs {
            let _ = sub.send(cmd.clone()).await;
        }
        evicted
    }
}
```
apply_update clones g.subscribers while holding the lock, then awaits sub.send(cmd.clone()).await for each subscriber outside the lock. The subscriber channel is bounded (PU_SUBSCRIBER_CHANNEL_CAPACITY = 10_000), and send().await blocks the PUR worker's task if any subscriber is full.
If one sim worker stalls (e.g. an expensive simulate_order call holding pool_arc.lock() so consume_updates() can't drain), the PUR thread back-pressures all PU classification, including the Cancellation path. That stops new PU updates from reaching even healthy sim workers and the assembler/builder.
Consider try_send with a counter on drops, or break out the slow-subscriber detection so a single stuck worker doesn't take down the whole PU pipeline. At minimum, log when a send actually has to wait.
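A sketch of the `try_send` variant, assuming the subscriber channels are bounded tokio mpsc senders (the drop counter stands in for a real metric):

```rust
use tokio::sync::mpsc::{error::TrySendError, Sender};
use tracing::warn;

/// Broadcast a PU update without awaiting on any single subscriber, so one
/// stalled sim worker cannot back-pressure the whole PUR pipeline.
fn broadcast_update<T: Clone>(subscribers: &[Sender<T>], cmd: &T, dropped_updates: &mut u64) {
    for sub in subscribers {
        match sub.try_send(cmd.clone()) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => {
                // Subscriber is not keeping up; drop rather than blocking.
                *dropped_updates += 1;
                warn!("PU subscriber channel full; dropping update");
            }
            // Closed subscribers are pruned by the retain() in the critical section.
            Err(TrySendError::Closed(_)) => {}
        }
    }
}
```

The trade-off is that a slow subscriber can miss updates and run on a stale pool, so it would need some way to resynchronize.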
```rust
pu_orderpool.consume_updates();

let order_id = task.order.id();
let start_time = Instant::now();
let mut block_state = BlockState::new_arc(state_provider.clone());
let sim_result = simulate_order(
    task.parents.clone(),
    task.order,
    &current_sim_context.block_ctx,
    &mut local_ctx,
    &mut block_state,
);
let sim_result = {
    let cached = CachedDB::new(
        parent_state.clone(),
        current_sim_context.block_ctx.shared_cached_reads.clone(),
    );
    let pool_arc = pu_orderpool.pool();
    let pool_guard = pool_arc.lock();
    simulate_order(
        task.parents.clone(),
        task.order,
        &current_sim_context.block_ctx,
        &mut local_ctx,
        &pool_guard,
        cached,
    )
};
```
pool_arc.lock() is held for the entire duration of simulate_order (which can run a full bundle EVM). consume_updates() also locks the same mutex. Today this is fine because each worker has its own per-worker PUSimWorkerOrderpool and is single-threaded — the lock is uncontended.
But if a future refactor shares the worker pool across tasks, or if consume_updates ever runs concurrently (e.g. driven from a fan-out task), this serializes everything on the slowest sim. Worth either documenting "this lock is uncontended by construction" or removing the Mutex indirection entirely (the worker owns the pool).
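If the lock is removed, a sketch of the ownership shape with hypothetical simplified types: the worker owns its pool directly, so "uncontended by construction" becomes a property of the types rather than a comment.

```rust
// Hypothetical simplified shapes; the real PUSimWorkerOrderpool differs.
struct PriorityUpdatePool;

impl PriorityUpdatePool {
    fn apply_pending_updates(&mut self) {}
}

struct PUSimWorkerOrderpool {
    // Owned directly: no Arc<Mutex<..>>, so holding it across simulate_order
    // cannot contend with consume_updates.
    pool: PriorityUpdatePool,
}

impl PUSimWorkerOrderpool {
    fn consume_updates(&mut self) {
        self.pool.apply_pending_updates();
    }

    fn pool(&self) -> &PriorityUpdatePool {
        &self.pool
    }
}
```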
Claude PR Review (update): Claude finished @dvush's task in 5m 53s
Summary

This update reviews commits since the last review (…).

New issues introduced by the latest commits
Previous review's findings — still present (not flagged inline this run; see prior review for context)
The PR is still marked WIP and the author's checklist (lint/test/added tests) is unchecked, so a few of these may already be on the to-do list.
```rust
for (sim_order, commit_result, order_commit_time) in block_building_helper
    .commit_force_top_of_block_orders(
        &mut self.local_ctx,
        &self.priority_update_pool.read(),
    )?
{
    let (success, gas_used, execution_error) = match commit_result.order {
        Ok(res) => (true, res.space_used.gas, None),
        Err(err) => (false, 0, Some(err)),
    };
    trace!(
        order_id = ?sim_order.id(),
        success,
        order_commit_time_mus = order_commit_time.as_micros(),
        gas_used,
        ?execution_error,
        "Executed force-TOB order"
    );
}
```
build_new_block still panics at line 298 (panic!("TODO: next_journal_sequence_number not set ...")), so this new force-TOB block is dead code at runtime. The function is called from the live build path and aborts the builder thread before the Ok(...) return. The same block added in build_backtest_block is exercisable, but the live path isn't. Either fix the TODO in this PR, or hold this plumbing until the underlying BlockBuildingHelperFromProvider::new issue is resolved — right now the new logic added here cannot be exercised live and isn't covered by the test suite either.
```rust
fn commit_force_top_of_block_orders(
    &mut self,
    local_ctx: &mut ThreadBlockBuildingContext,
    priority_update_pool: &PriorityUpdatePool,
) -> Result<Vec<(Arc<SimulatedOrder>, OrderCommitResult, Duration)>, CriticalCommitOrderError>
{
    let force_orders = priority_update_pool.force_top_of_block_orders();
    let mut results = Vec::with_capacity(force_orders.len());
    for sim_order in force_orders {
        let start = Instant::now();
        let commit_result =
            self.commit_order(local_ctx, &sim_order, priority_update_pool, &|_| Ok(()))?;
        results.push((sim_order, commit_result, start.elapsed()));
    }
    Ok(results)
}
```
Force-TOB orders may be double-executed (and counted-then-failed) when they read each other's slots.
This loop iterates force-TOB orders sorted by OrderId and commits each via commit_order, which goes through commit_order_inner and calls priority_update_pool.get_updates(state, &order.used_priority_updates). used_priority_updates records every PU slot the order's simulation read — which includes slots written by other force-TOB PUs (they live in the same pool).
So if force-TOB order A is committed first and its used_priority_updates references a slot owned by force-TOB order B:
- `commit_order` for A will pre-execute B as a PU sub-step (consuming B's nonces, contributing B's profit to `coinbase_profit`).
- The loop then reaches B and calls `commit_order(B, ...)`. B's top-level execution fails (nonce too low / already included), and the trace logs `success = false` for B.
Net effect: B's profit is captured (good) but the visible status is misleading, and any downstream check on the per-order OrderCommitResult will treat B as failed.
Also: the regular-PU side of the same call may pull in non-force-TOB PUs, which were not intended to run at TOB. If that's the design, a comment makes it explicit; otherwise consider a dedicated commit_order_no_pu_overlay to prevent the cascade.
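A hedged sketch of one mitigation shape, with simplified stand-in types (the real slot-ownership lookup would live in the PU pool): filter a force-TOB order's `used_priority_updates` so it never pre-executes another force-TOB order as a PU sub-step.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified stand-ins for the real types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct OrderId(u64);
#[derive(Clone, PartialEq, Eq, Hash)]
struct SlotKey(u64);

/// Keep only the PU slots that are *not* owned by a force-TOB order; those
/// orders are committed by the force-TOB loop itself and must not be
/// pre-executed again as PU sub-steps.
fn pu_slots_excluding_force_tob(
    used_priority_updates: &[SlotKey],
    slot_owner: &HashMap<SlotKey, OrderId>,
    force_tob_ids: &HashSet<OrderId>,
) -> Vec<SlotKey> {
    used_priority_updates
        .iter()
        .filter(|&slot| {
            slot_owner
                .get(slot)
                .map_or(true, |owner| !force_tob_ids.contains(owner))
        })
        .cloned()
        .collect()
}
```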
```rust
/// Orders that must be committed at the top of every built block, sorted
/// by [`OrderId`] for deterministic inclusion order across builders. The
/// builder iterates this list once at the start of `build_block` and
/// commits each before the regular order loop runs.
pub fn force_top_of_block_orders(&self) -> Vec<Arc<SimulatedOrder>> {
    let mut orders: Vec<_> = self.force_top_of_block.values().cloned().collect();
    orders.sort_by_key(|sim| sim.id());
    orders
}
```
Two follow-ups on the new force-TOB bucket:
- The conflict resolver (`conflict_resolvers.rs::run_conflict_task`) builds each group's `total_profit` against parent state with no force-TOB orders pre-committed. Then `build_new_block` / `build_backtest_block` commit force-TOB at the very top before appending the resolved groups. If a group order's `used_priority_updates` references a slot owned by a force-TOB order, the resolver still pre-executes that PU via `commit_order_inner`, so the value the order observes matches between resolver and live build — but the actual gas / refund metering during the resolver's PU sub-step could differ from the live `slot_overwritten_in_bundle` path (which now skips that PU). The cached `total_profit` and `BundleState` in `simulation_cache::CachedSimulationState` will then be inconsistent with the post-force-TOB live state. Worth either pre-committing force-TOB inside the resolver too, or invalidating the simulation cache when the force-TOB set changes.
- Sorting force-TOB by `OrderId` is deterministic, but `OrderId` order is essentially arbitrary from the user's perspective. If two orders both want the genuine top of block, a user can't predict who wins. If a tiebreaker exists in product semantics (e.g. who paid more, who arrived first), that should drive ordering rather than the hash (see the sketch after this list).
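A sketch of that tiebreaker idea, with hypothetical field names (the real metadata would come from `SimulatedOrder`): sort by an explicit product criterion first, falling back to `OrderId` only to keep the ordering total and deterministic.

```rust
use std::cmp::Reverse;

// Hypothetical stand-ins; the real metadata lives on SimulatedOrder.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct OrderId(u64);

struct ForceTobOrder {
    id: OrderId,
    coinbase_payment: u128, // assumed available from simulation
    received_at_ns: u64,    // assumed ingress timestamp
}

fn sort_force_top_of_block(orders: &mut [ForceTobOrder]) {
    // Highest payer first, then earliest arrival, then OrderId as a final tiebreak.
    orders.sort_by_key(|o| (Reverse(o.coinbase_payment), o.received_at_ns, o.id));
}
```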
```rust
fn slot_overwritten_in_bundle<DB>(state: &BlockState<DB>, slot: &SlotKey) -> bool {
    let Some(account) = state.bundle_state().state.get(&slot.address) else {
        return false;
    };
    let key = U256::from_be_bytes(slot.key.0);
    account
        .storage
        .get(&key)
        .map(|s| s.is_changed())
        .unwrap_or(false)
}
```
slot_overwritten_in_bundle will return false for slots an earlier order deleted/zeroed in this build.
StorageSlot::is_changed() returns present_value != original_value. If a previously-committed order writes a slot from 42 back to 42 (or the underlying revm bundle representation reports it as unchanged), this returns false — and the PU is then re-applied on top, even though the bundle state has effectively touched that slot already.
More importantly, this only handles the storage dimension. The PU's read may be of an account that another order already destroyed (selfdestruct → wipe_storage on the bundle). In that case account.storage doesn't contain the key; the function returns false and the PU is re-attempted.
These are edge cases, but the PR description ("skip PU if used slot is overwritten") implies safety here — worth a unit test covering: (a) revert-to-original write, (b) selfdestruct of the contract, (c) account never-touched-but-PU-pool-has-overlay (which works correctly, returns false, PU runs).
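A minimal sketch of case (a) with a stand-in slot type mirroring the `is_changed()` semantics (a real test would drive revm's `BundleState` through `commit_order` and then call `slot_overwritten_in_bundle`); it only demonstrates why a revert-to-original write slips through:

```rust
// Stand-in mirroring the is_changed() semantics the check relies on.
#[derive(Clone, Copy)]
struct Slot {
    original_value: u64,
    present_value: u64,
}

impl Slot {
    fn is_changed(&self) -> bool {
        self.present_value != self.original_value
    }
}

#[test]
fn revert_to_original_write_is_not_reported_as_overwritten() {
    // An earlier order wrote 42 -> 7 -> 42: the bundle view sees the slot as
    // unchanged, so an is_changed()-based check lets the PU re-apply on top.
    let slot = Slot {
        original_value: 42,
        present_value: 42,
    };
    assert!(!slot.is_changed(), "slot looks untouched despite being written");
}
```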
📝 Summary

WIP

💡 Motivation and Context

✅ I have completed the following steps:

- [ ] `make lint`
- [ ] `make test`