Skip to content

DO NOT MERGE: Kill scheduler on overlay v2#5326

Open
graydon wants to merge 2 commits into
stellar:overlay-v2-sharedfrom
graydon:kill-scheduler
Open

DO NOT MERGE: Kill scheduler on overlay v2#5326
graydon wants to merge 2 commits into
stellar:overlay-v2-sharedfrom
graydon:kill-scheduler

Conversation

@graydon

@graydon graydon commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This removes the Action Scheduler / Action Queues from core on the overlay-v2-shared branch. The hypothesis here is that it's no longer needed with the overlay process doing all the networking.

Copilot AI review requested due to automatic review settings June 16, 2026 02:09
@graydon graydon changed the base branch from master to overlay-v2-shared June 16, 2026 02:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the in-process Action Scheduler/Action Queues and routes main-thread “actions” through the VirtualClock’s ASIO io_context, consistent with the overlay-v2-shared hypothesis that networking work is handled elsewhere.

Changes:

  • Removed util::Scheduler implementation and its unit tests.
  • Simplified VirtualClock action posting to enqueue std::function<void()> and forward them to ASIO, removing scheduler-driven fairness/load-shedding.
  • Updated Application / ApplicationImpl / AppConnector main-thread posting APIs to drop Scheduler::ActionType.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/util/Timer.h Removes scheduler ownership/typing from VirtualClock and updates related comments/APIs.
src/util/Timer.cpp Removes scheduler integration from crank() and posts pending actions directly to ASIO.
src/util/test/SchedulerTests.cpp Deletes scheduler unit tests.
src/util/Scheduler.h Deletes scheduler interface.
src/util/Scheduler.cpp Deletes scheduler implementation.
src/process/ProcessManagerImpl.h Adds missing <map> include for std::map member usage.
src/main/ApplicationImpl.h Updates postOnMainThread signature to remove action type.
src/main/ApplicationImpl.cpp Routes main-thread posts through new VirtualClock::postAction(std::function<void()>&&) API.
src/main/Application.h Updates Application::postOnMainThread signature to remove action type.
src/main/AppConnector.h Updates connector postOnMainThread signature to remove action type.
src/main/AppConnector.cpp Updates connector call-through to match new Application API.
Comments suppressed due to low confidence (1)

src/util/Timer.cpp:416

  • This loop holds mPendingActionQueueMutex while calling asio::post repeatedly. That can unnecessarily block background threads calling postAction(), and it makes the critical section include allocations and executor bookkeeping. Consider swapping the queue into a local variable under the lock, then posting without holding the mutex.
    // Transfer any pending actions to ASIO, counting them as "progress" also.
    {
        LOCK_GUARD(mPendingActionQueueMutex, guard);
        while (!mPendingActionQueue.empty())
        {
            asio::post(mIOContext, std::move(mPendingActionQueue.front()));
            mPendingActionQueue.pop();
            progressCount++;
        }
    }

Comment thread src/util/Timer.cpp
@chatgpt-codex-connector

Copy link
Copy Markdown

💡 Codex Review

auto& vec = mPendingQSetFetches[h];
vec.push_back(envelope);

P1 Badge Request missing quorum sets instead of only tracking them

When an SCP envelope references a quorum set that is not already cached, this only appends the envelope to mPendingQSetFetches; unlike the txset path below, no IPC request is sent and the Rust overlay has no qset fetch path. On a fresh node or after a validator changes quorum sets, those envelopes stay in FETCHING indefinitely, so SCP cannot process messages from that validator.


self.overlay_handle.submit_tx(tx, 0, 0);

P1 Badge Relay peer transactions beyond the first hop

When a transaction is learned from a peer, the handler only inserts it into the local Rust mempool. Since only local Core submissions call broadcast_tx, any topology that is not fully connected will stop transaction propagation at the origin's direct peers, leaving validators further away unable to nominate the transaction.


return Err(XdrError::UnsupportedFeeBump);

P1 Badge Preserve fee-bump transactions in the Rust overlay

For every TransactionEnvelope::TxFeeBump, parsing returns UnsupportedFeeBump; the submit and receive paths then drop the transaction after C++ has already reported PENDING to the caller. Fee-bump transactions are valid Stellar transactions, so this makes them impossible to propagate or include through the new overlay.


mApp.getOverlayManager().broadcastTransaction(env, tx->getFullFee(),
tx->getNumOperations());
return TxSubmitStatus::TX_STATUS_PENDING;

P2 Badge Keep standalone submissions in a local path

In RUN_STANDALONE mode RustOverlayManager::start() skips starting the IPC connection, but recvTransaction still sends the transaction to the overlay and immediately returns PENDING. In standalone/manual-close workflows this send is a no-op and triggerNextLedger reads an empty overlay mempool, so submitted transactions disappear instead of being included.


return mApp.getHerder().getSCPStateForPeer(ledgerSeq);

P2 Badge Post SCP state collection to the main thread

This callback is invoked by OverlayIPC::readerLoop, so it traverses Herder/SCP state from the IPC reader thread while the main thread may be mutating the same state. The neighboring receive callbacks post back to the main thread; peer GET_SCP_STATE requests should do the same or otherwise synchronize access to avoid races and corrupted SCP snapshots.


std::tie(proposedSet, applicableProposedSet) =
makeTxSetFromTransactions(txPhases, mApp, lowerBoundCloseTimeOffset,
upperBoundCloseTimeOffset, invalidTxPhases);

P2 Badge Evict invalid mempool entries after txset building

makeTxSetFromTransactions reports invalid entries through invalidTxPhases, but the new overlay path never sends those hashes back to Rust for removal. High-fee invalid or stale transactions can therefore be returned at the top of every nomination request until age eviction, crowding out valid lower-fee transactions because only a bounded number of candidates is requested each round.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d11f6b304

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread overlay/src/main.rs
};
use metrics::OverlayMetrics;

const CONFIGURED_PEER_RECONNECT_INTERVAL: Duration = Duration::from_secs(1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throttle unresolved peer retries outside the 1s loop

When a configured DNS peer has never resolved or connected, it is absent from peer_hostnames, so every reconnect tick treats it as unknown_addrs and launches another DNS-resolution task, even though spawn_peer_retry_task is already doing exponential backoff for the same unresolved hosts. With this interval at 1s, a slow or missing Kubernetes DNS record will be retried every second (and can overlap for multiple peers), bypassing the intended backoff and potentially issuing duplicate raw dials when it finally resolves; track in-flight/unresolved hosts or leave them solely to the retry task.

Useful? React with 👍 / 👎.

Comment thread src/util/Scheduler.cpp
@@ -1,392 +0,0 @@
// Copyright 2020 Stellar Development Foundation and contributors. Licensed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove deleted scheduler files from the VS project

When building through Builds/VisualStudio/stellar-core.vcxproj, the project still references ..\..\src\util\Scheduler.cpp, ..\..\src\util\test\SchedulerTests.cpp, and ..\..\src\util\Scheduler.h; after this deletion those files no longer exist, so the Visual Studio build will fail even though the source files were removed. Please delete the stale project/filter entries along with the scheduler sources.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants