DO NOT MERGE: make connection loop more aggressive, use hickory DNS client by graydon · Pull Request #5328 · stellar/stellar-core

graydon · 2026-06-17T08:10:17Z

this is an attempt at making overlay v2 a little more aggressive about trying to connect to its peers, as well as switching to a custom DNS resolver (so we can clear any cached failed lookups) and an improved sync of metrics when the /info endpoint of core gets hit (which is what supercluster listens to). there's a corresponding change in supercluster to help with DNS publishing.

I don't know which if any of these will help, but the goal here is to try to improve the behavior of supercluster runs that just stall out trying to get to their target connection count.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: faeed5a5a7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-17T08:17:24Z

        // Falls back to address-based dials for peers we haven't connected to yet.
-        // This is a fallback — targeted reconnection on disconnect handles the fast path.
-        let mut reconnect_interval = tokio::time::interval(Duration::from_secs(30));
+        let mut reconnect_interval = tokio::time::interval(CONFIGURED_PEER_RECONNECT_INTERVAL);


Avoid overlapping raw reconnect dials

When a configured peer has not reached PeerConnected yet (startup, DNS change, or a slow QUIC handshake), it remains in unknown_addrs, and the reconnect branch raw-dials it from a spawned task on every timer tick. With this interval now at 1s, any resolution/handshake taking longer than a tick can create concurrent address-only dials to the same peer; the code below even notes those raw dials cannot be deduplicated by libp2p before the PeerId is known, so this can inflate pending connections and duplicate handshakes rather than just retry faster. Please gate raw dials on an in-flight/resolved set or keep the fast timer to PeerId-based retries only.

Useful? React with 👍 / 👎.

make connection loop more aggressive, use hickory DNS client

faeed5a

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DO NOT MERGE: make connection loop more aggressive, use hickory DNS client#5328

DO NOT MERGE: make connection loop more aggressive, use hickory DNS client#5328
graydon wants to merge 1 commit into
stellar:overlay-v2-sharedfrom
graydon:aggressive-reconnect

graydon commented Jun 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

graydon commented Jun 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant