DO NOT MERGE: sketch eager random broadcast trees in overlay-v2#5318
DO NOT MERGE: sketch eager random broadcast trees in overlay-v2#5318graydon wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9948d05294
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let mut encoded = Vec::with_capacity(TX_TTL_BYTES + message.len()); | ||
| encoded.extend_from_slice(&tx_message.ttl.to_be_bytes()); | ||
| encoded.extend_from_slice(&message); |
There was a problem hiding this comment.
Preserve compatibility when sending TX frames
Because the TX stream protocol remains /stellar/tx/1.0.0, a mixed-version peer will accept this stream and then try to decode the payload as a StellarMessage. Prefixing every TX with the TTL makes upgraded nodes send bytes that older nodes cannot parse, so any rolling upgrade or connection to an unupgraded peer drops all TX responses/eager TXs even though this decoder still accepts legacy frames. Please either negotiate/bump the protocol or keep legacy TX encoding when talking to peers that have not advertised TTL support.
Useful? React with 👍 / 👎.
| let peers_to_send = select_eager_tx_peers(peers, None, initial_fanout); | ||
| let ttl = self.state.tx_eager_depth.load(Ordering::Relaxed) as u32; |
There was a problem hiding this comment.
Avoid relaying after the initial fanout already covers peers
For peer counts up to TX_FLOOD_EAGER_FANOUT, select_eager_tx_peers already sends the transaction to every connected peer, but calculate_eager_depth returns 1 and that TTL is forwarded here. In a 4-peer overlay this makes each recipient relay the full TX to the other three peers, producing 12 duplicate full-TX sends that will be dropped by dedup; use one less hop for the initial broadcast or make calculate_eager_depth account for the initial fanout.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 138ee10bb8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let (ttl, data) = match StellarMessage::from_xdr(data, Limits::none()) { | ||
| Ok(StellarMessage::Transaction(envelope)) => { | ||
| let tx = | ||
| crate::xdr::canonical_transaction_xdr(envelope).map_err(to_invalid_data)?; | ||
| return Ok(TxStreamMessage::Tx(TxMessage { tx, ttl: 0 })); | ||
| } | ||
| Ok(message) => return Self::decode_stellar_message(message), |
There was a problem hiding this comment.
Decode TTL-prefixed TX frames before legacy messages
When the eager depth equals a valid MessageType discriminator, upgraded peers can drop eager TXs before the TTL is stripped. With the default fanout, any node with 5–16 peers broadcasts with ttl=2, and the first StellarMessage::from_xdr(data, ...) treats the TTL bytes as MessageType::Auth instead of falling through to the TTL-prefixed decode path, so decode_stellar_message rejects the frame as an unexpected TX-stream message. This affects normal upgraded-to-upgraded eager flooding on moderately connected nodes; check the TTL prefix first (or otherwise disambiguate it) rather than attempting a legacy StellarMessage parse on the whole buffer.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fcafa33388
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if peer_count <= 1 { | ||
| return 0; |
There was a problem hiding this comment.
Seed eager floods with a relay hop
When the origin has exactly one connected peer, such as the A-B-C chain exercised by test_scp_relay_three_nodes, this returns TTL 0. The receiver only relays SCP/TX when ttl > 0, and received payloads are marked seen before Core can rebroadcast them, so B delivers the message locally but never forwards it to C. This breaks propagation from edge nodes in non-fully-connected topologies.
Useful? React with 👍 / 👎.
| encoded.extend_from_slice(&ttl.to_be_bytes()); | ||
| encoded.extend_from_slice(&message); |
There was a problem hiding this comment.
Preserve SCP wire compatibility
This prefixes every outbound SCP frame with a TTL while the negotiated stream remains /stellar/scp/1.0.0. During rolling upgrades, older peers on the same protocol will parse these bytes as the StellarMessage discriminator rather than stripping a TTL, so upgraded nodes can no longer send SCP messages to unupgraded peers. Please bump/negotiate the protocol or keep legacy SCP encoding unless the peer advertises TTL support.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9eec9ec485
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let hash = blake2b_hash(envelope); | ||
| let ttl = self.state.overlay_eager_depth.load(Ordering::Relaxed) as u32; |
There was a problem hiding this comment.
Mark locally broadcast SCP as seen
When a node originates an SCP flood with ttl >= 2 (for example with the default fanout once it has at least 5 known/connected peers), a relay can send the same envelope back through a different peer. Since broadcast_scp no longer inserts the local hash into scp_seen, that returning copy is treated as a fresh inbound SCP and forwarded back to Core, unlike TX broadcasts which still mark tx_seen; this can create duplicate consensus delivery and additional redundant relays for messages we originated.
Useful? React with 👍 / 👎.
9eec9ec to
cff0b1a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cff0b1a758
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| Ok(other) => { | ||
| return Err(io::Error::new( | ||
| io::ErrorKind::InvalidData, | ||
| format!("unexpected SCP stream StellarMessage {}", other.name()), | ||
| )); |
There was a problem hiding this comment.
Decode TTL-prefixed SCP before rejecting parsed variants
When the eager depth equals a valid StellarMessage discriminator, an upgraded peer can drop upgraded SCP floods before the TTL is stripped. For example, with the default fanout a node sized for 5–16 peers sends ttl=2; the first parse_stellar_message(data) can classify those TTL bytes as another message type (as with the TX decoder path) and this Ok(other) branch returns an error instead of falling through to parse data[4..] as the actual SCP envelope, so normal eager SCP propagation fails for those topologies. Try the TTL-prefixed decode before rejecting non-SCP legacy parses, or otherwise disambiguate the framing.
Useful? React with 👍 / 👎.
| self.libp2p_handle | ||
| .set_known_peer_count(all_peers.len().saturating_sub(1)) | ||
| .await; |
There was a problem hiding this comment.
Count every configured peer when sizing eager trees
all_peers is the concatenation of Core's configured known/preferred remote peer addresses, and self-addresses are only skipped later while resolving, so subtracting one under-reports the network size whenever the config does not include this node. With 5 configured peers but only a subset connected, the eager depth is computed for 4 peers (ttl=1) rather than 5 (ttl=2), making startup or partial-connectivity broadcasts die one hop earlier than intended. Pass the full length, or subtract only addresses proven to be local.
Useful? React with 👍 / 👎.
This is a sketch of an eager tx propagation through multiple redundant random broadcast trees, assuming a full connectivity topology at the overlay level. It's done against overlay v2. Only opening to make some builds and run some experiments. don't merge.