    && let Some(peer) = self.relay_peers.get(&peer_id).cloned()
{
    self.pending_relays.remove(&peer_id);
    Self::queue_relay_dial(
This should have a back-off delay: https://github.com/ObolNetwork/charon/blob/v1.7.1/p2p/relay.go#L47
}

/// Encodes bytes as lowercase `0x`-prefixed hex.
/// In case of empty bytes, returns an empty string.
This change will affect the eth2util serialization.
See https://github.com/NethermindEth/pluto/blob/main/crates/eth2api/src/spec/bellatrix.rs#L121
The Bellatrix extra_data field uses Hex0x, which in turn uses encode_0x_hex.
In go-eth2-client, this field expects "0x" when extra_data is empty: https://github.com/attestantio/go-eth2-client/blob/master/spec/bellatrix/executionpayload.go#L91
Re-dial attempts on ConnectionClosed and DialFailure now wait with exponential backoff (base=1s, multiplier=1.6, max=120s) matching Charon's DefaultConfig. A pinned Sleep future in the behaviour registers the waker so the swarm is woken at the right time. Backoff state is reset on successful ConnectionEstablished.
Co-Authored-By: Bohdan Ohorodnii <35969035+varex83@users.noreply.github.com>
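For readers following the thread, the backoff schedule described in this commit message can be sketched roughly as below. This is only an illustration under stated assumptions: `backoff_delay` and `Backoff` are hypothetical names, not the PR's actual types, and it assumes the first retry waits exactly the base delay. Only the three constants come from the commit message.

```rust
use std::time::Duration;

// Values quoted in the commit message (base=1s, multiplier=1.6, max=120s).
const BASE: Duration = Duration::from_secs(1);
const MULTIPLIER: f64 = 1.6;
const MAX: Duration = Duration::from_secs(120);

/// Hypothetical helper: delay before retry number `retries` (0 = first retry).
/// Assumes the first retry waits exactly `BASE`.
fn backoff_delay(retries: u32) -> Duration {
    // Clamp the exponent so the cast to i32 cannot wrap; 1.6^64 already
    // exceeds the cap by many orders of magnitude.
    let exp = retries.min(64) as i32;
    let secs = BASE.as_secs_f64() * MULTIPLIER.powi(exp);
    Duration::from_secs_f64(secs.min(MAX.as_secs_f64()))
}

/// Hypothetical per-relay state: counts failed attempts since the last success.
#[derive(Default)]
struct Backoff {
    retries: u32,
}

impl Backoff {
    /// Returns the delay for the next re-dial and advances the counter.
    fn next_delay(&mut self) -> Duration {
        let delay = backoff_delay(self.retries);
        self.retries = self.retries.saturating_add(1);
        delay
    }

    /// Called when a connection is established, so the next failure
    /// starts again from the base delay.
    fn reset(&mut self) {
        self.retries = 0;
    }
}
```

With these constants the uncapped delay first exceeds 120s at the twelfth attempt (exponent 11), so every later retry waits the maximum.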
encode_0x_hex always returns "0x{hex}" (including "0x" for empty),
matching the go-eth2-client expectation for fields like extra_data.
A new encode_hex_or_empty function and HexBytes serde adapter return ""
for empty bytes, matching Charon's to0xHex convention used throughout
cluster/definition JSON for optional fields like unsigned signatures.
All cluster crates switch from Hex0x to HexBytes. Regression tests
cover both serialization behaviors and deserialization of "" vs "0x".
Co-Authored-By: Bohdan Ohorodnii <35969035+varex83@users.noreply.github.com>
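The two serialization behaviours described in this commit message could look roughly like the sketch below. The function names `encode_0x_hex` and `encode_hex_or_empty` come from the commit message, but the bodies are assumptions, as is the idea that non-empty input keeps the `0x` prefix in both. `decode_hex_lenient` is a hypothetical helper added only to show both `""` and `"0x"` decoding to empty bytes; the real `HexBytes` serde adapter is not shown in this thread.

```rust
/// Sketch: always emits a `0x` prefix, so empty input yields "0x"
/// (the form go-eth2-client expects for fields like extra_data).
fn encode_0x_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(2 + bytes.len() * 2);
    out.push_str("0x");
    for b in bytes {
        out.push_str(&format!("{b:02x}"));
    }
    out
}

/// Sketch: empty input yields "", anything else is `0x`-prefixed
/// (assumed to match the Charon convention named in the commit message).
fn encode_hex_or_empty(bytes: &[u8]) -> String {
    if bytes.is_empty() {
        String::new()
    } else {
        encode_0x_hex(bytes)
    }
}

/// Hypothetical decoder that accepts "", "0x", and `0x`-prefixed hex.
/// Returns `None` for odd-length or non-hex input.
fn decode_hex_lenient(s: &str) -> Option<Vec<u8>> {
    let digits = s.strip_prefix("0x").unwrap_or(s);
    if digits.len() % 2 != 0 || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..digits.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&digits[i..i + 2], 16).ok())
        .collect()
}
```

Keeping the two encoders separate means the choice of empty representation is made per field at the call site, rather than by one function trying to satisfy both conventions.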
@claude review it
Claude finished @varex83's task in 3m 12s
Code Review
Overall Assessment
The PR correctly addresses several real bugs (relay reconnection, malformed multiaddrs, empty-hex serialisation). The structure is sound, but there are two functional issues worth fixing before merge.
I think that's how it's done in the ping implementation in libp2p, so we should be okay with that. wdyt @iamquang95?
@varex83 I would implement by this the rust-libp2p push to event on But the relay-router add
Apply ±20% jitter to relay reconnect backoff, mirroring Charon's expbackoff.DefaultConfig (Jitter=0.2) so that nodes don't retry in lockstep after a relay outage.
Co-Authored-By: varex83 <varex83@gmail.com>
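A rough sketch of what ±20% jitter on a backoff delay can look like follows. It assumes a uniform, symmetric spread around the nominal delay; how Charon's expbackoff actually samples its jitter is not shown in this thread. `apply_jitter` and its `unit_sample` parameter are hypothetical, and the caller is expected to supply a random value in `[0.0, 1.0]` from whatever RNG it uses.

```rust
use std::time::Duration;

// Jitter fraction quoted in the commit message (Jitter=0.2).
const JITTER: f64 = 0.2;

/// Hypothetical helper: scales `delay` by a factor in [0.8, 1.2].
/// `unit_sample` is a caller-supplied random value in [0.0, 1.0];
/// 0.0 maps to -20%, 0.5 to no change, and 1.0 to +20%.
fn apply_jitter(delay: Duration, unit_sample: f64) -> Duration {
    // Clamp defensively so an out-of-range sample cannot widen the spread.
    let r = unit_sample.clamp(0.0, 1.0);
    let factor = 1.0 + JITTER * (2.0 * r - 1.0);
    delay.mul_f64(factor)
}
```

Passing the sample in, rather than drawing it inside the function, keeps the helper deterministic and easy to test.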
@claude please review
Claude finished @varex83's task in 1m 41s
Code Review
Overall Assessment
Since the last review, two of the functional issues have been addressed and the PR is in good shape. One defensive fix is still missing, and there's a minor nit on
Previously flagged, now fixed ✅
Jitter in
Still present
Summary
This PR fixes several bugs discovered during DKG integration testing, primarily around relay connection lifecycle management in the P2P layer.
Relay reconnection on disconnect / dial failure (`relay.rs`)
Previously, if a relay connection dropped or a dial attempt failed, the node would not attempt to re-establish the connection, leaving it permanently unable to route traffic through that relay.
- Added `relay_peers: HashMap<PeerId, Peer>` to `MutableRelayReservation` to remember all known relay peers so they can be re-dialed when a connection is lost.
- Added `connected_relays: HashSet<PeerId>` to skip redundant dials when a connection is already established or in-flight.
- `on_swarm_event` now handles `ConnectionClosed` (last connection dropped) and `DialFailure` to trigger a re-dial, clearing stale pending/connected state first.
- Switched from `DialOpts::unknown_peer_id()` to `DialOpts::peer_id(...).condition(DisconnectedAndNotDialing)` so libp2p can deduplicate concurrent dial attempts.

Relay-ready settling delay (`relay.rs`)
`RelayRouter` was immediately attempting to route peers through a relay the moment the connection was established, before the relay reservation handshake could complete. This caused circuit-dial attempts to fail silently.
- Added `connected_relays: HashMap<PeerId, Instant>` to `RelayRouter` to record when each relay became connected.
- Added `relay_ready()`, which gates routing behind a 2-second `RELAY_READY_DELAY` so the reservation handshake has time to finish.
- `RelayRouter::run_relay_router` now iterates over `connected_relays` (only relays with an active connection) instead of all configured relays, and skips any that haven't settled yet.
- Dials now use `DisconnectedAndNotDialing` to avoid redundant dial attempts.

Fix malformed circuit multiaddresses (`relay.rs`, `utils.rs`)
Relay peer addresses stored in the `Peer` struct sometimes already included a trailing `/p2p/<peer-id>` component. Appending another `/p2p/<relay-id>/p2p-circuit` (or `/p2p-circuit/p2p/<target-id>`) on top produced invalid multiaddresses that libp2p silently rejected.
- `queue_relay_dial` and `multi_addrs_via_relay` now strip any existing `/p2p/...` protocol components from the base address before constructing circuit or direct-dial addresses.

Fix `encode_0x_hex` for empty input (`serde_utils.rs`)
`encode_0x_hex(&[])` previously returned `"0x"` instead of `""`, which caused downstream deserialization failures for optional byte fields encoded as empty strings.
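
To illustrate the multiaddress fix described in the summary, here is a string-level sketch of stripping existing `/p2p/...` components before building a circuit address. It is a simplification under stated assumptions: the PR presumably operates on libp2p `Multiaddr` protocol components rather than strings, `strip_p2p` and `circuit_addr` are hypothetical names, and the sketch treats every path segment equal to `p2p` as the protocol name (so a hostname literally called `p2p` would be mishandled).

```rust
/// Hypothetical helper: removes every `/p2p/<peer-id>` pair from a
/// textual multiaddress and returns the remaining transport address.
fn strip_p2p(addr: &str) -> String {
    let parts: Vec<&str> = addr.split('/').filter(|s| !s.is_empty()).collect();
    let mut out = String::new();
    let mut i = 0;
    while i < parts.len() {
        if parts[i] == "p2p" {
            // Skip the protocol name and the peer-id value that follows it.
            i += 2;
            continue;
        }
        out.push('/');
        out.push_str(parts[i]);
        i += 1;
    }
    out
}

/// Hypothetical helper: builds `<base>/p2p/<relay>/p2p-circuit/p2p/<target>`
/// from a relay address that may or may not already end in `/p2p/<relay>`.
fn circuit_addr(relay_addr: &str, relay_id: &str, target_id: &str) -> String {
    format!(
        "{}/p2p/{relay_id}/p2p-circuit/p2p/{target_id}",
        strip_p2p(relay_addr)
    )
}
```

Because the base address is normalised first, the result contains exactly one relay `/p2p/` component whether or not the stored address already carried one.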