feat(core): multi-link routing by gaoyifan · Pull Request #123 · encodeous/nylon

gaoyifan · 2026-05-29T00:47:28Z

Background

Currently, nylon employs a "one neighbor, many candidate endpoints, one best endpoint" model. In this implementation, a neighbor is keyed strictly by its NodeId. While multiple remote endpoints can be configured for a single peer, only the single best-performing endpoint is active for routing at any given time.
This approach has several limitations:

Lack of Interface Awareness: It cannot distinguish between different physical paths, such as reaching a peer via a local WAN interface versus a local LAN interface. 
Restricted Control Plane: Control messages and probes are tied to the peer's primary endpoint, preventing independent liveness and metric tracking for alternative paths.
Limited Path Redundancy: It treats all connections to a peer as a single next-hop, rather than treating independent physical or logical links as distinct routing adjacencies.

To address these constraints, this PR implements a Multi-Link Routing design. The core change shifts the routing adjacency from a "node-level" view to a "link-level" view. By introducing LocalBind (local interface/source selection), each unique (Peer, LocalBind, RemoteEndpoint) tuple is now treated as an independent routing link. This allows the router to independently track metrics for multiple paths between the same two nodes and select the optimal link for traffic based on real-time performance and local policy.

Full design: docs/reference/multi-link-routing.mdx.

What changed

state: add LocalBindID, RemoteEndpointID, LinkID, and Link; store links in RouterState; key selected routes by next-hop link (SelRoute.NhLink).
config: parse local binds and structured endpoint IDs while keeping plain string-endpoint compatibility; reject explicit binds off Linux.
conn: pair a remote endpoint with a local bind selector (sticky source / IP_PKTINFO) so the same remote address on different binds is a distinct link.
discovery: probe the local bind × remote endpoint product, track probes by link, dedupe duplicate transport tuples, and skip bind/endpoint address-family mismatches.
router: resolve every incoming control packet to a link before it reaches router logic; select the lowest-metric active link per peer with stable tie-breaking; carry a per-retraction acknowledgment token; keep seqno-request suppression router-wide.
forwarding: set TCElement.ToEp from the selected link endpoint so data follows the selected link rather than the peer default endpoint.
status / IPC: expose per-link bind, endpoint, and neighbour-route info.
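
As a rough illustration of the router item above ("select the lowest-metric active link per peer with stable tie-breaking"), here is a minimal sketch. The `LinkID`, `Link`, and `SelectBest` shapes are assumptions made for this example; they only borrow names from the description and are not the types in the diff.

```go
package main

import (
	"fmt"
	"math"
)

// LinkID is a hypothetical key for one routing link: the same peer reached
// through a different local bind or remote endpoint is a different link.
type LinkID struct {
	Peer           string
	LocalBind      string
	RemoteEndpoint string
}

// Link pairs a link identity with its current measured state.
type Link struct {
	ID     LinkID
	Metric uint32 // lower is better; metricInf means "unreachable"
	Active bool
}

const metricInf = math.MaxUint32

// less orders links by metric first and by identity second, so links with
// equal metrics always resolve the same way regardless of input order.
func less(a, b Link) bool {
	if a.Metric != b.Metric {
		return a.Metric < b.Metric
	}
	if a.ID.LocalBind != b.ID.LocalBind {
		return a.ID.LocalBind < b.ID.LocalBind
	}
	return a.ID.RemoteEndpoint < b.ID.RemoteEndpoint
}

// SelectBest returns the lowest-metric active, reachable link for each peer.
func SelectBest(links []Link) map[string]Link {
	best := make(map[string]Link)
	for _, l := range links {
		if !l.Active || l.Metric == metricInf {
			continue
		}
		cur, ok := best[l.ID.Peer]
		if !ok || less(l, cur) {
			best[l.ID.Peer] = l
		}
	}
	return best
}

func main() {
	links := []Link{
		{ID: LinkID{"b", "wan0", "203.0.113.2:57175"}, Metric: 40, Active: true},
		{ID: LinkID{"b", "lan0", "10.0.0.2:57175"}, Metric: 40, Active: true},
		{ID: LinkID{"b", "lan1", "10.0.1.2:57175"}, Metric: 5, Active: false},
	}
	fmt.Println(SelectBest(links)["b"].ID.LocalBind)
}
```

The identity-based tie-break keeps the selection from flapping between two links whose metrics happen to be equal.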

Supporting commits

fix(conn): StdNetBind.Send reused a pooled net.UDPAddr whose IP slice could have been shrunk to 4 bytes by a prior IPv4 send, truncating the next IPv6 destination (e.g. 2001:db8::1 → 2001:db8::). The link then never collected RTT samples and its metric stayed at INF. Resize the slice before copying.
perf(core): batch a received bundle's control packets into a single dispatch that recomputes routes at most once, and coalesce pong-driven recomputation behind a pending flag, to avoid saturating the dispatch queue on multi-link meshes.

Testing

The feature has been tested on 12 nodes deployed in different geographic regions over the public Internet, running continuously for more than 24 hours. The test coverage includes:

Dual-stack nodes with both IPv4 and IPv6 connectivity.
IPv4-only nodes.
Multi-homed nodes with multiple network interfaces, each assigned its own independent IP address.
Nodes without any publicly reachable endpoint configuration.

encodeous · 2026-05-29T02:22:11Z

Hi Yifan. Thanks for the PR, I appreciate the enthusiasm.

Nylon already supports multi-endpoint probing. It will only send data through the active/best link, but will continually probe (send control packets) over all configured endpoints.

My suggestion is to look at code under polyamide, and compare the diff to wireguard-go upstream (use git subtree).

Can you double check?

Also, can you elaborate on "Lack of Interface Awareness"? Nylon currently does not support sending packets directly over a specified interface, but that should be a relatively small change without needing to do a large refactor.

Thanks

P.S. This is a very big change. If possible, please split it into a set of smaller PRs so it is easier for me to review.

gaoyifan · 2026-05-30T09:09:12Z

Thank you very much for the comment and suggestions.

I re-checked the current probing logic in polyamide and Nylon, and you are right: Nylon already continuously probes all configured remote endpoints for a peer and sends data through the active/best endpoint. My original PR description was inaccurate.

What I meant to describe is the lack of local egress/source/interface awareness. For example, if nodes A and B each have three interfaces, A1/A2/A3 and B1/B2/B3, and A1/B1 are the default egress paths, today A will probe B1/B2/B3 mostly from A1, while B will probe A1/A2/A3 mostly from B1. So Nylon observes only part of the possible interface-pair combinations. With explicit local binds, it can probe the full local bind × remote endpoint set, including paths such as A2-B2, A2-B3, A3-B2, and A3-B3, which may otherwise never be selected by the host routing table.

So this PR is not intended to replace Nylon’s existing multi-endpoint probing. It reuses polyamide’s multi-endpoint support and tries to add local egress as part of the link identity and metric model. Most of the larger changes come from carrying that link identity as a first-class citizen through probes, control packets, routing state, forwarding, and status output.

I agree the current PR is too large. I’ll try to restructure it into smaller, easier-to-review pieces, and see whether the local bind/source selection part can be extracted first with a smaller router change.

If you have guidance on what the smallest acceptable version should look like, I would really appreciate it.

Thanks again for creating Nylon. It's been a pleasure to work with the codebase and learn from its design.

gaoyifan · 2026-05-30T11:09:04Z

I have force-pushed a rewritten history with smaller, buildable commits that might make the dependency chain easier to review.

If you have a particular smallest acceptable version in mind for this PR, I would be very grateful for your guidance.

encodeous · 2026-05-30T14:28:28Z

Hi Yifan,

Thanks for the quick response and tidying up the commit history!

Regarding your changes, I looked over the diff, as well as your design doc.

Here are some comments:

I think it might not be necessary to change the routing/adjacency model
- Much of what this change involves can be just as succinctly implemented at the endpoint-level.
- When I designed nylon, I intentionally separated the routing level (decision of which nodes to visit), and the link level (which connection to traverse from one node to the next).
  - At the routing level, there is no need to have multiple edges between nodes, since they are (and should be) functionally the same, outside of a single metric score.
  - Thus, we can simply surface the single "best" link without losing generality
  - Additionally, as you mentioned in the "Risks" section, WireGuard ultimately requires at most one link between two nodes, so we still have to pick one "best" link.
  - As you also stated, we do not support bandwidth aggregation, ECMP, etc, so I don't see any necessity for this at the routing level. Besides, we don't even have the correct routing algorithm for supporting those use cases (consider the difference between Max-Flow, and shortest path)
  - I think you should rework NylonEndpoint to include the local bind (interface, src addr). This might be a bit tricky, and I think we should discuss what the API/user experience would look like for central/local configuration.

However, I do think it would be a good idea to let the user decide which interface(s) a specific endpoint should be reached over. Thus, the changes in polyamide that support this are indeed necessary machinery, and I would love to see them in a separate PR. Do note that we need to support non-Linux platforms such as macOS.

Let's discuss this before making more changes to the code. We also need a less clunky API for specifying the interface.

One trivial way would be to just produce I*E links from I interfaces and E endpoints per peering (but this can also lead to a mess).

I'd love to hear your POV

gaoyifan · 2026-05-30T18:14:34Z

I completely agree with your separation of the link and routing models. I think I had fallen into the mindset of FRRouting-style designs, where multiple interfaces are explicitly exposed to Babel. In retrospect, Nylon's design is much more elegant: it finds a very nice optimal substructure by keeping all multi-link complexity confined to the peer-to-peer layer, which significantly simplifies the overall architecture.

Your last comments actually inspired me to think about a different possible design. Using semantics similar to the Timestamp Sub-TLV from RFC 9616, it may be possible to improve asymmetric routing behavior relatively easily within the current endpoints model.

Consider an extreme example: nodes A and B have two paths, A1 <-> B1 and A2 <-> B2, with the following one-way latencies:

A1 -> B1: 1 ms
B1 -> A1: 100 ms
A2 -> B2: 100 ms
B2 -> A2: 1 ms

In theory, if asymmetric routing is allowed, the best RTT would be:

A1 -> B1 | B2 -> A2 = 1 ms + 1 ms = 2 ms

rather than 101 ms.

In practice, asymmetric routing is quite common on the Internet. Under the current endpoints model, exploiting this property at a single-hop level actually becomes relatively straightforward. However, this would likely require changes to the Ping/Pong packet format, replacing the current random-token + PingBuf RTT measurement mechanism with a Timestamp Sub-TLV-based measurement model. It would also be more efficient. My understanding is that this would be a fairly self-contained optimization.

Would you prefer implementing something like this together with the interface-awareness work, or opening a separate issue for discussion and potentially addressing it in a later PR?

Regarding the configuration schema, I initially considered a design that would automatically discover interfaces and addresses instead of requiring the current manual nylon_binds[] configuration. However, it seems somewhat tricky in practice:

It would require introducing platform-specific interface discovery and parsing logic, such as AF_NETLINK on Linux or getifaddrs/ioctl on macOS, which would add a substantial amount of code (although perhaps there are third-party libraries that provide cleaner abstractions).
Not every interface is necessarily expected—or appropriate—to be used as a Nylon underlay interface. Loopback interfaces, VPN virtual interfaces, internal-only interfaces, Thunderbolt interfaces, and others may be unsuitable in certain environments. To address this, we would likely need either heuristic filtering rules or some more sophisticated detection mechanism. As far as I know, Tailscale chose the former approach with a fixed filtering policy, but static block lists are inherently inflexible and cannot accommodate all deployment scenarios.
To preserve the semantics of automatic full-mesh connectivity, we would likely need to monitor interface and address changes from the kernel. That would further increase implementation complexity.

It is also worth noting that, for multi-homing to work correctly, it is usually necessary to either explicitly bind sockets using SO_BINDTODEVICE and ensure that a corresponding default route exists on that interface, or manually configure policy routing, such as:

ip rule add from <public IP A1> lookup <some routing table>
ip rule add from <public IP A2> lookup <another routing table>

For those reasons, I was thinking of starting with a purely manual configuration model for this feature. In practice, since the nylon cluster deployment is handled by Ansible + an AI agent, the configuration burden is not actually too high. On the contrary, purely manual specification makes the expected behavior clearer and more predictable.

That said, perhaps we can find a middle ground between simplicity and completeness. For example, we could provide a built-in heuristic interface-name filter (such as a regular-expression-based rule) and automatically use all addresses on matching interfaces as source addresses. At the same time, users could override the interface or address filtering rules when necessary. This would allow most nodes to work with the default configuration, while only a small number of special cases would require manual configuration.

We could also defer dynamic interface/address change handling for now, to avoid introducing too much uncertainty in the initial implementation.

Do you have any preference regarding which direction would make the most sense?

encodeous · 2026-05-30T19:54:59Z

Hmm, with regard to asymmetric routing... I actually added an experimental implementation over a year ago, but have since removed it.

In theory, yes, this is definitely a case where nylon can actually improve latency.
However, outside of special datacenters (or GPS), the typical time drift is on the order of 5 ms (basically accounting for the latency of NTP).
This means there is no way to compare timestamps between two servers.
This implies there is no simple and reliable way to measure asymmetric latency.
If you know how, let me know :)

In regards to interfaces.

I also think it's a good starting point to just specify interfaces when desired. Since nylon runs as root in most deployments anyway, I think it's fine to use SO_BINDTODEVICE. Do note that we might now need to bind to multiple interfaces, so the polyamide change needs to be thought out...

I think when you do need to specify an interface, that interface tends to typically not change a lot. Your "middle ground" approach makes sense to me.

By default, we probably just want to use the system routing table, and thus not bind to any interface.
In each node's config, we should be able to add rules for specific endpoints to override which interface(s) they can be reached over.

Let's not worry about dynamically changing interfaces yet!

gaoyifan · 2026-05-30T22:00:50Z

As far as I know, there are roughly three approaches to routing over asymmetric paths without GPS or atomic clocks:

1. NTP-synchronized clocks and one-way delay estimation

This is the most straightforward approach. If clocks are synchronized, we can compare absolute timestamps and estimate one-way latency directly. Routing decisions can then be made based on the estimated one-way delays.

The downside is that the error can be quite large. For asymmetric paths, NTP synchronization error is often on the same order of magnitude as the network latency being measured, making the estimates rather noisy.

2. Only measure cycle latency, without solving for one-way delay

Instead of trying to estimate one-way delays, we can work entirely with cycle latency. Cycle latency can be measured accurately using a mechanism similar to Babel's Timestamp Sub-TLV.

The key observation is that clock offsets cancel out when measuring a cycle. As a result, time synchronization errors do not affect the final measurement.

Reference:

https://gemini.google.com/share/d470d773636d

3. Estimate one-way delays using measurements from many nodes

This approach first estimates one-way delays across the network and then applies the first method for routing decisions.

See the paper:

https://ieeexplore.ieee.org/document/1638554

or my old notes below:

based on "One-way delay estimation using network-wide measurements"
    https://ieeexplore.ieee.org/document/1638554


the theory

 n: number of nodes

 td: (clock) time differences between nodes, this is an n-dimension vector
    td[i] denotes (clock) time at node i - (clock) time at node 0
        so naturally td[0] is always 0
        they're independent so this forms an (n-1)-dimension space
    this is not time zone, but similar
    td[i,j] denotes (clock) time at node j - (clock) time at node i
        so td[i,j] = td[j] - td[i]
    we can't measure them directly

 d: delay/latency between nodes, this is a n*n matrix
    d[i,j] denotes delay from node i to node j
        so naturally diagonal entries are zero

 dm: delay measured
    dm[i,j] = d[i,j] + td[i,j] = d[i,j] + (td[j] - td[i])
        again diagonal entries are zero
        we might not have all of them due to incomplete tests;
    this is the only measure we can/will get

 naturally d[i,j] should always be > 0
    thus (d[i,j] = ) dm[i,j] + td[i] - td[j] > 0
        this is a half space
    for every measurement, we get a half space constraint
    intersection of multiple half spaces = convex polytope
    and td[1] ~ td[n-1] is the 0~(n-2)th variable in that (n-1)-d space

Several years ago I ran some simulation experiments. For a 100-node cluster with roughly 50% symmetric routes, the average one-way-delay estimation error could be kept below 2 ms. Running the solver on a CPU, a single optimization round in PyTorch took on the order of a few seconds.

The larger the absolute number of symmetric links, the stronger the constraints become, and the more accurate the one-way-delay estimates are across the entire cluster.

For an arbitrary strongly connected directed graph, it can be shown mathematically that—ignoring the small amount of drift inherent to hardware clocks—the cycle-latency approach (method 2) is equivalent, from a routing perspective, to routing based on perfectly accurate one-way-delay measurements.

Intuitively, the best route from A to B must ultimately be part of some minimum cycle containing both A and B. Since method 2 can already compute the exact latency of arbitrary cycles, it provides equivalent routing information.

From that perspective, method 3 is probably not particularly useful for routing itself. However, since we were discussing one-way delay estimation, I thought it was an interesting idea worth sharing. :)

For the asymmetric-path scenario we discussed earlier between two peers, this is actually just the special case of a two-node graph, and can obviously be solved using method 2 as well.

Regarding the "middle-ground" configuration approach, I'd like to make the proposal a bit more concrete. If we don't have any major disagreements, I plan to implement something along these lines in the near future:

1. The default behavior remains unchanged and fully compatible with the behavior prior to this PR.
2. Support interface-name filtering using regular expressions, either as a whitelist or a blacklist. Whitelist and blacklist modes are mutually exclusive. Specifying either enables the multi-interface feature.
3. Support IP-address filtering using either a whitelist or a blacklist. Specifying either enables the multi-interface feature.
4. Interface filtering (2) and IP filtering (3) can coexist and are combined with logical AND semantics.

Please let me know if there's anything I've overlooked or misunderstood. And don't hesitate to share any additional thoughts or concerns.

update:

We can directly use negative assertions in regular expressions, without distinguishing between a blacklist and a whitelist.
Introducing a scheme based on relative-timestamp delay measurement, which is also the behavior defined by Babel's RFC 9616, has an additional benefit: by abandoning PingBuf, it fundamentally avoids race conditions like "Direct peer endpoint can become inactive despite successful probes" (#124).

encodeous · 2026-06-01T23:36:13Z

Method 3 sounds interesting, could be an interesting research topic!

It sounds like Method 2 is similar to what we implement, with the difference of accounting for processing time. It could be worth implementing, but right now I don't notice much processing-time overhead (since probes are handled in-dataplane, without dispatch). I think this would be good in a separate PR :)

As for the interface bind, I think we can flesh out the interface filtering semantics later; I think what you have right now is OK.

One part I'm not super clear on right now is how you intend to bind to multiple interfaces and send/recv.

Looks like we'd need to create multiple binds: https://github.com/encodeous/nylon/blob/main/polyamide/device/device.go#L509

And, be able to send via some bind (so that the kernel can just use the default src addr):
https://github.com/encodeous/nylon/blob/main/polyamide/device/peer.go#L159

Maybe you can try it out, and see :P

gaoyifan · 2026-06-02T03:59:02Z

I pushed a new version that works by passing ancillary data with each sendmsg call. This approach doesn't require binding the socket to a specific interface or local IP. While it might not be as efficient as socket-level binding, the implementation is much simpler. Maybe this is a suitable choice for the initial implementation.

Please note that the configuration-related code has not been completed yet. This version is a PoC for the multi-interface sending mechanism.

Additionally, the multi-interface sending mechanism for macOS has not yet been implemented.

Copilot AI review requested due to automatic review settings May 29, 2026 00:47

gaoyifan marked this pull request as draft May 30, 2026 10:47

gaoyifan force-pushed the feat/multi-link-routing branch from ebe46ad to ea5f42f Compare May 30, 2026 10:48

gaoyifan marked this pull request as ready for review May 30, 2026 11:09

gaoyifan force-pushed the feat/multi-link-routing branch 3 times, most recently from 19f079d to 6bef547 Compare May 30, 2026 11:57

gaoyifan added 9 commits June 1, 2026 14:44

fix(conn): resize pooled UDP addresses before copy

03e854f

fix(conn): preserve control data when splitting UDP GRO packets

bbe8711

fix(device): preserve explicit endpoint sources across route changes

497dca4

fix(router): coalesce control packet route computation

f553677

docs: describe endpoint-local bind model

6bf1968

feat(conn): support endpoint source selectors

214c117

feat(state): model endpoint-local bind selectors

e1356bf

feat(core): apply manual endpoint-local binds

f83d1ab

feat(status): expose endpoint-local bind selectors

f0ab00d

gaoyifan force-pushed the feat/multi-link-routing branch from 6bef547 to f0ab00d Compare June 2, 2026 04:08

gaoyifan marked this pull request as draft June 2, 2026 04:18

feat(core): multi-link routing #123: gaoyifan wants to merge 9 commits into encodeous:main from gaoyifan:feat/multi-link-routing
