outbound/urltest: fix gateway freeze when relay stops forwarding by HouMinXi · Pull Request #4256 · SagerNet/sing-box

HouMinXi · 2026-06-29T09:18:12Z

Summary

Fix transparent gateway freeze caused by URLTestGroup.urlTest() blocking indefinitely when a relay server accepts TCP but stops forwarding data.

Problem

When a relay becomes unresponsive at the application layer (TCP keepalive keeps the connection alive), the following failures chain together and lock the gateway:

bufio.CopyConn goroutines block in Read() forever (no idle timeout on relay connections)
URL test probe goroutines also block in Read() waiting for HTTP response from the dead relay
batch.Wait() requires ALL probes to complete, so one stuck probe blocks the entire health check
checking atomic flag stays true, silencing all future timer-triggered health checks
selectedOutbound is never updated, routing all new connections to the dead relay
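The chain above can be reproduced in miniature. The following is a minimal sketch, not sing-box code: the `group` type, its `check` method, and `simulateStall` are hypothetical names, and a channel receive stands in for a Read() that never returns. It only shows how a wait-for-all barrier combined with an atomic "checking" guard silently drops every later health check once a single probe is stuck.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// group is a hypothetical stand-in for a URL test group.
type group struct {
	checking atomic.Bool
}

// check runs all probes and waits for every one of them. It returns false
// when an earlier check is still in flight and this call was skipped.
func (g *group) check(probes []func()) bool {
	if !g.checking.CompareAndSwap(false, true) {
		return false // guard is still set, so this check is dropped
	}
	defer g.checking.Store(false)

	var wg sync.WaitGroup
	for _, p := range probes {
		wg.Add(1)
		go func(p func()) {
			defer wg.Done()
			p()
		}(p)
	}
	wg.Wait() // a single stuck probe keeps this from returning
	return true
}

// simulateStall starts one check containing a stuck probe, then fires three
// more checks (as a timer would) and counts how many were skipped.
func simulateStall() int {
	g := &group{}
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	healthy := func() {}
	stuck := func() {
		close(started)
		<-release // stands in for Read() on a relay that never answers
	}

	go func() {
		g.check([]func(){healthy, stuck})
		close(done)
	}()
	<-started // the first check now holds the guard

	skipped := 0
	for i := 0; i < 3; i++ {
		if !g.check([]func(){healthy}) {
			skipped++
		}
	}

	close(release) // unblock the stuck probe so the demo can exit
	<-done
	return skipped
}

func main() {
	fmt.Println("skipped checks:", simulateStall())
}
```

In the sketch the stuck probe is released by hand; on the affected gateway nothing releases it, so the guard stays set and the selection is never refreshed.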

Confirmed by SIGQUIT goroutine dump during a live stall: 205 goroutines total, 168 blocked in CopyConn IO wait, 1 in batch.Wait() semacquire for 3 minutes. Full dump available in #4255.

Fix

common/urltest/urltest.go: Call SetReadDeadline on the dialed connection, because context cancellation does not interrupt net.Conn.Read() when the connection comes from a custom DialContext. The deadline is computed as a relative timeout, which avoids issues with time already consumed by the dial phase.
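For illustration, the read-deadline idea can be sketched as follows. This is not the patched code: `readWithDeadline` and `probeSilentPeer` are hypothetical helpers, and an in-memory `net.Pipe` whose far end never writes stands in for a relay that accepts the connection but forwards nothing.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithDeadline arms a read deadline relative to "now" and then reads.
// Computing the deadline at this point means time spent dialing earlier
// does not eat into the read budget.
func readWithDeadline(conn net.Conn, timeout time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return 0, err
	}
	defer conn.SetReadDeadline(time.Time{}) // clear the deadline afterwards

	buf := make([]byte, 512)
	return conn.Read(buf)
}

// probeSilentPeer reports whether a read against a peer that stays silent
// ended with a deadline error instead of blocking forever.
func probeSilentPeer(timeout time.Duration) bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close() // the "relay" holds the connection open but never writes

	_, err := readWithDeadline(client, timeout)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", probeSilentPeer(100*time.Millisecond))
}
```

Without the `SetReadDeadline` call, the `Read` in this sketch would block until one side closes the pipe, which mirrors the stuck probes in the goroutine dump.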

protocol/group/urltest.go:

Derive per-probe testCtx from batch context (was g.ctx), so batch cancellation propagates
Wrap batch.Wait() with a hard timer (2*TCPTimeout). On timeout, proceed with available results
Delete stale history for incomplete probes to prevent performUpdateCheck from selecting a stuck outbound
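The three changes above can be sketched together. This is a simplified model under stated assumptions, not the code in this PR: `probe`, `runBatch`, `demo`, the `history` map, and the `probeTimeout` parameter are hypothetical, and a plain `sync.WaitGroup` replaces the batch helper used by sing-box. The hard limit is `2*probeTimeout`, mirroring the 2*TCPTimeout described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// probe is a hypothetical probe signature, not the sing-box one.
type probe func(ctx context.Context) (time.Duration, error)

// runBatch starts every probe, waits at most 2*probeTimeout, and returns the
// sorted tags of probes that had not finished by then. History entries of
// unfinished or failed probes are deleted so they cannot win the selection.
func runBatch(parent context.Context, probes map[string]probe,
	history map[string]time.Duration, probeTimeout time.Duration) []string {
	batchCtx, cancelBatch := context.WithCancel(parent)
	defer cancelBatch() // reaches every testCtx derived below

	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		closed   bool
		finished = make(map[string]bool)
	)
	for tag, p := range probes {
		wg.Add(1)
		go func(tag string, p probe) {
			defer wg.Done()
			// Derive from batchCtx so that batch cancellation propagates.
			testCtx, cancelTest := context.WithTimeout(batchCtx, probeTimeout)
			defer cancelTest()
			delay, err := p(testCtx)

			mu.Lock()
			defer mu.Unlock()
			if closed {
				return // the batch already gave up on this probe
			}
			finished[tag] = true
			if err != nil {
				delete(history, tag)
				return
			}
			history[tag] = delay
		}(tag, p)
	}

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	timer := time.NewTimer(2 * probeTimeout)
	defer timer.Stop()
	select {
	case <-done:
	case <-timer.C: // hard limit: continue with whatever finished
	}

	mu.Lock()
	defer mu.Unlock()
	closed = true
	var stale []string
	for tag := range probes {
		if !finished[tag] {
			delete(history, tag) // drop the outdated result
			stale = append(stale, tag)
		}
	}
	sort.Strings(stale)
	return stale
}

// demo runs one healthy probe and one probe that ignores its context.
func demo() ([]string, map[string]time.Duration) {
	release := make(chan struct{})
	defer close(release) // let the simulated stuck probe exit after the demo
	history := map[string]time.Duration{
		"fast":  80 * time.Millisecond,
		"stuck": 20 * time.Millisecond, // stale result that would look best
	}
	probes := map[string]probe{
		"fast": func(ctx context.Context) (time.Duration, error) {
			return 10 * time.Millisecond, nil
		},
		"stuck": func(ctx context.Context) (time.Duration, error) {
			<-release // ignores ctx, like a Read() cancellation cannot interrupt
			return 0, errors.New("released")
		},
	}
	stale := runBatch(context.Background(), probes, history, 50*time.Millisecond)
	return stale, history
}

func main() {
	stale, history := demo()
	fmt.Println("stale:", stale, "history:", history)
}
```

Note that in this sketch the hard timer only unblocks the caller. The stuck probe goroutine and the goroutine waiting on the `WaitGroup` stay alive until the probe itself returns, which is why a bounded wait is paired with a read deadline rather than used alone.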

Testing

Deployed patched binary on the affected gateway (N100 iStoreOS, tproxy + urltest with 6 exit nodes). Before patch: 8 stalls in 2 hours with intervals shrinking to 57s. After patch: monitoring for stability.

Ref: #4255 (goroutine dump and full analysis), #4144 (same symptom on different deployments), #1620 (same symptom on N100 tproxy, closed as stale)

When a relay server accepts TCP but stops forwarding application data, URL test probe goroutines block in Read() indefinitely. batch.Wait() then blocks forever, keeping the checking flag true and suppressing all future health checks. selectedOutbound is never updated, so new connections keep routing to the dead relay. This creates a triple self-locking loop that makes the gateway completely unresponsive.

Apply four fixes:

1. Set a read deadline with SetReadDeadline on the URL test connection. Context cancellation does not interrupt net.Conn.Read() when the connection was obtained through a custom DialContext. Use a relative timeout to avoid issues with clock time already consumed by the dial phase.
2. Add a hard timeout (2*TCPTimeout) around batch.Wait(). When the timeout fires, proceed with whatever results are available rather than blocking indefinitely.
3. Propagate batch context to individual probes by deriving testCtx from batchCtx instead of g.ctx, so batch cancellation reaches stuck probes.
4. Clean up stale history entries for probes that did not complete within the timeout, preventing performUpdateCheck from selecting a stuck outbound based on outdated results.

Root cause confirmed by SIGQUIT goroutine dump during a live stall event on a tproxy gateway (205 goroutines, 168 blocked in CopyConn, 1 semacquire in batch.Wait for 3 minutes).

Fixes SagerNet#4255
Ref: SagerNet#4144 SagerNet#1620

Signed-off-by: Minxi Hou <houminxi@gmail.com>

HouMinXi wants to merge 1 commit into SagerNet:testing from HouMinXi:fix/urltest-batch-timeout
