From b240f8517cd79b4d6bf579e1704a9a5144dcbe6d Mon Sep 17 00:00:00 2001
From: VirusAlex
Date: Fri, 1 May 2026 11:30:44 +0300
Subject: [PATCH] fix(perf): TCP connection pool sized for global chunk-worker
 count
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The puller can run up to fileParallelism × chunksPerFile chunk workers
concurrently — each file has its own chunk semaphore but all files share
the BlobPuller's connection pool. Pre-v0.4.1 the pool was sized to just
chunksPerFile (default 8), so with the default fileParallelism=4 we had
8×4=32 chunk workers competing for 8 sockets.

Symptom: the Performance modal's "Pool acquire wait" stat sat at
p50 ~280 ms / p95 1.6 s — a quarter of every chunk's wall clock spent
waiting for a free socket from a starved pool, not doing useful work.
Reported by VirusAlex on a v0.4.0 live run.

Fix: poolSize = chunksPerFile × fileParallelism. Gives a 1:1
socket-per-worker ratio so acquire() never blocks under steady-state
load. The TCP server's MAX_CONCURRENT_CONNECTIONS=1024 cap is well
above any sensible product (default 32; even pathological configs like
16×16=256 stay under).

HTTP path is unaffected — java.net.http.HttpClient manages its own
connection pool internally on virtual threads, so it never had this
contention.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 src/main/java/dev/netcopy/transfer/Puller.java | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/main/java/dev/netcopy/transfer/Puller.java b/src/main/java/dev/netcopy/transfer/Puller.java
index bb609ee..1bfaa3c 100644
--- a/src/main/java/dev/netcopy/transfer/Puller.java
+++ b/src/main/java/dev/netcopy/transfer/Puller.java
@@ -753,8 +753,19 @@ private BlobPuller createBlobPuller(JobState job) {
                 if (host == null) {
                     throw new IllegalArgumentException("peerUrl has no host: " + job.peerUrl());
                 }
+                // Pool sized to the GLOBAL chunk-worker concurrency, not per-file. The puller
+                // can have up to `fileParallelism × chunksPerFile` chunk workers running at
+                // once (each file gets its own chunk semaphore but all files share the
+                // BlobPuller's connection pool). Pre-v0.4.1 the pool was sized to just
+                // `chunksPerFile`, so with the defaults 8×4=32 chunk workers were competing
+                // for 8 sockets and the Performance modal's "pool acquire wait" sat at
+                // p50 ~280ms — a quarter of every chunk's wall clock. Multiplying out gives
+                // us a 1:1 socket-per-worker ratio with no contention. The TCP server's
+                // MAX_CONCURRENT_CONNECTIONS=1024 cap is well above any sensible product.
+                int poolSize = Math.max(1, job.chunksPerFile())
+                        * Math.max(1, job.fileParallelism());
                 yield new TcpBlobPuller(host, job.peerTcpPort(), peerToken,
-                        Math.max(1, job.chunksPerFile()), bytesObserver);
+                        poolSize, bytesObserver);
             }
         };
     }
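
Note (reviewer illustration, not part of the patch): the sketch below is
a minimal model of the concurrency shape the message describes, per-file
chunk semaphores on top of one shared connection pool, and shows why the
pool must be sized to the global product. All names in it
(PoolSizingSketch, pullOneFile, the int-valued "sockets") are
illustrative stand-ins, not the real Puller or TcpBlobPuller API; it
just assumes the 4×8 defaults stated above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

public class PoolSizingSketch {
    static final int FILE_PARALLELISM = 4; // defaults from the commit message
    static final int CHUNKS_PER_FILE = 8;

    public static void main(String[] args) throws InterruptedException {
        // The fix: one pooled connection per potential chunk worker.
        int poolSize = Math.max(1, CHUNKS_PER_FILE) * Math.max(1, FILE_PARALLELISM);
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.put(i); // ints stand in for sockets

        Semaphore fileSlots = new Semaphore(FILE_PARALLELISM);
        List<Thread> fileThreads = new ArrayList<>();
        for (int f = 0; f < 16; f++) { // 16 queued files
            fileSlots.acquire();       // global file-parallelism gate
            fileThreads.add(Thread.ofVirtual().start(() -> {
                try {
                    pullOneFile(pool);
                } finally {
                    fileSlots.release();
                }
            }));
        }
        for (Thread t : fileThreads) t.join();
    }

    // Each file gets its OWN chunk semaphore, but every chunk worker
    // contends on the ONE shared pool; that is why the pool is sized
    // globally rather than to CHUNKS_PER_FILE alone.
    static void pullOneFile(BlockingQueue<Integer> pool) {
        Semaphore chunkSlots = new Semaphore(CHUNKS_PER_FILE);
        List<Thread> workers = new ArrayList<>();
        for (int c = 0; c < 64; c++) { // 64 chunks in this file
            try {
                chunkSlots.acquire();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            workers.add(Thread.ofVirtual().start(() -> {
                try {
                    Integer socket = pool.take(); // acquire(): blocks if the pool is starved
                    Thread.sleep(2);              // pretend to pull one chunk
                    pool.put(socket);             // release()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    chunkSlots.release();
                }
            }));
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}

With the fixed poolSize, pool.take() finds a free slot whenever a chunk
worker is runnable; shrink poolSize back to CHUNKS_PER_FILE and the
take() calls queue up again, which is exactly the acquire-wait the
Performance modal was reporting.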
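
On the HTTP-path remark: a minimal sketch of why java.net.http.HttpClient
sidesteps the issue, assuming the puller builds its client roughly like
this (the builder and executor calls are standard JDK 21 API; the class
and field names are hypothetical). The client keeps its own internal
connection pool, so there is no application-level fixed socket pool for
chunk workers to block on.

import java.net.http.HttpClient;
import java.util.concurrent.Executors;

class HttpPullerClientSketch {
    // HttpClient manages connection pooling internally, and the
    // virtual-thread executor removes any fixed worker pool as well,
    // so neither side of the worker/socket coupling is a fixed cap.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .executor(Executors.newVirtualThreadPerTaskExecutor())
            .build();
}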