From b240f8517cd79b4d6bf579e1704a9a5144dcbe6d Mon Sep 17 00:00:00 2001
From: VirusAlex
Date: Fri, 1 May 2026 11:30:44 +0300
Subject: [PATCH] fix(perf): TCP connection pool sized for global chunk-worker
 count
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The puller can run up to fileParallelism × chunksPerFile chunk workers
concurrently — each file has its own chunk semaphore but all files share
the BlobPuller's connection pool. Pre-v0.4.1 the pool was sized to just
chunksPerFile (default 8), so with the default fileParallelism=4 we had
8×4=32 chunk workers competing for 8 sockets.

Symptom: the Performance modal's "Pool acquire wait" stat sat at
p50 ~280 ms / p95 1.6 s — a quarter of every chunk's wall clock spent
waiting for a free socket from a starved pool, not doing useful work.
Reported by VirusAlex on a v0.4.0 live run.

Fix: poolSize = chunksPerFile × fileParallelism. Gives a 1:1
socket-per-worker ratio so acquire() never blocks under steady-state
load. The TCP server's MAX_CONCURRENT_CONNECTIONS=1024 cap is well
above any sensible product (default 32; even pathological configs like
16×16=256 stay under).

HTTP path is unaffected — java.net.http.HttpClient manages its own
connection pool internally on virtual threads, so it never had this
contention.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 src/main/java/dev/netcopy/transfer/Puller.java | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/main/java/dev/netcopy/transfer/Puller.java b/src/main/java/dev/netcopy/transfer/Puller.java
index bb609ee..1bfaa3c 100644
--- a/src/main/java/dev/netcopy/transfer/Puller.java
+++ b/src/main/java/dev/netcopy/transfer/Puller.java
@@ -753,8 +753,19 @@ private BlobPuller createBlobPuller(JobState job) {
                 if (host == null) {
                     throw new IllegalArgumentException("peerUrl has no host: " + job.peerUrl());
                 }
+                // Pool sized to the GLOBAL chunk-worker concurrency, not per-file. The puller
+                // can have up to `fileParallelism × chunksPerFile` chunk workers running at
+                // once (each file gets its own chunk semaphore but all files share the
+                // BlobPuller's connection pool). Pre-v0.4.1 the pool was sized to just
+                // `chunksPerFile`, so with the defaults 8×4=32 chunk workers were competing
+                // for 8 sockets and the Performance modal's "pool acquire wait" sat at
+                // p50 ~280ms — a quarter of every chunk's wall clock. Multiplying out gives
+                // us a 1:1 socket-per-worker ratio with no contention. The TCP server's
+                // MAX_CONCURRENT_CONNECTIONS=1024 cap is well above any sensible product.
+                int poolSize = Math.max(1, job.chunksPerFile())
+                        * Math.max(1, job.fileParallelism());
                 yield new TcpBlobPuller(host, job.peerTcpPort(), peerToken,
-                        Math.max(1, job.chunksPerFile()), bytesObserver);
+                        poolSize, bytesObserver);
             }
         };
     }
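
Note (reviewer illustration, not part of the patch): the sketch below is
a minimal model of the concurrency shape the message describes, per-file
chunk semaphores on top of one shared connection pool, and shows why the
pool must be sized to the global product. All names in it
(PoolSizingSketch, pullOneFile, the int-valued "sockets") are
illustrative stand-ins, not the real Puller or TcpBlobPuller API; it
just assumes the 4×8 defaults stated above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

public class PoolSizingSketch {
    static final int FILE_PARALLELISM = 4; // defaults from the commit message
    static final int CHUNKS_PER_FILE = 8;

    public static void main(String[] args) throws InterruptedException {
        // The fix: one pooled connection per potential chunk worker.
        int poolSize = Math.max(1, CHUNKS_PER_FILE) * Math.max(1, FILE_PARALLELISM);
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.put(i); // ints stand in for sockets

        Semaphore fileSlots = new Semaphore(FILE_PARALLELISM);
        List<Thread> fileThreads = new ArrayList<>();
        for (int f = 0; f < 16; f++) { // 16 queued files
            fileSlots.acquire();       // global file-parallelism gate
            fileThreads.add(Thread.ofVirtual().start(() -> {
                try {
                    pullOneFile(pool);
                } finally {
                    fileSlots.release();
                }
            }));
        }
        for (Thread t : fileThreads) t.join();
    }

    // Each file gets its OWN chunk semaphore, but every chunk worker
    // contends on the ONE shared pool; that is why the pool is sized
    // globally rather than to CHUNKS_PER_FILE alone.
    static void pullOneFile(BlockingQueue<Integer> pool) {
        Semaphore chunkSlots = new Semaphore(CHUNKS_PER_FILE);
        List<Thread> workers = new ArrayList<>();
        for (int c = 0; c < 64; c++) { // 64 chunks in this file
            try {
                chunkSlots.acquire();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            workers.add(Thread.ofVirtual().start(() -> {
                try {
                    Integer socket = pool.take(); // acquire(): blocks if the pool is starved
                    Thread.sleep(2);              // pretend to pull one chunk
                    pool.put(socket);             // release()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    chunkSlots.release();
                }
            }));
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}

With the fixed poolSize, pool.take() finds a free slot whenever a chunk
worker is runnable; shrink poolSize back to CHUNKS_PER_FILE and the
take() calls queue up again, which is exactly the acquire-wait the
Performance modal was reporting.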
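
On the HTTP-path remark: a minimal sketch of why java.net.http.HttpClient
sidesteps the issue, assuming the puller builds its client roughly like
this (the builder and executor calls are standard JDK 21 API; the class
and field names are hypothetical). The client keeps its own internal
connection pool, so there is no application-level fixed socket pool for
chunk workers to block on.

import java.net.http.HttpClient;
import java.util.concurrent.Executors;

class HttpPullerClientSketch {
    // HttpClient manages connection pooling internally, and the
    // virtual-thread executor removes any fixed worker pool as well,
    // so neither side of the worker/socket coupling is a fixed cap.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .executor(Executors.newVirtualThreadPerTaskExecutor())
            .build();
}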