
Add Support for Batching to Downstreams in Router #60

Open
dchoi-viant wants to merge 52 commits into router-dev from router-batching

Conversation

@dchoi-viant
Contributor

@dchoi-viant dchoi-viant commented Jan 28, 2026

  • Add batching (enabled by default, technically a breaking behavior change) to downstreams in router.
  • Move from worker routines to semaphore-based goroutine bounding.
  • Refactor ReloadIfNeeded() and related code into a separate file.
  • Fix unload counter double-counting.

… descriptions of client batching payload features.
dchoi-viant and others added 22 commits March 2, 2026 15:14
Previously, a partial or aborted body read on a 200 OK response was
silently discarded and httpPost returned (possibly-empty data, nil).
Callers then unmarshaled empty bytes and the failure surfaced
downstream as "Invalid JSON, wrong char ' ' found at position 0",
which is indistinguishable from a server-side encoding bug.

Return the read error wrapped with the partial byte count so the
caller sees an honest transport failure instead of a silent
empty-body success.

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
…cated metrics

Replace the implicit-commit pattern in writeResponse where the first
writer.Write call auto-fired WriteHeader(200) at the same time as the
body bytes. Under load this produced a wire trace of "200 OK + empty
body" when the body Write failed (broken pipe due to client cancellation),
and the server-side error was silently discarded by the surrounding code.

writeResponse now:

  - Marshals first; on marshal failure returns a typed
    responseMarshalError so ServeHTTP can emit an explicit 5xx with a
    meaningful body.
  - Sets Content-Type and Content-Length explicitly so clients can
    detect a truncated body via io.ErrUnexpectedEOF instead of
    receiving a silent empty 200 OK.
  - Calls WriteHeader(200) explicitly so the status line is committed
    in a known order, not as a side effect of the first Write.
  - Returns a typed responseCommittedError when Write fails after the
    status was committed, so ServeHTTP knows to log unconditionally
    rather than calling http.Error (which would silently drop the new
    status code with a "superfluous WriteHeader" warning).

Inline the single-caller handleAppRequest helper into ServeHTTP -- the
io.Writer narrowing it provided is no longer useful now that
writeResponse takes an http.ResponseWriter directly.

Add a new gmetric counter provider in service/stat (NewHandler) that
extends NewCtxErrOnly with two response-write failure classes
(responseMarshalError, responseCommittedError) so they can be alerted
independently of the generic error bucket. The first three keys
(error, canceled, deadlineExceeded) preserve their indices so existing
Prometheus dashboards and alerts continue to emit at the same labels.

Add unit tests in service/handler_test.go covering the success path,
total and partial post-commit Write failures, the Content-Length
invariant across a range of Response shapes, and the typed-error
chain participation in errors.As / errors.Is.

Document the response wire contract in README under
/v1/api/model/%s/eval (Content-Type and Content-Length explicit) and
the new HTTPHandler metric keys under /v1/api/metric/operations.

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
The /v1/api/model/%s/eval endpoint is the only one with a structured
request/response shape; collapse the previous flat bullet list under
/v1/api/model and promote each model endpoint to a peer top-level
section so eval can grow sub-sections (Methods, Request body,
Successful response, Error response, Example) without distorting the
surrounding density.

Document the current contract:

  - Methods (GET / POST) with single-mode vs batch-mode semantics.
  - Request body shape (input keys + reserved batch_size / cache_key).
  - Successful response shape (status, dictHash, data, serviceTimeMcs)
    with Content-Type and Content-Length set explicitly so clients can
    detect transport failures via short-read against declared length.
  - Error response status-code table (400 / 413 / 429 / 500) with a
    note that the body format is plain text today and is expected to
    align with the success response shape in a future release.
  - Curl examples for both methods.

Promote the meta endpoints to peer ## sections with one-line
descriptions of their purpose for the mly client bootstrap path.

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
…us codes

Unify the wire-format contract across success and failure: every
response from /v1/api/model/%s/eval is now a JSON-encoded Response
object with Content-Type: application/json and explicit Content-Length,
regardless of whether the prediction succeeded or any phase failed.
Errors retain their proper HTTP status codes (400 / 413 / 429 / 500);
only the body shape changes, from plain-text http.Error output to the
same Response struct used on success with status="error" and the
error field populated.

writeResponse now takes an http.StatusXxx parameter so the same
explicit-commit machinery (marshal first, declare Content-Length,
WriteHeader, Write) can serve both 200 success and 4xx/5xx error
responses.

writeError is a new helper that:

  - Calls SetError(err) and clears response.Data so the marshal
    cannot fail on a Data value that may have triggered the original
    failure.
  - Delegates to writeResponse with the supplied status code.
  - On post-commit Write failure, appends ResponseCommittedError to
    hStats and logs unconditionally (matching the success-path
    contract).
  - On marshal failure of the cleared error response (essentially
    impossible -- the struct now contains only string + int fields),
    falls back to http.Error so the client at least receives a
    status code.

All five http.Error call sites in ServeHTTP (GET-path query parse,
POST body read, POST body unmarshal, no-request fall-through, and
the post-Service.Do error block) now route through writeError. The
post-Service.Do block keeps its existing responseCommittedError
short-circuit since the success-response status is already on the
wire and cannot be changed.

Cross-repo audit (viant/mly, adelphic/mly, adelphic/mediator) found
no consumer that parses the previous plain-text error body format.
The one near-miss -- mediator's filter/ml/roas/service.go
isTimeoutError -- substring-matches "context deadline exceeded" /
"timeout" / "deadline exceeded" / "context canceled" against the
error string, which keeps working because the substring is now
embedded in the JSON error body wrapped by the client's httpPost
error format.

Update tests:

  - Existing writeResponse tests get the new status parameter.
  - New TestWriteResponse_HonorsStatusParam confirms the supplied
    status reaches the wire across the four error status codes plus
    200.
  - New TestWriteError_EmitsJSONErrorWithStatus locks in the wire
    shape (status code, Content-Type, Content-Length, JSON body
    with status=error and populated error message; cleared Data).
  - New TestWriteError_FallsBackToHTTPErrorOnCommittedFailure
    confirms hStats records the post-commit failure when the body
    write fails after the error status was committed.

Update README error response section to describe the new contract.

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
Surface the server-side error message into the caller's Response
struct on non-2xx responses, so callers that defensively check
response.Error (e.g. mediator's ml/fraud and ml/fraudv3 services)
finally observe predict-time errors instead of silently swallowing
them.

Two coordinated changes:

1. httpPost now returns the response body alongside the error on a
   non-2xx terminal status. Previously it returned (nil, err) on
   non-200, which discarded the body and prevented any structured
   parsing. The retry-loop logic is updated to capture the body in
   a postBody variable so terminal-error iterations preserve the
   bytes for the outer return. Successful iterations return data
   directly with err == nil as before.

2. Run() does a best-effort gojay.Unmarshal of the body into the
   caller's response struct before returning the err. For a
   v0.20.0+ server's JSON error body this populates
   response.Status = "error" and response.Error = "<message>". For
   an older server's plain-text body the unmarshal silently fails
   and the response struct stays untouched. The non-nil err return
   is unchanged in either case and remains the source-of-truth
   signal; this is purely additive population of the response
   struct for the callers that want it.

Backward compatibility is preserved both directions:

  - New v0.20.0 client against an older server: response.Error
    stays empty (best-effort unmarshal silently fails), err is
    returned as before, and any consumer that checks err keeps
    working unchanged.
  - Older client against a new v0.20.0 server: client never sees
    the populated response.Error (the older client's httpPost
    returns nil body on non-200), but err is still surfaced and
    consumers that check err keep working unchanged.

Add TestService_Run_ParsesErrorBody covering four cases: 400+JSON,
500+JSON, 400+plain-text, 500+plain-text. Each asserts both the
err return and the response.Error/Status population behavior so
the contract is locked in for both server-version axes.

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
Record the v0.20.0 wire-format change: error responses from
/v1/api/model/%s/eval now share the same shape as success responses
(JSON-encoded Response object, Content-Type: application/json,
explicit Content-Length) instead of plain text. HTTP status codes
are unchanged.

The client side picks this up automatically: response.Error is
populated from the parsed body on non-2xx responses, so callers can
rely on either the err return value or response.Error as the error
signal. Older clients connecting to a v0.20.0+ server still work
(the err return is unchanged); newer clients connecting to a pre-
v0.20.0 server still work (the best-effort body parse silently
fails on plain text and response.Error stays empty, with the err
return remaining the source-of-truth signal).

Made-with: Cursor
Co-authored-by: Claude <noreply@anthropic.com>
…requests

Add stat.Shed key to the http counter provider and append it in
postRequest when getHost returns an error wrapping common.ErrNodeDown.
Existing _down (trip event) and _error counters are unchanged; the
new _shed disambiguates breaker-rejected requests from requests that
reached httpPost and failed there.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Made-with: Cursor
FlagUp and FlagDown now use atomic.CompareAndSwapInt32 on Down so the
flag is consistent with IsUp's atomic.LoadInt32, and FlagUp's
resetDuration reset runs under the mutex so it cannot clobber a
concurrent FlagDown's resetDuration *= 2 (lost-update bug that defeated
exponential backoff under flapping).

Adds breaker_test.go (first tests in the package) covering the data
race under -race, backoff accumulation across trips, and CAS-based
idempotency of FlagUp / FlagDown.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Made-with: Cursor
Pending Inc/Dec via metric.EnterThenExit captures the recent-bucket
index at Enter time and decrements that bucket on Exit; rotation
between Enter and Exit lands the decrement in the wrong bucket.
Also, Dir is not a string, so MultiCounter takes a mutex per
Inc/Dec, which serializes hot paths.

Both _pending and _pending_Max remain operationally useful in
different regimes (high-QPS vs low-QPS operations); the doc
comment explains which signal to query in each case.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Made-with: Cursor
…ss-through

Adds LatencyBreaker parallel to the existing connection-failure
Breaker. Trips on latest > LatestThreshold OR rolling > RollingThreshold;
recovers after K consecutive observations satisfy both thresholds. While
ON, allows a configurable fraction of requests through to drive recovery
sensing without committing real load.

Host.IsUp() now requires both breakers to be up. Service.init wires a
LatencyBreaker into each host when at least one threshold is non-zero
(zero -> disabled, backward-compatible). postRequest feeds elapsed time
to LatencyBreaker.Observe after each httpPost.

Caller opts in by setting Config.LatencyBreaker* fields or via
WithLatencyBreaker. Pass-through fraction defaults to 0.01 (1%).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Made-with: Cursor
Brings in latency-aware breaker, race fixes in circut.Breaker, the
ClientHTTP_shed marker, and prior v0.20.x server response/error
handling work.

Made-with: Cursor

# Conflicts:
#	service/handler.go
#	shared/client/service.go
Add WithPrometheusMetrics(false) for short-lived helper clients that
should not register long-lived native Prometheus series. Use it for
server startup self-test clients so completed self-tests do not leave
zero-valued mly_client_* series in the process-wide registry.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Made-with: Cursor
WithPrometheusMetrics(false) skips native Prometheus registration, leaving
observer and counter fields nil. Guard observation helpers and direct
counter increments so helper clients can opt out without panicking.

Exercise the opt-out path in the client Run test.

Co-authored-by: GPT-5.5 <noreply@openai.com>
Made-with: Cursor
Validate latency-breaker thresholds, rolling window, recovery count, and
pass-through fraction during Service initialization. Normalize defaults in
one path so direct Config usage and WithLatencyBreaker behave the same.

Also wrap Service.init errors with context while preserving the underlying
cause via %w.

Co-authored-by: GPT-5.5 <noreply@openai.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
