[fix] Honor served_model_name and surface HTTP errors in RemoteInferenceEngine by discobot · Pull Request #1783 · NovaSky-AI/SkyRL

discobot · 2026-06-13T13:20:18Z

Remote engines ignore served_model_name and RemoteInferenceEngine.generate() swallows the resulting vLLM 404, so a model-name mismatch surfaces as IndexError: list index out of range. Both failure points confirmed on current main (f4d7990).
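For readers unfamiliar with the failure mode, here is a minimal sketch of how an unchecked error body can turn into an IndexError. The parser, the "choices" key, and the error body fields are hypothetical stand-ins, not the actual SkyRL code or vLLM's exact payload.

```python
import json

# Illustrative error body; the exact fields vLLM returns are an assumption here.
ERROR_BODY = json.dumps(
    {"object": "error", "message": "The model `my-model` does not exist.", "code": 404}
)


def parse_outputs_ignoring_status(body: str) -> list[str]:
    """Hypothetical parser that never looks at the HTTP status code."""
    payload = json.loads(body)
    # An error body has no "choices" key, so this quietly yields an empty list.
    return [choice["text"] for choice in payload.get("choices", [])]


outputs = parse_outputs_ignoring_status(ERROR_BODY)

try:
    first = outputs[0]  # the caller assumes at least one output exists
    failure = None
except IndexError as exc:
    failure = str(exc)
```

The original 404 never appears in the traceback, which is why the mismatch is hard to diagnose from the caller's side.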

create_remote_inference_engines_from_config now resolves model_name as served_model_name when set, falling back to the policy model path. This is the same resolution that InferenceEngineClient.__init__ and the local-engine path already perform. The change also removes the now-resolved TODO(tgriggs).
generate() raises a RuntimeError with the request URL, model name, status, and error body on non-200 responses, matching the pattern pause_generation/resume_generation already use in the same file.
chat_completion()/completion() are deliberately unchanged: the HTTP endpoint proxies their response bodies back to callers, so raising there would change proxy semantics.
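The status check described for generate() can be sketched roughly as below. The helper name check_generate_response and the message wording are assumptions for illustration; the real change lives inside the aiohttp request path and may be phrased differently.

```python
def check_generate_response(url: str, model_name: str, status: int, body: str) -> None:
    """Raise on any non-200 status; return silently on success.

    Hypothetical helper: the message carries the request URL, model name,
    status, and error body, as the PR description lists.
    """
    if status != 200:
        raise RuntimeError(
            f"Generate request to {url} failed for model {model_name!r} "
            f"with status {status}: {body}"
        )


# Usage: a 200 passes through, a 404 raises with the full context attached.
check_generate_response("http://localhost:8000/v1/completions", "my-model", 200, "{}")

try:
    check_generate_response(
        "http://localhost:8000/v1/completions", "my-model", 404, "model not found"
    )
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

Keeping the check in generate() only, and not in chat_completion()/completion(), matches the stated intent of leaving the proxied response bodies untouched.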

Testing: new CPU tests in tests/backends/skyrl_train/inference_engines/test_remote_inference_engine.py against a mock vLLM server that returns vLLM's exact 404 error body — they fail before the fix and pass after. The surrounding inference-engine, remote inference client, and remote weight loader tests pass locally (147 tests); pre-commit is clean on the touched files.
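As a rough illustration of the testing approach, the sketch below starts a mock server that always answers 404 and checks that the client raises. It uses only the standard library and a synchronous client, whereas the actual tests target RemoteInferenceEngine; the handler, the helper names, and the placeholder error body are all assumptions.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder body; the real tests use vLLM's exact 404 body, not reproduced here.
MOCK_404_BODY = json.dumps({"error": "model not found (placeholder)"}).encode()


class Always404Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body before replying.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(MOCK_404_BODY)))
        self.end_headers()
        self.wfile.write(MOCK_404_BODY)

    def log_message(self, *args):
        pass  # keep test output quiet


def post_generate(url: str, model_name: str) -> dict:
    """Hypothetical client: POST a prompt and raise RuntimeError on HTTP errors."""
    payload = json.dumps({"model": model_name, "prompt": "hi"}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    # Bypass any proxy configured in the environment so the request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(
            f"{url} returned {err.code} for model {model_name!r}: {body}"
        ) from err


def run_mock_check() -> str:
    """Start the mock server on an ephemeral port and return the error message."""
    server = HTTPServer(("127.0.0.1", 0), Always404Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/v1/completions"
        try:
            post_generate(url, "wrong-name")
        except RuntimeError as exc:
            return str(exc)
        return ""
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 lets the OS pick a free port, so the check can run on a CPU-only CI machine without colliding with other services.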

…nceEngine create_remote_inference_engines_from_config hardcoded the policy model path as the model name, ignoring generator.inference_engine.served_model_name, so requests to a vLLM server started with --served-model-name were rejected with a 404. RemoteInferenceEngine.generate then parsed the error body as if it were a normal response and returned empty outputs, which surfaced as an opaque IndexError in InferenceEngineClient.generate. Honor served_model_name when set and raise a RuntimeError with the status and error body on non-200 responses. Adds CPU tests with a mock vLLM server covering both paths. Fixes NovaSky-AI#1672.

gemini-code-assist

Code Review

This pull request improves error handling in the remote inference engine by raising a RuntimeError with the response body when a non-200 status code is returned. It also updates the engine initialization to use served_model_name if available, falling back to the policy model path, and adds comprehensive tests. The review feedback suggests handling potential decoding errors when reading the response text and adding validation to ensure a model name is resolved.


gemini-code-assist · 2026-06-13T13:21:17Z

+                if resp.status != 200:
+                    # Surface the error body (e.g. vLLM's 404 for an unknown model name) instead of
+                    # silently parsing it into empty outputs.
+                    error_body = await resp.text()


Using await resp.text() without specifying error handling can raise a UnicodeDecodeError if the remote server returns a non-UTF-8 error response (e.g., binary data or corrupted encoding on a 500 Internal Server Error). This secondary exception would mask the original HTTP status code and make debugging harder. Consider using errors="replace" to gracefully handle any decoding issues.

Suggested change

-                    error_body = await resp.text()
+                    error_body = await resp.text(errors="replace")
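The decoding concern can be reproduced with plain bytes.decode, used here as a stand-in for the response body. The sample bytes are made up for illustration.

```python
# Hypothetical error payload ending in two bytes that are not valid UTF-8.
raw_error = b"Internal Server Error \xff\xfe"

try:
    raw_error.decode("utf-8")  # strict decoding is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", each undecodable byte becomes U+FFFD instead of raising.
lenient = raw_error.decode("utf-8", errors="replace")
```

With the lenient form, the readable part of the body still reaches the error message, so the HTTP status is not hidden behind a secondary exception.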

gemini-code-assist · 2026-06-13T13:21:17Z

+    # Use served_model_name if provided, otherwise fall back to the model path.
+    # served_model_name allows using a different model name for HTTP requests than the actual
+    # model path. See InferenceEngineConfig.served_model_name in skyrl/train/config/config.py.
+    model_name = ie_cfg.served_model_name if ie_cfg.served_model_name is not None else cfg.trainer.policy.model.path


If both generator.inference_engine.served_model_name and trainer.policy.model.path are None or empty, model_name will be None or empty. This will cause type mismatches or cryptic errors later when making remote inference requests. It is safer to validate that a valid model name is resolved and raise a clear ValueError early.

    model_name = ie_cfg.served_model_name if ie_cfg.served_model_name is not None else cfg.trainer.policy.model.path
    if not model_name:
        raise ValueError(
            "Model name must be specified. Please set either `generator.inference_engine.served_model_name` "
            "or `trainer.policy.model.path`."
        )
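To make the behavior of the suggested validation concrete, here is a standalone sketch with the config lookups replaced by plain arguments. The function name resolve_model_name and its signature are hypothetical; only the resolution expression and the error text come from the suggestion.

```python
from typing import Optional


def resolve_model_name(
    served_model_name: Optional[str], policy_model_path: Optional[str]
) -> str:
    """Prefer served_model_name when it is not None, else the policy model path."""
    model_name = served_model_name if served_model_name is not None else policy_model_path
    if not model_name:
        raise ValueError(
            "Model name must be specified. Please set either "
            "`generator.inference_engine.served_model_name` or `trainer.policy.model.path`."
        )
    return model_name


def raises_value_error(served: Optional[str], path: Optional[str]) -> bool:
    """Small helper for the examples: True if resolution is rejected."""
    try:
        resolve_model_name(served, path)
    except ValueError:
        return True
    return False


alias = resolve_model_name("my-alias", "/models/policy")
fallback = resolve_model_name(None, "/models/policy")
both_missing = raises_value_error(None, None)
# An empty string is not None, so it is selected and then rejected,
# even though a usable path is available.
empty_alias = raises_value_error("", "/models/policy")
```

One consequence of the `is not None` test is that an empty served_model_name does not fall back to the model path; whether that is the desired behavior is a judgment call for the PR author.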

discobot mentioned this pull request Jun 13, 2026

RemoteInferenceEngine swallows vLLM's 404, surfaces as opaque IndexError #1672

Open

discobot wants to merge 1 commit into NovaSky-AI:main from discobot:fix/1672-remote-engine-404-served-model-name
