Skip to content

[fix] Honor served_model_name and surface HTTP errors in RemoteInferenceEngine#1783

Open
discobot wants to merge 1 commit into
NovaSky-AI:mainfrom
discobot:fix/1672-remote-engine-404-served-model-name
Open

[fix] Honor served_model_name and surface HTTP errors in RemoteInferenceEngine#1783
discobot wants to merge 1 commit into
NovaSky-AI:mainfrom
discobot:fix/1672-remote-engine-404-served-model-name

Conversation

@discobot

Copy link
Copy Markdown
Contributor

Fixes #1672.

Remote engines ignore served_model_name and RemoteInferenceEngine.generate() swallows the resulting vLLM 404, so a model-name mismatch surfaces as IndexError: list index out of range. Both failure points confirmed on current main (f4d7990).

  • create_remote_inference_engines_from_config now resolves model_name as served_model_name when set, falling back to the policy model path — the same resolution InferenceEngineClient.__init__ and the local-engine path already do. Removes the now-resolved TODO(tgriggs).
  • generate() raises a RuntimeError with the request URL, model name, status, and error body on non-200 responses, matching the pattern pause_generation/resume_generation already use in the same file.
  • chat_completion()/completion() are deliberately unchanged: the HTTP endpoint proxies their response bodies back to callers, so raising there would change proxy semantics.

Testing: new CPU tests in tests/backends/skyrl_train/inference_engines/test_remote_inference_engine.py against a mock vLLM server that returns vLLM's exact 404 error body — they fail before the fix and pass after. The surrounding inference-engine, remote inference client, and remote weight loader tests pass locally (147 tests); pre-commit is clean on the touched files.

…nceEngine

create_remote_inference_engines_from_config hardcoded the policy model path as the model name, ignoring generator.inference_engine.served_model_name, so requests to a vLLM server started with --served-model-name were rejected with a 404. RemoteInferenceEngine.generate then parsed the error body as if it were a normal response and returned empty outputs, which surfaced as an opaque IndexError in InferenceEngineClient.generate. Honor served_model_name when set and raise a RuntimeError with the status and error body on non-200 responses. Adds CPU tests with a mock vLLM server covering both paths.

Fixes NovaSky-AI#1672.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves error handling in the remote inference engine by raising a RuntimeError with the response body when a non-200 status code is returned. It also updates the engine initialization to use served_model_name if available, falling back to the policy model path, and adds comprehensive tests. The review feedback suggests handling potential decoding errors when reading the response text and adding validation to ensure a model name is resolved.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

if resp.status != 200:
# Surface the error body (e.g. vLLM's 404 for an unknown model name) instead of
# silently parsing it into empty outputs.
error_body = await resp.text()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using await resp.text() without specifying error handling can raise a UnicodeDecodeError if the remote server returns a non-UTF-8 error response (e.g., binary data or corrupted encoding on a 500 Internal Server Error). This secondary exception would mask the original HTTP status code and make debugging harder. Consider using errors="replace" to gracefully handle any decoding issues.

Suggested change
error_body = await resp.text()
error_body = await resp.text(errors="replace")

# Use served_model_name if provided, otherwise fall back to the model path.
# served_model_name allows using a different model name for HTTP requests than the actual
# model path. See InferenceEngineConfig.served_model_name in skyrl/train/config/config.py.
model_name = ie_cfg.served_model_name if ie_cfg.served_model_name is not None else cfg.trainer.policy.model.path

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If both generator.inference_engine.served_model_name and trainer.policy.model.path are None or empty, model_name will be None or empty. This will cause type mismatches or cryptic errors later when making remote inference requests. It is safer to validate that a valid model name is resolved and raise a clear ValueError early.

    model_name = ie_cfg.served_model_name if ie_cfg.served_model_name is not None else cfg.trainer.policy.model.path
    if not model_name:
        raise ValueError(
            "Model name must be specified. Please set either `generator.inference_engine.served_model_name` "
            "or `trainer.policy.model.path`."
        )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RemoteInferenceEngine swallows vLLM's 404, surfaces as opaque IndexError

1 participant