Rollout Routing Replay (R3) for the `fsdp` backend, using `vllm==0.22` by jamesbraza · Pull Request #1772 · NovaSky-AI/SkyRL

jamesbraza · 2026-06-10T06:27:46Z

Relates to #815

Forward hooks on each HuggingFace MoE router replay the expert choices that vLLM recorded during rollout, recomputing the gate scores so the router stays trainable.
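The replay idea described above can be sketched with a PyTorch forward hook. This is a minimal illustration, not the PR's implementation: `ToyTopKRouter` and `make_replay_hook` are hypothetical names, the router's `(weights, indices)` output format is an assumption, and any top-k renormalization a real router may apply is omitted.

```python
import torch
from torch import nn


class ToyTopKRouter(nn.Module):
    """Hypothetical stand-in for a softmax top-k MoE router."""

    def __init__(self, hidden: int, num_experts: int, top_k: int) -> None:
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor):
        probs = torch.softmax(self.gate(hidden_states), dim=-1)
        weights, indices = torch.topk(probs, self.top_k, dim=-1)
        return weights, indices


def make_replay_hook(replay_indices: torch.Tensor):
    """Build a forward hook that swaps in recorded expert indices."""

    def hook(module, args, output):
        (hidden_states,) = args
        # Recompute scores from the live gate so gradients still reach it.
        probs = torch.softmax(module.gate(hidden_states), dim=-1)
        weights = probs.gather(-1, replay_indices)
        # Returning a value from a forward hook replaces the module output.
        return weights, replay_indices

    return hook


torch.manual_seed(0)
router = ToyTopKRouter(hidden=8, num_experts=4, top_k=2)
tokens = torch.randn(3, 8)
# Assumed rollout record: one row of expert ids per token.
recorded = torch.tensor([[0, 1], [2, 3], [1, 3]])

handle = router.register_forward_hook(make_replay_hook(recorded))
weights, indices = router(tokens)
weights.sum().backward()
handle.remove()
```

Because the hook gathers from a freshly computed softmax rather than reusing stored scores, the expert choice is fixed while the gate weights still receive gradients.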

Also pulls in vllm==0.22.0 for vllm-project/vllm#39568, the fix for vllm-project/vllm#40692.

gemini-code-assist

Code Review

This pull request introduces MoE Rollout Routing Replay (R3) support for HuggingFace models under the FSDP backend, implementing forward hooks on softmax routers to replay expert selections while keeping gate weights trainable. It also updates the FSDP worker, model wrapper, configuration validation, and dependency pins (vLLM 0.22.x), alongside adding comprehensive tests. The review feedback suggests adding a defensive check to ensure rollout_expert_indices is 4D to prevent potential IndexError crashes, and moving rollout_expert_indices to the same device as nnz_indices before indexing to avoid device mismatch errors.
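The two review suggestions can be illustrated with a short sketch. It assumes a `[batch, seq_len, num_layers, top_k]` layout for `rollout_expert_indices` and treats `nnz_indices` as flat positions of non-padding tokens; both are assumptions for illustration, and `gather_replay_indices` is a hypothetical helper rather than a function from the PR.

```python
import torch


def gather_replay_indices(
    rollout_expert_indices: torch.Tensor, nnz_indices: torch.Tensor
) -> torch.Tensor:
    """Select replay indices for the non-padding tokens of a packed batch."""
    # Defensive check: fail with a clear message instead of an IndexError.
    if rollout_expert_indices.ndim != 4:
        raise ValueError(
            "expected 4D rollout_expert_indices, got "
            f"{rollout_expert_indices.ndim}D"
        )
    batch, seq_len, num_layers, top_k = rollout_expert_indices.shape
    # Flatten batch and sequence so flat token positions can index directly.
    flat = rollout_expert_indices.reshape(batch * seq_len, num_layers, top_k)
    # Move to the indexer's device before indexing to avoid a mismatch.
    flat = flat.to(nnz_indices.device)
    return flat[nnz_indices]


indices = torch.arange(2 * 3 * 1 * 2).reshape(2, 3, 1, 2)
nnz = torch.tensor([0, 1, 2, 4])  # assumed: token 3 and token 5 are padding
packed = gather_replay_indices(indices, nnz)
```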


Replays vLLM's rollout expert selections in the training forward pass via forward hooks on transformers-v5 `*TopKRouter` gates (softmax family: OlMoE, Qwen2/3-MoE), recomputing gate scores from the live router logits so the gate stays trainable. Handles sample packing (unpad nnz gather) and Ulysses SP slicing of the replay indices.

The flag is per-worker (policy and ref); a startup warning is logged when KL is on but the ref is not replaying.

Enable with `trainer.policy.fsdp_config.moe_enable_routing_replay=true` plus `generator.inference_engine.enable_return_routed_experts=true` on the vLLM mp executor backend.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>
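The two flags named in the commit message above depend on each other: replay on the trainer side needs the inference engine to return routed experts. A minimal sketch of a consistency check follows, assuming the config is available as a nested dict; `check_routing_replay_config` is a hypothetical helper, and the PR's actual validation may behave differently.

```python
def check_routing_replay_config(cfg: dict) -> None:
    """Raise if replay is enabled without routed experts from the engine."""
    fsdp_cfg = cfg.get("trainer", {}).get("policy", {}).get("fsdp_config", {})
    engine_cfg = cfg.get("generator", {}).get("inference_engine", {})

    replay = bool(fsdp_cfg.get("moe_enable_routing_replay", False))
    returns_experts = bool(engine_cfg.get("enable_return_routed_experts", False))

    if replay and not returns_experts:
        raise ValueError(
            "moe_enable_routing_replay=true requires "
            "generator.inference_engine.enable_return_routed_experts=true"
        )


config = {
    "trainer": {"policy": {"fsdp_config": {"moe_enable_routing_replay": True}}},
    "generator": {"inference_engine": {"enable_return_routed_experts": True}},
}
check_routing_replay_config(config)  # both flags set, so no error
```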

vLLM 0.20.2's routed-experts capture overruns its shared-memory buffer on hybrid-attention MoE models with multiple KV-cache groups (vllm-project/vllm#40692); vLLM 0.22.0 ships the rewritten capture (vllm-project/vllm#39568). flashinfer moves to 0.6.11.post2, matching vLLM 0.22.x's own exact pins; torch stays 2.11.0+cu128 via the cu129 vLLM wheel index.

vLLM >= 0.22 also natively defines start/finish_weight_update on GPUWorker (vllm-project/vllm#39212) and asserts that worker extensions may not shadow Worker attrs, which killed every WorkerProc at EngineCore init with the driver seeing only "Engine core initialization failed. ... Failed core proc(s): {}". Rename NewInferenceWorkerWrap's two colliding methods with a skyrl_ prefix and update the collective_rpc method strings to match; client-facing RemoteInferenceClient method names are unchanged.

The same API (mandatory since vLLM 0.21.0) requires wrapping /update_weights with /start_weight_update and /finish_weight_update, and its non-packed IPC receive expects bare rebuild_cuda_tensor args in the handle dict. Make RemoteInferenceClient.update_named_weights issue the wrapping calls itself (each call carries the full weight set) and send bare handle args from the GPU weight-sync test's IpcTrainer.

test_update_weights_flow[no_pd] and test_update_weights_ipc pass end-to-end on 0.22.1; the pd_1P1D param remains blocked by a pre-existing regression on main (NovaSky-AI#1759 switched the dependency to nixl-cu12, which ships module nixl_cu12, while vLLM imports nixl).

Validated end-to-end on Qwen3-30B-A3B GRPO with R3 enabled (non-colocated 2-node, new inference stack): healthy rewards and rollout_train_logprobs_abs_diff_mean ~= 0.01.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>
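The wrapping order described in the commit message above can be sketched as follows. The three endpoint paths come from the message; the `post` callable, the payload shapes, and the name `update_named_weights_sketch` are hypothetical stand-ins, and error handling (for example, whether to finish after a failed update) is deliberately left out.

```python
from typing import Any, Callable

# Hypothetical transport: takes an endpoint path and a JSON-like payload.
Post = Callable[[str, dict[str, Any]], None]


def update_named_weights_sketch(post: Post, weights: dict[str, Any]) -> None:
    """Send one full weight set, bracketed by start and finish calls."""
    post("/start_weight_update", {})
    # Assumed payload shape; each call carries the full weight set.
    post("/update_weights", {"weights": weights})
    post("/finish_weight_update", {})


# Record the call order with a fake transport instead of a real server.
calls: list[str] = []
update_named_weights_sketch(
    lambda path, payload: calls.append(path),
    {"layer.weight": [0.1, 0.2]},
)
```

Keeping the bracketing inside the client method means callers keep a single entry point and cannot forget the start or finish call.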

gemini-code-assist Bot reviewed Jun 10, 2026


Comment thread skyrl/backends/skyrl_train/utils/hf_router_replay.py

Comment thread skyrl/backends/skyrl_train/utils/hf_router_replay.py

jamesbraza force-pushed the r3-fsdp branch from dc60b61 to 0f725da · June 10, 2026 06:45

jamesbraza changed the title ~~Rollout Routing Replay (R3) for the fsdp backend~~ Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22.0 Jun 10, 2026

jamesbraza changed the title ~~Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22.0~~ Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22 Jun 10, 2026

jamesbraza force-pushed the r3-fsdp branch from 0f725da to 119d2ee · June 10, 2026 23:04

jamesbraza wants to merge 2 commits into NovaSky-AI:main from EdisonScientific:r3-fsdp
