Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22#1772

Open
jamesbraza wants to merge 2 commits into
NovaSky-AI:mainfrom
EdisonScientific:r3-fsdp

Conversation

@jamesbraza

Contributor

Relates to #815

Forward hooks on each HuggingFace MoE router replay the expert choices that vLLM recorded during rollout, recomputing the gate scores so the router stays trainable.

Also pulls in vllm==0.22.0 for vllm-project/vllm#39568, the fix for vllm-project/vllm#40692.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces MoE Rollout Routing Replay (R3) support for HuggingFace models under the FSDP backend, implementing forward hooks on softmax routers to replay expert selections while keeping gate weights trainable. It also updates the FSDP worker, model wrapper, configuration validation, and dependency pins (vLLM 0.22.x), alongside adding comprehensive tests. The review feedback suggests adding a defensive check to ensure rollout_expert_indices is 4D to prevent potential IndexError crashes, and moving rollout_expert_indices to the same device as nnz_indices before indexing to avoid device mismatch errors.
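
The two review suggestions can be illustrated with a minimal sketch in plain Python (nested lists stand in for tensors so the example runs without dependencies). The function name `gather_replay_indices` and the `[batch][seq_len][num_layers][top_k]` layout are assumptions made for illustration, not taken from the PR. In a tensor version, the indices would also be moved to the device of `nnz_indices` before indexing, as the review suggests.

```python
def gather_replay_indices(rollout_expert_indices, nnz_indices, layer):
    """Select replayed expert ids for the non-padding tokens of one layer.

    Assumed (hypothetical) layout: [batch][seq_len][num_layers][top_k].
    nnz_indices are flat positions into the batch * seq_len token grid.
    """
    # Defensive rank check: walk down the first element of each level.
    depth, node = 0, rollout_expert_indices
    while isinstance(node, list):
        if not node:
            raise ValueError("rollout_expert_indices has an empty dimension")
        depth, node = depth + 1, node[0]
    if depth != 4:
        raise ValueError(f"expected 4D rollout_expert_indices, got {depth}D")

    seq_len = len(rollout_expert_indices[0])
    gathered = []
    for flat in nnz_indices:
        # Convert the flat token position back to (batch, position).
        b, s = divmod(flat, seq_len)
        gathered.append(rollout_expert_indices[b][s][layer])
    return gathered
```

Failing fast with a `ValueError` that names the actual rank is easier to debug than an `IndexError` raised deep inside a forward hook.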

Comment thread skyrl/backends/skyrl_train/utils/hf_router_replay.py
Replays vLLM's rollout expert selections in the training forward pass
via forward hooks on transformers-v5 `*TopKRouter` gates (softmax
family: OlMoE, Qwen2/3-MoE), recomputing gate scores from the live
router logits so the gate stays trainable. Handles sample packing
(unpad nnz gather) and Ulysses SP slicing of the replay indices. The
flag is per-worker (policy and ref); a startup warning is logged when
KL is on but the ref is not replaying.
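
The core idea above (keep the rollout's expert choice, but take the routing weights from the live logits) can be sketched as follows. This is a dependency-free illustration in plain Python, not the hook from this PR: `replay_gate_scores` and its `renormalize` flag are hypothetical names, and whether the selected weights are renormalized is assumed to depend on the model's configuration.

```python
import math


def replay_gate_scores(router_logits, replay_indices, renormalize=True):
    """Return per-token routing weights for the experts chosen at rollout.

    router_logits: per-token lists with one live logit per expert.
    replay_indices: per-token lists of expert ids recorded at rollout.
    """
    weights = []
    for logits, experts in zip(router_logits, replay_indices):
        # Numerically stable softmax over all experts, from the live logits.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Gather the replayed experts instead of the live top-k.
        picked = [probs[e] for e in experts]
        if renormalize:
            norm = sum(picked)
            picked = [p / norm for p in picked]
        weights.append(picked)
    return weights


# The live top-2 for these logits would be experts 0 and 1,
# but the replay forces experts 1 and 3.
weights = replay_gate_scores([[2.0, 1.0, 0.0, -1.0]], [[1, 3]])
```

Because the weights remain a function of the live logits, a tensor version of the same computation lets gradients reach the gate even though the expert selection is fixed.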

Enable with trainer.policy.fsdp_config.moe_enable_routing_replay=true
plus generator.inference_engine.enable_return_routed_experts=true on
the vLLM mp executor backend.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>
@jamesbraza jamesbraza changed the title Rollout Routing Replay (R3) for the fsdp backend Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22.0 Jun 10, 2026
@jamesbraza jamesbraza changed the title Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22.0 Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22 Jun 10, 2026
vLLM 0.20.2's routed-experts capture overruns its shared-memory buffer
on hybrid-attention MoE models with multiple KV-cache groups
(vllm-project/vllm#40692); vLLM 0.22.0 ships the rewritten capture
(vllm-project/vllm#39568). flashinfer moves to 0.6.11.post2, matching
vLLM 0.22.x's own exact pins; torch stays 2.11.0+cu128 via the cu129
vLLM wheel index.

vLLM >= 0.22 also natively defines start/finish_weight_update on
GPUWorker (vllm-project/vllm#39212) and asserts that worker extensions
may not shadow Worker attrs, which killed every WorkerProc at
EngineCore init with the driver seeing only "Engine core
initialization failed. ... Failed core proc(s): {}". Rename
NewInferenceWorkerWrap's two colliding methods with a skyrl_ prefix
and update the collective_rpc method strings to match; client-facing
RemoteInferenceClient method names are unchanged.
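
The collision described above can be reproduced with a small stand-alone check. The classes below are hypothetical stand-ins, not vLLM's `GPUWorker` or the real `NewInferenceWorkerWrap`, and the prefixed method names are inferred from the "skyrl_ prefix" wording rather than copied from the diff.

```python
class Worker:
    """Stand-in for an engine worker that natively defines both methods."""

    def start_weight_update(self):
        return "native start"

    def finish_weight_update(self):
        return "native finish"


class CollidingExtension:
    """Extension whose method name shadows a native worker attribute."""

    def start_weight_update(self):
        return "extension start"

    def load_weights(self):
        return "extension load"


class PrefixedExtension:
    """Same extension after renaming with a skyrl_ prefix."""

    def skyrl_start_weight_update(self):
        return "extension start"

    def skyrl_finish_weight_update(self):
        return "extension finish"


def shadowed_attrs(extension_cls, worker_cls):
    """Public names defined on the extension that the worker already has."""
    own = [name for name in vars(extension_cls) if not name.startswith("_")]
    return sorted(name for name in own if hasattr(worker_cls, name))
```

Running `shadowed_attrs` over an extension in a unit test would surface the offending names directly, which is more informative than an empty `Failed core proc(s): {}` message.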

The same API (mandatory since vLLM 0.21.0) requires wrapping
/update_weights with /start_weight_update and /finish_weight_update,
and its non-packed IPC receive expects bare rebuild_cuda_tensor args
in the handle dict. Make RemoteInferenceClient.update_named_weights
issue the wrapping calls itself (each call carries the full weight
set) and send bare handle args from the GPU weight-sync test's
IpcTrainer. test_update_weights_flow[no_pd] and
test_update_weights_ipc pass end-to-end on 0.22.1; the pd_1P1D param
remains blocked by a pre-existing regression on main (NovaSky-AI#1759 switched
the dependency to nixl-cu12, which ships module nixl_cu12, while vLLM
imports nixl).
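
The wrapped call sequence described above can be sketched as follows. Only the three endpoint paths come from the commit message. The `post` callable, the function name `sync_weights_wrapped`, and the empty request bodies are assumptions for illustration and may not match the real `RemoteInferenceClient`.

```python
def sync_weights_wrapped(post, payload):
    """Send one full weight set, bracketed by the start and finish calls.

    post: callable (path, body) -> response; a hypothetical transport.
    payload: the full weight set for this call (its shape is an assumption).
    """
    # Open the update window before any weights are sent.
    post("/start_weight_update", {})
    # Each call carries the full weight set, so one update suffices.
    response = post("/update_weights", payload)
    # Close the window so the engine can resume serving.
    post("/finish_weight_update", {})
    return response


# Usage with a fake transport that records the call order.
calls = []


def fake_post(path, body):
    calls.append(path)
    return {"ok": True}


sync_weights_wrapped(fake_post, {"names": ["layer.weight"]})
```

Keeping the wrapping inside the client means callers continue to make a single call per sync.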

Validated end-to-end on Qwen3-30B-A3B GRPO with R3 enabled
(non-colocated 2-node, new inference stack): healthy rewards and
rollout_train_logprobs_abs_diff_mean ~= 0.01.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>