Rollout Routing Replay (R3) for the fsdp backend, using vllm==0.22 #1772
jamesbraza wants to merge 2 commits into
Code Review
This pull request introduces MoE Rollout Routing Replay (R3) support for HuggingFace models under the FSDP backend, implementing forward hooks on softmax routers to replay expert selections while keeping gate weights trainable. It also updates the FSDP worker, model wrapper, configuration validation, and dependency pins (vLLM 0.22.x), alongside adding comprehensive tests. The review feedback suggests adding a defensive check to ensure rollout_expert_indices is 4D to prevent potential IndexError crashes, and moving rollout_expert_indices to the same device as nnz_indices before indexing to avoid device mismatch errors.
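The following minimal sketch shows the two suggested safeguards. It is written in plain Python so it runs without torch. It assumes `rollout_expert_indices` is laid out as [batch, seq_len, num_layers, top_k] and that `nnz_indices` are row-major flat positions of non-pad tokens over batch * seq_len. The helpers `ndim`, `nnz_positions`, and `unpad_replay_indices` are hypothetical and are not the PR's code. In the torch version, the device fix would be `rollout_expert_indices.to(nnz_indices.device)` before indexing.

```python
from typing import Any, List, Sequence


def ndim(x: Any) -> int:
    """Nesting depth of a non-ragged nested list (a stand-in for tensor.dim())."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        if not x:
            break
        x = x[0]
    return depth


def nnz_positions(attention_mask: Sequence[Sequence[int]]) -> List[int]:
    """Row-major flat positions of real (non-pad) tokens over batch * seq_len."""
    seq_len = len(attention_mask[0])
    return [b * seq_len + s
            for b, row in enumerate(attention_mask)
            for s, keep in enumerate(row) if keep]


def unpad_replay_indices(rollout_expert_indices: List[Any],
                         nnz_indices: Sequence[int]) -> List[Any]:
    """Gather per-token routing for packed (unpadded) tokens.

    Assumed layout: [batch][seq_len][num_layers][top_k] expert ids.
    """
    # Defensive check: raise a clear error here instead of an IndexError mid-gather.
    dims = ndim(rollout_expert_indices)
    if dims != 4:
        raise ValueError(f"expected 4D [batch, seq, layers, top_k] indices, got {dims}D")
    # Merge batch and seq into one token axis, like reshape(-1, layers, top_k).
    flat = [token for seq in rollout_expert_indices for token in seq]
    # In torch, move the indices to nnz_indices.device before this gather.
    return [flat[i] for i in nnz_indices]


# Usage: batch=2, seq_len=3, one MoE layer, top_k=2; zeros sit under padding.
mask = [[1, 1, 0], [1, 0, 0]]
idx = [[[[0, 1]], [[2, 3]], [[0, 0]]],
       [[[4, 5]], [[0, 0]], [[0, 0]]]]
print(unpad_replay_indices(idx, nnz_positions(mask)))  # [[[0, 1]], [[2, 3]], [[4, 5]]]
```

Validating the rank up front turns a malformed payload from vLLM into a readable error at the point where the replay indices enter the training step.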
Replays vLLM's rollout expert selections in the training forward pass via forward hooks on transformers-v5 `*TopKRouter` gates (softmax family: OlMoE, Qwen2/3-MoE), recomputing gate scores from the live router logits so the gate stays trainable. Handles sample packing (unpad nnz gather) and Ulysses SP slicing of the replay indices.
The flag is per-worker (policy and ref); a startup warning is logged when KL is enabled but the ref worker is not replaying.
Enable with trainer.policy.fsdp_config.moe_enable_routing_replay=true plus generator.inference_engine.enable_return_routed_experts=true on the vLLM mp executor backend.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>
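The Ulysses SP slicing of the replay indices can be pictured with the following sketch. It assumes, as in common Ulysses-style implementations, that each sequence-parallel rank owns one contiguous chunk of a token axis padded to a multiple of `sp_size`. The replay indices must be padded and sliced exactly like the inputs so that each local token keeps its own rollout routing. `pad_to_multiple` and `slice_for_sp_rank` are hypothetical helpers on plain lists, not the PR's functions.

```python
from typing import List, TypeVar

T = TypeVar("T")


def pad_to_multiple(tokens: List[T], multiple: int, pad_value: T) -> List[T]:
    """Right-pad the token axis so it splits evenly across SP ranks."""
    remainder = len(tokens) % multiple
    if remainder == 0:
        return list(tokens)
    return list(tokens) + [pad_value] * (multiple - remainder)


def slice_for_sp_rank(tokens: List[T], sp_size: int, sp_rank: int) -> List[T]:
    """Contiguous chunk of the token axis owned by one sequence-parallel rank."""
    if len(tokens) % sp_size != 0:
        raise ValueError("pad the token axis to a multiple of sp_size first")
    chunk = len(tokens) // sp_size
    return tokens[sp_rank * chunk:(sp_rank + 1) * chunk]


# Usage: 5 tokens, one MoE layer with top_k=2, sp_size=2.
input_ids = [11, 12, 13, 14, 15]
replay = [[0, 1], [2, 3], [1, 2], [0, 3], [2, 1]]
sp_size = 2
ids_padded = pad_to_multiple(input_ids, sp_size, 0)
# Routing for pad tokens is assumed irrelevant (they are masked out of the loss).
replay_padded = pad_to_multiple(replay, sp_size, [0, 0])
for rank in range(sp_size):
    # Inputs and replay indices get the same slice, so they stay aligned per rank.
    print(rank, slice_for_sp_rank(ids_padded, sp_size, rank),
          slice_for_sp_rank(replay_padded, sp_size, rank))
```

Rank 0 receives tokens [11, 12, 13] with routing [[0, 1], [2, 3], [1, 2]]. Rank 1 receives [14, 15, 0] with [[0, 3], [2, 1], [0, 0]].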
vLLM 0.20.2's routed-experts capture overruns its shared-memory buffer on hybrid-attention MoE models with multiple KV-cache groups (vllm-project/vllm#40692); vLLM 0.22.0 ships the rewritten capture (vllm-project/vllm#39568). flashinfer moves to 0.6.11.post2, matching vLLM 0.22.x's own exact pins; torch stays at 2.11.0+cu128 via the cu129 vLLM wheel index.
vLLM >= 0.22 also natively defines start/finish_weight_update on GPUWorker (vllm-project/vllm#39212) and asserts that worker extensions may not shadow Worker attrs, which killed every WorkerProc at EngineCore init with the driver seeing only "Engine core initialization failed. ... Failed core proc(s): {}". Rename NewInferenceWorkerWrap's two colliding methods with a skyrl_ prefix and update the collective_rpc method strings to match; client-facing RemoteInferenceClient method names are unchanged.
The same API (mandatory since vLLM 0.21.0) requires wrapping /update_weights with /start_weight_update and /finish_weight_update, and its non-packed IPC receive expects bare rebuild_cuda_tensor args in the handle dict. Make RemoteInferenceClient.update_named_weights issue the wrapping calls itself (each call carries the full weight set), and send bare handle args from the GPU weight-sync test's IpcTrainer.
test_update_weights_flow[no_pd] and test_update_weights_ipc pass end-to-end on 0.22.1; the pd_1P1D param remains blocked by a pre-existing regression on main (NovaSky-AI#1759 switched the dependency to nixl-cu12, which ships module nixl_cu12, while vLLM imports nixl).
Validated end-to-end on Qwen3-30B-A3B GRPO with R3 enabled (non-colocated 2-node, new inference stack): healthy rewards and rollout_train_logprobs_abs_diff_mean ~= 0.01.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: James Braza <jamesbraza@gmail.com>
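The following sketch illustrates the two protocol changes. The first is a client method that brackets each full-weight-set update with the start/finish endpoints. The second is a check for extension methods that shadow the base worker's attributes. `SketchInferenceClient`, the `post` transport, the payload shape, `FakeWorker`, `WorkerExtension`, and `shadowed_attrs` are all hypothetical. `shadowed_attrs` only approximates vLLM's assertion; it is not vLLM's actual check. Only the endpoint paths and the skyrl_ prefix come from the commit message.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: (endpoint, payload) -> response, standing in for HTTP POSTs.
Post = Callable[[str, Dict[str, Any]], Any]


class SketchInferenceClient:
    """Illustrative client; not SkyRL's RemoteInferenceClient."""

    def __init__(self, post: Post) -> None:
        self._post = post

    def update_named_weights(self, weights: Dict[str, Any]) -> None:
        # Each call carries the full weight set, so it brackets its own update.
        self._post("/start_weight_update", {})
        self._post("/update_weights", {"names": sorted(weights)})
        self._post("/finish_weight_update", {})


class FakeWorker:
    """Stand-in for a base worker that already defines these methods."""

    def start_weight_update(self) -> None: ...
    def finish_weight_update(self) -> None: ...


class WorkerExtension:
    # skyrl_-prefixed names avoid shadowing the base worker's own methods.
    def skyrl_start_weight_update(self) -> None: ...
    def skyrl_finish_weight_update(self) -> None: ...


def shadowed_attrs(extension: type, base: type) -> List[str]:
    """Non-dunder names the extension defines that the base class already has."""
    return sorted(name for name in vars(extension)
                  if not name.startswith("__") and hasattr(base, name))


# Usage: record the endpoint sequence instead of sending real requests.
calls: List[Tuple[str, Dict[str, Any]]] = []
client = SketchInferenceClient(lambda ep, payload: calls.append((ep, payload)))
client.update_named_weights({"w2": 0, "w1": 0})
print([ep for ep, _ in calls])
# ['/start_weight_update', '/update_weights', '/finish_weight_update']
print(shadowed_attrs(WorkerExtension, FakeWorker))  # []
```

Calls are issued in sequence without a try/finally. Whether /finish_weight_update should still be sent after a failed update is not stated in the source, so the sketch does not guess.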
Relates to #815
Forward hooks on each HuggingFace MoE router replay the expert choices that vLLM recorded during rollout, recomputing the gate scores so the router stays trainable.
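This is a minimal sketch of the replay arithmetic for one token. It assumes the softmax-then-top-k routing used by Hugging Face's OLMoE and Qwen-MoE blocks, where renormalization of the selected weights is governed by their `norm_topk_prob` option. Replay keeps the rollout's expert ids but recomputes their weights from the live router logits. In the real hooks this gather runs on torch tensors, so gradients flow through the softmax into the router weights. Plain floats here only show the arithmetic. `softmax` and `replay_gate` are hypothetical helpers.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def replay_gate(router_logits: List[float], replay_experts: List[int],
                renormalize: bool = True) -> Tuple[List[int], List[float]]:
    """Keep the rollout's expert ids; recompute their weights from live logits."""
    probs = softmax(router_logits)
    # Gather at the replayed ids instead of taking a fresh top-k.
    weights = [probs[e] for e in replay_experts]
    if renormalize:  # mirrors norm_topk_prob=True
        s = sum(weights)
        weights = [w / s for w in weights]
    return list(replay_experts), weights


# Live top-2 would be experts [3, 0]; the rollout picked [0, 1], so those are replayed.
ids, weights = replay_gate([2.0, 1.0, 0.5, 3.0], [0, 1])
print(ids, [round(w, 4) for w in weights])  # [0, 1] [0.7311, 0.2689]
```

Taking a fresh top-k during training would route some tokens to different experts than the rollout used. Gathering at the replayed ids keeps the two forward passes aligned while the weights stay differentiable.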
Also pulls in vllm==0.22.0 for vllm-project/vllm#39568, the fix for vllm-project/vllm#40692.