[megatron] Add seq packing support for qwen3.5 #1769
Qwen3.5 hybrid Gated-DeltaNet (GDN) checkpoints report a `...ForConditionalGeneration` architecture and therefore auto-dispatch through megatron-bridge's VL bridge to `Qwen3VLModel`, which packs and CP-shards sequences inside its own forward. Under SkyRL sample packing, this double-packs the batch and corrupts the `cu_seqlens` fed to the GDN varlen kernel, which then aborts in the backward pass.

When `language_model_only=True`, these checkpoints are instead routed to megatron-core's native `GPTModel` + GDN `thd` path (the vision tower is dropped), which supports packed sequences directly.

Implementation:
- Thin SkyRL subclasses of the stock `Qwen35MoEBridge` / `Qwen35Bridge` feed `text_config` into the inherited provider logic via a shim and re-prefix the weight mappings to `model.language_model.*`.
- The subclasses are registered on sentinel `...ForCausalLM` source keys, so the real VL-bridge registration is not clobbered.
- `maybe_force_qwen35_text_bridge` rewrites the loaded architectures to the sentinel before `to_megatron_provider`; the worker calls it gated on the policy/ref `language_model_only` setting.

Verified logprob parity vs. vLLM with sample packing on:
- Qwen3.5-0.8B (dense), TP=2: diff mean 0.008
- Qwen3.5-35B-A3B (MoE), TP=4 EP=4: diff mean 0.010

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
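The sentinel-key routing described in the PR body can be modeled in plain Python. This is a minimal, hypothetical sketch of the idea, not the SkyRL or megatron-bridge code: `BRIDGE_REGISTRY`, `register_bridge`, `to_sentinel`, `force_text_bridge`, `reprefix`, and `ExampleHybridForConditionalGeneration` are illustrative names, and the real `maybe_force_qwen35_text_bridge` signature is not shown here. It assumes the registry is keyed by HF architecture string and that the text-only weight mappings use a plain `model.` prefix.

```python
from types import SimpleNamespace  # noqa: F401  (not needed; kept out below)
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for a bridge registry: maps an HF architecture
# string to the name of the bridge that should convert it.
BRIDGE_REGISTRY: Dict[str, str] = {}

VL_SUFFIX = "ForConditionalGeneration"
TEXT_SUFFIX = "ForCausalLM"


def register_bridge(source_key: str, bridge_name: str) -> None:
    """Register a bridge, refusing to overwrite an existing entry."""
    if source_key in BRIDGE_REGISTRY:
        raise KeyError(f"{source_key!r} is already registered")
    BRIDGE_REGISTRY[source_key] = bridge_name


def to_sentinel(arch: str) -> str:
    """Map a ...ForConditionalGeneration name to its ...ForCausalLM sentinel."""
    if not arch.endswith(VL_SUFFIX):
        raise ValueError(f"{arch!r} does not end with {VL_SUFFIX!r}")
    return arch[: -len(VL_SUFFIX)] + TEXT_SUFFIX


def force_text_bridge(
    architectures: Iterable[str], language_model_only: bool, hybrid_archs: Set[str]
) -> List[str]:
    """Rewrite hybrid VL architectures to their sentinel keys.

    Only applies when language_model_only is set; otherwise the list is
    returned unchanged and the VL bridge keeps handling the checkpoint.
    """
    archs = list(architectures)
    if not language_model_only:
        return archs
    return [to_sentinel(a) if a in hybrid_archs else a for a in archs]


def reprefix(
    hf_names: Iterable[str],
    old: str = "model.",
    new: str = "model.language_model.",
) -> List[str]:
    """Move text-model HF weight names under the VL checkpoint's prefix."""
    out = []
    for name in hf_names:
        # The second check keeps the rewrite idempotent.
        if name.startswith(old) and not name.startswith(new):
            out.append(new + name[len(old):])
        else:
            out.append(name)  # names outside `old`, e.g. lm_head.weight, pass through
    return out


if __name__ == "__main__":
    vl_arch = "ExampleHybridForConditionalGeneration"  # hypothetical name
    register_bridge(vl_arch, "vl_bridge")
    register_bridge(to_sentinel(vl_arch), "text_only_bridge")
    archs = force_text_bridge([vl_arch], language_model_only=True, hybrid_archs={vl_arch})
    print(archs, BRIDGE_REGISTRY[archs[0]])
    print(reprefix(["model.layers.0.mlp.up_proj.weight", "lm_head.weight"]))
```

Registering the text-only bridge under a separate sentinel key, rather than replacing the `...ForConditionalGeneration` entry, keeps the VL path available whenever `language_model_only` is off.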
Code Review
This pull request enables sample packing for Qwen3.5 hybrid Gated-DeltaNet (GDN) models on the Megatron backend by routing them to the native GPTModel path when language_model_only=True. The feedback highlights a critical issue: calling get_text_config() on the Hugging Face configuration will raise an AttributeError at runtime. It also suggests updating an outdated comment in the shell script.
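If the config object at that call site may lack `get_text_config()`, a defensive accessor avoids the AttributeError. This is a minimal sketch, assuming only that a multimodal config exposes its text sub-config either through a `get_text_config()` method or a `text_config` attribute, and that a text-only config is its own text config. Recent transformers releases provide `get_text_config()` on `PretrainedConfig`, but an older version or a non-`PretrainedConfig` object may not. `resolve_text_config` is a hypothetical helper, and `SimpleNamespace` stands in for real HF config objects.

```python
from types import SimpleNamespace
from typing import Any


def resolve_text_config(hf_config: Any) -> Any:
    """Return the text sub-config of a (possibly multimodal) config.

    Order of preference (hypothetical helper, not a library API):
    1. the config's own get_text_config() method, if it has one;
    2. a text_config attribute, as on many multimodal configs;
    3. the config itself, for text-only checkpoints.
    """
    getter = getattr(hf_config, "get_text_config", None)
    if callable(getter):
        return getter()
    return getattr(hf_config, "text_config", hf_config)


if __name__ == "__main__":
    # Stand-ins for HF configs; field names are illustrative only.
    vl_like = SimpleNamespace(text_config=SimpleNamespace(hidden_size=1024))
    text_only = SimpleNamespace(hidden_size=512)
    print(resolve_text_config(vl_like).hidden_size)
    print(resolve_text_config(text_only) is text_only)
```

Checking `callable(getter)` first means a real `PretrainedConfig` keeps its own resolution logic, and the attribute fallback only kicks in for objects that do not define the method.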
```diff
 # Qwen3.5 flags
-REMOVE_MICROBATCH_PADDING=false # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644
+REMOVE_MICROBATCH_PADDING=True # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644
```
The comment on this line is outdated and misleading because this pull request is specifically adding support for sample packing with GDN layers for Qwen3.5. Let's update the comment to reflect that sample packing is now supported and enabled.
Suggested change:

```diff
-REMOVE_MICROBATCH_PADDING=True # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644
+REMOVE_MICROBATCH_PADDING=True # Enable sample packing for GDN layers in megatron
```