[megatron] Add seq packing support for qwen3.5 by erictang000 · Pull Request #1769 · NovaSky-AI/SkyRL

erictang000 · 2026-06-10T01:26:31Z

No description provided.

Qwen3.5 hybrid Gated-DeltaNet checkpoints report a ...ForConditionalGeneration arch and auto-dispatch through megatron-bridge's VL bridge -> Qwen3VLModel, which packs + CP-shards sequences inside its own forward. Under SkyRL sample packing that double-packs and corrupts the cu_seqlens fed to the GDN varlen kernel, aborting in the backward.

When language_model_only=True, route these checkpoints to megatron-core's native GPTModel + GDN thd path (vision tower dropped), which supports packed sequences directly.

Implemented as thin SkyRL subclasses of the stock Qwen35MoEBridge / Qwen35Bridge that feed text_config into the inherited provider logic via a shim and re-prefix the weight mappings to model.language_model.*; registered on sentinel ...ForCausalLM source keys so the real VL-bridge registration is not clobbered.

maybe_force_qwen35_text_bridge rewrites the loaded architectures to the sentinel before to_megatron_provider; the worker calls it gated on policy/ref language_model_only.

Verified logprob parity vs vLLM with sample packing on:
- Qwen3.5-0.8B (dense), TP=2: diff mean 0.008
- Qwen3.5-35B-A3B (MoE), TP=4 EP=4: diff mean 0.010

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
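For readers following the routing step, here is a minimal sketch of what "rewrite the loaded architectures to the sentinel" could look like. It is not the PR's code: the function name `force_text_arch`, both architecture strings, and the use of a plain namespace in place of a Hugging Face config are assumptions made for illustration, since the real architecture names are elided in the description.

```python
from types import SimpleNamespace

# Hypothetical names: the real architecture strings are elided in the description.
VL_ARCH = "ExampleForConditionalGeneration"
TEXT_SENTINEL_ARCH = "ExampleForCausalLM"


def force_text_arch(hf_config, language_model_only):
    """Rewrite the VL architecture name to a text-only sentinel.

    Returns True if the config was changed, False otherwise.
    """
    if not language_model_only:
        # Leave the config alone so the regular VL bridge is still selected.
        return False
    archs = list(getattr(hf_config, "architectures", None) or [])
    rewritten = [TEXT_SENTINEL_ARCH if a == VL_ARCH else a for a in archs]
    if rewritten == archs:
        return False
    hf_config.architectures = rewritten
    return True


# Stand-in for a loaded config (assumption: only `architectures` matters here).
cfg = SimpleNamespace(architectures=[VL_ARCH])
changed = force_text_arch(cfg, language_model_only=True)

untouched = SimpleNamespace(architectures=[VL_ARCH])
skipped = force_text_arch(untouched, language_model_only=False)
```

Keying the text-only bridge on a separate sentinel name means the lookup table keeps its original VL entry, which matches the stated goal of not clobbering the real VL-bridge registration.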

gemini-code-assist

Code Review

This pull request enables sample packing for Qwen3.5 hybrid Gated-DeltaNet (GDN) models on the Megatron backend by routing them to the native GPTModel path when language_model_only=True. Feedback highlights a critical issue where calling get_text_config() on the Hugging Face configuration will raise an AttributeError at runtime, and suggests updating an outdated comment in the shell script.
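Whether `get_text_config()` is available depends on the installed transformers version and on the config class, so the concern above may or may not apply in a given environment. A defensive lookup along the following lines would sidestep the question. This is a sketch under assumptions: `resolve_text_config` is a hypothetical helper, not something from this PR, and the namespaces below stand in for real config objects.

```python
from types import SimpleNamespace


def resolve_text_config(hf_config):
    """Return the text sub-config, tolerating configs without get_text_config()."""
    getter = getattr(hf_config, "get_text_config", None)
    if callable(getter):
        return getter()
    # Fall back to a nested `text_config` attribute if one is present.
    nested = getattr(hf_config, "text_config", None)
    # A text-only config has neither, so it is its own text config.
    return nested if nested is not None else hf_config


text = SimpleNamespace(hidden_size=1024)
with_getter = SimpleNamespace(get_text_config=lambda: text)
with_attr = SimpleNamespace(text_config=text)
plain = SimpleNamespace(hidden_size=2048)

resolved = [resolve_text_config(c) for c in (with_getter, with_attr, plain)]
```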

gemini-code-assist · 2026-06-15T19:09:16Z


 # Qwen3.5 flags
-REMOVE_MICROBATCH_PADDING=false # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644
+REMOVE_MICROBATCH_PADDING=True # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644


The comment on this line is outdated and misleading because this pull request is specifically adding support for sample packing with GDN layers for Qwen3.5. Let's update the comment to reflect that sample packing is now supported and enabled.

Suggested change

-REMOVE_MICROBATCH_PADDING=True # sample packing is not yet supported for GDN layers in megatron - see: https://github.com/NVIDIA/Megatron-LM/pull/2644
+REMOVE_MICROBATCH_PADDING=True # Enable sample packing for GDN layers in megatron
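For context on what the flag turns on: sample packing concatenates variable-length sequences without padding and records the boundaries in cu_seqlens. The sketch below shows that bookkeeping with plain Python lists. The helper names are illustrative and not taken from SkyRL or Megatron, and the last line shows only one simple way boundaries can be lost when an already-packed row is packed again, not necessarily the exact failure described in this PR.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Cumulative sequence boundaries: [0, l0, l0 + l1, ...]."""
    if any(n <= 0 for n in lengths):
        raise ValueError("sequence lengths must be positive")
    return [0, *accumulate(lengths)]


def pack(sequences):
    """Concatenate sequences without padding and return (tokens, cu_seqlens)."""
    tokens = [tok for seq in sequences for tok in seq]
    return tokens, cu_seqlens_from_lengths([len(seq) for seq in sequences])


def unpack(tokens, cu_seqlens):
    """Split a packed row back into sequences using the boundaries."""
    return [tokens[a:b] for a, b in zip(cu_seqlens, cu_seqlens[1:])]


sequences = [[1, 2, 3], [4, 5], [6]]
tokens, cu_seqlens = pack(sequences)

# Packing the packed row again as if it were one sequence drops the
# inner boundaries, so a varlen kernel would see a single long sequence.
repacked_boundaries = cu_seqlens_from_lengths([len(tokens)])
```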

erictang000 and others added 4 commits June 10, 2026 01:26

x

6c75a96

remove delegate packing path

befa114

trim comments

f23e59c

erictang000 marked this pull request as ready for review June 15, 2026 19:07

gemini-code-assist Bot reviewed Jun 15, 2026

erictang000 added 2 commits June 15, 2026 20:41

x

9ac1ed6

Merge branch 'main' of https://github.com/erictang000/SkyRL into qwen…

d70cb49

…3_5_seq_pack

erictang000 changed the title ~~[WIP] Add seq packing support for qwen3.5~~ [megatron] Add seq packing support for qwen3.5 Jun 15, 2026

erictang000 added 5 commits June 15, 2026 20:53

x

320bb73

x

f3bd7d8

x

4970d0d

x

6fb252d

set fla_tilelang=1 by default for hopper, leave comment about it how …

77de225

…it should be 0 for blackwell

erictang000 wants to merge 11 commits into NovaSky-AI:main from erictang000:qwen3_5_seq_pack
