
feat: Add npu megatron support#380

Open
UsernameFull wants to merge 10 commits into alibaba:main from UsernameFull:npu_megatron


@UsernameFull
Contributor

@UsernameFull UsernameFull commented Mar 16, 2026

Summary

This PR adds support for Huawei Ascend NPU devices with the Megatron-Core backend, enabling the ROLL framework to run reinforcement learning training on NPU hardware.

Key Changes

1. Platform Detection Priority

File: roll/platforms/__init__.py

Changes: Reordered platform detection to check NPU before CUDA.

Reason: NPU devices were incorrectly falling back to CUDA platform. Prioritizing NPU detection ensures NpuPlatform is properly initialized when torch_npu is available.
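
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in roll/platforms/__init__.py: `detect_platform`, `module_installed`, and the `has_module` parameter are hypothetical names, and a real check would presumably also query device availability at runtime.

```python
import importlib.util
from typing import Callable


def module_installed(name: str) -> bool:
    """Return True if a top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def detect_platform(has_module: Callable[[str], bool] = module_installed) -> str:
    """Pick a platform name, probing NPU before CUDA (hypothetical helper)."""
    # Probe NPU first so Ascend hosts do not fall through to the CUDA branch.
    if has_module("torch_npu"):
        return "npu"
    if has_module("torch"):
        # Assumption: a real check would also call torch.cuda.is_available().
        return "cuda"
    return "cpu"
```

Passing the probe as a parameter keeps the ordering testable on machines that have neither torch nor torch_npu installed.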

2. Device-Agnostic Operations

File: roll/pipeline/base_worker.py

Changes:

  • Replaced hard-coded "cuda" device strings with current_platform.device_type
  • Replaced torch.cuda.memory_allocated() with current_platform.memory_allocated()
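
To illustrate the pattern behind these replacements, here is a small sketch of calling code that depends only on a platform object. The `Platform` class is a hypothetical stand-in for whatever `current_platform` actually is in ROLL; only the two attribute names (`device_type`, `memory_allocated`) come from this PR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for the object exposed as current_platform."""
    device_type: str                     # e.g. "cuda" or "npu"
    memory_allocated: Callable[[], int]  # bytes currently allocated on the device


def describe_memory(platform: Platform) -> str:
    """Report allocated memory without naming a specific backend."""
    gib = platform.memory_allocated() / 1024**3
    return f"{platform.device_type}: {gib:.2f} GiB allocated"


# Fake backend that reports 2 GiB, used only to exercise the sketch.
fake_npu = Platform(device_type="npu", memory_allocated=lambda: 2 * 1024**3)
print(describe_memory(fake_npu))
```

Because `describe_memory` never mentions "cuda" or `torch.cuda`, the same worker code can run on either backend.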

3. MindSpeed Integration

File: mcore_adapter/src/mcore_adapter/training_args.py

Changes: Added an optional import of mindspeed.megatron_adaptor.

Reason: MindSpeed is Huawei's library providing NPU-specific Megatron optimizations. The adaptor patches Megatron-Core for NPU compatibility; the import is wrapped in try-except, so GPU environments without MindSpeed keep working.
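
A minimal sketch of the optional-import pattern, assuming the adaptor is imported purely for its side effects. The `try_import` helper and the `MINDSPEED_PATCHED` flag are hypothetical names for illustration and are not part of training_args.py.

```python
import importlib


def try_import(module_name: str) -> bool:
    """Import a module for its side effects; return False if it is not installed."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# On hosts without MindSpeed installed this evaluates to False and nothing is patched.
MINDSPEED_PATCHED = try_import("mindspeed.megatron_adaptor")
```

Catching only ImportError means a MindSpeed installation that fails for another reason still surfaces its error instead of being silently skipped.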

4. NPU Attention Mask Format

File: roll/distributed/strategy/megatron_strategy.py

Changes: Added NPU-specific attention mask transformation to 4D format.

Reason: NPU requires 4D attention masks of shape [B, 1, S, S] instead of the standard 2D [B, S] (B = batch size, S = sequence length). This hardware-specific requirement ensures correct attention computation on NPU.

if hasattr(torch, "npu") and torch.npu.is_available() and attention_mask is not None:
    attention_mask = attention_mask.bool()
    attention_mask = attention_mask[:, None, None, :].expand(B, 1, S, S)
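
To make the shape change concrete, the following pure-Python sketch reproduces the broadcast on nested lists, so it runs without torch. `expand_padding_mask` is a hypothetical helper; whether True marks a kept or a masked position depends on the backend and is not stated in this PR.

```python
from typing import List


def expand_padding_mask(mask_2d: List[List[bool]]) -> List[List[List[List[bool]]]]:
    """Broadcast a [B, S] padding mask to [B, 1, S, S].

    Mirrors mask[:, None, None, :].expand(B, 1, S, S): every query row i
    receives the same per-key values, so out[b][0][i][j] == mask_2d[b][j].
    """
    return [[[list(row) for _ in row]] for row in mask_2d]


mask = [[True, True, False]]  # B=1, S=3; the last position differs from the rest
mask_4d = expand_padding_mask(mask)
```

The singleton second dimension is left for broadcasting across attention heads.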

5. Optimizer Compatibility

File: roll/third_party/megatron/optimizer.py

Changes: Added support for the no_weight_decay_cond, scale_lr_cond, and lr_mult parameters.
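
The PR text does not spell out how these parameters behave. The sketch below assumes they follow the usual Megatron-Core convention: two predicates over named parameters decide which parameters skip weight decay and which have their learning rate scaled by lr_mult. `group_params` and the `(name, ndim)` parameter representation are hypothetical simplifications, not the code in roll/third_party/megatron/optimizer.py.

```python
from typing import Callable, Dict, List, Optional, Tuple

Param = Tuple[str, int]  # (parameter name, number of tensor dimensions)
Cond = Callable[[str, int], bool]


def group_params(
    params: List[Param],
    no_weight_decay_cond: Optional[Cond] = None,
    scale_lr_cond: Optional[Cond] = None,
    lr_mult: float = 1.0,
) -> Dict[Tuple[float, float], List[str]]:
    """Bucket parameter names by (weight-decay multiplier, lr multiplier)."""
    groups: Dict[Tuple[float, float], List[str]] = {}
    for name, ndim in params:
        if no_weight_decay_cond is not None:
            no_wd = no_weight_decay_cond(name, ndim)
        else:
            # Assumed default: skip weight decay for biases and 1-D tensors.
            no_wd = name.endswith(".bias") or ndim == 1
        scale_lr = scale_lr_cond(name, ndim) if scale_lr_cond is not None else False
        key = (0.0 if no_wd else 1.0, lr_mult if scale_lr else 1.0)
        groups.setdefault(key, []).append(name)
    return groups


params = [("layer.weight", 2), ("layer.bias", 1), ("head.weight", 2)]
groups = group_params(
    params,
    scale_lr_cond=lambda name, ndim: name.startswith("head."),
    lr_mult=0.1,
)
```

Here the head weights land in their own group with a 0.1 learning-rate multiplier, while the bias is the only parameter exempt from weight decay.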

6. Example Configurations

Files:

  • examples/ascend_examples/qwen3_4B_dpo_megatron.yaml
  • examples/ascend_examples/qwen3_8b_rlvr_deepspeed.yaml
  • examples/ascend_examples/run_dpo_pipeline.sh

Reason: Provides ready-to-use NPU training examples demonstrating proper device mapping and strategy configuration for both DPO and RLVR pipelines.

Impact

Benefits:

  • Enables Megatron training on Huawei Ascend NPU hardware
  • Maintains full backward compatibility with GPU systems
  • Follows existing platform abstraction patterns

Requirements

  • Huawei Ascend NPU with torch_npu installed
  • MindSpeed (v0.15.3) library for NPU Megatron support

@UsernameFull UsernameFull force-pushed the npu_megatron branch 2 times, most recently from fb2e7dc to acfad89 Compare March 17, 2026 06:54
@UsernameFull UsernameFull changed the title from "[WIP]Add npu megatron support" to "feat: Add npu megatron support" Apr 2, 2026
@UsernameFull UsernameFull force-pushed the npu_megatron branch 3 times, most recently from 1e7f794 to 0af5e74 Compare April 2, 2026 08:42
Comment thread roll/platforms/__init__.py
Comment thread mcore_adapter/src/mcore_adapter/training_args.py
@HuangJoJo
Collaborator

plz attach qwen3-8B rlvr megatron training curves

@UsernameFull
Contributor Author

> plz attach qwen3-8B rlvr megatron training curves

image


3 participants