
Update 2dfsdp logical rule #3805

Merged
copybara-service[bot] merged 1 commit into main from chengnuojin-fix-2dfsdp on May 4, 2026

Conversation


@NuojCheng NuojCheng (Collaborator) commented May 4, 2026

Description

Update the 2dfsdp logical axis rules to support combined FSDP + FSDP_T (fsdp + fsdp_transpose) sharding.

Tests

Train compile test: https://paste.googleplex.com/4570213325602816

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above, if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


github-actions Bot commented May 4, 2026

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@RissyRan RissyRan (Collaborator) left a comment


Thanks for the fix!


@github-actions github-actions Bot left a comment


📋 Review Summary

The Pull Request correctly updates the logical axis rules for the DeepSeek3 2D FSDP configuration. It enables more efficient memory distribution for the large vocabulary embedding layer and aligns the MoE logical axes with the core model implementation.

🔍 General Feedback

  • Optimization: Enabling 2D sharding (fsdp and fsdp_transpose) for embed_vocab is a solid improvement for handling the large embedding weights in a 2D mesh configuration.
  • Consistency: Adding the mlp_moe rule ensures consistent sharding behavior across MoE components, matching the expected axis names in the model code.
  • Cleanup: Removing unused axis names (embed_no_exp, mlp_only_tensor, etc.) reduces clutter and potential confusion in the configuration file.
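The logical-to-mesh axis resolution behind rules like these can be illustrated with a minimal sketch. The `RULES` list and `resolve_logical_axes` helper below are hypothetical illustrations (loosely modeled on Flax's `logical_to_mesh_axes` behavior), not MaxText's actual configuration or API; the axis names `embed_vocab`, `mlp_moe`, `fsdp`, and `fsdp_transpose` come from this PR's discussion.

```python
# Hypothetical sketch: resolving logical axis names to mesh axes.
# Each rule maps a logical axis to one or more mesh axes; within a single
# array, a mesh axis is assigned at most once (first match wins).

def resolve_logical_axes(logical_axes, rules):
    """Return a PartitionSpec-like tuple: one entry per logical axis,
    either a tuple of mesh axes or None if no unused mesh axis matches."""
    used = set()
    spec = []
    for axis in logical_axes:
        assigned = None
        for name, mesh_axes in rules:
            if name != axis:
                continue
            # Drop mesh axes already consumed by an earlier dimension.
            candidate = tuple(m for m in mesh_axes if m not in used)
            if candidate:
                assigned = candidate
                used.update(candidate)
                break
        spec.append(assigned)
    return tuple(spec)

# Rules in the spirit of this PR: the large vocabulary dimension is sharded
# over BOTH fsdp axes of a 2D mesh (illustrative values, not the real config).
RULES = [
    ("embed_vocab", ("fsdp", "fsdp_transpose")),
    ("embed", ("fsdp",)),
    ("mlp_moe", ("fsdp_transpose",)),
]

# An embedding table with logical shape (vocab, embed): the vocab dimension
# takes both mesh axes, so the embed dimension is left replicated (None).
print(resolve_logical_axes(("embed_vocab", "embed"), RULES))
# -> (('fsdp', 'fsdp_transpose'), None)
```

Sharding the vocabulary dimension across both mesh axes is what distributes the large embedding weights over every device in the 2D mesh, rather than over only one axis of it.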

@copybara-service copybara-service Bot merged commit 1e72989 into main May 4, 2026
51 checks passed
@copybara-service copybara-service Bot deleted the chengnuojin-fix-2dfsdp branch May 4, 2026 20:08
