NNX migration prep (2/N): NNX utils and sharding utilities #3470
Merged
copybara-service[bot] merged 1 commit into main on Apr 17, 2026
bvandermoon reviewed on Apr 15, 2026
bvandermoon approved these changes on Apr 16, 2026
- Add utils to manipulate NNX shardings using the abstract state of a model
  - also add unit tests for the utils
- Extract the mesh creation function into maxtext_utils.get_mesh_from_config()
  - also add unit tests for this function
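The changes listed above revolve around two ideas: building a device mesh from configuration, and deriving shardings from an abstract (shape-only) model state. The following is a minimal pure-JAX sketch of those ideas; it does not use Flax NNX and is not the PR's implementation. The names mesh_from_config, init_params, tree_named_shardings and with_shardings are hypothetical, and the convention that a single -1 axis size absorbs the remaining devices is an assumption made for illustration.

```python
import math

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def mesh_from_config(axis_sizes, devices=None):
    """Hypothetical config-driven mesh builder (not MaxText's get_mesh_from_config).

    axis_sizes maps mesh axis names to sizes. At most one size may be -1,
    which is assumed to mean "use all remaining devices" for that axis.
    """
    devices = list(jax.devices() if devices is None else devices)
    names = tuple(axis_sizes)
    sizes = list(axis_sizes.values())
    if sizes.count(-1) > 1:
        raise ValueError("at most one mesh axis may be -1")
    fixed = math.prod(s for s in sizes if s != -1)
    if -1 in sizes:
        if len(devices) % fixed:
            raise ValueError(f"{len(devices)} devices not divisible by {fixed}")
        sizes[sizes.index(-1)] = len(devices) // fixed
    if math.prod(sizes) != len(devices):
        raise ValueError(f"mesh shape {sizes} does not match {len(devices)} devices")
    return Mesh(np.array(devices).reshape(sizes), names)


def init_params(key):
    """Toy parameter initializer (hypothetical); only its output shapes matter."""
    return {"dense": {"kernel": jax.random.normal(key, (8, 16)),
                      "bias": jnp.zeros((16,))}}


def tree_named_shardings(pspecs, mesh):
    """Map a tree of PartitionSpecs to a tree of NamedShardings on `mesh`."""
    return jax.tree.map(lambda spec: NamedSharding(mesh, spec), pspecs,
                        is_leaf=lambda x: isinstance(x, P))


def with_shardings(abstract_state, shardings):
    """Return a copy of an abstract state whose leaves carry a sharding."""
    return jax.tree.map(
        lambda leaf, sharding: jax.ShapeDtypeStruct(leaf.shape, leaf.dtype, sharding=sharding),
        abstract_state, shardings)


# Usage: "data" absorbs every available device, "model" has size 1.
mesh = mesh_from_config({"data": -1, "model": 1})
# jax.eval_shape only traces init_params, so no parameter memory is allocated.
abstract = jax.eval_shape(init_params, jax.random.PRNGKey(0))
pspecs = {"dense": {"kernel": P(None, "model"), "bias": P("model")}}
sharded_abstract = with_shardings(abstract, tree_named_shardings(pspecs, mesh))
```

Keeping the state abstract means shardings can be planned before any weights exist; a shardings tree like the one produced here could, for example, be passed to jax.jit through its out_shardings argument.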
Note:
Flax v0.12 emits DeprecationWarnings in multiple places:
- DeprecationWarning: '.value' access is now deprecated. Use
  variable.get_value() or variable[...] (for [Array]).
- DeprecationWarning: 'VariableState' was removed, this is just
  an alias to 'Variable'. Please use 'Variable' directly instead.
However, since the code needs to work with post-training, which currently
requires Flax v0.11, we did not change the code to address these warnings.
gobbleturk approved these changes on Apr 17, 2026
This PR was referenced on Apr 22, 2026.
NNX Migration Route Map
- pure_nnx flag, init_state_fn, TrainStateNNX, NNX utils. Linen workflow unchanged. (PR #3427)
- get_abstract_state_nnx, get_named_sharding_nnx, set_named_sharding_nnx, get_partition_spec_nnx, get_mesh_from_config. (PR #3470)
- TrainStateNNX, model creation, gradient accumulation, checkpointing, and training loop dispatch. (PR #3500)
Description
- maxtext_utils_nnx.py: Functions to manipulate NNX model shardings using the abstract model state: get_named_sharding_nnx, set_named_sharding_nnx, get_partition_spec_nnx, and memory movement helpers (move_memory_to_host / move_memory_to_device).
- get_abstract_state NNX path: Added get_abstract_state_nnx to maxtext_utils.py, which uses nnx.get_abstract_model to return a flat nnx.State (rather than a full TrainStateNNX), and updated get_abstract_state to dispatch to it when pure_nnx=True.
- maxtext_utils.get_mesh_from_config(): Extracted mesh creation into a standalone function with unit tests.
- Added tests/unit/maxtext_utils_nnx_test.py and extended tests/unit/maxtext_utils_test.py to cover the new mesh and sharding utilities.
Note on Flax deprecation warnings:
Flax v0.12 emits a DeprecationWarning for .value access and for VariableState. These warnings are intentionally left unaddressed because post-training currently requires Flax v0.11 compatibility.
Tests
Pre-train Test Result
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.