feat(language): Opacus DP training with wrap_model=False #279
Merged
Raise the Opacus floor and refresh uv.lock so installs resolve to 1.6.0, which adds non-wrapping mode, FSDP/mixed-precision DP improvements, and assorted accountant/clipping fixes while maintaining torch>=2.6 alignment. Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Attach per-sample gradient hooks without GradSampleModule wrapping so the Hugging Face module hierarchy stays intact. Call GradSampleHooks.cleanup() after the training loop to remove hooks and Opacus monkey-patched attrs. Depends on Opacus >= 1.6 (non-wrapping mode). Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
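A minimal sketch of how this non-wrapping DP path is meant to be used, assuming the `wrap_model` keyword and `GradSampleHooks.cleanup()` helper this PR attributes to Opacus >= 1.6; the import path for `GradSampleHooks`, the hyperparameter values, and the training-loop shape are illustrative assumptions, not the engine's actual code.

```python
import torch
from opacus import PrivacyEngine

# Assumption: import path for the non-wrapping hooks helper named in this PR.
from opacus.grad_sample import GradSampleHooks


def dp_finetune(model, optimizer, data_loader, epochs: int = 1):
    privacy_engine = PrivacyEngine()
    # wrap_model=False (non-wrapping mode, Opacus >= 1.6 per this PR) attaches
    # per-sample gradient hooks to the original PreTrainedModel instead of
    # wrapping it in a GradSampleModule, so the HF module hierarchy stays intact.
    model, optimizer, data_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=data_loader,
        noise_multiplier=1.0,  # illustrative value
        max_grad_norm=1.0,     # illustrative value
        wrap_model=False,
    )
    try:
        model.train()
        for _ in range(epochs):
            for batch in data_loader:
                optimizer.zero_grad()
                loss = model(**batch).loss
                loss.backward()
                optimizer.step()
    finally:
        # Remove hooks and Opacus monkey-patched parameter attributes after the
        # training loop, as the commit message above describes; the exact call
        # signature here follows the PR wording and is an assumption.
        GradSampleHooks.cleanup()
```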
Opacus 1.x per-sample gradient hooks hit NotImplementedError on NestedTensorCPU (aten::new_empty). For DP training, collate CTXSEQ as padded dense tensors with -1 padding; SequentialContextEmbedders already masks -1 and maps to embedding index 0. Non-DP sequential training keeps nested CTXSEQ collate for unchanged behavior. Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
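A small, self-contained sketch of the dense CTXSEQ collate described above. The names `collate_ctxseq_dense` and `MaskedSeqEmbedder` are hypothetical; the snippet only illustrates the -1 padding and the mask-to-index-0 lookup, not the engine's actual SequentialContextEmbedders code.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_ctxseq_dense(seqs: list[torch.Tensor]) -> torch.Tensor:
    # Dense padding with -1 keeps Opacus' per-sample gradient hooks happy;
    # nested tensors trip aten::new_empty (NotImplementedError on NestedTensorCPU).
    return pad_sequence(seqs, batch_first=True, padding_value=-1)


class MaskedSeqEmbedder(torch.nn.Module):
    # Mirrors the -1 handling described above: padding positions are remapped
    # to embedding index 0 for the lookup and zeroed out afterwards.
    def __init__(self, num_embeddings: int, dim: int):
        super().__init__()
        self.embedding = torch.nn.Embedding(num_embeddings, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        pad_mask = ids.eq(-1)
        emb = self.embedding(ids.masked_fill(pad_mask, 0))
        return emb.masked_fill(pad_mask.unsqueeze(-1), 0.0)


# Example: three sequences of different lengths collate to a (3, max_len) batch.
batch = collate_ctxseq_dense(
    [torch.tensor([3, 1, 4]), torch.tensor([1]), torch.tensor([5, 9])]
)
out = MaskedSeqEmbedder(num_embeddings=16, dim=8)(batch)
```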
Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Summary
- Language-model DP training calls `PrivacyEngine.make_private(..., wrap_model=False)` so Opacus attaches `GradSampleHooks` to the original `PreTrainedModel`. After the training loop, `GradSampleHooks.cleanup()` removes hooks and Opacus-added parameter attributes.
- This branch is up to date with `cursor/opacus-1.6.0-upgrade-7c69` (includes the tabular NestedTensor + DP CI fix from #278).

Stack
- #278 lands on `main` first (Opacus 1.6.0 + nested CTXSEQ fix for DP tabular).
- This PR targets `main` after "deps: bump Opacus to 1.6.0" #278 lands.

Verification
- `ruff check mostlyai/engine/_language/training.py`
- `pytest tests/unit` — 236 passed
- `pytest tests/end_to_end/test_tabular_sequential.py::TestTabularTrainingStrategy::test_training_strategy` — both parametrizations passed locally

Notes