feat(language): Opacus DP training with wrap_model=False#279

Merged
mplatzer merged 5 commits into main from cursor/opacus-dp-wrap-model-false-7c69
May 8, 2026

Conversation

mplatzer (Contributor) commented May 5, 2026

Summary

Language-model DP training now calls PrivacyEngine.make_private(..., wrap_model=False) so that Opacus attaches GradSampleHooks directly to the original PreTrainedModel instead of wrapping it in a GradSampleModule, keeping the Hugging Face module hierarchy intact. After the training loop, GradSampleHooks.cleanup() removes the hooks and the Opacus-added parameter attributes (see the sketch below).
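
A minimal sketch of that flow, with a toy module standing in for the real PreTrainedModel. The standard make_private arguments are Opacus' public API; wrap_model=False is the keyword this PR describes for Opacus 1.6, and the hyperparameter values are illustrative only:

```python
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(4, 2)  # stand-in for the PreTrainedModel
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.utils.data.TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
train_loader = torch.utils.data.DataLoader(data, batch_size=4)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # illustrative values
    max_grad_norm=1.0,
    wrap_model=False,       # per this PR: attach GradSampleHooks in place, no wrapper
)

for x, y in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()         # hooks record per-sample gradients
    optimizer.step()        # DPOptimizer clips and noises them

# After training, this PR calls GradSampleHooks.cleanup() to detach the hooks
# and strip the Opacus-added parameter attributes; how the hooks handle is
# obtained is internal to the engine and not shown in this summary.
```

The payoff is that the model keeps its original class and state_dict layout, so checkpointing and generation code can keep treating it as a plain Hugging Face model.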

This branch is up to date with cursor/opacus-1.6.0-upgrade-7c69 (includes the tabular NestedTensor + DP CI fix from #278).

Stack

  1. Merge #278 (deps: bump Opacus to 1.6.0) into main first; it carries Opacus 1.6.0 plus the nested CTXSEQ fix for DP tabular.
  2. Then merge this PR, #279 (feat(language): Opacus DP training with wrap_model=False), or rebase it onto main after #278 lands.

Verification

  • ruff check mostlyai/engine/_language/training.py
  • pytest tests/unit — 236 passed
  • pytest tests/end_to_end/test_tabular_sequential.py::TestTabularTrainingStrategy::test_training_strategy — both parametrizations passed locally


cursoragent and others added 4 commits May 5, 2026 19:23
Raise the Opacus floor and refresh uv.lock so installs resolve to 1.6.0,
which adds non-wrapping mode, FSDP/mixed-precision DP improvements, and
assorted accountant/clipping fixes while maintaining torch>=2.6 alignment.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
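
The raised floor itself lives in pyproject.toml and uv.lock; as a quick, hypothetical sanity check (not part of this PR), the resolved version can be asserted at import time:

```python
from importlib.metadata import version
from packaging.version import Version

# Non-wrapping DP mode is only available from Opacus 1.6.0 onwards.
assert Version(version("opacus")) >= Version("1.6.0"), "need opacus>=1.6.0"
```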
Attach per-sample gradient hooks without GradSampleModule wrapping so the
Hugging Face module hierarchy stays intact. Call GradSampleHooks.cleanup()
after the training loop to remove the hooks and Opacus' monkey-patched
parameter attributes.

Depends on Opacus >= 1.6 (non-wrapping mode).

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Opacus 1.x per-sample gradient hooks hit NotImplementedError on NestedTensorCPU
(aten::new_empty). For DP training, collate CTXSEQ as padded dense tensors with -1
padding; SequentialContextEmbedders already masks -1 and maps it to embedding
index 0 (see the sketch below).

Non-DP sequential training keeps the nested CTXSEQ collate, so its behavior is
unchanged.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
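A minimal sketch of the padded dense collate described in the CTXSEQ commit above. The collate helper name is hypothetical; pad_sequence and masked_fill are standard PyTorch, and the masking mirrors what the PR says SequentialContextEmbedders already does with -1:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_ctxseq_padded(seqs: list[torch.Tensor]) -> torch.Tensor:
    # Dense (batch, max_len) tensor with -1 padding; avoids the NestedTensor
    # ops that Opacus per-sample gradient hooks cannot handle.
    return pad_sequence(seqs, batch_first=True, padding_value=-1)

batch = collate_ctxseq_padded([torch.tensor([3, 7]), torch.tensor([5])])
# tensor([[ 3,  7],
#         [ 5, -1]])
mask = batch.eq(-1)                     # True where padded
embed_ids = batch.masked_fill(mask, 0)  # remap padding to embedding index 0
```

Padding with -1 rather than 0 keeps padding distinguishable from real token indices (all >= 0), which is why the embedder can safely mask -1 and remap it to index 0.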
The cursor bot mentioned this pull request May 6, 2026
Base automatically changed from cursor/opacus-1.6.0-upgrade-7c69 to main May 6, 2026 06:24
mplatzer marked this pull request as ready for review May 8, 2026 12:06
mplatzer merged commit edade8f into main May 8, 2026
7 checks passed
mplatzer deleted the cursor/opacus-dp-wrap-model-false-7c69 branch May 8, 2026 12:07