feat(language): Opacus DP training with wrap_model=False#279

Merged
mplatzer merged 5 commits into main from cursor/opacus-dp-wrap-model-false-7c69
May 8, 2026

Conversation

mplatzer (Contributor) commented May 5, 2026

Summary

Language-model DP training now calls PrivacyEngine.make_private(..., wrap_model=False) so that Opacus attaches GradSampleHooks directly to the original PreTrainedModel instead of wrapping it in a GradSampleModule, keeping the Hugging Face module hierarchy intact. After the training loop, GradSampleHooks.cleanup() removes the hooks and the Opacus-added parameter attributes (see the sketch below).
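
A minimal sketch of that flow, with a toy module standing in for the real PreTrainedModel. The standard make_private arguments are Opacus' public API; wrap_model=False is the keyword this PR describes for Opacus 1.6, and the hyperparameter values are illustrative only:

```python
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(4, 2)  # stand-in for the PreTrainedModel
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.utils.data.TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
train_loader = torch.utils.data.DataLoader(data, batch_size=4)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # illustrative values
    max_grad_norm=1.0,
    wrap_model=False,       # per this PR: attach GradSampleHooks in place, no wrapper
)

for x, y in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()         # hooks record per-sample gradients
    optimizer.step()        # DPOptimizer clips and noises them

# After training, this PR calls GradSampleHooks.cleanup() to detach the hooks
# and strip the Opacus-added parameter attributes; how the hooks handle is
# obtained is internal to the engine and not shown in this summary.
```

The payoff is that the model keeps its original class and state_dict layout, so checkpointing and generation code can keep treating it as a plain Hugging Face model.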

This branch is up to date with cursor/opacus-1.6.0-upgrade-7c69 (includes the tabular NestedTensor + DP CI fix from #278).

Stack

  1. Merge #278 (deps: bump Opacus to 1.6.0) into main first; it carries Opacus 1.6.0 plus the nested CTXSEQ fix for DP tabular.
  2. Then merge this PR, #279 (feat(language): Opacus DP training with wrap_model=False), or rebase it onto main after #278 lands.

Verification

  • ruff check mostlyai/engine/_language/training.py
  • pytest tests/unit — 236 passed
  • pytest tests/end_to_end/test_tabular_sequential.py::TestTabularTrainingStrategy::test_training_strategy — both parametrizations passed locally


cursoragent and others added 4 commits May 5, 2026 19:23
Raise the Opacus floor and refresh uv.lock so installs resolve to 1.6.0,
which adds non-wrapping mode, FSDP/mixed-precision DP improvements, and
assorted accountant/clipping fixes while maintaining torch>=2.6 alignment.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
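
The raised floor itself lives in pyproject.toml and uv.lock; as a quick, hypothetical sanity check (not part of this PR), the resolved version can be asserted at import time:

```python
from importlib.metadata import version
from packaging.version import Version

# Non-wrapping DP mode is only available from Opacus 1.6.0 onwards.
assert Version(version("opacus")) >= Version("1.6.0"), "need opacus>=1.6.0"
```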
Attach per-sample gradient hooks without GradSampleModule wrapping so the
Hugging Face module hierarchy stays intact. Call GradSampleHooks.cleanup()
after the training loop to remove the hooks and Opacus' monkey-patched
parameter attributes.

Depends on Opacus >= 1.6 (non-wrapping mode).

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Opacus 1.x per-sample gradient hooks hit NotImplementedError on NestedTensorCPU
(aten::new_empty). For DP training, collate CTXSEQ as padded dense tensors with -1
padding; SequentialContextEmbedders already masks -1 and maps it to embedding
index 0 (see the sketch below).

Non-DP sequential training keeps the nested CTXSEQ collate, so its behavior is
unchanged.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
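A minimal sketch of the padded dense collate described in the CTXSEQ commit above. The collate helper name is hypothetical; pad_sequence and masked_fill are standard PyTorch, and the masking mirrors what the PR says SequentialContextEmbedders already does with -1:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_ctxseq_padded(seqs: list[torch.Tensor]) -> torch.Tensor:
    # Dense (batch, max_len) tensor with -1 padding; avoids the NestedTensor
    # ops that Opacus per-sample gradient hooks cannot handle.
    return pad_sequence(seqs, batch_first=True, padding_value=-1)

batch = collate_ctxseq_padded([torch.tensor([3, 7]), torch.tensor([5])])
# tensor([[ 3,  7],
#         [ 5, -1]])
mask = batch.eq(-1)                     # True where padded
embed_ids = batch.masked_fill(mask, 0)  # remap padding to embedding index 0
```

Padding with -1 rather than 0 keeps padding distinguishable from real token indices (all >= 0), which is why the embedder can safely mask -1 and remap it to index 0.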
The cursor bot mentioned this pull request May 6, 2026
Base automatically changed from cursor/opacus-1.6.0-upgrade-7c69 to main May 6, 2026 06:24
mplatzer marked this pull request as ready for review May 8, 2026 12:06
mplatzer merged commit edade8f into main May 8, 2026
7 checks passed
mplatzer deleted the cursor/opacus-dp-wrap-model-false-7c69 branch May 8, 2026 12:07