Pull requests: NVIDIA/TransformerEngine

[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[PyTorch] Main_Grad buffer isn't overwritten when overwrite_main_grad=True (labels: 2.15.0, bug)
#2936 opened Apr 28, 2026 by vthumbe1503 Collaborator
13 tasks
fix: skip generating fp8 & mxfp8 checkpoints if unsupported
#2935 opened Apr 27, 2026 by kainzhong Collaborator
4 of 13 tasks
[JAX] Fix MNIST L2 jax test instability
#2933 opened Apr 27, 2026 by tdophung Collaborator Draft
6 of 13 tasks
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member Draft
13 tasks
Implement per-token NVFP4 fprop recipe
#2931 opened Apr 27, 2026 by zianglih Contributor
8 of 13 tasks
[Common/PyTorch] Add MXFP8 cast-and-transpose op
#2930 opened Apr 26, 2026 by jeweldave
[PyTorch] Avoid removing usages from quantized weight tensors (labels: 2.15.0, bug)
#2929 opened Apr 25, 2026 by timmoon10 Collaborator
8 of 13 tasks
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928 opened Apr 25, 2026 by eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925 opened Apr 25, 2026 by ksivaman Member
7 of 13 tasks
Make TE Sequential Grouped linear Op CUDA graphable
#2923 opened Apr 24, 2026 by vthumbe1503 Collaborator Draft
13 tasks
[PyTorch] Add distributed Muon optimizer (label: 2.16.0)
#2920 opened Apr 23, 2026 by vcherepanov-nv Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919 opened Apr 23, 2026 by CarlosGomes98 Contributor Draft
13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
Variable Grouped Swizzle
#2914 opened Apr 22, 2026 by int-smart Contributor
8 of 13 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing (label: community-contribution)
#2911 opened Apr 21, 2026 by NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute (label: community-contribution)
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906 opened Apr 21, 2026 by yaox12 Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900 opened Apr 18, 2026 by timmoon10 Collaborator
9 of 13 tasks
Improve the dimension checks for the FP8 recipes
#2894 opened Apr 16, 2026 by ptrendx Member
13 tasks