
Pull requests: NVIDIA/TransformerEngine


Pull requests list

Add NVFP4 per-token quantization recipe community-contribution
#3045 opened May 26, 2026 by cael-ling Contributor Draft
13 tasks
[PyTorch Debug] Add scale_inv_std stat and skip NVFP4 layers in LogFp8TensorStats
#3044 opened May 26, 2026 by pggPL Collaborator
9 of 13 tasks
docs: expand comm gemm overlap guidance community-contribution
#3043 opened May 26, 2026 by omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM community-contribution
#3042 opened May 26, 2026 by zianglih Contributor Draft
[PyTorch Debug] Fix scale_inv_min returning 0 for MXFP8/NVFP4
#3041 opened May 25, 2026 by pggPL Collaborator
6 of 13 tasks
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w… community-contribution
#3040 opened May 25, 2026 by shangxiaokang Draft
13 tasks
TE_DType in python
#3039 opened May 22, 2026 by vthumbe1503 Collaborator Draft
13 tasks
[PyTorch] Make modules.GroupedLinear graph-safe org-contribution
#3038 opened May 22, 2026 by yaox12 Member
1 of 13 tasks
[fix] Fix CUTLASS grouped GEMM segfault for empty groups community-contribution
#3037 opened May 22, 2026 by Baibaifan
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036 opened May 22, 2026 by phu0ngng Collaborator
8 of 13 tasks
Expert Parallelism: common C API + NCCL EP backend
#3034 opened May 22, 2026 by phu0ngng Collaborator
8 of 13 tasks
Add MXFP8 attention unit test with linear and rope layers community-contribution
#3033 opened May 22, 2026 by layalir
[Common] Enable NVFP4 2D block scaling in columnwise only
#3027 opened May 21, 2026 by negvet Collaborator
1 of 13 tasks
[Common] Fix fused MoE aux loss for sequence aux loss
#3018 opened May 21, 2026 by harryzhou2000 Member
Add the getter and setter of skip_fp8_weight_update_tensor community-contribution
#3015 opened May 20, 2026 by xrennvidia Collaborator
6 of 13 tasks
adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py
#3014 opened May 20, 2026 by francesco-bertolotti Contributor
[PyTorch] NVFP4 RHT cast-fusion: emit GEMM-swizzled scale factors directly
#3011 opened May 19, 2026 by cael-ling Contributor
8 of 13 tasks
Bitmap topk
#3009 opened May 18, 2026 by tdophung Collaborator
9 of 13 tasks
Generalized Tensor Parallelism (GTP) org-contribution
#3005 opened May 18, 2026 by fanshiqing Member
6 of 12 tasks
Add wheel support for Newton-Schulz method via cuSolverMp
#3004 opened May 17, 2026 by ksivaman Member
6 of 13 tasks