OneDNN BRGeMM Micro-Kernel Integration for BF16 MatMul #903

Open
bbhattar wants to merge 5 commits into google:dev from Intel-tensorflow:feature/onednn-brgemm
Conversation

@bbhattar
This PR integrates OneDNN BRGeMM (Batch-Reduced General Matrix Multiply) micro-kernels as an alternative compute path for BF16 MatMul on Intel Xeon platforms with AMX or AVX-512 BF16 support.

What

When enabled via the GEMMA_ONEDNN_BRGEMM compile-time flag, BF16×BF16 MatMul operations are dispatched to JIT-compiled BRGeMM kernels instead of the Highway SIMD path. This targets Gemma model workloads (FFW projections, attention) on Intel Xeon Scalable (SPR/EMR) processors. Support has been added to both the CMake and Bazel build systems.

How to Enable

# CMake
cmake -DGEMMA_ONEDNN_BRGEMM=ON ..

# Bazel
bazel build --define gemma_onednn_brgemm=1 ...

Runtime Fallback

When GEMMA_ONEDNN_BRGEMM is enabled at compile time, the BRGeMM path activates for BF16×BF16 operations whose dimensions meet AMX tile constraints (M, N, K ≥ 32 and K % 32 == 0). All other cases — non-BF16 types, smaller or non-aligned dimensions, mixed precision — fall through to the standard Highway SIMD MatMul path automatically.
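
The fallback rule above can be sketched as a standalone predicate. This is a hypothetical simplification for illustration; the real UseOneDnnBrgemm() in ops/brgemm.h may take different parameters and check additional conditions (e.g. CPU feature detection):

```cpp
#include <cstddef>

// Hypothetical sketch of the BRGeMM eligibility check described above.
// is_bf16: both operands are BF16; m, n, k: MatMul dimensions.
bool UseOneDnnBrgemmSketch(bool is_bf16, size_t m, size_t n, size_t k) {
  if (!is_bf16) return false;                    // non-BF16 / mixed precision
  if (m < 32 || n < 32 || k < 32) return false;  // AMX tile minimums
  if (k % 32 != 0) return false;                 // K must be a multiple of 32
  return true;  // eligible for the JIT BRGeMM path
}
```

Anything returning false here falls through to the Highway SIMD MatMul with no behavioral change.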

Changes

| File | Description |
| --- | --- |
| ops/brgemm.h | Types, caches, thread-local buffers, UseOneDnnBrgemm(), autotuning candidates |
| ops/brgemm-inl.h | DoMatMul_BRGeMM(): kernel JIT/caching, B-packing with hugepages, tiled parallel execution |
| ops/matmul-inl.h | BRGeMM dispatch block in MatMul() guarded by #if GEMMA_ONEDNN_BRGEMM |
| ops/matmul.h | #include "ops/brgemm.h", brgemm_autotune field in MMPerKey |
| ops/bench_matmul.cc | Check brgemm_autotune.Best() to avoid an infinite loop when BRGeMM handles dispatch |
| CMakeLists.txt | GEMMA_ONEDNN_BRGEMM option, FetchContent for OneDNN v3.11, conditional target linking |
| BUILD.bazel | config_setting for gemma_onednn_brgemm, conditional OneDNN dep and defines for x86_64 |
| MODULE.bazel | OneDNN v3.11 http_archive dependency |
| bazel/onednn.BUILD | Bazel build rules for OneDNN |
| util/zones.h | kBRGeMM caller enum for thread pool dispatch |
| util/zones.cc | CallerName mapping for kBRGeMM |
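
The shape of the guarded dispatch in ops/matmul-inl.h can be sketched as follows. Only the GEMMA_ONEDNN_BRGEMM macro comes from this PR; the function names and string returns below are hypothetical stand-ins for the real kernels:

```cpp
#include <cstring>

// GEMMA_ONEDNN_BRGEMM mirrors the compile-time build flag; it defaults to
// off here, matching a build without -DGEMMA_ONEDNN_BRGEMM=ON.
#ifndef GEMMA_ONEDNN_BRGEMM
#define GEMMA_ONEDNN_BRGEMM 0
#endif

// Hypothetical stubs standing in for the real eligibility check and kernels.
bool UseOneDnnBrgemm() { return true; }
const char* DoMatMul_BRGeMM() { return "brgemm"; }    // JIT BRGeMM path
const char* DoMatMulHighway() { return "highway"; }   // Highway SIMD path

const char* MatMulDispatch() {
#if GEMMA_ONEDNN_BRGEMM
  // Compiled in only when the flag is set; ineligible shapes still fall
  // through to the Highway path below.
  if (UseOneDnnBrgemm()) return DoMatMul_BRGeMM();
#endif
  return DoMatMulHighway();
}
```

With the flag off, the preprocessor removes the BRGeMM branch entirely, which is why the PR has zero impact on non-x86 or non-OneDNN builds.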

Testing

  • matmul_test passes with and without GEMMA_ONEDNN_BRGEMM (all original test shapes, types, and correctness checks preserved)
  • bench_matmul runs successfully with BRGeMM enabled
  • No changes to existing tests; zero impact when OneDNN is not enabled or on non-x86 platforms

@google-cla

google-cla Bot commented Apr 28, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@bbhattar bbhattar force-pushed the feature/onednn-brgemm branch from 629b569 to e072d70 Compare April 28, 2026 22:19
