OneDNN BRGeMM Micro-Kernel Integration for BF16 MatMul #903
Open
bbhattar wants to merge 5 commits into google:dev from
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Force-pushed from 629b569 to e072d70.
This PR integrates OneDNN BRGeMM (Batch-Reduced General Matrix Multiply) micro-kernels as an alternative compute path for BF16 MatMul on Intel Xeon platforms with AMX or AVX-512 BF16 support.
What
When enabled via the `GEMMA_ONEDNN_BRGEMM` compile-time flag, BF16×BF16 MatMul operations are dispatched to JIT-compiled BRGeMM kernels instead of the Highway SIMD path. This targets Gemma model workloads (FFW projections, attention) on Intel Xeon Scalable (SPR/EMR) processors. Support has been added to both the CMake and Bazel build systems.

How to Enable
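The PR text does not spell out exact build commands here. The following is a plausible sketch based on the `GEMMA_ONEDNN_BRGEMM` CMake option and the `gemma_onednn_brgemm` Bazel `config_setting` listed under Changes; the Bazel flag spelling and the target names are assumptions:

```sh
# CMake: GEMMA_ONEDNN_BRGEMM is the option declared in CMakeLists.txt.
cmake -B build -DGEMMA_ONEDNN_BRGEMM=ON
cmake --build build -j

# Bazel: BUILD.bazel adds a config_setting named gemma_onednn_brgemm;
# exactly how it is triggered (e.g. via --define) is an assumption here.
bazel build --define=gemma_onednn_brgemm=true //:gemma
```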
Runtime Fallback
When `GEMMA_ONEDNN_BRGEMM` is enabled at compile time, the BRGeMM path activates for BF16×BF16 operations whose dimensions meet AMX tile constraints (M, N, K ≥ 32 and K % 32 == 0). All other cases (non-BF16 types, smaller or non-aligned dimensions, mixed precision) fall through to the standard Highway SIMD MatMul path automatically.
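As a reading aid, here is a minimal sketch of the dispatch rule just described, assuming a free-function shape for the predicate. The real `UseOneDnnBrgemm()` in `ops/brgemm.h` may take different parameters; only the flag name, the BF16 requirement, and the M, N, K constraints come from this description:

```cpp
#include <cstddef>

// Hedged sketch of the dispatch predicate described above; the actual
// UseOneDnnBrgemm() declaration in ops/brgemm.h may differ.
bool UseOneDnnBrgemmSketch(bool a_is_bf16, bool b_is_bf16,
                           std::size_t M, std::size_t N, std::size_t K) {
#if GEMMA_ONEDNN_BRGEMM
  return a_is_bf16 && b_is_bf16 &&          // BF16 x BF16 only, no mixed precision
         M >= 32 && N >= 32 && K >= 32 &&   // AMX tile minimums
         K % 32 == 0;                       // K must be tile-aligned
#else
  (void)a_is_bf16; (void)b_is_bf16; (void)M; (void)N; (void)K;
  return false;  // flag off: always take the Highway SIMD path
#endif
}
```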
Changes

- `ops/brgemm.h`: `UseOneDnnBrgemm()`, autotuning candidates
- `ops/brgemm-inl.h`: `DoMatMul_BRGeMM()` with kernel JIT/caching, B-packing with hugepages, tiled parallel execution
- `ops/matmul-inl.h`: `MatMul()` guarded by `#if GEMMA_ONEDNN_BRGEMM` (see the sketch after this list)
- `ops/matmul.h`: `#include "ops/brgemm.h"`, `brgemm_autotune` field in `MMPerKey`
- `ops/bench_matmul.cc`: `brgemm_autotune.Best()` to avoid an infinite loop when BRGeMM handles dispatch
- `CMakeLists.txt`: `GEMMA_ONEDNN_BRGEMM` option, FetchContent for OneDNN v3.11, conditional target linking
- `BUILD.bazel`: `config_setting` for `gemma_onednn_brgemm`, conditional OneDNN dep and defines for x86_64
- `MODULE.bazel`: `http_archive` dependency
- `bazel/onednn.BUILD`
- `util/zones.h`: `kBRGeMM` caller enum for thread pool dispatch
- `util/zones.cc`: `CallerName` mapping for `kBRGeMM`
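The guard in `ops/matmul-inl.h` is not shown in this description; the sketch below illustrates only the compile-time guard pattern the list above implies. The `Mat` type and the stub bodies are placeholders invented for illustration, not gemma.cpp's actual API:

```cpp
#include <cstdio>

// Placeholders standing in for gemma.cpp internals (illustration only).
struct Mat {};
static bool UseOneDnnBrgemm(const Mat&, const Mat&) { return true; }  // stub
static void DoMatMul_BRGeMM(const Mat&, const Mat&, Mat&) { std::puts("brgemm"); }
static void MatMulHighway(const Mat&, const Mat&, Mat&) { std::puts("highway"); }

// Sketch of the guard: when the flag is on and the operands qualify
// (BF16 x BF16, AMX-aligned shapes), dispatch to BRGeMM; otherwise fall
// through to the existing Highway SIMD implementation.
void MatMul(const Mat& A, const Mat& B, Mat& C) {
#if GEMMA_ONEDNN_BRGEMM
  if (UseOneDnnBrgemm(A, B)) {
    DoMatMul_BRGeMM(A, B, C);
    return;
  }
#endif
  MatMulHighway(A, B, C);
}
```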
Testing

- `matmul_test` passes with and without `GEMMA_ONEDNN_BRGEMM` (all original test shapes, types, and correctness checks preserved)
- `bench_matmul` runs successfully with BRGeMM enabled