[TRITON/GLUON]: Add moe_a16w4 gluon(gfx1250) kernel#3277
Open
rahulbatra85 wants to merge 3 commits into
Contributor
@rahulbatra85 check out the gluon gfx1250 optimizations I added to the a8w4 kernel; they should also help in your case.
Contributor
Author
Ok, will update my code.
Contributor
One more thing: we can use TDM for the bias too. We just have to be careful with the waits in the epilogue if the load is issued beforehand. You can also take a look at how it's done in the a8w4 kernel.
Motivation
Adds a moe_a16w4 gluon kernel for gfx1250.
Technical Details
Adds the moe_a16w4 gluon kernel. The 4-bit weights are first scaled and upcast to bf16, and then a bf16 GEMM is performed. The kernel also uses gfx1250 TDM ops and pipelining.
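For reviewers, the dequantize-then-GEMM flow described above can be sketched as a plain Python reference. This is not the gluon kernel and uses no Triton or gluon APIs. The PR does not state the 4-bit format, the packing order, or the scale granularity, so the sketch assumes signed 4-bit integers packed two per byte (low nibble first) with one scale per group of `group_size` weights. The helper names `to_bf16`, `unpack_int4`, `dequantize`, and `a16w4_gemm` are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (returned as a float)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that bf16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def unpack_int4(packed: bytes) -> list:
    """Unpack two signed 4-bit values per byte (assumed order: low nibble first)."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dequantize(packed: bytes, scales: list, group_size: int) -> list:
    """Scale each 4-bit weight by its group's scale, then round to bf16."""
    values = unpack_int4(packed)
    return [to_bf16(v * scales[i // group_size]) for i, v in enumerate(values)]


def a16w4_gemm(a: list, w_packed: list, w_scales: list, group_size: int) -> list:
    """Reference GEMM: out[m][n] = sum_k a[m][k] * dequant(w)[n][k].

    a is a list of rows of length K (values assumed already bf16-representable);
    w_packed[n] holds the K packed weights for output column n.
    Accumulation uses Python floats; the kernel's accumulator type is not stated.
    """
    w = [dequantize(p, s, group_size) for p, s in zip(w_packed, w_scales)]
    return [[sum(x * y for x, y in zip(row, col)) for col in w] for row in a]


# One activation row (K = 4) against one output column.
a = [[1.0, 2.0, 3.0, 4.0]]
w_packed = [bytes([0x21, 0xF3])]   # unpacks to 1, 2, 3, -1
w_scales = [[0.5, 2.0]]            # one scale per group of 2 weights
print(a16w4_gemm(a, w_packed, w_scales, group_size=2))  # [[12.5]]
```

A reference of this shape is mainly useful for checking the kernel's output on small inputs in the unit tests; the real format and layout should be substituted for the assumptions above.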
Test Plan
Unit tests
Test Result
All tests should pass
Submission Checklist