[fea]: gfx1250 allreduce poc #3802
Open
TennyWang1223 wants to merge 7 commits into
Signed-off-by: HaonanWang98 <hwang@amd.com>
Split the gfx1250 (MI450) allreduce kernel and dispatch logic out of the shared custom_all_reduce.cuh into a self-contained compilation unit (custom_all_reduce_gfx1250.cuh/.cu) with its own JIT module (module_custom_all_reduce_gfx1250). This avoids the CK header incompatibility on gfx1250 and lets each architecture own its Signal struct size (gfx1250: kMaxBlocks=256; older architectures: kMaxBlocks=80).

Code for the older architectures is unchanged except for removing the gfx1250-specific code and reverting kMaxBlocks/kMaxBlocksLegacy to a single kMaxBlocks=80. The Python side selects the correct module at runtime based on gcnArchName.

The gfx1250 C++ API uses direct device pointers (no hipIpc) in preparation for a VMM-based IPC implementation (hipIpcOpenMemHandle is not available on gfx1250).

Co-Authored-By: Claude <noreply@anthropic.com>
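The runtime selection described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `module_custom_all_reduce_gfx1250` and the two kMaxBlocks values come from the commit message, while `DEFAULT_MODULE`, `base_arch`, and `select_allreduce_module` are hypothetical names chosen for the example.

```python
# Sketch only: helper names and DEFAULT_MODULE are assumptions, not taken from the PR.
GFX1250_MODULE = "module_custom_all_reduce_gfx1250"  # named in the commit message
DEFAULT_MODULE = "module_custom_all_reduce"          # assumed name of the shared module

# Per-module Signal sizing as stated in the commit message.
K_MAX_BLOCKS = {GFX1250_MODULE: 256, DEFAULT_MODULE: 80}


def base_arch(gcn_arch_name: str) -> str:
    """Drop feature flags such as ':sramecc+:xnack-' from a gcnArchName string."""
    return gcn_arch_name.split(":", 1)[0].strip().lower()


def select_allreduce_module(gcn_arch_name: str) -> str:
    """Pick the JIT module name for the given architecture string."""
    if base_arch(gcn_arch_name) == "gfx1250":
        return GFX1250_MODULE
    return DEFAULT_MODULE
```

On a ROCm build of PyTorch the architecture string would typically come from `torch.cuda.get_device_properties(0).gcnArchName`. Comparing only the part before the first colon keeps feature flags from affecting the match.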
Motivation
Proof-of-concept allreduce for gfx1250.
Technical Details
Naive implementation; no performance optimization yet.
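The description does not spell out the kernel, so the following is only a host-side reference for what a sum allreduce computes, written the naive way (every rank reads every peer's buffer). The function name `naive_allreduce` and the list-of-lists layout are assumptions made for illustration; nothing here reflects the actual device code in this PR.

```python
def naive_allreduce(rank_buffers):
    """Simulate a sum allreduce: each rank ends up with the element-wise sum.

    rank_buffers is a list with one equal-length list of numbers per rank.
    """
    if not rank_buffers:
        return []
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # Element-wise sum across ranks, with no chunking or overlap.
    reduced = [sum(buf[i] for buf in rank_buffers) for i in range(length)]
    # Every rank receives its own copy of the reduced result.
    return [list(reduced) for _ in rank_buffers]
```

A reference like this can serve as the expected value when checking device results for a given tensor-parallel size.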
Test Plan
Run the test script with TP=4 and TP=2.
Test Result
Passed.
Submission Checklist