Fix: a5 AICore SIMT launch - set localMemorySize + inject SIMT TLVs #764
Open
ChaoZheng109 wants to merge 1 commit into
Code Review
This pull request implements SIMT metadata TLV injection for AICore kernels to support the legacy launch path. Key changes include defining TLV structures and metadata enums in kernel.cpp, disabling automatic metadata generation via compiler flags in CMakeLists.txt, and setting the local memory size in the host-side device runner. Review feedback recommends renaming internal structures to avoid reserved identifier conflicts and using macros to dynamically generate section names for better maintainability.
Two coupled fixes for the legacy rtKernelLaunchWithHandleV2 path on a5:

1. cfg.localMemorySize was left at 0, so runtime allocated no AICore local memory and SIMT execution failed. Define PLATFORM_AICORE_LOCAL_MEMORY_SIZE (216 KB) and pass it through rtTaskCfgInfo_t. The 216 KB ceiling pairs with the 8 KB advertised below to land exactly on RT_SIMT_REMAIN_UB_SIZE (224 KB = 256 KB UB − 32 KB dcache); runtime's check is strict > so equality is accepted.

2. Runtime reads two TLV records (COMPILER_ALLOC_UB_SIZE / type=7 and AIV_TYPE_FLAG / type=12) from the kernel ELF's `.ascend.meta.<func>` section to populate Kernel::shareMemSize_ and Kernel::kernelVfType_. bisheng only emits these when it can statically infer SIMT use; our SU-dispatcher entry can't be tagged automatically. Inject a hand-written meta record for the AIV variant (ub_size=8192, aiv_type=SIMT_VF_ONLY) and disable bisheng's auto-emission with `-mllvm -cce-dyn-kernel-stack-size=false` so the runtime parser, which keys kernelInfoMap by section name and overwrites instead of merging, doesn't shadow our values with NO_VF / shareMemSize=0.
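The size arithmetic in fix 1 can be checked at compile time. The following is a minimal sketch, not the runtime's code: `fits_simt_budget`, `UB_SIZE_BYTES`, and `DCACHE_SIZE_BYTES` are hypothetical names, and the assumption that the runtime compares the sum of local memory and advertised share memory against `RT_SIMT_REMAIN_UB_SIZE` is inferred from the commit message only.

```cpp
#include <cstdint>

constexpr uint32_t KB = 1024;

// Sizes quoted in the commit message. The first two names are hypothetical.
constexpr uint32_t UB_SIZE_BYTES = 256 * KB;
constexpr uint32_t DCACHE_SIZE_BYTES = 32 * KB;
constexpr uint32_t RT_SIMT_REMAIN_UB_SIZE = UB_SIZE_BYTES - DCACHE_SIZE_BYTES;
constexpr uint32_t PLATFORM_AICORE_LOCAL_MEMORY_SIZE = 216 * KB;
constexpr uint32_t PLATFORM_AICORE_SHARE_MEM_SIZE = 8 * KB;

// ASSUMPTION: stand-in for the runtime check. The launch is rejected only
// when the total strictly exceeds the remaining UB, so equality passes.
constexpr bool fits_simt_budget(uint32_t local_mem, uint32_t share_mem) {
    return !(local_mem + share_mem > RT_SIMT_REMAIN_UB_SIZE);
}

static_assert(RT_SIMT_REMAIN_UB_SIZE == 224 * KB, "256 KB UB minus 32 KB dcache");
static_assert(fits_simt_budget(PLATFORM_AICORE_LOCAL_MEMORY_SIZE,
                               PLATFORM_AICORE_SHARE_MEM_SIZE),
              "216 KB + 8 KB lands exactly on the limit and is accepted");
static_assert(!fits_simt_budget(PLATFORM_AICORE_LOCAL_MEMORY_SIZE + 1,
                                PLATFORM_AICORE_SHARE_MEM_SIZE),
              "one byte more would be rejected");
```

Keeping the check as `static_assert` means a later change to either constant that breaks the 224 KB budget would fail the build rather than the launch.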
3f848f9 to e7a9644
Summary
Two coupled fixes for the legacy `rtKernelLaunchWithHandleV2` path on a5 AICore. Without both, `rtKernelLaunchWithHandleV2` either allocates no local memory for the kernel or refuses the launch with `ACL_ERROR_RT_PARAM_INVALID (107000)`.

1. Set `cfg.localMemorySize`

`cfg.localMemorySize` was left at `0` in `launch_aicore_kernel`, so the runtime reserved no AICore local memory and SIMT execution failed. Introduce `PLATFORM_AICORE_LOCAL_MEMORY_SIZE = 216 KB` and pass it through `rtTaskCfgInfo_t`. The 216 KB ceiling pairs with the 8 KB advertised by the TLV record (fix #2) to land exactly on `RT_SIMT_REMAIN_UB_SIZE` (224 KB = 256 KB UB − 32 KB dcache); runtime's check is strict `>` so equality is accepted.

2. Inject SIMT TLVs into the AICore ELF
Runtime reads two TLV records from `.ascend.meta.<funcname>` at register time:

- `RT_FUNCTION_TYPE_COMPILER_ALLOC_UB_SIZE` (type=7) → `Kernel::shareMemSize_`
- `RT_FUNCTION_TYPE_AIV_TYPE_FLAG` (type=12) → `Kernel::kernelVfType_`

bisheng only emits these when it can statically infer the kernel uses SIMT intrinsics. Our SU-dispatcher entry doesn't satisfy that: vector ops live in task `.o` files invoked through `aicore_execute`. Fix in two layers:

- Inject a hand-written meta record with `ub_size = PLATFORM_AICORE_SHARE_MEM_SIZE` (8 KB), `aiv_type = SIMD_SIMT_MIX_VF`. The dispatcher routes task `.o` files containing both SIMD and SIMT vector kernels, so MIX_VF avoids runtime's per-type restrictions.
- Add `-mllvm -cce-dyn-kernel-stack-size=false` to the AICore build flags. Without it, bisheng auto-emits a sibling section with the same name, which runtime's parser (kernelInfoMap keyed by section name) overwrites instead of merging, so the auto-emitted `NO_VF / shareMemSize=0` would shadow our values.

TLV type IDs `7`/`12` mirror `rtFunctionMetaType` in `runtime/runtime/elf_base.h`; `AIVType` values are not exposed in any CANN C/C++ header (only in `ascendc_identify_meta_section_info.py`). Both are documented inline in `simt_meta.h` for traceability.

Files
- `src/a5/platform/include/common/platform_config.h`: new `PLATFORM_AICORE_LOCAL_MEMORY_SIZE` and `PLATFORM_AICORE_SHARE_MEM_SIZE` constants
- `src/a5/platform/onboard/host/device_runner.cpp`: set `cfg.localMemorySize` in `launch_aicore_kernel`
- `src/a5/platform/onboard/aicore/CMakeLists.txt`: add `-mllvm -cce-dyn-kernel-stack-size=false`
- `src/a5/platform/onboard/aicore/simt_meta.h`: TLV struct/enum definitions
- `src/a5/platform/onboard/aicore/kernel.cpp`: hand-written SIMT TLV record using the extracted types

+107 lines across 5 files, no deletions.
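For orientation, the hand-written record described under fix 2 could look roughly like the sketch below. This is not the content of `simt_meta.h`: the struct and function names are hypothetical, and the 16-bit type and length fields are an assumption (the PR text only states type IDs 7 and 12, a 4-byte payload, and the values 8192 and 4). Placing the record into the `.ascend.meta.<funcname>` section is toolchain-specific and is left out.

```cpp
#include <cstdint>

// Type IDs quoted in the PR text (said to mirror rtFunctionMetaType).
constexpr uint16_t META_TYPE_COMPILER_ALLOC_UB_SIZE = 7;
constexpr uint16_t META_TYPE_AIV_TYPE_FLAG = 12;

// Only the value reported in the test plan (aiv_type=4) is listed here;
// the other AIVType values are not given in the PR text.
constexpr uint32_t AIV_TYPE_SIMD_SIMT_MIX_VF = 4;

// ASSUMPTION: 16-bit type, 16-bit length, then a 32-bit payload.
struct SimtTlvU32 {
    uint16_t type;
    uint16_t length;  // payload size in bytes
    uint32_t value;
};
static_assert(sizeof(SimtTlvU32) == 8, "TLV record must have no padding");

// Hypothetical container for the two records the runtime looks for.
struct SimtMetaRecord {
    SimtTlvU32 ub_size;
    SimtTlvU32 aiv_type;
};

// Build both TLVs from the share-memory size and the AIV type flag.
constexpr SimtMetaRecord make_simt_meta(uint32_t ub_bytes, uint32_t aiv_type) {
    return SimtMetaRecord{
        {META_TYPE_COMPILER_ALLOC_UB_SIZE, sizeof(uint32_t), ub_bytes},
        {META_TYPE_AIV_TYPE_FLAG, sizeof(uint32_t), aiv_type},
    };
}
```

A call such as `make_simt_meta(8 * 1024, AIV_TYPE_SIMD_SIMT_MIX_VF)` yields the two entries the test plan expects to see in the section dump.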
Test plan
- `bisheng-readelf -S build/lib/<…>/aicore_kernel.o` shows exactly one `.ascend.meta.aicore_kernel_0_mix_aiv` section (no shadow).
- The section holds `[type=7, len=4, ub_size=8192]` and `[type=12, len=4, aiv_type=4]`.
- … (`examples/`) launches without `ACL_ERROR_RT_PARAM_INVALID (107000)` from runtime's `CheckAndGetTotalShareMemorySize`.
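The second check can be scripted against a raw dump of the section bytes. The sketch below is a host-side helper under stated assumptions, not part of the PR: it assumes little-endian 16-bit type and length fields followed by the payload, that the two records appear in the order listed in the test plan, and all names (`DecodedTlv`, `decode_meta_section`, `matches_test_plan`) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct DecodedTlv {
    uint16_t type;
    uint16_t length;
    uint32_t value;  // only meaningful when length == 4
};

// Read an n-byte little-endian unsigned integer (n <= 4).
inline uint32_t read_le(const uint8_t* p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; ++i) {
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    }
    return v;
}

// Walk the section bytes and collect TLVs; stop at the first truncated record.
inline std::vector<DecodedTlv> decode_meta_section(const std::vector<uint8_t>& bytes) {
    std::vector<DecodedTlv> out;
    size_t off = 0;
    while (bytes.size() - off >= 4) {
        const uint16_t type = static_cast<uint16_t>(read_le(bytes.data() + off, 2));
        const uint16_t length = static_cast<uint16_t>(read_le(bytes.data() + off + 2, 2));
        off += 4;
        if (length > bytes.size() - off) break;  // payload runs past the end
        const uint32_t value = (length == 4) ? read_le(bytes.data() + off, 4) : 0;
        out.push_back(DecodedTlv{type, length, value});
        off += length;
    }
    return out;
}

// True when the dump holds exactly the two records named in the test plan.
inline bool matches_test_plan(const std::vector<DecodedTlv>& tlvs) {
    return tlvs.size() == 2 &&
           tlvs[0].type == 7 && tlvs[0].length == 4 && tlvs[0].value == 8192 &&
           tlvs[1].type == 12 && tlvs[1].length == 4 && tlvs[1].value == 4;
}
```

If the real field widths or byte order differ from these assumptions, only `read_le` and the header size in `decode_meta_section` need to change.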