
Fix: a5 AICore SIMT launch — set localMemorySize + inject SIMT TLVs #764

Open
ChaoZheng109 wants to merge 1 commit into hw-native-sys:main from ChaoZheng109:fix-a5-aicore-simt-tlv


ChaoZheng109 (Collaborator) commented May 13, 2026

Summary

Two coupled fixes for the legacy rtKernelLaunchWithHandleV2 path on a5 AICore. Without both, rtKernelLaunchWithHandleV2 either allocates no local memory for the kernel or refuses the launch with ACL_ERROR_RT_PARAM_INVALID (107000).

1. Set cfg.localMemorySize

cfg.localMemorySize was left at 0 in launch_aicore_kernel, so the runtime reserved no AICore local memory and SIMT execution failed. Introduce PLATFORM_AICORE_LOCAL_MEMORY_SIZE = 216 KB and pass it through rtTaskCfgInfo_t. Together with the 8 KB share-memory size advertised by the TLV record (fix #2), the 216 KB ceiling sums to exactly RT_SIMT_REMAIN_UB_SIZE (224 KB = 256 KB UB − 32 KB dcache). The runtime's check rejects only a strictly greater total (>), so this exact fit is accepted.
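The budget arithmetic above can be checked at compile time. The sketch below restates the PR's numbers as C++ constants (local names chosen here; the real platform_config.h may spell them as macros) and models the runtime's strict comparison with a hypothetical `budget_accepted` helper. It is not the runtime's actual CheckAndGetTotalShareMemorySize.

```cpp
#include <cassert>
#include <cstdint>

// Numbers from this PR; 1 KB = 1024 bytes. Names are illustrative.
constexpr uint32_t kKB = 1024;
constexpr uint32_t kUbSize = 256 * kKB;                        // total AICore UB
constexpr uint32_t kDcacheSize = 32 * kKB;                     // reserved for dcache
constexpr uint32_t kSimtRemainUbSize = kUbSize - kDcacheSize;  // RT_SIMT_REMAIN_UB_SIZE

constexpr uint32_t kLocalMemorySize = 216 * kKB;  // PLATFORM_AICORE_LOCAL_MEMORY_SIZE
constexpr uint32_t kShareMemSize = 8 * kKB;       // PLATFORM_AICORE_SHARE_MEM_SIZE (TLV type=7)

// Hypothetical model of the runtime check: reject only when the total
// strictly exceeds the remaining UB, so an exact fit passes.
constexpr bool budget_accepted(uint32_t local, uint32_t share) {
    return !(local + share > kSimtRemainUbSize);
}

static_assert(kSimtRemainUbSize == 224 * kKB, "224 KB = 256 KB - 32 KB");
static_assert(kLocalMemorySize + kShareMemSize == kSimtRemainUbSize, "exact fit");
static_assert(budget_accepted(kLocalMemorySize, kShareMemSize), "equality accepted");
static_assert(!budget_accepted(kLocalMemorySize + 1, kShareMemSize), "one byte over rejected");
```

Because the fit is exact, raising either constant by even one byte would flip the launch back to a parameter error under this model.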

2. Inject SIMT TLVs into the AICore ELF

Runtime reads two TLV records from .ascend.meta.<funcname> at register time:

  • RT_FUNCTION_TYPE_COMPILER_ALLOC_UB_SIZE (type=7) → Kernel::shareMemSize_
  • RT_FUNCTION_TYPE_AIV_TYPE_FLAG (type=12) → Kernel::kernelVfType_
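A minimal model of what the runtime does with these two records, under stated assumptions: each TLV is assumed to be a 16-bit type, a 16-bit length, and `len` value bytes in host byte order, and `KernelMeta` is a hypothetical stand-in for the two `Kernel` members. Only the type IDs 7 and 12 come from this PR; the parser and the `append_tlv_u32` encoder are illustrative, not runtime code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// TLV type IDs as cited in this PR (mirroring rtFunctionMetaType).
enum : uint16_t {
    kTlvCompilerAllocUbSize = 7,  // RT_FUNCTION_TYPE_COMPILER_ALLOC_UB_SIZE
    kTlvAivTypeFlag = 12,         // RT_FUNCTION_TYPE_AIV_TYPE_FLAG
};

// Hypothetical stand-in for the fields the runtime populates.
struct KernelMeta {
    uint32_t share_mem_size = 0;  // models Kernel::shareMemSize_
    uint32_t kernel_vf_type = 0;  // models Kernel::kernelVfType_
};

// Walk a .ascend.meta.<func> payload (ASSUMED layout: u16 type, u16 len,
// then len bytes). Unknown types are skipped; truncation returns false.
inline bool parse_meta(const uint8_t* data, size_t size, KernelMeta& out) {
    size_t off = 0;
    while (off + 4 <= size) {
        uint16_t type, len;
        std::memcpy(&type, data + off, 2);
        std::memcpy(&len, data + off + 2, 2);
        off += 4;
        if (off + len > size) return false;  // record runs past the section
        if (len == 4 && (type == kTlvCompilerAllocUbSize || type == kTlvAivTypeFlag)) {
            uint32_t value;
            std::memcpy(&value, data + off, 4);
            if (type == kTlvCompilerAllocUbSize) out.share_mem_size = value;
            else out.kernel_vf_type = value;
        }
        off += len;
    }
    return off == size;  // trailing partial header counts as malformed
}

// Illustrative encoder producing one 8-byte record in the assumed layout.
inline void append_tlv_u32(std::vector<uint8_t>& buf, uint16_t type, uint32_t value) {
    const uint16_t len = sizeof(value);
    uint8_t rec[8];
    std::memcpy(rec, &type, 2);
    std::memcpy(rec + 2, &len, 2);
    std::memcpy(rec + 4, &value, 4);
    buf.insert(buf.end(), rec, rec + 8);
}
```

Feeding it the two records the test plan expects ([type=7, len=4, 8192] then [type=12, len=4, 4]) fills share_mem_size with 8192 and kernel_vf_type with 4; a truncated payload makes `parse_meta` return false.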

bisheng only emits these when it can statically infer that the kernel uses SIMT intrinsics. Our SU-dispatcher entry doesn't meet that condition: its vector ops live in task .o files invoked through aicore_execute. The fix therefore has two layers:

  • Hand-write a meta record for the AIV variant: ub_size = PLATFORM_AICORE_SHARE_MEM_SIZE (8 KB), aiv_type = SIMD_SIMT_MIX_VF. The dispatcher routes task .o files containing both SIMD and SIMT vector kernels, so MIX_VF avoids runtime's per-type restrictions.
  • Add -mllvm -cce-dyn-kernel-stack-size=false to the AICore build flags. Without it, bisheng auto-emits a sibling section with the same name. Runtime's parser keys kernelInfoMap by section name and overwrites entries instead of merging them, so the auto-emitted NO_VF / shareMemSize=0 values would shadow ours.
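The two layers can be sketched together. The first half places a hand-written record in a section whose name is derived from the kernel name by macro, in the spirit of the review suggestion; the record layout, the macro names, and the `__ELF__` guard are assumptions, not the contents of simt_meta.h or kernel.cpp. The second half is a hypothetical model of the runtime's kernelInfoMap, showing why a second section with the same name replaces the first instead of merging with it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers: build ".ascend.meta.<func>" from the kernel name.
#define META_STR_(x) #x
#define META_STR(x) META_STR_(x)
#define ASCEND_META_SECTION(func) ".ascend.meta." META_STR(func)

// Arbitrary dotted section names are an ELF feature; no-op elsewhere.
#if defined(__ELF__)
#define IN_SECTION(name) __attribute__((section(name), used))
#else
#define IN_SECTION(name)
#endif

// ASSUMED record layout: u16 type, u16 len, u32 value.
struct TlvU32 {
    uint16_t type;
    uint16_t len;
    uint32_t value;
};
static_assert(sizeof(TlvU32) == 8, "no padding expected");

struct SimtMetaRecord {
    TlvU32 ub_size;   // type 7: share memory size in bytes
    TlvU32 aiv_type;  // type 12: AIV type flag (4, as in the test plan)
};

IN_SECTION(ASCEND_META_SECTION(aicore_kernel_0_mix_aiv))
const SimtMetaRecord kSimtMeta = {{7, 4, 8 * 1024}, {12, 4, 4}};

// Hypothetical model of the runtime parser: one map entry per section
// name; operator[] assignment overwrites, so the last section parsed wins.
struct KernelInfo {
    uint32_t share_mem_size;
    uint32_t vf_type;
};

inline std::map<std::string, KernelInfo> register_sections(
        const std::vector<std::pair<std::string, KernelInfo>>& sections) {
    std::map<std::string, KernelInfo> kernel_info_map;
    for (const auto& s : sections) {
        kernel_info_map[s.first] = s.second;  // overwrite, never merge
    }
    return kernel_info_map;
}
```

Registering our section and then an auto-emitted one under the same name leaves a single map entry holding the later values (shareMemSize=0 in this model), which is the shadowing the compiler flag prevents.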

TLV type IDs 7 / 12 mirror rtFunctionMetaType in runtime/runtime/elf_base.h; AIVType values are not exposed in any CANN C/C++ header (only in ascendc_identify_meta_section_info.py). Both are documented inline in simt_meta.h for traceability.

Files

  • src/a5/platform/include/common/platform_config.h — new PLATFORM_AICORE_LOCAL_MEMORY_SIZE and PLATFORM_AICORE_SHARE_MEM_SIZE constants
  • src/a5/platform/onboard/host/device_runner.cpp — set cfg.localMemorySize in launch_aicore_kernel
  • src/a5/platform/onboard/aicore/CMakeLists.txt — add -mllvm -cce-dyn-kernel-stack-size=false
  • src/a5/platform/onboard/aicore/simt_meta.h — TLV struct/enum definitions
  • src/a5/platform/onboard/aicore/kernel.cpp — hand-written SIMT TLV record using the extracted types

+107 lines across 5 files, no deletions.

Test plan

  • AICore object builds with the new bisheng flag without errors.
  • bisheng-readelf -S build/lib/<…>/aicore_kernel.o shows exactly one .ascend.meta.aicore_kernel_0_mix_aiv section (no shadow).
  • TLV bytes dump to [type=7, len=4, ub_size=8192] and [type=12, len=4, aiv_type=4].
  • An a5 onboard example (e.g. a small example under examples/) launches without ACL_ERROR_RT_PARAM_INVALID (107000) from runtime's CheckAndGetTotalShareMemorySize.

gemini-code-assist bot left a comment

Code Review

This pull request implements SIMT metadata TLV injection for AICore kernels to support the legacy launch path. Key changes include defining TLV structures and metadata enums in kernel.cpp, disabling automatic metadata generation via compiler flags in CMakeLists.txt, and setting the local memory size in the host-side device runner. Review feedback recommends renaming internal structures to avoid reserved identifier conflicts and using macros to dynamically generate section names for better maintainability.

Two review threads on src/a5/platform/onboard/aicore/kernel.cpp (outdated)
Two coupled fixes for the legacy rtKernelLaunchWithHandleV2 path on a5:

1. cfg.localMemorySize was left at 0, so runtime allocated no AICore
   local memory and SIMT execution failed. Define
   PLATFORM_AICORE_LOCAL_MEMORY_SIZE (216 KB) and pass it through
   rtTaskCfgInfo_t. The 216 KB ceiling pairs with the 8 KB advertised
   below to land exactly on RT_SIMT_REMAIN_UB_SIZE (224 KB = 256 KB UB
   − 32 KB dcache); runtime's check is strict > so equality is accepted.

2. Runtime reads two TLV records (COMPILER_ALLOC_UB_SIZE / type=7 and
   AIV_TYPE_FLAG / type=12) from the kernel ELF's `.ascend.meta.<func>`
   section to populate Kernel::shareMemSize_ and Kernel::kernelVfType_.
   bisheng only emits these when it can statically infer SIMT use; our
   SU-dispatcher entry can't be tagged automatically. Inject a
   hand-written meta record for the AIV variant (ub_size=8192,
   aiv_type=SIMT_VF_ONLY) and disable bisheng's auto-emission with
   `-mllvm -cce-dyn-kernel-stack-size=false` so the runtime parser,
   which keys kernelInfoMap by section name and overwrites instead of
   merging, doesn't shadow our values with NO_VF / shareMemSize=0.
ChaoZheng109 force-pushed the fix-a5-aicore-simt-tlv branch from 3f848f9 to e7a9644 on May 13, 2026 02:58