Add: a5 chevron launch path for AICore SIMT validation #734
Open
ChaoZheng109 wants to merge 2 commits into
Conversation
Code Review
This pull request introduces a new constant PLATFORM_AICORE_LOCAL_MEMORY_SIZE (224 KB) in platform_config.h and updates the DeviceRunner::launch_aicore_kernel function in device_runner.cpp to use this constant when configuring the AICore task's local memory size. I have no feedback to provide.
cfg.localMemorySize was left unset in launch_aicore_kernel, leaving the field at zero so the runtime allocated no local memory for the AICore task and SIMT execution failed. Define PLATFORM_AICORE_LOCAL_MEMORY_SIZE (224 KB) in platform_config.h and pass it through rtTaskCfgInfo_t so every a5 AICore launch reserves the required local memory.
Introduce a parallel AICore launch path on a5 that uses the bisheng
chevron syntax (`kernel<<<numBlocks, dynamic_shmem_sz, stream>>>`)
instead of rtKernelLaunchWithHandleV2. The chevron form is required
for SIMT kernels because it lowers to LaunchAscendKernel, which in
turn programs the AICore local memory window via the SIMT-specific
runtime path.
- chevron_launch.cpp: __global__ __aicore__ entry that delegates to
the existing runtime aicore_execute() handshake loop, plus an
extern "C" host wrapper that calls
`aicore_chevron_entry<<<blockDim, PLATFORM_AICORE_LOCAL_MEMORY_SIZE,
stream>>>(runtime)`. Compiled with `bisheng --asc-aicore-lang
--npu-arch=dav-c310` into a host-linkable .o whose .ascend.kernel.*
section embeds the AICore ELF.
- host/CMakeLists.txt: add a custom command that runs bisheng on
chevron_launch.cpp and links the resulting object into
host_runtime.so.
- device_runner.cpp: env-gated branch in launch_aicore_kernel —
`SIMPLER_USE_CHEVRON_LAUNCH=1` selects the chevron path; default
behavior unchanged.
Refactor aicore_execute into a header so chevron_launch.cpp can link
cleanly. The chevron mix kernel needs `aicore_execute` visible in the
same TU it is called from — the device-side linker (ld.lld inside
bisheng) cannot resolve .cube/.vector references across separately
compiled TUs, and bisheng `-c -o file.o` rejects multiple inputs. Move
the body from runtime/{tensormap_and_ringbuffer,host_build_graph}/
aicore/aicore_executor.cpp into a new aicore_executor.h with
`inline __aicore__` linkage. Both the legacy AICore kernel.cpp and
chevron_launch.cpp #include the header and emit their own
instantiation; the host SO uses one launch path at a time, so the
device-side duplication is benign. build_config.py exposes each
runtime's aicore/ dir to the aicore and host targets so both
compilers find the header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two commits that together unblock SIMT execution on a5:
- Define `PLATFORM_AICORE_LOCAL_MEMORY_SIZE = 229376` (224 KB) and pass it through `rtTaskCfgInfo_t::localMemorySize` in the existing `rtKernelLaunchWithHandleV2` path.
- Add an env-gated chevron launch path (`kernel<<<numBlocks, dynamic_shmem_sz, stream>>>`). The kernel body delegates to the same `aicore_execute()` handshake loop so AICPU keeps dispatching tasks via registers. Enabled with `SIMPLER_USE_CHEVRON_LAUNCH=1`; default behavior unchanged. The chevron form is required for SIMT because it lowers to LaunchAscendKernel, which programs the AICore local memory window through the SIMT-specific runtime path.

`aicore_execute` moved from `.cpp` to header

The chevron mix kernel needs `aicore_execute` visible in the same TU it is called from — `ld.lld` (inside bisheng) cannot resolve `.cube`/`.vector` references across separately compiled TUs, and `bisheng -c -o file.o` rejects multiple input sources. So:

- `src/a5/runtime/{tensormap_and_ringbuffer,host_build_graph}/aicore/aicore_executor.cpp` → renamed to `aicore_executor.h`, signature changed to `inline __aicore__`.
- `kernel.cpp` and the new `chevron_launch.cpp` both `#include "aicore_executor.h"` and emit their own instantiation. The host SO uses one launch path at a time, so the device-side duplication is benign.
- `build_config.py` exposes each runtime's `aicore/` directory to the aicore and host targets so both compilers find the header. No Python build-driver plumbing needed.

Testing

- `pip install --no-build-isolation .` produces `chevron_launch.o` and `host_runtime.so`
- `nm libhost_runtime.so | grep launch_aicore_chevron` shows the symbol
- `SIMPLER_USE_CHEVRON_LAUNCH=1` exercises the chevron path; log shows `using chevron (<<<>>>) launch path`