
Add: a5 chevron launch path for AICore SIMT validation #734

Open

ChaoZheng109 wants to merge 2 commits into hw-native-sys:main from ChaoZheng109:fix-a5-aicore-local-memory-size


Conversation


ChaoZheng109 (Collaborator) commented May 11, 2026

Summary

Two commits that together unblock SIMT execution on a5:

  1. Fix: set AICore localMemorySize for a5 SIMT launch — define PLATFORM_AICORE_LOCAL_MEMORY_SIZE = 229376 (224 KB) and pass it through rtTaskCfgInfo_t::localMemorySize in the existing rtKernelLaunchWithHandleV2 path.
  2. Add: a5 chevron launch path for AICore SIMT validation — parallel launch path using the bisheng chevron syntax (kernel<<<numBlocks, dynamic_shmem_sz, stream>>>). The kernel body delegates to the same aicore_execute() handshake loop so AICPU keeps dispatching tasks via registers. Enabled with SIMPLER_USE_CHEVRON_LAUNCH=1; default behavior unchanged.

The chevron form is required for SIMT because it lowers to LaunchAscendKernel, which programs the AICore local memory window through the SIMT-specific runtime path.

aicore_execute moved from .cpp to header

The chevron mix kernel needs aicore_execute visible in the same TU it is called from — ld.lld (inside bisheng) cannot resolve .cube/.vector references across separately compiled TUs, and bisheng -c -o file.o rejects multiple input sources. So:

  • src/a5/runtime/{tensormap_and_ringbuffer,host_build_graph}/aicore/aicore_executor.cpp → renamed to aicore_executor.h, signature changed to inline __aicore__.
  • The legacy kernel.cpp and the new chevron_launch.cpp both #include "aicore_executor.h" and emit their own instantiation. The host SO uses one launch path at a time, so the device-side duplication is benign.
  • build_config.py exposes each runtime's aicore/ directory to the aicore and host targets so both compilers find the header. No Python build-driver plumbing needed.

Testing

  • Build on a5 hardware (CANN 9.x): `pip install --no-build-isolation .` produces chevron_launch.o and host_runtime.so
  • `nm libhost_runtime.so | grep launch_aicore_chevron` shows the symbol
  • The existing a5 hardware test still passes without the env var
  • The same test with `SIMPLER_USE_CHEVRON_LAUNCH=1` exercises the chevron path; the log shows `using chevron (<<<>>>) launch path`


gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new constant PLATFORM_AICORE_LOCAL_MEMORY_SIZE (224 KB) in platform_config.h and updates the DeviceRunner::launch_aicore_kernel function in device_runner.cpp to utilize this constant when configuring the AICore task's local memory size. I have no feedback to provide.

cfg.localMemorySize was left unset in launch_aicore_kernel, leaving the
field at zero so the runtime allocated no local memory for the AICore
task and SIMT execution failed. Define
PLATFORM_AICORE_LOCAL_MEMORY_SIZE (224 KB) in platform_config.h and pass
it through rtTaskCfgInfo_t so every a5 AICore launch reserves the
required local memory.
ChaoZheng109 force-pushed the fix-a5-aicore-local-memory-size branch from 6eeb806 to 0e39aa9 on May 11, 2026 at 07:30
ChaoZheng109 changed the title from "Fix: set AICore localMemorySize for a5 SIMT launch" to "Add: a5 chevron launch path for AICore SIMT validation" on May 11, 2026
ChaoZheng109 force-pushed the fix-a5-aicore-local-memory-size branch 9 times, most recently from a5316aa to 7724cb5 on May 11, 2026 at 11:11
Introduce a parallel AICore launch path on a5 that uses the bisheng
chevron syntax (`kernel<<<numBlocks, dynamic_shmem_sz, stream>>>`)
instead of rtKernelLaunchWithHandleV2. The chevron form is required
for SIMT kernels because it lowers to LaunchAscendKernel, which in
turn programs the AICore local memory window via the SIMT-specific
runtime path.

- chevron_launch.cpp: __global__ __aicore__ entry that delegates to
  the existing runtime aicore_execute() handshake loop, plus an
  extern "C" host wrapper that calls
  `aicore_chevron_entry<<<blockDim, PLATFORM_AICORE_LOCAL_MEMORY_SIZE,
  stream>>>(runtime)`. Compiled with `bisheng --asc-aicore-lang
  --npu-arch=dav-c310` into a host-linkable .o whose .ascend.kernel.*
  section embeds the AICore ELF.
- host/CMakeLists.txt: add a custom command that runs bisheng on
  chevron_launch.cpp and links the resulting object into
  host_runtime.so.
- device_runner.cpp: env-gated branch in launch_aicore_kernel —
  `SIMPLER_USE_CHEVRON_LAUNCH=1` selects the chevron path; default
  behavior unchanged.

Refactor aicore_execute into a header so chevron_launch.cpp can link
cleanly. The chevron mix kernel needs `aicore_execute` visible on the
same TU it is called from — the device-side linker (ld.lld inside
bisheng) cannot resolve .cube/.vector references across separately
compiled TUs, and bisheng `-c -o file.o` rejects multiple inputs. Move
the body from runtime/{tensormap_and_ringbuffer,host_build_graph}/
aicore/aicore_executor.cpp into a new aicore_executor.h with
`inline __aicore__` linkage. Both the legacy AICore kernel.cpp and
chevron_launch.cpp #include the header and emit their own
instantiation; the host SO uses one launch path at a time, so the
device-side duplication is benign. build_config.py exposes each
runtime's aicore/ dir to the aicore and host targets so both
compilers find the header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChaoZheng109 force-pushed the fix-a5-aicore-local-memory-size branch from 7724cb5 to 45d8d4c on May 11, 2026 at 11:37