[DRAFT] Optimize the Lightning Indexer Triton kernels #624

Draft
leonling-ll wants to merge 1 commit into ROCm:zain/lightning-indexer from leonling-ll:triton_indexer_opt



leonling-ll commented Jun 12, 2026


This is a DRAFT PR that aims to show some reference optimizations for the Lightning Indexer Triton kernels proposed in #606.

The main changes are in `transformer_engine/jax/triton_extensions/indexer.py`.
The remaining files under `standalone_indexer` are benchmark scripts that mirror the same-named files in `benchmarks`, with the TE and JAX dependencies removed for ease of testing.

The performance before and after this change (shapes following the benchmark scripts):

| Kernel | Shape (B, oH, T, S, H, d_i) | Baseline | Optimized | Speedup |
| --- | --- | --- | --- | --- |
| `_score_reduce_kernel` | 2, 64, 4096, 4096, 64, 128 | 49.1 ms / 722 TF | 39.22 ms / 904 TF | 1.25x |
| `_score_dscores_chunk_kernel` | 2, 64, 1024, 1024, 64, 128 | 12.1 ms / 183 TF | 9.45 ms / 235 TF | 1.28x |
| `_score_topk_kernel`* | 2, 64, 1024, 1024, 64, 128, k=512 | 8.2 ms / 271 TF | 5.14 ms / 431 TF | 1.59x |
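
As a quick sanity check on the Speedup column, the ratio of baseline to optimized latency can be recomputed from the reported numbers. The sketch below is illustrative only: the `latencies_ms` dictionary and the `speedup` helper are hypothetical names, not part of this PR. Because the latencies in the table are already rounded, the last digit of a recomputed ratio may differ slightly from the table.

```python
# Latencies in milliseconds as reported in the table: (baseline, optimized).
# The names used here are hypothetical and exist only for this illustration.
latencies_ms = {
    "_score_reduce_kernel": (49.1, 39.22),
    "_score_dscores_chunk_kernel": (12.1, 9.45),
    "_score_topk_kernel": (8.2, 5.14),
}


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


speedups = {name: speedup(b, o) for name, (b, o) in latencies_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```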

@leonling-ll leonling-ll marked this pull request as draft June 12, 2026 14:16
@leonling-ll leonling-ll requested a review from Micky774 June 12, 2026 14:21
