CUDA: Fix ssm_scan_f32 data-races#24360

Merged
ORippler merged 3 commits into
ggml-org:masterfrom
ORippler:osimons/fix_ssm_scan_f32
Jun 10, 2026
@ORippler ORippler commented Jun 9, 2026

Collaborator

Overview

Add the required __syncthreads() calls to avoid data-races in ssm_scan_f32. Also remove unused smem from the kernel.

Additional information

Should supersede #23983, as it fixes the underlying issues: these are data-races, and commit 4fbecf7 applies to the HIP/MUSA backends as well. For more details on the races, refer to the individual commit messages.

Should resolve sporadic failures of CUDA CI such as https://github.com/ggml-org/llama.cpp/actions/runs/27192383880/job/80275487186?pr=24331 (verified this on a local DGX Spark)

Requirements

ORippler added 3 commits June 9, 2026 13:57
Could also double-buffer, but the alternative is to simply ensure all
threads have read smem* before writing to it again in the next loop
iteration.
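
The pattern described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ssm_scan_f32 code: the kernel `neighbor_sum`, the helper `check_neighbor_sum`, and the block size `kBlock` are hypothetical names chosen for the example. It shows a loop that reuses one shared-memory buffer and therefore needs a barrier after the reads as well as after the writes.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 128; // threads per block (illustrative value)

// Hypothetical kernel, NOT the real ssm_scan_f32: each loop iteration stages
// one tile in shared memory, then every thread reads a neighbour's slot.
__global__ void neighbor_sum(const float * src, float * dst, int n_tiles) {
    __shared__ float smem[kBlock];
    const int tid = threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n_tiles; ++t) {
        smem[tid] = src[t * kBlock + tid];
        __syncthreads(); // all writes of this tile land before anyone reads
        acc += smem[(tid + 1) % kBlock];
        __syncthreads(); // all reads finish before the next tile overwrites smem
    }
    dst[tid] = acc;
}

// Launches one block and returns the number of mismatches against a CPU
// reference, or -1 if a CUDA call fails.
int check_neighbor_sum(int n_tiles) {
    std::vector<float> h_src((size_t) n_tiles * kBlock), h_dst(kBlock);
    for (size_t i = 0; i < h_src.size(); ++i) h_src[i] = (float) (i % 7);

    float * d_src = nullptr;
    float * d_dst = nullptr;
    if (cudaMalloc(&d_src, h_src.size() * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_dst, h_dst.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(d_src);
        return -1;
    }
    cudaMemcpy(d_src, h_src.data(), h_src.size() * sizeof(float), cudaMemcpyHostToDevice);
    neighbor_sum<<<1, kBlock>>>(d_src, d_dst, n_tiles);
    // The blocking copy also waits for the kernel and surfaces its errors.
    cudaError_t err = cudaMemcpy(h_dst.data(), d_dst, h_dst.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int tid = 0; tid < kBlock; ++tid) {
        float ref = 0.0f; // small integer values, so float sums are exact
        for (int t = 0; t < n_tiles; ++t) ref += h_src[(size_t) t * kBlock + (tid + 1) % kBlock];
        if (ref != h_dst[tid]) ++mismatches;
    }
    return mismatches;
}
```

Without the second barrier, a fast thread could start iteration t+1 and overwrite its slot while a slower thread is still reading that slot for iteration t. This is the kind of shared-memory hazard that `compute-sanitizer --tool racecheck` is designed to report.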
@ORippler ORippler requested a review from a team as a code owner June 9, 2026 13:44
@github-actions github-actions Bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Jun 9, 2026

@IMbackK IMbackK left a comment

Collaborator


Static analysis looks good. I can reproduce this problem, and confirm the fix, on gfx1100.

@ORippler ORippler requested a review from gaugarg-nv June 10, 2026 08:24
@gaugarg-nv

Contributor

Change looks good to me. Does it have any perf impact?

@ORippler

Collaborator Author

Does it have any perf impact?

2.7% slowdown on the kernel on a B6000. Can always go for double-buffering should it turn out to affect E2E perf significantly
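
For reference, the double-buffering alternative mentioned here could look roughly like the sketch below. It is not what was merged and not the real ssm_scan_f32 kernel: `neighbor_sum_db`, `check_neighbor_sum_db`, and `kBlockDB` are hypothetical names, and the access pattern is invented for illustration. The idea is to alternate between two shared-memory buffers so that one barrier per iteration is enough, at the cost of twice the shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockDB = 128; // threads per block (illustrative value)

// Hypothetical double-buffered kernel: iteration t uses buffer t % 2, so the
// next iteration's writes never touch the buffer that is still being read.
__global__ void neighbor_sum_db(const float * src, float * dst, int n_tiles) {
    __shared__ float smem[2][kBlockDB];
    const int tid = threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n_tiles; ++t) {
        float * buf = smem[t & 1];
        buf[tid] = src[t * kBlockDB + tid];
        __syncthreads(); // writes to buf are visible before any thread reads it
        acc += buf[(tid + 1) % kBlockDB];
        // No trailing barrier: buf is only rewritten in iteration t + 2, and a
        // thread can only get there after passing the barrier of iteration
        // t + 1, which every thread reaches after finishing the read above.
    }
    dst[tid] = acc;
}

// Launches one block and returns the number of mismatches against a CPU
// reference, or -1 if a CUDA call fails.
int check_neighbor_sum_db(int n_tiles) {
    std::vector<float> h_src((size_t) n_tiles * kBlockDB), h_dst(kBlockDB);
    for (size_t i = 0; i < h_src.size(); ++i) h_src[i] = (float) (i % 7);

    float * d_src = nullptr;
    float * d_dst = nullptr;
    if (cudaMalloc(&d_src, h_src.size() * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_dst, h_dst.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(d_src);
        return -1;
    }
    cudaMemcpy(d_src, h_src.data(), h_src.size() * sizeof(float), cudaMemcpyHostToDevice);
    neighbor_sum_db<<<1, kBlockDB>>>(d_src, d_dst, n_tiles);
    // The blocking copy also waits for the kernel and surfaces its errors.
    cudaError_t err = cudaMemcpy(h_dst.data(), d_dst, h_dst.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int tid = 0; tid < kBlockDB; ++tid) {
        float ref = 0.0f; // small integer values, so float sums are exact
        for (int t = 0; t < n_tiles; ++t) ref += h_src[(size_t) t * kBlockDB + (tid + 1) % kBlockDB];
        if (ref != h_dst[tid]) ++mismatches;
    }
    return mismatches;
}
```

Whether trading shared memory for one fewer barrier per iteration actually recovers the measured slowdown would have to be benchmarked on the real kernel; this sketch makes no claim about that.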

@ORippler ORippler merged commit fb83cc9 into ggml-org:master Jun 10, 2026
22 checks passed
@ORippler ORippler deleted the osimons/fix_ssm_scan_f32 branch June 10, 2026 12:27
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 11, 2026
* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...