docs: CUDA binary release runbook (sm_120 / Blackwell) #737
Captures the end-to-end pipeline for shipping a new pre-compiled CUDA binary release: Makefile bumps in the per-platform binary repos → submodule bump in kspacefirstorder-unified → CI builds against CUDA 13 → tag releases on each binary repo → bump URL pins in `kwave/__init__.py`. Cross-links #656 (the canonical sm_120 issue) and #622 (an independent report of the same underlying problem) so the open work is tracked in one place. Both issues remain open until the v1.4.0 binary release ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
### Step 4 — Bump version pins in k-wave-python

Edit `kwave/__init__.py`:

```python
URL_DICT = {
    "linux": {
        "cuda": [URL_BASE + f"kspaceFirstOrder-CUDA-{PLATFORM}/releases/download/v1.4.0/{EXECUTABLE_PREFIX}CUDA"],
        "omp": [URL_BASE + f"kspaceFirstOrder-OMP-{PLATFORM}/releases/download/v0.4.0/{EXECUTABLE_PREFIX}OMP"],
    },
    "darwin": {
        "cuda": [],
        "omp": [URL_BASE + f"k-wave-omp-{PLATFORM}/releases/download/v0.4.0/{EXECUTABLE_PREFIX}OMP"],
    },
    ...
}
```

Bump `BINARY_VERSION` if defined elsewhere. Open a PR. CI will re-download against the new URLs.
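Since CI only re-downloads after the PR is opened, a bad pin (typo in the tag or asset name) can be caught earlier with a quick local check. A minimal sketch using only the standard library; the helper name is illustrative, not part of the runbook:

```python
# Sketch: verify that a pinned release-asset URL actually resolves before
# committing the bump (helper name and error handling are assumptions).
from urllib.request import Request, urlopen

def url_exists(url: str, timeout: float = 10.0) -> bool:
    """HEAD the asset URL; True when the server answers with a non-error status."""
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:
        return False
```

Running this over every entry in `URL_DICT` catches a mistyped tag or filename without waiting for a CI round-trip.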
**Windows CUDA URL will silently stay on the old binary**
The Step 4 snippet shows only `"linux"` and `"darwin"` keys (with `...` for Windows), and the note "Bump `BINARY_VERSION` if defined elsewhere" is easy to miss. In `kwave/__init__.py`, the Windows CUDA URL is generated by `get_windows_release_urls("cuda")` → `PREFIX.format("CUDA", "windows")`, where `PREFIX` embeds `BINARY_VERSION` (currently `"v1.3.0"`). The Linux CUDA entry is hardcoded to v1.3.1 and doesn't use `BINARY_VERSION`, so a developer updating that line might not realize `BINARY_VERSION` also controls the Windows path. Without an explicit update to `BINARY_VERSION = "v1.4.0"`, Windows users would still download the pre-sm_120 binary after the release.
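The coupling can be seen in a condensed sketch of the derivation path. Only the names `BINARY_VERSION` and `get_windows_release_urls` come from `kwave/__init__.py`; the exact `PREFIX` template below is an assumption for illustration:

```python
# Condensed sketch of the Windows URL derivation (template shape assumed).
BINARY_VERSION = "v1.4.0"  # must be bumped explicitly, or Windows stays on the old binary

URL_BASE = "https://github.com/waltsims/"
PREFIX = URL_BASE + "kspaceFirstOrder-{}-{}/releases/download/" + BINARY_VERSION

def get_windows_release_urls(kind: str) -> str:
    # The Linux CUDA URL is a separate hardcoded string; only Windows flows through here.
    return PREFIX.format(kind.upper(), "windows")
```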
Add `sm_100`, `sm_120`, and the forward-compatible PTX `compute_120` entry to `Makefile` on the `cuda-12-support` branch of each repo.
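A rough sketch of what those three entries amount to in nvcc flag terms; the flag shapes are assumptions, and the real change lives in each repo's Makefile gated on the nvcc major version, since CUDA 12 rejects `compute_100`:

```python
# Sketch: the extra -gencode flags for Blackwell, returned only when nvcc is
# new enough to know these architectures (mirrors the Makefile's CUDA-13 gate).
def blackwell_gencode_flags(nvcc_major: int) -> list[str]:
    if nvcc_major < 13:
        # CUDA 12.x would fail: nvcc fatal: Unsupported gpu architecture 'compute_100'
        return []
    return [
        "-gencode=arch=compute_100,code=sm_100",       # Blackwell datacenter
        "-gencode=arch=compute_120,code=sm_120",       # consumer RTX 50xx
        "-gencode=arch=compute_120,code=compute_120",  # PTX, forward compatibility
    ]
```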
- **Linux**: [waltsims/kspaceFirstOrder-CUDA-linux#5](https://github.com/waltsims/kspaceFirstOrder-CUDA-linux/pull/5) (opened against `cuda-12-support`)
- **Windows**: [waltsims/kspaceFirstOrder-CUDA-windows#1](https://github.com/waltsims/kspaceFirstOrder-CUDA-windows/pull/1) — needs minor style cleanup per [review](https://github.com/waltsims/kspaceFirstOrder-CUDA-windows/pull/1#pullrequestreview-) before merge
```bash
# kspaceFirstOrder-CUDA-windows
gh release create v1.4.0 ./kspaceFirstOrder-CUDA.exe ./*.dll \
  --notes "Adds sm_100 (Blackwell datacenter) and sm_120 (consumer RTX 50xx) compute capabilities, plus PTX compute_120 for forward compatibility. Requires CUDA 13 runtime to consume the new code paths."
```
**`./*.dll` glob may pick up unintended files**
`kwave/__init__.py` enumerates exactly 10 DLLs in `WINDOWS_DLLS`. Using `./*.dll` in the `gh release create` command will attach every `.dll` present in the working directory at release time, which could include test or build-intermediate DLLs. Listing the same explicit filenames (or at least documenting that they should match `WINDOWS_DLLS`) would be safer and self-documenting.
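One way to make the release step self-policing is to intersect the glob with an explicit allowlist before invoking `gh`. A sketch; the DLL names below are hypothetical stand-ins, not the real `WINDOWS_DLLS` contents:

```python
# Sketch: refuse to release if the working directory contains DLLs that are not
# on the expected list (ALLOWED_DLLS entries here are hypothetical).
from pathlib import Path

ALLOWED_DLLS = {"hdf5.dll", "cufft64_11.dll"}  # hypothetical; mirror WINDOWS_DLLS here

def release_assets(workdir: str) -> list[str]:
    """Return the sorted DLL filenames, failing loudly on anything unexpected."""
    present = {p.name for p in Path(workdir).glob("*.dll")}
    unexpected = present - ALLOWED_DLLS
    if unexpected:
        raise SystemExit(f"refusing to attach unexpected DLLs: {sorted(unexpected)}")
    return sorted(present)
```

The sorted list can then be passed to `gh release create` in place of the glob.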
**Codecov Report**

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master     #737      +/-   ##
==========================================
+ Coverage   74.82%   76.62%   +1.79%
==========================================
  Files          56       57       +1
  Lines        8095     8761     +666
  Branches     1577     1854     +277
==========================================
+ Hits         6057     6713     +656
+ Misses       1422     1406      -16
- Partials      616      642      +26
```
* **Bump CUDA submodules to sm_120 PR HEADs for Blackwell CI run**

  Submodule SHA bumps:
  - repos/kspaceFirstOrder-cuda-linux: da4e013 -> 65fbec6
  - repos/kspaceFirstOrder-cuda-windows: 319fec6 -> e3e2404

  Both new SHAs are the HEAD of the still-unmerged sm_120 PR branches. Pinning to PR HEADs lets the existing multi-platform CI matrix build against the sm_120-capable Makefiles to validate that the Blackwell binaries are produced before the upstream PRs merge. Once both upstream PRs land, re-bump to the merge commits on `cuda-12-support` so we are not pinning detached refs. Companion documentation: waltsims/k-wave-python#737.

* **Re-bump cuda-linux submodule SHA to 7e887cf (gate sm_120 on CUDA 13+)**

  The previous SHA (65fbec6) added sm_100/sm_120/PTX compute_120 unconditionally, which makes the CUDA 12.2.0 leg of this CI matrix fail with `nvcc fatal: Unsupported gpu architecture 'compute_100'`. The new SHA 7e887cf wraps those three lines in a Makefile `ifeq` guarded on `nvcc --version` major >= 13, so the 12.2 leg builds the original sm_75..sm_90a list and the 13.0 leg additionally builds the Blackwell arches. Pre-existing CI failures on this branch (windows-cuda CUDA 10.2.props not found, windows-openmp hdf5_hl.h not found) are unrelated to this submodule bump and reproduce on the base branch.

* **Re-bump cuda-windows submodule SHA to 34480ea (vcxproj CUDA 13 fix)**

  The previous SHA (e3e2404) only updated the Makefile, but the unified CI builds the .vcxproj via MSBuild, and that file hardcoded "CUDA 10.2.props" / "CUDA 10.2.targets" imports — causing the windows-cuda leg to fail with MSB4019. The new SHA 34480ea fixes the .vcxproj to import CUDA 13.0.props/.targets (which the CI's "Register CUDA MSBuild customizations" step now copies into VCTargetsPath/BuildCustomizations from CUDA 13's installation), and replaces the stale Release|x64 `<CodeGeneration>` list (which had compute_30..sm_75 from the CUDA 10 era) with the modern set including sm_86 and Blackwell.

* **Re-pin cuda-windows submodule to main + bump to PR #1 merge commit**

  waltsims/kspaceFirstOrder-CUDA-windows#1 (the sm_120 / vcxproj CUDA 13 work) merged today into main (commit a6d6919), and the upstream cuda-12-support branch is no longer the long-lived integration branch. Two coordinated changes:
  - .gitmodules: switch cuda-windows from `branch = cuda-12-support` to `branch = main` so `git submodule update --remote` picks up new commits from the right ref going forward.
  - Bump the recorded SHA from 34480ea (which was the PR-head commit on the now-deleted bump-CUDA-sm-suppoprt-to-120 branch) to a6d6919 (the merge commit, reachable from main).

  cuda-linux still pins `branch = cuda-12-support` because waltsims/kspaceFirstOrder-CUDA-linux#5 hasn't merged yet. That submodule entry will be flipped to main once the Linux PR lands.

* **Re-bump CUDA submodules to PR merge commits on main**

  Both the Linux and Windows sm_120 work has now landed on main:
  - repos/kspaceFirstOrder-cuda-linux: 7e887cf -> 072ec8f (PR #4 "Update cufft error enumeration." merged, bringing the cuda-12-support branch into main: includes sm_120 Makefile gating + cuFFT #ifdef restoration)
  - repos/kspaceFirstOrder-cuda-windows: a6d6919 -> e8661b1 (PR #2 "Cuda 12 support" merged, bringing the v143 toolset, CUDA 12.2/13.0 fallback Imports, VerifyCudaCustomizations + VerifyVcpkgRoot targets, and the ResolvedVcpkgRoot property)

  Also flips cuda-linux's .gitmodules pin from `branch = cuda-12-support` to `branch = main` since that branch is no longer the long-lived integration branch (matches the cuda-windows side, which was flipped to main in commit 84f1a88).

* **Flip cuda-linux submodule pin from cuda-12-support to main**

  PR #4 merged cuda-12-support into main on the upstream cuda-linux repo, so the cuda-12-support branch is no longer the long-lived integration target. Tracks main, same as cuda-windows.

---

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**…po releases**

Adds release-on-tag.yml (named for the eventual use case but currently only fires on workflow_dispatch). When manually triggered with a version string (e.g. v1.4.0), the workflow runs the existing multi-platform CI matrix (called via workflow_call), downloads each artifact, and uploads it to a release of that version name on the corresponding per-platform binary repository.

Two safety features deliberately built in:
- Trigger is workflow_dispatch only (no automatic firing on tag push) so a human always confirms the cross-repo publish operation.
- A BINARY_RELEASE_TOKEN repo secret is required (PAT or App token with contents:write on the five target repos). The workflow checks for it and fails loudly with a clear error rather than silently no-op'ing or trying the default GITHUB_TOKEN.

Tiny companion change to ci-multi-platform.yml: add workflow_call to its `on:` triggers so it can be reused without duplicating the build matrix. No effect on push/PR-triggered runs.

Future hardening to consider:
- Switch the trigger to `push: tags: ['v*']` once the workflow has been proven on a couple of manual runs and the secret is known-good.
- Add a dry-run mode that builds artifacts but does not publish.

Companion docs: waltsims/k-wave-python#737.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
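The fail-loud secret check described in the commit message can be pictured as a small guard. A sketch only; the real check lives in release-on-tag.yml, and everything except the `BINARY_RELEASE_TOKEN` name is an assumption:

```python
# Sketch: fail loudly when the publish token is missing instead of silently
# no-op'ing or falling back to the default GITHUB_TOKEN (which lacks
# contents:write on the five target repos).
def require_release_token(env: dict) -> str:
    token = env.get("BINARY_RELEASE_TOKEN", "")
    if not token:
        raise RuntimeError(
            "BINARY_RELEASE_TOKEN is not set; refusing to publish cross-repo releases."
        )
    return token
```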
Closing as bloat: this runbook documents a manual release flow for the 5-mirror architecture, which is being deprecated via mirror consolidation (tracked in waltsims/kspacefirstorder-unified#13). Most content is specific to the one-time v1.4.0 release (the cuda-12-support branch, the specific PR/issue numbers, and the HDF5 ABI mismatch all close with v1.4.0). The architecture diagram + cmake approach are already captured in kspacefirstorder-unified/plans/. Once consolidation lands, the release flow is a single tag on unified — no runbook needed.
Summary
Adds `docs/development/cuda_binary_release.md` — an end-to-end runbook for publishing a new pre-compiled CUDA binary release. Motivating case is NVIDIA Blackwell (sm_120, RTX 50xx) support, which is blocking real users.
Why now
Two open issues report the same underlying problem (the bundled CUDA binary doesn't include sm_120):
I've already started the upstream work:
The runbook documents the remaining steps so this can be picked up and finished without re-deriving the architecture.
Open work checklist (from the runbook)
Test plan
Docs-only PR. The runbook itself will be exercised by the actual binary release work above.
🤖 Generated with Claude Code
Greptile Summary
Adds `docs/development/cuda_binary_release.md`, a docs-only runbook that walks through the five-step process for publishing CUDA binaries with NVIDIA Blackwell (sm_120 / RTX 50xx) support, referencing upstream Makefile PRs and downstream `kwave/__init__.py` URL pins.

- The Step 4 snippet shows only `"linux"`/`"darwin"` URL updates and leaves Windows as `...`, while the actual `__init__.py` derives Windows CUDA URLs from `BINARY_VERSION` via `get_windows_release_urls`; without an explicit call to update `BINARY_VERSION = "v1.4.0"`, Windows users would still receive the old pre-sm_120 binary.
- The review link (`#pullrequestreview-`) is incomplete and will 404, and the `./*.dll` glob in the Windows `gh release create` command could attach unintended build artifacts.

Confidence Score: 3/5
Safe to merge as documentation, but the runbook as written would leave Windows users without sm_120 support if followed literally.
The Step 4 snippet omits the Windows URL path entirely and treats the `BINARY_VERSION` bump as an afterthought. In `kwave/__init__.py`, the Windows CUDA URL is entirely derived from `BINARY_VERSION` through `get_windows_release_urls`, while the Linux CUDA URL is a separate hardcoded string that does not use `BINARY_VERSION` at all. A developer following the runbook would update the Linux/Darwin entries directly and might never realize that `BINARY_VERSION` must also be bumped to propagate the change to Windows.
`docs/development/cuda_binary_release.md` — Step 4 needs an explicit `BINARY_VERSION` update instruction and should show the Windows URL generation path.
Important Files Changed
An explicit `BINARY_VERSION` update is needed to cover the Windows CUDA URL, a broken review link exists in Step 1, and the `./*.dll` glob in the Windows release command may attach unintended files.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Step 1: Merge Makefile PRs] --> B[Step 2: Bump submodule SHAs]
    B --> C[CI builds matrix]
    C --> D{Pick CUDA 13.0.0 artifacts}
    D --> E[Step 3: Tag v1.4.0 releases]
    E --> F[Step 4: Update kwave/__init__.py]
    F --> G[Open k-wave-python PR]
    G --> H[Step 5: Verify on Blackwell GPU]
```