
Batch solve#81

Open
govindchari wants to merge 2 commits into main from gc/batch-v2

@govindchari govindchari commented May 26, 2026


Two benchmarks were run against the CUDA/cuDSS backend:

  1. Small SOCP batch test: 100 perturbed variants of cvxpy_qoco.socp_0, a tiny SOCP with n = 3, m = 3, p = 2, and one SOC. Batched cuDSS reduced factor+solve time from 0.057893s to 0.001314s, about 44x faster inside cuDSS, but end-to-end solve time was effectively unchanged: 3.508s serial vs 3.519s batch.
  2. PDG batch test: 100 variants of a larger PDG problem, perturbing initial-condition entries in b, with n = 2698 and nsoc = 598. Batched cuDSS reduced factor+solve time from 0.862373s to 0.328448s, about 2.6x faster inside cuDSS, and improved end-to-end solve time modestly: 9.074s serial vs 8.694s batch, a 1.044x speedup (about 4% less wall time) excluding setup.

Main finding: the cuDSS batch API works and speeds up the linear algebra calls, but full solver speedup is limited because most runtime is still spent in QOCO-side per-item work outside
cuDSS. Batch setup is also currently expensive because it initializes a full QOCOSolver for every batch item.
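
To make the "limited by work outside cuDSS" point concrete, here is a rough Amdahl-style estimate built only from the timings reported above. It assumes the serial cuDSS factor+solve time is fully contained in the serial end-to-end time and that nothing else changes between the serial and batch runs; both are simplifications, and the function names are illustrative only.

```python
def amdahl_bound(total_s, part_s):
    """Best possible speedup if the `part_s` portion took zero time."""
    return total_s / (total_s - part_s)


def predicted_total(total_s, part_before_s, part_after_s):
    """End-to-end time if only the accelerated portion changed."""
    return total_s - part_before_s + part_after_s


# (serial end-to-end, serial cuDSS, batched cuDSS, measured batched end-to-end)
# All values are in seconds and copied from the two benchmarks above.
runs = {
    "SOCP": (3.508, 0.057893, 0.001314, 3.519),
    "PDG": (9.074, 0.862373, 0.328448, 8.694),
}

for name, (total, part, part_batched, measured) in runs.items():
    share = part / total  # fraction of serial runtime spent inside cuDSS
    print(
        f"{name}: cuDSS share {share:.1%}, "
        f"ceiling {amdahl_bound(total, part):.3f}x, "
        f"predicted {predicted_total(total, part, part_batched):.3f}s, "
        f"measured {measured:.3f}s"
    )
```

Under these assumptions, even a free cuDSS call would cap the end-to-end speedup at roughly 1.1x for PDG and under 1.02x for the small SOCP. The measured PDG batch time sits slightly above what this simple model predicts, which may point to some batch-path overhead outside cuDSS, though run-to-run noise could also explain it.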

The QOCO-side work can be batched if the batch API is redesigned around shared problem structure plus batched state arrays. Good candidates:

  • Shared once per batch: dimensions, cone structure, sparsity, transposes, KKT symbolic structure, index maps, cuDSS analysis.
  • Per item but stored contiguously: x, s, y, z, rhs, kktres, WtW, lambda, Ds, objective/residual scalars, status flags.
  • Batched GPU kernels: residual computation, objective, mu, stopping metrics, NT scaling, RHS construction, centering, line search, iterate updates, and solution copy-out.
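
A minimal host-side sketch of that layout, in plain Python for readability rather than CUDA. The class and function names here are hypothetical and do not exist in QOCO, and the mu formula is the usual interior-point complementarity measure (s·z / m), assumed rather than taken from the code.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedStructure:
    """Data identical across the batch, stored once (hypothetical names)."""
    n: int  # number of primal variables
    m: int  # number of conic constraint rows
    p: int  # number of equality constraints


class BatchState:
    """Per-item iterates stored contiguously.

    Item k owns the slice [k * dim, (k + 1) * dim) of each flat array.
    """

    def __init__(self, shared, batch_size):
        self.shared = shared
        self.batch_size = batch_size
        self.x = array("d", [0.0]) * (batch_size * shared.n)
        self.s = array("d", [0.0]) * (batch_size * shared.m)
        self.y = array("d", [0.0]) * (batch_size * shared.p)
        self.z = array("d", [0.0]) * (batch_size * shared.m)
        self.mu = array("d", [0.0]) * batch_size  # one scalar per item


def batched_compute_mu(state):
    """Set mu[k] = <s_k, z_k> / m for every item in a single pass."""
    m = state.shared.m
    for k in range(state.batch_size):  # on a GPU: one block or warp per item
        lo = k * m
        dot = sum(state.s[lo + i] * state.z[lo + i] for i in range(m))
        state.mu[k] = dot / m if m else 0.0


# Usage with the small SOCP dimensions (n = 3, m = 3, p = 2) and two items.
state = BatchState(SharedStructure(n=3, m=3, p=2), batch_size=2)
for i, value in enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]):
    state.s[i] = value
    state.z[i] = 1.0
batched_compute_mu(state)
```

The point of the flat arrays is that a single kernel launch can index any item by offset, instead of each item carrying its own solver object and its own allocations.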

The biggest wins would likely come from batching the per-iteration loops currently visible around compute_kkt_residual, compute_objective, compute_mu, compute_nt_scaling, and
construct_kkt_aff_rhs in algebra/cuda/cudss_backend.cu:1790. Those are currently dispatched per solver item, even though the items have common dimensions and sparsity.
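
As a back-of-the-envelope illustration of the dispatch overhead, the sketch below counts host-side dispatches per interior-point iteration for the five functions named above. It is a simplified model that assumes one dispatch per function call and ignores any kernels launched inside each function; `dispatches_per_iteration` is a made-up helper, not part of the backend.

```python
# The five per-iteration functions named in this comment.
KERNELS = (
    "compute_kkt_residual",
    "compute_objective",
    "compute_mu",
    "compute_nt_scaling",
    "construct_kkt_aff_rhs",
)


def dispatches_per_iteration(batch_size, batched):
    """Count dispatches for one iteration over the whole batch.

    Per-item mode calls every function once per item; batched mode
    calls every function once and lets it cover all items.
    """
    per_function = 1 if batched else batch_size
    return len(KERNELS) * per_function


per_item = dispatches_per_iteration(100, batched=False)
batched = dispatches_per_iteration(100, batched=True)
print(f"per-item: {per_item} dispatches, batched: {batched} dispatches")
```

For the 100-item batches used in both benchmarks, this model gives 500 dispatches per iteration in the per-item path against 5 in a batched path.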

@github-actions



Benchmark Summary

Problems Solved

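| Dataset | Main Solved | Diff Solved | Main Iters | Diff Iters | Main IR Iters | Diff IR Iters |
|---------|-------------|-------------|------------|------------|---------------|---------------|
| cutest  | 43 / 62     | 43 / 62     | 1069       | 1069       | 2301          | 2301          |
| misc    | 1 / 3       | 1 / 3       | 15         | 15         | 2             | 2             |
| mm      | 136 / 138   | 136 / 138   | 2615       | 2615       | 4420          | 4420          |
| mpc     | 63 / 64     | 63 / 64     | 480        | 480        | 587           | 587           |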

