Migrate qBraid Target in CUDA-Q to qBraid v2 (#5)
Open
TheGupta2012 wants to merge 133 commits into main from
Conversation
@ryanhill1 I discarded the commit in current
Force-pushed from a72236c to 46593c0
* working implementation using openQASM
* modified and added test files (incomplete)
* fix emulate command alignment
* update polling + format
* update polling interval and make code more readable
* remove ionq fields from target-arguments
* fix formatting
* Add qBraid mock python server for testing
* Update __init__.py
* QbraidTester running correctly
* added documentation for qbraid
---------
Signed-off-by: Ryan Hill <ryanjh88@gmail.com>
Co-authored-by: feelerx <superfeelerxx@gmail.com>
Force-pushed from 46593c0 to 3b0a1e4
The deployments cleanup job only removes `default` environment deployments but not `ghcr-ci` ones. Every CI run creates multiple ghcr-ci deployments via dev_environment.yml, leaving "copy-pr-bot temporarily deployed to ghcr-ci — Inactive" entries cluttering PR timelines. Extend the existing cleanup loop to also delete ghcr-ci deployments. The production `ghcr-deployment` environment used by deployments.yml is not affected. Signed-off-by: mitchdz <mitch_dz@hotmail.com>
…DIA#4320) Fixes NVIDIA#4319. The basis-driven pattern selection in `decomposition{basis=...}` failed to select decomposition chains involving `SToR1` and `TToR1` because these patterns were registered with `s(1)`/`t(1)` metadata (controlled-only) despite their implementations handling any control count. The graph lookup in `DecompositionPatternSelection.cpp` used exact hash matching on `OperatorInfo`, so an unbounded `(n)` entry could not match a concrete control count. This left `CCX` gates undecomposed when `t` was not directly in the target basis. The fix updates `SToR1`/`TToR1`/`R1ToU3`/`U3ToRotations` registration to `(n)` and adds `OperatorInfo::matches()` for wildcard control count matching in `incomingPatterns()` and `findGateDist()`. Signed-off-by: Thomas Alexander <talexander@nvidia.com>
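The wildcard control-count matching described above can be illustrated with a small Python sketch. The names `OperatorInfo` and `matches` follow the commit message, but the layout and semantics here are simplified assumptions, not the actual C++ in `DecompositionPatternSelection.cpp`:

```python
# Simplified sketch of wildcard control-count matching for decomposition
# pattern lookup. A None control count models the unbounded "(n)" registration.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OperatorInfo:
    name: str
    controls: Optional[int]  # None = matches any number of controls

    def matches(self, other: "OperatorInfo") -> bool:
        """Names must agree exactly; a None control count matches any count."""
        if self.name != other.name:
            return False
        return (self.controls is None or other.controls is None
                or self.controls == other.controls)

# A pattern registered as t(n) now matches a concrete t(2) (a doubly
# controlled T), which an exact hash lookup keyed on t(1) could not.
t_any = OperatorInfo("t", None)
assert t_any.matches(OperatorInfo("t", 2))
assert not t_any.matches(OperatorInfo("s", 2))
```

This is why an exact-hash graph lookup left `CCX` undecomposed: the concrete `t(2)` query hashed differently from the registered `t(1)` entry, whereas predicate-based matching admits the wildcard.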
…4332) Signed-off-by: Adam Geller <adgeller@nvidia.com>
…IA#4330) This updates the unittest so that cudaq::state objects are used to capture and pass state information (amplitude vectors) into kernels. The new API contract is that this sort of state information shall be passed into CUDA-Q kernels as state objects and not raw vectors. --------- Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
Migrating Python bindings from pybind11 to nanobind - Adding nanobind as a submodule - Creating NanobindAdaptors for MLIR C-API type casters - Keeping pybind11 only for upstream MLIR Python extensions - Converting all `*_py.cpp ` binding files, headers, CUDAQuantumExtension.cpp, pyDynamics, interop library, and PYSCF plugin to nanobind --------- Signed-off-by: Sachin Pisal <spisal@nvidia.com>
I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 9cd62cf I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 3b0a1e4 I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 1a24c66 Signed-off-by: Harshit <harshit.11235@gmail.com>
I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 925ae39 I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 41fe248 I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: d74243d Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
This is a rewrite of NVIDIA#4329, using a stateless class with static functions rather than a builder pattern. Signed-off-by: Luca Mondada <luca@mondada.net>
Fixes NVIDIA#4343. Signed-off-by: Sachin Pisal <spisal@nvidia.com>
…VIDIA#4335) When a kernel returns a vector (for `cudaq::run`), we insert `__nvqpp_vectorCopyCtor` which performs a `malloc` + `memcpy` to copy stack data to the heap. After `AggressiveInlining` and `ReturnToOutputLog`, the heap copy becomes dead but remains in the IR. This is normally cleaned up by LLVM's optimization passes, but on code paths that emit MLIR directly (e.g., `nop` for backends that consume `quake`), these dead allocations persist and get sent to the server. This PR adds a new MLIR pass, `eliminate-dead-heap-copy`, that redirects reads from the `malloc`'d buffer to the original `memcpy` source (the stack `alloca`), then erases the dead `malloc`, `memcpy`, and `cc.stdvec_init` ops. This can be added on-demand via target yml file. Update the mock server test to demonstrate that. --------- Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
Updating cuquantum version to 26.03.1 --------- Signed-off-by: Sachin Pisal <spisal@nvidia.com>
## Background
`cudaq.sample` with `set_target("braket")` fails on v0.14.0+ with:
```
RuntimeError: [line 10] cannot declare bit register. Only 1 bit register(s) is/are supported
```
Amazon Braket's OpenQASM 2.0 parser enforces exactly one classical
register per circuit. The payload CUDA-Q emits for the Bell-state
reproducer in NVIDIA#4341 contains two.
## Root cause
`addPipelineTranslateToOpenQASM` (`lib/Optimizer/CodeGen/Pipelines.cpp`)
was refactored in NVIDIA#3693 to run `ExpandMeasurements` unconditionally. For
`qasm2` backends that run `combine-measurements` in the mid pipeline
(Braket, Scaleway, Quantum Machines), the sequence becomes:
1. Mid pipeline: `combine-measurements` merges per-qubit measurements
into a single `quake.mz` on the whole `!quake.veq` - the intent being
"emit one `creg` spanning all qubits".
2. Translate pipeline: `ExpandMeasurements` re-expands the combined `mz`
into one `mz` per qubit, then loop-unrolls.
3. OpenQASM2.0 emitter: writes one `creg` declaration per `mz`.
Target-specific YAML intent is silently overridden in the translate
pipeline.
## Fix
1. `lib/Optimizer/CodeGen/Pipelines.cpp`: revert
`addPipelineTranslateToOpenQASM` to the thin cleanup it was before
NVIDIA#3693. Each backend's YAML now drives measurement expansion.
2. `infleqtion.yml` and `tii.yml`: add `jit-high-level-pipeline:
"expand-measurements"`. These targets previously depended on the
unconditional expansion to get one `creg` per measured qubit; the
explicit entry preserves that behavior.
3. `test/Translate/OpenQASM/basic.qke` and
`test/Translate/openqasm2_*.cpp`: update CHECK lines to match the
single-`creg` output for a vector `mz` (which is what the emitter
produces after the fix).
## Impact
| Backend | creg count for `mz(qvector(n))` |
|---|---|
| Braket, Scaleway, Quantum Machines | 1 (single `creg` of size n) |
| Infleqtion, TII | n (preserved via new YAML entry) |
| Quantinuum, IQM, OQC, Anyon, QCI | n (unchanged; already had `expand-measurements` in YAML) |
The change is scoped to `addPipelineTranslateToOpenQASM`, which only
runs for `codegen-emission: qasm2`. Simulators and non-OpenQASM2.0
backends are unaffected.
## Testing
- `ninja check-cudaq-mlir` passes with the updated CHECK lines.
- `cudaq.translate(kernel, format="openqasm2")` under `set_target(...)`
for Braket, Scaleway, Infleqtion, TII — creg counts match the matrix
above.
- Reproducer from NVIDIA#4341 now emits exactly the "expected" OpenQASM2.0
shown in the issue: `creg var3[2]; measure var0 -> var3;`.
- Manually tested against real servers: `test_braket.py`,
`test_Infleqtion.py`, `test_tii.py`, `test_scaleway.py`.
## Follow-up
An automated local test set up for OpenQASM payload validator will be
added in a separate PR.
Fixes NVIDIA#4341.
---------
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
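The pass-ordering problem at the heart of this fix can be sketched with a toy model. The function names mirror the passes but operate on plain tuples, not Quake MLIR:

```python
# Toy model of the pre-fix pipeline: a measurement is a (name, qubit-list)
# pair, and the OpenQASM 2.0 emitter writes one creg per measurement op.
def combine_measurements(ops):
    # Mid pipeline: merge per-qubit measurements into one op over all qubits,
    # i.e. "emit one creg spanning all qubits".
    qubits = [q for _, qs in ops for q in qs]
    return [("mz", qubits)]

def expand_measurements(ops):
    # Translate pipeline (pre-fix): unconditionally re-expand per qubit,
    # silently undoing the mid-pipeline merge.
    return [(name, [q]) for name, qs in ops for q in qs]

per_qubit = [("mz", [0]), ("mz", [1])]
combined = combine_measurements(per_qubit)   # 1 op -> emitter writes 1 creg
reexpanded = expand_measurements(combined)   # 2 ops -> 2 cregs, which
                                             # Braket's parser rejects
assert len(combined) == 1 and len(reexpanded) == 2
```

Removing the unconditional re-expansion from the translate pipeline leaves the mid-pipeline result intact, so the emitter sees one combined `mz` and writes one `creg`.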
…frastructure (NVIDIA#4349)
## Summary
Reverts PRs NVIDIA#3800, NVIDIA#4204, NVIDIA#4208, NVIDIA#4266, NVIDIA#4267.
Following an architecture alignment meeting (Apr 17), we are changing direction on how measurement results are represented in CUDA-Q. The `measure_result` standalone class and `!quake.measurements<N>` Quake type introduced by these PRs are being replaced by a new `measure_handle` approach with fundamentally different semantics.
This revert restores:
* `measure_result` as a typedef to bool (compiler mode)
* Multi-qubit mz returning `!cc.stdvec<!quake.measure>`
* Removes `!quake.measurements<N>` type, `quake.get_measure`, `quake.measurements_size` ops
* Removes `quake.relax_size` extension for measurements
* Removes `QIRResultArrayCreate` / `QIRResultArrayGetElementPtr1d` QIR intrinsics
* Removes 8 test files added by the reverted PRs
### Forward direction (follow-up PRs): New `measure_handle`
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Skipping identity terms when building the Pauli word and coefficient lists passed to the Krylov kernel. Controlled exp_pauli does not handle the identity terms. We add their contribution back when assembling the Hamiltonian matrix. Fixes https://github.com/NVIDIA/cuda-quantum/actions/runs/24584888146/job/71904057326#step:5:1955 Signed-off-by: Sachin Pisal <spisal@nvidia.com>
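The split described here can be sketched in a few lines of Python. The dictionary representation of the Hamiltonian and the function name `split_identity` are illustrative assumptions, not the actual CUDA-Q data structures:

```python
# Sketch: represent a Hamiltonian as {pauli_word: coefficient}. Identity
# terms are dropped before building the word/coefficient lists passed to
# the Krylov kernel (controlled exp_pauli cannot apply them); their summed
# coefficient is returned so it can be added back later as
# identity_coeff * I when assembling the Hamiltonian matrix.
def split_identity(hamiltonian: dict):
    identity_coeff = sum(c for w, c in hamiltonian.items()
                         if set(w) <= {"I"})
    non_identity = [(w, c) for w, c in hamiltonian.items()
                    if set(w) - {"I"}]
    words = [w for w, _ in non_identity]
    coeffs = [c for _, c in non_identity]
    return words, coeffs, identity_coeff

words, coeffs, id_coeff = split_identity({"II": 0.5, "ZZ": -1.0, "XI": 0.25})
assert "II" not in words
assert id_coeff == 0.5
# Later, e.g.: H_matrix = kernel_part + id_coeff * identity_matrix
```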
…rs (NVIDIA#4351) Fixed the `test_state_mps.py - AttributeError: 'list' object has no attribute 'dtype'` errors in https://github.com/NVIDIA/cuda-quantum/actions/runs/24624569814/job/72005503960#step:7:43857 The fix for the rest of the failure (`RuntimeError: invalid value`) will come in a separate PR. Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
This PR removes argument synthesis by default for Python kernels run on the local simulator, instead directly invoking them with the arguments (currently, by constructing a message buffer through `.argsCreator` which is passed to the kernel's `thunk`). This only affects entry point kernels. Benefits: 1. This makes it unnecessary to recompile kernels for different arguments in this setting, simplifying the `reuse_compiler_artifacts` logic. 2. It aligns the python local simulation path more closely with C++, where arguments are similarly not synthesized. 3. As a result of 1 and 2, it is a useful and important first step towards an inter-launch caching strategy for python. --------- Signed-off-by: Adam Geller <adgeller@nvidia.com> Signed-off-by: Luca Mondada <luca@mondada.net> Co-authored-by: Luca Mondada <luca@mondada.net>
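The "message buffer" idea — packing the arguments into one contiguous buffer that is handed to the kernel's `thunk` — can be sketched with Python's `struct` module. The format-string signature is a stand-in for the layout the compiler derives from the kernel's argument types; the real `.argsCreator` is generated per kernel:

```python
import struct

def args_creator(signature: str, args: tuple) -> bytes:
    """Pack arguments into one contiguous buffer with an explicit layout.

    `signature` is a struct format string standing in for the layout
    derived from the kernel's MLIR argument types.
    """
    return struct.pack(signature, *args)

def thunk(buffer: bytes, signature: str) -> tuple:
    # The kernel-side thunk unpacks the same fixed layout.
    return struct.unpack(signature, buffer)

# int32 count + double angle, little-endian, no padding
buf = args_creator("<id", (3, 0.5))
assert thunk(buf, "<id") == (3, 0.5)
```

Because the buffer layout depends only on the argument *types*, not their values, the same compiled kernel artifact can be reused across launches with different arguments — the property that simplifies the `reuse_compiler_artifacts` logic.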
Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
Signed-off-by: Adam Geller <adgeller@nvidia.com>
…VIDIA#4450) Signed-off-by: Adam Geller <adgeller@nvidia.com>
CUDA 12.6 doesn't work with clang++ 22.1. Re-enable gcc12 toolchain support to work around this. --------- Signed-off-by: Adam Geller <adgeller@nvidia.com> Signed-off-by: Mitchell <mitchdz@plasticmemories.xyz> Co-authored-by: Mitchell <mitch_dz@hotmail.com> Co-authored-by: Mitchell <mitchdz@plasticmemories.xyz>
While building flang with gcc12, OOM errors persisted. This update uses beefier runners with 64 GB of RAM and restricts ninja to 8 concurrent threads. Signed-off-by: mdzurick <mitch_dz@hotmail.com>
Signed-off-by: Adam Geller <adgeller@nvidia.com>
- If the launched server exits before becoming reachable, `waitpid(WNOHANG)` breaks out of the 50s ping loop so we move to the next port immediately, instead of throwing `RuntimeError: No usable ports available` only after a few minutes.
- Dropping `static` from the `mt19937` so `seed_offset` is honoured on every construction, not just the first one.
Fixes:
```
@pytest.fixture(scope="session", autouse=True)
def startUpMockServer():
>       cudaq.set_target("remote-mqpu", auto_launch=str(num_qpus))
E       RuntimeError: No usable ports available
tmp/tests/remote/test_remote_platform.py:71: RuntimeError
```
https://github.com/NVIDIA/cuda-quantum/actions/runs/25393104770/job/74490650292#step:7:1282
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
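The early-exit check can be sketched in Python, where `Popen.poll()` plays the role of `waitpid(WNOHANG)`: a non-blocking liveness probe inside the ping loop. Function and parameter names here are illustrative, not the CUDA-Q internals:

```python
# Sketch: stop waiting on a port as soon as the launched server process
# dies, rather than burning the full timeout pinging a corpse.
import subprocess
import time

def wait_until_reachable(proc: subprocess.Popen, ping, timeout_s=50) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ping():
            return True
        # Non-blocking child check (analogous to waitpid with WNOHANG):
        # a dead child will never become reachable, so fail fast.
        if proc.poll() is not None:
            return False
        time.sleep(0.1)
    return False

# A server that exits immediately fails in ~0.1s, not after 50s.
p = subprocess.Popen(["true"])
start = time.monotonic()
assert not wait_until_reachable(p, ping=lambda: False)
assert time.monotonic() - start < 5
```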
Signed-off-by: Adam Geller <adgeller@nvidia.com>
Signed-off-by: Adam Geller <adgeller@nvidia.com>
…4459) - `move_artifacts` in `scripts/migrate_assets.sh` emitted an `rm` + `rmdir -p` pair per file. With LLVM 22 (~7k files) bundled into `cudaq/lib/llvm`, the generated `uninstall.sh` ballooned to a ~15k-line `if $continue; then ... fi` body, causing bash to segfault mid-uninstall in the "Additional validation (MPI and uninstall)" CI step on ubuntu/debian/fedora/redhat. - Capture top-level entries in `$1` before the move and emit one `rm -rf -- "$2/<entry>"` per entry. Trailing `rm -rf "$CUDA_QUANTUM_PATH"` is unchanged. Co-authored by: AI Signed-off-by: Sachin Pisal <spisal@nvidia.com>
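The size reduction comes from emitting one removal command per *top-level* entry instead of one per file. A Python sketch of the generation step (paths and the `/opt/cudaq` prefix are placeholders; the real logic lives in `scripts/migrate_assets.sh`):

```python
# Sketch of the uninstall-script generation fix: capture top-level entries
# of the source tree before the move, then emit one `rm -rf` per entry
# rather than an `rm` + `rmdir -p` pair per file.
import pathlib
import tempfile

src = pathlib.Path(tempfile.mkdtemp())
(src / "lib" / "llvm").mkdir(parents=True)
(src / "bin").mkdir()
(src / "lib" / "llvm" / "libA.so").touch()  # stands in for ~7k LLVM files
(src / "bin" / "nvq++").touch()

dest = "/opt/cudaq"  # placeholder install prefix ($2 in the script)
lines = [f'rm -rf -- "{dest}/{p.name}"' for p in sorted(src.iterdir())]

# Two lines total (bin, lib) regardless of how many files live underneath,
# instead of a ~15k-line if-body that segfaults bash.
assert lines == ['rm -rf -- "/opt/cudaq/bin"', 'rm -rf -- "/opt/cudaq/lib"']
```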
…4463) Tests will soon be removed due to NVIDIA#4276 anyway. Signed-off-by: Adam Geller <adgeller@nvidia.com>
Signed-off-by: Adam Geller <adgeller@nvidia.com> Signed-off-by: Adam T. Geller <adgeller@nvidia.com>
Stacks on top of NVIDIA#4392 This PR introduces a `KernelArgs` type that stores kernel arguments EITHER as 'packed' arguments in a contiguous memory buffer OR as a vector of void*. `KernelArgs` also supports storing both representations, which is used by `hybridLaunchKernel`. Using this type allows us to make the signatures of the launch endpoints more homogeneous, effectively hiding the different conventions as implementation details that only need to be handled within the function implementations. Resulting signatures: ```c++ [[nodiscard]] virtual KernelThunkResultType launchKernel(const std::string &name, KernelThunkType kernelFunc, KernelArgs args); [[nodiscard]] virtual KernelThunkResultType launchModule(const CompiledModule &compiled, KernelArgs args); ``` --------- Signed-off-by: Luca Mondada <luca@mondada.net>
Port to llvm-22. --------- Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
…VIDIA#4405) ## Summary * Extend `--expand-measurements` to scalarize `quake.mz`/`mx`/`my` `%veq -> !cc.stdvec<!cc.measure_handle>`. * Builds on NVIDIA#4404 * No source-language or runtime API change. ## Motivation The pass previously hardcoded `!quake.measure` for per-element output and only handled `quake.discriminate` consumers of the vector result. Handle-typed vector measurements require per-element `!cc.measure_handle` output and can flow to non-discriminate consumers (returns, stores, calls), neither of which the legacy `vector<bool>`-only rewrite supported. ## What Changed - `ExpandRewritePattern` tracks the input stdvec's element type and emits per-element measurements of the matching type (`!quake.measure` or `!cc.measure_handle`). - Consumers are classified as discriminate vs non-discriminate. Handle inputs allocate each buffer only when its consumer class is present; legacy `!cc.stdvec<!quake.measure>` inputs always allocate the i1 buffer so existing AST-Quake CHECK lines stay stable. - Original op is replaced via `replaceOp` (atomic) instead of `eraseOp`, so partial conversion does not try to re-legalize downstream `func.return` consumers. - New lit test `test/Transforms/expand_measurements_handle.qke` covers handle stdvec with each consumer class (return-only, discriminate-only, mixed, `cc.store`), mixed `ref + veq` operands, and `mx`/`my` parity. --------- Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
NVIDIA#2608 is already fixed in main, but this adds regression tests for the future. Signed-off-by: mdzurick <mitch_dz@hotmail.com>
…IDIA#4474) Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
## Summary macOS 26 (Tahoe) SDK removed `__has_builtin` guards from libc++ headers (e.g., `__builtin_ctzg`, `__is_nothrow_convertible`), making them incompatible with LLVM 16's clang. Additionally, Apple Clang 21 introduced new warnings that break the build under `-Werror`. This PR fixes both issues so that CUDA-Q builds and passes all tests on macOS 26 with Apple Clang 21. ### LLVM libc++ runtimes (SDK 26+) - `set_env_defaults.sh`: Auto-detect active SDK version via `xcrun --show-sdk-version`. When SDK >= 26, include `runtimes` in `LLVM_PROJECTS` to build LLVM's own libc++. - `cudaq-quake.cpp`: Use `-nostdinc++` to suppress SDK C++ headers when LLVM's libc++ is available, while keeping `-isysroot` for C standard headers. - `nvq++.in`: Add `-Wl,-syslibroot` for `ld64.lld` to find `libSystem` in the SDK. Add `-lc++abi` during final link when LLVM's `libc++abi` is present. - `build_llvm.sh`: Do not bake SDK sysroot paths into `clang++.cfg` on macOS (they become stale after Xcode updates). Sysroot is resolved at runtime by `nvq++`. ### Relocatable linking - `nvq++.in`: LLVM 16's `ld64.lld` does not implement `-r` (relocatable linking). Probe the configured linker and fall back to system `ld` for the object merge step if needed. - `device_call.cpp`: Update FileCheck pattern to match both Apple `ld` and `ld64.lld` error formats. ### Apple Clang 21 warnings - `CMakeLists.txt`: Add `-Wno-character-conversion` for gtest (third-party). - `server_impl/CMakeLists.txt`: Add `-Wno-deprecated-literal-operator` for Crow (third-party). - `cudaq.cpp`: Fix `-Wnontrivial-memcall` on `memset` with `std::vector<bool>`. - `vqe_tester.cpp`: Add missing `#pragma` to suppress `-Wdeprecated-declarations` for backward-compatibility test (matching existing pattern in `builder_tester.cpp`). --------- Signed-off-by: ikkoham <ikkoham@users.noreply.github.com> Signed-off-by: Thomas Alexander <talexander@nvidia.com> Co-authored-by: Thomas Alexander <talexander@nvidia.com>
The gcc-12 -Wrestrict workaround in runtime/common/CMakeLists.txt and realtime/unittests/CMakeLists.txt was added as a PUBLIC compile option, so cmake propagates it through interface inheritance into nvcc command lines on consumer CUDA targets. nvcc forwards it to its host gcc, which may be a different gcc than CMAKE_CXX_COMPILER (e.g. gcc-12 vs gcc-13) and may not recognize the flag. Wrap it in a $<$<COMPILE_LANGUAGE:CXX>:..> generator expression so it only appears on CXX command lines. Signed-off-by: Chuck Ketcham <cketcham@nvidia.com>
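A hypothetical CMake fragment showing the shape of the fix (the target name and the exact flag are placeholders; the point is the `COMPILE_LANGUAGE` guard):

```cmake
# Scope the gcc-12 workaround flag to C++ compile lines only, so interface
# inheritance never forwards it through nvcc to a host gcc that may not
# recognize it.
target_compile_options(cudaq-common PUBLIC
  $<$<COMPILE_LANGUAGE:CXX>:-Wno-restrict>)
```

Generator expressions are evaluated per source file at generation time, so CUDA sources on consumer targets simply never see the flag, while CXX sources keep the workaround.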
…A#4432) Follow ups as noted by @schweitzpgi and @boschmitt. --------- Signed-off-by: Thomas Alexander <talexander@nvidia.com>
Similar to NVIDIA#4477, we need this fix for the `gtest` target, which will be used to compile `get_state_tester.cu`. Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
As we prepare to surface qpu headers in user code, `cudaq.h` has become
significantly slower to parse in a prototyping sandbox. This PR
addresses a slowdown introduced by `logger.h`, as can be seen in the
profile below.
The compiled code is:
```cpp
#include <cudaq.h>
int main() {
return 0;
}
```
The compile time is 3.6s.
```
time clang++ cudaq_inc.cpp -std=c++20 -I ~/cudaq/cq2/install/cudaq/include
real 0m3.603s
```
The profile shows logger.h takes 441ms of parse time.
![logger-profile-1](https://github.com/user-attachments/assets/2a677887-0ef9-432c-a07a-6b5f3c8aabed)
With this patch, the compilation time becomes 2.9s
```
time clang++ cudaq_inc.cpp -std=c++20 -I ~/cudaq/cq2/install/cudaq/include
real 0m2.936s
```
The profile shows `logger.h` taking only 7ms.
![logger-profile-2](https://github.com/user-attachments/assets/9034acc9-829e-4960-97a8-f433c41107de)
This is achieved by replacing the `std::variant` with a `FormatArgument`
which only stores a pointer and an out-of-line appending callback. The
callback is instantiated in logger.cpp.
---------
Signed-off-by: Renaud Kauffmann <rkauffmann@nvidia.com>
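The type-erasure trick behind `FormatArgument` — an opaque value plus an out-of-line "append" callback, replacing a `std::variant` whose visitation machinery must be parsed in every translation unit — can be sketched in Python. Class and function names are illustrative, not the actual CUDA-Q types:

```python
# Sketch of the std::variant -> pointer + callback replacement: the header
# only needs this tiny carrier type; the per-type formatting functions are
# instantiated once, out of line (logger.cpp in the real change).
class FormatArgument:
    __slots__ = ("value", "append")  # ~ a pointer and a callback in C++

    def __init__(self, value, append):
        self.value = value
        self.append = append

def append_int(out: list, v) -> None:    # out-of-line formatter for ints
    out.append(str(v))

def append_float(out: list, v) -> None:  # out-of-line formatter for floats
    out.append(f"{v:.2f}")

def log(fmt_args: list) -> str:
    out = []
    for a in fmt_args:
        a.append(out, a.value)           # dispatch via stored callback,
    return " ".join(out)                 # no variant visitation needed

assert log([FormatArgument(7, append_int),
            FormatArgument(3.14159, append_float)]) == "7 3.14"
```

The header-side cost is now constant in the number of supported types, which is what takes `logger.h` from 441ms to 7ms of parse time.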
Removing sub-skills placeholder as those can be added back when ready Signed-off-by: Sachin Pisal <spisal@nvidia.com>
## Summary
- Bump `DEFAULT_VERSION` in `TiiServerHelper.cpp` from `0.2.2` to
`0.2.4`
## Motivation
- The TII server (`q-cloud.tii.ae`) now enforces a minimum `qibo-client`
version of 0.2.3, returning HTTP 426 (Upgrade Required) for older
clients.
- This breaks both C++ and Python TII targets with:
```
{"detail":"Outdated client version: 0.2.2. Please upgrade to qibo-client >= 0.2.3."}
```
## Testing
- Verified locally against `q-cloud.tii.ae` that requests succeed with
version `0.2.4`.
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
These changes make sure that inlined functions have their scopes preserved through the inlining process. Doing this facilitates keeping track of live ranges from variables declared in the called function's body, which allows for more precise allocation and deallocation of qubits in simulation, etc. Add python regression test. --------- Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
This will fix the errors we are seeing since the apt proxy cache is getting rate limited. Potentially enable in the future again. Signed-off-by: mitchdz <mitch_dz@hotmail.com>
NOTE: This is a re-post of NVIDIA#4413, which I merged into the wrong branch! It's already been reviewed, discussed and approved. --- Stacks on top of NVIDIA#4398 PR NVIDIA#4398 made the first of two steps towards homogenizing all launch endpoints into a same function signature by introducing `KernelArgs`. This PR introduces a `SourceModule` type that stores the kernels themselves in different formats depending on the host language: - a C++ kernel is stored as a name + function pointer. The name is used in the runtime to retrieve the Quake representation of the kernel if required. The function pointer points to the function that was compiled by nvq++ and is used for local simulation. - a Python kernel is stored as a name + MLIR ModuleOp. Both local and remote executions use the MLIR ModuleOp as the source of truth for the kernel definition. If the kernel launch must be executed locally, the MLIR will be compiled and JITed, otherwise it will be compiled and submitted to the remote endpoint. Using this type, we can effectively hide ALL differences between the host languages and the various launching conventions behind a homogeneous API. Resulting signatures: ```c++ [[nodiscard]] virtual KernelThunkResultType launchKernel(const SourceModule &src, KernelArgs args); [[nodiscard]] virtual KernelThunkResultType launchModule(const CompiledModule &compiled, KernelArgs args); [[nodiscard]] virtual CompiledModule compileModule(const SourceModule &src, KernelArgs args, bool isEntryPoint); ``` Note that for Python, the current execution is broken up into `compileModule` -> `launchModule`, whereas for C++, both compile and launch steps are still in one -- hence the signature difference. Changing the C++ launch path to mirror the Python one is planned upcoming work. Signed-off-by: Luca Mondada <luca@mondada.net>
Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
Add the updates for migrating the CUDA-Q qBraid target to use the qBraid platform v2: updates for jobs, the main API, and API-key authentication, among others.