♻️ Refactor RL actions by SDK by flowerthrower · Pull Request #680 · munich-quantum-toolkit/predictor

flowerthrower · 2026-05-12T14:35:34Z

Description

Restructures RL actions into SDK-specific modules for Qiskit, TKET, and BQSKit. Also moves the corresponding wrapper logic closer to each SDK action implementation and keeps compatibility exports for existing imports.

This improves modularity and reduces the SDK-dependent patches scattered across the RL modules.
It also makes it possible to test pass invariants cleanly.

Moved individual RL passes from actions.py to SDK-specific action modules.
Moved shared action types and registration to rl/actions/__init__.py.
Moved SDK-specific pass runners, layout handling, and masking logic out of PredictorEnv.
Removed parsing.py; conversion/layout helpers now live with the owning SDK.
Updated tests to verify SDK actions do what they promise to do (including invariant checks).

Fixes #668
Fixes #66

Assisted by: GPT5.5 via Codex

Checklist

The pull request only contains commits that are focused and relevant to this change.
I have added appropriate tests that cover the new/changed functionality.
I have updated the documentation to reflect these changes.
I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
I have added migration instructions to the upgrade guide (if needed).
The changes follow the project's style guidelines and introduce no new warnings.
The changes are fully tested and pass the CI checks.
I have reviewed my own code changes.

If PR contains AI-assisted content:

I have disclosed the use of AI tools in the PR description as per our AI Usage Guidelines.
AI-assisted commits include an Assisted-by: [Model Name] via [Tool Name] footer.
I confirm that I have personally reviewed and understood all AI-generated content, and accept full responsibility for it.

## Description

This PR addresses critical bugs in the RL training process with the following key changes:

**Structure Improvements:**

- **Redesigned action validation logic** (`predictorenv.py`): Rewrote `determine_valid_actions_for_state()` with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.
- Added helper methods `is_circuit_laid_out()` and `is_circuit_routed()` to replace the buggy `CheckMap` pass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.
- **Fixed type annotation** (`actions.py`): Corrected `do_while` parameter type from `dict[str, Circuit]` to `PropertySet` and added missing import for Qiskit's `PropertySet`.
- **Added reproducibility** (`predictor.py`): Set random seed for non-test training runs to ensure reproducible results.
- **Improved VF2Layout error handling** (`predictorenv.py`): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.

**Test Updates:**

- Suppressed deprecation warnings in tket routing test

---------

Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
Co-authored-by: flowerthrower <flowerthrower@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

codecov · 2026-05-12T17:49:10Z

Codecov Report

❌ Patch coverage is 94.07407% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/mqt/predictor/rl/actions/qiskit_actions.py | 90.2% | 7 Missing ⚠️ |
| src/mqt/predictor/rl/predictorenv.py | 88.0% | 5 Missing ⚠️ |
| src/mqt/predictor/rl/actions/bqskit_actions.py | 93.1% | 4 Missing ⚠️ |

flowerthrower · 2026-06-19T14:45:50Z

@coderabbitai review

coderabbitai · 2026-06-19T14:45:57Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/compilation/test_predictor_rl.py`:
- Around line 25-26: The import statement for qiskit_actions uses an alias that
matches the module name, which Ruff suggests simplifying per PLR0402. Replace
the import statement `import mqt.predictor.rl.actions.qiskit_actions as
qiskit_actions` with a from-import style: `from mqt.predictor.rl.actions import
qiskit_actions`. This eliminates the redundant alias and improves code clarity.
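
The equivalence behind this suggestion can be checked with a small sketch. It uses the standard-library `os.path` module as a stand-in, because the project package may not be installed where the snippet runs. The name `aliased` is only illustrative.

```python
# Flagged by Ruff's PLR0402: the alias repeats the last module component.
import os.path as path  # noqa: PLR0402

aliased = path

# Preferred form: a from-import binds the same submodule object.
from os import path

# Both spellings resolve to the very same module, so the change is
# purely stylistic and does not alter behavior.
assert aliased is path
```

The same reasoning applies to `from mqt.predictor.rl.actions import qiskit_actions`: the bound name and the module it refers to stay unchanged.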

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 73f17e94-e81a-4060-a17a-eb0063efc4fa

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec22c7 and b3f9f06.

📒 Files selected for processing (8)

CHANGELOG.md
src/mqt/predictor/rl/actions/__init__.py
src/mqt/predictor/rl/actions/bqskit_actions.py
src/mqt/predictor/rl/actions/qiskit_actions.py
src/mqt/predictor/rl/predictorenv.py
tests/compilation/test_helper_rl.py
tests/compilation/test_integration_further_SDKs.py
tests/compilation/test_predictor_rl.py

burgholzer

Just quickly browsed through. LGTM modulo some small comments that should hopefully be easy to resolve

Co-authored-by: Lukas Burgholzer <burgholzer@me.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

burgholzer

LGTM

burgholzer · 2026-06-24T20:50:04Z

@CodeRabbit review

coderabbitai · 2026-06-24T20:50:10Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/mqt/predictor/rl/predictorenv.py (1)
193-198: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Restore per-step shaping for Hellinger rewards.

Non-terminal steps always return 0, so estimated_hellinger_distance still only contributes at termination. Please preserve terminal reward behavior while adding the per-step shaping signal for this reward mode.

Based on learnings, estimated_hellinger_distance should compute a per-step shaping signal analogous to the other shaped reward functions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mqt/predictor/rl/predictorenv.py` around lines 193 - 198, The step
handling in predictorenv.py only assigns reward from calculate_reward() on
termination, so estimated_hellinger_distance never provides per-step shaping.
Update the action handling path around the reward assignment to preserve the
terminal reward behavior while also computing and adding a per-step shaping
signal when that reward mode is active, following the same pattern used by the
other shaped reward functions in the environment.
Source: Learnings
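
As a rough illustration of what this comment asks for, the sketch below returns the change in a figure of merit on non-terminal steps and the full value at termination. It is not the environment's code: `ShapedRewardTracker`, its fields, and the difference-based shaping rule are assumptions chosen for illustration, and the real pattern used by the other shaped reward functions may differ.

```python
from dataclasses import dataclass


@dataclass
class ShapedRewardTracker:
    """Hypothetical helper; not part of mqt.predictor."""

    prev_value: float = 0.0  # figure of merit after the previous step

    def step_reward(self, value: float, *, terminated: bool) -> float:
        """Return the reward for one environment step.

        Args:
            value: Figure of merit of the circuit after the current action.
            terminated: Whether the episode ends with this step.

        Returns:
            The full figure of merit on termination, otherwise the
            improvement over the previous step.
        """
        if terminated:
            # Terminal behavior is preserved: report the final value.
            return value
        # Per-step shaping: reward only the change caused by this action.
        delta = value - self.prev_value
        self.prev_value = value
        return delta
```

For example, a tracker that starts at 0.2 yields roughly 0.3 for a non-terminal step reaching 0.5, and 0.7 for a terminal step reaching 0.7.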
tests/compilation/test_predictor_rl.py (1)
190-193: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Verify the private-registry access is explicitly annotated.

Lines 190-193 reach into actions_module._ACTIONS. In this repo's tests, that private-member access is expected to carry # noqa: SLF001 so the intent stays documented even if Ruff later reports the directive as unused. Based on learnings, tests under tests/ intentionally keep # noqa: SLF001 on private-member access for clarity.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/compilation/test_predictor_rl.py` around lines 190 - 193, The test
setup in test_register_action accesses the private actions_module._ACTIONS
registry without the required explicit annotation. Update the monkeypatch line
in test_register_action so the private-member access is clearly documented with
a # noqa: SLF001 comment, matching the repo’s test convention for private
registry access.
Source: Learnings
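
The convention described here can be sketched as follows. The snippet uses a `SimpleNamespace` as a stand-in for the real `actions_module` and `unittest.mock.patch.dict` instead of pytest's `monkeypatch`, so that it runs without the project installed. The function name and registry contents are hypothetical.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the real module, which keeps its registry in a private dict.
actions_module = SimpleNamespace(_ACTIONS={"existing": object()})


def check_register_action_sketch() -> None:
    """Temporarily empty the private registry, then verify it is restored."""
    # Each private-member access carries an explicit suppression so the
    # intent stays documented in the test.
    with mock.patch.dict(actions_module._ACTIONS, clear=True):  # noqa: SLF001
        actions_module._ACTIONS["new"] = object()  # noqa: SLF001
        assert list(actions_module._ACTIONS) == ["new"]  # noqa: SLF001
    # Leaving the context manager restores the original contents.
    assert list(actions_module._ACTIONS) == ["existing"]  # noqa: SLF001
```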

♻️ Duplicate comments (1)

src/mqt/predictor/rl/actions/bqskit_actions.py (1)

302-305: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Add Google-style Args/Returns sections.

This new helper has a keyword-only argument and boolean return contract, but the docstring is still a one-liner.

As per coding guidelines, **/*.py: Use Google-style docstrings in Python code.

Proposed docstring update

 def is_bqskit_action_available(*, has_parameterized_gates: bool) -> bool:
-    """Return whether a BQSKit action is available for the current circuit state."""
+    """Return whether a BQSKit action is available for the current circuit state.
+
+    Args:
+        has_parameterized_gates: Whether the current circuit contains parameterized gates.
+
+    Returns:
+        True if BQSKit can process the current circuit state; otherwise, False.
+    """
     # BQSKit does not support parameterized gates
     return not has_parameterized_gates

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mqt/predictor/rl/actions/bqskit_actions.py` around lines 302 - 305, The
helper is missing a Google-style docstring for its keyword-only boolean API.
Update the docstring on is_bqskit_action_available to include Args for
has_parameterized_gates and Returns for the bool result, keeping the description
aligned with the current logic.

Source: Coding guidelines
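
To show how such an availability helper might feed into action masking, here is a small sketch. `is_bqskit_action_available` follows the diff above. `build_action_mask` and the use of plain SDK name strings are hypothetical and only illustrate the idea.

```python
def is_bqskit_action_available(*, has_parameterized_gates: bool) -> bool:
    """Return whether a BQSKit action is available for the current circuit state.

    Args:
        has_parameterized_gates: Whether the current circuit contains parameterized gates.

    Returns:
        True if BQSKit can process the current circuit state; otherwise, False.
    """
    # BQSKit does not support parameterized gates
    return not has_parameterized_gates


def build_action_mask(action_sdks: list[str], *, has_parameterized_gates: bool) -> list[bool]:
    """Hypothetical mask builder: disable BQSKit actions when unavailable.

    Args:
        action_sdks: SDK name for each action, in action-index order.
        has_parameterized_gates: Whether the current circuit contains parameterized gates.

    Returns:
        One boolean per action; False marks an action that must be masked out.
    """
    bqskit_ok = is_bqskit_action_available(has_parameterized_gates=has_parameterized_gates)
    return [sdk != "bqskit" or bqskit_ok for sdk in action_sdks]
```

With the SDK list `["qiskit", "bqskit", "tket"]`, a circuit containing parameterized gates masks out only the second entry, while a circuit without them leaves every action enabled.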

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 193-198: The step handling in predictorenv.py only assigns reward
from calculate_reward() on termination, so estimated_hellinger_distance never
provides per-step shaping. Update the action handling path around the reward
assignment to preserve the terminal reward behavior while also computing and
adding a per-step shaping signal when that reward mode is active, following the
same pattern used by the other shaped reward functions in the environment.

In `@tests/compilation/test_predictor_rl.py`:
- Around line 190-193: The test setup in test_register_action accesses the
private actions_module._ACTIONS registry without the required explicit
annotation. Update the monkeypatch line in test_register_action so the
private-member access is clearly documented with a # noqa: SLF001 comment,
matching the repo’s test convention for private registry access.

---

Duplicate comments:
In `@src/mqt/predictor/rl/actions/bqskit_actions.py`:
- Around line 302-305: The helper is missing a Google-style docstring for its
keyword-only boolean API. Update the docstring on is_bqskit_action_available to
include Args for has_parameterized_gates and Returns for the bool result,
keeping the description aligned with the current logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: af5b8f25-9836-49b3-9d6c-497c948b109d

📥 Commits

Reviewing files that changed from the base of the PR and between b3f9f06 and 2ecf4ad.

📒 Files selected for processing (4)

src/mqt/predictor/rl/actions/__init__.py
src/mqt/predictor/rl/actions/bqskit_actions.py
src/mqt/predictor/rl/predictorenv.py
tests/compilation/test_predictor_rl.py

flowerthrower and others added 17 commits March 11, 2026 12:24

Merge commit '90ec2cf' into fix-RL-training-bug

4a0ed8d

🎨 improve seed and training defaults

68eb338

🎨 adjust test step limits

f9de637

⏪ revert unrelated changes

55e5e08

🎨 pre-commit fixes

ba9042d

🎨 improve comments

dca4827

✅ fix synthesis size limit for bqskit passes

1e523e1

🎨 pre-commit fixes

6d6487a

🎨 reduce test training overhead

7a300a2

🎨 add comments

d64a97f

🎨 reduce number of training steps

9f2697e

🚚 move actions imports

f685fdd

🚚 move parsing logic

74a0f9a

🎨 add changelog entry

512d2b0

✅ add tests

c398324

🎨 add changelog entry

010fa68

flowerthrower linked an issue May 12, 2026 that may be closed by this pull request

🎨 improve action/pass imports and wrappers #668

Open

pre-commit-ci Bot and others added 4 commits May 12, 2026 14:35

🎨 pre-commit fixes

59bdf52

🔀 pull changes from #679

2dd86dd

Merge commit '59bdf52374e47aaf2ba6107453430adf37c458b9' into 668-impr…

ad83a28

…ove-action-pass-imports-and-wrappers

🎨 improve docstring

86d078a

flowerthrower and others added 7 commits May 29, 2026 16:00

Merge remote-tracking branch 'origin/main' into fix-RL-training-bug

788ec25

🎨 imporve error reporting

a474a8f

✅ improve coverage

51b20af

✅ fix test for qiskit<2

7e9a369

🎨 pre-commit fixes

0694749

Merge branch 'main' into fix-RL-training-bug

ce7491a

Merge branch 'fix-RL-training-bug' into 668-improve-action-pass-impor…

067c6de

…ts-and-wrappers

flowerthrower and others added 3 commits June 8, 2026 15:16

🎨 calm down rabbit complaining about Arguments

aa24d03

🎨 improve tests

93a714d

Merge branch 'main' into 668-improve-action-pass-imports-and-wrappers

09389f3

flowerthrower marked this pull request as ready for review June 8, 2026 14:04

flowerthrower added 6 commits June 8, 2026 16:32

🎨 remove import forwarding

69f42b3

🎨 improve tests

846bb39

⏪ revert out-of-scope changes

e9eec21

⏪ revert out-of-scope changes

b8dce82

Merge commit 'be0d490fb5ef5731cf32ca2b2efde1f5b6c7ddf9' into 668-impr…

46414ea

…ove-action-pass-imports-and-wrappers

🔥 remove dead helper

b3f9f06

coderabbitai Bot requested changes Jun 19, 2026

Comment thread tests/compilation/test_predictor_rl.py Outdated

flowerthrower and others added 2 commits June 19, 2026 17:15

🎨 use from import

c68a588

Merge branch 'main' into 668-improve-action-pass-imports-and-wrappers

1d4c9da

flowerthrower requested a review from burgholzer June 22, 2026 10:13

burgholzer approved these changes Jun 22, 2026

Comment thread src/mqt/predictor/rl/predictorenv.py Outdated

Comment thread src/mqt/predictor/rl/actions/__init__.py Outdated

Comment thread src/mqt/predictor/rl/actions/bqskit_actions.py Outdated

Comment thread src/mqt/predictor/rl/actions/qiskit_actions.py

flowerthrower and others added 8 commits June 22, 2026 22:37

Merge commit '1d4c9daeff4bd9bf0f8af2918ee04c7665144ed0' into 668-impr…

9ebad50

…ove-action-pass-imports-and-wrappers

🎨 pre-commit fixes

8f2fd9c

Apply suggestions from code review

d4e4db1

Co-authored-by: Lukas Burgholzer <burgholzer@me.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

🎨 pre-commit fixes

c50584d

🐛 fix bqskit layout bug

373023c

Merge branch 'main' into 668-improve-action-pass-imports-and-wrappers

a26665f

Merge commit 'a26665f61d4e8911c02d216a8cf46f66568f8545' into 668-impr…

0a55d27

…ove-action-pass-imports-and-wrappers

🎨 fix broken import names

2ecf4ad

burgholzer approved these changes Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

coderabbitai Bot approved these changes Jun 24, 2026

♻️ Refactor RL actions by SDK #680
flowerthrower wants to merge 65 commits into main from 668-improve-action-pass-imports-and-wrappers

2 participants
