
♻️ Refactor RL actions by SDK#680

Open
flowerthrower wants to merge 65 commits into
mainfrom
668-improve-action-pass-imports-and-wrappers

Conversation

@flowerthrower

@flowerthrower flowerthrower commented May 12, 2026

Collaborator

Description

Restructures RL actions into SDK-specific modules for Qiskit, TKET, and BQSKit. It also moves each SDK's wrapper logic next to the corresponding action implementation and keeps compatibility exports so that existing imports continue to work.

This substantially improves modularity and reduces the number of SDK-dependent patches scattered across the RL modules.
It also enables cleaner, more direct testing of pass invariants.

  • Moved individual RL passes from actions.py to SDK-specific action modules.
  • Moved shared action types and registration to rl/actions/__init__.py.
  • Moved SDK-specific pass runners, layout handling, and masking logic out of PredictorEnv.
  • Removed parsing.py; conversion/layout helpers now live with the owning SDK.
  • Updated tests to verify that SDK actions behave as intended, including invariant checks.

Fixes #668
Fixes #66

Assisted by: GPT5.5 via Codex

Checklist

  • The pull request only contains commits that are focused and relevant to this change.
  • I have added appropriate tests that cover the new/changed functionality.
  • I have updated the documentation to reflect these changes.
  • I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
  • I have added migration instructions to the upgrade guide (if needed).
  • The changes follow the project's style guidelines and introduce no new warnings.
  • The changes are fully tested and pass the CI checks.
  • I have reviewed my own code changes.

If PR contains AI-assisted content:

  • I have disclosed the use of AI tools in the PR description as per our AI Usage Guidelines.
  • AI-assisted commits include an Assisted-by: [Model Name] via [Tool Name] footer.
  • I confirm that I have personally reviewed and understood all AI-generated content, and accept full responsibility for it.

flowerthrower and others added 17 commits March 11, 2026 12:24
## Description

This PR addresses critical bugs in the RL training process with the
following key changes:

**Structure Improvements:**
- **Redesigned action validation logic** (`predictorenv.py`): Rewrote
`determine_valid_actions_for_state()` with a more structured (but
equivalent) state machine that explicitly tracks three circuit states
(synthesized, laid_out, routed) and handles 6 different state
combinations.
- Added helper methods `is_circuit_laid_out()` and `is_circuit_routed()`
to replace the buggy `CheckMap` pass with more reliable state checking.
The new logic supports both the original restricted MDP and a flexible
general MDP mode.

- **Fixed type annotation** (`actions.py`): Corrected `do_while`
parameter type from `dict[str, Circuit]` to `PropertySet` and added
missing import for Qiskit's `PropertySet`.

- **Added reproducibility** (`predictor.py`): Set random seed for
non-test training runs to ensure reproducible results.

- **Improved VF2Layout error handling** (`predictorenv.py`): Replaced
assertion failures with warning logs when VF2Layout doesn't find a
solution, preventing crashes during training.
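The redesigned `determine_valid_actions_for_state()` and the `is_circuit_laid_out()` / `is_circuit_routed()` helpers described above can be illustrated with a toy model. This is a sketch under explicit assumptions, not the PR's implementation: a circuit is reduced to a synthesized flag, an optional layout, and its two-qubit interactions, and the action categories allowed in each state are invented for illustration. In this sketch, routing presupposes a layout, so the flags combine into six reachable states (synthesized or not, times not laid out, laid out, or routed).

```python
"""Toy sketch of a state-based action mask for a compilation MDP.

Assumptions (not taken from the PR): a circuit is reduced to a "synthesized"
flag, an optional virtual->physical layout, and its two-qubit interactions;
the action categories and allowed transitions are illustrative only.
"""

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CircuitState:
    synthesized: bool  # every gate is in the device's native gate set
    layout: dict[int, int] | None = None  # virtual -> physical qubit; None = not laid out
    two_qubit_gates: list[tuple[int, int]] = field(default_factory=list)  # virtual qubit pairs


def is_circuit_laid_out(state: CircuitState) -> bool:
    """A circuit counts as laid out once a layout has been assigned."""
    return state.layout is not None


def is_circuit_routed(state: CircuitState, coupling: set[tuple[int, int]]) -> bool:
    """Laid out, and every two-qubit gate acts on a coupled pair of physical qubits."""
    if state.layout is None:
        return False
    for a, b in state.two_qubit_gates:
        p, q = state.layout[a], state.layout[b]
        if (p, q) not in coupling and (q, p) not in coupling:
            return False
    return True


def determine_valid_actions_for_state(state: CircuitState, coupling: set[tuple[int, int]]) -> set[str]:
    """Return the action categories allowed in the current state."""
    synthesized = state.synthesized
    laid_out = is_circuit_laid_out(state)
    routed = is_circuit_routed(state, coupling)  # routed implies laid_out: 2 x 3 = 6 combinations

    if not laid_out:
        return {"layout", "optimization"} if synthesized else {"synthesis", "optimization"}
    if not routed:
        return {"routing"} if synthesized else {"synthesis"}
    # Laid out and routed: in this toy MDP, any further step is assumed to preserve the mapping.
    return {"optimization", "terminate"} if synthesized else {"synthesis"}
```

Checking coupling directly, as `is_circuit_routed` does here, is the kind of explicit state check that can stand in for a pass-based check such as `CheckMap`; the real helpers may differ.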
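The VF2Layout change listed above (a warning instead of an assertion) follows a common pattern: treat a failed layout search as a recoverable outcome of a single step rather than a fatal error. A minimal sketch, assuming a plain dict stands in for the pass's property set and that a found layout is stored under a `"layout"` key; `take_vf2_layout` is a hypothetical helper name.

```python
"""Hypothetical sketch: log a warning instead of asserting when no layout is found."""

from __future__ import annotations

import logging

logger = logging.getLogger("vf2_fallback_sketch")


def take_vf2_layout(property_set: dict[str, dict[int, int]]) -> dict[int, int] | None:
    """Return the layout found by a VF2-style pass, or None (with a warning) if there is none.

    The dict-based property set and the "layout" key are assumptions for this sketch.
    """
    layout = property_set.get("layout")
    if layout is None:
        # An assertion here would abort the whole training run; a warning lets the episode continue.
        logger.warning("VF2Layout found no solution; leaving the circuit unchanged for this step.")
        return None
    return layout
```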

**Test Updates:**
- Suppressed deprecation warnings in tket routing test
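Suppressing a third-party deprecation warning can be scoped to a single test with the standard `warnings` module, as in this sketch. `legacy_route` is a hypothetical stand-in for the deprecated SDK call, not a TKET function.

```python
"""Sketch: silencing an SDK's DeprecationWarning inside one test only."""

import warnings


def legacy_route(circuit: list[str]) -> list[str]:
    """Hypothetical stand-in for a routing call that emits a DeprecationWarning."""
    warnings.warn("this routing API is deprecated", DeprecationWarning, stacklevel=2)
    return circuit


def test_legacy_route() -> None:
    # The filter change is undone when the `with` block exits, so other tests still see the warning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        assert legacy_route(["cx q0 q1"]) == ["cx q0 q1"]
```

In a pytest suite, the `@pytest.mark.filterwarnings("ignore::DeprecationWarning")` marker achieves the same per-test scoping declaratively.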

---------

Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
Co-authored-by: flowerthrower <flowerthrower@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@flowerthrower flowerthrower linked an issue May 12, 2026 that may be closed by this pull request
@codecov

codecov Bot commented May 12, 2026


Codecov Report

❌ Patch coverage is 94.07407% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/mqt/predictor/rl/actions/qiskit_actions.py | 90.2% | 7 Missing ⚠️ |
| src/mqt/predictor/rl/predictorenv.py | 88.0% | 5 Missing ⚠️ |
| src/mqt/predictor/rl/actions/bqskit_actions.py | 93.1% | 4 Missing ⚠️ |
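Patch coverage is the share of coverable changed lines that tests execute, so the reported percentage and the missing-line count together imply the size of the patch. The sketch below back-computes it; the totals are derived from the report's numbers, not stated in the report.

```python
# Missing (uncovered) changed lines per file, as listed in the report.
missing_by_file = {
    "src/mqt/predictor/rl/actions/qiskit_actions.py": 7,
    "src/mqt/predictor/rl/predictorenv.py": 5,
    "src/mqt/predictor/rl/actions/bqskit_actions.py": 4,
}
reported_coverage = 0.9407407  # "94.07407%"

missing = sum(missing_by_file.values())
# coverage = covered / total and missing = total - covered, so total = missing / (1 - coverage).
total = round(missing / (1 - reported_coverage))
covered = total - missing
print(f"{covered}/{total} changed lines covered ({covered / total:.5%})")
# prints: 254/270 changed lines covered (94.07407%)
```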


@flowerthrower flowerthrower marked this pull request as ready for review June 8, 2026 14:04
@flowerthrower

Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/compilation/test_predictor_rl.py`:
- Around line 25-26: The import statement for qiskit_actions uses an alias that
matches the module name, which Ruff suggests simplifying per PLR0402. Replace
the import statement `import mqt.predictor.rl.actions.qiskit_actions as
qiskit_actions` with a from-import style: `from mqt.predictor.rl.actions import
qiskit_actions`. This eliminates the redundant alias and improves code clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 73f17e94-e81a-4060-a17a-eb0063efc4fa

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec22c7 and b3f9f06.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • src/mqt/predictor/rl/actions/__init__.py
  • src/mqt/predictor/rl/actions/bqskit_actions.py
  • src/mqt/predictor/rl/actions/qiskit_actions.py
  • src/mqt/predictor/rl/predictorenv.py
  • tests/compilation/test_helper_rl.py
  • tests/compilation/test_integration_further_SDKs.py
  • tests/compilation/test_predictor_rl.py

Comment thread tests/compilation/test_predictor_rl.py Outdated
@flowerthrower flowerthrower requested a review from burgholzer June 22, 2026 10:13

@burgholzer burgholzer left a comment

Member


Just quickly browsed through. LGTM modulo some small comments that should hopefully be easy to resolve.

Comment thread src/mqt/predictor/rl/predictorenv.py Outdated
Comment thread src/mqt/predictor/rl/actions/__init__.py Outdated
Comment thread src/mqt/predictor/rl/actions/bqskit_actions.py Outdated
Comment thread src/mqt/predictor/rl/actions/qiskit_actions.py

@burgholzer burgholzer left a comment

Member


LGTM

@burgholzer

Member

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.


@coderabbitai coderabbitai Bot left a comment

Contributor


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/mqt/predictor/rl/predictorenv.py (1)

193-198: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Restore per-step shaping for Hellinger rewards.

Non-terminal steps always return 0, so estimated_hellinger_distance still only contributes at termination. Please preserve terminal reward behavior while adding the per-step shaping signal for this reward mode.

Based on learnings, estimated_hellinger_distance should compute a per-step shaping signal analogous to the other shaped reward functions.
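One common way to add the requested per-step signal without touching the terminal reward is a potential-based difference: each non-terminal step is rewarded by how much the estimated distance dropped since the previous step. The sketch below assumes the environment can estimate the Hellinger distance after every action; `HellingerShaping`, `reset`, and `reward` are hypothetical names, and whether this matches the pattern of the environment's other shaped rewards is not confirmed here.

```python
"""Hypothetical per-step shaping for a distance-style reward (lower is better).

Assumptions: the environment can estimate the Hellinger distance of the current
circuit at every step, and the terminal reward is computed elsewhere and must
stay unchanged. Names are illustrative, not the environment's actual API.
"""

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HellingerShaping:
    previous_estimate: float = 0.0

    def reset(self, initial_estimate: float) -> None:
        """Store the estimate for the initial circuit at episode start."""
        self.previous_estimate = initial_estimate

    def reward(self, estimate: float, *, terminated: bool, terminal_reward: float) -> float:
        """Return the terminal reward unchanged, else the per-step improvement of the estimate."""
        if terminated:
            return terminal_reward
        # Potential-based difference with potential -distance and discount 1:
        # positive when the estimated distance drops, negative when it grows.
        shaping = self.previous_estimate - estimate
        self.previous_estimate = estimate
        return shaping
```

Because each term is the previous estimate minus the current one, the non-terminal rewards of an episode telescope to the initial estimate minus the last non-terminal estimate, so an agent gains nothing by making the estimate oscillate.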

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mqt/predictor/rl/predictorenv.py` around lines 193 - 198, The step
handling in predictorenv.py only assigns reward from calculate_reward() on
termination, so estimated_hellinger_distance never provides per-step shaping.
Update the action handling path around the reward assignment to preserve the
terminal reward behavior while also computing and adding a per-step shaping
signal when that reward mode is active, following the same pattern used by the
other shaped reward functions in the environment.

Source: Learnings

tests/compilation/test_predictor_rl.py (1)

190-193: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Verify the private-registry access is explicitly annotated.

Lines 190-193 reach into actions_module._ACTIONS. In this repo's tests, that private-member access is expected to carry # noqa: SLF001 so the intent stays documented even if Ruff later reports the directive as unused. Based on learnings, tests under tests/ intentionally keep # noqa: SLF001 on private-member access for clarity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/compilation/test_predictor_rl.py` around lines 190 - 193, The test
setup in test_register_action accesses the private actions_module._ACTIONS
registry without the required explicit annotation. Update the monkeypatch line
in test_register_action so the private-member access is clearly documented with
a # noqa: SLF001 comment, matching the repo’s test convention for private
registry access.

Source: Learnings

♻️ Duplicate comments (1)
src/mqt/predictor/rl/actions/bqskit_actions.py (1)

302-305: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Add Google-style Args/Returns sections.

This new helper has a keyword-only argument and boolean return contract, but the docstring is still a one-liner.

As per coding guidelines, **/*.py: Use Google-style docstrings in Python code.

Proposed docstring update
 def is_bqskit_action_available(*, has_parameterized_gates: bool) -> bool:
-    """Return whether a BQSKit action is available for the current circuit state."""
+    """Return whether a BQSKit action is available for the current circuit state.
+
+    Args:
+        has_parameterized_gates: Whether the current circuit contains parameterized gates.
+
+    Returns:
+        True if BQSKit can process the current circuit state; otherwise, False.
+    """
     # BQSKit does not support parameterized gates
     return not has_parameterized_gates
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mqt/predictor/rl/actions/bqskit_actions.py` around lines 302 - 305, The
helper is missing a Google-style docstring for its keyword-only boolean API.
Update the docstring on is_bqskit_action_available to include Args for
has_parameterized_gates and Returns for the bool result, keeping the description
aligned with the current logic.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: af5b8f25-9836-49b3-9d6c-497c948b109d

📥 Commits

Reviewing files that changed from the base of the PR and between b3f9f06 and 2ecf4ad.

📒 Files selected for processing (4)
  • src/mqt/predictor/rl/actions/__init__.py
  • src/mqt/predictor/rl/actions/bqskit_actions.py
  • src/mqt/predictor/rl/predictorenv.py
  • tests/compilation/test_predictor_rl.py


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🎨 improve action/pass imports and wrappers

2 participants