♻️ Refactor RL actions by SDK#680
Conversation
## Description This PR addresses critical bugs in the RL training process with the following key changes: **Structure Improvements:** - **Redesigned action validation logic** (`predictorenv.py`): Rewrote `determine_valid_actions_for_state()` with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations. - Added helper methods `is_circuit_laid_out()` and `is_circuit_routed()` to replace the buggy `CheckMap` pass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode. - **Fixed type annotation** (`actions.py`): Corrected `do_while` parameter type from `dict[str, Circuit]` to `PropertySet` and added missing import for Qiskit's `PropertySet`. - **Added reproducibility** (`predictor.py`): Set random seed for non-test training runs to ensure reproducible results. - **Improved VF2Layout error handling** (`predictorenv.py`): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training. **Test Updates:** - Suppressed deprecation warnings in tket routing test --------- Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com> Co-authored-by: flowerthrower <flowerthrower@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ove-action-pass-imports-and-wrappers
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/compilation/test_predictor_rl.py`:
- Around line 25-26: The import statement for qiskit_actions uses an alias that
matches the module name, which Ruff suggests simplifying per PLR0402. Replace
the import statement `import mqt.predictor.rl.actions.qiskit_actions as
qiskit_actions` with a from-import style: `from mqt.predictor.rl.actions import
qiskit_actions`. This eliminates the redundant alias and improves code clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 73f17e94-e81a-4060-a17a-eb0063efc4fa
📒 Files selected for processing (8)
CHANGELOG.mdsrc/mqt/predictor/rl/actions/__init__.pysrc/mqt/predictor/rl/actions/bqskit_actions.pysrc/mqt/predictor/rl/actions/qiskit_actions.pysrc/mqt/predictor/rl/predictorenv.pytests/compilation/test_helper_rl.pytests/compilation/test_integration_further_SDKs.pytests/compilation/test_predictor_rl.py
burgholzer
left a comment
There was a problem hiding this comment.
Just quickly browsed through. LGTM modulo some small comments that should hopefully be easy to resolve
…ove-action-pass-imports-and-wrappers
Co-authored-by: Lukas Burgholzer <burgholzer@me.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
…ove-action-pass-imports-and-wrappers
|
@CodeRabbit review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/mqt/predictor/rl/predictorenv.py (1)
193-198: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy liftRestore per-step shaping for Hellinger rewards.
Non-terminal steps always return
0, soestimated_hellinger_distancestill only contributes at termination. Please preserve terminal reward behavior while adding the per-step shaping signal for this reward mode.Based on learnings,
estimated_hellinger_distanceshould compute a per-step shaping signal analogous to the other shaped reward functions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mqt/predictor/rl/predictorenv.py` around lines 193 - 198, The step handling in predictorenv.py only assigns reward from calculate_reward() on termination, so estimated_hellinger_distance never provides per-step shaping. Update the action handling path around the reward assignment to preserve the terminal reward behavior while also computing and adding a per-step shaping signal when that reward mode is active, following the same pattern used by the other shaped reward functions in the environment.Source: Learnings
tests/compilation/test_predictor_rl.py (1)
190-193: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winVerify the private-registry access is explicitly annotated.
Lines 190-193 reach into
actions_module._ACTIONS. In this repo's tests, that private-member access is expected to carry# noqa: SLF001so the intent stays documented even if Ruff later reports the directive as unused. Based on learnings, tests undertests/intentionally keep# noqa: SLF001on private-member access for clarity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/compilation/test_predictor_rl.py` around lines 190 - 193, The test setup in test_register_action accesses the private actions_module._ACTIONS registry without the required explicit annotation. Update the monkeypatch line in test_register_action so the private-member access is clearly documented with a # noqa: SLF001 comment, matching the repo’s test convention for private registry access.Source: Learnings
♻️ Duplicate comments (1)
src/mqt/predictor/rl/actions/bqskit_actions.py (1)
302-305: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAdd Google-style
Args/Returnssections.This new helper has a keyword-only argument and boolean return contract, but the docstring is still a one-liner.
As per coding guidelines,
**/*.py: Use Google-style docstrings in Python code.Proposed docstring update
def is_bqskit_action_available(*, has_parameterized_gates: bool) -> bool: - """Return whether a BQSKit action is available for the current circuit state.""" + """Return whether a BQSKit action is available for the current circuit state. + + Args: + has_parameterized_gates: Whether the current circuit contains parameterized gates. + + Returns: + True if BQSKit can process the current circuit state; otherwise, False. + """ # BQSKit does not support parameterized gates return not has_parameterized_gates🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mqt/predictor/rl/actions/bqskit_actions.py` around lines 302 - 305, The helper is missing a Google-style docstring for its keyword-only boolean API. Update the docstring on is_bqskit_action_available to include Args for has_parameterized_gates and Returns for the bool result, keeping the description aligned with the current logic.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 193-198: The step handling in predictorenv.py only assigns reward
from calculate_reward() on termination, so estimated_hellinger_distance never
provides per-step shaping. Update the action handling path around the reward
assignment to preserve the terminal reward behavior while also computing and
adding a per-step shaping signal when that reward mode is active, following the
same pattern used by the other shaped reward functions in the environment.
In `@tests/compilation/test_predictor_rl.py`:
- Around line 190-193: The test setup in test_register_action accesses the
private actions_module._ACTIONS registry without the required explicit
annotation. Update the monkeypatch line in test_register_action so the
private-member access is clearly documented with a # noqa: SLF001 comment,
matching the repo’s test convention for private registry access.
---
Duplicate comments:
In `@src/mqt/predictor/rl/actions/bqskit_actions.py`:
- Around line 302-305: The helper is missing a Google-style docstring for its
keyword-only boolean API. Update the docstring on is_bqskit_action_available to
include Args for has_parameterized_gates and Returns for the bool result,
keeping the description aligned with the current logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: af5b8f25-9836-49b3-9d6c-497c948b109d
📒 Files selected for processing (4)
src/mqt/predictor/rl/actions/__init__.pysrc/mqt/predictor/rl/actions/bqskit_actions.pysrc/mqt/predictor/rl/predictorenv.pytests/compilation/test_predictor_rl.py
Description
Restructures RL actions into SDK-specific modules for Qiskit, TKET, and BQSKit. Also moves the corresponding wrapper logic closer to each SDK action implementation and keeps compatibility exports for existing imports.
This immensely improves modularity and reduces SDK-dependent patches scattered across RL modules.
It also enables much cleaner and proper testing of pass invariants.
actions.pyto SDK-specific action modules.rl/actions/__init__.py.PredictorEnv.parsing.py; conversion/layout helpers now live with the owning SDK.Fixes #668
Fixes #66
Assisted by: GPT5.5 via Codex
Checklist
If PR contains AI-assisted content:
Assisted-by: [Model Name] via [Tool Name]footer.