Feat/medkit integration#1
Open
mfaferek93 wants to merge 15 commits into
Open
Conversation
Declared in package.xml and CMakeLists.txt but never #included anywhere in src/, include/ or test/. Build is green without it; manymove_bringup keeps the dep where it is actually used.
All bt_client_*, collision_test and tutorials shared "bt_client_node", so source_id was indistinguishable across them. Each now matches its executable; the static logger in tree_helper.cpp and two log-level remaps in manymove_bringup launch files moved with them.
No CMake option, no conditional <depend>: this fork exists for the medkit integration. If we ever upstream to pastoriomarco/manymove, put it behind a toggle so vanilla users do not pull medkit transitively.
…tall helper Three pieces for BT action node instrumentation: fault_codes.hpp (MANYMOVE_* constants), fault_reporting.hpp (the FaultReporting capability class), and installFaultReporter in main_imports_helper.hpp wired into all 15 BT entry-point mains. Capability class instead of direct FaultReporter member because BT.CPP factory constructs nodes with fixed signatures, so we pull the reporter from the blackboard in the FaultReporting ctor (same pattern as `node_` already fetched in every action node).
Adds FaultReporting as a second base class to every Action node across planner/objects/signals/gripper/logic/isaac, plus the matching ctor init list call. Condition and decorator nodes are intentionally left out: their FAILURE return is normal control flow.
…austion Four reportFault sites in MoveManipulatorAction: - kPlannerCollisionDetected on the onStart collision branch - kPlannerEstopTriggered when stop_execution is set before motion - kPlannerRetryAttempt on every failed attempt (soft fault, throttled locally by LocalFilter) - kPlannerRetriesExhausted once max_tries is reached reportFaultPassed pairs kPlannerRetryAttempt in the success branch so the local filter resets when a retry eventually succeeds. Skipped the input/blackboard-missing FAILURE returns (lines 96, 147, 174 and ResetTrajectories@356); those are programmer errors caught at startup, not operational faults.
…ait timeout Five reportFault sites covering SetOutput/GetInput action failures, CheckRobotState NOT_READY (CRITICAL), WaitForInput server-unavailable and the wait-timeout path. reportFaultPassed pairs kRobotNotReady on CheckRobotState success and kSignalWaitInputTimeout on the WaitForInput desired-value branch so LocalFilter resets cleanly.
…timeout Five reportFault sites cover the operational failures of AddCollisionObjectAction, RemoveCollisionObjectAction, AttachDetachObjectAction, GetObjectPoseAction and WaitForObjectAction. WaitForObject pairs reportFaultPassed in the success branch. CheckObjectExistsAction is intentionally not instrumented: its FAILURE on \"object missing\" is normal control flow used by callers as a condition, not an operational fault.
Four reportFault sites: GripperCommand server timeout and goal-not-reached, GripperTraj server unavailable and trajectory-failed result. PublishJointStateAction throws on bad inputs and never returns FAILURE, so it has nothing to instrument.
GetLinkPoseAction reports kTfLookupFailed when tf2 raises a TransformException. WaitForKeyBool reports kWaitKeyTimeout on the timeout branch and pairs reportFaultPassed in the success path. Adds kWaitKeyTimeout to fault_codes.hpp. Configuration-error FAILUREs (missing inputs, wrong vector sizes) are left out, as are condition-style nodes (CheckPoseDistance, CheckPoseBounds, CheckKeyBoolValue) where FAILURE is expected control flow.
Two reportFault sites in FoundationPoseAlignmentNode: detection topic silent for too long, and no detection passing the target/min_score filter within the timeout. Both raise kIsaacFoundationPoseFailed. GetEntityPoseNode/SetEntityPoseNode service-error paths and Isaac TF timeouts are left for the isaac demo work in Phase 3, where they'll have specific fault codes once the demo wiring exposes them.
…ager fake_fault_manager.hpp is a small fixture that hosts the /fault_manager/report_fault service and records every accepted request, matching the existing FakeXxxServer pattern in this test directory. test_fault_reporting.cpp uses it to verify the end-to-end plumbing: - installFaultReporter puts a reporter on the blackboard - FaultReporting::reportFault forwards FAILED events with the right fault_code, severity, description and source_id (FQN of the BT client node) - reportFaultPassed forwards a PASSED event for healing - mixin silently no-ops when no reporter is installed (e.g. unit tests building a tree without bt_client scaffolding) Per-instrumentation-site assertions are out of scope here; those would need to drive every BT action node through real action server fakes and belong with the demo work in Phase 3.
- docs/FAULT_CODES.md: full table of MANYMOVE_* codes with severity, trigger condition and pairing semantics, grouped by subsystem. - README.md: short note pointing fork users at the medkit dependency and the catalogue. - CHANGELOG.rst: entry under Forthcoming summarising the BT node rename, the hard medkit dep and the dropped topic_based_ros2_control declaration. - ament_uncrustify --reformat run across the modified manymove_cpp_trees sources. - Lint fixes for new files (cpplint header order / memory include / guard style; ament_copyright BSD-3 boilerplate match). The remaining test_action_nodes_gripper failure is a pre-existing convertFromString registration issue in upstream manymove (verified by running the test from the parent commit before any medkit changes).
After wiring ros2_medkit_fault_reporter into manymove_cpp_trees the CI job needs the medkit packages on the runner. Pulling them via apt from rosdistro keeps the workflow simple (no source overlay clone).
- Wire kRobotResetFailed across ResetRobotStateAction failure branches (goal rejection + non-SUCCEEDED result for reset, unload-traj and load-traj steps). - Instrument Isaac TF transform timeouts with kIsaacFoundationPoseFailed to match the sibling no-message and no-valid-detection timeouts. - Replace silent FAILURE in GripperTrajAction::onRunning !goal_sent_ with BT::RuntimeError, matching the programmer-error policy. - Add reportFaultPassed on onHalted for WaitForInputAction and WaitForObjectAction so abandoned waits do not leave kSignalWaitInputTimeout / kObjectWaitTimeout lingering in FaultManager. - Drop orphan codes kPlannerMissingMoveId (programmer-error path uses throw) and kObjectExistsCheckFailed (CheckObjectExists is intentionally not instrumented; FAILURE is normal control flow there). - Pin medkit apt packages to MEDKIT_VERSION=0.4.0-1 so an upstream bump does not silently change CI behaviour. - Drop humble from the CI matrix until ros-humble-ros2-medkit-* debs appear on packages.ros.org (rosdistro entry exists; debs do not yet). - Update docs/FAULT_CODES.md: add kRobotResetFailed, expand the Isaac FoundationPose entry to cover TF timeouts, and document the reportFaultPassed-on-halt pairing for both wait codes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wires
ros2_medkit_fault_reporterintomanymove_cpp_treesso every BT action node reports operational faults directly to aFaultManager, instead of routing through/diagnostics. This is the native-instrumentation path the medkit upstream design recommends.What changes
Plumbing
fault_codes.hpp(101 lines): catalogue ofMANYMOVE_*constants grouped by subsystem.fault_reporting.hpp(110 lines):FaultReportingcapability class. Pulls the reporter from the blackboard, same pattern action nodes already use fornode_.installFaultReporterhelper inmain_imports_helper.hpp, wired into all 15 BT executable mains.package.xml+CMakeLists.txt: hard dep onros2_medkit_fault_reporterandros2_medkit_msgs. No CMake toggle (fork-only).Instrumentation (21 action node classes)
reportFaultPassed), retries exhaustedCondition and decorator nodes are intentionally left out: their
FAILUREreturn is normal control flow. Programmer-errorFAILUREreturns (missing inputs, wrong vector sizes) are also skipped.Tests
fake_fault_manager.hpp: matches the existingFakeXxxServerpattern in the test tree.test_fault_reporting.cpp: end-to-end roundtrip covering reporter install, FAILED forwarding, PASSED healing and the silent no-op path when no reporter is present.Docs + CI
docs/FAULT_CODES.md: full catalogue with severity, trigger and pairing semantics.README.md+CHANGELOG.rstentries.ros2_medkit_*debs from rosdistro so the overlay builds.Cleanup
bt_client_*,collision_testand tutorials get unique node names sosource_idis distinguishable per executable.topic_based_ros2_controldeclaration frommanymove_cpp_trees(kept where actually used inmanymove_bringup).Notes
test_action_nodes_gripperfailure (convertFromString registration) is upstream and unrelated to this change.