Skip to content

Feat/medkit integration#1

Open
mfaferek93 wants to merge 15 commits into
mainfrom
feat/medkit-integration
Open

Feat/medkit integration#1
mfaferek93 wants to merge 15 commits into
mainfrom
feat/medkit-integration

Conversation

@mfaferek93
Copy link
Copy Markdown

@mfaferek93 mfaferek93 commented May 12, 2026

Summary

Wires ros2_medkit_fault_reporter into manymove_cpp_trees so every BT action node reports operational faults directly to a FaultManager, instead of routing through /diagnostics. This is the native-instrumentation path the medkit upstream design recommends.

What changes

Plumbing

  • fault_codes.hpp (101 lines): catalogue of MANYMOVE_* constants grouped by subsystem.
  • fault_reporting.hpp (110 lines): FaultReporting capability class. Pulls the reporter from the blackboard, same pattern action nodes already use for node_.
  • installFaultReporter helper in main_imports_helper.hpp, wired into all 15 BT executable mains.
  • package.xml + CMakeLists.txt: hard dep on ros2_medkit_fault_reporter and ros2_medkit_msgs. No CMake toggle (fork-only).

Instrumentation (21 action node classes)

Subsystem Sites Notable
planner 4 collision, e-stop, retry attempt (paired with reportFaultPassed), retries exhausted
signals 5 IO set/get, robot NOT_READY (CRITICAL), wait-input timeout
objects 5 add / remove / attach / get-pose / wait-timeout
gripper 4 command + traj operational failures
logic 2 TF lookup, WaitForKeyBool timeout
isaac 2 FoundationPose detection timeouts

Condition and decorator nodes are intentionally left out: their FAILURE return is normal control flow. Programmer-error FAILURE returns (missing inputs, wrong vector sizes) are also skipped.

Tests

  • fake_fault_manager.hpp: matches the existing FakeXxxServer pattern in the test tree.
  • test_fault_reporting.cpp: end-to-end roundtrip covering reporter install, FAILED forwarding, PASSED healing and the silent no-op path when no reporter is present.

Docs + CI

  • docs/FAULT_CODES.md: full catalogue with severity, trigger and pairing semantics.
  • README.md + CHANGELOG.rst entries.
  • CI workflow installs ros2_medkit_* debs from rosdistro so the overlay builds.

Cleanup

  • bt_client_*, collision_test and tutorials get unique node names so source_id is distinguishable per executable.
  • Drop unused topic_based_ros2_control declaration from manymove_cpp_trees (kept where actually used in manymove_bringup).

Notes

  • The pre-existing test_action_nodes_gripper failure (convertFromString registration) is upstream and unrelated to this change.
  • Per-site assertions would require full action server fakes for every BT node; deferred to the demo integration work.

mfaferek93 added 14 commits May 9, 2026 22:14
Declared in package.xml and CMakeLists.txt but never #included anywhere
in src/, include/ or test/. Build is green without it; manymove_bringup
keeps the dep where it is actually used.
All bt_client_*, collision_test and tutorials shared "bt_client_node",
so source_id was indistinguishable across them. Each now matches its
executable; the static logger in tree_helper.cpp and two log-level
remaps in manymove_bringup launch files moved with them.
No CMake option, no conditional <depend>: this fork exists for the
medkit integration. If we ever upstream to pastoriomarco/manymove, put
it behind a toggle so vanilla users do not pull medkit transitively.
…tall helper

Three pieces for BT action node instrumentation: fault_codes.hpp
(MANYMOVE_* constants), fault_reporting.hpp (the FaultReporting
capability class), and installFaultReporter in main_imports_helper.hpp
wired into all 15 BT entry-point mains.

Capability class instead of direct FaultReporter member because BT.CPP
factory constructs nodes with fixed signatures, so we pull the reporter
from the blackboard in the FaultReporting ctor (same pattern as `node_`
already fetched in every action node).
Adds FaultReporting as a second base class to every Action node across
planner/objects/signals/gripper/logic/isaac, plus the matching
ctor init list call. Condition and decorator nodes are intentionally
left out: their FAILURE return is normal control flow.
…austion

Four reportFault sites in MoveManipulatorAction:
 - kPlannerCollisionDetected on the onStart collision branch
 - kPlannerEstopTriggered when stop_execution is set before motion
 - kPlannerRetryAttempt on every failed attempt (soft fault, throttled
   locally by LocalFilter)
 - kPlannerRetriesExhausted once max_tries is reached

reportFaultPassed pairs kPlannerRetryAttempt in the success branch so
the local filter resets when a retry eventually succeeds.

Skipped the input/blackboard-missing FAILURE returns (lines 96, 147,
174 and ResetTrajectories@356); those are programmer errors caught at
startup, not operational faults.
…ait timeout

Five reportFault sites covering SetOutput/GetInput action failures,
CheckRobotState NOT_READY (CRITICAL), WaitForInput server-unavailable
and the wait-timeout path. reportFaultPassed pairs kRobotNotReady on
CheckRobotState success and kSignalWaitInputTimeout on the WaitForInput
desired-value branch so LocalFilter resets cleanly.
…timeout

Five reportFault sites cover the operational failures of
AddCollisionObjectAction, RemoveCollisionObjectAction,
AttachDetachObjectAction, GetObjectPoseAction and WaitForObjectAction.
WaitForObject pairs reportFaultPassed in the success branch.

CheckObjectExistsAction is intentionally not instrumented: its FAILURE
on \"object missing\" is normal control flow used by callers as a
condition, not an operational fault.
Four reportFault sites: GripperCommand server timeout and goal-not-reached,
GripperTraj server unavailable and trajectory-failed result.
PublishJointStateAction throws on bad inputs and never returns FAILURE,
so it has nothing to instrument.
GetLinkPoseAction reports kTfLookupFailed when tf2 raises a
TransformException. WaitForKeyBool reports kWaitKeyTimeout on the
timeout branch and pairs reportFaultPassed in the success path.

Adds kWaitKeyTimeout to fault_codes.hpp.

Configuration-error FAILUREs (missing inputs, wrong vector sizes) are
left out, as are condition-style nodes (CheckPoseDistance,
CheckPoseBounds, CheckKeyBoolValue) where FAILURE is expected control
flow.
Two reportFault sites in FoundationPoseAlignmentNode: detection topic
silent for too long, and no detection passing the target/min_score
filter within the timeout. Both raise kIsaacFoundationPoseFailed.

GetEntityPoseNode/SetEntityPoseNode service-error paths and Isaac TF
timeouts are left for the isaac demo work in Phase 3, where they'll
have specific fault codes once the demo wiring exposes them.
…ager

fake_fault_manager.hpp is a small fixture that hosts the
/fault_manager/report_fault service and records every accepted request,
matching the existing FakeXxxServer pattern in this test directory.

test_fault_reporting.cpp uses it to verify the end-to-end plumbing:
 - installFaultReporter puts a reporter on the blackboard
 - FaultReporting::reportFault forwards FAILED events with the right
   fault_code, severity, description and source_id (FQN of the BT
   client node)
 - reportFaultPassed forwards a PASSED event for healing
 - mixin silently no-ops when no reporter is installed (e.g. unit
   tests building a tree without bt_client scaffolding)

Per-instrumentation-site assertions are out of scope here; those would
need to drive every BT action node through real action server fakes
and belong with the demo work in Phase 3.
- docs/FAULT_CODES.md: full table of MANYMOVE_* codes with severity,
  trigger condition and pairing semantics, grouped by subsystem.
- README.md: short note pointing fork users at the medkit dependency
  and the catalogue.
- CHANGELOG.rst: entry under Forthcoming summarising the BT node
  rename, the hard medkit dep and the dropped topic_based_ros2_control
  declaration.
- ament_uncrustify --reformat run across the modified manymove_cpp_trees
  sources.
- Lint fixes for new files (cpplint header order / memory include /
  guard style; ament_copyright BSD-3 boilerplate match).

The remaining test_action_nodes_gripper failure is a pre-existing
convertFromString registration issue in upstream manymove (verified by
running the test from the parent commit before any medkit changes).
After wiring ros2_medkit_fault_reporter into manymove_cpp_trees the CI
job needs the medkit packages on the runner. Pulling them via apt from
rosdistro keeps the workflow simple (no source overlay clone).
- Wire kRobotResetFailed across ResetRobotStateAction failure branches
  (goal rejection + non-SUCCEEDED result for reset, unload-traj and
  load-traj steps).
- Instrument Isaac TF transform timeouts with kIsaacFoundationPoseFailed
  to match the sibling no-message and no-valid-detection timeouts.
- Replace silent FAILURE in GripperTrajAction::onRunning !goal_sent_
  with BT::RuntimeError, matching the programmer-error policy.
- Add reportFaultPassed on onHalted for WaitForInputAction and
  WaitForObjectAction so abandoned waits do not leave kSignalWaitInputTimeout
  / kObjectWaitTimeout lingering in FaultManager.
- Drop orphan codes kPlannerMissingMoveId (programmer-error path uses
  throw) and kObjectExistsCheckFailed (CheckObjectExists is intentionally
  not instrumented; FAILURE is normal control flow there).
- Pin medkit apt packages to MEDKIT_VERSION=0.4.0-1 so an upstream bump
  does not silently change CI behaviour.
- Drop humble from the CI matrix until ros-humble-ros2-medkit-* debs
  appear on packages.ros.org (rosdistro entry exists; debs do not yet).
- Update docs/FAULT_CODES.md: add kRobotResetFailed, expand the Isaac
  FoundationPose entry to cover TF timeouts, and document the
  reportFaultPassed-on-halt pairing for both wait codes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant