fix(mcp): retry connect calls on transient grpc errs #1062
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Pull request overview
This PR adds bounded retry behavior to MCP client connect() calls (sync + async) so that transient gRPC errors during workflow scheduling (CANCELLED, UNAVAILABLE) are retried within the caller-provided timeout budget, and updates the MCP client tests to cover these retry paths.
Changes:
- Add transient gRPC error classification and a bounded retry loop around `schedule_new_workflow()` in sync `DaprMCPClient.connect()`.
- Add the equivalent retry logic to async `AioDaprMCPClient.connect()`.
- Add new unit tests validating retry success, non-transient propagation, and deadline exhaustion.
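The retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeRpcError`, `is_transient`, and `schedule_with_retry` are hypothetical names, the backoff value is an assumption, and the stand-in exception replaces `grpc.RpcError` only so the sketch runs without `grpcio` installed.

```python
import time

# Status names treated as transient, per the PR description.
TRANSIENT_CODES = {"CANCELLED", "UNAVAILABLE"}


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc.RpcError (illustration only)."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self._code_name = code_name

    def code(self):
        return self._code_name


def is_transient(err):
    """Return True only for errors that are worth retrying."""
    return isinstance(err, FakeRpcError) and err.code() in TRANSIENT_CODES


def schedule_with_retry(schedule, timeout, backoff=0.01):
    """Call schedule() until it succeeds, fails non-transiently, or the budget ends.

    Returns (result, remaining_seconds) so the caller can pass the leftover
    budget to a later wait-for-completion step.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = schedule()
            return result, max(0.0, deadline - time.monotonic())
        except FakeRpcError as err:
            if not is_transient(err):
                raise  # non-transient errors propagate immediately
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # budget exhausted: surface the last transient error
            # Never sleep past the caller's deadline.
            time.sleep(min(backoff, remaining))
```

Tracking a single deadline, rather than a fixed attempt count, keeps the total time spent inside the caller-provided timeout no matter how many attempts are made.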
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ext/dapr-ext-workflow/dapr/ext/workflow/mcp.py | Implements transient-error classification + retry loop for sync MCP connect() scheduling, and adjusts remaining timeout passed to completion wait. |
| ext/dapr-ext-workflow/dapr/ext/workflow/aio/mcp.py | Implements the same retry-and-budget logic for async MCP connect(). |
| ext/dapr-ext-workflow/tests/test_mcp_client.py | Adds tests for retry-on-transient-gRPC-error behavior for both sync and async clients. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| Metric | main | #1062 | +/- |
|---|---|---|---|
| Coverage | 86.63% | 82.66% | -3.97% |
| Files | 84 | 146 | +62 |
| Lines | 4473 | 14693 | +10220 |
| Hits | 3875 | 12146 | +8271 |
| Misses | 598 | 2547 | +1949 |
* fix(mcp): retry connect calls on transient grpc errs
* style: comment cleanup
* fix: address copilot feedback
* style: appease linter

(cherry picked from commit 8571c3e)
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: dapr-bot <dapr-bot@users.noreply.github.com>
Co-authored-by: Sam <sam@diagrid.io>
Description
This PR wraps the MCP connect calls (sync and async) in a bounded retry that absorbs CANCELLED or UNAVAILABLE gRPC errors within the specified timeout budget. Any other error propagates.
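For the async path, a rough sketch of how the retry and the leftover budget could fit together is shown below. It is an assumption-laden illustration rather than the PR's implementation: `StubRpcError`, `connect_with_budget`, and the `schedule` / `wait_for_completion` callables are hypothetical, and the stub exception stands in for a real gRPC error type.

```python
import asyncio
import time

# Status names the description says are absorbed by the retry.
TRANSIENT = {"CANCELLED", "UNAVAILABLE"}


class StubRpcError(Exception):
    """Hypothetical stand-in for a gRPC error carrying a status name."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


async def connect_with_budget(schedule, wait_for_completion, timeout, backoff=0.01):
    """Retry scheduling on transient errors, then wait with whatever time is left."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            instance_id = await schedule()
            break
        except StubRpcError as err:
            remaining = deadline - time.monotonic()
            # Any other status, or an exhausted budget, propagates to the caller.
            if err.status not in TRANSIENT or remaining <= 0:
                raise
            await asyncio.sleep(min(backoff, remaining))
    # Only the unspent part of the budget goes to the completion wait.
    remaining = max(0.0, deadline - time.monotonic())
    return await wait_for_completion(instance_id, remaining)
```

Passing only the remaining time to the completion wait means retries and the wait share one timeout instead of each getting the full value.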
Issue reference
We strive to have all PRs opened based on an issue, where the problem or feature has been discussed prior to implementation.
Please reference the issue this PR will close: #[issue number]
Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list: