Type: bug
Tier: resilience / diagnostics
Tool: all (middleware-wide)
Version: v0.1.0a11
Problem
When the server process outlives a rebuild of its own virtualenv (e.g. a uv sync / dependency bump under a long-lived stdio server), every tool call starts returning the same misleading error:
    Internal error: No module named 'rich.traceback'
This message masks both the real failure and the remedy. An agent reading it cannot tell whether its arguments were wrong, the tool is broken, or a retry would help, and the actual remedy (reconnect the server so it re-resolves its environment) appears nowhere in the message. In practice, a tool stayed silently unusable for a whole session before the cause was understood.
Mechanism
The trigger is environmental, but the masking is a structural defect in the error middleware. In src/libtmux_mcp/middleware.py, ToolErrorResultMiddleware.on_call_tool (v0.1.0a11, lines 333-343) logs the error before converting it to a result:
    async def on_call_tool(self, context, call_next):
        try:
            return await call_next(context)
        except Exception as error:
            self._log_error(error, context)  # ← can itself raise
            return _error_tool_result(error, context)
_log_error (lines 292-331) ends with an unguarded log call (lines 318-325):
    self.logger.log(
        level,
        "Error in %s: %s: %s",
        method,
        error_type,
        error,
        exc_info=self.include_traceback,
    )
When the active logging handler formats that record, it lazily imports rich.traceback: fastmcp's RichHandler is built with rich_tracebacks=True and only pulls in rich.traceback the first time it renders an exc_info record (pinned fastmcp>=3.4.0,<4.0.0). If the package moved underneath the running process, that import raises ModuleNotFoundError, the exception propagates out of _log_error and out of the except block, and _error_tool_result never runs. The stock fastmcp transform_errors path then re-wraps the logging exception as the -32603 "Internal error: …", discarding the original tool error entirely.
The asymmetry that pinpoints the gap: the error_callback invocation immediately below (lines 327-331) is already wrapped so a failing callback can't escape, but the self.logger.log call just above it in the same method is not.
Note this is independent of the environmental trigger: any failure inside the logging handler (not just rich.traceback) can replace a real tool error today.
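The masking can be reproduced without fastmcp or rich. The sketch below is a minimal stand-in, not the project's code: BrokenHandler, failing_tool, on_call_tool_unguarded and run_unguarded are hypothetical names, and the outer try/except only approximates what an outer error-transforming layer does with whatever escapes the middleware.

```python
import asyncio
import logging


class BrokenHandler(logging.Handler):
    """Hypothetical stand-in for a handler whose lazy import fails at render time."""

    def emit(self, record: logging.LogRecord) -> None:
        # Simulates the import failing after the virtualenv was rebuilt.
        raise ModuleNotFoundError("No module named 'rich.traceback'")


logger = logging.getLogger("masking_demo")
logger.addHandler(BrokenHandler())
logger.propagate = False  # keep the demo away from the root logger


async def failing_tool() -> str:
    # The real tool error that the agent should have seen.
    raise ValueError("pane not found")


async def on_call_tool_unguarded() -> str:
    try:
        return await failing_tool()
    except Exception as error:
        # The handler raises here, so the return below is never reached.
        logger.error("Error in tool: %s", error, exc_info=True)
        return f"tool error: {error}"


def run_unguarded() -> str:
    # Assumed approximation of an outer layer that re-wraps escaped exceptions.
    try:
        return asyncio.run(on_call_tool_unguarded())
    except Exception as outer:
        return f"Internal error: {outer}"


masked = run_unguarded()
print(masked)
```

The printed message names the logging failure only; the ValueError raised by the tool is gone, which matches the symptom described under Problem.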
Fix
Never let the error-logging path mask the result — guarantee on_call_tool always reaches _error_tool_result:
    except Exception as error:
        with contextlib.suppress(Exception):  # requires `import contextlib`
            self._log_error(error, context)
        return _error_tool_result(error, context)
Logging inside the suppress would be self-defeating when the handler itself is what's broken, so the suppress stays silent. This mirrors the existing defensive treatment of error_callback.
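The effect of the guard can be checked in isolation. This is a sketch under the same assumptions as any standalone reproduction: BrokenHandler, failing_tool and on_call_tool_guarded are hypothetical stand-ins for the failing handler, a failing tool and the patched middleware method, and the returned string stands in for the real error result.

```python
import asyncio
import contextlib
import logging


class BrokenHandler(logging.Handler):
    """Hypothetical handler that fails while rendering any record."""

    def emit(self, record: logging.LogRecord) -> None:
        raise ModuleNotFoundError("No module named 'rich.traceback'")


guarded_logger = logging.getLogger("guarded_demo")
guarded_logger.addHandler(BrokenHandler())
guarded_logger.propagate = False  # keep the demo away from the root logger


async def failing_tool() -> str:
    raise ValueError("pane not found")


async def on_call_tool_guarded() -> str:
    try:
        return await failing_tool()
    except Exception as error:
        # Logging is best-effort: a broken handler must not replace the result.
        with contextlib.suppress(Exception):
            guarded_logger.error("Error in tool: %s", error, exc_info=True)
        return f"tool error: {error}"


preserved = asyncio.run(on_call_tool_guarded())
print(preserved)
```

With the suppress in place the handler still fails, but the caller receives the original tool error. Suppressing Exception rather than BaseException leaves cancellation and interpreter shutdown untouched.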
Secondary (separate follow-up, not required for the fix): fail fast instead of degrading silently — eagerly import the traceback dependency (or run a one-shot self-check) at server startup so a broken environment surfaces loudly at launch rather than on the first tool call. The handshake currently serves tool schemas and instructions fine, which masks the breakage until first execution.
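One possible shape for the startup self-check, as a sketch only: find_missing_modules and assert_environment_intact are hypothetical helpers, and the list of modules worth probing (for example rich.traceback) is an assumption the maintainers would need to confirm.

```python
import importlib


def find_missing_modules(module_names: list[str]) -> list[str]:
    """Return the names that cannot be imported in the current environment."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass
            missing.append(name)
    return missing


def assert_environment_intact(module_names: list[str]) -> None:
    """Raise at startup if any lazily imported dependency is unavailable."""
    missing = find_missing_modules(module_names)
    if missing:
        raise RuntimeError(
            "Environment looks broken; reconnect the server. Missing: "
            + ", ".join(missing)
        )


# Hypothetical call site early in server startup:
# assert_environment_intact(["rich.traceback"])
```

A successful eager import also leaves the module in sys.modules, so a later lazy import of the same name in that process is served from the cache rather than from disk.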
References
src/libtmux_mcp/middleware.py @ v0.1.0a11: ToolErrorResultMiddleware.on_call_tool (333-343), _log_error (292-331; unguarded logger.log at 318-325; guarded error_callback at 327-331).
RichHandler configuration with rich_tracebacks and the lazy rich.traceback import on first exc_info render (fastmcp>=3.4.0,<4.0.0).
… is_error preservation, argument-schema classification, unrecognized-argument suggestions): those refine classification, none guard the logging call; nor by #49 ("Compose / bulk operations: run multiple actions in one MCP tool call (start with send_keys)"), #61 ("Observation-first architecture: eliminate agent lockout from blocking wait tools"), #17 ("MCP tool calls fail: invalid JSON when passing session_name / session_id (Cursor agent)"), or #50 ("wait_for_text: two-call state/capture race within a single poll tick") through #55 ("search_panes: visual-row capture misses wrap-spanning patterns").