Skip to content

Error middleware masks the real tool error when the logging path itself fails #66

@tony

Description

@tony

Type: bug
Tier: resilience / diagnostics
Tool: all (middleware-wide)
Version: v0.1.0a11

Problem

When the server process outlives a rebuild of its own virtualenv (e.g. a uv sync / dependency bump under a long-lived stdio server), every tool call starts returning the same misleading error:

Internal error: No module named 'rich.traceback'

This message masks both the real failure and the remedy. An agent reading it cannot tell whether its arguments were wrong, the tool is broken, or a retry would help — and the actual remedy (reconnect the server so it re-resolves its environment) is nowhere in the message. In practice this took a whole session of a tool being silently unusable before the cause was understood.

Mechanism

The trigger is environmental, but the masking is a structural defect in the error middleware. In src/libtmux_mcp/middleware.py, ToolErrorResultMiddleware.on_call_tool (v0.1.0a11, lines 333-343) logs the error before converting it to a result:

async def on_call_tool(self, context, call_next):
    try:
        return await call_next(context)
    except Exception as error:
        self._log_error(error, context)          # ← can itself raise
        return _error_tool_result(error, context)

_log_error (lines 292-331) ends with an unguarded log call (lines 318-325):

self.logger.log(
    level,
    "Error in %s: %s: %s",
    method,
    error_type,
    error,
    exc_info=self.include_traceback,
)

When the active logging handler formats that record it lazily imports rich.traceback — fastmcp's RichHandler is built with rich_tracebacks=True and only pulls in rich.traceback the first time it renders an exc_info record (pinned fastmcp>=3.4.0,<4.0.0). If the package moved underneath the running process, that import raises ModuleNotFoundError, _log_error escapes the except, and _error_tool_result never runs. The stock fastmcp transform_errors path then re-wraps the logging exception as the -32603 "Internal error: …", discarding the original tool error entirely.

The asymmetry that pinpoints the gap: the error_callback invocation immediately below (lines 327-331) is already wrapped so a failing callback can't escape — the self.logger.log call on the success path of the same method is not.

Note this is independent of the environmental trigger: any failure inside the logging handler (not just rich.traceback) can replace a real tool error today.

Fix

Never let the error-logging path mask the result — guarantee on_call_tool always reaches _error_tool_result:

except Exception as error:
    with contextlib.suppress(Exception):
        self._log_error(error, context)
    return _error_tool_result(error, context)

Logging inside the suppress would be self-defeating when the handler itself is what's broken, so the suppress stays silent. This mirrors the existing defensive treatment of error_callback.

Secondary (separate follow-up, not required for the fix): fail fast instead of degrading silently — eagerly import the traceback dependency (or run a one-shot self-check) at server startup so a broken environment surfaces loudly at launch rather than on the first tool call. The handshake currently serves tool schemas and instructions fine, which masks the breakage until first execution.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions