Harden tmux MCP tool surface by tony · Pull Request #76 · tmux-python/libtmux-mcp

tony · 2026-06-13T12:25:03Z

Summary

block tmux format-job execution paths from readonly format inputs
preserve structured tool responses above realistic terminal payload sizes
fail closed on invalid safety tiers and pin startup to stdio transport
add run_command for command completion with exit status, history suppression, and output filtering
use libtmux's reset path for clear_pane
expose active pane and snapshot liveness metadata, ship py.typed, and refresh stale docs

Issues

Closes #68
Closes #69
Closes #70
Closes #71
Closes #72
Closes #73
Closes #74
Closes #75

Verification

$ rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run mypy .; uv run py.test --reruns 0 -vvv; just build-docs;

Passed locally: ruff clean, format unchanged, mypy clean, 577 tests passed, docs build succeeded.

why: search_panes is a readonly tool, but the tmux fast path embeds literal patterns in a format filter. Patterns containing #() can reach tmux's format job evaluator unless they take the Python search path. what: - Treat #( as unsafe for the tmux search fast path - Add a NamedTuple fast-path regression case for format jobs - Add a live marker regression proving #() does not spawn a job

why: display_message is registered as readonly, but tmux #() formats schedule shell jobs. The MCP tool should preserve metadata queries without handing shell-job formats to tmux. what: - Reject #() format jobs before pane resolution - Keep normal tmux format variable expansion unchanged - Add a marker regression for display_message format jobs

why: FastMCP validates structured output for schema-bearing tools. A 50 KB global limiter truncated successful results into text-only responses before tool-level caps could preserve structured metadata. what: - Raise the tail-preserving response backstop to 1 MB - Keep small-cap middleware tests for truncation behavior - Add a client-path regression for schema-bearing successes

why: Invalid LIBTMUX_SAFETY values are security configuration mistakes. Falling back to mutating can expose write tools when the user intended a restricted surface. what: - Resolve invalid safety values to readonly - Make SafetyMiddleware direct invalid tiers fail closed - Update safety text and add import-time visibility regression

why: Inherited FastMCP transport environment can change the server's startup surface and break stdio MCP clients. Local CLI flags should also resolve without starting the MCP server. what: - Parse --help and --version in the console entry point - Run FastMCP with an explicit stdio transport - Add regressions for CLI flags and transport pinning

why: Agents need a one-call terminal workflow that runs a command, waits for completion, captures output, and reports shell exit status without composing send_keys, wait_for_channel, and capture_pane manually. what: - Add RunCommandResult and the async run_command pane tool - Register run_command as a mutating shell-driving tool - Add tmux-backed regressions for status, timeout, truncation, and annotations - Document the new tool and surface it in server instructions

why: clear_pane clears scrollback and should use libtmux's one-call reset path instead of split tmux IPC calls. Its MCP annotations also need to disclose the state-losing behavior. what: - Replace separate reset and clear-history calls with Pane.reset - Register clear_pane with destructive non-idempotent hints in the mutating tier - Add regressions for reset delegation and annotation metadata

why: Agents need window and pane result metadata for reliable follow-up targeting and liveness decisions, and the package advertises typed support. what: - Add active_pane_id to WindowInfo serialization - Add pane_pid, pane_dead, and alternate_on to PaneSnapshot - Ship py.typed and regress stale completion and pagination docs claims - Refresh README and response examples for the current tool and model surface

why: CI exposed that long bash prompts can wrap private run_command synchronization commands, leaving private prompt tails after filtering and splitting user command output from small max_lines tails. what: - Run the user command and private status/wait plumbing as one shell compound list - Add a regression with a CI-shaped wrapped prompt and max_lines=2

codecov-commenter · 2026-06-13T12:36:57Z

Codecov Report

❌ Patch coverage is 86.77686% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.07%. Comparing base (b8e9297) to head (78f4f11).

Files with missing lines	Patch %	Lines
src/libtmux_mcp/tools/pane_tools/io.py	81.17%	13 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   84.84%   85.07%   +0.22%     
==========================================
  Files          42       42              
  Lines        2772     2881     +109     
  Branches      372      385      +13     
==========================================
+ Hits         2352     2451      +99     
- Misses        314      322       +8     
- Partials      106      108       +2

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tony · 2026-06-13T12:55:36Z

Code review

Found 1 issue:

The new run_command tool's command argument is not redacted in the MCP audit log, so a call like run_command(command="psql -U admin -W supersecret") leaks the credential to long-lived audit archives. AuditMiddleware._summarize_args only digests argument values whose name is in _SENSITIVE_ARG_NAMES; everything else is logged in cleartext (truncated to 200 chars). command is missing from that set even though it is the same risk class as the shell payloads already listed (keys, text, content, shell, ...). This repeats the exact gap fixed for respawn_pane's shell in commit 9e1997b ("redact respawn shell payloads": "An agent that passes shell='psql -U user -W secret' would otherwise leak the credential to long-lived audit archives"). Fix: add "command" to _SENSITIVE_ARG_NAMES and its docstring. (historical git context)

libtmux-mcp/src/libtmux_mcp/middleware.py

Lines 350 to 369 in 1c43cb9

    
           #: Argument names that carry user-supplied payloads we never want in logs. 
        
           #: ``keys`` (send_keys), ``text`` (paste_text), ``value`` (set_environment), 
        
           #: ``content`` (load_buffer), ``shell`` (respawn_pane), and ``environment`` 
        
           #: (respawn_pane) can contain commands, secrets, or arbitrary large strings. 
        
           #: Matched by exact name, case-sensitive, to mirror the tool signatures. 
        
           #: 
        
           #: ``environment`` is dict-shaped (``dict[str, str]``); the redaction logic 
        
           #: in :func:`_summarize_args` recognises this and digests each *value* while 
        
           #: leaving the *keys* (env var names like ``DATABASE_URL``) visible — env 
        
           #: var names are operator-debug-useful, but their values are the secret. 
        
           #: All other entries are scalar strings; mixing the two is intentional. 
        
           #: 
        
           #: Note on ``shell`` and ``environment`` redaction: this redacts the MCP 
        
           #: audit log only. ``respawn_pane(shell="env SECRET=... bash")`` and 
        
           #: ``environment={"AWS_SECRET_KEY": "..."}`` may briefly expose the values 
        
           #: via the OS process table and tmux's ``pane_current_command`` metadata 
        
           #: until the spawned shell takes over — see ``docs/topics/safety.md``. 
        
           _SENSITIVE_ARG_NAMES: frozenset[str] = frozenset( 
        
               {"keys", "text", "value", "content", "shell", "environment"} 
        
           )

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

why: run_command's command argument carries the same secret-bearing shell payloads as keys, text, and shell, but it is not in _SENSITIVE_ARG_NAMES, so AuditMiddleware logs it in cleartext to long-lived audit archives. what: - Add an xfail regression asserting _summarize_args digests command

why: command carries the same secret-bearing shell payloads already digested for keys, text, and shell; logging it in cleartext leaks credentials passed to run_command into long-lived audit archives. what: - Add command to _SENSITIVE_ARG_NAMES and name it in the docstring - Drop the xfail and cover command in the sensitive-keys regression

why: _filter_run_command_internal_lines drops any captured line containing a generic substring (mcp_status, "tmux set-option -p", "tmux wait-for -S", "libtmux_mcp_"), silently losing legitimate command output that merely mentions them, with no signal in RunCommandResult. what: - Add an xfail regression keeping output lines that match the generic markers but not the per-call channel or status option

why: joining wrapped capture rows (tmux -J, as search and wait already do) keeps the private sync line one logical row carrying the full per-call channel and status option, so the over-broad substring markers are unnecessary and only caused silent false drops of real output. what: - Capture with join_wrapped=True - Match only the per-call channel and status option - Drop the xfail and assert the sync line is dropped

why: run_command is registered with ANNOTATIONS_SHELL, making six shell consumers, but the comment still said five and omitted it. what: - List run_command in the canonical shell-driving group and fix the count

why: the injection guard now also routes around ``#(`` (format job execution), but the comment described only ``}`` and ``#{`` and said "either sequence". what: - Document ``#(`` and reword to "any of these sequences"

why: Live MCP client verification showed wrapped private run_command sync fragments could leak into later command output, even after valid wrapper-like user output was preserved. what: - Filter exact run_command wrapper fragments across current and prior calls - Shorten private status and wait markers to avoid 80-column orphan wraps - Add regressions for wrapped and previous-call sync fragments

tony · 2026-06-13T16:37:19Z

Code review

Found 1 issue:

The new drop_hex_continuation heuristic in _filter_run_command_internal_lines (commit 1e512b8) silently drops legitimate command output. The flag is armed after dropping any matched line — including via internal_markers, which always matches the current call's joined sync line — and the next line is then dropped if it is 8–32 chars of pure hex. Because an interactive shell echoes the typed payload before executing it, the line after the sync line is the command's first real output. So run_command("git rev-parse --short=10 HEAD"), openssl rand -hex 8 (16 hex), a 12-char docker id, or any short hash/UUID slice is dropped, returning exit_status=0 with missing output and no truncation signal. The hex-drop branch also continues without resetting the flag, so consecutive hex output lines are all dropped. This re-opens the silent false-drop class that commit 76e57a5 deliberately fixed ("the over-broad substring markers ... only caused silent false drops of real output"). Arming the continuation only on the status_line_re/wait_line_re (previous-call wrap) matches, and resetting after a single dropped line, would scope it to wrapped fragments without eating real output. (bug due to the loop below)

libtmux-mcp/src/libtmux_mcp/tools/pane_tools/io.py

Lines 297 to 314 in 1e512b8

    
           for line in lines: 
        
               stripped = line.strip() 
        
               if ( 
        
                   any(marker in line for marker in internal_markers) 
        
                   or status_line_re.search(line) 
        
                   or wait_line_re.search(line) 
        
               ): 
        
                   drop_hex_continuation = True 
        
                   continue 
        
               if ( 
        
                   drop_hex_continuation 
        
                   and 8 <= len(stripped) <= 32 
        
                   and all(char in hex_chars for char in stripped) 
        
               ): 
        
                   continue 
        
               drop_hex_continuation = False 
        
               kept.append(line) 
        
           return kept

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

why: run_command should report exit status even when the user command mutates shell state before the private completion trailer runs. what: - Add strict xfail coverage for PATH mutation before the trailer - Add strict xfail coverage for errexit stopping the current shell list

why: run_command ran the private status trailer in the same interactive shell state as the user command, so PATH changes and errexit could block status reporting. what: - Run the user command in a subshell before reading its status - Invoke private tmux status and wait commands with resolved argv - Promote shell-state regressions from xfail to passing tests

why: run_command sends secret-bearing commands through the pane shell but exposes no equivalent to send_keys suppress_history. what: - Add strict xfail coverage for bash HISTCONTROL ignorespace behavior - Verify the command payload is absent after forcing history to disk

why: run_command command payloads can be persisted by interactive shell history even though MCP audit logs redact the command argument. what: - Add a suppress_history option matching send_keys leading-space behavior - Document the shell support caveat for run_command callers - Promote the shell history regression to a passing test

why: run_command output filtering can still drop legitimate hex output and tmux-like user text while hiding private synchronization fragments. what: - Add strict xfail coverage for hex output after the current sync line - Add strict xfail coverage for user tmux status snippets

why: run_command filtering still treated some current-call sync matches as wrapped fragments and matched broad tmux-looking user output. what: - Keep hex-like command output after the current sync line - Match prior private wrapper fragments without generic @s_ overmatching - Refresh stale filter comments and test descriptions

why: clear_pane now advertises destructive non-idempotent MCP annotations, but the public safety table still listed the mutating defaults. what: - Mark clear-pane destructive=true in the annotation table - Mark clear-pane idempotent=false to match runtime registration

why: the capture comment lost the load-bearing reason for join_wrapped — it keeps the per-call markers on one logical row so the filter's exact-marker match holds even when a wide prompt wraps. what: - Note that join_wrapped preserves the exact-marker match path

why: the docstring named HISTCONTROL (bash-specific) as the requirement, implying zsh and other shells cannot suppress history when the leading-space convention works wherever the shell ignores space-prefixed commands. what: - Describe the generic requirement instead of the bash variable in send_keys and run_command

why: run_command now wraps the command in a subshell, so cd, export, and other shell state changes no longer persist between calls; the docstring and tool page implied execution in the live interactive shell. what: - State the subshell isolation in the run_command docstring and tool page

why: run_command can execute in a requested pane while storing its private exit-status option on tmux's default pane when the target shell lacks TMUX_PANE. what: - Add a strict xfail that targets an inactive pane with TMUX_PANE removed - Clean up the temporary split pane so later pane tests keep their layout

why: run_command sends its private status trailer from inside the target shell, where TMUX_PANE can be missing or wrong. Untargeted set-option can store the exit status on tmux's active pane instead of the requested pane. what: - Pass the resolved pane id to shell-side set-option -p - Teach the sync-line filter the targeted set-option wrapper shape - Graduate the pane-target regression from strict xfail to passing

why: The or "" fallback is unreachable because target_pane_id is guaranteed to be non-None earlier in the run_command function. what: - Update RunCommandResult instantiation in pane_tools/io.py to use target_pane_id

why: the 0.1.x line hardens the read-only and safety surface and adds run_command, but the unreleased CHANGES section had no entry for it. what: - Add What's new for run_command and typed pane/window metadata - Add Fixes for format-job blocking, fail-closed safety, structured response preservation, clear_pane reset, and stdio transport pinning

tony added 9 commits June 13, 2026 06:46

tony force-pushed the fix/issues-68-75-mcp-hardening branch 2 times, most recently from a39b4c0 to 1c43cb9 Compare June 13, 2026 12:43

tony added 7 commits June 13, 2026 08:34

mcp(docs[utils]): Count run_command among ANNOTATIONS_SHELL tools

64366ec

why: run_command is registered with ANNOTATIONS_SHELL, making six shell consumers, but the comment still said five and omitted it. what: - List run_command in the canonical shell-driving group and fix the count

mcp(docs[search]): Note #( in format-injection comment

978413c

why: the injection guard now also routes around ``#(`` (format job execution), but the comment described only ``}`` and ``#{`` and said "either sequence". what: - Document ``#(`` and reword to "any of these sequences"

tony added 9 commits June 13, 2026 11:54

tony added 4 commits June 13, 2026 14:30

mcp(style[pane]): Use target_pane_id in RunCommandResult

839b13e

why: The or "" fallback is unreachable because target_pane_id is guaranteed to be non-None earlier in the run_command function. what: - Update RunCommandResult instantiation in pane_tools/io.py to use target_pane_id

tony force-pushed the fix/issues-68-75-mcp-hardening branch from 0965733 to 839b13e Compare June 13, 2026 19:55

tony merged commit befeb73 into main Jun 13, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden tmux MCP tool surface#76

Harden tmux MCP tool surface#76
tony merged 30 commits into
mainfrom
fix/issues-68-75-mcp-hardening

tony commented Jun 13, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jun 13, 2026 •

edited

Loading

Uh oh!

tony commented Jun 13, 2026

Uh oh!

tony commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tony commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues

Verification

Uh oh!

codecov-commenter commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tony commented Jun 13, 2026

Code review

Uh oh!

tony commented Jun 13, 2026

Code review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tony commented Jun 13, 2026 •

edited

Loading

codecov-commenter commented Jun 13, 2026 •

edited

Loading