Skip to content

Harden tmux MCP tool surface#76

Merged
tony merged 30 commits into
mainfrom
fix/issues-68-75-mcp-hardening
Jun 13, 2026
Merged

Harden tmux MCP tool surface#76
tony merged 30 commits into
mainfrom
fix/issues-68-75-mcp-hardening

Conversation

@tony

@tony tony commented Jun 13, 2026

Copy link
Copy Markdown
Member

Summary

  • block tmux format-job execution paths from readonly format inputs
  • preserve structured tool responses above realistic terminal payload sizes
  • fail closed on invalid safety tiers and pin startup to stdio transport
  • add run_command for command completion with exit status, history suppression, and output filtering
  • use libtmux's reset path for clear_pane
  • expose active pane and snapshot liveness metadata, ship py.typed, and refresh stale docs

Issues

Closes #68
Closes #69
Closes #70
Closes #71
Closes #72
Closes #73
Closes #74
Closes #75

Verification

$ rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run mypy .; uv run py.test --reruns 0 -vvv; just build-docs;

Passed locally: ruff clean, format unchanged, mypy clean, 577 tests passed, docs build succeeded.

tony added 9 commits June 13, 2026 06:46
why: search_panes is a readonly tool, but the tmux fast path embeds
literal patterns in a format filter. Patterns containing #() can reach
tmux's format job evaluator unless they take the Python search path.
what:
- Treat #( as unsafe for the tmux search fast path
- Add a NamedTuple fast-path regression case for format jobs
- Add a live marker regression proving #() does not spawn a job
why: display_message is registered as readonly, but tmux #() formats
schedule shell jobs. The MCP tool should preserve metadata queries
without handing shell-job formats to tmux.
what:
- Reject #() format jobs before pane resolution
- Keep normal tmux format variable expansion unchanged
- Add a marker regression for display_message format jobs
why: FastMCP validates structured output for schema-bearing tools.
A 50 KB global limiter truncated successful results into text-only
responses before tool-level caps could preserve structured metadata.
what:
- Raise the tail-preserving response backstop to 1 MB
- Keep small-cap middleware tests for truncation behavior
- Add a client-path regression for schema-bearing successes
why: Invalid LIBTMUX_SAFETY values are security configuration mistakes.
Falling back to mutating can expose write tools when the user intended a
restricted surface.
what:
- Resolve invalid safety values to readonly
- Make SafetyMiddleware direct invalid tiers fail closed
- Update safety text and add import-time visibility regression
why: Inherited FastMCP transport environment can change the server's startup
surface and break stdio MCP clients. Local CLI flags should also resolve
without starting the MCP server.
what:
- Parse --help and --version in the console entry point
- Run FastMCP with an explicit stdio transport
- Add regressions for CLI flags and transport pinning
why: Agents need a one-call terminal workflow that runs a command, waits for
completion, captures output, and reports shell exit status without composing
send_keys, wait_for_channel, and capture_pane manually.
what:
- Add RunCommandResult and the async run_command pane tool
- Register run_command as a mutating shell-driving tool
- Add tmux-backed regressions for status, timeout, truncation, and annotations
- Document the new tool and surface it in server instructions
why: clear_pane clears scrollback and should use libtmux's one-call reset path
instead of split tmux IPC calls. Its MCP annotations also need to disclose the
state-losing behavior.
what:
- Replace separate reset and clear-history calls with Pane.reset
- Register clear_pane with destructive non-idempotent hints in the mutating tier
- Add regressions for reset delegation and annotation metadata
why: Agents need window and pane result metadata for reliable follow-up
targeting and liveness decisions, and the package advertises typed support.
what:
- Add active_pane_id to WindowInfo serialization
- Add pane_pid, pane_dead, and alternate_on to PaneSnapshot
- Ship py.typed and regress stale completion and pagination docs claims
- Refresh README and response examples for the current tool and model surface
why: CI exposed that long bash prompts can wrap private run_command
synchronization commands, leaving private prompt tails after filtering and
splitting user command output from small max_lines tails.
what:
- Run the user command and private status/wait plumbing as one shell compound
  list
- Add a regression with a CI-shaped wrapped prompt and max_lines=2
@codecov-commenter

codecov-commenter commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.77686% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.07%. Comparing base (b8e9297) to head (78f4f11).

Files with missing lines Patch % Lines
src/libtmux_mcp/tools/pane_tools/io.py 81.17% 13 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   84.84%   85.07%   +0.22%     
==========================================
  Files          42       42              
  Lines        2772     2881     +109     
  Branches      372      385      +13     
==========================================
+ Hits         2352     2451      +99     
- Misses        314      322       +8     
- Partials      106      108       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony tony force-pushed the fix/issues-68-75-mcp-hardening branch 2 times, most recently from a39b4c0 to 1c43cb9 Compare June 13, 2026 12:43
@tony

tony commented Jun 13, 2026

Copy link
Copy Markdown
Member Author

Code review

Found 1 issue:

  1. The new run_command tool's command argument is not redacted in the MCP audit log, so a call like run_command(command="psql -U admin -W supersecret") leaks the credential to long-lived audit archives. AuditMiddleware._summarize_args only digests argument values whose name is in _SENSITIVE_ARG_NAMES; everything else is logged in cleartext (truncated to 200 chars). command is missing from that set even though it is the same risk class as the shell payloads already listed (keys, text, content, shell, ...). This repeats the exact gap fixed for respawn_pane's shell in commit 9e1997b ("redact respawn shell payloads": "An agent that passes shell='psql -U user -W secret' would otherwise leak the credential to long-lived audit archives"). Fix: add "command" to _SENSITIVE_ARG_NAMES and its docstring. (historical git context)

#: Argument names that carry user-supplied payloads we never want in logs.
#: ``keys`` (send_keys), ``text`` (paste_text), ``value`` (set_environment),
#: ``content`` (load_buffer), ``shell`` (respawn_pane), and ``environment``
#: (respawn_pane) can contain commands, secrets, or arbitrary large strings.
#: Matched by exact name, case-sensitive, to mirror the tool signatures.
#:
#: ``environment`` is dict-shaped (``dict[str, str]``); the redaction logic
#: in :func:`_summarize_args` recognises this and digests each *value* while
#: leaving the *keys* (env var names like ``DATABASE_URL``) visible — env
#: var names are operator-debug-useful, but their values are the secret.
#: All other entries are scalar strings; mixing the two is intentional.
#:
#: Note on ``shell`` and ``environment`` redaction: this redacts the MCP
#: audit log only. ``respawn_pane(shell="env SECRET=... bash")`` and
#: ``environment={"AWS_SECRET_KEY": "..."}`` may briefly expose the values
#: via the OS process table and tmux's ``pane_current_command`` metadata
#: until the spawned shell takes over — see ``docs/topics/safety.md``.
_SENSITIVE_ARG_NAMES: frozenset[str] = frozenset(
{"keys", "text", "value", "content", "shell", "environment"}
)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 7 commits June 13, 2026 08:34
why: run_command's command argument carries the same secret-bearing
shell payloads as keys, text, and shell, but it is not in
_SENSITIVE_ARG_NAMES, so AuditMiddleware logs it in cleartext to
long-lived audit archives.
what:
- Add an xfail regression asserting _summarize_args digests command
why: command carries the same secret-bearing shell payloads already
digested for keys, text, and shell; logging it in cleartext leaks
credentials passed to run_command into long-lived audit archives.
what:
- Add command to _SENSITIVE_ARG_NAMES and name it in the docstring
- Drop the xfail and cover command in the sensitive-keys regression
why: _filter_run_command_internal_lines drops any captured line
containing a generic substring (mcp_status, "tmux set-option -p",
"tmux wait-for -S", "libtmux_mcp_"), silently losing legitimate command
output that merely mentions them, with no signal in RunCommandResult.
what:
- Add an xfail regression keeping output lines that match the generic
  markers but not the per-call channel or status option
why: joining wrapped capture rows (tmux -J, as search and wait already
do) keeps the private sync line one logical row carrying the full
per-call channel and status option, so the over-broad substring markers
are unnecessary and only caused silent false drops of real output.
what:
- Capture with join_wrapped=True
- Match only the per-call channel and status option
- Drop the xfail and assert the sync line is dropped
why: run_command is registered with ANNOTATIONS_SHELL, making six shell
consumers, but the comment still said five and omitted it.
what:
- List run_command in the canonical shell-driving group and fix the count
why: the injection guard now also routes around ``#(`` (format job
execution), but the comment described only ``}`` and ``#{`` and said
"either sequence".
what:
- Document ``#(`` and reword to "any of these sequences"
why: Live MCP client verification showed wrapped private run_command sync
fragments could leak into later command output, even after valid wrapper-like
user output was preserved.

what:
- Filter exact run_command wrapper fragments across current and prior calls
- Shorten private status and wait markers to avoid 80-column orphan wraps
- Add regressions for wrapped and previous-call sync fragments
@tony

tony commented Jun 13, 2026

Copy link
Copy Markdown
Member Author

Code review

Found 1 issue:

  1. The new drop_hex_continuation heuristic in _filter_run_command_internal_lines (commit 1e512b8) silently drops legitimate command output. The flag is armed after dropping any matched line — including via internal_markers, which always matches the current call's joined sync line — and the next line is then dropped if it is 8–32 chars of pure hex. Because an interactive shell echoes the typed payload before executing it, the line after the sync line is the command's first real output. So run_command("git rev-parse --short=10 HEAD"), openssl rand -hex 8 (16 hex), a 12-char docker id, or any short hash/UUID slice is dropped, returning exit_status=0 with missing output and no truncation signal. The hex-drop branch also continues without resetting the flag, so consecutive hex output lines are all dropped. This re-opens the silent false-drop class that commit 76e57a5 deliberately fixed ("the over-broad substring markers ... only caused silent false drops of real output"). Arming the continuation only on the status_line_re/wait_line_re (previous-call wrap) matches, and resetting after a single dropped line, would scope it to wrapped fragments without eating real output. (bug due to the loop below)

for line in lines:
stripped = line.strip()
if (
any(marker in line for marker in internal_markers)
or status_line_re.search(line)
or wait_line_re.search(line)
):
drop_hex_continuation = True
continue
if (
drop_hex_continuation
and 8 <= len(stripped) <= 32
and all(char in hex_chars for char in stripped)
):
continue
drop_hex_continuation = False
kept.append(line)
return kept

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 9 commits June 13, 2026 11:54
why: run_command should report exit status even when the user command mutates
shell state before the private completion trailer runs.

what:
- Add strict xfail coverage for PATH mutation before the trailer
- Add strict xfail coverage for errexit stopping the current shell list
why: run_command ran the private status trailer in the same interactive
shell state as the user command, so PATH changes and errexit could block
status reporting.

what:
- Run the user command in a subshell before reading its status
- Invoke private tmux status and wait commands with resolved argv
- Promote shell-state regressions from xfail to passing tests
why: run_command sends secret-bearing commands through the pane shell but
exposes no equivalent to send_keys suppress_history.

what:
- Add strict xfail coverage for bash HISTCONTROL ignorespace behavior
- Verify the command payload is absent after forcing history to disk
why: run_command command payloads can be persisted by interactive shell
history even though MCP audit logs redact the command argument.

what:
- Add a suppress_history option matching send_keys leading-space behavior
- Document the shell support caveat for run_command callers
- Promote the shell history regression to a passing test
why: run_command output filtering can still drop legitimate hex output and
tmux-like user text while hiding private synchronization fragments.

what:
- Add strict xfail coverage for hex output after the current sync line
- Add strict xfail coverage for user tmux status snippets
why: run_command filtering still treated some current-call sync matches as
wrapped fragments and matched broad tmux-looking user output.

what:
- Keep hex-like command output after the current sync line
- Match prior private wrapper fragments without generic @s_ overmatching
- Refresh stale filter comments and test descriptions
why: clear_pane now advertises destructive non-idempotent MCP annotations,
but the public safety table still listed the mutating defaults.

what:
- Mark clear-pane destructive=true in the annotation table
- Mark clear-pane idempotent=false to match runtime registration
why: the capture comment lost the load-bearing reason for join_wrapped —
it keeps the per-call markers on one logical row so the filter's
exact-marker match holds even when a wide prompt wraps.
what:
- Note that join_wrapped preserves the exact-marker match path
why: the docstring named HISTCONTROL (bash-specific) as the requirement,
implying zsh and other shells cannot suppress history when the
leading-space convention works wherever the shell ignores space-prefixed
commands.
what:
- Describe the generic requirement instead of the bash variable in
  send_keys and run_command
tony added 4 commits June 13, 2026 14:30
why: run_command now wraps the command in a subshell, so cd, export, and
other shell state changes no longer persist between calls; the docstring
and tool page implied execution in the live interactive shell.
what:
- State the subshell isolation in the run_command docstring and tool page
why: run_command can execute in a requested pane while storing its private
exit-status option on tmux's default pane when the target shell lacks
TMUX_PANE.

what:
- Add a strict xfail that targets an inactive pane with TMUX_PANE removed
- Clean up the temporary split pane so later pane tests keep their layout
why: run_command sends its private status trailer from inside the target
shell, where TMUX_PANE can be missing or wrong. Untargeted set-option can
store the exit status on tmux's active pane instead of the requested pane.

what:
- Pass the resolved pane id to shell-side set-option -p
- Teach the sync-line filter the targeted set-option wrapper shape
- Graduate the pane-target regression from strict xfail to passing
why: The or "" fallback is unreachable because target_pane_id is guaranteed
to be non-None earlier in the run_command function.

what:
- Update RunCommandResult instantiation in pane_tools/io.py to use
  target_pane_id
@tony tony force-pushed the fix/issues-68-75-mcp-hardening branch from 0965733 to 839b13e Compare June 13, 2026 19:55
why: the 0.1.x line hardens the read-only and safety surface and adds
run_command, but the unreleased CHANGES section had no entry for it.
what:
- Add What's new for run_command and typed pane/window metadata
- Add Fixes for format-job blocking, fail-closed safety, structured
  response preservation, clear_pane reset, and stdio transport pinning
@tony tony merged commit befeb73 into main Jun 13, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment