
feat(mcp): adaptive tool retrieval (relevance-gated MCP) #50

Merged
electron-rare merged 24 commits into master from feat/mcp-adaptive-retrieval on Jun 2, 2026

@electron-rare


Problem

ISAAC sends every registered tool schema on every request. With the default plugin MCP set loaded, that is ~95 tools (~41k tokens per turn). Measured: at 95 tools the worker (Qwen3-Coder-30B Q4) intermittently streams a well-formed but empty required array (e.g. read_file {"paths":[]}), which fails validation and triggers a retry loop. At ~25 tools there were 0 empty-array events. Tool-list bloat is the proven primary cause of these failures, and also the dominant per-turn latency driver.

Approach

Replace "inject all ~70 MCP tools every request" with relevance-gated injection, reusing the registry's existing contextRequirements(ctx)=>boolean filter (no new mechanism):

  • MCP tool specs carry contextRequirements: ctx => ctx.activeMcpTools === undefined || ctx.activeMcpTools.has(name).
  • A session ActiveMcpToolSet is seeded from the first user prompt (cosine top-K over local MiniLM embeddings — @huggingface/transformers, all-MiniLM-L6-v2 ONNX — cap K=8, threshold τ=0.3; the base set may be empty) and grown on demand by a new always-on find_tools(query) native tool.
  • Native tools are never gated. Tool vectors are embedded once and cached on disk (~/.dirac/mcp-tool-vectors.json).
  • --no-mcp stays fully off; --mcp a,b scopes the candidate pool; new --mcp-top-k / --mcp-threshold tuning flags.
  • Fail-safe: embedder failure → empty active set (native-only), never the 95-tool flood. Seed runs on both task start and resume. The set is cleared to empty on task teardown (no cross-task leak).
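A minimal sketch of the gate and of the seed/expand selection described above, assuming pre-computed embedding vectors. The names `ToolSpec`, `PromptContext`, `gatedMcpSpec`, `seedTopK`, `expand` and `visibleTools` are hypothetical; only the shape of the `contextRequirements` predicate and the K=8 / τ=0.3 defaults come from this description, and toy 2-d vectors stand in for the 384-d MiniLM embeddings.

```typescript
// Hypothetical types: only the contextRequirements predicate mirrors the PR text.
type PromptContext = { activeMcpTools?: ReadonlySet<string> };

interface ToolSpec {
  name: string;
  // No predicate means "always visible" (how native tools behave in this sketch).
  contextRequirements?: (ctx: PromptContext) => boolean;
}

// MCP specs are gated: visible if no active set exists, or if they are in it.
function gatedMcpSpec(name: string): ToolSpec {
  return {
    name,
    contextRequirements: (ctx) =>
      ctx.activeMcpTools === undefined || ctx.activeMcpTools.has(name),
  };
}

// Cosine similarity; returns 0 for a zero vector instead of NaN.
function cosine(a: readonly number[], b: readonly number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// At most k tools whose similarity to the query is >= tau, best first. May be empty.
function seedTopK(
  query: readonly number[],
  index: ReadonlyMap<string, readonly number[]>,
  k = 8,
  tau = 0.3,
): Set<string> {
  const ranked: { name: string; score: number }[] = [];
  index.forEach((vec, name) => {
    const score = cosine(query, vec);
    if (score >= tau) ranked.push({ name, score });
  });
  ranked.sort((x, y) => y.score - x.score);
  return new Set(ranked.slice(0, k).map((s) => s.name));
}

// find_tools-style growth: union of the current set and the query's matches.
function expand(
  active: ReadonlySet<string>,
  query: readonly number[],
  index: ReadonlyMap<string, readonly number[]>,
  k = 8,
  tau = 0.3,
): Set<string> {
  const out = new Set<string>(active);
  seedTopK(query, index, k, tau).forEach((name) => out.add(name));
  return out;
}

// Registry-style filter: specs without a predicate always pass.
function visibleTools(specs: readonly ToolSpec[], ctx: PromptContext): string[] {
  return specs
    .filter((s) => (s.contextRequirements ? s.contextRequirements(ctx) : true))
    .map((s) => s.name);
}
```

Because `seedTopK` may return an empty set, an embedder failure can map to `new Set()` (native tools only) rather than `undefined`, which this gate treats as "show every MCP tool".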

Also includes a sharper corrective message on missing/empty required params (independent guard).
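One way such a guard could look, as a hypothetical sketch: a required parameter counts as bad when it is absent, `null`, or an empty array, and the message names the empty-array case explicitly. `badRequiredParams`, `correctiveMessage` and the wording are illustrative, not the actual text in `responses.ts`.

```typescript
// Hypothetical minimal schema shape: only the "required" list matters here.
interface ParamSchema {
  required?: readonly string[];
}

// Required params that are absent, null, or an empty array (e.g. {"paths": []}).
function badRequiredParams(
  schema: ParamSchema,
  args: Readonly<Record<string, unknown>>,
): string[] {
  return (schema.required ?? []).filter((key) => {
    const value = args[key];
    return value === undefined || value === null || (Array.isArray(value) && value.length === 0);
  });
}

// Names the mistake instead of a generic "retry with complete response".
function correctiveMessage(tool: string, bad: readonly string[]): string | undefined {
  if (bad.length === 0) return undefined;
  const list = bad.join(", ");
  return (
    `${tool}: required parameter(s) ${list} were missing or empty. ` +
    `An empty array (e.g. "${bad[0]}": []) is not a valid value. ` +
    `Re-issue the call with ${list} populated.`
  );
}
```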

Evidence

  • Embedder verified at runtime: MiniLM loads + embeds (384-d, ~3.9s first call incl. download).
  • e2e: 0 empty-array events with retrieval ON (vs 2/run at 95 tools) — feature goal met.
  • Full suite 1385 passing / 0 failing; new unit tests for cosine/config/embedder/index/active-set/gate/find_tools.
  • Critic review: ACCEPT-WITH-RESERVATIONS; the CRITICAL finding (full-tool flood on resume) and the MAJOR finding (index crash on partial embed) are both fixed.
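The description does not say how the partial-embed crash was fixed. Assuming it occurred when the embedder returned fewer (or malformed) vectors than there were tools, a tolerant index build would skip those tools instead of indexing `undefined`. A hypothetical sketch (`buildIndex` is illustrative, not the PR's `ToolVectorIndex`):

```typescript
// Hypothetical: pair tool names with embedder output, tolerating gaps.
// A tool whose vector is missing, wrong-sized, or non-finite is skipped,
// so it cannot be retrieved, but the rest of the index still works.
function buildIndex(
  names: readonly string[],
  vectors: ReadonlyArray<readonly number[] | undefined>,
  dim = 384, // all-MiniLM-L6-v2 output size, per the evidence above
): { index: Map<string, readonly number[]>; skipped: string[] } {
  const index = new Map<string, readonly number[]>();
  const skipped: string[] = [];
  names.forEach((name, i) => {
    const vec = i < vectors.length ? vectors[i] : undefined;
    if (vec !== undefined && vec.length === dim && vec.every((x) => Number.isFinite(x))) {
      index.set(name, vec);
    } else {
      skipped.push(name);
    }
  });
  return { index, skipped };
}
```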

⚠️ Merge blockers / caveats

  1. Supply-chain HITL (BLOCKER): @huggingface/transformers@3.3.3 (pinned) + the all-MiniLM-L6-v2 ONNX weights must be mirrored into the ailiance org before merge per policy. Runtime must load from the vendored/cached path, not the public HF CDN.
  2. Worker chat-template 400 ("roles must alternate") observed in e2e on worker 8002 — a Jinja/worker-config issue independent of this code; validate completion-rate on a stable worker before declaring the latency/UX win.
  3. Tech-debt (non-blocking, documented): process-global singleton (cleared on teardown; subagents share the parent's set by design); transformers.js pulls its own @img/sharp-libvips (~25 MB native; duplicate-lib warning, benign for text feature-extraction — consider npm dedupe).

Spec: docs/superpowers/specs/2026-06-01-mcp-adaptive-retrieval-design.md · Plan: docs/superpowers/plans/2026-06-01-mcp-adaptive-retrieval.md

🤖 Generated with Claude Code

Design for replacing all-MCP-tools-every-request (~95 tools, 41k tokens, proven cause of empty-array tool calls) with relevance-gated injection: local MiniLM embedding, per-session base set (cap K + threshold τ), on-demand find_tools(query) meta-tool, native tools always-on.

The generic 'retry with complete response' message let the model repeat an empty required array; the message now explicitly names the empty-array mistake and instructs the model to re-issue the call with the value populated.
Copilot AI review requested due to automatic review settings June 2, 2026 06:35

Copilot AI left a comment


Pull request overview

Replaces the "send all ~95 MCP tool schemas every request" behavior with relevance-gated injection. MCP tool specs now carry a contextRequirements gate keyed off a session-scoped ActiveMcpToolSet, which is seeded on task start/resume from the first user prompt via a local MiniLM embedder (transformers.js) and grown on demand via a new always-on find_tools native tool. Includes fail-safe degradation, on-disk vector cache, CLI tuning flags, and a sharper missing-parameter error message.

Changes:

  • New src/core/mcp/retrieval/ module: cosine, Embedder, ToolVectorIndex, ActiveMcpToolSet, config, session plus unit tests.
  • New find_tools native tool (spec + handler + enum + registration); mcpToolToSpec gated via ctx.activeMcpTools; seed/clear hooks in Task.startTask/resumeTaskFromHistory/abortTask; subagent + ApiRequestHandler propagate the snapshot.
  • Pinned @huggingface/transformers@3.3.3 (with onnxruntime/sharp transitive deps), new --mcp-top-k/--mcp-threshold CLI flags, esbuild externals for native bindings, sharper missingToolParameterError, and design/plan docs.
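The two tuning flags need validation before they reach the retriever. A hypothetical sketch of that step, independent of any CLI library: the defaults (K=8, τ=0.3) come from the description; `parseRetrievalFlags` and the accepted ranges (non-negative integer K, τ within cosine's [-1, 1]) are assumptions.

```typescript
interface RetrievalFlags {
  topK: number;
  threshold: number;
}

// Hypothetical: raw values as a CLI parser would hand them over (strings or absent).
function parseRetrievalFlags(raw: { mcpTopK?: string; mcpThreshold?: string }): RetrievalFlags {
  const topK = raw.mcpTopK === undefined ? 8 : Number(raw.mcpTopK);
  const threshold = raw.mcpThreshold === undefined ? 0.3 : Number(raw.mcpThreshold);
  // K = 0 is allowed: the base set may be empty and grow via find_tools.
  if (!Number.isInteger(topK) || topK < 0) {
    throw new Error(`--mcp-top-k must be a non-negative integer, got "${raw.mcpTopK}"`);
  }
  // Cosine similarity lies in [-1, 1], so thresholds outside it are meaningless.
  if (!Number.isFinite(threshold) || threshold < -1 || threshold > 1) {
    throw new Error(`--mcp-threshold must be a number in [-1, 1], got "${raw.mcpThreshold}"`);
  }
  return { topK, threshold };
}
```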

Reviewed changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated no comments.

Per-file summary:

  • src/core/mcp/retrieval/*: New retrieval module (cosine, embedder, index, active set, config, session) + tests
  • src/core/mcp/bootstrap.ts: Adds contextRequirements gate on MCP specs and bootstraps embedder/index/active-set with fail-safe
  • src/core/task/index.ts: Seeds active set on start/resume; clears on abort; first-user-text extraction helper
  • src/core/task/ApiRequestHandler.ts: Threads activeMcpTools snapshot into SystemPromptContext
  • src/core/task/tools/subagent/SubagentRunner.ts: Subagent inherits parent's active set snapshot
  • src/core/task/tools/handlers/FindToolsToolHandler.ts (+test): New handler invoking expand and reporting activated tools
  • src/core/task/tools/ToolExecutorCoordinator.ts: Registers FindToolsToolHandler
  • src/core/prompts/system-prompt/{types.ts,tools/find_tools.ts,tools/init.ts}: Adds context field, new spec, registration
  • src/core/prompts/system-prompt/tests/snapshots/*.snap: Snapshot updates for find_tools across providers
  • src/core/prompts/responses.ts: Sharper missing/empty-required-param message + minor formatting
  • src/shared/tools.ts: Adds FIND_TOOLS enum member
  • cli/src/index.ts: New --mcp-top-k / --mcp-threshold flags on both commands
  • cli/esbuild.mts: Marks @huggingface/transformers and onnxruntime-node as externals
  • package.json / package-lock.json: Pin @huggingface/transformers@3.3.3 (+ transitive sharp/onnxruntime)
  • docs/superpowers/{specs,plans}/2026-06-01-mcp-adaptive-retrieval-*.md: Design + implementation plan


@electron-rare

Author

Tech-debt (c) — sharp/libvips duplicate: investigated, deferred to the dependency-mirroring step

Update to the caveat in the description (the earlier "consider npm dedupe" suggestion is not viable):

  • Root cause: the repo uses sharp@0.34.5 (→ @img/sharp-libvips-darwin-arm64@1.2.4) while @huggingface/transformers@3.3.3 pins sharp@^0.33.5 (→ libvips 1.0.4). The two versions are semver-incompatible (for 0.x packages, ^0.33.5 excludes 0.34.x), so two native libvips builds are loaded, which triggers the macOS objc duplicate-class warning.
  • npm dedupe cannot merge them (incompatible ranges). A scoped overrides entry forcing transformers' sharp to 0.34.5 was tried and leaves the lockfile in an invalid state (npm keeps 0.33.5 and flags it): it does not dedupe and is a risky cross-version force on a third party's pinned dependency.
  • Impact: cosmetic for this feature. We only use the text feature-extraction path; sharp (image processing) is never invoked. The embedder is verified working (384-d output, ~3.9s first call) despite the warning.
  • Proper fix belongs to the supply-chain mirroring task (blocker V1: native function-calling in eu-kiki workers #1): when vendoring @huggingface/transformers into the ailiance org, drop/patch its image deps (sharp) since text-only inference doesn't need them, or align both to a single sharp at mirror time. No safe lockfile-only fix exists pre-mirror.
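Why no lockfile-only fix exists comes down to caret semantics for 0.x versions: `^0.33.5` means `>=0.33.5 <0.34.0`, so 0.34.5 is out of range and npm must keep two copies. A minimal, hypothetical check of that rule (plain `x.y.z` versions only, no prerelease tags; the `semver` package is the real tool for this):

```typescript
type Version = [number, number, number];

// Parses a plain "x.y.z" version; prerelease/build suffixes are not supported.
function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return parts as Version;
}

function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Caret rule: the leftmost non-zero component of the base must not change.
// ^1.2.3 -> <2.0.0, ^0.33.5 -> <0.34.0, ^0.0.3 -> <0.0.4.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const b = parseVersion(base);
  if (compare(v, b) < 0) return false;
  if (b[0] > 0) return v[0] === b[0];
  if (b[1] > 0) return v[0] === 0 && v[1] === b[1];
  return v[0] === 0 && v[1] === 0 && v[2] === b[2];
}
```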

@electron-rare

Author

Follow-up on the two merge gates (2026-06-02)

(2) Worker "roles must alternate" 400 — diagnosed + validated.
Root cause: the 400 came from the omlx fallback (worker 8500), not the primary FC worker 8002. Gateway logs show 8002 was unreachable during the e2e (Worker 8002 unreachable 21:17/21:27), so the FC force-route fell back to 8500, whose chat template requires strict user/assistant alternation and rejects the agent's assistant(tool_calls) → tool → user sequence. normalize_message_roles (gateway inference_defaults.py:338) intentionally preserves tool-use messages as structural boundaries, so it does not (and should not) flatten them — 8002's --jinja template handles them; omlx's does not.

  • Validated: with 8002 up, e2e is 2/2 COMPLETED, 0 empty-array, 0 chat-template-400 (multi-file read+summarize). The agent tool flow works end to end.
  • Durable angle: 8002 also returned a 400 earlier because a 41039-token prompt exceeded its 40960-token context, i.e. the 95-tool bloat overflowed its context and contributed to the instability. This feature reduces that (relevance-gated prompts stay well under 40k), so it lessens the 8002 overflow/flap that triggers the bad fallback.
  • Known gateway limitation (separate follow-up, NOT this PR): the omlx (8500) FC fallback cannot serve multi-turn tool-calling sequences (template incompatibility). Hardening it (backend-specific message transform, or excluding omlx from FC_CAPABLE_PORTS for tool turns) is gateway work tracked separately; rushing a tool-message transform risks degrading FC quality.
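To make the failure mode concrete, a hypothetical sketch of a strict-alternation rule like the one the 8500 template enforces (the real check lives in that worker's Jinja template, not in this code): ignoring system messages, roles must go user, assistant, user, and so on. The agent's `tool` turn breaks that pattern even though the sequence is valid for a function-calling template.

```typescript
type Role = "system" | "user" | "assistant" | "tool";

interface Msg {
  role: Role;
  content: string;
  tool_calls?: unknown[];
}

// Hypothetical strict rule: system messages are ignored; the remaining turns
// must start with "user" and then alternate user/assistant.
function strictlyAlternates(msgs: readonly Msg[]): boolean {
  const turns = msgs.filter((m) => m.role !== "system");
  return turns.every((m, i) => m.role === (i % 2 === 0 ? "user" : "assistant"));
}
```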

(1) Supply-chain mirror — prepared for HITL (per policy, not auto-executed).
The ailiance dependency policy is explicit ("HITL before any fork/push; no pushes"), so I did not push weights autonomously. Concrete progress instead:

  • Runtime is now mirror-ready: Embedder.ts reads AILIANCE_EMBED_MODEL (point at the mirror) and AILIANCE_EMBED_OFFLINE=1 (forbids HF CDN fetch). Default unchanged; no code change needed post-mirror. (commit 1e98344)
  • Runbook added: docs/superpowers/runbooks/2026-06-02-mcp-retrieval-supply-chain-mirror.md — model mirror to Ailiance-fr/all-MiniLM-L6-v2, npm vendoring (with sharp trim that resolves tech-debt (c)), offline wiring, SBOM/audit steps.
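A hypothetical sketch of how the two environment variables could resolve into loader options. The `EmbedConfig` shape and the default model id `Xenova/all-MiniLM-L6-v2` (the upstream the mirror was created from) are assumptions; in transformers.js, an offline flag like this would typically be applied by setting `env.allowRemoteModels = false`.

```typescript
interface EmbedConfig {
  modelId: string;
  allowRemote: boolean; // false => never fetch from the HF CDN
}

const DEFAULT_MODEL = "Xenova/all-MiniLM-L6-v2"; // assumption: upstream default

// Hypothetical resolver: AILIANCE_EMBED_MODEL overrides the model id,
// AILIANCE_EMBED_OFFLINE=1 forbids remote downloads.
function resolveEmbedConfig(env: Readonly<Record<string, string | undefined>>): EmbedConfig {
  const override = env.AILIANCE_EMBED_MODEL?.trim();
  return {
    modelId: override ? override : DEFAULT_MODEL,
    allowRemote: env.AILIANCE_EMBED_OFFLINE !== "1",
  };
}
```

Called with `process.env` at startup; with neither variable set, it keeps the default behaviour unchanged.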

Both gates now have a clear, low-risk path to closure.

Mirror Ailiance-fr/all-MiniLM-L6-v2 created from upstream Xenova snapshot (sha 751bff37 → mirror 5cbc3683). npm transformers@3.3.3 pin-only per org strategy (no vendor). Embedder verified loading from the mirror.
@electron-rare electron-rare merged commit d31e7f6 into master Jun 2, 2026
2 checks passed
