feat(agent-think): issue repro/fix agent on Workers + Containers #1861
Merged
A Think agent that reproduces and fixes cloudflare/agents GitHub issues inside a container-backed @cloudflare/workspace VFS, triggered from an issue comment (@agent-think <instruction>) via a GitHub App webhook worker. Supersedes the CI-based /repro + /pr Actions from #1844: same skills, but running as a persistent Worker + pre-warmed container instead of Actions runners.

Architecture (patterns from aron/cloudflare-workspaces-prototype):
- AgentThink WorkerEntrypoint: dispatch() RPC from the webhook worker; returns in ~1s (submitMessages only; container gh/git auth happens inside the durable turn via beforeTurn, so the caller's waitUntil cancellation window can never kill the run)
- ThinkAgent DO owns the Workspace (SQLite VFS) + the durable turn; two exec backends: container (full Linux: gh/git/npm/node/wrangler) and just-bash isolate for cheap text ops
- Sandbox DO hosts the Cloudflare Container; WarmPool DO keeps one container pre-warmed and hands containers out per session
- live thread UI (Vite + React) at /thread/:session
- skills (reproduce / open-pr) mounted read-only from R2; repros must ship a minimal Vite frontend so maintainers can click the deployed URL and watch the failing behavior in a UI

The worker holds no GitHub App credentials: the webhook worker mints a short-lived installation token per dispatch.

Note: requires the enable_abortsignal_rpc compat flag, because the container backend's health probe passes an AbortSignal over cross-DO RPC.

Verified end-to-end in prod on issue #1859: trigger comment to bot reply in 5s, 30-minute turn with real clone/install/deploy, structured repro report posted back on the issue.
- Replace HANDOFF.md (session-log style) with AGENTS.md: aims, how the system works, the rules we hold ourselves to, and the edge cases that cost real debugging time (abortsignal RPC flag, lossy tail, stale container reconcile, R2 skill seeding, Access, WARP builds).
- Vitest configs move next to their suites (test/, tests-e2e/); root keeps only vite.config.ts (thread UI build).
- Env files standardise on .env + .env_example (the e2e harness reads .env); drop .dev.vars.example and the stray root WARP pem copy.
- Drop the vite alias workaround for agents/chat/react: the subpath export exists upstream now, so plain resolution works.
@cloudflare/worker-bundler, ws, @types/ws: imported nowhere. @cloudflare/workers-types: redundant — tsconfig consumes the wrangler-generated worker-configuration.d.ts runtime types only. (isomorphic-git and @platformatic/vfs stay: optional peers of @cloudflare/workspace whose main entry — which we bundle — imports both; git.diff runs on isomorphic-git.)
- Commit worker-configuration.d.ts (wrangler types) and stop ignoring it: CI has no way to generate it, so the tsconfig types reference failed with TS2688 on a fresh checkout.
- tsconfig extends agents/tsconfig (verbatimModuleSyntax et al.), with the types list overridden to the generated runtime file; keeping @cloudflare/workers-types alongside it would conflict. client.tsx now typechecks too (was outside the old include).
- compatibility_date 2026-05-26 -> 2026-06-11 (repo standard), both configs; types regenerated against it.
- write tool now takes the same per-file lock as edit: its stat-then-write mode preservation had the same interleaving window edit's read-modify-write guards against. Lock extracted to src/tools/fs/file-lock.ts.
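The per-file lock mentioned in the last bullet can be pictured as a promise chain keyed by path. This is a minimal sketch only: `withFileLock` and `tails` are hypothetical names, and the real src/tools/fs/file-lock.ts may be structured differently.

```typescript
// Hypothetical per-path async lock (illustrative; not the actual file-lock.ts API).
// Each path maps to the tail of a promise chain; a new caller waits on the
// current tail, then installs its own gate as the new tail.
const tails = new Map<string, Promise<void>>();

async function withFileLock<T>(path: string, fn: () => Promise<T>): Promise<T> {
  const prev = tails.get(path) ?? Promise.resolve();
  let release!: () => void;
  const gate = new Promise<void>((resolve) => {
    release = resolve;
  });
  // The tail resolves only after every earlier holder and this one have released.
  const tail = prev.then(() => gate);
  tails.set(path, tail);

  await prev; // wait for earlier holders of this path
  try {
    return await fn(); // critical section: stat-then-write or read-modify-write
  } finally {
    release();
    // Drop the entry if nobody queued behind us, so the map does not grow.
    if (tails.get(path) === tail) tails.delete(path);
  }
}
```

Wrapping both the write and edit tool bodies in the same helper, keyed by the same normalized path, is what closes the interleaving window; operations on different paths still run concurrently.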
The store already mkdir -p's the parent on every write; telling the model saves it a container exec mkdir round-trip first.
Model: openai/gpt-5.5 through the default AI Gateway's model catalog
(Unified Billing over the AI binding — no provider key). The
providers: [openai] plugin is required: workers-ai-provider refuses
{provider}/{model} slugs without it (verified empirically: text, tool
calls, and streaming all work with the plugin; raw env.AI.run works
either way but Think needs an AI SDK LanguageModel).
Command center: the root URL is now a dashboard run by a singleton
CommandCenterAgent (synced-state registry of every thread + counters).
ThinkAgent reports dispatch/tool/turn events fire-and-forget —
observing must never break a run. The UI gains a ChatGPT-style left
sidebar listing threads reverse-chronologically, live over agents
state sync; /thread/:session renders inside the same shell. The old
plain-text root banner is gone (root serves the SPA, with a worker
fallback where asset-first routing is not emulated).
Main screen leads with per-repo cards (name, github link, issue/status counts) per the wireframe; the sidebar gains a search filter and a ChatGPT-style Recents treatment. Sidebar persists across thread navigation (unchanged).
The repo#issue idempotency key silently swallowed re-mentions: once an issue's first turn completed, submitMessages returned the old submission (accepted:false) and nothing ran. The key now includes the triggering commentId (passed by gh-app); dev dispatches without one get a random key. Webhook redeliveries are already deduped in gh-app's KV before dispatch, so nothing is lost.
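The key change above can be sketched as follows. The `DispatchInput` shape, the function name, and the exact key format are assumptions for illustration; only the rule (comment ID when present, random otherwise) comes from the commit message.

```typescript
// Hypothetical dispatch input; field names are illustrative.
interface DispatchInput {
  repo: string; // e.g. "cloudflare/agents"
  issue: number;
  commentId?: number; // passed by gh-app; absent on dev dispatches
}

// Builds a per-comment idempotency key. The "repo#issue:suffix" format is an
// assumption; the source only says the key now includes the commentId.
function idempotencyKey(
  input: DispatchInput,
  randomId: () => string = () => crypto.randomUUID(),
): string {
  const suffix =
    input.commentId !== undefined ? String(input.commentId) : randomId();
  return `${input.repo}#${input.issue}:${suffix}`;
}
```

With this shape, a retry of the same comment maps to the same key and stays deduped, while a second mention on the same issue produces a new key and a fresh turn.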
Log (never throw) when a lifecycle report fails, and emit one structured line per registry update — silent-success and silent-failure were indistinguishable in the logs.
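One way to get "log, never throw" is a small wrapper around the report call. This is a sketch under assumptions: `reportLifecycle`, the `Logger` type, and the JSON line shape are hypothetical, and logging the success case here is an illustrative choice rather than something the commit states.

```typescript
type Logger = (line: string) => void;

// Runs a report and always resolves: failures become a structured log line
// instead of an exception, so observing can never break a run.
function reportLifecycle(
  send: () => Promise<void>,
  event: string,
  log: Logger = console.log,
): Promise<void> {
  return Promise.resolve()
    .then(send) // also captures synchronous throws from send
    .then(() => log(JSON.stringify({ event, ok: true })))
    .catch((err: unknown) =>
      log(JSON.stringify({ event, ok: false, error: String(err) })),
    );
}
```

Callers can drop the returned promise for fire-and-forget use, or hand it to `ctx.waitUntil` where a Workers execution context is available.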
CI has no vite output (dist/client is not committed), so the root-route test read an empty body. The test config now points ASSETS at a committed fixture with the SPA root node.
Cloudflare Access on the domain passes authenticated HTTP but eats WebSocket upgrades (zero WS ever reached the worker — the thread view only worked via useAgentChat's HTTP get-messages polling). Plain useAgent state sync has no such fallback, so the command center rendered empty. GET /api/command-center returns the registry snapshot; the client hydrates from it and polls while the WS is not connected.
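The hydrate-then-poll fallback described above might look like the step function below. All names (`SnapshotDeps`, `pollStep`) are hypothetical, and the real client presumably wires this into React state; only the GET /api/command-center endpoint and the "poll while the WS is not connected" rule come from the commit message.

```typescript
// Hypothetical dependencies, injected so the step is easy to test.
interface SnapshotDeps<T> {
  fetchSnapshot: () => Promise<T>; // e.g. GET /api/command-center
  isSocketOpen: () => boolean; // true once state sync is live over WS
  apply: (snapshot: T) => void; // replace local registry state
}

// One polling step: skipped while the WebSocket is connected, unless forced
// (the initial hydration is always forced).
async function pollStep<T>(
  deps: SnapshotDeps<T>,
  force = false,
): Promise<"skipped" | "applied" | "failed"> {
  if (!force && deps.isSocketOpen()) return "skipped";
  try {
    deps.apply(await deps.fetchSnapshot());
    return "applied";
  } catch {
    return "failed"; // keep the last good snapshot; the next tick retries
  }
}
```

A client would call `pollStep(deps, true)` once on mount and then `pollStep(deps)` from a `setInterval`, so polling goes quiet as soon as the socket connects.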
ThreadMeta carries the GitHub issue title and who mentioned
@agent-think (login + avatar). Activity rows and the sidebar show the
title; the requester's avatar sits on each row with a hover tooltip
('login: instruction'). Both flow from the webhook payload through
dispatch; old threads without the fields fall back to the instruction.
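A sketch of the fallback for older threads, assuming hypothetical field names (`issueTitle`, `requester`, `avatarUrl`); the actual ThreadMeta shape is not shown in this PR text.

```typescript
// Hypothetical ThreadMeta shape; the new fields are optional so that threads
// created before this change still type-check.
interface ThreadMeta {
  instruction: string;
  issueTitle?: string;
  requester?: { login: string; avatarUrl: string };
}

// Title shown in activity rows and the sidebar: the issue title when present,
// otherwise the instruction.
function displayTitle(meta: ThreadMeta): string {
  return meta.issueTitle?.trim() || meta.instruction;
}

// Hover tooltip in the 'login: instruction' form; undefined for old threads
// that have no requester.
function requesterTooltip(meta: ThreadMeta): string | undefined {
  return meta.requester
    ? `${meta.requester.login}: ${meta.instruction}`
    : undefined;
}
```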
…es died at the assets router
The assets layer forwards ordinary no-asset-match requests to the worker but not WebSocket upgrades, so every wss:// connect to /agents/* failed while plain HTTP worked. That is why the command center sat on the HTTP fallback and showed 'disconnected'. (Corrects the earlier Access diagnosis; Access passes authenticated WS fine.)
Supersedes #1844: the same reproduce/open-pr skills, but as a single persistent Think agent on Workers + Cloudflare Containers instead of CI Actions runners.

What this is

@agent-think <instruction> on any issue (e.g. "reproduce this issue", "open a PR fixing this") dispatches a run that works in a real Linux container (gh/git/npm/node/wrangler), streams a live thread UI, and reports back on the issue as the agent-think GitHub App.

Model: openai/gpt-5.5 through the default AI Gateway's model catalog (Unified Billing over the AI binding, no provider key; providers: [openai] plugin required for catalog slugs; verified for text, tool calls, and streaming). Turn idempotency is per triggering comment, so re-mentions on an issue start fresh turns while RPC retries stay deduped.

Patterns follow aron/cloudflare-workspaces-prototype (hackspace): the Agent DO owns the Workspace; compute is a separate warm-pooled Sandbox DO dialed per-connect.

Why not the #1844 shape

- … submitMessages (idempotency key = repo#issue), survives DO eviction, and both verbs on an issue share one workspace/thread.
- … examples/* house style): maintainers click the deployed workers.dev URL, press Trigger bug, and watch expected-vs-actual in the page. Deploys use wrangler deploy --temporary with a claimable preview account.

Notes for reviewers

- enable_abortsignal_rpc compat flag is required: the container backend's health probe passes an AbortSignal over cross-DO RPC (documented in wrangler.jsonc).
- agent-think/AGENTS.md has the aims, architecture, self-imposed rules, and the edge cases that cost real debugging time.
- … team-apps) and holds all App credentials; this package is public-safe by construction.

Verified in prod

Green end-to-end run on #1859 (2026-07-02): trigger comment → 👀 + "🧠 on it" in 5s → 30-min durable turn (clone, npm install, repro project, real temporary-account deploy) → structured repro report posted by agent-think[bot] with live URL + claim link.

🤖 Generated with Claude Code