
feat(agent-think): issue repro/fix agent on Workers + Containers#1861

Merged
mattzcarey merged 15 commits into main from feat/agent-think on Jul 3, 2026

@mattzcarey (Contributor) commented Jul 2, 2026

Supersedes #1844 — the same reproduce / open-pr skills, but as a single persistent Think agent on Workers + Cloudflare Containers instead of CI Actions runners.

What this is

@agent-think <instruction> on any issue (e.g. "reproduce this issue", "open a PR fixing this") dispatches a run that works in a real Linux container (gh/git/npm/node/wrangler), streams a live thread UI, and reports back on the issue as the agent-think GitHub App.
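The trigger contract above (a comment containing `@agent-think <instruction>`, gated to members) can be sketched as follows. This is a minimal sketch, not gh-app's code (that worker is private): `parseMention` and `mayTrigger` are hypothetical names, and treating OWNER, MEMBER, and COLLABORATOR as "members" is an assumption. The `author_association` values are the ones GitHub documents on issue comments.

```typescript
// Hypothetical helpers; the real gh-app webhook worker is private.
type AuthorAssociation =
  | "OWNER" | "MEMBER" | "COLLABORATOR" | "CONTRIBUTOR"
  | "FIRST_TIME_CONTRIBUTOR" | "FIRST_TIMER" | "MANNEQUIN" | "NONE";

// Assumption: "member-gated" means only these associations may trigger a run.
const ALLOWED: ReadonlySet<AuthorAssociation> = new Set<AuthorAssociation>([
  "OWNER",
  "MEMBER",
  "COLLABORATOR",
]);

/** Returns the instruction after "@agent-think", or null if there is no usable mention. */
function parseMention(body: string): string | null {
  // The handle must start a line or follow whitespace; capture the rest of that line.
  const match = /(?:^|\s)@agent-think\b[ \t]*(.*)/m.exec(body);
  if (!match) return null;
  const instruction = match[1].trim();
  return instruction.length > 0 ? instruction : null;
}

/** True when the comment author is allowed to dispatch a run. */
function mayTrigger(association: AuthorAssociation): boolean {
  return ALLOWED.has(association);
}
```

The `\b` after the handle keeps look-alikes such as `@agent-thinker` from matching, and a bare mention with no instruction is rejected rather than dispatched.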

@agent-think reproduce this issue        (issue comment)
   │  issue_comment webhook → gh-app (private worker: verifies sig, member-gates,
   │  mints short-lived installation token, reacts 👀, RPCs dispatch)
   ▼
agent-think (this package — holds NO GitHub App credentials)
   ├─ AgentThink WorkerEntrypoint — dispatch() returns in ~1s (durable submit only)
   ├─ ThinkAgent DO — owns the Workspace (SQLite VFS) + the durable turn;
   │    container gh/git auth runs inside the turn (beforeTurn), not in dispatch
   ├─ Sandbox DO — container host; WarmPool DO keeps one container pre-warmed
   ├─ CommandCenterAgent DO — singleton registry (threads + counters);
   │    ThinkAgent reports lifecycle events fire-and-forget
   └─ UI: `/` command center (metrics, per-repo cards, ChatGPT-style thread
        sidebar, live over agents state sync) · /thread/:session thread view
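The first hop in the diagram ("verifies sig") follows GitHub's documented webhook scheme: the `X-Hub-Signature-256` header carries `sha256=` plus the hex HMAC-SHA256 of the raw request body, keyed by the webhook secret. gh-app itself is not part of this PR, so the following is a generic sketch under that assumption, using `node:crypto` (available in Workers with the `nodejs_compat` flag; whether gh-app uses it is unknown). `verifyGitHubSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

/** Checks an X-Hub-Signature-256 header ("sha256=<hex>") against the raw body. */
function verifyGitHubSignature(secret: string, rawBody: string, header: string | null): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  // Malformed hex decodes to a shorter buffer, which fails the length check below.
  const received = Buffer.from(header.slice("sha256=".length), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The HMAC must be computed over the exact bytes GitHub sent, so the body should be read as text before any JSON parsing.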

Model: openai/gpt-5.5 through the default AI Gateway's model catalog (Unified Billing over the AI binding, no provider key; providers: [openai] plugin required for catalog slugs — verified for text, tool calls, and streaming). Turn idempotency is per triggering comment, so re-mentions on an issue start fresh turns while RPC retries stay deduped.

Patterns follow aron/cloudflare-workspaces-prototype (hackspace): the Agent DO owns the Workspace; compute is a separate warm-pooled Sandbox DO dialed per-connect.

Why not the #1844 shape

  • No Actions runner lifecycle: the turn runs via Think's durable submitMessages (idempotency key = repo#issue plus the triggering commentId), survives DO eviction, and both verbs on an issue share one workspace/thread.
  • Webhook → running agent in ~5s; a pre-warmed container skips boot cost.
  • Every repro ships a minimal Vite + React frontend (skill enforces the exact 7-file recipe, matching the examples/* house style): maintainers click the deployed workers.dev URL, press Trigger bug, and watch expected-vs-actual in the page. Deploys use wrangler deploy --temporary with a claimable preview account.

Notes for reviewers

  • enable_abortsignal_rpc compat flag is required — the container backend's health probe passes an AbortSignal over cross-DO RPC (documented in wrangler.jsonc).
  • agent-think/AGENTS.md has the aims, architecture, self-imposed rules, and the edge cases that cost real debugging time.
  • The gh-app webhook worker lives in internal GitLab (team-apps) and holds all App credentials; this package is public-safe by construction.

Verified in prod

Green end-to-end run on #1859 (2026-07-02): trigger comment → 👀 + "🧠 on it" in 5s → 30-min durable turn (clone, npm install, repro project, real temporary-account deploy) → structured repro report posted by agent-think[bot] with live URL + claim link.

🤖 Generated with Claude Code

A Think agent that reproduces and fixes cloudflare/agents GitHub issues
inside a container-backed @cloudflare/workspace VFS, triggered from an
issue comment (@agent-think <instruction>) via a GitHub App webhook
worker. Supersedes the CI-based /repro + /pr Actions from #1844 — same
skills, but running as a persistent Worker + pre-warmed container
instead of Actions runners.

Architecture (patterns from aron/cloudflare-workspaces-prototype):
- AgentThink WorkerEntrypoint: dispatch() RPC from the webhook worker;
  returns in ~1s (submitMessages only — container gh/git auth happens
  inside the durable turn via beforeTurn, so the caller's waitUntil
  cancellation window can never kill the run)
- ThinkAgent DO owns the Workspace (SQLite VFS) + the durable turn;
  two exec backends: container (full Linux: gh/git/npm/node/wrangler)
  and just-bash isolate for cheap text ops
- Sandbox DO hosts the Cloudflare Container; WarmPool DO keeps one
  pre-warmed and hands them out per session
- live thread UI (Vite + React) at /thread/:session
- skills (reproduce / open-pr) mounted read-only from R2; repros must
  ship a minimal Vite frontend so maintainers can click the deployed
  URL and watch the failing behavior in a UI
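The dispatch/turn split in the first bullet can be modeled in a few lines. This is a hypothetical in-memory model, not Think's API: `FakeDurableAgent`, `runPendingTurns`, and the `beforeTurn` callback shape are illustrative. It shows the property the PR relies on: `dispatch` awaits only the durable submit, and auth runs later inside the turn, so nothing that must finish depends on the caller's `waitUntil` window.

```typescript
// Hypothetical model of the dispatch/turn split; not Think's real API.
interface Turn {
  key: string;
  instruction: string;
}

class FakeDurableAgent {
  private pending: Turn[] = [];
  readonly log: string[] = [];

  // Durable submit: record the turn and return right away.
  async submitMessages(instruction: string, idempotencyKey: string): Promise<void> {
    this.pending.push({ key: idempotencyKey, instruction });
    this.log.push(`submit:${idempotencyKey}`);
  }

  // Later, inside the durable turn: beforeTurn (e.g. container gh/git auth) runs first.
  async runPendingTurns(beforeTurn: (t: Turn) => Promise<void>): Promise<void> {
    while (this.pending.length > 0) {
      const turn = this.pending.shift() as Turn;
      await beforeTurn(turn);
      this.log.push(`turn:${turn.key}`);
    }
  }
}

// The RPC the webhook worker calls: only the durable submit is awaited.
async function dispatch(agent: FakeDurableAgent, instruction: string, key: string): Promise<void> {
  await agent.submitMessages(instruction, key);
}
```

If auth ran inside `dispatch` instead, a slow `gh auth` in the container would extend the RPC and could be cut off when the caller's request context ends.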

The worker holds no GitHub App credentials — the webhook worker mints
a short-lived installation token per dispatch.

Note: requires the enable_abortsignal_rpc compat flag — the container
backend's health probe passes an AbortSignal over cross-DO RPC.

Verified end-to-end in prod on issue #1859: trigger comment to bot
reply in 5s, 30-minute turn with real clone/install/deploy, structured
repro report posted back on the issue.
@changeset-bot

changeset-bot Bot commented Jul 2, 2026


⚠️ No Changeset found

Latest commit: 233afbf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 4 potential issues.


Comment thread agent-think/wrangler.jsonc Outdated
Comment thread agent-think/test/wrangler.jsonc Outdated
Comment thread agent-think/src/tools/fs/tools/write.ts Outdated
Comment thread agent-think/tsconfig.json
- Replace HANDOFF.md (session-log style) with AGENTS.md: aims, how the
  system works, the rules we hold ourselves to, and the edge cases that
  cost real debugging time (abortsignal RPC flag, lossy tail, stale
  container reconcile, R2 skill seeding, Access, WARP builds).
- Vitest configs move next to their suites (test/, tests-e2e/); root
  keeps only vite.config.ts (thread UI build).
- Env files standardise on .env + .env_example (the e2e harness reads
  .env); drop .dev.vars.example and the stray root WARP pem copy.
- Drop the vite alias workaround for agents/chat/react — the subpath
  export exists upstream now, plain resolution works.
- Drop unused dependencies. @cloudflare/worker-bundler, ws, @types/ws:
  imported nowhere. @cloudflare/workers-types: redundant, since tsconfig
  consumes only the wrangler-generated worker-configuration.d.ts runtime
  types. (isomorphic-git and @platformatic/vfs stay: they are optional
  peers of @cloudflare/workspace, whose main entry, which we bundle,
  imports both; git.diff runs on isomorphic-git.)
- Commit worker-configuration.d.ts (wrangler types) and stop ignoring
  it — CI has no way to generate it, so the tsconfig types reference
  failed with TS2688 on a fresh checkout.
- tsconfig extends agents/tsconfig (verbatimModuleSyntax et al.), with
  the types list overridden to the generated runtime file — keeping
  @cloudflare/workers-types alongside it would conflict. client.tsx now
  typechecks too (was outside the old include).
- compatibility_date 2026-05-26 -> 2026-06-11 (repo standard), both
  configs; types regenerated against it.
- write tool now takes the same per-file lock as edit: its
  stat-then-write mode preservation had the same interleaving window
  edit's read-modify-write guards against. Lock extracted to
  src/tools/fs/file-lock.ts.
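A per-path lock of the kind described above can be built by chaining promises per file. This is a minimal sketch, not the contents of `src/tools/fs/file-lock.ts` (its API is not shown in this PR); `withFileLock` is a hypothetical name, and the lock only serializes callers within one isolate, which fits a single Durable Object owning the workspace.

```typescript
// Hypothetical per-file lock; callers for the same path run one at a time.
const tails = new Map<string, Promise<void>>();

async function withFileLock<T>(path: string, fn: () => Promise<T>): Promise<T> {
  const previous = tails.get(path) ?? Promise.resolve();
  let release: () => void = () => {};
  const current = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  // The next caller for this path waits until we release.
  const tail = previous.then(() => current);
  tails.set(path, tail);
  await previous;
  try {
    return await fn(); // e.g. stat-then-write, or read-modify-write
  } finally {
    release();
    // Drop the entry if nobody queued behind us, so the map does not grow.
    if (tails.get(path) === tail) tails.delete(path);
  }
}
```

Wrapping both `write` (stat, then write with preserved mode) and `edit` (read, modify, write) in the same lock closes the window where two tool calls interleave on one file.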
@pkg-pr-new

pkg-pr-new Bot commented Jul 2, 2026


agents

npm i https://pkg.pr.new/agents@1861

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1861

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1861

create-think

npm i https://pkg.pr.new/create-think@1861

hono-agents

npm i https://pkg.pr.new/hono-agents@1861

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1861

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1861

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1861

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1861

commit: 233afbf

The store already runs mkdir -p on the parent directory on every write;
telling the model this saves it a container-exec mkdir round trip before
writing.
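What "mkdir -p on every write" means for the model can be illustrated with a toy store. This is a hypothetical in-memory stand-in (`MemoryStore`); the real @cloudflare/workspace store is SQLite-backed and its internals are not shown here.

```typescript
// Hypothetical in-memory store illustrating implicit parent creation on write.
class MemoryStore {
  readonly dirs = new Set<string>(["/"]);
  readonly files = new Map<string, string>();

  write(path: string, content: string): void {
    // Equivalent of `mkdir -p "$(dirname path)"`: create every missing ancestor.
    const parts = path.split("/").filter(Boolean);
    let dir = "";
    for (const part of parts.slice(0, -1)) {
      dir += `/${part}`;
      this.dirs.add(dir);
    }
    this.files.set(path, content);
  }
}
```

Because a single write to `/repo/src/index.ts` already creates `/repo` and `/repo/src`, a separate `mkdir` exec in the container adds latency without changing the result.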
Model: openai/gpt-5.5 through the default AI Gateway's model catalog
(Unified Billing over the AI binding — no provider key). The
providers: [openai] plugin is required: workers-ai-provider refuses
{provider}/{model} slugs without it (verified empirically: text, tool
calls, and streaming all work with the plugin; raw env.AI.run works
either way but Think needs an AI SDK LanguageModel).

Command center: the root URL is now a dashboard run by a singleton
CommandCenterAgent (synced-state registry of every thread + counters).
ThinkAgent reports dispatch/tool/turn events fire-and-forget —
observing must never break a run. The UI gains a ChatGPT-style left
sidebar listing threads reverse-chronologically, live over agents
state sync; /thread/:session renders inside the same shell. The old
plain-text root banner is gone (root serves the SPA, with a worker
fallback where asset-first routing is not emulated).
Main screen leads with per-repo cards (name, GitHub link, issue/status
counts) per the wireframe; the sidebar gains a search filter and a
ChatGPT-style Recents treatment. Sidebar persists across thread
navigation (unchanged).
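The two views described above (a reverse-chronological, searchable sidebar and per-repo cards with issue and status counts) are both derivations of the registry's thread list. A minimal sketch, assuming a hypothetical `ThreadSummary` shape; the PR does not show the registry's actual field names.

```typescript
// Hypothetical thread shape; field names are illustrative.
interface ThreadSummary {
  session: string;
  repo: string; // "owner/name"
  issue: number;
  status: "running" | "done" | "error";
  title: string;
  updatedAt: number; // epoch millis
}

/** Sidebar: newest first, filtered by a case-insensitive search string. */
function sidebarThreads(threads: ThreadSummary[], query = ""): ThreadSummary[] {
  const q = query.trim().toLowerCase();
  return threads
    .filter((t) => q === "" || `${t.repo}#${t.issue} ${t.title}`.toLowerCase().includes(q))
    .sort((a, b) => b.updatedAt - a.updatedAt); // filter() copied, so input is untouched
}

interface RepoCard {
  repo: string;
  issues: number; // distinct issues with at least one thread
  byStatus: Record<string, number>; // thread count per status
}

/** Main screen: one card per repo. */
function repoCards(threads: ThreadSummary[]): RepoCard[] {
  const cards = new Map<string, { issues: Set<number>; byStatus: Record<string, number> }>();
  for (const t of threads) {
    const card = cards.get(t.repo) ?? { issues: new Set<number>(), byStatus: {} };
    card.issues.add(t.issue);
    card.byStatus[t.status] = (card.byStatus[t.status] ?? 0) + 1;
    cards.set(t.repo, card);
  }
  return Array.from(cards, ([repo, c]) => ({ repo, issues: c.issues.size, byStatus: c.byStatus }));
}
```

Deriving both views from the same synced list keeps the sidebar and the cards consistent without extra registry state.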
The repo#issue idempotency key silently swallowed re-mentions: once an
issue's first turn completed, submitMessages returned the old
submission (accepted:false) and nothing ran. The key now includes the
triggering commentId (passed by gh-app); dev dispatches without one get
a random key. Webhook redeliveries are already deduped in gh-app's KV
before dispatch, so nothing is lost.
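The key change above can be shown with a toy dedupe log. A minimal sketch: `turnIdempotencyKey` and `SubmissionLog` are hypothetical names, the `repo#issue#commentId` layout and the `dev-` prefix on random keys are assumptions, and `SubmissionLog` only models the `accepted: false` behavior of a deduped submit.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical key builder mirroring the rule above; not the PR's actual code.
function turnIdempotencyKey(repo: string, issue: number, commentId?: number): string {
  // Dev dispatches carry no comment, so they get a random key and always run.
  if (commentId === undefined) return `dev-${randomUUID()}`;
  return `${repo}#${issue}#${commentId}`;
}

// Minimal stand-in for a durable submit that dedupes on the key.
class SubmissionLog {
  private readonly seen = new Set<string>();

  submit(key: string): { accepted: boolean } {
    if (this.seen.has(key)) return { accepted: false };
    this.seen.add(key);
    return { accepted: true };
  }
}
```

With the old `repo#issue` key, a second mention on the same issue maps to the same key and is rejected; with the comment ID included, an RPC retry of one comment still dedupes while a new comment starts a fresh turn.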
Log (never throw) when a lifecycle report fails, and emit one
structured line per registry update — silent-success and silent-failure
were indistinguishable in the logs.
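"Fire-and-forget, log but never throw" can be captured in one helper. A minimal sketch with hypothetical names (`reportLifecycle`, `LifecycleEvent`, the `send` callback); in the PR the sender would be an RPC to the CommandCenterAgent, which is not shown here.

```typescript
// Hypothetical event shape; the PR reports dispatch/tool/turn events.
interface LifecycleEvent {
  session: string;
  kind: "dispatch" | "tool" | "turn";
  at: number;
}

/** Sends an event without awaiting it; any failure is logged as one JSON line, never thrown. */
function reportLifecycle(send: (e: LifecycleEvent) => Promise<void>, event: LifecycleEvent): void {
  const logFailure = (err: unknown) =>
    console.error(
      JSON.stringify({
        msg: "lifecycle report failed",
        session: event.session,
        kind: event.kind,
        error: String(err),
      }),
    );
  // Calling send inside then() turns a synchronous throw into a rejection, so catch() sees both.
  void Promise.resolve()
    .then(() => send(event))
    .catch(logFailure);
}
```

The run never awaits the report, so a slow or failing registry cannot stall or break a turn, while the structured error line keeps failures visible in the logs.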
CI has no vite output (dist/client is not committed), so the root-route
test read an empty body. The test config now points ASSETS at a
committed fixture with the SPA root node.
Cloudflare Access on the domain passes authenticated HTTP but eats
WebSocket upgrades (zero WS ever reached the worker — the thread view
only worked via useAgentChat's HTTP get-messages polling). Plain
useAgent state sync has no such fallback, so the command center
rendered empty. GET /api/command-center returns the registry snapshot;
the client hydrates from it and polls while the WS is not connected.
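Both halves of the fallback above (a JSON snapshot route on the worker, and a client that hydrates once and then polls only while the socket is down) fit in a short sketch. `RegistrySnapshot`, `commandCenterRoute`, `startSnapshotFallback`, and their parameters are hypothetical; how the worker actually loads the snapshot from the CommandCenterAgent is not shown in this PR.

```typescript
// Hypothetical snapshot shape; the real registry holds threads and counters.
interface RegistrySnapshot {
  threads: unknown[];
  counters: Record<string, number>;
}

/** Worker side: answers GET /api/command-center, or returns null for other routes. */
async function commandCenterRoute(
  request: Request,
  loadSnapshot: () => Promise<RegistrySnapshot>,
): Promise<Response | null> {
  const url = new URL(request.url);
  if (request.method !== "GET" || url.pathname !== "/api/command-center") return null;
  const snapshot = await loadSnapshot();
  return new Response(JSON.stringify(snapshot), {
    headers: { "content-type": "application/json" },
  });
}

/** Client side: hydrate once, then poll only while the WebSocket is not connected. */
function startSnapshotFallback(opts: {
  fetchSnapshot: () => Promise<RegistrySnapshot>;
  isSocketConnected: () => boolean;
  apply: (s: RegistrySnapshot) => void;
  intervalMs: number;
}): () => void {
  const load = () =>
    opts
      .fetchSnapshot()
      .then(opts.apply)
      .catch(() => {
        // Ignore; the next interval retries.
      });
  void load(); // initial hydration regardless of socket state
  const timer = setInterval(() => {
    if (!opts.isSocketConnected()) void load();
  }, opts.intervalMs);
  return () => clearInterval(timer);
}
```

Once state sync over the WebSocket connects, the interval keeps running but skips the fetch, so the HTTP path costs nothing in the normal case.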
ThreadMeta carries the GitHub issue title and who mentioned
@agent-think (login + avatar). Activity rows and the sidebar show the
title; the requester's avatar sits on each row with a hover tooltip
('login: instruction'). Both flow from the webhook payload through
dispatch; old threads without the fields fall back to the instruction.
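The display rules above (title with an instruction fallback, and a `login: instruction` tooltip) reduce to two small functions. A minimal sketch: the PR names the data ThreadMeta carries but not its field names, so `issueTitle`, `requester`, and `avatarUrl` are hypothetical.

```typescript
// Hypothetical field names for the data the PR says ThreadMeta carries.
interface ThreadMeta {
  instruction: string;
  issueTitle?: string; // absent on threads created before the field existed
  requester?: { login: string; avatarUrl: string };
}

/** Activity-row and sidebar label: the issue title, else the instruction. */
function threadLabel(meta: ThreadMeta): string {
  return meta.issueTitle?.trim() || meta.instruction;
}

/** Avatar hover tooltip in the "login: instruction" form, or null without a requester. */
function requesterTooltip(meta: ThreadMeta): string | null {
  return meta.requester ? `${meta.requester.login}: ${meta.instruction}` : null;
}
```

Making the new fields optional is what lets old threads render unchanged: they simply take the fallback branch.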
…es died at the assets router

The assets layer forwards ordinary no-asset-match requests to the
worker but not WebSocket upgrades, so every wss:// connect to
/agents/* failed while plain HTTP worked — which is why the command
center sat on the HTTP fallback and showed 'disconnected'. (Corrects
the earlier Access diagnosis; Access passes authenticated WS fine.)
@mattzcarey mattzcarey merged commit d1ce3cb into main Jul 3, 2026
4 checks passed
@mattzcarey mattzcarey deleted the feat/agent-think branch July 3, 2026 12:59