release: v0.3.3 — runtime reliability (timeout, worktree isolation, digest async, --org)#810
Closed
agents-squads[bot] wants to merge 41 commits into
…1/7] (#731) * refactor(core): run engine decomposition, context helpers, squad parser improvements Core runtime refactoring from v0.3.0 development cycle: - run-context.ts: expanded context helpers for goal injection, feedback, state - run-modes.ts: simplified run modes, removed per-squad limits - run-types.ts: added conversation_agents field type - execution-engine.ts: phase-ordered execution, role-based context - agent-runner.ts: bot identity injection, guardrail hooks, tool sets - squad-parser.ts: findProjectRoot, skills loading, dynamic discovery - env-config.ts: environment URL resolution additions Original commits: ~25 from develop (refactors, type fixes, context system updates) Backup tag: pre-v0.3.0-backup Co-Authored-By: Claude <noreply@anthropic.com> * fix: address Gemini review — configurable cred path, use parseAgentFrontmatter, fix staleness calc - execution-engine.ts: GCP credential path now configurable via SQUADS_GCP_CREDENTIALS_DIR env var (was hardcoded ~/.squads/secrets/). Use parseAgentFrontmatter() instead of fragile regex for model detection. - run-context.ts: Replace magic number 86400000 with MS_PER_DAY constant, use Math.floor instead of Math.round for staleness calculation. Co-Authored-By: Claude <noreply@anthropic.com> * fix(types): add model field to AgentFrontmatter interface Typecheck failed because parseAgentFrontmatter() returns AgentFrontmatter which didn't include the model property. 
Co-Authored-By: Claude <noreply@anthropic.com> * fix(lint): remove unused imports in agent-runner — SOFT_DEADLINE_RATIO, preflightExecutorCheck, pushCognitionSignal, findMemoryDir, timeoutMins Co-Authored-By: Claude <noreply@anthropic.com> * fix(lint): remove all 20 unused variable warnings across 9 files Cleaned up unused imports and variables flagged by eslint: - agent-runner.ts: DEFAULT_TIMEOUT_MINUTES, bold, gradient - scorecard-engine.ts: readFileSync - org-cycle.ts: logObservability, ObservabilityRecord - outcomes.ts: prefixed unmergedPRs with _ - repo-enforcement.ts: resolve - run-context.ts: removed unused readDirMd function + readdirSync - run-modes.ts: spawn, getProjectRoot, checkLocalCooldown, DEFAULT_SCHEDULED_COOLDOWN_MS, saveTranscript, reportExecutionStart, reportConversationResult, getBridgeUrl, ora - run-utils.ts: findMemoryDir - squad-loop.ts: Squad type Zero warnings remaining. Zero type errors. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
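The staleness fix in run-context.ts (named `MS_PER_DAY` constant, `Math.floor` instead of `Math.round`) might look roughly like the sketch below. Only `MS_PER_DAY` and the rounding choice come from the commit message; the function name, its signature, and the clamp to zero are assumptions made for illustration.

```typescript
// Named constant replacing the magic number 86400000.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: the real function in run-context.ts may differ.
function stalenessInDays(lastUpdated: Date, now: Date): number {
  const elapsedMs = now.getTime() - lastUpdated.getTime();
  // Math.floor counts only fully elapsed days. Math.round would report
  // "1 day stale" after just 12 hours.
  // Assumption: timestamps in the future are treated as 0 days stale.
  return Math.max(0, Math.floor(elapsedMs / MS_PER_DAY));
}
```

With rounding, a file touched 18 hours ago would already count as one day old; with flooring it stays at zero until a full day has passed.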
…v0.3.0 — 2/7] (#732) * feat(run): workflow rewrite — smart skip, org cycle, wave execution, focus/resume Run engine and workflow rewrite from v0.3.0 development cycle. Fixes applied from Gemini Code Assist review: - HIGH: task directive now includes planPrompt context (was bypassed) - HIGH: converged reflects actual status (was forced true) - MEDIUM: setTimeout cleared on close/error (resource leak) - MEDIUM: skip logic query limit bumped to 500 - MEDIUM: fallback assigns ALL workers, not just first - Added CLI_RUN_COMPLETE telemetry event - Removed unused imports (dirname, homedir, bold) Co-Authored-By: Claude <noreply@anthropic.com> * fix(test): add findProjectRoot to squad-parser mock in workflow tests Co-Authored-By: Claude <noreply@anthropic.com> * fix(test): findProjectRoot mock should use mockReturnValue (sync, not async) findProjectRoot() returns string|null, not a Promise. Co-Authored-By: Claude <noreply@anthropic.com> * fix(test): update workflow tests for spawn-based agent execution workflow.ts now uses spawn instead of execSync. Updated test mocks: - Added createMockChild helper for spawn-based child processes - Added appendFileSync to fs mock - Added observability mock (snapshotGoals, diffGoals, logObservability) - All 16 tests pass Co-Authored-By: Claude <noreply@anthropic.com> * fix: remove hardcoded squad names from org cycle waves Wave definitions had our internal squad names (research, intelligence, cli, marketing, etc.) hardcoded. A user's squads would never match. Now: all planned squads run in a single parallel wave. Custom wave ordering can be added later via SQUAD.md `wave:` field. Co-Authored-By: Claude <noreply@anthropic.com> * fix: remove hardcoded git commit of .agents/memory/ between waves Auto-committing hq memory between waves was our internal pattern, not a product feature. Users won't have .agents/memory/ in their project root. Removed. 
Co-Authored-By: Claude <noreply@anthropic.com> * refactor: extract plan prompt to templates/prompts/plan.md "No prompts in code" — behavioral instructions live in markdown. Extracted the inline planPrompt template string to a markdown file with {{VARIABLE}} placeholders. TypeScript loads and substitutes. Also: squadContext is now included in the template (was passed as empty string, losing goals/priorities context). Co-Authored-By: Claude <noreply@anthropic.com> * fix(lint): remove unused execSync import from run.ts No longer needed after removing hardcoded git commit between waves. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
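As an illustration of the "load markdown, substitute {{VARIABLE}} placeholders" approach described for templates/prompts/plan.md, here is a minimal sketch. The function name `renderTemplate` and the decision to leave unknown placeholders untouched are assumptions, not the CLI's actual code; the placeholder names in the usage note are hypothetical as well.

```typescript
// Hypothetical renderer: replaces every {{NAME}} occurrence with vars[NAME].
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const value = vars[name];
    // Assumption: unknown placeholders stay visible so a missing variable
    // is easy to spot in the rendered prompt.
    return value !== undefined ? value : match;
  });
}
```

Because the regex is global, a placeholder that appears more than once (for example `{{SQUAD_CONTEXT}}` in both a header and a body) is substituted everywhere, not only at its first occurrence.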
* feat(conversation): agents talk + use tools, cognition engine, convergence Conversation mode rewrite and cognition engine from v0.3.0 cycle: - conversation.ts: Rewritten so agents talk AND use tools (was text-only). Parallel same-role agents within cycles. Hard-stop on lead completion. Squad cwd resolution for all agent turns. Transcript serialization fixes. Agent classification by name first, then role description. - cognition.ts: Local-first intelligence engine. Quality grading. Escalation pause for daemon. Signal synthesis via Claude CLI. Push memory signals after daemon cycles. Co-Authored-By: Claude <noreply@anthropic.com> * refactor: remove cognition.ts changes from this PR Cognition engine is not actively used (post-pivot, daemon is stopped). Changes parked in future/cognition-t2 branch for Tier 2 reactivation. This PR now only contains conversation.ts changes. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
… — 4/7] (#734) * feat(commands): add review, credentials, goals, log commands + minor fixes New commands: - credentials.ts: per-squad GCP service account management - goals.ts: goals dashboard with status tracking - log.ts: run history with timestamps, duration, status - review.ts: post-cycle evaluation dashboard Fixes applied: - Added CLI_LOG telemetry event - Removed unused imports (writeFileSync, formatRelativeTime) - Removed unused variables (blockedStr, achievedStr) - Fixed hardcoded org name in review.ts issue URL resolution Co-Authored-By: Claude <noreply@anthropic.com> * fix: address Gemini review on credentials.ts - Use static renameSync import instead of dynamic import('fs') - Remove redundant --all handling (dedicated create-all command exists) Co-Authored-By: Claude <noreply@anthropic.com> * refactor(credentials): remove hardcoded squad names, read config from SQUAD.md credentials.ts had our internal squad names and GCP roles hardcoded. Now fully agnostic: - Permissions read from SQUAD.md `credentials.gcp.roles/apis` fields - Squads discovered dynamically from squads directory - No hardcoded squad names, org names, or internal structure - Helpful error message shows users how to configure their SQUAD.md - create-all discovers squads with GCP config automatically Co-Authored-By: Claude <noreply@anthropic.com> * test(credentials): add 8 tests for SQUAD.md GCP credentials parser Extracted parseGcpCredentials() as pure function for testability. Tests cover: inline YAML, quoted values, multiple APIs, missing config, empty content, roles without apis, mixed SQUAD.md content. All 8 pass. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
…735) * feat(init): demo agent scaffold, what's next guidance, email capture Init UX improvements from v0.3.0 cycle: - "What's next" guidance after init with actionable next steps - Opt-in email capture for product updates - Demo squad scaffold with hello-world starter agent - IDP catalog seeding for agent frontmatter schemas - Competitor collection during init - Hints for empty business description - cli.run.complete telemetry event Co-Authored-By: Claude <noreply@anthropic.com> * fix(test): update E2E to expect 5 squads (4 core + demo) Init now creates a demo squad with hello-world agent. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
… — 6/7] (#736) * feat(security): PreToolUse guardrail hooks for spawned agent sessions guardrail.json template injected into all spawned Claude sessions. Prevents agents from running destructive commands, force-pushing, publishing packages, or accessing secrets directly. Co-Authored-By: Claude <noreply@anthropic.com> * fix(security): add npm/yarn/pnpm publish to guardrail blocked commands Gemini review caught missing publish checks. Agents should never publish packages — that requires founder approval. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
…#737) * test+docs: coverage + tier 2 docs + version bump to 0.3.0 Tests added (213 new tests): - catalog.test.ts: catalog command tests - dashboard.test.ts: dashboard engine, renderers, loader tests - services.test.ts: services command tests - first-run.e2e.test.ts: updated for demo squad scaffold - guardrail.test.ts: guardrail hook tests - init.test.ts: expanded init command tests - telemetry.test.ts: telemetry event tests Docs: - docs/tier2.md: Tier 2 architecture documentation Version: - package.json: bump to 0.3.0 Note: cli.test.ts failures are pre-existing on develop (not introduced by this PR). Co-Authored-By: Claude <noreply@anthropic.com> * docs: remove tier2.md — internal architecture, not product docs Hardcoded our repo structure, ports, service names. Belongs in private engineering repo, not the public CLI. Co-Authored-By: Claude <noreply@anthropic.com> * test: replace mock-heavy tests with real integration tests Before: 2,299 lines mocking fs, squad-parser, child_process, etc. Testing mocks, not the product. False confidence. After: 465 lines testing real files on real filesystem. - catalog: real IDP directory with YAML files - dashboard: zero mocks, real data structures into renderers - services: real docker-compose.yml in temp dir - init: real temp directory, verify actual files created 39 tests, all passing. 80% less code, 100% more real coverage. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
* fix(services): make agnostic - remove hardcoded paths and internal assumptions

Before: searched for docker-compose.yml in ~/agents-squads/engineering/docker/ and hardcoded squads-postgres container name, internal DB table names.

Now:
- Discovers docker-compose.yml from project root, ./docker/, ./infra/, SQUADS_COMPOSE_FILE env var, or --file flag
- Uses docker compose ps against user's compose file
- Removed hardcoded port output and DB introspection
- --file option on all 3 subcommands (up/down/status)
- Health check verifies containers are actually running
- Updated tests to match new agnostic implementation

Co-Authored-By: Claude <noreply@anthropic.com>

* test(services): update tests for agnostic services command

- Use SQUADS_COMPOSE_FILE env var instead of hardcoded engineering path
- Check --file option on all subcommands
- Fix health check mock to return 'running' state
- Updated status test for Docker not installed case

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
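The compose-file discovery described above could be sketched as a pure function. The commit lists the sources but not their precedence, so the order used here (`--file` flag, then `SQUADS_COMPOSE_FILE`, then the three directories) is an assumption, as are the function name and the injected `exists` callback.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical resolver; the precedence order is assumed, not documented.
function resolveComposeFile(opts: {
  fileFlag?: string;
  env: Env;
  projectRoot: string;
  exists: (path: string) => boolean;
}): string | null {
  // 1. Explicit --file flag (assumed to win over everything else).
  if (opts.fileFlag) return opts.fileFlag;
  // 2. SQUADS_COMPOSE_FILE environment variable.
  const fromEnv = opts.env["SQUADS_COMPOSE_FILE"];
  if (fromEnv) return fromEnv;
  // 3. Conventional locations: project root, ./docker/, ./infra/.
  for (const dir of ["", "docker/", "infra/"]) {
    const candidate = `${opts.projectRoot}/${dir}docker-compose.yml`;
    if (opts.exists(candidate)) return candidate;
  }
  return null;
}
```

Passing `exists` in (rather than calling the filesystem directly) keeps the lookup testable without a real directory tree; a caller would typically supply `fs.existsSync`.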
…0.3.0] (#739) * fix(telemetry): restore write-only API key — telemetry broken since March 14 Commit 6261882 removed the telemetry key and replaced it with an env var that no user has set. Result: zero telemetry events since ~March 14. Write-only analytics keys are standard practice (Segment, PostHog, Mixpanel all ship them in public code). The key can only write events; it cannot read, delete, or access any data. Users can still opt out. Closes #388 (GitHub Traffic API — this restores our primary data signal) Co-Authored-By: Claude <noreply@anthropic.com> * fix: use plain string for telemetry key, drop base64 obfuscation Gemini review: base64 encoding adds no security and reduces transparency. Plain string is honest — it's a write-only key, nothing to hide. Co-Authored-By: Claude <noreply@anthropic.com> * fix: lock telemetry key — no env var override Telemetry goes to our infrastructure only. No reason to let users redirect it. They can opt out, but not redirect. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
….0] (#740) * fix(run): UX improvements — prerequisites check, no-args squad list, schedule hint (#675, #694, #695) - Add checkPrerequisites() validating Node >= 18 and Claude CLI before run - Show available squads with missions when `squads run` invoked without args - Display scheduling tip after first successful squad run (persisted in ~/.squads-cli/schedule-hint-shown) Co-Authored-By: Claude <noreply@anthropic.com> * fix: skip prerequisites check in CI/test environments checkPrerequisites() called process.exit(1) when Claude CLI not found, killing the test runner. Now skips when CI or VITEST env vars are set. Co-Authored-By: Claude <noreply@anthropic.com> * fix: address Gemini review — remove redundant CLI check, fix cron hint, cleanup - Removed redundant Claude CLI check (preflightExecutorCheck handles it) - Removed non-existent --cron flag from schedule hint - Removed unused runAutopilot import (replaced by squad listing) - Added VITEST to skip conditions Co-Authored-By: Claude <noreply@anthropic.com> * fix(lint): remove unused execSync import Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
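A sketch of the prerequisite check after the CI fix: validate Node >= 18, but skip entirely when `CI` or `VITEST` is set. The names below are hypothetical. Returning an error string instead of calling `process.exit(1)` is a design choice of this sketch (it keeps the check testable), not a description of what checkPrerequisites() does.

```typescript
type Env = Record<string, string | undefined>;

const MIN_NODE_MAJOR = 18;

// Parses "v20.11.1" -> 20; returns NaN when the version is unparseable.
function nodeMajor(version: string): number {
  const first = version.replace(/^v/, "").split(".")[0] ?? "";
  return Number.parseInt(first, 10);
}

// Skip conditions named in the commit messages: CI and VITEST.
function shouldSkipPrerequisites(env: Env): boolean {
  return Boolean(env["CI"]) || Boolean(env["VITEST"]);
}

// Returns an error message, or null when the environment is acceptable.
function prerequisiteError(version: string, env: Env): string | null {
  if (shouldSkipPrerequisites(env)) return null;
  const major = nodeMajor(version);
  if (Number.isNaN(major) || major < MIN_NODE_MAJOR) {
    return `Node >= ${MIN_NODE_MAJOR} is required (found ${version})`;
  }
  return null;
}
```

A caller would pass `process.version` and `process.env`, print the message, and exit only at the top level of the command.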
Tags matching v<semver>-<suffix> (e.g., v0.3.0-rc.1) publish to @next and mark the GitHub Release as pre-release. Clean semver tags (v0.3.0) continue publishing to @latest. Enables a burn-in channel for major releases — users opt in with `npm i squads-cli@next` before we promote to @latest. Co-Authored-By: Claude <noreply@anthropic.com>
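The tag-to-channel rule can be expressed as a small function. This is a sketch of the rule as stated in the commit message, not the shell logic in release.yml; the function name and the simplified semver regex (no build metadata) are assumptions.

```typescript
// Maps a git tag to an npm dist-tag, or null if the tag is not a release tag.
function distTagForGitTag(tag: string): "latest" | "next" | null {
  // v<major>.<minor>.<patch> with an optional -<suffix> (e.g. -rc.1).
  const match = /^v\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.exec(tag);
  if (!match) return null;
  // A suffix marks a pre-release: publish to @next instead of @latest.
  return match[1] ? "next" : "latest";
}
```

For example, `v0.3.0-rc.1` maps to `next` and `v0.3.0` maps to `latest`, while a tag without the leading `v` is ignored.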
)

* fix(workflow): role-based timeouts + anti-collision rules in plan prompt

Two root causes of poor org run quality:

1. Workers timed out at 8 minutes (hardcoded), which is too short to complete real work like creating PRs, running BQ queries, or writing reports. Now role-based: scanners 10min, verifiers 15min, leads+workers 30min.
2. Multiple squads created duplicate deliverables (e.g., both ops and cli tried to create the v0.3.0 release PR). Plan prompt now includes explicit rules: only work on YOUR goals, check depends_on before acting, verify before creating, no PII on public repos.

Closes #742 (partially: timeout portion only)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: use DEFAULT_TIMEOUT_MINUTES + SQUADS_AGENT_TIMEOUT_MINUTES env var

No hardcoded values. Timeout comes from:

1. SQUADS_AGENT_TIMEOUT_MINUTES env var (user override)
2. DEFAULT_TIMEOUT_MINUTES from run-types.ts (30 min)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address Gemini review - timeout declaration order + dependency check instructions

- workflow.ts: move timeout declaration before event handlers (no-use-before-define)
- plan.md: specify how to check depends_on (read goals.md status field, use gh CLI)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…nfig [v0.3.0] (#749) * fix(audit): remove hardcoded values, extract prompts, parameterize config 5 audit findings remediated: 1. tier-detect.ts: use getApiUrl/getBridgeUrl from env-config 2. agent-runner/workflow/run-modes: replace company-lead string match with frontmatter role 3. cognition.ts: parameterize company name via SQUADS_COMPANY_NAME 4. run-modes.ts: extract lead prompt to templates/prompts/lead-mode.md 5. lead-orchestrator.ts: extract orchestrator prompt to templates/prompts/orchestrator.md Co-Authored-By: Claude <noreply@anthropic.com> * fix: address Gemini review — use replaceAll for template tags - lead-orchestrator.ts: {{WORKERS}} now uses regex for consistency - run-modes.ts: all template tags use replaceAll() for multi-occurrence safety Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
* feat: add project-level config system (.squads/config.yml)

Centralizes runtime settings (agent timeout, token budget, cost ceiling, company name, compose file, telemetry) into a single project config file with env var > config file > constant default resolution order.

- New: src/lib/config.ts: loader with minimal YAML parser, no deps
- New: templates/config.example.yml: ships with package
- Updated: workflow.ts reads token_budget + cost_ceiling from config
- Updated: cognition.ts reads company_name from config (was hardcoded)
- Updated: services.ts reads compose_file from config
- Updated: telemetry.ts checks config for telemetry opt-out
- Updated: init.ts generates .squads/config.yml + gitignore entry

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address Gemini review - YAML parser, gitignore check, config resolution

- config.ts: allow uppercase YAML keys (normalized to lowercase), fix comment stripping for quoted values and comment-only values
- init.ts: exact line match for gitignore entry (not substring)
- services.ts: remove redundant env var check, use loadProjectConfig() as single config source

Co-Authored-By: Claude <noreply@anthropic.com>

* test(services): reset config cache in beforeEach

Config cache held a stale null compose_file across tests, so the env-var override case failed because earlier tests had already cached the unset state.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
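The "env var > config file > constant default" resolution order can be illustrated with one setting. `SQUADS_COMPANY_NAME` and the `company_name` key appear in this PR's commit messages; the default value, the `ProjectConfig` shape, and the function names are assumptions, and the real loader in src/lib/config.ts may treat blank values differently.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical shape: only company_name is shown here.
interface ProjectConfig {
  company_name?: string;
}

// Assumed placeholder default; the CLI's actual constant is not shown in the PR.
const DEFAULT_COMPANY_NAME = "Your Company";

// Treats empty or whitespace-only strings as "not set".
function nonEmpty(value: string | undefined): string | undefined {
  return value !== undefined && value.trim() !== "" ? value : undefined;
}

function resolveCompanyName(env: Env, config: ProjectConfig | null): string {
  return (
    nonEmpty(env["SQUADS_COMPANY_NAME"]) ?? // 1. env var wins
    nonEmpty(config?.company_name) ??       // 2. then .squads/config.yml
    DEFAULT_COMPANY_NAME                    // 3. then the constant default
  );
}
```

Taking `env` and `config` as parameters avoids the cached-state problem described in the last commit of this group: tests can pass fresh values instead of resetting a module-level cache.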
* feat(templates): evaluation-first goals + add growth squad Every non-demo starter squad now ships with a first-run "Squad evaluation" goal so `squads run <squad>` produces deliverable output on first invocation: audit the domain against BUSINESS_BRIEF.md and output a baseline report with top priorities. Adds a new `growth` squad (4 agents — growth-lead, funnel-analyst, experiment-runner, growth-critic) distinct from marketing: growth owns AARRR funnel, experiments, and kills vanity metrics. Marketing creates content, growth measures and distributes. Growth exposed via: - Use-case option in `squads init` - `--pack growth` flag - Included in `--pack all` - Included in `full-company` use case Closes #751 * fix: address Gemini review — marketing dep + use-case + state files - growth use case now includes getMarketingSquad() (declared dependency) - --pack processing updates selectedUseCase so getFirstRunCommand suggests the right first agent (e.g. growth-lead instead of always research/lead) - --pack growth now also installs marketing (dependency) - Added initial state.md for funnel-analyst, experiment-runner, growth-critic so their first-run Read() calls do not fail --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Pre-release candidate for v0.3.0. Will publish to @next dist-tag via release.yml (tag matches v<semver>-<suffix> pattern). Users can test with: npm i -g squads-cli@next Promotes to @latest after burn-in by tagging main with v0.3.0. Co-Authored-By: Claude <noreply@anthropic.com>
Both publish.yml (manual) and release.yml (tag-triggered) passed
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }} to npm publish, which npm
prefers over OIDC. With a stale NPM_TOKEN, publishes failed with a 404 and
OIDC was never attempted.
Changes:
- Remove NODE_AUTH_TOKEN from both publish steps — npm falls back to OIDC
via the trusted publisher already configured on npmjs.com
- Upgrade Node to 22 and install npm@latest so npm >= 11.5.1 is used
(required for OIDC trusted publisher authentication)
- publish.yml: detect pre-release dist-tag from package.json version
(matches release.yml behavior) so rc versions go to @next, not @latest
Closes #754
Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
…0 + token) Conflicts arose because: - main shipped at 0.3.0 via #743 - develop bumped to 0.3.0-rc.1 for @next burn-in - develop replaced release.yml NPM_TOKEN with OIDC trusted publishing Resolution: take develop's side for all three files. Publishing 0.3.0-rc.1 to @next is the intended path, and OIDC replaces the stale NPM_TOKEN that caused the original 404.
…ange) (#778) First phase of "Chief = governed Claude-with-the-CLI" (hq#418). The Agent Contract is the typed, git-versioned definition of what each agent may do — formalizing what SQUAD.md + agent frontmatter + the 7-layer run-context already imply, plus the missing governance fields, with a hard default:deny. - src/lib/agent-contract.ts: AgentContract type; deriveContract (pure — maps existing role/context_from/budget/timeout + run-context role budgets/layers, adds tool_grants{read|write|consequential}, autonomy, hitl_gate, write_scope, credential_scope, resource_ceiling, workspace_id, evaluator with conservative role defaults); validateContract; isEnforceableTool (allowedTools vocabulary). - src/commands/contract.ts: `squads contract validate [--squad] [--json]` — derives + validates every agent, non-zero exit on any violation (CI/pre-commit gate for the repo holding the definitions). - run-context.ts: export ROLE_BUDGETS/ROLE_SECTIONS (single source of truth). - test/agent-contract.test.ts: 17 tests — role derivation, real-shaped lead, and rejection of unenforceable/unjailed-write/ungated-consequential/unbounded. - docs/agent-contract.md: field → Agent SDK primitive mapping (the P1 target). Verified: `squads contract validate` → 112/112 real hq agents valid (exit 0, no runtime change); over-scoped fixtures fail; full suite 1792 tests + lint green. P0 is schema + validation only — enforcement (SDK permission callback) is P1. Closes #777 Tracks: agents-squads/hq#418 Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
…779) P1 of chief-cli-runtime (hq#418), per the founder's 2026-05-29 direction: keep agent capability broad, catch the dangerous OUTCOMES with hooks rather than a tight tool allowlist. The highest-leverage leak is an agent auto-committing a credential/PII into a (possibly public) repo, so guard that first. - src/lib/secret-scan.ts: scanText/scanDiff for high-confidence credential shapes (Anthropic/OpenAI/GitHub/Slack/AWS/Google/Stripe keys, private keys), Chilean RUT, secret-named quoted literals, and an operator-controlled name/codename denylist (.agents/config/forbidden-strings.txt, gitignored). Scans only ADDED diff lines; redacts every match (never echoes a secret). - execution-engine.ts: autoCommitAgentWork scans `git diff --cached` after staging; on any finding it unstages and REFUSES to commit/push (safe failure: work stays local, surfaced as an error). - test/secret-scan.test.ts: 8 tests (each key type, quoted-literal vs env-ref FP, added-only diff lines, redaction, denylist). Verified: live-caught a staged ghp_ token (redacted); non-e2e suite 1761 tests + lint green. (Local e2e tests are pre-existing flaky — real-process timeouts.) Part of #777 follow-on / hq#418. Next P1: harden the Bash guardrail denylist (curl|sh, force-push variants) + contract tool-level grants + root_run_id. Tracks: agents-squads/hq#418 Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
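To make the "scan only ADDED diff lines, redact every match" behaviour concrete, here is a much-reduced sketch. It is not the code in src/lib/secret-scan.ts: the two patterns are commonly cited shapes for GitHub personal access tokens and AWS access key IDs (assumed here, and far from the full list above), and the `Finding` type and redaction format are hypothetical.

```typescript
interface Finding {
  kind: string;
  line: number;     // 1-based line number within the diff text
  redacted: string; // never contains the full secret
}

// Assumed high-confidence shapes; the real scanner covers many more.
const PATTERNS: Array<{ kind: string; regex: RegExp }> = [
  { kind: "github-token", regex: /ghp_[A-Za-z0-9]{36}/g },
  { kind: "aws-access-key-id", regex: /AKIA[0-9A-Z]{16}/g },
];

function scanDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((text, index) => {
    // Only added lines count: they start with "+", but "+++ b/file" is a header.
    if (!text.startsWith("+") || text.startsWith("+++ ")) return;
    for (const { kind, regex } of PATTERNS) {
      const matches = text.match(regex) ?? [];
      for (const match of matches) {
        // Keep the 4-character prefix for identification, drop the rest.
        findings.push({ kind, line: index + 1, redacted: match.slice(0, 4) + "***" });
      }
    }
  });
  return findings;
}
```

A caller in the spirit of autoCommitAgentWork would run this over the output of `git diff --cached` and refuse to commit when the returned array is non-empty.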
…squads-cli#780) (#781) Don't rebuild what Anthropic ships: Claude Code (which our agents run on) has a built-in OS sandbox (Seatbelt/bubblewrap) — FS isolation + a default-deny network proxy with a domain allowlist. P2 configures it via the spawn's existing --settings injection instead of hand-rolling Seatbelt profiles or a proxy. - src/lib/sandbox-settings.ts: buildSandboxSettings() → Claude Code settings (sandbox.enabled + filesystem allowWrite[worktree+memory]/denyRead[~/.ssh,~/.aws] + network.allowedDomains + allowManagedDomainsOnly + excludedCommands for gh/gcloud/docker which fail Go-TLS under Seatbelt), merging the guardrail hooks. Non-strict keeps Claude Code's escape hatch so a sandbox-incompatible command degrades gracefully. readGuardrailHooks/writeSandboxSettingsFile/sandboxEnabled. - execution-engine.ts: behind SQUADS_SANDBOX=1 (opt-in — default behavior UNCHANGED), pass the merged sandbox settings via --settings + set CLAUDE_CODE_SUBPROCESS_ENV_SCRUB=1 (native subprocess credential scrub). Verified: Seatbelt blocks out-of-scope writes on macOS (syscall-level); `claude` accepts the generated sandbox settings and runs clean; 6 unit tests; non-e2e suite 1767 tests + lint green. NOT yet default-on: gated behind SQUADS_SANDBOX=1 until a real `squads run` smoke confirms a sandboxed agent completes (allowed hosts, gh excluded) AND off-allowlist egress + write-outside-scope + ~/.ssh read are blocked. Headless network-block needs the managed-settings path (allowManagedDomainsOnly) — to verify in the smoke. Tracks: agents-squads/hq#418 Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
…the bundle) (#782)

A real `SQUADS_SANDBOX=1 squads run` smoke caught it: sandbox-settings.ts used lazy `require('fs')`/`require('path')`, which throws "Dynamic require of fs is not supported" in the ESM bundle the moment the sandbox path runs. Switch to top-level ESM imports. Add tests exercising writeSandboxSettingsFile + readGuardrailHooks (the file-I/O path the bug broke at runtime).

Re-verified with the smoke: a sandboxed agent now COMPLETES (ran pwd/ls, wrote an in-scope file, 12.6s) and the settings file is generated + applied.

Tracks: agents-squads/hq#418

Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Clears GHSA-5xrq-8626-4rwp (vitest <4.1.0 critical) that the `security` CI job (npm audit --audit-level=critical) flags repo-wide. Same 4.x line, no breaking change; full suite (1808 tests) passes on 4.1.8. Closes #784 Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#783) * feat(client): generate API types from squads-api OpenAPI spec Vendor a committed openapi.json snapshot and generate src/client/ types with @hey-api/openapi-ts (types-only, zero runtime deps). Adds a CI drift guard and migrates the agent-executions call site to bind its request paths to the generated spec types, so a renamed route breaks the CLI build. This is the local, no-vendor replacement for the Stainless SDK path (Stainless shut down its hosted generator; Fern's free tier doesn't cover SDK gen). Tracks: agents-squads/hq#419 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: encode executionId path param (Gemini review) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
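The follow-up fix (encoding the `executionId` path parameter) amounts to passing the value through `encodeURIComponent` before interpolating it into the request path. The route below is hypothetical, since the actual path in the generated squads-api types is not shown in this PR.

```typescript
// Hypothetical route; only the encoding step reflects the described fix.
function executionPath(executionId: string): string {
  // Without encoding, an id containing "/" or "?" would change the URL's
  // path segments or start a query string.
  return `/agent-executions/${encodeURIComponent(executionId)}`;
}
```

An id such as `a/b?c` stays inside a single path segment after encoding, instead of being read as two segments plus a query string.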
…ny rules (#786) * feat(governance): block agent edits to governance files via native deny rules squads run / the daemon inject templates/guardrail.json via --settings; this adds Claude Code native permissions.deny rules for Edit/Write/MultiEdit on goals.md, priorities.md, directives.md, SQUAD.md. Autonomous agents are refused by Claude Code itself; interactive founder/cofounder sessions (no injected --settings) edit freely. The boundary lands exactly on "human-in-session can; agent can't". Carries the deny rules through the sandbox path too (readGuardrailPermissions + buildSandboxSettings), so enabling SQUADS_SANDBOX doesn't silently drop them. Supersedes #765, whose PreToolUse hook read a non-existent CLAUDE_TOOL_INPUT env var and never fired. Re-scopes epic #764 to enforcement-only (drops the proposal channel / coherence command / issue-template layers). Verified end-to-end: an agent is denied editing goals.md/directives.md while state.md succeeds. Closes #764 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: address Gemini review — explicit **/ globs + accurate sandbox doc - guardrail.json: use **/<file>.md globs (explicit any-depth) instead of bare names. Verified to block nested, ./-relative, absolute, and root paths. - governance.md: correct the sandbox note — allowWrite includes cwd, so the OS sandbox does NOT block a bash redirect to a governance file inside the repo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ayers (#770) * feat(context): founder-context + per-squad alignment as first-class layers Adds two new layers to gatherSquadContext() so agents run aligned with the founder's live strategic state, not just squad-internal goals: - L9: .agents/memory/company/founder-context.md (universal) Live business state, active client work, priority pipeline, in-flight threads, recent decisions, dismissed topics, hands-off PRs. - L10: .agents/memory/{squad}/founder-alignment.md (per-squad) Domain-specific translation: how THIS squad contributes to the founder's current priorities this cycle, with named files/PRs/clients. Both layers inject at the TOP of the action-first ordering so LLMs give them maximum attention. Visible to all 5 roles. Adds a pre-step to `squads run --org` that runs `scripts/founder-context-digest.py` (in project root) when context is stale (>2h) or missing. Hard-blocks the org cycle on digest failure rather than running unaligned agents. Token budgets bumped: scanner 4k→6k, worker/verifier 12k→16k, lead 24k→32k, coo 32k→40k. Closes #769 * feat(context): also discover digest script at .claude/hooks/founder-context-digest.py Preferred path is .claude/hooks/ (version-controlled, fits hq convention). Falls back to scripts/ for projects organized differently. * feat(context): keep L9+L10 generic, remove hq-specific L11 Reverting the L11 (drive-map) layer from gatherSquadContext. L9 (founder-context.md) and L10 (founder-alignment.md) are generic patterns that any squads-cli user can adopt — every business has live strategic context and per-squad contributions to current priorities. These slots stay in the public CLI. L11 was hq-specific (drive-map.md + drive-erp.md naming, our particular Hub sheet / GPS DuckDB architecture). That belongs inline in our hq content, not as a slot in the public loader. Approach: the digest script (hq/.claude/hooks/founder-context-digest.py) embeds drive-map + drive-erp content directly into founder-context.md when generating. 
Squad agents see the structural reference via L9, not via a separate slot. CLI stays generic; embedding is the user's choice. Effective context for our agents is unchanged (~39.7K chars for lead role). * fix(context): truncate-instead-of-drop + bump budgets for embedded structural ref Two related fixes for context loading in gatherSquadContext: 1. Off-by-4 bug in addLayer: when a layer's content exceeded the remaining budget, the truncation suffix '\n...' (4 chars) pushed text.length beyond budget by exactly 4, triggering the "exhausted" path which DROPPED the entire layer rather than keeping the truncated version. Now reserves space for the suffix so truncation lands exactly within cap. 2. Bump role budgets to fit founder-context.md when it embeds Drive structural reference (drive-map + drive-erp ≈ 30K chars): scanner: 12000 → 50000 (~12500 tokens) worker: 36000 → 60000 (~15000 tokens) lead: 60000 → 80000 (~20000 tokens) coo: 72000 → 100000 (~25000 tokens) verifier: 36000 → 60000 Without these bumps, scanner/worker/verifier dropped L9 entirely because the embedded Drive map + ERP architecture pushed founder-context.md to 42K, exceeding their old budgets. Symptom: the deliverable medium decision matrix wasn't reaching workers, so they defaulted to md-only output instead of producing Doc + Calendar + Sheet artifacts. Verified: all 5 roles now see "Mandatory companion artifacts" matrix and "External communication" approval gate. Tested with customer squad (founder-context.md ≈ 42K, total context 50K-64K depending on role). 
* fix(context): address gemini review — force propagation, cross-platform python, single-squad refresh, comment consistency (a) force flag propagated to refreshFounderContext in org cycle call (b) cross-platform python detection (win32→python, else python3) + result.error check with timeout/start-failure distinction (c) refresh extended to single-squad runs via lazy import before squad executes (d-e) ROLE_BUDGETS + ROLE_SECTIONS comments updated to consistently mention founder ctx + alignment for all roles Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: scanner context includes founder-context layers (9,10) Rebase integration: agent-contract.test.ts (added on develop after this branch) asserted scanner sections as [1,2,3,4,5]; founder-context (L9/L10) is visible to all roles, so the derived scanner contract is now [1,2,3,4,5,9,10]. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
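The off-by-4 truncation fix described above can be sketched roughly as follows. This is a minimal illustration, not the CLI's actual `addLayer`: `fitToBudget` and `TRUNCATION_SUFFIX` are hypothetical names, and the only detail taken from the commit message is that the suffix `'\n...'` is 4 characters and must be reserved inside the budget.

```typescript
// Hypothetical helper: names are illustrative, not the real gatherSquadContext internals.
const TRUNCATION_SUFFIX = "\n..."; // 4 characters, as stated in the commit message

// Returns the text to add for a layer, or null when nothing useful fits.
function fitToBudget(content: string, remaining: number): string | null {
  // Whole layer fits: keep it untouched.
  if (content.length <= remaining) return content;
  // Not even the suffix plus one character fits: the budget is exhausted.
  if (remaining <= TRUNCATION_SUFFIX.length) return null;
  // Reserve room for the suffix so the result lands exactly on the cap.
  const keep = remaining - TRUNCATION_SUFFIX.length;
  return content.slice(0, keep) + TRUNCATION_SUFFIX;
}
```

With the suffix reserved up front, a truncated layer is exactly `remaining` characters long, so a later length check cannot mistake it for an over-budget layer and drop it.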
…into GitHub issues (#767) * feat(brief): add squads brief command — sessions → GitHub issues Reads last N CLI session transcripts from .agents/conversations/cli/, calls claude --print (Haiku) to extract structured tasks per squad, and creates GitHub issues on the correct repos using SQUAD.md repo: mapping. Flags: --sessions N (default 5), --dry-run, --coo (writes founder-focus.md). Closes #766 Co-Authored-By: Claude <noreply@anthropic.com> * fix(brief): replace execSync with spawnSync, use haiku alias, conditional ellipsis - Security: replace execSync shell interpolation with spawnSync argv for gh issue create - Model: use 'haiku' alias instead of pinned 'claude-haiku-4-5-20251001' model id - UX: only append '...' ellipsis when body exceeds 120 chars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
* fix(security): make agent guardrail Bash denylist actually fire

The bundled PreToolUse Bash hook read a non-existent $CLAUDE_TOOL_INPUT env var (Claude Code delivers the hook payload as JSON on stdin) and a top-level command field (it nests under tool_input.command). Result: the denylist silently exited 0 and never blocked anything during squads run.

Fix: read tool_input.command from stdin. Add a regression test that runs the actual hook from templates/guardrail.json against real payloads and asserts the exit code (2 = blocked, 0 = allowed), plus a guard that the command never again references $CLAUDE_TOOL_INPUT.

Verified: before, all 6 dangerous payloads slipped through (exit 0); after, all 6 are blocked and all 4 safe commands pass.

Closes #787
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(security): harden guardrail denylist — token matcher + fail-closed (Gemini review)

Replace the fragile bash substring `case` with a tokenizing Python matcher (shlex) that resists the bypasses Gemini flagged:
- flag-order (rm -fr, rm -r -f)
- extra spaces (rm -rf /)
- no-trailing-space (git push -f)
- flag bundles (git clean -df/-xdf)

Now fails CLOSED (exit 2) on unparseable input instead of fail-open. Splits on shell separators so `cd /tmp && rm -rf /` and `sudo rm -rf /` are caught; still scoped to root/home targets so `rm -rf /tmp/build` is allowed. No false positives: git push --follow-tags / -u, git clean -n, npm run publish-docs stay allowed.

Expanded the regression test to all of the above plus a fail-closed case (18 block + 13 allow).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------
Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
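To make the payload shape concrete, here is a small TypeScript sketch of the decision the hook has to make. The real hook is a shell/Python command in templates/guardrail.json; `extractCommand`, `decide`, and the injected `isDenied` predicate are hypothetical names for illustration. Taken from the commit message: the command lives at `tool_input.command` in the JSON delivered on stdin, exit code 2 blocks, exit code 0 allows, and unparseable input fails closed. Treating a payload with no command as unparseable is an assumption of this sketch.

```typescript
type HookExit = 0 | 2; // 0 = allowed, 2 = blocked (exit codes named in the commit message)

// Pull tool_input.command out of the raw stdin text; null means "could not read a command".
function extractCommand(rawStdin: string): string | null {
  try {
    const payload: unknown = JSON.parse(rawStdin);
    if (typeof payload !== "object" || payload === null) return null;
    const toolInput = (payload as Record<string, unknown>)["tool_input"];
    if (typeof toolInput !== "object" || toolInput === null) return null;
    const command = (toolInput as Record<string, unknown>)["command"];
    return typeof command === "string" ? command : null;
  } catch {
    return null; // invalid JSON
  }
}

// isDenied stands in for the real tokenizing matcher, which is not reproduced here.
function decide(rawStdin: string, isDenied: (command: string) => boolean): HookExit {
  const command = extractCommand(rawStdin);
  if (command === null) return 2; // fail closed (assumption: missing command counts as unparseable)
  return isDenied(command) ? 2 : 0;
}
```

Note that a payload carrying `command` at the top level is rejected rather than read, which mirrors the bug class the first commit fixes: looking in the wrong place must not silently allow the call.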
#789) Agents can now declare max_context_tokens: N in their YAML frontmatter to cap context assembly to N tokens, overriding the role-level default. Wires the override through both dry-run preview and live execution paths. Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Leads got the full write toolset (toolsByRole.lead = readTools + writeTools), so a lead could author + commit code itself instead of delegating — observed as a ~20-min runaway turn that built+pushed+PR'd an issue and never ran the lead→work→verify cycle. Scope the lead to read + dispatch (Agent) only. Validated locally: lead now plans → dispatches workers in their lanes → reviews → verifier approves (two PRs opened+merged vs one runaway turn). Closes #790 Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
…shipping (#793) (#794) * fix(run): refine lead toolset — state writes yes, code-shipping no #792 stripped ALL write tools from the lead to stop a runaway lead that authored+committed+PR'd code itself. That over-corrected: the review prompt tells the lead to update state.md, which it could no longer do (Gemini flagged on #792). But simply restoring Write/Edit reopens the runaway, since Bash(git:*)/Bash(gh:*) live in readTools (which the lead keeps). Dedicated lead toolset: Write/Edit (state.md / memory — goals.md stays governance-blocked), read-only git/gh + 'gh issue create' (delegate), Agent (dispatch). No git commit/push, no gh pr create/merge, no build-Bash — the lead can update state but cannot ship code, so it must assign workers. Workers/scanners/verifiers unchanged. Closes #793 Co-Authored-By: Claude <noreply@anthropic.com> * fix(run): allow lead to merge ready worker PRs (addresses Gemini #794) The review prompt (workflow.ts:456) instructs the lead to 'gh pr merge --squash --delete-branch --auto', but the refined lead toolset blocked gh pr merge — the lead would fail mid-review. Merging an already CI-passed, reviewed worker PR is orchestration ('lead merges to develop after CI'), not code authoring, and it's safe: the lead still can't commit/push or 'gh pr create', so it lands workers' PRs but never ships its own code. Adds Bash(gh pr merge:*) to the lead git/gh set (renamed readOnlyGitGh → leadGitGh, since it now includes merge). --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
* feat(run): stream agent output live under --verbose (#791)

Conversation runs only showed one-line phase labels live; each agent's full output was discarded until the end-of-run transcript, so you couldn't watch a run happen: the exact thing needed to supervise autonomous execution.

Under --verbose the spawn handler now line-buffers each agent's stdout and prints it live, prefixed 'agentName |', across all 4 phases (plan/work/review/verify). Non-verbose runs unchanged. Full tool-call streaming (stream-json) is a follow-up.

Closes #791
Co-Authored-By: Claude <noreply@anthropic.com>

* fix(run): stream-decode agent stdout to preserve multi-byte UTF-8 (Gemini #796)

Decoding each stdout chunk with chunk.toString('utf-8') corrupts multi-byte chars (emoji/accents/the │ prefix itself) split across chunk boundaries. Use a streaming TextDecoder (decode with {stream:true}); flush held bytes on close.

---------
Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
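A minimal sketch of the streaming decode plus line buffering described in these two commits, assuming a hypothetical `createLinePrinter` helper (the actual spawn handler is not shown in this PR text). The only API relied on is the standard `TextDecoder` with `{ stream: true }`, which holds back incomplete multi-byte sequences until the next chunk arrives.

```typescript
// Hypothetical helper; the prefix format follows the 'agentName |' wording in the commit message.
function createLinePrinter(agentName: string, print: (line: string) => void) {
  const decoder = new TextDecoder("utf-8");
  let pending = ""; // decoded text that has not yet ended in a newline

  const emitCompleteLines = (): void => {
    let newline = pending.indexOf("\n");
    while (newline !== -1) {
      print(`${agentName} | ${pending.slice(0, newline)}`);
      pending = pending.slice(newline + 1);
      newline = pending.indexOf("\n");
    }
  };

  return {
    // Call for every stdout chunk; bytes of a split character stay inside the decoder.
    push(chunk: Uint8Array): void {
      pending += decoder.decode(chunk, { stream: true });
      emitCompleteLines();
    },
    // Call when the stream closes; flushes held bytes and any unterminated last line.
    close(): void {
      pending += decoder.decode();
      emitCompleteLines();
      if (pending.length > 0) print(`${agentName} | ${pending}`);
      pending = "";
    },
  };
}
```

Decoding each chunk independently would turn the two halves of a split character into replacement characters; keeping one decoder per agent stream avoids that.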
Co-Authored-By: Claude <noreply@anthropic.com>
…798) * fix: harden contract/secret-scan/brief/sandbox (Gemini #797 review) Addresses Gemini's review on the v0.3.2 release PR (#797), on already-merged features: - agent-contract: defensive Array.isArray guards on tool_grants/write_scope/ credential_scope (a non-array string would spread to chars -> security bypass); BASH_SCOPED regex now allows spaces (Bash(git status:*)); add OPENAI/AWS to KNOWN_SECRETS - secret-scan: require forbidden denylist terms >= 3 chars (a 1-2 char term would block every auto-commit) - brief: validate the extracted JSON schema before access (no crash on bad LLM output) - sandbox: allow api.openai.com for multi-LLM Skipped Gemini's Bash(squads:*)-for-lead suggestion — letting a lead spawn squad runs is a deliberate capability expansion, not a hardening fix. Co-Authored-By: Claude <noreply@anthropic.com> * fix: address Gemini round-2 — tighter contract/brief/secret-scan validation - BASH_SCOPED: literal space (not \s — \s allows \n/\r/\t) - contract: validate t.sensitivity enum; write_scope + credential_scope element types are strings - brief: validate each task has squad/title/body strings (downstream reads them) - secret-scan: skip non-string forbidden terms (no crash on bad config) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com> Co-authored-by: Claude <noreply@anthropic.com>
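The `Array.isArray` hardening can be illustrated with a short sketch. `asStringArray` is a hypothetical helper, and silently dropping invalid values is an assumption; the real contract validation may reject them instead. The failure it guards against is the one named in the commit message: spreading a string yields its individual characters.

```typescript
// Hypothetical guard: returns only string elements, and nothing at all for non-array input.
function asStringArray(value: unknown): string[] {
  if (!Array.isArray(value)) return [];
  return value.filter((item): item is string => typeof item === "string");
}

// Why the guard matters: a frontmatter value that is a string instead of a list
// spreads into single characters rather than one grant.
const misconfigured: unknown = "Read";
const unguarded = [...(misconfigured as string)]; // ["R", "e", "a", "d"]
const guarded = asStringArray(misconfigured); // []
```

Applying the same guard to `tool_grants`, `write_scope`, and `credential_scope` keeps a malformed scalar from being treated as a list of grants.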
… release main had 4 commits (v0.3.0, v0.3.1, OIDC publish fix #756, npm --force #760) never back-merged. Resolutions: keep develop's 0.3.2 version; take main's --force npm-upgrade fix in publish.yml/release.yml (the proven publish config that shipped 0.3.1); add a 0.3.2 CHANGELOG entry. Unblocks develop->main release PR #797. Co-Authored-By: Claude <noreply@anthropic.com>
Node 18 is EOL and vitest/rolldown import styleText from node:util (Node >=20 only), which crashed the Test step on ci(18) and skipped publish, silently blocking the last two tagged releases from npm.

- release.yml / publish.yml matrices: [18,20,22] -> [20,22]
- engines.node: >=18 -> >=20
- CHANGELOG note under [0.3.2]

Closes #801
Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
The `run` command description said "no target = autopilot mode", but no-target intentionally lists squads (#694); the org cycle runs via `--org`. Corrected the description and added a discoverable "Run all squads as one cycle: squads run --org" hint to the no-target output. Docs/UX only — no behavior change. Closes #445 Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
…438) (#806)

* fix(run): --timeout bounds per-agent execution; default 30→15 (#438)

In conversation mode the per-agent timeout read only SQUADS_AGENT_TIMEOUT_MINUTES or the hardcoded 30-min default: the --timeout flag was ignored, so a hung agent burned 30 min and --timeout 10 did nothing (flag was only honored in single-agent mode).

- Thread --timeout -> AgentRunConfig.timeout -> the per-agent SIGTERM deadline at all 4 conversation spawn sites.
- Precedence: SQUADS_AGENT_TIMEOUT_MINUTES env > --timeout > DEFAULT.
- Lower DEFAULT_TIMEOUT_MINUTES 30 -> 15; drop the flag's hardcoded '30' default so unset falls through to DEFAULT (single-agent path unchanged).
- Fix stale 'autopilot mode' help/desc examples -> list / --org.

Behavior change: default per-agent cap now 15 min (was 30); --timeout N now applies in conversation mode.

Closes #438
Co-Authored-By: Claude <noreply@anthropic.com>

* fix(run): thread --timeout through ConversationOptions to per-agent spawn

First pass added options.timeout at the spawn sites but ConversationOptions lacked the field: esbuild built it while tsc failed, and at runtime config.timeout was undefined (fell to DEFAULT, not the --timeout value). Added timeout to ConversationOptions and populated it at all 3 convOptions builders (run.ts x2, run-modes.ts). typecheck + 1852 tests green.

Co-Authored-By: Claude <noreply@anthropic.com>

---------
Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
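The stated precedence (env var, then `--timeout`, then the default) can be sketched as a single resolver. `resolveTimeoutMinutes` is a hypothetical function for illustration; only the env var name, the ordering, and the default of 15 come from the commit message. Ignoring non-positive or non-numeric values is an assumption of this sketch.

```typescript
const DEFAULT_TIMEOUT_MINUTES = 15; // lowered from 30 in this change

// Hypothetical resolver: env > --timeout flag > default.
function resolveTimeoutMinutes(
  env: Record<string, string | undefined>,
  flagMinutes?: number,
): number {
  const fromEnv = Number(env["SQUADS_AGENT_TIMEOUT_MINUTES"]);
  // Assumption: unset, empty, non-numeric, or non-positive env values are skipped.
  if (Number.isFinite(fromEnv) && fromEnv > 0) return fromEnv;
  if (flagMinutes !== undefined && Number.isFinite(flagMinutes) && flagMinutes > 0) {
    return flagMinutes;
  }
  return DEFAULT_TIMEOUT_MINUTES;
}
```

Leaving the flag without a hardcoded default is what lets an unset `--timeout` fall through to the last branch, as the commit describes.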
…447) (#807)

refreshFounderContext ran the digest synchronously (spawnSync, 12-min timeout) whenever founder-context.md was stale (>2h), blocking every cold run for minutes (Pass-1 over ~93k chars of sessions).

Now: if a (stale) context file exists, spawn the digest DETACHED and proceed immediately with the current copy; the refresh lands for the next run. Block (bounded, sync) only on first-ever generation or when forced (--force / SQUADS_DIGEST_SYNC=1). Adds 'refreshing' status; callers only abort on 'failed', so unaffected.

typecheck + 1852 tests + build green.

Closes #447
Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
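The refresh policy reads as a small decision table, sketched below. `planRefresh`, `RefreshPlan`, and `RefreshInput` are hypothetical names and do not correspond to the statuses `refreshFounderContext` actually returns; the 2-hour staleness threshold and the "block only on first generation or when forced" rule come from the commit message.

```typescript
type RefreshPlan = "fresh" | "background" | "blocking"; // illustrative labels only

const STALE_AFTER_MS = 2 * 60 * 60 * 1000; // context older than 2h counts as stale

interface RefreshInput {
  exists: boolean; // does founder-context.md exist yet?
  ageMs: number; // time since it was last generated
  forced: boolean; // --force or SQUADS_DIGEST_SYNC=1
}

function planRefresh({ exists, ageMs, forced }: RefreshInput): RefreshPlan {
  // First-ever generation or an explicit force: wait for the digest (bounded, sync).
  if (!exists || forced) return "blocking";
  // A stale copy exists: refresh detached and run with the current copy.
  if (ageMs > STALE_AFTER_MS) return "background";
  return "fresh";
}
```

In the "background" case the run proceeds immediately and the refreshed file is picked up by the next run.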
Squad agents run with cwd = the squad's repo checkout. Workers have full
git/gh, so they switch branches, drop files, and open PRs in the user's
working tree — and during an org cycle multiple squads mutate multiple
repos at once. This isolates each squad RUN in its own git worktree.
- New createRunWorktree(repoDir, squadName) in src/lib/worktree.ts returns
{ cwd, cleanup }. ONE worktree per run (shared by plan/execute/review/
verify) so the worker's changes are visible to the reviewer/verifier.
- Base = develop if present, else the repo's current branch.
- Wired into runConversation (workflow.ts): cwd override + try/finally
cleanup on every exit path (success, early return, throw).
- Graceful degradation: non-git dir or failed worktree add → run in-place
with a dim warning, never crash.
- Escape hatch: SQUADS_NO_WORKTREE=1 disables isolation.
- Parallel squads get collision-free dirs/branches (ms + monotonic counter).
Closes #440
Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
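Two pieces of the worktree isolation above lend themselves to a sketch: collision-free naming and cleanup on every exit path. Everything below is hypothetical scaffolding, not the contents of src/lib/worktree.ts: `uniqueRunId`, `withRunWorktree`, and the injected `create` callback are illustrative, and the ID format is an assumption. Taken from the commit message: the `{ cwd, cleanup }` shape, the millisecond timestamp plus monotonic counter, the fall back to running in place, and the try/finally cleanup.

```typescript
interface RunWorktree {
  cwd: string;
  cleanup: () => void;
}

let runCounter = 0; // monotonic within this process

// Timestamp + counter: two runs in the same millisecond still get distinct IDs.
function uniqueRunId(squadName: string, nowMs: number = Date.now()): string {
  runCounter += 1;
  return `${squadName}-${nowMs}-${runCounter}`;
}

// create() returning null models "non-git dir or failed worktree add": run in place.
async function withRunWorktree<T>(
  create: () => RunWorktree | null,
  fallbackCwd: string,
  run: (cwd: string) => Promise<T>,
): Promise<T> {
  const worktree = create();
  try {
    return await run(worktree ? worktree.cwd : fallbackCwd);
  } finally {
    // Runs on success, early return, and throw alike.
    worktree?.cleanup();
  }
}
```

Because one worktree wraps the whole run, every phase executed inside `run` sees the same directory, which is what keeps the worker's changes visible to the reviewer and verifier.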
- #806 --timeout bounds per-agent execution; default 30→15
- #807 founder-context refresh async (no longer blocks the run)
- #808 per-squad-run worktree isolation
- #805 --org discoverability + corrected run description

Co-authored-by: agents-squads[bot] <266303152+agents-squads[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
📋 Reviewable net diff (the real v0.3.3 change vs
v0.3.3 — runtime reliability
Promotes the v0.3.3 fixes from `develop` to `main`. Theme: `squads run` is safe and pleasant to leave unattended.

What's in it

- `--timeout` now bounds per-agent execution in conversation mode (was ignored); default lowered 30 → 15 min so a hung agent can't burn half an hour.
- … (`--force` / `SQUADS_DIGEST_SYNC=1` still sync).
- … `SQUADS_NO_WORKTREE=1` to disable.
- `squads run` (no target) documents `--org` + hint; corrected the misleading "autopilot mode" description.

Like the v0.3.2 release PR (#799), `develop` → `main` diverged at the commit level from squash-merges, so GitHub shows the accumulated diff. The net change vs `main` is only the 0.3.3 fixes + version bump. Recommend squash-merge (as with #799). Root cause + permanent fix tracked in hq#443.

After merge: publish v0.3.3

`publish.yml` is the npm trusted publisher (a tag push via `release.yml` would 404). Run: …

Verify: `npm view squads-cli dist-tags` → latest = 0.3.3; fresh `npm i squads-cli` runs. Optionally tag for the GitHub Release: `git tag v0.3.3 origin/main && git push origin v0.3.3`.

Completes the v0.3.3 — runtime reliability milestone (#438, #445, #447, #440).