Studio Code: Lower agent thinking level from high to medium #3620

Merged
youknowriad merged 1 commit into trunk from lower-code-thinking-to-medium on May 26, 2026

@youknowriad
Contributor

Related issues

None.

How AI was used in this PR

Authored end-to-end by Claude (via studio code). The change is a one-line config edit guided by an A/B evaluation, and the same agent ran the 10-site eval that motivated it.

Proposed Changes

Motivation

Ran an A/B test across 5 site-build prompts × 2 thinking levels (10 builds total) on the WordPress.com provider with claude-sonnet-4-6. Prompts and model were identical in both arms; only thinkingLevel varied.
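As a rough illustration of "only thinkingLevel varies", the two arms can be pictured as one config object with a single field flipped. The `AgentConfig` interface and the `ThinkingLevel` type below are hypothetical names for this sketch, not taken from the Studio codebase; only the `thinkingLevel` field name, its two values, and the model identifier come from this PR.

```typescript
// Hypothetical types for illustration; the real config may define more levels and fields.
type ThinkingLevel = "medium" | "high";

interface AgentConfig {
  model: string;
  thinkingLevel: ThinkingLevel;
}

// Arm B of the A/B (the previous default).
const highArm: AgentConfig = {
  model: "claude-sonnet-4-6",
  thinkingLevel: "high",
};

// Arm A (the new default): identical except for the thinking level.
const mediumArm: AgentConfig = { ...highArm, thinkingLevel: "medium" };

console.log(highArm, mediumArm);
```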

| Metric | Medium (n=5) | High (n=5) | Δ |
| --- | --- | --- | --- |
| Avg wallclock per build | 450 s (7m 30s) | 577 s (9m 37s) | +28% slower at high |
| Avg internal turns | 53.8 | 61.6 | +14% |
| Avg seconds per turn | 8.37 | 9.36 | +12% slower per turn at high |
| Worst single-turn latency | 32.0 s | 74.1 s | ~2.3× worse tail at high |

High thinking is consistently slower per turn (the irreducible cost of deeper reasoning), drives ~14% more turns, and has a noticeably worse worst-case turn latency (74 s vs 32 s in the slowest sessions). In side-by-side review of the resulting sites, the medium builds were not visibly worse in design quality.
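The Δ figures are measured against the medium baseline. The same gap is a smaller percentage when expressed as a saving from the high baseline, which is the direction this PR moves in. A small sketch of both calculations, using only the averages reported above (the `Metric` interface and function names are illustrative):

```typescript
// Averages copied from the A/B results (medium vs. high thinking).
interface Metric {
  name: string;
  medium: number;
  high: number;
}

const metrics: Metric[] = [
  { name: "wallclock per build (s)", medium: 450, high: 577 },
  { name: "internal turns", medium: 53.8, high: 61.6 },
  { name: "seconds per turn", medium: 8.37, high: 9.36 },
  { name: "worst single-turn latency (s)", medium: 32.0, high: 74.1 },
];

// Percent by which high exceeds medium (medium is the baseline).
function increaseAtHigh(m: Metric): number {
  return ((m.high - m.medium) / m.medium) * 100;
}

// Percent saved by switching from high to medium (high is the baseline).
function savingAtMedium(m: Metric): number {
  return ((m.high - m.medium) / m.high) * 100;
}

for (const m of metrics) {
  console.log(
    `${m.name}: +${increaseAtHigh(m).toFixed(1)}% at high, ` +
      `-${savingAtMedium(m).toFixed(1)}% when moving to medium`
  );
}
```

For wallclock, the 127 s gap is about 28% of the medium average but about 22% of the high average, so which number to quote depends on which level is treated as the starting point.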

The 10 evaluation sites can be inspected locally — see Testing Instructions below.

Testing Instructions

  1. Pull this branch and run `npm install` if needed.
  2. Build the CLI: `npm run cli:build`.
  3. Run `node apps/cli/dist/cli/main.mjs code` and ask the agent to build a one-page WordPress site (e.g. "Build me a one-page site for a bakery called 'Foo'…").
  4. Confirm the build completes successfully and the resulting site looks reasonable. Expect ~7–8 min wallclock for a typical one-page site (vs ~9–10 min before).

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

Reduces per-build wallclock by ~22% and per-turn time by ~11% on average
across our site-build evaluation, with no observed loss in design quality.
@youknowriad
Contributor Author

I've tried this over multiple site builds, and I'm not sure I can see any impact on design quality. What do you think?

youknowriad requested a review from Poliuk on May 26, 2026 13:13
@wpmobilebot
Collaborator

📊 Performance Test Results

Comparing 0983b3f vs trunk

app-size

| Metric | trunk | 0983b3f | Diff | Change |
| --- | --- | --- | --- | --- |
| App Size (Mac) | 1340.06 MB | 1340.06 MB | +0.00 MB | ⚪ 0.0% |

site-editor

| Metric | trunk | 0983b3f | Diff | Change |
| --- | --- | --- | --- | --- |
| load | 1726 ms | 1765 ms | +39 ms | ⚪ 0.0% |

site-startup

| Metric | trunk | 0983b3f | Diff | Change |
| --- | --- | --- | --- | --- |
| siteCreation | 9640 ms | 9585 ms | -55 ms | 🟢 -0.6% |
| siteStartup | 4928 ms | 4923 ms | -5 ms | ⚪ 0.0% |

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)
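The legend explains why a +39 ms change is reported as ⚪ 0.0% while a 55 ms change is flagged 🟢. A minimal sketch of that rule, assuming a 50 ms threshold on the absolute difference as the legend implies; this is not the bot's actual implementation, and `classify` and `Verdict` are hypothetical names:

```typescript
type Verdict = "improvement" | "regression" | "no change";

// Assumed from the legend: differences under 50 ms count as no change.
const NO_CHANGE_THRESHOLD_MS = 50;

function classify(trunkMs: number, branchMs: number): Verdict {
  const diff = branchMs - trunkMs;
  if (Math.abs(diff) < NO_CHANGE_THRESHOLD_MS) return "no change";
  // Lower timings are better, so a negative diff is an improvement.
  return diff < 0 ? "improvement" : "regression";
}

console.log(classify(1726, 1765)); // site-editor load
console.log(classify(9640, 9585)); // siteCreation
console.log(classify(4928, 4923)); // siteStartup
```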

youknowriad merged commit e3a6b0a into trunk on May 26, 2026
11 checks passed
youknowriad deleted the lower-code-thinking-to-medium branch on May 26, 2026 18:15