Studio Code: Lower agent thinking level from high to medium by youknowriad · Pull Request #3620 · Automattic/studio

youknowriad · 2026-05-26T13:11:41Z

Related issues

None.

How AI was used in this PR

Authored end-to-end by Claude (via studio code) — change is a one-line config edit guided by an A/B evaluation. The same agent ran the 10-site eval used to motivate the change.

Proposed Changes

apps/cli/ai/runtimes/pi/index.ts: switch the Studio Code agent's thinkingLevel from 'high' to 'medium'.
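As a rough illustration of the shape of this change, the sketch below uses a hypothetical `AgentOptions` type and `createAgentOptions` helper. These names are assumptions for illustration and are not taken from apps/cli/ai/runtimes/pi/index.ts; only the `thinkingLevel` field, its 'high' and 'medium' values, and the model name come from this PR.

```typescript
// Hypothetical types for illustration only; the real runtime's option names may differ.
type ThinkingLevel = 'medium' | 'high';

interface AgentOptions {
  model: string;
  thinkingLevel: ThinkingLevel;
}

// Assumed default after this PR: 'medium' instead of 'high'.
const DEFAULT_THINKING_LEVEL: ThinkingLevel = 'medium';

// Hypothetical helper: builds agent options, falling back to the default level.
function createAgentOptions(
  model: string,
  thinkingLevel: ThinkingLevel = DEFAULT_THINKING_LEVEL
): AgentOptions {
  return { model, thinkingLevel };
}

const options = createAgentOptions('claude-sonnet-4-6');
console.log(options.thinkingLevel); // prints "medium"
```

Keeping the level in a single default would let a caller still opt back into 'high' for a specific run, which could be useful when repeating the A/B comparison.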

Motivation

Ran an A/B across 5 site-build prompts × 2 thinking levels (10 total builds) on the WordPress.com provider, claude-sonnet-4-6. Same prompts, same model, only thinkingLevel varies.

Metric	Medium (n=5)	High (n=5)	Δ (high vs medium)
Avg wallclock per build	450 s (7m 30s)	577 s (9m 37s)	+28% slower at high
Avg internal turns	53.8	61.6	+14%
Avg seconds per turn	8.37	9.36	+12% slower per turn at high
Worst single-turn latency	32.0 s	74.1 s	~2.3× worse tail at high

High thinking is consistently slower per turn (the irreducible cost of deeper reasoning), drives ~14% more turns, and has a noticeably worse worst-case turn latency (74 s vs 32 s in the slowest sessions). In side-by-side review of the resulting sites, the medium builds were not visibly worse in design quality.
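The Δ column is expressed relative to medium. As a small worked example (a sketch using only the averages reported in the table, with a hypothetical `pctChange` helper), the same numbers can also be read as the saving from moving high to medium, which gives a smaller percentage because the baseline is larger.

```typescript
// Averages copied from the A/B table.
const medium = { wallclockSec: 450, turns: 53.8, secPerTurn: 8.37 };
const high = { wallclockSec: 577, turns: 61.6, secPerTurn: 9.36 };

// Percentage change going from `from` to `to`, relative to `from`.
function pctChange(from: number, to: number): number {
  return ((to - from) / from) * 100;
}

// Baseline = medium: how much slower a build is at high.
const slowerAtHigh = pctChange(medium.wallclockSec, high.wallclockSec);

// Baseline = high: how much wallclock the switch to medium saves.
const savedAtMedium = -pctChange(high.wallclockSec, medium.wallclockSec);

// Same two views for per-turn time.
const slowerPerTurnAtHigh = pctChange(medium.secPerTurn, high.secPerTurn);
const savedPerTurnAtMedium = -pctChange(high.secPerTurn, medium.secPerTurn);

console.log(slowerAtHigh.toFixed(1)); // 28.2
console.log(savedAtMedium.toFixed(1)); // 22.0
console.log(slowerPerTurnAtHigh.toFixed(1)); // 11.8
console.log(savedPerTurnAtMedium.toFixed(1)); // 10.6
```

With n=5 per arm these are point estimates only; the sketch does not attempt any significance test.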

The 10 evaluation sites can be inspected locally — see Testing Instructions below.

Testing Instructions

Pull this branch and run npm install if needed.
Build the CLI: npm run cli:build.
Run node apps/cli/dist/cli/main.mjs code and ask the agent to build a one-page WordPress site (e.g. "Build me a one-page site for a bakery called 'Foo'…").
Confirm the build completes successfully and the resulting site looks reasonable. Expect ~7–8 min wallclock for a typical one-page site (vs ~9–10 min before).

Pre-merge Checklist

Have you checked for TypeScript, React or other console errors?

Reduces per-build wallclock by ~22% (450 s vs 577 s; high was ~28% slower than medium) and per-turn time by ~11% (8.37 s vs 9.36 s) on average across our site-build evaluation, with no observed loss in design quality.

youknowriad · 2026-05-26T13:12:53Z

I've tried this over multiple site builds, and I'm not sure I can see any impact on design quality. What do you think?

wpmobilebot · 2026-05-26T13:36:23Z

📊 Performance Test Results

Comparing 0983b3f vs trunk

app-size

Metric	trunk	`0983b3f`	Diff	Change
App Size (Mac)	1340.06 MB	1340.06 MB	+0.00 MB	⚪ 0.0%

site-editor

Metric	trunk	`0983b3f`	Diff	Change
load	1726 ms	1765 ms	+39 ms	⚪ 0.0%

site-startup

Metric	trunk	`0983b3f`	Diff	Change
siteCreation	9640 ms	9585 ms	-55 ms	🟢 -0.6%
siteStartup	4928 ms	4923 ms	-5 ms	⚪ 0.0%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)
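One way to read the legend for the timing metrics is sketched below. The `classify` function and its threshold constant are assumptions based only on the legend text, not the bot's actual implementation.

```typescript
type Verdict = 'improvement' | 'regression' | 'no change';

// Threshold taken from the legend: differences under 50 ms count as no change.
const NO_CHANGE_THRESHOLD_MS = 50;

// Hypothetical classifier: compares the branch timing against trunk.
function classify(trunkMs: number, branchMs: number): Verdict {
  const diff = branchMs - trunkMs;
  if (Math.abs(diff) < NO_CHANGE_THRESHOLD_MS) return 'no change';
  return diff < 0 ? 'improvement' : 'regression';
}

// Values from the site-editor and site-startup tables.
console.log(classify(1726, 1765)); // load: +39 ms
console.log(classify(9640, 9585)); // siteCreation: -55 ms
console.log(classify(4928, 4923)); // siteStartup: -5 ms
```

Under this reading, only siteCreation crosses the 50 ms threshold, which matches the single 🟢 row in the report.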

Studio Code: Lower agent thinking level from high to medium

0983b3f

Reduces per-build wallclock by ~22% (450 s vs 577 s; high was ~28% slower than medium) and per-turn time by ~11% (8.37 s vs 9.36 s) on average across our site-build evaluation, with no observed loss in design quality.

github-actions Bot assigned youknowriad May 26, 2026

youknowriad requested a review from Poliuk May 26, 2026 13:13

youknowriad merged commit e3a6b0a into trunk May 26, 2026
11 checks passed

youknowriad deleted the lower-code-thinking-to-medium branch May 26, 2026 18:15
