Studio Code: Lower agent thinking level from high to medium #3620
Merged
Reduces per-build wallclock by ~28% and per-turn time by ~12% on average across our site-build evaluation, with no observed loss in design quality.
I've tried this over multiple site builds, and I'm not sure I can see any impact on design quality. What do you think?
📊 Performance Test Results
Comparing 0983b3f vs trunk
[Result tables for app-size, site-editor, and site-startup were not preserved.]
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)
Related issues
None.
How AI was used in this PR
Authored end-to-end by Claude (via studio code). The change is a one-line config edit guided by an A/B evaluation. The same agent ran the 10-site eval used to motivate the change.
Proposed Changes
apps/cli/ai/runtimes/pi/index.ts: switch the Studio Code agent's thinkingLevel from 'high' to 'medium'.
Motivation
Ran an A/B evaluation across 5 site-build prompts × 2 thinking levels (10 total builds) on the WordPress.com provider with claude-sonnet-4-6. Same prompts, same model; only thinkingLevel varies.
High thinking is consistently slower per turn (the irreducible cost of deeper reasoning), drives ~14% more turns, and has a noticeably worse worst-case turn latency (74 s vs 32 s in the slowest sessions). In side-by-side review of the resulting sites, the medium builds were not visibly worse in design quality.
The 10 evaluation sites can be inspected locally; see Testing Instructions below.
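For readers who want to reproduce the shape of this comparison, the evaluation design (every prompt crossed with every thinking level, model held fixed) can be sketched roughly as follows. This is an illustrative sketch only, not the harness used for this PR: EvalRun, buildEvalMatrix, percentReduction, and the placeholder prompt names are all hypothetical. Only the model name, the two thinking levels, and the 74 s / 32 s worst-case latencies come from the description above.

```typescript
// Hypothetical sketch: none of these names come from the Studio codebase.
type ThinkingLevel = 'high' | 'medium';

interface EvalRun {
  prompt: string;
  model: string;
  thinkingLevel: ThinkingLevel;
}

// Cross every prompt with every thinking level, holding the model fixed,
// so that thinkingLevel is the only variable between paired runs.
function buildEvalMatrix(
  prompts: string[],
  levels: ThinkingLevel[],
  model: string,
): EvalRun[] {
  const runs: EvalRun[] = [];
  for (const prompt of prompts) {
    for (const thinkingLevel of levels) {
      runs.push({ prompt, model, thinkingLevel });
    }
  }
  return runs;
}

// Relative reduction of `candidate` versus `baseline`, as a percentage.
function percentReduction(baseline: number, candidate: number): number {
  return ((baseline - candidate) / baseline) * 100;
}

// Placeholder prompt labels; the real prompts are not listed in the PR.
const prompts = ['prompt-1', 'prompt-2', 'prompt-3', 'prompt-4', 'prompt-5'];
const runs = buildEvalMatrix(prompts, ['high', 'medium'], 'claude-sonnet-4-6');
console.log(runs.length); // 10

// Worst-case turn latency reported above: 74 s (high) vs 32 s (medium).
console.log(percentReduction(74, 32).toFixed(1)); // "56.8"
```

With the figures quoted above, the worst-case turn latency under medium thinking works out to roughly 57% lower than under high thinking.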
Testing Instructions
npm install if needed.
npm run cli:build.
node apps/cli/dist/cli/main.mjs code and ask the agent to build a one-page WordPress site (e.g. "Build me a one-page site for a bakery called 'Foo'…").
Pre-merge Checklist