diff --git a/website/blog/2026-05-22-agentic-pr-reviews.md b/website/blog/2026-05-22-agentic-pr-reviews.md new file mode 100644 index 0000000..41181f7 --- /dev/null +++ b/website/blog/2026-05-22-agentic-pr-reviews.md @@ -0,0 +1,317 @@ +--- +slug: /2026-05-22-agentic-pr-reviews +canonical_url: https://dfberry.github.io/blog/2026-05-22-agentic-pr-reviews +custom_edit_url: null +sidebar_label: "2026.05.22 My Squad's PR Reviews" +title: "My Squad's PR Reviews: What Sam, Statler, and Scooter Actually Catch" +description: "I set up Sam, Statler, and Scooter from my Squad to review documentation PRs. Here's what the feedback looks like, what it caught on PR #8648, and the PR description template that made the whole loop work." +draft: true +tags: + - AI Agents + - Code Review + - Developer Experience + - GitHub Copilot + - AI assisted +updated: 2026-05-22 08:35 PST +keywords: + - agentic pr reviews + - squad agents pr review + - ai code review + - pr review quality + - agent-led reviews + - code review feedback loop + - github copilot review + - pr description best practices + - automated code review +--- + +# My Squad's PR Reviews: What Sam, Statler, and Scooter Actually Catch + + +![A pink-haired developer studying a wall of PR review comments by morning window light](./media/2026-05-22-agentic-pr-reviews/hero-pr-review-wall.png) + +*The first time an agent review came back with actual reasoning — here's what I checked, here's what I think, here's why — I realized I'd never gotten that from a human reviewer.* + +PR #8648 had 14 review comments across 3 files. Statler found them. + +I was documenting the tool server's database tools — translating entries from `schema.json` into parameter tables for the documentation platform article. Standard work. I've done around 40 of these tool articles at this point. I submitted the PR and Statler, my adversarial Squad agent, came back with this: + +> **Statler — Adversarial Review** +> +> The `subscription` parameter appears in your parameter table for `db_server_list`. The database namespace tools receive subscription via the platform's ambient context — it shouldn't be listed as a required parameter because callers never pass it directly. Compare the pattern in `storage_account_list`: same tool family, no subscription in the table. +> +> This will confuse readers looking for what to actually pass. + +That comment isn't "missing semicolon." It's a specific claim about a pattern across the database namespace, a comparison to another tool family, and a prediction about reader confusion. It was right about all three. + +I fixed the table and replied: "Confirmed — subscription, resource-group, and devlang slug formatting follow the same ambient pattern across storage, database, and container apps. Adding to reviewer notes for the series." Sam (my fact-checking agent) verified the pattern against the upstream `schema.json`. The PR merged with one exchange and no ambiguity. + +The same PR also came back with a second Statler flag I hadn't noticed at all: three of my seven Example sections used `node` as the devlang slug while the rest of the tool server documentation series uses `javascript`. It would have built fine. Readers filtering by language tab would have gotten partial results. + +14 comments. 3 files. Every flag was right. + +--- + +## Why "LGTM" Costs You More Than You Think + +The obvious problem with "LGTM" is that it gives you nothing to work with. You don't know what the reviewer checked. You don't know whether they caught the edge case you were worried about. You don't know whether your interpretation was right. + +The less obvious problem: it breaks the feedback loop going the other direction too. When you know most reviews will be "LGTM," you stop writing detailed PR descriptions. Why spend ten minutes on a scope definition if the reviewer isn't going to read it? So the description gets shorter, which makes future reviews even less useful, which makes future descriptions even shorter. I lived that loop for years. Writing documentation PRs for internal approval cycles where the reviewer was too stretched to engage — I'd get a green stamp or silence, and I'd move on without knowing if what I'd written was right or just unopposed. + +The other failure mode is the pointed review delivered with heat. Strong language, implied criticism, "why did you do it this way" as an accusation rather than a question. That's not better than LGTM. You learn something is wrong without learning what right looks like. You learn to feel bad about the decision, which is not the same as learning to make a better one. + + +![Split watercolor showing chaotic LGTM review on left versus organized agent review on right](./media/2026-05-22-agentic-pr-reviews/before-after-review-culture.png) + +*Before agent reviews, feedback was either a green stamp or a heat storm — rarely the clear, reasoned signal in between.* + +The LGTM loop is self-reinforcing in a particularly bad way: you start writing worse descriptions because the feedback doesn't reward good ones. The description degrades. The reviewer has even less to work with. The review gets even more shallow. By the time you're four months into a project, you're getting one-word approvals on PRs with three-word descriptions. + +Agent reviews break that loop because they respond to description quality directly. Write a vague scope section and you'll get flags that a precise scope would have answered up front. Write a detailed scope section and you'll get flags only on the things that weren't captured by the scope — the edge cases worth discussing. The quality of what you get back scales with the quality of what you put in. That creates the opposite incentive structure from the LGTM world. + +What I wanted — and still want — is a reviewer who reads carefully, explains their reasoning, and raises questions rather than making accusations. That doesn't require AI. It requires time and consistency that most human reviewers, who are also doing their actual jobs, can't reliably provide at the PR-by-PR level. Sam, Statler, and Scooter have time. They're consistent. They read every line. + +--- + +## What My Squad Actually Does in a PR Review + +My review setup uses [Squad v0.9.4](https://github.com/bradygaster/squad) running on the Copilot CLI. Three agents run on documentation PRs: + +**Sam** (Fact Checker) compares my claims against the source. For the tool server articles, that means checking parameter names, types, and descriptions against the upstream `schema.json`, and verifying that tool descriptions match the annotation text in the source repo. + +Sam's output on a good PR is mostly confirmations: "6 of 7 tool descriptions match `schema.json` at commit `a4f9c31`." When something drifts, Sam calls it out specifically: "db_query — your description says 'executes a SQL query against a specified database' but the upstream annotation says 'runs an arbitrary SQL query.' 'Against a specified database' is an addition not in the source. Intentional clarification, or drift?" + +That distinction — between style variance ("executes" vs "runs") and a substantive addition ("against a specified database") — is something Sam catches because it's doing a line-by-line comparison against a specific commit. A reviewer who knows the source well might catch it. A reviewer working from memory or from a general familiarity with the content area won't. + +**Statler** (Adversarial) looks for what will confuse readers, break patterns, or create maintenance problems. Statler's value is in the cross-file, cross-PR pattern work. It's not checking one file against one source — it's checking whether this PR is internally consistent with the rest of the series. + +On PR #8648, Statler caught both the subscription parameter issue and the devlang slug inconsistency in a single pass. The subscription issue required knowing the pattern across the database and storage namespaces. The devlang inconsistency required knowing what I'd used in other articles in the same series. Both of those are things that require scope that extends beyond the current PR — which is exactly the scope Statler operates with. + +On a recent platform skills article PR, Statler caught a parameter marked as "Required" in my table that the CLI JSON schema had as "Optional." I'd introduced the discrepancy by following the pattern from an older article rather than checking the current schema. Statler flagged it with: "Parameter `resource_group` is marked Required in your table but is Optional in the CLI JSON schema at the current version. The older article you may have referenced predates the schema change." + +That's a specific flag with a specific reason and a specific call-out of where I probably went wrong. Without that flag, the article would have shipped with incorrect Required/Optional documentation. Sam wouldn't have caught it — Sam checks descriptions against annotations, not parameter status against schema. Statler caught it because Statler is checking different things. + +**Scooter** (Quality) checks structure: do headings follow the expected pattern, is the frontmatter complete, do parameter tables have the right columns, are there missing Example sections. For the tool server documentation series, that means verifying every tool article has a Description, Parameters table, and Example section in the right order — and that the frontmatter fields are all populated with the right format. + +Scooter catches things like: a missing `updated` timestamp in frontmatter, a Parameters table that has the right columns but in the wrong order, an Example section that exists but is empty because I forgot to fill in the sample call. These are the structural checks I'd run manually if I remembered to. Scooter runs them every time. + +The sequence matters: Sam goes first (verify facts), Statler second (find problems), Scooter last (check structure). If Sam finds something substantively wrong with my descriptions — a mismatch against the upstream source — I want to know that before Scooter tells me the formatting is fine. The facts come first. If Sam fails, the others still run. I want all the feedback, not a fail-fast stop on the first error category. + +For content areas outside my direct expertise, I configure the reviewer with patterns from domain-specific feedback I've received. The database tool articles got a reviewer trained on the domain expert's published review patterns from the database article series — things like the exact format expected for the parameters table, which columns are required, and what "Required" vs "Optional" should actually mean for CLI parameters. The agent applies those standards consistently across every PR, which is something the domain expert can't do while also managing their own workload. + +--- + +## Write the PR Description as a Contract + +The quality of the review scales with the quality of the information in the description. Human reviewers fill in gaps with assumptions and prior knowledge. Sam and Statler work with what's in front of them. + +This forced me to rethink what a PR description is actually for. + +My old mental model: a PR description is a summary so the reviewer knows roughly what to expect. Good description means more detail, maybe a link to the ticket. + +My current model: a PR description is a goal statement and a scope contract. It defines what "done" looks like, what the reviewer should evaluate, and what the author is explicitly not doing. Writing it is the first act of review. + +The shift matters because when you're writing for a thoughtful reader instead of a rubber stamp, you think about the work differently. You ask the questions that the reviewer will ask before the reviewer asks them. You find the edge cases you were going to leave implicit and make them explicit. You notice the scope you've drifted into that isn't in the goal statement. + +For tool server articles — where I'm translating entries from `schema.json` into structured documentation — I now include: + +- **Goal**: which tools this PR covers, from which source file, to which output file +- **Source of truth**: a link to the specific commit in the upstream repo I'm translating from — not "the main branch," the specific commit, so Sam can check the exact same source +- **Scope**: which tools are covered, and explicitly which are not, with reasons for each exclusion +- **Open questions**: things I'm genuinely uncertain about, asked directly to the reviewer +- **Reviewer notes**: the settled decisions from prior PRs in this series — the rules I've already learned + +When Statler reviews that PR, it checks the translation against the stated source and compares patterns against the stated context. When Sam reviews it, it compares my descriptions against the annotation text at the specific commit. When I have a real question — like whether `subscription` belongs in the params table for this tool family — Sam looks at the pattern across the namespace and answers it rather than applying a default. + + +![The pink-haired developer writing a structured PR description that unfurls like a scroll](./media/2026-05-22-agentic-pr-reviews/pr-description-as-contract.png) + +*When you write the PR description for a thoughtful reader — not a rubber stamp — the document becomes the first act of review.* + +The "Out of scope" section is the one I skipped the most when I first started writing this way, and it's the one that does the most work. Stating what you're deliberately not doing forces you to make a decision about it rather than leaving it implicit. Three times in the last month, writing out what's excluded, I noticed I'd edited files that weren't in my stated goal. Two were legitimate scope expansions I should have documented. One was an accidental change from an earlier edit session that I reverted before submitting. The description-writing caught them before review did. + +The "Open questions" section changed my relationship with uncertainty in the work. Before this format, I'd make judgment calls on the ambiguous cases and hope the reviewer agreed. With a question in the description, I get a clear answer either way: Sam either confirms the pattern or flags the deviation. The exchange closes rather than leaving both sides guessing. + +--- + +## What Sam and Statler's Reviews on PR #8648 Actually Said + +For context: PR #8648 covered the database tool documentation. 7 tools across the database namespace, each needing a Description, Parameters table, and Example section. Source was the database namespace block in `schema.json` plus the tool-specific annotations in the upstream repo. + +Sam's review (fact-check pass): + +> **Sam — Fact Check** +> +> I compared all 7 tool descriptions against `schema.json` (commit `a4f9c31`). Six match exactly. One diverges: `db_query` — your description says "executes a SQL query against a specified database" but the upstream annotation says "runs an arbitrary SQL query." "Executes" vs "runs" is a style call, but "against a specified database" is an addition not in the source. Intentional clarification, or drift? +> +> Parameters tables are accurate with the exception flagged by Statler. Example sections are present for all 7 tools. + +Two things to notice: Sam cited the specific commit it compared against (`a4f9c31`), and it separated minor style variance from a substantive addition. "Against a specified database" is something I added that's not in the source. Sam is right to flag it as a question rather than a correction, because it might be a legitimate clarification for readers. The phrasing "runs an arbitrary SQL query" is technically accurate but not particularly helpful to someone trying to understand when to use this tool versus `nosql_query`. + +I answered: yes, intentional — the clarification stays. Sam closed the question with: "Confirmed intentional clarification. Noted for future `db_*` tool descriptions in this series." + +That last line is Statler's territory in terms of the thing it's tracking — consistency across the series — but Sam's acknowledgment that it's noted means the same question won't come up as drift on the next `db_*` PR. The agents coordinate. + +Statler's review ran against a different standard: + +> **Statler — Adversarial Review** +> +> The `subscription` parameter appears in your parameter table for `db_server_list`. The database namespace tools receive subscription via the platform's ambient context — it shouldn't be listed as a required parameter because callers never pass it directly. Compare the pattern in `storage_account_list`: same tool family, no subscription in the table. This will confuse readers looking for what to actually pass. +> +> Secondary: three of the seven Example sections use `node` as the devlang slug. Your other articles in this series use `javascript`. Pick one — readers filtering by language tab will get partial results otherwise. + +I'd completely missed the devlang inconsistency. I'd copied an example from a different article in the series that used `node` and didn't notice it was inconsistent. Statler caught it in a single pass because Statler has context on the full series, not just this PR. + +A linter doesn't catch that — it's not a syntax error, it's a cross-article consistency issue. A "LGTM" definitely doesn't. A human reviewer working through 14 articles in a sprint might not, because the inconsistency spans files rather than existing within one file. + +The subscription issue was more fundamental. I'd included `subscription` because I saw it in other parameter tables in the database namespace — ones that weren't up to date. Statler caught it because Statler knows the pattern, not just the precedent. + +That's 14 comments across 3 files. Every flag was something that needed to change. + +--- + +## How the Feedback Compounds + +The subscription parameter question came up in three consecutive database tool PRs before I added a rule for it. Not because I forgot — I genuinely wasn't sure whether the pattern I was seeing was intentional or a gap in the existing documentation. After the third flag, Sam verified the pattern across the full database namespace and I added a rule to my reviewer notes: "subscription, resource-group, and tenant-id are ambient context params in the database and storage namespaces — do not include in parameter tables." + +That note is now in my PR description template for the tool server documentation series. Statler stops flagging it. Sam stops verifying it. The question is answered and closed. + +```mermaid +graph LR + A[PR Submitted] --> B[Agent Review] + B --> C[Reasoning Captured] + C --> D[Standards Clarified] + D --> E[Next PR Improved] + E --> A +``` + +*Each review's reasoning teaches me — and the template — what "good" looks like for the next PR.* + +The loop works because the reasoning is preserved. When Statler flags something and I close the question, I can look back at that exchange and extract the rule precisely rather than paraphrasing from memory. The reviewer notes in my template are direct quotes or close paraphrases of resolved flags, not reconstructions. That matters because the precision of the rule determines whether Statler will keep flagging edge cases of the same issue. + +The maintenance cost is real. The agents don't automatically update my template. Roughly every five PRs, I spend fifteen minutes reading through the flags I've closed, identifying the ones that represent settled rules, and adding them to the reviewer notes section of the template. If I skip it for two or three weeks, the same flags start showing up again on new PRs — not because Statler forgot, but because I haven't incorporated the lesson. + +It adds up in the right direction. Before I had the subscription rule documented, I spent time on every database tools PR second-guessing whether I had it right. After? It's a known rule that gets verified automatically. That fifteen-minute update bought back roughly two hours of per-PR uncertainty across the series. Not profound math, but it compounds. + +By the time I'd completed ten database tool PRs with this loop running, my reviewer notes for the series were substantial: + +> subscription, resource-group, and tenant-id are ambient context params in the database and storage namespaces — do not include in parameter tables. +> +> Devlang slug is `javascript` not `node` for all JS/TS examples in this series. +> +> Tool descriptions should match upstream annotation text exactly; any additions or clarifications require an open question to the reviewer. +> +> "Required" and "Optional" status must match the CLI JSON schema value, not the pattern from adjacent namespace tools. +> +> Example sections use the published production endpoint format, not the development environment format. + +Five rules. Each one came from a Statler or Sam flag on a specific PR. None of them are things I would have invented independently — they came from the review process surfacing where my defaults diverged from the established patterns. + +The agents don't catch conceptual errors. If I've misunderstood what a tool does at the API level, Sam will verify that my description accurately matches my misunderstanding. Sam checks my text against the annotation — if the annotation is wrong, Sam confirms that my wrong description is consistent. That's a real limitation and it's why human review doesn't disappear from the workflow. Agent reviews handle pattern consistency and fact-checking against documented sources. Human review handles the gap between what's documented and what's operationally true. + +--- + +## Writing for a Thoughtful Reviewer Changes How You Think + + +![The pink-haired developer writing a structured PR description overlooking Bellingham Bay with a lighthouse in the distance](./media/2026-05-22-agentic-pr-reviews/author-mindset-bay-view.png) + +*Writing for a thoughtful reader changes your relationship with your own work — you ask the questions before they're asked of you.* + +Before agent reviews, I wrote PRs for a rubber stamp. That's a rational response to the environment — if most reviews are "LGTM," a detailed description doesn't get you much return. You write just enough to remind yourself what you were doing. + +When you know Sam will compare your descriptions against `schema.json` at commit `a4f9c31`, you check that comparison yourself first. Not every time — that would defeat the purpose — but it shifts what you pay attention to while you write. You think about what the reviewer needs to know rather than what you want to tell them. That's a different question. + +The scope boundary isn't bureaucratic overhead. It's the question that determines whether the PR does what it's supposed to do. Spelling it out before submitting means catching it yourself rather than in a review comment. Stating what you're not doing means making a conscious decision about it rather than leaving it implicit. Both of those move the thinking forward in a way that writing "covered the db namespace tools" doesn't. + +There's something that builds over the months too. Working with the same reviewer configuration across the tool server documentation series has created a shared vocabulary — not just terminology, but a shared understanding of what counts as an in-scope parameter, what belongs in a Parameters table versus an Example section, when "Required" means required by the tool versus required by the namespace context. That vocabulary lives in the reviewer notes in my template. A new article starts from the accumulated decisions of the previous forty, not from zero. + +You'd traditionally build that kind of shared vocabulary by working closely with a human expert over many PRs, until you internalized their standards. Most people don't have consistent access to that. The expert who has the deep context isn't always reviewing your PRs. When they are, consistency is a casualty of a full workload. The same reviewer standards applied to every PR, with no variance based on who's available, builds the vocabulary faster than occasional thorough reviews from domain experts who are also managing six other priorities. + +--- + +## Professional by Default + +Some of the most technically accurate feedback I've gotten in my career was delivered in ways that made it hard to engage with. All-caps emphasis, phrasings that implied I should have known better, "why did you do it this way" as an accusation. That delivery competes with the information transfer. You spend effort managing your response to the tone instead of engaging with the substance. + +Agent reviews are impersonal by default. Not cold — they engage with the content directly. But Statler doesn't get frustrated with the third consecutive PR that has the same devlang issue. Sam doesn't have strong opinions about "executes" vs "runs" that spill into how it phrases a flag. Scooter doesn't passively approve things because it's been a long week. + +The practical effect: I read agent review comments at face value. When Statler raises an issue, it's because a pattern check fired. When Sam flags a divergence, it's because the source doesn't match. There's no subtext to parse. The only question is whether the flag is right. If yes, fix it. If no, explain why. That's the whole exchange. + +Agents also ask rather than assert when something is genuinely ambiguous. A human reviewer who sees something they're uncertain about might mark it wrong, approve silently, or leave a vague comment. Sam says "is this intentional clarification or drift?" That's an invitation to confirm, not a verdict. For documentation work where many scope decisions are legitimately ambiguous — whether a behavior is user-facing or implementation-specific, whether a parameter should be in the table or treated as ambient context — getting a question is almost always the more useful response. + +When I'm overriding a flag deliberately — when I know the pattern diverges but I have a reason — I explain it in the PR description. Not for Sam or Statler, who will flag the deviation regardless. For the humans who read the PR history later and need to know whether the exception was intentional and why it was made. + +--- + +## What Actually Doesn't Work + +Some things still need a person. + +Sam doesn't catch conceptual errors against the truth. If a tool description in `schema.json` is misleading about what the tool actually does at runtime, Sam confirms that my documentation accurately matches the misleading description. It checks descriptions against the source — if the source is wrong, Sam's approval is not a useful signal about correctness. That requires someone who's actually used the tool and knows the gap between the annotation and the behavior. + +The feedback loop only works if I maintain it. The agents don't update my template. That's a fifteen-minute task every few PRs, but it's manual and optional, and if I skip it the same flags come back. The system runs on the work I put into it. + +Some conversations need to happen between people. When a PR decision involves content strategy — whether a new service gets its own article or folds into an existing one, whether a pattern I've established for the tool server series should apply to the platform skills articles — that's not a fact-check question or a pattern violation. It's a direction question. Sam and Statler can tell me whether a PR is consistent with what exists. They can't tell me whether what exists is the right structure going forward. + +My rule of thumb: agent reviews on every documentation PR. Human review on anything involving a judgment call about scope, a new content pattern I haven't done before, or a tool area where I know I'm working outside my documented knowledge. The agent reviews handle the mechanical consistency work so the human review time goes to what actually needs human judgment. A PR that's already been through Sam and Statler arrives at human review with pattern and fact-check issues resolved. The human reviewer's attention goes to the parts that require their specific knowledge. + +The agents also aren't a substitute for a domain expert on the first few PRs in a new content area. When I started the database tool articles, I didn't have enough context to write good reviewer notes — I didn't know what I didn't know. The first two or three PRs needed a human with database namespace knowledge to review, because Sam and Statler can only apply the patterns I've captured and I hadn't captured the important ones yet. The agent reviews got useful after I'd built the baseline. + +--- + +## The Template I Actually Use Now + +This is the PR description template I've built over roughly 60 documentation PRs in the last six months. Several iterations. This is the version I stopped changing. + +```markdown +## Goal + +[One or two sentences: what does this PR accomplish and which source does it translate from?] + +## Source of truth + +- Source file: [link to specific commit in upstream repo] +- Target format: [link to an existing article that matches the expected pattern] +- Reviewer notes: [settled decisions from prior PRs in this series, if any] + +## Scope + +**In scope:** +- [Specific tools, sections, or files covered] + +**Out of scope:** +- [Things intentionally excluded — don't skip this section] + +## Open questions for reviewer + +- [Specific things I'm uncertain about, with enough context to actually answer them] + +## Files changed + +- `[filename]` — [one-line description of what changed and why] +``` + +The "Reviewer notes" field in Source of truth is where settled decisions live. My current notes for the tool server documentation series: + +> subscription, resource-group, and tenant-id are ambient context params in database and storage namespaces — do not include in parameter tables. Devlang slug is `javascript` not `node` for all JS/TS examples in this series. Tool descriptions should match upstream annotation text exactly; any additions or clarifications require an open question to the reviewer. + +Those three lines prevent roughly 8–10 agent comments per PR. They came directly from Statler and Sam's flags across the first three database tool PRs, turned into documented rules. + +To run the reviews with Squad v0.9.4 via Copilot CLI: + +```bash +copilot --agent squad +``` + +Then in the session: + +``` +Sam, Statler, Scooter — review the PR branch in content-repo +``` + +The agents run in sequence, post their findings, and I work through the flags before pushing for human review. Agent review plus fixes: about 30 minutes per PR. That's faster than waiting for a human reviewer to have time, and what I bring to the human review is cleaner because the pattern and fact-check issues are already resolved. + +Before that setup, I was getting back LGTM or silence. Now I'm getting back 14 specific, accurate comments with reasoning. The comments are better than what I was getting from human reviews on most PRs — not because Sam and Statler are smarter than people, but because they read every line, every time, and they tell me what they found. + +The sixty-PR baseline is where the template started earning its cost. The first ten PRs, I was still figuring out what to capture in reviewer notes. By PR 20, the template had stable patterns. By PR 40, I stopped changing it. Now each new article in the series starts from a documented set of conventions that took forty PRs to build. That's the actual return on the fifteen-minute maintenance investment per five PRs — the reviewer notes are the artifact that makes every subsequent PR faster than the one before it. diff --git a/website/blog/2026-05-22-cross-team-agent-reviews.md b/website/blog/2026-05-22-cross-team-agent-reviews.md new file mode 100644 index 0000000..457d839 --- /dev/null +++ b/website/blog/2026-05-22-cross-team-agent-reviews.md @@ -0,0 +1,270 @@ +--- +slug: /2026-05-22-cross-team-agent-reviews +canonical_url: https://dfberry.github.io/blog/2026-05-22-cross-team-agent-reviews +custom_edit_url: null +sidebar_label: "2026.05.22 Cross-Team Agent Reviews" +title: "When Someone Else's Agent Reviews Your PR" +description: "Your PR lands with a team whose standards you don't fully know. Their agent reviews it with full reasoning attached. Here's what I've learned about receiving reviews from agents I didn't configure — and why the standards transfer is the part most people miss." +draft: true +tags: + - AI Agents + - Code Review + - Developer Experience + - GitHub Copilot + - AI assisted +updated: 2026-05-22 08:35 PST +keywords: + - cross-team agent reviews + - ai code review + - pr review standards transfer + - team standards visibility + - agentic pr review + - code review feedback loop + - sme review problem + - documentation standards +--- + +# When Someone Else's Agent Reviews Your PR + + +![Two developers in separate spaces connected by a floating review note — standards transmitted across team boundaries](./media/2026-05-22-cross-team-agent-reviews/cross-team-standards-transfer.png) + +*When the review is professional by default, the standard travels cleanly. No relationship required. No bandwidth negotiation. Just the rule, with the reasoning.* + +My PR went to the platform team for a standards review. Their agent came back with 6 comments. Five were naming and formatting patterns I'd been violating across multiple previous PRs — all of which had gotten approved silently. The sixth was a cross-namespace naming convention that doesn't exist in any onboarding documentation, because the platform team built it over three years of working conversations and never formalized it. + +I fixed all six flags. Took about twenty minutes. I added the rules to my reviewer notes for that content area. My next PR to that team had zero flags on those patterns. + +"LGTM" from a subject matter expert across team boundaries is the worst possible outcome — not the neutral one. They reviewed it. They know things I don't. The knowledge doesn't travel. + +When their agent reviews your PR, the knowledge travels with the verdict. + +This is about the other direction of agent-assisted review: not your own agents reviewing your own work, but someone else's agent reviewing your PR when it crosses into their content area. The feedback loop is the same, but what you're learning is different — you're not refining standards you helped develop, you're getting your first clear view of standards that have been operating invisibly. + +--- + +## Why "LGTM" From an SME Is Worst When You're the Outsider + +Within your own team, a "LGTM" approval has context that makes it at least partially interpretable. You know roughly what the reviewer cares about. You have a read on whether they're signaling "this is genuinely fine" or "I skimmed it and it looked okay" or "I don't have bandwidth right now but I trust you." You can ask a clarifying question without much overhead. The relationship absorbs the ambiguity. + +Cross-team, that context disappears entirely. You're submitting to a team whose priorities and internal standards you don't fully know. When they approve with "LGTM," you can't distinguish three meaningfully different outcomes: "this meets our standards fully," "this doesn't meet our standards but nothing is bad enough to block on," or "I didn't have time to check the things that actually matter." The approval looks identical for all three. + +The result is that you keep making the same mistakes across consecutive PRs, because you never find out they're mistakes. You keep using "server name" in parameter tables when the convention is "resource name." You keep including ambient-context parameters in tool parameter tables when the convention is to omit them. You keep using the language slug from an older article that predates the current convention. The patterns go through on every PR that gets a rushed review. Occasionally one lands with a reviewer who has time and inclination to flag it. You fix that instance. You move on without knowing whether you've learned the rule or just fixed the exception. + +The less visible failure mode: "LGTM" from a cross-team SME is uniquely frustrating because you can't even calibrate what it means. Within your own team, you might know this reviewer well enough to know they always check the API surface but skim prose clarity. You compensate. With a cross-team expert you interact with twice a year, you don't have that read. Their "LGTM" is a black box. + +The other failure at cross-team scale is the pointed correction without context. The reviewer is clearly annoyed that you've violated a convention they consider obvious, and they tell you. "This should be `resource name`, not `server name`." The correction is right. The delivery is terse. The context is absent. You fix the instance. You spend ten minutes wondering whether this is a team-wide convention or one person's strong preference. You decide not to ask because they seem busy and you don't want to be the person who needs hand-holding. Next PR: "server name" again, different file, different article. The conversation repeats. + +Repeat the exchange three times and neither side is learning anything. The reviewer is mildly exasperated. The contributor is mildly confused. The standard still lives only in one person's head. + +What actually helps is a reviewer who applies the same standards consistently, explains the reasoning when they flag something, and distinguishes between "this violates a documented convention" and "I have a question about your intent here." That's not a high bar. It's also not something you can reliably get from a cross-team reviewer who is managing their own deliverables and treating your PR as a secondary obligation. + +An agent from their team applies their documented standards to every PR, with the reasoning attached, regardless of queue length or workload. + +--- + +## What Their Agent Actually Said + +The PR in question was a documentation update covering a set of platform tools. I was adding parameter tables to three articles that covered resources across the compute, storage, and database namespaces. I based the table format on older articles in the series I'd found in the repo. Three of those older articles used "server name" in the parameter description column for resource identifier parameters. I followed that pattern — it was the evidence I had. + +Here's what the platform team's agent returned: + +> **Platform Reviewer — Standards Check** +> +> Parameter table uses "server name" in the Description column for `server_name` parameter in `compute_vm_start`, `compute_vm_stop`, and `db_server_list`. The convention in this content area is "resource name" for all resource identifiers, consistent with the compute and storage namespace articles. See: `storage_account_list`, `vm_list`. +> +> This inconsistency will create problems for readers switching between namespaces and may conflict with automated tooling that normalizes on "resource name." + +Four things in that review comment that a "LGTM" never gives me: + +**The rule, stated explicitly.** Not "change server name to resource name." The convention exists across the entire content area for all resource identifiers. Scope: all of them. + +**The comparison points.** `storage_account_list`, `vm_list`. I can go look at those articles right now and see exactly what correct looks like. I'm not making assumptions about what the reviewer means or has to look up. + +**The reason.** Two reasons, actually: readers switching namespaces, and automated tooling normalization. Now I understand why the rule exists. That matters because it tells me when the rule applies and when an exception might be warranted. If I'm documenting a resource type that is literally and only a server — not a general namespace resource — I know whether "server name" might be justified or whether the convention overrides the specificity. + +**The scope.** "This content area" — not just this article, not just this PR. A rule I carry forward to every subsequent submission to this space. + +Those are four things I now know that I didn't know before the review. I added all four to my reviewer notes for the platform content area. My next three PRs to that team had zero "resource name" flags. Not because the agent changed — it was running the same check. Because I learned the rule. + +The older articles I'd based my work on — the ones using "server name" — were pre-convention, written before the platform team formalized the approach. They'd never been updated because the team hadn't had bandwidth to do a full sweep. My PR was perpetuating an outdated pattern without knowing it. A human reviewer looking at my PR while managing four others in their queue might have noticed the discrepancy but not had time to write out the full reasoning, or might have recognized the pre-convention articles as the source and not flagged it, not knowing I'd copied the old pattern thinking it was current. + +The agent checks the current standard against the current PR, every time, with the reasoning attached. The pre-convention articles don't confuse it. + +--- + +## Team Standards Don't Come With Documentation + +Every team with more than six months of active work has accumulated standards that aren't formally documented. Not out of negligence — most conventions don't start as policy statements. They start as decisions made in the flow of normal work. + +The platform team decided at some point that "resource name" was better than "server name" for resource identifier descriptions. That decision almost certainly happened in a PR comment or a quick conversation, not in a standards meeting. The person who made the call had a reason. The people in the thread carried the reasoning. Future articles got reviewed by those people, who applied the standard. The pattern spread through the content area via review rather than via documentation. It became real — consistently applied, consistently correct — without ever being written down as a rule. + +Six months later, someone from outside the platform team's daily orbit submits a cross-team PR. They base their work on articles that exist in the repo, including the pre-convention ones that haven't been updated. They use "server name." Their PR gets reviewed by someone who knows the convention but is behind on their own work. The review is "LGTM" or a terse one-liner. The contributor makes the change (if told to) without understanding the rule. The next PR from that contributor has the same issue. + +This is how standards fail to transmit across team boundaries. It's not anyone's fault. The expert who reviewed is operating rationally given their constraints. The contributor who made the mistake was following the evidence available to them. The system just doesn't have a good mechanism for getting the expert's contextual knowledge into the contributor's hands at the moment they need it. + +The agent is that mechanism. + +When the platform team configures their reviewer with the resource naming convention — rule, comparison points, reason — that knowledge is available at every cross-team PR review, regardless of whether the expert who created the convention is available. It doesn't require the expert's bandwidth. It doesn't require a relationship between the contributor and the team. It requires that someone put the standard into the reviewer notes once. + +This changes the cross-team knowledge transfer problem from "how do we get the expert's knowledge to the contributor in time" to "how do we get the expert's knowledge into the agent's configuration once." The second problem is easier. It happens once, when the standard is established or articulated, rather than repeatedly, for every cross-team contributor who encounters it. + +There's a secondary benefit that the platform team gets, separate from the cross-team contributors: building the agent configuration forces explicit articulation of standards that were previously tacit. When you have to write "resource name for all resource identifiers, consistent across compute and storage namespaces, required for automated tooling normalization," you're making a rule more precise than it was when it lived in people's heads as "we just say resource name." Some of those standards, when written out, turned out to be underdocumented even in the reviewer notes — "Required/Optional per CLI JSON schema, not per usage observation" needed more specification before it was reliable enough to configure the agent on. The process of articulating standards makes them more useful for the team itself, not just for incoming contributors. + +I've been on both sides of this. Submitting cross-team PRs where I got terse corrections I didn't understand, fixing the instance without learning the rule. Reviewing cross-team PRs myself where I had the contextual knowledge but not the time to write out the full reasoning, leaving a one-liner that the contributor probably filed under "that person's preference" rather than "team convention." Both sides rational given the constraints. Both sides producing an exchange that transfers nothing durable. + +The agent on the reviewing side changes the default. + +--- + +## How the Feedback Loop Works Cross-Team + +The first PR I submit to a team whose standards I don't know: their agent flags several things, all specific, all with reasoning. I fix the flags. The part that compounds over time is what I do with those flags after fixing them. + +For each flag I get, I write the underlying rule into the reviewer notes section of my PR description template for that content area. Not a mental note — an actual written rule, so the next PR starts from it. + +The format I use in the reviewer notes: + +> **[rule type]:** [rule statement, including scope]. See: [comparison point]. Reason: [why]. + +After the first PR to the platform team: +> **Resource identifiers:** Use "resource name" for all resource identifier descriptions in parameter tables across compute, storage, and database namespaces. See `storage_account_list`, `vm_list`. Reason: cross-namespace consistency and automated tooling normalization. + +After the second PR: +> **Required/Optional status:** Match the CLI JSON schema value exactly. Do not inherit Required/Optional status from similar tools in adjacent namespaces — check the schema directly for each tool. See PR #[number] for the subscription parameter pattern. Reason: schema is the source of truth; adjacent patterns can diverge without announcement. + +After the third PR: +> **Example section URLs:** Use published production service endpoint URLs. Development environment URLs don't belong in examples regardless of whether they're technically accurate. See the published compute namespace articles for format. Reason: examples document the public API, not internal infrastructure. + +By the fourth PR, the flags from the first three don't fire. The agent is still running the same checks — those standards are part of their configuration regardless of who submits. But those specific patterns are now in my baseline for this content area. I'm a different contributor than I was at PR 1. I bring documented knowledge of their standards into each PR rather than starting from zero context. + +The front-loading is steeper cross-team than within your own workflow. You have more to learn, and you're starting from zero familiarity with the area rather than from working alongside the people who built the standards. But the compounding effect is identical. + +The limit: the loop is bounded by what's in the agent's configuration. If the platform team updated a convention between my last PR and my current one, their agent reflects the new standard. My notes have the old rule. The agent flags it on my next PR. I update the notes. The correction mechanism is reactive — flagging when I violate the new standard — not proactive notification. For content areas where I submit infrequently, I do a manual check on my reviewer notes against their most recent flag summary before submitting, to see if anything obviously changed. + +This doesn't scale perfectly past about three teams with active reviewer configurations. Four teams means four sets of reviewer notes to maintain, and the manual convention-change checks across four areas are about 45 minutes of overhead per sprint. That's still worth it to me — the alternative is making the same convention mistakes repeatedly across all four teams — but it's a real cost that scales linearly with how many cross-team areas you're working in. + +--- + +## No Relationship Required + +Cross-team review at scale has a structural relationship problem. Getting quality feedback from a team you don't work with daily requires one of: a pre-existing relationship where someone prioritizes your PR, a formal assignment process where review responsibility is explicit, or good timing. Most of the time, cross-team PRs get "LGTM" or a terse one-liner because the reviewer doesn't have bandwidth to go deeper on work they didn't create and aren't accountable for. + +When the team runs agent reviews, that structural problem doesn't disappear — but its impact on any individual PR shrinks substantially. The agent applies their documented standards at full depth regardless of whether there's a relationship between the reviewer and the contributor. It doesn't take more time for a first-time cross-team contributor than for someone who's submitted a hundred PRs to that content area. The standards get applied the same way every time. + +This matters most for contributors who are new to a content area — who don't know the conventions, who are making the first-few-PR mistakes that everyone makes. Those contributors are exactly the ones least likely to have a relationship that unlocks a thorough review. The agent inverts that: the thoroughness of the review doesn't scale with the contributor's relationship capital on that team. + + +![The pink-haired developer reading review feedback calmly in a serene PNW garden while others show review frustration](./media/2026-05-22-cross-team-agent-reviews/psychological-safety-review.png) + +*When feedback is professional by default, you can engage with the content rather than managing the relationship.* + +The professional-by-default aspect matters more cross-team than it does within your own workflow. Inside your own team, delivery problems in review comments come with relationship context you can use to separate the signal from the noise. You know this reviewer is generally terse but accurate. You know they've been heads-down on a release and their threshold for patience is low right now. Cross-team, that context doesn't exist. A terse or pointed comment from a team you don't know well is just terse and pointed. You can't easily check whether "this is incorrect" means "this violates a specific documented standard" or "I'm behind and phrasing things badly." + +An agent doesn't have that variance. It doesn't have good days and bad days. When the platform team's agent says "resource name, not server name, per the content area convention," the tone is identical whether I'm a first-time cross-team contributor or their closest collaborator. I read the flag at face value. The only question is whether it's right. If yes, fix it. If I think there's a legitimate exception, I explain in a reply. That's the whole exchange. + +There's an additional thing agents do in ambiguous cases that changes the review dynamic: they ask rather than assert. + +A human reviewer under time pressure hitting something uncertain has limited options — mark it wrong, approve with a mental note, or leave a vague comment that implies a problem without stating one. All rational given their constraints. None of them give the author a useful signal about what's actually expected. + +The platform team's agent on a PR I submitted recently: + +> The tool description for `compute_vm_start` uses "initiate a virtual machine start operation" where the upstream source text says "start a virtual machine." +> +> "Initiate a start operation" is more complex than the source without being more precise in the standard case — but this may be intentional if there's a distinction between initiating and completing the operation in the async context. Is this a deliberate clarification or drift from source text? + +That's a question, not a verdict. I can answer it: yes, intentional, the async distinction matters for readers trying to understand the return value. Or: no, drift, reverting to source text. Either way I understand exactly what the reviewer is checking and why. The exchange closes cleanly. + +The human version of this review would often skip the question. A reviewer with a full queue might approve (the description isn't wrong, just different) or leave "per source text" — which tells me the result without the reason. I'd make the change and still not know what made the distinction matter. + +For documentation work where many of the interesting calls are genuinely ambiguous — is this variation acceptable or a convention violation? — getting questions is valuable because it surfaces the exact edge case that the standard needs to cover. When I explain that the async distinction justifies the more complex phrasing, and the reviewer confirms it, that exception is now documented in the PR exchange. The next contributor who hits the same case has a precedent to point to. + +--- + +## The Audit Record Matters More When You Can't Ask + +When I've configured my own Squad agents to review my PRs, there's a recovery path when something in the review record is unclear: I can reconstruct context. I know what each agent was configured to check. I can look at the reviewer notes that drove a flag. If a question comes up six months later about why something was decided, I can go back to the configuration and understand the reasoning chain. + +With a cross-team agent review, that recovery path doesn't exist. I don't have access to the platform team's reviewer configuration. I can't query their agent about why a particular standard was established. The PR review record — the actual exchange preserved in the PR — is the only permanent record I have of what their standards were at that moment. + +That makes the captured reasoning more important, not less. + + +![The pink-haired developer viewing a preserved PR review in a museum-style display case, with winter Cascades in the distance](./media/2026-05-22-cross-team-agent-reviews/captured-in-perpetuity.png) + +*A year later, the PR is still there. The reasoning is still there. The "why" — the hardest thing to recover — is captured in the record.* + +Here's the scenario. Eight months from now, someone is looking at an article I submitted to the platform team's content area. They notice the parameter tables use "resource name" consistently and are wondering whether I knew that was a convention or happened to get it right. The PR the platform team's agent reviewed is still there. The agent's comment explains the convention, the comparison points, the reason. The exchange shows I fixed the flags and added the rule to my notes. It's clear I learned the standard. + +More commonly: I'm the one looking at an article I wrote eight months ago, trying to remember the platform team's exact standard for Optional parameters in cross-namespace tools. I don't have the reviewer notes from eight months ago in front of me — I may have updated my template since then. I go back to the PR. The agent's flags are there, the reasoning is there, the exchange is preserved. Reoriented in thirty seconds. + +This is the practical difference between a review record and a review memory. The memory fades. "We talked about the resource naming thing at some point" is not the same as the exchange that explains what the convention is, why it exists, and what it was compared against. The record doesn't fade. + +The before/after is stark. **Before:** I'm looking at a parameter table I wrote eight months ago. It mixes "resource name" and "server name" across different rows. I don't know if this was intentional or a convention violation or an editorial call I made in the moment. The PR history says "LGTM." I have nothing. I'm making a judgment call about which pattern to follow with no basis for the decision. + +**After:** I go to the PR. The agent's comment says "resource name per content area convention, see `storage_account_list`, reason: cross-namespace consistency and tooling normalization." My reply confirms I understood and implemented it. If the table mixes conventions, I know which one is correct. I can fix the inconsistency rather than making another undocumented judgment call. + +That's not a small difference. It's the difference between archaeological guesswork and having an actual record. + + +![The pink-haired developer at the Nooksack River at dusk, holding a lantern beside a marked post — a feeling of something preserved](./media/2026-05-22-cross-team-agent-reviews/nooksack-preserved-reasoning.png) + +*The reasoning, once captured, doesn't drift. It stays exactly where you left it — a marker in the record that future you will actually be able to read.* + +Code comments go stale. Architecture diagrams drift — a diagram from eighteen months ago describes three services; the system now has seven. PR reasoning doesn't drift. A PR from eight months ago that says "this naming convention was intentional, here's why, here's what it was compared against" is useful even if the convention has since changed, because it tells you the decision was deliberate, when it was made, and what information was in scope at the time. You need that to understand whether the current state is a refinement of the original decision or a divergence from it. + +For cross-team work specifically, this matters because you can't always Slack the reviewer from eight months ago and ask "what were you checking when you approved this?" The expert who reviewed it might have moved to a different team. The person who "LGTM'd" it might not remember the context. The agent's review is the record. When the reasoning is explicit in the record, the record is usable. When it's not, you're doing archaeology. + +--- + +## Write the PR to Actually Be Reviewed + +Knowing that a team runs agent reviews changes how you write the PR description before submitting. Not because you need to satisfy the agent — it will run its checks regardless of what you write. Because what makes an agent review useful is the same as what makes a human review useful: clear scope, stated source of truth, explicit questions where you're genuinely uncertain. + +When I'm submitting to a team with agent reviews, my PR description for that content area includes: + +**Goal:** what this PR covers, which source material it's based on, which target format it's following. + +**Source of truth:** a specific commit or version reference for the upstream material I'm working from. Not "based on the upstream repo" — the specific commit, so the agent and any human reviewer can check the exact same source. + +**Scope:** what's in this PR and explicitly what's not, with reasons for each exclusion. The out-of-scope section does the most work. When I've made a deliberate decision to omit something — a parameter that's ambient context, a tool that belongs in a different article, a behavior that's implementation-specific — that decision needs to be visible in the PR. + +**Open questions:** things I'm genuinely uncertain about, stated directly. "I treated `query_timeout` as an ambient parameter rather than including it in the table — is this consistent with how it's handled in `storage_read`?" That's a direct question to the reviewer. The agent either confirms the pattern or flags the deviation. Either way I get a clear answer, not a guess. + +**Reviewer notes:** the standards I've already learned from previous PRs in this content area. This tells the agent and any human reviewer what baseline I'm working from, so flags from this PR add to the baseline rather than re-explaining conventions I've already incorporated. + +The "Open questions" section changes the review from "here is my work, please approve" to "here is my work, here are my decisions, here are the places I wasn't certain — confirm or correct." It produces a more useful review because the agent can address a direct question precisely rather than having to detect a silent assumption. + +There's a side effect I didn't anticipate when I started writing PRs this way: the scope and open-questions sections catch my own drift before submission. Three times in the last six weeks, writing out the scope and excluded items, I noticed I'd edited files that weren't in my stated goal. Two were legitimate scope expansions I should have documented. One was an accidental holdover from an earlier edit session that I reverted before submitting. The description-writing does a pre-review. + +Writing for a specific, thoughtful reviewer changes how you think about what you're doing. The rubber-stamp model lets you be vague because the reviewer will fill in gaps. The cross-team agent model requires precision because the agent will check exactly what you said against exactly what the standards expect. + +--- + +## What This Doesn't Replace + +Cross-team agent reviews handle the documented standards consistently. They don't handle the things that aren't documented yet. + +If the platform team made an architectural decision last month that changed how a category of parameters should be documented, and their reviewer notes haven't been updated to reflect it, their agent won't flag violations of the new approach. The PR will pass their agent review and still be wrong against the current thinking. That requires a human reviewer who knows the current state of the architecture, not just the captured state of the standards. + +The gap between "what the agent knows" and "what the team currently thinks" is permanent and real. Teams evolve faster than reviewer notes. The agent is always applying yesterday's captured knowledge. For stable, well-defined content areas with slow-changing conventions, the gap is small. For areas in active architectural change, it can be significant. + +Agent reviews also don't catch conceptual errors. If I've misunderstood what a tool does at the API level, the agent will verify that my documentation accurately describes my misunderstanding. It checks my descriptions against the documented standard, not against the operational reality. Someone who's actually used the tool in production catches the gap between the documentation and the truth. + +My approach: agent review on every cross-team PR. Human review from someone on the receiving team on anything involving a new content pattern I haven't documented before, a judgment call about scope or architecture, or the first few PRs in a content area where I'm establishing my baseline. The agent handles the documented-standards enforcement; the human handles the delta between the documented standards and current reality. Both are necessary. The agent's consistency makes the human review more efficient — by the time a PR reaches human review, the pattern and convention questions are already resolved. + +When I'm making a deliberate exception to a flagged standard — an edge case I think justifies the deviation — I explain it in the PR description. Not for the agent, which will flag the deviation regardless. For the human who reads the PR later and needs to know whether the exception was intentional and why it was made. + +--- + +## What I've Found From Being On the Receiving End + +The first time a cross-team agent gave me 6 flags, my instinct was to feel like I'd done something wrong. Six flags on a PR sounds like a problem. + +It wasn't. It was five rules I hadn't known and one ambiguous call I now had a clear answer to. The PR before it got "LGTM." I'd violated the same five patterns in that PR too. The "LGTM" just didn't tell me. + +After ten PRs to the platform team's content area, my reviewer notes for that area are substantial. Each entry started as a flag from their agent. The flags have slowed — not because I'm submitting lower-risk PRs, but because the conventions I was violating in the first three PRs are now in my documented baseline. A colleague starting on cross-team PRs to that area can start from my reviewer notes rather than making the first-five-PR mistakes I made. That's not the same as having one of the platform team's experts walk them through the standards — it's a starting point, a documented baseline, something to build from rather than nothing. + +The part I'm still working on: a way to be notified when a team updates their reviewer configuration in ways that affect my established rules. Right now I find out on the next PR when their agent flags a pattern I thought was settled. That's fine — it's a correction mechanism — but proactive notification would be better. I'm submitting to three teams with active agent reviews right now and the reactive-correction approach works. If I were at five or six, I'd be spending more time on the catch-up than on the submissions. + +Six flags with reasoning is worth more than a hundred silent approvals.