fix(lark-mail): clarify JSON field semantics in confirm vs execute by xzcong0820 · Pull Request #798 · larksuite/cli

xzcong0820 · 2026-05-09T21:04:57Z

What

Follow-up fix on top of #749. Adds a "决策状态机字段语义" subsection to both
skills/lark-mail/SKILL.md and skill-template/domains/mail.md, spelling
out which JSON output fields belong to decision = ask_confirm, execute,
or report_not_found.

Why

Verify run on #749 caught a stable failure at scenario
MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01 (skill-prompt-eval, two consecutive
runs):

During ask_confirm, the model leaked execute-phase commitment into
planned_action (once as the string "batch_trash messages [m_1, m_2]",
once as a dict {"api": "messages.batch_trash", "message_ids": [...]})
preview.fields sometimes collapsed into preview.items instead of
being explicitly listed, so downstream couldn't see sender / subject / folder were extracted

Root cause: the new "写操作前显式确认" section in #749 described the
semantic rules (preview-then-confirm, reversible vs irreversible) but
did not anchor those rules to the runner's JSON output schema fields,
so the model had no signal for which fields belong to which decision stage.

Fix

Append a stage-by-stage field constraint table to both files (only the
two SKILL prompt files are touched — runner.py / judge.py /
scenarios/*.json are intentionally untouched):

ask_confirm: planned_action MUST be null; would_execute_write = false; preview.fields MUST be a list of field names that includes
sender / subject / folder (or rule_name for rule changes); batch
operations MUST include the affected count in assistant_message.
execute: planned_action carries the literal API name (e.g.
"messages.batch_trash", "messages.mark_read"); reversible ops
(label / read state / move) go directly to execute, not via
ask_confirm.
report_not_found: target_found = false, planned_action = null.

The wording explicitly calls out the failure modes seen in verify
(string vs dict shape both qualify as "should be null but filled"), so
the model has a stable rule, not an example to overfit to.

Scope

skills/lark-mail/SKILL.md (+26 lines under the existing
"## 数据真实性与操作合规 / 2. 写操作前显式确认" section)
skill-template/domains/mail.md (+26 lines, same content for the
domain prompt template)
No test / runner / scenario changes.

Summary by CodeRabbit

Release Notes

Documentation
- Clarified decision state machine field semantics for JSON output specifications, including handling rules for different decision stages (ask_confirm, execute, report_not_found), field constraints, preview data structure requirements, and batch operation confirmation messaging standards.

… vs execute Verify run on PR larksuite#749 caught a code_bug at MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01: during ask_confirm, the model leaked execute-phase commitment into `planned_action` (string or dict form), and `preview.fields` sometimes collapsed into `preview.items` instead of being listed explicitly. Root cause: the new "写操作前显式确认" section described the semantics (preview-then-confirm) but did not specify how those map to the runner's JSON output schema fields, so the model had no anchor for which fields belong to which decision stage. Fix: append a "决策状态机字段语义" subsection in both files, spelling out field-by-field constraints for `decision = ask_confirm` / `execute` / `report_not_found`: - ask_confirm: planned_action MUST be null; preview.fields must list sender/subject/folder (or rule_name); batch_* must include affected count. - execute: planned_action filled with literal API name; reversible ops go here directly, not via ask_confirm. - report_not_found: target_found=false, planned_action=null. Scope per verify report: only SKILL prompt files touched; runner.py / judge.py / scenarios/*.json untouched.

coderabbitai · 2026-05-09T21:05:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96e08459-0cd3-4508-9b61-85bba9b4d574

📥 Commits

Reviewing files that changed from the base of the PR and between 4aceae9 and 56e66aa.

📒 Files selected for processing (2)

skill-template/domains/mail.md
skills/lark-mail/SKILL.md

📝 Walkthrough

Walkthrough

Two documentation files are updated in parallel to define decision state machine field semantics for external runners outputting JSON. The changes establish a contract specifying three decision modes (ask_confirm, execute, report_not_found), stage-correct field handling, and content requirements for previews and batch operation wording.

Changes

Mail Decision State Machine Contract

Layer / File(s)	Summary
Template Specification `skill-template/domains/mail.md`	Domain template documentation establishes decision state machine contract with allowed `decision` values and field-level rules: `planned_action` null during ask-confirm phase and required during execute, `preview.fields`/`preview.items` structure requirements, and batch operation wording must include affected-count digits.
Skill SKILL.md Specification `skills/lark-mail/SKILL.md`	Skill documentation reinforces the same contract with JSON output constraints: `planned_action` null-vs-API-name constraints per mode, `would_execute_write`/`target_found` correctness invariants, `preview` field/item list formatting, and `assistant_message` content requirements including batch affected-count wording.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

larksuite/cli#749: Both PRs modify the mail domain documentation to enforce the same decision/confirmation semantics—handling "not found" results, mandatory previews/confirmations for destructive writes (including affected-count wording), and fields like target_found/preview/planned_action.

Suggested labels

documentation, domain/mail, size/M

Suggested reviewers

chanthuang
infeng

Poem

🐰 A contract takes shape in our docs so clear,
With decision states and fields without fear,
Templates and skills now speak as one—
The mail domain's semantics, beautifully done! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: clarifying JSON field semantics for the confirm vs execute decision states in the lark-mail skill.
Description check	✅ Passed	The description is comprehensive and covers all required template sections: Summary (motivation and scope), Changes (detailed list), Test Plan (acknowledged through context), and Related Issues (links `#749` and the failing scenario).
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-09T21:09:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.46%. Comparing base (4aceae9) to head (56e66aa).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #798   +/-   ##
=======================================
  Coverage   65.46%   65.46%           
=======================================
  Files         510      510           
  Lines       47129    47129           
=======================================
  Hits        30851    30851           
  Misses      13607    13607           
  Partials     2671     2671

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2026-05-09T21:09:58Z

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@56e66aaf8cc5abc305075a7e20bd2931984a8a5b

🧩 Skill update

npx skills add xzcong0820/larksuite-cli#harness/01kr6z4sg2d03vtgy9f0h2twkf -y -g

github-actions Bot added domain/mail PR touches the mail domain size/M Single-domain feat or fix with limited business impact labels May 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(lark-mail): clarify JSON field semantics in confirm vs execute#798

fix(lark-mail): clarify JSON field semantics in confirm vs execute#798
xzcong0820 wants to merge 1 commit intolarksuite:mainfrom
xzcong0820:harness/01kr6z4sg2d03vtgy9f0h2twkf

xzcong0820 commented May 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

codecov Bot commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xzcong0820 commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Fix

Scope

Related

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

codecov Bot commented May 9, 2026

Codecov Report

Uh oh!

github-actions Bot commented May 9, 2026

🚀 PR Preview Install Guide

🧰 CLI update

🧩 Skill update

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xzcong0820 commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading