A pattern library for skill authors who want to add proof gates to their LLM agents.
A proof gate is deterministic verification code that the agent writes per task and runs against its own work product, quoting the observable output as proof, with hard fix-before-pass-through semantics at a workflow boundary. The pattern resists sycophancy and overtraining/undertraining skew because the proof artifact is non-textual: neither pressure can quietly soften a grep result or a quoted sentence.
This repository is the reproducibility artifact for the paper "Proof Gates: Sycophancy-Resistant Self-Verification via Agent-Authored Postconditions" (Whittaker & Whittaker 2026). The frozen v1.0.0 tag is the exact snapshot the paper's Footnote 1 cites; the main branch will evolve with community contributions.
@article{whittaker2026proofgates,
title={Proof Gates: Sycophancy-Resistant Self-Verification via Agent-Authored Postconditions},
author={Whittaker, Brian and Whittaker, Naomi M.},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2026}
}
Once the paper has a stable arXiv ID, the XXXX.XXXXX placeholder in this README and in CITATION.cff will be updated. GitHub's "Cite this repository" button (powered by CITATION.cff) is also available.
proof-gate-patterns/
├── README.md # this file
├── LICENSE # MIT
├── CITATION.cff # GitHub Cite-button metadata
├── CONTRIBUTING.md # contribution scope
├── patterns/ # canonical gate patterns
│ ├── 01-grep-gate.md # grep-type gate (pattern presence or absence)
│ ├── 02-quote-gate.md # quote-type gate (verbatim sentence as proof)
│ └── 03-read-gate.md # read-type gate (named excerpt from a prerequisite)
├── examples/ # worked examples of agent output under each pattern
│ ├── 01-grep-gate-example.md # corresponds to paper vignette V1 (cross-file invariants)
│ ├── 02-quote-gate-example.md # corresponds to paper vignette V3 (clinical-voice copywriter)
│ ├── 03-read-gate-example.md # corresponds to paper vignette V3 (voice profile load)
│ └── 04-pre-mutation-example.md # corresponds to paper vignette V2 (data-loss hygiene)
└── docs/
├── design-principles.md # the 5 design principles from §2.3 of the paper
├── operator-discipline.md # the 4-step operator process for designing a gate-set
├── failure-modes.md # observed gate failure modes + mitigation (§5.1)
└── necessary-vs-sufficient.md # necessary minimum gate vs sufficient defense (§2.4)
The paper presents three vignettes from a single-operator longitudinal deployment. Each vignette has a corresponding worked example in this repo:
| Paper vignette | Pattern type | Example file |
|---|---|---|
| V1: Cross-file invariants (G1 sibling-scan) | grep gate | examples/01-grep-gate-example.md |
| V2: Data-loss hygiene (G2 pre-mutation assertion) | grep gate (pre-mutation variant) | examples/04-pre-mutation-example.md |
| V3: Clinical-voice copywriter (PG-G1, PG-G3, PG-G10) | quote + read gates | examples/02-quote-gate-example.md + examples/03-read-gate-example.md |
The examples are extracted from the production deployment in the paper's RRM Academy context. Domain-specific phrasing is generalized where it would obscure the pattern; the gate mechanics are domain-agnostic.
If you maintain an agent's system prompt or skill file and you want to add proof gates, three operator decisions get you started:
- Enumerate the work-product classes, for example a fix, a paragraph, a database write, or a multi-step plan. Each work-product class is a candidate trigger.
- List the domain rules that bind on each class. A rule without a gate goes unverified, and a passing generic test-and-lint run does not gate a domain rule.
- Choose a gate-class per rule. Grep for pattern presence or absence, quote for sentence-level rules, read for prerequisite-loading rules.
Then encode three things in the skill: when the gate fires, what proof artifact must travel with the response, and what happens on FAIL or N/A.
Before pass-through, for every fix you produce:
PG-INV-1 (cross-file): grep the directory of the fixed file for the pattern
you fixed locally. Quote the count of matches.
- PASS = "0 matches across N sibling files"
- If matches found, fix them all, re-grep, quote 0.
- N/A only if the pattern is genuinely local (justify in one sentence).
Format the proof inline in your response.
Drop the fragment into the skill, point it at the work-product class it should fire on, and verify that the agent quotes the grep count rather than asserting "I checked." If the agent asserts without quoting, the gate is not firing; tighten the spec language.
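The sibling scan that PG-INV-1 asks for can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper or this repo: the function name `sibling_scan`, the choice to exclude the fixed file itself from the scan, and the exact wording of the proof line are all assumptions. It counts matching lines the way `grep -c` would and returns a single line the agent could quote.

```python
import re
from pathlib import Path


def sibling_scan(fixed_file: Path, pattern: str) -> str:
    """Scan the siblings of fixed_file for pattern and return a quotable proof line.

    Hypothetical helper: counts matching lines (like `grep -c`) in every other
    regular file in the same directory. Subdirectories are not searched.
    """
    regex = re.compile(pattern)
    siblings = sorted(
        p for p in fixed_file.parent.iterdir()
        if p.is_file() and p.name != fixed_file.name
    )
    hits = []
    for path in siblings:
        # errors="replace" keeps the scan going on files that are not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path.name}:{lineno}")
    verdict = "PASS" if not hits else "FAIL"
    proof = f"{verdict}: {len(hits)} matches across {len(siblings)} sibling files"
    if hits:
        # List each location so the fix-and-re-grep step knows where to go.
        proof += " (" + ", ".join(hits) + ")"
    return proof
```

A FAIL line names each offending location, so the agent can fix them all, re-run the scan, and quote the resulting PASS line as the fragment requires.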
The full four-step design process is documented in docs/operator-discipline.md. The five invariants every gate-set must satisfy are in docs/design-principles.md. Observed failure modes (too-narrow regex, paraphrased quote, unjustified N/A, gate-of-the-gate regress) are in docs/failure-modes.md. The necessary-vs-sufficient framing for gate-set coverage is in docs/necessary-vs-sufficient.md.
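The paraphrased-quote failure mode named above lends itself to a mechanical check: a quote gate passes only if the quoted sentence actually appears in the work product. The sketch below is one possible way to test that, not the paper's implementation; the name `check_quote`, the whitespace normalization, and the result strings are assumptions made for illustration.

```python
def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break a match."""
    return " ".join(text.split())


def check_quote(work_product: str, quoted: str) -> str:
    """Return a proof line stating whether quoted appears verbatim in work_product.

    Hypothetical helper: only whitespace is normalized. Any change of wording,
    case, or punctuation counts as a paraphrase and fails the gate.
    """
    needle = _normalize(quoted)
    if not needle:
        return "FAIL: empty quote"
    if needle in _normalize(work_product):
        return "PASS: quote found verbatim"
    return "FAIL: quote not found verbatim (possible paraphrase)"
```

Normalizing only whitespace is a deliberately strict choice: it tolerates reflowed lines but treats any reworded sentence as a gate failure.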
The paper's six-property positioning matrix locates proof gates (agent-authored, in-response) against operator-authored verification artifacts (ahead-of-time, at-commit). The companion repo lint-identity demonstrates the operator-authored sibling: a pre-commit linter that validates references to specific people against a JSON-defined single source of truth (SSOT).
Both repos use observable command output as proof; they differ in authorship locus and firing point.
| Version | Released | Corresponds to |
|---|---|---|
| 1.0.0 | 2026-05-25 | Initial release alongside arXiv preprint v1 |
The v1.0.0 git tag is the frozen snapshot the paper cites. The main branch evolves with community contributions; cite v1.0.0 for reproducibility.
MIT. Copyright (c) 2026 Brian Whittaker, RRM Academy.
See CONTRIBUTING.md. PRs are welcome for typos, clarity improvements, additional vignettes, and bug fixes. Methodological disagreements belong in this repo's Discussions tab or in a citing publication.