proof-gate-patterns

A pattern library for skill authors who want to add proof gates to their LLM agents.

A proof gate is deterministic verification code that the agent writes per task and runs against its own work product; the agent then quotes the code's observable output as proof. The gate sits at a workflow boundary with hard fix-before-pass-through semantics: a failing gate must be fixed before the work passes through. The pattern is sycophancy-resistant and overtraining/undertraining-skew-resistant because the proof artifact is non-textual: neither pressure can quietly soften a grep result or a quoted sentence.
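To make fix-before-pass-through concrete, here is a minimal Python sketch. It assumes a gate can be modelled as a zero-argument check that returns a verdict plus the output to quote. `GateResult`, `pass_through`, `max_fixes`, and the toy `old_api`/`new_api` rename are hypothetical illustrations. In the repo itself, gates are specified as prose in a skill file, and the agent writes the check per task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    passed: bool  # verdict computed by the check, never by the agent's own judgment
    proof: str    # observable output to quote verbatim, e.g. a match count


def pass_through(check: Callable[[], GateResult],
                 fix: Callable[[GateResult], None],
                 max_fixes: int = 3) -> str:
    """Hard boundary: return only once the deterministic check reports PASS."""
    result = check()
    fixes = 0
    while not result.passed:
        if fixes == max_fixes:
            # Refuse pass-through rather than forwarding unverified work.
            raise RuntimeError(f"gate still failing after {fixes} fixes: {result.proof}")
        fix(result)       # fix first...
        fixes += 1
        result = check()  # ...then re-run the same check on the new state
    return f"PASS: {result.proof}"


# Toy work product (hypothetical): a rename applied in one file but not its siblings.
files = {"a.py": "old_api()", "b.py": "old_api()", "c.py": "new_api()"}


def check() -> GateResult:
    hits = sum(src.count("old_api(") for src in files.values())
    return GateResult(passed=hits == 0, proof=f"{hits} matches across {len(files)} files")


def fix(_: GateResult) -> None:
    for name in files:
        files[name] = files[name].replace("old_api(", "new_api(")


print(pass_through(check, fix))  # PASS: 0 matches across 3 files
```

The design choice that matters is that `passed` comes from the check, so a confident-sounding agent cannot flip it.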

This repository is the reproducibility artifact for the paper "Proof Gates: Sycophancy-Resistant Self-Verification via Agent-Authored Postconditions" (Whittaker & Whittaker 2026). The frozen v1.0.0 tag is the exact snapshot the paper's Footnote 1 cites; the main branch will evolve with community contributions.

Cite this work

@article{whittaker2026proofgates,
  title={Proof Gates: Sycophancy-Resistant Self-Verification via Agent-Authored Postconditions},
  author={Whittaker, Brian and Whittaker, Naomi M.},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}

Once the paper has a stable arXiv ID, the XXXX.XXXXX placeholder in this README and in CITATION.cff will be updated. GitHub's "Cite this repository" button (powered by CITATION.cff) is also available.

Repository contents

proof-gate-patterns/
├── README.md                       # this file
├── LICENSE                         # MIT
├── CITATION.cff                    # GitHub Cite-button metadata
├── CONTRIBUTING.md                 # contribution scope
├── patterns/                       # canonical gate patterns
│   ├── 01-grep-gate.md             # grep-type gate (pattern presence or absence)
│   ├── 02-quote-gate.md            # quote-type gate (verbatim sentence as proof)
│   └── 03-read-gate.md             # read-type gate (named excerpt from a prerequisite)
├── examples/                       # worked examples of agent output under each pattern
│   ├── 01-grep-gate-example.md     # corresponds to paper vignette V1 (cross-file invariants)
│   ├── 02-quote-gate-example.md    # corresponds to paper vignette V3 (clinical-voice copywriter)
│   ├── 03-read-gate-example.md     # corresponds to paper vignette V3 (voice profile load)
│   └── 04-pre-mutation-example.md  # corresponds to paper vignette V2 (data-loss hygiene)
└── docs/
    ├── design-principles.md        # the 5 design principles from §2.3 of the paper
    ├── operator-discipline.md      # the 4-step operator process for designing a gate-set
    ├── failure-modes.md            # observed gate failure modes + mitigation (§5.1)
    └── necessary-vs-sufficient.md  # necessary minimum gate vs sufficient defense (§2.4)

Vignette-to-example mapping

The paper presents three vignettes from a single-operator longitudinal deployment. Each vignette has at least one corresponding worked example in this repo:

| Paper vignette | Pattern type | Example file |
|---|---|---|
| V1: Cross-file invariants (G1 sibling-scan) | grep gate | `examples/01-grep-gate-example.md` |
| V2: Data-loss hygiene (G2 pre-mutation assertion) | grep gate (pre-mutation variant) | `examples/04-pre-mutation-example.md` |
| V3: Clinical-voice copywriter (PG-G1, PG-G3, PG-G10) | quote + read gates | `examples/02-quote-gate-example.md` + `examples/03-read-gate-example.md` |

The examples are drawn from the production deployment described in the paper (the RRM Academy context). Domain-specific phrasing is generalized where it would obscure the pattern; the gate mechanics themselves are domain-agnostic.

Quick-start for skill authors

If you maintain an agent's system prompt or skill file and you want to add proof gates, three operator decisions get you started:

1. Enumerate the work-product classes: a fix, a paragraph, a database write, a multi-step plan. Each work-product class is a candidate trigger.
2. List the domain rules that bind on each class. Only rules with an explicit gate are gated; passing generic tests and lints does not gate a domain rule.
3. Choose a gate-class per rule: grep for pattern presence or absence, quote for sentence-level rules, read for prerequisite-loading rules.
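The three decisions above can be recorded as a small data structure so that rules left without a gate stay visible. A minimal Python sketch; `GateClass`, `Rule`, `ungated`, and the example rules are hypothetical illustrations, not a schema from this repo.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GateClass(Enum):
    GREP = "grep"    # pattern presence or absence
    QUOTE = "quote"  # sentence-level rule; a verbatim sentence is the proof
    READ = "read"    # prerequisite-loading rule; a named excerpt is the proof


@dataclass(frozen=True)
class Rule:
    work_product: str           # decision 1: the work-product class (the trigger)
    text: str                   # decision 2: the domain rule that binds on it
    gate: Optional[GateClass]   # decision 3: chosen gate-class, or None if ungated


def ungated(rules: list[Rule]) -> list[Rule]:
    """Rules with no gate-class: nothing holds the agent to these."""
    return [r for r in rules if r.gate is None]


# Hypothetical example rules, one per work-product class.
rules = [
    Rule("fix", "the fixed pattern must not recur in sibling files", GateClass.GREP),
    Rule("paragraph", "open with the sentence the style guide mandates", GateClass.QUOTE),
    Rule("paragraph", "load the voice profile before drafting", GateClass.READ),
    Rule("database write", "back up the table before mutating it", None),
]

for r in ungated(rules):
    print(f"UNGATED [{r.work_product}]: {r.text}")
```

Anything `ungated` returns is a rule the agent is not held to, however many tests or lints pass.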

Then encode three things in the skill: when the gate fires, what proof artifact must travel with the response, and what happens on FAIL or N/A.

Minimal skill fragment

Before pass-through, for every fix you produce:

PG-INV-1 (cross-file): grep the directory of the fixed file for the pattern
  you fixed locally. Quote the count of matches.
  - PASS = "0 matches across N sibling files"
  - If matches found, fix them all, re-grep, quote 0.
  - N/A only if the pattern is genuinely local (justify in one sentence).
  Format the proof inline in your response.
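The fragment's grep step is the code the agent writes and runs. A minimal Python sketch, assuming a non-recursive scan of regular files in the fixed file's directory with the fixed file itself excluded; `sibling_scan` is a hypothetical name, and a shell grep over the same directory serves the same purpose. What matters is that the count comes from running the scan, not from the agent's recollection.

```python
import re
from pathlib import Path


def sibling_scan(fixed_file: str, pattern: str) -> str:
    """Grep the fixed file's directory for the locally fixed pattern.

    Assumptions of this sketch: the scan is non-recursive, covers regular files
    only, and excludes the fixed file itself (its local fix is checked separately).
    Returns a proof line in the fragment's format.
    """
    fixed = Path(fixed_file)
    regex = re.compile(pattern)
    siblings = sorted(p for p in fixed.parent.iterdir()
                      if p.is_file() and p.name != fixed.name)
    matches = 0
    for path in siblings:
        text = path.read_text(encoding="utf-8", errors="replace")
        matches += sum(1 for _ in regex.finditer(text))
    return f"{matches} matches across {len(siblings)} sibling files"
```

If the returned line does not start with `0 matches`, fix every hit, re-run the scan, and quote the new line.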

Drop the fragment into the skill, point it at the work-product class it should fire on, and verify that the agent quotes the grep count rather than asserting "I checked." If the agent asserts without quoting, the gate is not firing; tighten the spec language.
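The "quotes the count versus asserts" check can itself be mechanized on the operator side. A minimal Python sketch; `classify_response` is hypothetical, and treating "because" or "since" as the marker of a justified N/A is a crude heuristic assumed here, not a rule from the paper.

```python
import re

# Proof format from the PG-INV-1 fragment: "<count> matches across <N> sibling files".
COUNT_PROOF = re.compile(r"\b(\d+) matches across (\d+) sibling files\b")
# Crude heuristic (an assumption): a justified N/A contains "because" or "since".
JUSTIFIED_NA = re.compile(r"\bN/A\b[^.\n]*\b(?:because|since)\b[^.\n]+", re.IGNORECASE)


def classify_response(response: str) -> str:
    """Classify a response by whether it carries a quoted PG-INV-1 proof."""
    counts = COUNT_PROOF.findall(response)
    if counts:
        # The last quoted count is the post-fix re-grep, so it decides the verdict.
        return "PASS" if int(counts[-1][0]) == 0 else "FAIL: nonzero count quoted"
    if JUSTIFIED_NA.search(response):
        return "N/A (justified)"
    # "I checked" with no number: the gate is not firing.
    return "NOT FIRING: no quoted proof"


for reply in (
    "Fixed. I checked the other files.",
    "Found 2 matches across 5 sibling files; fixed all. Re-grep: 0 matches across 5 sibling files.",
    "PG-INV-1: N/A because the pattern only exists in this generated file.",
):
    print(classify_response(reply))
```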

The full four-step design process is documented in docs/operator-discipline.md, and the five design principles every gate-set must satisfy are in docs/design-principles.md. Observed failure modes (too-narrow regex, paraphrased quote, unjustified N/A, gate-of-the-gate regress) are catalogued in docs/failure-modes.md, and the necessary-vs-sufficient framing for gate-set coverage is in docs/necessary-vs-sufficient.md.
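One of the failure modes listed above, the paraphrased quote, is cheap to catch mechanically. A minimal Python sketch, assuming "verbatim" tolerates only whitespace differences (so line wrapping does not cause false failures); `quote_gate` is a hypothetical helper. The same substring check can back a read gate by passing the prerequisite file's contents as `source`.

```python
import re


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so line wrapping does not break a verbatim match.
    return re.sub(r"\s+", " ", text).strip()


def quote_gate(quoted: str, source: str) -> bool:
    """PASS only if the quoted sentence appears verbatim (modulo whitespace) in source.

    A paraphrase, however close, fails: the check compares characters, not meaning.
    An empty quote also fails.
    """
    q = _normalize(quoted)
    return bool(q) and q in _normalize(source)


profile = "Speak plainly.\nAvoid jargon   unless the reader\nused it first."
print(quote_gate("Avoid jargon unless the reader used it first.", profile))  # True
print(quote_gate("Avoid jargon unless the reader uses it first.", profile))  # False
```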

Companion artifact

The paper's positioning matrix locates proof gates (agent-authored, in-response) against operator-authored verification artifacts (operator-authored, ahead-of-time, at-commit). The companion repo lint-identity demonstrates the operator-authored sibling: a pre-commit linter that validates references to specific people against a JSON-defined single source of truth (SSOT).

Both repos use observable command output as proof; they differ on authorship locus and firing point. The paper's six-property positioning matrix locates both.

Versioning

| Version | Released | Corresponds to |
|---|---|---|
| 1.0.0 | 2026-05-25 | Initial release alongside arXiv preprint v1 |

The v1.0.0 git tag is the frozen snapshot the paper cites. The main branch evolves with community contributions; cite v1.0.0 for reproducibility.

License

MIT. See LICENSE.

Contributing

See CONTRIBUTING.md. PRs welcome for typos, clarity improvements, additional vignettes, and bug fixes. Methodological disagreements should go in this repo's Discussions tab or as a citing publication.
