feat(anomaly): vouch flag-anomalies — advisory flags on pending proposals (#323) #353

Open
dhgoal wants to merge 1 commit into vouchdev:test from dhgoal:feat/anomaly
dhgoal commented Jul 3, 2026

closes #323.

reviewers scan the pending queue linearly, and at a glance every item carries the same apparent weight. the ones that most deserve a hard look — a claim far from anything in the kb, a claim that contradicts a pile of approved ones, a claim scraping by with the barest evidence — blend in. the near-duplicate direction already surfaces (find_similar_on_propose attaches non-blocking warnings on propose); this surfaces the outlier direction.

vouch flag-anomalies scores every pending claim proposal and lists them worst-first, each with its reason codes:

  • thin_evidence: the evidence list is at or below a floor. propose_claim already requires ≥1 citation; this is the softer "technically cited but suspiciously thin" case.
  • contradicts_many: the proposal declares contradictions against at least a threshold number of approved, live claims, using kb.contradict's existing notion of a contradiction (retired claims don't count).
  • far_from_corpus: the nearest-neighbour cosine similarity to the approved claim corpus is below a floor (no neighbour is close, so the claim is an outlier). it is computed over the same search_embedding / embedding_index path that find_similar_on_propose uses, with the same swallow of an unavailable embedder, so when the embeddings extra is absent it degrades gracefully to emitting no code while the two non-embedding codes are still computed.
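
a minimal sketch of how the three codes and the worst-first ordering could fit together. the Proposal / Anomaly shapes, the flag() name, and the default thresholds are all hypothetical illustrations, not the contents of src/vouch/anomaly.py; the nearest-neighbour similarity is taken as a precomputed value that is None when the embeddings extra is absent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proposal:
    # hypothetical shape of a pending claim proposal (not vouch's real model)
    id: str
    evidence: list[str]
    contradicts: list[str] = field(default_factory=list)
    # nearest-neighbour cosine to the approved corpus; None when no embedder is available
    nearest_sim: float | None = None


@dataclass
class Anomaly:
    proposal_id: str
    reasons: list[str]


def flag(
    pending: list[Proposal],
    approved_live: set[str],
    min_evidence: int = 1,  # hypothetical default
    contradiction_count: int = 3,  # hypothetical default
    far_floor: float = 0.3,  # hypothetical default
) -> list[Anomaly]:
    """score pending proposals and return the flagged ones, worst first (sketch)."""
    out: list[Anomaly] = []
    for p in pending:
        reasons: list[str] = []
        if len(p.evidence) <= min_evidence:
            reasons.append("thin_evidence")
        # only contradictions against approved, live claims count; retired ids are ignored
        live_hits = sum(1 for cid in p.contradicts if cid in approved_live)
        if live_hits >= contradiction_count:
            reasons.append("contradicts_many")
        # the embedding-derived code is skipped entirely when no similarity was computed
        if p.nearest_sim is not None and p.nearest_sim < far_floor:
            reasons.append("far_from_corpus")
        if reasons:
            out.append(Anomaly(p.id, reasons))
    # worst-first: more reasons sort earlier; sorted() is stable, so ties keep queue order
    return sorted(out, key=lambda a: -len(a.reasons))
```

sorting by reason count alone is the simplest reading of "worst-first" (a two-reason proposal ahead of one-reason ones); the real scorer may weight codes differently.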

thresholds resolve from review.anomaly.{min_evidence, contradiction_count, far_from_corpus_floor} in .vouch/config.yaml, following the similarity_threshold(store) resolution pattern, and fall back to built-in defaults when unset.
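
a sketch of per-key resolution, assuming the yaml has already been parsed into a plain dict. anomaly_thresholds is a hypothetical name, not the real resolver, and the DEFAULTS values are placeholders since this description doesn't state the actual defaults.

```python
from __future__ import annotations

from typing import Any

# placeholder defaults: the real values are not stated in this pr description
DEFAULTS: dict[str, float] = {
    "min_evidence": 1,
    "contradiction_count": 3,
    "far_from_corpus_floor": 0.3,
}


def anomaly_thresholds(config: dict[str, Any]) -> dict[str, float]:
    """resolve review.anomaly.* from a parsed config mapping, falling back per key."""
    # tolerate a missing or null `review:` / `anomaly:` section
    review = config.get("review") or {}
    section = review.get("anomaly") or {}
    resolved = dict(DEFAULTS)
    for key in DEFAULTS:
        value = section.get(key)
        if value is not None:
            resolved[key] = value
    return resolved
```

falling back per key (rather than per section) means a config that sets only min_evidence still gets defaults for the other two thresholds.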

review gate & scope

read-side advisory only. it computes nothing durable — no proposal, no artifact, no status transition — so it goes nowhere near proposals.approve(). it never rejects, approves, blocks, quarantines, or rewrites, and it must not gain such a mode (that would be a write path bypassing the human gate). tests/test_anomaly.py::test_flag_writes_nothing asserts a scoring run appends no audit event. scoring logic lives in a new src/vouch/anomaly.py; storage.py stays pure i/o.
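
the shape of the no-mutation invariant can be sketched with an in-memory stand-in. FakeStore, its append_audit method, and the simplified flag_anomalies here are hypothetical; this mirrors what test_flag_writes_nothing is described as asserting, not its actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """hypothetical in-memory stand-in for the kb store; only state changes matter here."""
    pending: list[dict] = field(default_factory=list)
    audit_log: list[str] = field(default_factory=list)

    def append_audit(self, event: str) -> None:
        self.audit_log.append(event)


def flag_anomalies(store: FakeStore) -> list[str]:
    # hypothetical, simplified scorer: reads pending proposals, never writes
    return [p["id"] for p in store.pending if len(p.get("evidence", [])) <= 1]


def test_flag_writes_nothing() -> None:
    store = FakeStore(pending=[{"id": "p1", "evidence": ["e1"]}])
    before_audit = list(store.audit_log)
    before_pending = [dict(p) for p in store.pending]
    flag_anomalies(store)
    # the invariant: no audit event appended, pending queue unchanged
    assert store.audit_log == before_audit
    assert store.pending == before_pending
```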

scope decision

shipped cli-only (vouch flag-anomalies). the issue leaves the surface as a design choice ("if the --flag-anomalies flag on kb.list_pending alone covers the need, only the flag + tests are required; decide during design"). a dedicated kb.flag_anomalies method and a list-pending --flag-anomalies inline flag are a separate concern and would add the four-site registration, so this pr sticks to one concern. flag_anomalies() returns structured Anomaly objects, so a follow-up can wire either surface with no rework.
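
returning structured objects keeps each surface a thin renderer over the same list. a sketch with a hypothetical two-field Anomaly and hypothetical render_text / render_json helpers; the real cli's text and json formats aren't shown in this description.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Anomaly:
    # hypothetical fields; the real Anomaly type may carry more
    proposal_id: str
    reasons: list[str]


def render_text(anomalies: list[Anomaly]) -> str:
    """one line per flagged proposal, as a plain-text cli path might print."""
    return "\n".join(f"{a.proposal_id}: {', '.join(a.reasons)}" for a in anomalies)


def render_json(anomalies: list[Anomaly]) -> str:
    """machine-readable output; a later kb.* method could return the same dicts."""
    return json.dumps([asdict(a) for a in anomalies])
```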

what breaks for an existing .vouch/

nothing. purely additive read-only command; no schema change, no new on-disk state.

tests

tests/test_anomaly.py (12 tests): thin_evidence flagged / clean proposal not flagged, contradictions counting only approved-live claims (a contradiction against a retired claim doesn't count), worst-first ordering (a two-reason proposal sorts ahead of one-reason ones), configurable thresholds, graceful degradation (no embedder → no far_from_corpus code, non-embedding codes still fire), the far_from_corpus floor logic with a stubbed embedder (far neighbour flags, close neighbour doesn't), the no-mutation invariant, empty-kb, and the cli text/json paths. make check green (pytest, mypy, ruff).
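
the far_from_corpus floor logic exercised with a stubbed embedder can be sketched in isolation. assumptions: embed is any text-to-vector callable, the brute-force scan stands in for the project's search_embedding / embedding_index path, None models the embeddings extra being absent, and treating an empty corpus as "no code" is a choice made for this sketch.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Embedder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_similarity(text: str, corpus: list[str], embed: Optional[Embedder]) -> Optional[float]:
    """max cosine to any approved claim, or None when no embedder (or no corpus) exists."""
    if embed is None or not corpus:
        return None
    q = embed(text)
    # brute-force scan; a real implementation would query an embedding index
    return max(cosine(q, embed(c)) for c in corpus)


def far_from_corpus(text: str, corpus: list[str], embed: Optional[Embedder], floor: float) -> bool:
    sim = nearest_similarity(text, corpus, embed)
    # no similarity means no code: degrade gracefully instead of flagging everything
    return sim is not None and sim < floor
```

in a test, the stub can be a dict lookup, e.g. `lambda s: {"approved": [1.0, 0.1], "far claim": [0.0, 1.0]}[s]`, so the near/far outcome is fixed without loading a model.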

note

third in a small series of read-only, reviewer-facing viewports (#351 vouch digest, #352 kb.timeline). among shared files, this one only touches cli.py and the ## [Unreleased] changelog, in different spots than the siblings, so they're independent: whichever merges first, the others rebase cleanly.

coderabbitai Bot commented Jul 3, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 19d1e9d6-3199-4366-b592-a58cc9e969c0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions Bot added labels Jul 3, 2026: docs (documentation, specs, examples, and repo guidance), cli (command line interface), tests (tests and fixtures), size: M (200-499 changed non-doc lines)
…sals (vouchdev#323)

reviewers scan the pending queue linearly and every item looks the same
weight; the ones that most deserve a hard look blend in. the near-
duplicate direction already surfaces (find_similar_on_propose); this
surfaces the outlier direction.

scores every pending claim proposal worst-first with reason codes:
- thin_evidence: evidence count at or below a floor (barely cited)
- contradicts_many: declares contradictions against >= n approved live
  claims (retired claims don't count)
- far_from_corpus: nearest-neighbour cosine to the approved claim corpus
  below a floor (an outlier). embedding-derived, so it degrades
  gracefully to no code when the embeddings extra is absent — the same
  swallow find_similar_on_propose does — leaving the two non-embedding
  codes still computed.

thresholds resolve from review.anomaly.* (mirroring similarity_threshold).
read-only by construction: a hint for the reviewer that emits no proposal,
writes no artifact, and never rejects or quarantines — the human gate is
untouched. scoring lives in a new anomaly.py, not storage.py.

shipped cli-only (`vouch flag-anomalies`); the kb.flag_anomalies method
and a list-pending --flag-anomalies flag are a separate concern.