feat(anomaly): vouch flag-anomalies — advisory flags on pending proposals (#323)#353
Open
dhgoal wants to merge 1 commit into
…sals (vouchdev#323)

reviewers scan the pending queue linearly and every item looks the same weight; the ones that most deserve a hard look blend in. the near-duplicate direction already surfaces (find_similar_on_propose); this surfaces the outlier direction.

scores every pending claim proposal worst-first with reason codes:

- thin_evidence: evidence count at or below a floor (barely cited)
- contradicts_many: declares contradictions against >= n approved live claims (retired claims don't count)
- far_from_corpus: nearest-neighbour cosine to the approved claim corpus below a floor (an outlier). embedding-derived, so it degrades gracefully to no code when the embeddings extra is absent (the same swallow find_similar_on_propose does), leaving the two non-embedding codes still computed.

thresholds resolve from review.anomaly.* (mirroring similarity_threshold).

read-only by construction: a hint for the reviewer that emits no proposal, writes no artifact, and never rejects or quarantines; the human gate is untouched. scoring lives in a new anomaly.py, not storage.py. shipped cli-only (`vouch flag-anomalies`); the kb.flag_anomalies method and a list-pending --flag-anomalies flag are a separate concern.
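a minimal sketch of the scoring the commit message describes, to make the three reason codes and the worst-first ordering concrete. this is not the code in anomaly.py: the `Proposal` shape, the `cosine` helper, the function signature, and the default threshold values are all assumptions for illustration, and "worst-first" is read here as "more reason codes sorts earlier". an absent embedder is modelled by passing no corpus vectors.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence, Set


@dataclass
class Proposal:
    """Hypothetical stand-in for a pending claim proposal."""
    id: str
    evidence: Sequence[str]
    contradicts: Sequence[str]            # ids of claims it declares contradictions against
    embedding: Optional[Sequence[float]] = None


@dataclass
class Anomaly:
    """Structured result: which proposal, and why it was flagged."""
    proposal_id: str
    reasons: List[str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_anomalies(
    pending: Sequence[Proposal],
    approved_live_ids: Set[str],
    corpus_vectors: Optional[Sequence[Sequence[float]]] = None,
    *,
    min_evidence: int = 1,                # placeholder default
    contradiction_count: int = 2,         # placeholder default
    far_from_corpus_floor: float = 0.3,   # placeholder default
) -> List[Anomaly]:
    flagged: List[Anomaly] = []
    for p in pending:
        reasons: List[str] = []

        # thin_evidence: evidence count at or below the floor.
        if len(p.evidence) <= min_evidence:
            reasons.append("thin_evidence")

        # contradicts_many: only approved live claims count, so ids that are
        # not in approved_live_ids (e.g. retired claims) are ignored.
        live_hits = sum(1 for cid in p.contradicts if cid in approved_live_ids)
        if live_hits >= contradiction_count:
            reasons.append("contradicts_many")

        # far_from_corpus: skipped entirely when no vectors are available,
        # which leaves the two non-embedding codes still computed.
        if corpus_vectors and p.embedding is not None:
            nearest = max(cosine(p.embedding, v) for v in corpus_vectors)
            if nearest < far_from_corpus_floor:
                reasons.append("far_from_corpus")

        if reasons:
            flagged.append(Anomaly(p.id, reasons))

    # Worst-first: more reasons sorts earlier; the sort is stable, so ties
    # keep queue order.
    flagged.sort(key=lambda a: len(a.reasons), reverse=True)
    return flagged
```

nothing here writes anywhere: the function only reads its arguments and returns a new list, which is the property the read-only claim relies on.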
closes #323.
reviewers scan the pending queue linearly, and at a glance every item carries the same apparent weight. the ones that most deserve a hard look (a claim far from anything in the kb, a claim that contradicts a pile of approved ones, a claim scraping by with the barest evidence) blend in. the near-duplicate direction already surfaces (`find_similar_on_propose` attaches non-blocking warnings on propose); this surfaces the outlier direction.

`vouch flag-anomalies` scores every pending claim proposal worst-first with reason codes:

- `thin_evidence`: evidence count at or below a floor. `propose_claim` already requires ≥1 citation; this is the softer "technically cited but suspiciously thin" case.
- `contradicts_many`: declares contradictions against >= n approved live claims (`kb.contradict`'s existing notion; retired claims don't count).
- `far_from_corpus`: nearest-neighbour cosine to the approved claim corpus below a floor. it goes through the same `search_embedding` / `embedding_index` path `find_similar_on_propose` uses, with the same embedder swallow, so it degrades gracefully to no code when the embeddings extra is absent, leaving the two non-embedding codes still computed.

thresholds resolve from `review.anomaly.{min_evidence, contradiction_count, far_from_corpus_floor}` in `.vouch/config.yaml`, following the `similarity_threshold(store)` resolution pattern, and default sanely when unset.

review gate & scope
read-side advisory only. it computes nothing durable (no proposal, no artifact, no status transition), so it goes nowhere near `proposals.approve()`. it never rejects, approves, blocks, quarantines, or rewrites, and it must not gain such a mode (that would be a write path bypassing the human gate). `tests/test_anomaly.py::test_flag_writes_nothing` asserts a scoring run appends no audit event. scoring logic lives in a new `src/vouch/anomaly.py`; `storage.py` stays pure i/o.

scope decision
shipped cli-only (`vouch flag-anomalies`). the issue makes the surface a design choice ("if the `--flag-anomalies` flag on `kb.list_pending` alone covers the need, only the flag + tests are required; decide during design"). a dedicated `kb.flag_anomalies` method and a `list-pending --flag-anomalies` inline flag are a separate concern and would add the four-site registration; this keeps the pr to one concern. `flag_anomalies()` returns structured `Anomaly` objects so a follow-up can wire either surface with no rework.

what breaks for an existing `.vouch/`

nothing. purely additive read-only command; no schema change, no new on-disk state.
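part of why an existing `.vouch/` keeps working is that the `review.anomaly.*` thresholds default when unset, so a config with no anomaly block is still valid. a rough sketch of that kind of resolution, assuming the yaml has already been parsed into a plain dict. the function name `resolve_anomaly_thresholds` and the default values are hypothetical; the pr does not state its defaults, and the real code follows the `similarity_threshold(store)` pattern rather than this helper.

```python
from typing import Any, Dict, Mapping, Optional

# Placeholder defaults for illustration only; not the PR's actual values.
DEFAULTS: Dict[str, Any] = {
    "min_evidence": 1,
    "contradiction_count": 2,
    "far_from_corpus_floor": 0.3,
}


def resolve_anomaly_thresholds(config: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Return thresholds from config["review"]["anomaly"], falling back to DEFAULTS.

    Missing sections, missing keys, and explicit nulls all resolve to the default.
    """
    review = (config or {}).get("review")
    anomaly = review.get("anomaly") if isinstance(review, Mapping) else None
    if not isinstance(anomaly, Mapping):
        anomaly = {}

    resolved = dict(DEFAULTS)
    for key in DEFAULTS:
        value = anomaly.get(key)
        if value is not None:
            resolved[key] = value
    return resolved
```

an old config with no `review` section at all resolves to the defaults, and a config that sets only one key overrides just that key.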
tests
`tests/test_anomaly.py` (12 tests): thin_evidence flagged / clean proposal not flagged, contradictions counting only approved-live claims (a contradiction against a retired claim doesn't count), worst-first ordering (a two-reason proposal sorts ahead of one-reason ones), configurable thresholds, graceful degradation (no embedder → no `far_from_corpus` code, non-embedding codes still fire), the `far_from_corpus` floor logic with a stubbed embedder (far neighbour flags, close neighbour doesn't), the no-mutation invariant, empty-kb, and the cli text/json paths. `make check` green (pytest, mypy, ruff).

note
third in a small series of read-only reviewer-facing viewports (#351 `vouch digest`, #352 `kb.timeline`); this one only touches `cli.py` + the `## [Unreleased]` changelog, in different spots than the siblings, so they're independent: whichever merges first, the others rebase cleanly.