feat(rewrite): small-model drift tolerance + server-side disposition#175
Conversation
09d631c to
2ca00d4
Compare
Greptile SummaryThis PR introduces small-model drift tolerance for the rewrite pipeline by replacing the strict LLM wire schema with a loose
Confidence Score: 4/5Safe to merge after addressing the _coerce_unit container guard; all other changes are well-tested and correctly implement the two-step drift-tolerance pipeline. The two-step disposition pipeline and the schema-widening changes are solid and well-covered by the new test suite. There is one gap in MeaningUnitSchema._coerce_unit: a small model that emits a list or dict for the src/anonymizer/engine/schemas/rewrite.py — MeaningUnitSchema._coerce_unit Important Files Changed
Sequence DiagramsequenceDiagram
participant DD as DataDesigner
participant LLM as disposition_analyzer LLM
participant Recon as _reconstruct_full_disposition_column
participant Fallback as _pessimistic_fallback_disposition
participant Down as Downstream consumers
DD->>LLM: prompt (SimpleDispositionResult wire schema)
LLM-->>DD: "bare list OR {sensitivity_disposition:[...]} (drifted OK)"
DD->>DD: jsonschema.validate() — oneOf(wrapper, array) passes
DD->>DD: pydantic: accept_bare_list + _coerce_scalar_to_str
DD-->>DD: "COL_SIMPLE_DISPOSITION = SimpleDispositionResult"
DD->>Recon: row[COL_SIMPLE_DISPOSITION, COL_ENTITIES_BY_VALUE, COL_LATENT_ENTITIES]
alt simple items present
Recon->>Recon: reconstruct_full_disposition(simple, ctx)
alt all items orphan → ValidationError
Recon->>Fallback: _pessimistic_fallback_disposition(ctx)
Fallback-->>Recon: SensitivityDispositionSchema (replace/generalize)
end
else empty simple output
Recon->>Fallback: _pessimistic_fallback_disposition(ctx)
Fallback-->>Recon: SensitivityDispositionSchema (noop if ctx empty)
end
Recon-->>DD: "row[COL_SENSITIVITY_DISPOSITION] = full.model_dump()"
DD->>Down: COL_SENSITIVITY_DISPOSITION (strict SensitivityDispositionSchema)
Reviews (3): Last reviewed commit: "fix(rewrite): close record-drop gaps fla..." | Re-trigger Greptile |
| if not simple.sensitivity_disposition: | ||
| logger.warning( | ||
| "reconstruct: empty SimpleDispositionResult for row; " | ||
| "falling back to pessimistic disposition from entity context" | ||
| ) | ||
| full = _pessimistic_fallback_disposition(entities_by_value, latent_entities) | ||
| else: | ||
| try: | ||
| full = reconstruct_full_disposition(simple, entities_by_value, latent_entities) | ||
| except ValidationError as exc: | ||
| logger.warning( | ||
| "reconstruct: ValidationError after orphan-skipping (likely all items out of context range); " | ||
| "falling back to pessimistic disposition. detail=%s", | ||
| str(exc)[:200], | ||
| ) | ||
| full = _pessimistic_fallback_disposition(entities_by_value, latent_entities) |
There was a problem hiding this comment.
Uncaught
ValidationError from _pessimistic_fallback_disposition can still drop the row
Both call-sites of _pessimistic_fallback_disposition inside this function are unguarded. When context is empty (all slots have empty labels/values — or there are no context rows at all), _pessimistic_fallback_disposition calls SensitivityDispositionSchema(sensitivity_disposition=[]), which raises a ValidationError from the min_length=1 constraint. Because neither call-site is wrapped in its own try/except, that exception propagates out of _reconstruct_full_disposition_column uncaught, and the row still drops — contradicting the PR's stated guarantee of preventing whole-record drops.
The try/except on line 398–406 only covers reconstruct_full_disposition; both subsequent _pessimistic_fallback_disposition calls (line 396 and line 406) are outside its scope. Wrapping them as well (or catching at the column level) would close the gap.
| if not items: | ||
| # Genuinely no entities at all in context — the orchestrator should | ||
| # have short-circuited before this step. Still better to raise here | ||
| # than to silently emit garbage; SensitivityDispositionSchema's | ||
| # min_length=1 invariant will surface the bug. | ||
| return SensitivityDispositionSchema(sensitivity_disposition=items) | ||
| return SensitivityDispositionSchema(sensitivity_disposition=items) |
There was a problem hiding this comment.
Duplicate return statements — the
if not items: branch and the fallthrough return are identical. The comment says "Still better to raise here" but neither branch explicitly raises; the raise comes implicitly from SensitivityDispositionSchema validation. Since the two paths return the same expression, the if branch is dead code. Either remove the if not items: block (letting validation raise naturally), or replace it with an explicit raise to make the intent clear and avoid the misleading comment.
| if not items: | |
| # Genuinely no entities at all in context — the orchestrator should | |
| # have short-circuited before this step. Still better to raise here | |
| # than to silently emit garbage; SensitivityDispositionSchema's | |
| # min_length=1 invariant will surface the bug. | |
| return SensitivityDispositionSchema(sensitivity_disposition=items) | |
| return SensitivityDispositionSchema(sensitivity_disposition=items) | |
| # Genuinely no entities at all in context — the orchestrator should | |
| # have short-circuited before this step. SensitivityDispositionSchema's | |
| # min_length=1 invariant will surface the bug via ValidationError. | |
| return SensitivityDispositionSchema(sensitivity_disposition=items) |
| @classmethod | ||
| def _coerce_scalar_to_str(cls, v: object) -> str: | ||
| if v is None: | ||
| return "" | ||
| if isinstance(v, (int, float, bool)): | ||
| return str(v) | ||
| return v |
There was a problem hiding this comment.
_coerce_scalar_to_str falls through for any type other than None, int, float, or bool — including list and dict. If a small model emits a list value for entity_label or category, the validator returns the list unchanged, and Pydantic v2 will raise a ValidationError for the entire SimpleDispositionItem rather than gracefully coercing to "". Adding a final string-cast fallback would match the validator's intent of making all scalar (and unexpectedly typed) values survive.
| @classmethod | |
| def _coerce_scalar_to_str(cls, v: object) -> str: | |
| if v is None: | |
| return "" | |
| if isinstance(v, (int, float, bool)): | |
| return str(v) | |
| return v | |
| @classmethod | |
| def _coerce_scalar_to_str(cls, v: object) -> str: | |
| if v is None: | |
| return "" | |
| if isinstance(v, str): | |
| return v | |
| if isinstance(v, (int, float, bool)): | |
| return str(v) | |
| # Unexpected type (list, dict, etc.) from a drifted model response — | |
| # coerce to empty string rather than letting pydantic raise on str validation. | |
| return "" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
34004d2 to
5d6dd38
Compare
…-side disposition Stacked on the detection-schema PR. Loosen the LLM-facing rewrite wire schemas and move disposition to a two-step (loose-wire → server-reconstruct) pipeline so small models no longer drop records: - shared: loose-list-wrapper helpers so DD's jsonschema pre-check accepts a bare top-level list as well as the canonical wrapper. - domain/meaning-unit/QA/disposition wire schemas typed as str with before-validators that coerce enum/scalar drift into range. - SimpleDispositionResult/Item: loose wire contract; reconstruct_full_disposition pairs it with trusted entity context to build the strict EntityDispositionSchema, with a pessimistic fallback that prevents whole-record drops. Schema-constraint cleanups (per review): - Drop min_length on server-reconstructed EntityDisposition.entity_label / entity_value / protection_reason (these are built from trusted context, not raw model output; the bounds were redundant tripwires). - Drop protection_reason max_length and instead cap a passthrough reason in the reconstructor (silent truncate to 500 chars) so a rambling reason can neither drop the record nor flow unbounded into the rewrite prompt/parquet. - Drop the now-redundant min/max_length on PrivacyAnswerItem.reason (the _truncate_reason before-validator is the sole, always-applied guard). - Keep the two list-level min_length=1 tripwires on the strict disposition containers: they never see raw model output and assert a real pipeline invariant (non-empty disposition when entities were detected). Description cleanups: wire-loose str fields (domain, aspect, importance, category, sensitivity, protection_method_suggestion) now enumerate their valid values inline in Field(description=...), derived from the backing enum, instead of referencing an unresolvable Python path — the enum is absent from the JSON schema the model sees, so the description is its only source of truth.
Two small-model robustness gaps surfaced by Greptile on the disposition path: - _pessimistic_fallback_disposition could still drop a row: with empty entity context it built an empty SensitivityDispositionSchema, which raises on the min_length=1 tripwire (and again in the downstream parser), and both call-sites in _reconstruct_full_disposition_column were unguarded. It now emits a single no-op (leave_as_is/low) disposition in that pipeline-invariant case, logging loudly — the row survives and the no-op never reaches the rewrite (excluded from protected_entities). Also removes the dead duplicate return branch. - SimpleDispositionItem._coerce_scalar_to_str now coerces unexpected container types (list/dict) to "" instead of returning them unchanged, so one drifted field no longer fails the whole item (which would discard every disposition for the row and force a pessimistic fallback). Adds regression tests for the empty-context no-op guarantee (function + column level) and container-value coercion.
5d6dd38 to
1953d43
Compare
| @field_validator("unit", mode="before") | ||
| @classmethod | ||
| def _coerce_unit(cls, v: object) -> str: | ||
| if v is None: | ||
| return "" | ||
| if isinstance(v, (int, float, bool)): | ||
| return str(v) | ||
| return v |
There was a problem hiding this comment.
_coerce_unit returns container values (list/dict) unchanged, so Pydantic raises a ValidationError for the whole MeaningUnitSchema item when a small model emits a list or dict for unit. The analogous validator on SimpleDispositionItem._coerce_scalar_to_str was fixed to return "" for containers — the same guard is needed here to keep a single drifted unit from dropping the entire meaning-units record.
| @field_validator("unit", mode="before") | |
| @classmethod | |
| def _coerce_unit(cls, v: object) -> str: | |
| if v is None: | |
| return "" | |
| if isinstance(v, (int, float, bool)): | |
| return str(v) | |
| return v | |
| @field_validator("unit", mode="before") | |
| @classmethod | |
| def _coerce_unit(cls, v: object) -> str: | |
| if v is None: | |
| return "" | |
| if isinstance(v, (int, float, bool)): | |
| return str(v) | |
| if not isinstance(v, str): | |
| # Unexpected container (list/dict) from a drifted response: coerce | |
| # to "" rather than letting pydantic raise on the whole MeaningUnitSchema. | |
| return "" | |
| return v |
Summary
Second of two stacked PRs for small-model schema robustness. Stacked on #174 (detection schemas) — review/merge that first; this PR's base is the detection branch and its diff shows only the rewrite-side changes. Supersedes the rewrite portion of #130.
Changes
jsonschema.validate()pre-check accepts a bare top-level list as well as the canonical wrapper (small models routinely emit the bare-list shape).DomainClassificationSchema,MeaningUnitSchema, QA answer schemas,SimpleDispositionItem/Result): typed asstrwith before-validators that coerce enum/scalar drift into range.SensitivityDispositionWorkflownow emits a looseSimpleDispositionResultthen reconstructs the strictEntityDispositionSchemaserver-side from trusted entity context (disposition_derivation.py), with a pessimistic fallback that prevents whole-record drops.Schema-constraint cleanups (per review)
min_lengthon server-reconstructedEntityDisposition.entity_label/entity_value/protection_reason— built from trusted context, not raw model output; the bounds were redundant tripwires.protection_reasonmax_length; instead cap the passthrough reason in the reconstructor (silent truncate to 500 chars) so a rambling reason can neither drop the record nor flow unbounded into the rewrite prompt/parquet.min/max_lengthonPrivacyAnswerItem.reason(the_truncate_reasonbefore-validator is the sole, always-applied guard).min_length=1tripwires on the strict disposition containers: they never see raw model output and assert a real pipeline invariant (non-empty disposition when entities were detected).Description cleanups
Wire-loose
strfields (domain,aspect,importance,category,sensitivity,protection_method_suggestion) now enumerate their valid values inline inField(description=...), derived from the backing enum (drift-proof), instead of referencing an unresolvable Python path — the enum is absent from the JSON schema the model sees, so the description is its only source of truth.Test plan
make testgreen (899 passing on the stacked branch)make format-checkcleantest_disposition_reconstructor,test_entity_label_category_map, rewrite drift classes;test_schemas/test_sensitivity_dispositionupdated for the two-step pipeline; added a regression test for the reconstructor reason cap