SDK Conformance Audit - Cases.json, evalCondition operator audit and partial feature-eval / experiment-assignment by madhuchavva · Pull Request #122 · growthbook/growthbook-python

madhuchavva · 2026-06-16T01:53:59Z

Summary

NaN handling in compare() - Add an early math.isnan guard that raises ValueError
Strict comparison for$eq / $ne for number-vs-string and bool-vs-int cases, resulting in eval behavior change.
Additions to cases.json to reinforce the behavior.

Section	Count	Theme
`evalCondition`	24	case-insensitive operator granularity (`$ini`/`$nini`/`$alli`) plus case-sensitive failure variants for `$in`/`$nin`/`$regex`/`$all`
`feature`	2	`hashVersion 2` force-rule inclusion/exclusion
`stickyBucket`	4	`minBucketVersion` / `bucketVersion` boundary semantics

…v1.3.0)

compare(NaN, NaN) returned 0 because both > and < evaluate to false on NaN, wrongly satisfying $eq. Raise instead — the existing exception handlers in evalOperatorCondition give the correct truth values ($eq=False, $ne=True, $lt/$lte/$gt/$gte=False), matching JS and the IEEE 754 spec. JSON can't represent NaN so the regression case lives in test_growthbook.py rather than cases.json.

JS sdk-js/test/cases.json carries 30 cases that Python's copy was missing: 24 evalCondition cases (case-sensitive failure + granular case-insensitive variants), 2 hashVersion 2 feature cases, and 4 sticky-bucket minBucketVersion/bucketVersion boundary cases. All 30 pass against current Python — pure defensive coverage, no behavioral change. Source: docs/sdk_parity/tier1.md (B2 catch-up section).

JS is source of truth for the conformance corpus. specVersion is a label, not a contract — Python's copy fell 30 cases behind under 0.7.1 without any signal. The new check fetches the JS cases.json and diffs case names per top-level key: - 'missing' (JS has, Python doesn't): fails CI unless listed in tests/scripts/corpus_skiplist.json - 'extra' (Python has, JS doesn't): reported but never fails; Python carries intentional extensions ($notRegex regression cases, locally-authored regressions) Wired as a separate job in the build workflow so it surfaces clearly in the PR UI and doesn't run 5x across the python-version matrix. Source URL is configurable via --js-source or GB_JS_CASES_URL; default is the JS SDK main-branch raw URL.

… ASCII-only case-fold Two canonical behaviors per the tier-1 cross-SDK audit: D6a (strict equality). JS's `$eq` is `===` (strict); only `<`/`>` coerce. Python previously used compare() for $eq too, which coerced string→float making `{$eq: 5}` against "5" return True. Real production divergence: a customer with int feature rules and string frontend attributes saw different variations between JS and Python clients. Routed $eq/$ne through a new _js_strict_equal helper that requires getType() match. Comparison operators ($lt/$lte/$gt/$gte) still call compare() and keep JS coercion semantics (null→0, string→float). D10 (ASCII-only case fold). Python's str.lower() is Unicode-aware ("İ".lower() → "i̇", "ß".lower() → "ß"), Rust's to_lowercase() is Unicode-aware (differently), Go's strings.EqualFold is Unicode-aware (differently still), JS's String.prototype.toLowerCase is naive ASCII. Three SDKs producing three different non-ASCII behaviors is worse than all four being naive. Locked the canonical to ASCII-only via a new _ascii_lower translate table. ASCII still folds (US matches us); non-ASCII chars pass through unchanged (Turkish İ no longer matches i). tests/cases.json adds 16 cases: - 5 for D6 (strict equality across number/string, bool/int, no-op equality) - 2 for D10 (Turkish İ, German ß) - 7 defensive pins for D1-D5, D7, D8 (Python already passed these but no case-locked them; pinning so JS-aligned behavior can't silently drift on future refactors) - 2 for D14/D15 (parentCondition gate semantics — Python passes today, Rust will need fixes before it can land these upstream)

…-aware Empirical verification via Node showed JS's String.prototype.toLowerCase() is Unicode-aware ("Σ" → "σ", "А" → "а"), byte-identical to Python's str.lower() on the test inputs. The earlier "ASCII-only" canonical decision was based on three Explore-agent reports that mischaracterized JS as "naive ASCII" — a common misconception that survived to the deliverable. Effect of the wrong fix: Python had been switched to a translate-table ASCII-only lower, which made $ini/$nini/$alli return false for Cyrillic/Greek pairs ("А" vs "а", "Σ" vs "σ") that JS treats as matches. So the "fix" actually broke JS-alignment for those characters. Restored str.lower() in core.py. Removed the misleading "ASCII case-fold only" framing in the cases.json case names; the case bodies ("Turkish İ vs i → false", "German ß vs SS → false") are still correct regression cases because both JS toLowerCase and Python str.lower produce "i̇" (with combining mark) for "İ" and don't fold ß. Added two positive cases pinning the Unicode-aware path: "Σ" vs "σ" → true and Cyrillic "А" vs "а" → true.

Empirical verification against JS via Node and against Rust via a cargo-built test harness: new RegExp("^5$").test(5) → true (JS .test() coerces non-string) Rust regex_comparison.rs:20 → true (it.to_string() coerces) Python re.search(pat, 5) → false (TypeError caught) So Python is the diverger; the case I had committed claimed Rust was wrong (matching the agent's misread of regex_comparison.rs:28). The case is removed. The real Python divergence is tracked in tier1.md D8 (revised) but the fix is non-trivial (Python str(None) ≠ JS String(null), so coercion would need a custom table) — deferred.

… reference identity) JS mongrule.ts:218 uses `actual === expected` for $eq/$ne. For arrays and objects, JS === is reference identity — two distinct literals never match (verified empirically: `[1,2] === [1,2]` and `{a:1} === {a:1}` both return false). Python's previous fallthrough was `a == b` which does deep value equality, so distinct-but-equal containers DID match — diverging from JS. The simple (non-operator) equality path at mongrule.ts:100 uses JSON.stringify deep comparison; Python's `==` on parsed-JSON dicts and lists generally agrees with that, so it's left untouched. Within feature evaluation we can't observe Python object identity in any meaningful way — condition and attribute come from independent JSON parses, never the same object. So container $eq is effectively always-false in practice, matching JS. The fix encodes that directly. Three new cases.json entries pin the behavior under tag [D6c].

Name-only diff missed real drift: if JS changed an expected outcome under an existing case name (no rename), the old script exited 0 with no signal. Verified the gap by flipping one expected boolean in a temp JS corpus copy — old script said OK, new script flags it as body-drift. Changes: - _case_signatures(): canonical-JSON (sort_keys, compact separators) hash of each case body (everything after the name). SHA1, truncated to 16 hex chars — stable across dict insertion order. - _diff() returns 5 categories now: actionable_missing, skipped_missing, extras, actionable_drift, skipped_drift. Same-name + same-hash → pass. Different hash → drift, fails CI. - corpus_skiplist.json gains a "drift" bucket for the rare case where Python deliberately keeps a different body from JS. - Report adds a "Body-drift" section; JSON output adds drift_actionable and drift_skipped keys. Exit codes unchanged: 0 = clean, 1 = drift, 2 = infra error.

Previous version stored one body-hash per name in a dict, so duplicates collapsed to the last occurrence. cases.json carries duplicate names (e.g. $not - pass twice, $not - fail twice) and pytest's pytest_generate_tests parametrizes the full list — so all duplicates DO run. Drift on any earlier duplicate was silently missed. Reproduced empirically: flipped an expected value on the FIRST '$not - pass' case in a temp JS corpus copy; old checker exited 0. Fix: - _case_signatures_grouped() returns {name: [hash, hash, ...]} keeping every occurrence (renamed from _case_signatures). - Drift compares Counter(js_hashes) != Counter(py_hashes) per name, catching: (a) single-occurrence body change, (b) one-of-many duplicates' body change, (c) duplicate-count mismatch (JS has 3, Python has 2). - Multiset comparison so duplicate order within a name doesn't matter. Verified across four scenarios — clean, drift-of-first-duplicate, single-occurrence drift, duplicate count mismatch. All produce correct exit codes and report sections.

madhuchavva added 12 commits June 15, 2026 15:28

test: add regression for multi-rule shared-prerequisite cycle (fixed …

a95ce02

…v1.3.0)

ci: trim corpus-freshness job comment

6b5afb1

fix: cast _js_strict_equal result to bool to satisfy mypy

b011fee

madhuchavva merged commit 8cc14e5 into main Jun 18, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SDK Conformance Audit - Cases.json, evalCondition operator audit and partial feature-eval / experiment-assignment#122

SDK Conformance Audit - Cases.json, evalCondition operator audit and partial feature-eval / experiment-assignment#122
madhuchavva merged 12 commits into
mainfrom
fix/multi-rule-shared-prereq-regression-test

madhuchavva commented Jun 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

madhuchavva commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

madhuchavva commented Jun 16, 2026 •

edited

Loading