ci: executable proof gates (E/H/K/M/Q GREEN); G.real_data BLOCKED-honest#1301
Open
neuron7xLab wants to merge 4 commits into
…s; G blocked-honest
Replace the six MANUAL gating probes in scripts/ci/release_gate.py with
artifact-aware executable probes. Each delegates to a generator under
scripts/ci/ that emits a machine artifact carrying a verdict; the fast lane
returns MANUAL (cannot cheat), --deep regenerates fresh evidence at HEAD.
Closed to machine-GREEN (real evidence, not prose):
- E.clean_clone : clean git-archive wheel -> isolated venv install -> import
geosync (from venv, not the rogue editable) -> entrypoint
wiring smoke. probe_clean_clone.py.
- H.falsification: 8-control executable ledger (permutation/phase-randomized/
topology nulls, Landauer cost falsifier, leakage sentinel,
timestamp monotonicity, seed reproducibility, schema
corruption); each with command/input_sha/output_sha/verdict.
8/8 SURVIVED. falsification_ledger.py.
- K.execution : execution realism declared OUT_OF_SCOPE, bound to an
enforced claim-boundary firewall. execution_contract.py.
- M.benchmarks : determinism invariant + hardware fingerprint + frozen
regression budget. benchmark_spine.py.
- Q.replication : reviewer packet + hash-locked reproducible projections;
--verify gates fresh artifacts vs committed lock.
replication_packet.py.
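A minimal sketch of what one ledger row with command/input_sha/output_sha/verdict could look like. This is not the code in falsification_ledger.py: `run_control`, `MONOTONIC_CHECK`, the field layout, and the `FALSIFIED` label are hypothetical names chosen for illustration, and the control shown is a toy timestamp-monotonicity check.

```python
import hashlib
import subprocess
import sys


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def run_control(name, command, input_bytes):
    """Run one control and record command, input hash, output hash, verdict.

    Assumption: a control signals survival with exit code 0.
    """
    proc = subprocess.run(
        command, input=input_bytes, capture_output=True, check=False
    )
    return {
        "control": name,
        "command": command,
        "input_sha": sha256_hex(input_bytes),
        "output_sha": sha256_hex(proc.stdout),
        # "FALSIFIED" is a hypothetical label for a failed control.
        "verdict": "SURVIVED" if proc.returncode == 0 else "FALSIFIED",
    }


# Toy control: exits 0 only if the timestamps on stdin are non-decreasing.
MONOTONIC_CHECK = [
    sys.executable,
    "-c",
    "import sys; ts = [int(x) for x in sys.stdin.read().split()]; "
    "sys.exit(0 if ts == sorted(ts) else 1)",
]

if __name__ == "__main__":
    print(run_control("timestamp_monotonicity", MONOTONIC_CHECK, b"1\n2\n3\n"))
```

Hashing both the input and the output lets a reviewer confirm that a recorded verdict belongs to the exact bytes that were checked.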
Honest blocker (fail-closed, NOT fabricated):
- G.real_data : RED/BLOCKED. Repository ships only synthetic single-session
fixtures (data/sample_ohlc.csv forbidden_use; all ingestion
adapters stub-only). No real venue/license/provenance
manifest, so no MEASURED_SINGLE tier can be attested.
real_data_probe.py + artifacts/evidence/real_data_manifest.json.
VII claim firewall hardening (check_claim_boundary.py): assertive strong-claim
constructions (proven-edge / guaranteed-return / market-predictor / ...) with
disclaimer/citation escape (no single-word flooding), plus a canonical
claim-status tier enum. Allowlist gains 3 reviewed policy/mechanism entries.
release_gate --deep --json scorecard.json => 10 GREEN / 7 RED (G + 6 pre-existing
B/C/D failures outside this work order) => exit 1. This is the correct
fail-closed verdict; see BLOCKED.md. 40 targeted tests pass; ruff/black/mypy
--strict clean; INVENTORY.json synced; no new noqa/type:ignore debt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… blockers as verified campaigns
Full-effort pass over all 7 RED gates. Of these, exactly one is a safe,
verifiable session-scale closure; the other six are genuine multi-day
campaigns or fabrication-blocked, and are documented with evidence rather
than faked (the work order forbids weakening gates or fabricating data).
CLOSED — D.manifest (now GREEN):
- MANIFEST.sha256 was a stale 2827-entry snapshot (tree has 6528 files; 1458
entries broken) with NO committed generator, so it rotted silently.
- Added scripts/ci/generate_manifest.py: a principled, reviewable generator
covering every tracked file EXCEPT itself and the volatile artifacts/ tree
(machine outputs the gate regenerates each --deep run; their integrity is
carried by each artifact's own artifact_sha256). Regenerated to 6365 current
entries; cold-verify clean and stable across --deep runs.
- Fixed a real bug in probe_d_manifest_coldverify: char-based str.lstrip("./")
ate the leading dot of dotfiles (./.claude/x -> claude/x), producing 619
false "missing". Now prefix-stripped — STRENGTHENS coverage (dotfiles are
actually verified), does not weaken the gate.
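The dotfile bug is easy to reproduce in isolation. The sketch below is not the probe's code; `strip_char_set` and `strip_prefix` are hypothetical names that contrast the character-set behaviour of `str.lstrip` with a true prefix strip.

```python
def strip_char_set(path: str) -> str:
    # Buggy: lstrip("./") removes ANY leading run of '.' and '/' characters,
    # so the leading dot of a dotfile is consumed as well.
    return path.lstrip("./")


def strip_prefix(path: str) -> str:
    # Fixed: drop exactly one leading "./" and nothing else.
    return path[2:] if path.startswith("./") else path


if __name__ == "__main__":
    print(strip_char_set("./.claude/x"))  # claude/x  (dotfile name lost)
    print(strip_prefix("./.claude/x"))    # .claude/x (dotfile name kept)
```

On Python 3.9+ `str.removeprefix("./")` does the same job as `strip_prefix`.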
NOT FAKED — documented as verified campaigns in BLOCKED.md:
- C.dep_truth: 48 actionable drifts (35 D3 requirements.lock vs
requirements-scan.lock divergences + D2/D4 Dockerfile + D6 deptry + D7
security pins). Naive 5-floor bump desyncs scan.lock and exposes the D3 set
(verified, reverted). Needs dual-lock pip-compile reconciliation over the
torch/jax tree — unverifiable here, high blast radius.
- B.path_hacks: ~35 wheel-shipped scripts use a standalone sys.path bootstrap;
removal breaks standalone invocation without a per-file -m/entry-point
refactor + verification.
- B.single_pkg/B.src_imports/B.wheel: geosync/ and src/geosync/ are two
distinct packages both referenced by entry points; src.audit/data/risk/
security have no top-level equivalent. "single geosync package" is a
multi-week migration across 1599 test files.
- G.real_data: real Askar/OTS data exists on disk but has NO license/provenance
(P0 escalation, no license.txt). Attesting a tier would fabricate provenance
— refused. Stays BLOCKED.
Gate: release_gate --deep => 11 GREEN / 6 RED / 0 MANUAL of 17, exit 1 — the
correct fail-closed verdict. check_claim_boundary exit 0; count_invariants 108;
manifest cold-verify clean (6365); 25 proof-gate tests pass; ruff/black/mypy
--strict clean; INVENTORY synced (70); MANIFEST regenerated; no new debt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or bug, lock reconciliation, audit workflow
Drives tools/deps/validate_dependency_truth.py --exit-on-drift from 42
actionable drifts to 0, with no faking and no weakening:
1. D7 (3) were VALIDATOR BUGS. _read_plain_uppers scanned the whole line
incl. inline comments, so `cryptography>=49.0.0 # <48.0.1 vulnerable`
produced a phantom `<48` upper bound and a fake ResolutionImpossible drift.
Split the inline comment before scanning; real bounds like
`pydantic>=2.13.0,<3.0.0` are preserved. Correctness fix, not a loosening.
2. D2 (2) + D3 (34): requirements-scan.lock was compiled out of step with
production and pinned divergent/below-floor versions. Regenerated it with
pip-compile --constraint=constraints/security.txt
--constraint=requirements.lock so the scan env pins exactly the production
versions (requirements-scan.txt excludes torch/GPU, so the resolution is
light and deterministic).
3. D4 (3): the coherence_bridge/cortex_service/sandbox Dockerfiles installed
a loose requirements.txt that no CI workflow security-scanned. Added
.github/workflows/service-manifest-audit.yml running pip-audit against each:
the validator's own prescribed fix and genuine new scanning.
Verification: validate_dependency_truth --exit-on-drift exit 0; its 23-test
suite + dependency-consistency suites pass; ruff/black/mypy --strict clean;
MANIFEST regenerated (6366); release_gate --deep => 12 GREEN / 5 RED, exit 1.
Remaining RED are the package-architecture migration (B.src_imports,
B.path_hacks, B.single_pkg, B.wheel) and G.real_data (no licensed data).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
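The D7 fix can be illustrated with a small stand-alone sketch. It is not the body of `_read_plain_uppers`: the regexes and the names `read_uppers_buggy` / `read_uppers_fixed` are assumptions made for illustration, and the comment pattern only handles a `#` at line start or after whitespace.

```python
import re

# Hypothetical pattern: captures the version after a '<' or '<=' operator.
_UPPER = re.compile(r"<=?\s*([0-9][0-9A-Za-z.\-*]*)")
# Inline comment: '#' at line start or preceded by whitespace, to end of line.
_COMMENT = re.compile(r"(^|\s+)#.*$")


def read_uppers_buggy(line: str) -> list:
    # Scans the whole line, so bounds mentioned in a comment are picked up.
    return _UPPER.findall(line)


def read_uppers_fixed(line: str) -> list:
    # Strip the inline comment first, then scan only the requirement spec.
    spec = _COMMENT.sub("", line)
    return _UPPER.findall(spec)


if __name__ == "__main__":
    commented = "cryptography>=49.0.0 # <48.0.1 vulnerable"
    print(read_uppers_buggy(commented))  # ['48.0.1'] (phantom upper bound)
    print(read_uppers_fixed(commented))  # []
    print(read_uppers_fixed("pydantic>=2.13.0,<3.0.0"))  # ['3.0.0']
```

Requiring whitespace before `#` keeps URL fragments such as `#egg=name` from being mistaken for comments.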
…cluster migration
scripts/ci/check_import_architecture.py is a debt ratchet that PASSES today,
accepting '19 src.* imports + 70 path-hacks (target 0)' as frozen baseline
the repo pays down gradually. Confirms B.src_imports/B.path_hacks/
B.single_pkg/B.wheel are a real incremental package-architecture migration,
not a one-session fix, consistent with the release gate demanding actual zero.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
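The difference between a debt ratchet and a zero-demanding gate can be sketched as follows. This is not the logic of check_import_architecture.py or release_gate.py: the dictionary keys and the function names are hypothetical, and only the baseline figures (19 and 70) come from the commit message.

```python
# Frozen baseline quoted in the commit message; key names are hypothetical.
BASELINE = {"src_imports": 19, "path_hacks": 70}


def ratchet_violations(current: dict, baseline: dict) -> list:
    """Ratchet: fail only when a counter rises above its frozen baseline."""
    return [
        f"{key}: {current.get(key, 0)} > baseline {limit}"
        for key, limit in baseline.items()
        if current.get(key, 0) > limit
    ]


def zero_gate_violations(current: dict) -> list:
    """Strict gate: any remaining debt is a failure."""
    return [f"{key}: {count} != 0" for key, count in current.items() if count]


if __name__ == "__main__":
    today = {"src_imports": 19, "path_hacks": 70}
    print(ratchet_violations(today, BASELINE))  # [] (ratchet passes)
    print(zero_gate_violations(today))          # both counters reported
```

With the same counts the ratchet passes while the strict gate stays RED, which matches the situation the commit describes.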
Converts the 6 MANUAL gating probes in scripts/ci/release_gate.py into artifact-aware executable tribunals (command → input hash → output hash → verdict). Fast lane = MANUAL (uncheatable); --deep regenerates fresh evidence at HEAD.
RED by design — do not auto-merge. release_gate --deep exits 1; the correct fail-closed verdict. See BLOCKED.md.
GREEN (machine-proven): E.clean_clone (isolated wheel install + entrypoint wiring), H.falsification (8/8 controls SURVIVED), K.execution (OUT_OF_SCOPE + firewall), M.benchmarks (determinism + regression budget), Q.replication (hash-locked reviewer packet).
BLOCKED (not fabricated): G.real_data — repo ships only synthetic single-session fixtures; no real venue/license/provenance manifest, so no MEASURED_SINGLE tier can be attested.
VII firewall hardening in check_claim_boundary.py: assertive strong-claim constructions + disclaimer escape + claim-status tier enum.
Final: 10 GREEN / 7 RED (G + 6 pre-existing B/C/D failures already RED on main, out of scope; see BLOCKED.md). 40 tests pass; ruff/black/mypy --strict clean; INVENTORY synced; no new noqa/type:ignore debt.
🤖 Generated with Claude Code