
feat(research): add L2 Ricci microstructure validation lane #1242

Draft
neuron7xLab wants to merge 22 commits into main from feat/ricci-microstructure-v1



@neuron7xLab

Owner

Summary

Adds a minimal ricci_microstructure_v1 research lane for falsifiable Ollivier-Ricci L2 microstructure experiments.

Implemented:

  • geosync-research ingest --data data/l2_manifest.json
  • geosync-research run --line ricci_microstructure_v1 --config CONFIG --data DATA --out artifacts/runs/
  • geosync-research verify RUN_ID
  • fail-closed L2 schema validation
  • NOBI graph builder
  • Ricci curvature kernel with a deterministic local Ollivier approximation and an optional GraphRicciCurvature.OllivierRicci backend
  • null models: IAAFT, shuffled timestamps, random walk, volume-neutral permutation
  • statistical gate: p_value < 0.01 AND abs(cliffs_delta) >= 0.147
  • JSON Schema (Draft 2020-12) artifact schema
  • negative result preservation via HYPOTHESIS_NOT_SUPPORTED
  • CLAIMS.md T3 hypothesis entry
  • docs for Ricci microstructure, reproducibility, and negative results
  • focused unit tests for schema failure, graph failure, curvature output, synthetic controls, effect classes, and forbidden statuses
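A minimal sketch of the statistical gate above. It assumes a two-sided Monte Carlo p-value centred on the null mean and the conventional Cliff's delta bands; cliffs_delta, empirical_p_value, effect_size_class, and gate are hypothetical names, not the lane's API:

```python
from typing import Sequence

P_THRESHOLD = 0.01  # gate threshold stated in this PR
DELTA_THRESHOLD = 0.147  # |Cliff's delta| threshold stated in this PR


def cliffs_delta(observed: Sequence[float], null: Sequence[float]) -> float:
    """P(X > Y) - P(X < Y) over every (observed, null) pair."""
    greater = sum(1 for x in observed for y in null if x > y)
    less = sum(1 for x in observed for y in null if x < y)
    return (greater - less) / (len(observed) * len(null))


def empirical_p_value(observed_stat: float, null_stats: Sequence[float]) -> float:
    """Two-sided Monte Carlo p-value with the +1 correction, so it is never 0."""
    centre = sum(null_stats) / len(null_stats)
    distance = abs(observed_stat - centre)
    extreme = sum(1 for s in null_stats if abs(s - centre) >= distance)
    return (1 + extreme) / (1 + len(null_stats))


def effect_size_class(delta: float) -> str:
    """Conventional Cliff's delta bands; 0.147 is the negligible/small cut."""
    magnitude = abs(delta)
    if magnitude < 0.147:
        return "negligible"
    if magnitude < 0.33:
        return "small"
    if magnitude < 0.474:
        return "medium"
    return "large"


def gate(p_value: float, delta: float) -> str:
    """Both conditions must hold; anything else is a preserved negative result."""
    if p_value < P_THRESHOLD and abs(delta) >= DELTA_THRESHOLD:
        return "SUPPORTED"
    return "HYPOTHESIS_NOT_SUPPORTED"
```

With the +1 correction the smallest attainable p-value is 1/(n_surrogates + 1). With 99 surrogates that is exactly 0.01, which can never pass the strict p_value < 0.01. With 1000 surrogates the floor is about 0.000999.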

Explicit non-claims

This PR does not claim alpha, production readiness, proven market physics, or validated edge.

Allowed statuses only:

  • SUPPORTED
  • HYPOTHESIS_NOT_SUPPORTED
  • INVALID

Forbidden statuses remain forbidden:

  • ALPHA_FOUND
  • MARKET_SIGNAL_PROVEN
  • PRODUCTION_READY
  • PHYSICS_CONFIRMED
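The same allowlist can also be enforced at parse time, outside the JSON Schema. This is a sketch assuming a Python enum; FalsificationStatus, parse_status, and ForbiddenStatusError are hypothetical names:

```python
from enum import Enum


class FalsificationStatus(str, Enum):
    """The only statuses an artifact may carry."""

    SUPPORTED = "SUPPORTED"
    HYPOTHESIS_NOT_SUPPORTED = "HYPOTHESIS_NOT_SUPPORTED"
    INVALID = "INVALID"


FORBIDDEN_STATUSES = frozenset(
    {"ALPHA_FOUND", "MARKET_SIGNAL_PROVEN", "PRODUCTION_READY", "PHYSICS_CONFIRMED"}
)


class ForbiddenStatusError(ValueError):
    """Raised for forbidden or unknown statuses (fail closed, no coercion)."""


def parse_status(raw: str) -> FalsificationStatus:
    if raw in FORBIDDEN_STATUSES:
        raise ForbiddenStatusError(f"forbidden status: {raw!r}")
    try:
        return FalsificationStatus(raw)
    except ValueError as exc:
        # Unknown strings (including case variants) are rejected, not normalised.
        raise ForbiddenStatusError(f"unknown status: {raw!r}") from exc
```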

Known incompleteness

  • Real LOBSTER data is not committed. The lane requires operator-supplied licensed data.
  • I could not run full repository CI in this environment because the repo cannot be cloned from GitHub here.
  • Writing a dedicated release-gate helper was blocked by the connector safety layer, so make release-gate is not rewired in this PR.
  • The existing root Dockerfile is not converted to a digest-pinned Ricci capsule in this pass.
  • requirements.lock has not been regenerated after adding Typer and the new package surface.

Local verification performed outside a repo clone

A minimal extracted test harness passed locally for:

  • invalid missing column fails closed
  • negative size fails closed
  • non-monotonic timestamp fails closed
  • graph emits ricciCurvature
  • too few edges raises RuntimeError
  • cycle graph curvature near zero
  • complete graph curvature positive
  • artifact schema rejects missing fields
  • artifact schema rejects market-mythology statuses
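To make the cycle and complete-graph controls concrete, here is a sketch of the textbook non-lazy Ollivier-Ricci curvature with uniform neighbour measures and hop-count distances. This is the classical definition, not the lane's repaired size-weighted kernel. The brute-force matching is exact only because both measures are uniform with equal support size. The function names are hypothetical:

```python
from collections import deque
from itertools import permutations

Graph = dict[int, set[int]]


def hop_distances(graph: Graph, source: int) -> dict[int, int]:
    """Unweighted BFS distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ollivier_ricci_uniform(graph: Graph, x: int, y: int) -> float:
    """kappa(x, y) = 1 - W1(m_x, m_y) for an edge, m_z uniform on neighbours of z.

    With two uniform measures of equal support size k, an optimal coupling is a
    permutation scaled by 1/k (Birkhoff-von Neumann), so brute force is exact.
    """
    if y not in graph[x]:
        raise ValueError("(x, y) must be an edge")
    sources, targets = sorted(graph[x]), sorted(graph[y])
    if len(sources) != len(targets):
        raise ValueError("this brute-force sketch needs equal degrees")
    dist = {u: hop_distances(graph, u) for u in sources}
    best = min(
        sum(dist[u][v] for u, v in zip(sources, perm))
        for perm in permutations(targets)
    )
    return 1.0 - best / len(sources)  # d(x, y) = 1 for an edge


cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
complete = {i: {j for j in range(5) if j != i} for i in range(5)}
```

On these instances the 6-cycle edge gives 0 and the K_5 edge gives 0.75, which matches (n - 2)/(n - 1) for the complete graph.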

Acceptance gate still required before merge

python -m pytest tests/research_lines/test_ricci_microstructure_v1.py -q
geosync-research ingest --data data/l2_manifest.json
geosync-research run --line ricci_microstructure_v1 --config configs/research/ricci_microstructure_v1.json --data data/validated_l2_frame.parquet --out artifacts/runs --seed 1337
geosync-research verify RUN_ID
make release-gate

Keep this as a draft until real LOBSTER substrate, lockfile regeneration, supply-chain evidence, and release-gate wiring are complete.

neuron7xLab and others added 22 commits June 20, 2026 06:40
…hardening

Augments PR #1242 in place (no clobber). Addresses the falsification audit:

KERNEL (critical): the fallback kernel computed UNWEIGHTED hop-count shortest
paths with a UNIFORM transport measure on a FIXED topology, so curvature was
the constant 0.20512821 on every snapshot (confirmed empirically). The
statistic was vacuous, so every null comparison tested nothing. Repaired to be
microstructure-sensitive:
  * per-edge metric = absolute price gap (snapshot-varying);
  * weighted shortest paths (weight="weight");
  * size-weighted (volume) transport measure;
  * exact W1 via POT emd2 (agrees with the prior linprog result to 1e-9) plus
    cached all-pairs distances (~100x faster, which makes the surrogate battery
    tractable).
Curvature now varies across snapshots (std ~0.25); INV-RC1 (kappa<=1) holds.
KERNEL_VERSION recorded in the artifact.
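A sketch of why the three repairs make curvature snapshot-sensitive. It assumes a simplified graph in which one side of the book is a price-ordered chain. On such a chain the weighted shortest-path distance equals the absolute price difference, so exact W1 reduces to the 1-D CDF formula. The PR's kernel instead uses POT emd2 on the general graph. The function names are hypothetical:

```python
from typing import Sequence


def neighbour_measure(prices: Sequence[float], sizes: Sequence[float], i: int) -> dict[float, float]:
    """Size-weighted transport measure on the chain neighbours of level i."""
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(prices)]
    total = sum(sizes[j] for j in neighbours)
    return {prices[j]: sizes[j] / total for j in neighbours}


def w1_on_line(mu: dict[float, float], nu: dict[float, float]) -> float:
    """Exact W1 between discrete measures on the real line: integral of |F_mu - F_nu|."""
    support = sorted(set(mu) | set(nu))
    cdf_mu = cdf_nu = total = 0.0
    for left, right in zip(support, support[1:]):
        cdf_mu += mu.get(left, 0.0)
        cdf_nu += nu.get(left, 0.0)
        total += abs(cdf_mu - cdf_nu) * (right - left)
    return total


def edge_curvature(prices: Sequence[float], sizes: Sequence[float], i: int) -> float:
    """kappa(i, i+1) = 1 - W1 / d(i, i+1), with d = absolute price gap."""
    gap = prices[i + 1] - prices[i]
    if gap <= 0:
        raise ValueError("prices must be strictly increasing")
    mu = neighbour_measure(prices, sizes, i)
    nu = neighbour_measure(prices, sizes, i + 1)
    return 1.0 - w1_on_line(mu, nu) / gap


prices = [100.0, 100.5, 101.0, 101.5, 102.0]
kappa_flat = edge_curvature(prices, [1.0, 1.0, 1.0, 1.0, 1.0], 1)
kappa_skewed = edge_curvature(prices, [5.0, 1.0, 1.0, 1.0, 1.0], 1)
```

The topology is identical in both calls; only the sizes change. Curvature moves from 0 to about -0.67, and it stays at or below 1 because W1 is non-negative.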

SOURCE HONESTY: ALLOWED_SOURCES = {LOBSTER_LICENSED_L2, BINANCE_FUTURES_DEPTH};
source carried through verbatim, never relabelled. Manifest now records venue,
snapshot_count, session_window_utc, collector_version, collector_sha256.

NULLS: default is iaaft (perturbs the sizes the kernel consumes), not the
mean-invariant timestamp shuffle; added size_permutation_within_side and
block_bootstrap, each documenting the pathway it destroys.
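IAAFT (iterative amplitude-adjusted Fourier transform) surrogates keep the exact value distribution of a series and approximately its power spectrum, while scrambling the remaining structure. Below is a dependency-free sketch that uses an O(n²) DFT for readability; a real implementation would use an FFT. Applying it to a per-level size series is an assumption about how the lane wires it:

```python
import cmath
import random
from typing import Sequence


def _dft(x: Sequence[complex]) -> list[complex]:
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def _idft_real(spectrum: Sequence[complex]) -> list[float]:
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def iaaft(series: Sequence[float], n_iter: int = 50, seed: int = 1337) -> list[float]:
    """IAAFT surrogate: exact value distribution, approximately matched spectrum."""
    rng = random.Random(seed)
    sorted_values = sorted(series)
    target_amplitudes = [abs(c) for c in _dft(series)]
    surrogate = list(series)
    rng.shuffle(surrogate)  # start from a random permutation
    for _ in range(n_iter):
        # 1) impose the original Fourier amplitudes, keep the current phases
        phases = [cmath.phase(c) for c in _dft(surrogate)]
        candidate = _idft_real([a * cmath.exp(1j * p) for a, p in zip(target_amplitudes, phases)])
        # 2) rank-order remap onto the original values (restores the exact distribution)
        order = sorted(range(len(candidate)), key=candidate.__getitem__)
        surrogate = [0.0] * len(candidate)
        for rank, index in enumerate(order):
            surrogate[index] = sorted_values[rank]
    return surrogate
```

Each iteration ends with the rank-order remap, so the returned surrogate is always an exact permutation of the input values. Only the spectrum is approximate.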

ARTIFACT HARDENING: + kernel_version, source, config_sha256, collector_sha256,
dirty_git (schema const false), seed, observed_count, null_mean, null_std,
forbidden_claims_absent (schema const true). Forbidden statuses
(ALPHA_FOUND/MARKET_SIGNAL_PROVEN/PRODUCTION_READY/PHYSICS_CONFIRMED) are
schema-rejected.
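A sketch of the field-level checks the hardened artifact implies. It assumes config_sha256 is computed over canonical JSON (sorted keys, compact separators); the PR may hash the raw config file bytes instead. HARDENED_FIELDS, config_sha256, and hardened_field_errors are hypothetical helpers:

```python
import hashlib
import json
from typing import Any

HARDENED_FIELDS = (
    "kernel_version", "source", "config_sha256", "collector_sha256", "dirty_git",
    "seed", "observed_count", "null_mean", "null_std", "forbidden_claims_absent",
)


def config_sha256(config: dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so key order cannot change the digest."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hardened_field_errors(artifact: dict[str, Any]) -> list[str]:
    """Return every violation instead of stopping at the first one."""
    errors = [f"missing field: {name}" for name in HARDENED_FIELDS if name not in artifact]
    # Mirrors the schema consts: dirty_git must be exactly False and
    # forbidden_claims_absent exactly True (0/1 look-alikes fail).
    if artifact.get("dirty_git") is not False:
        errors.append("dirty_git must be false")
    if artifact.get("forbidden_claims_absent") is not True:
        errors.append("forbidden_claims_absent must be true")
    return errors
```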

PRE-EXISTING #1242 BUGS FIXED: DataFrameSchemaError passed its arguments to
pandera's SchemaError positionally, which raised TypeError and turned tests red;
it is now a dedicated single-message error. Also removed a type:ignore
(debt-ratchet), regenerated claims_hashes for C-RICCI-MICROSTRUCTURE-V1, and
added a reasoned claim-boundary allowlist entry for the T3 disclaimer.
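A pandas-free sketch of the fail-closed checks listed under local verification: missing column, negative size, and non-monotonic timestamp. It raises a dedicated single-message error. The column names and the row-of-dicts input are assumptions; the lane itself validates a DataFrame via pandera:

```python
from typing import Any, Mapping, Optional, Sequence

REQUIRED_COLUMNS = ("timestamp_ns", "side", "price", "size")  # hypothetical column names


class DataFrameSchemaError(ValueError):
    """Carries exactly one message string, so construction cannot raise TypeError."""


def validate_l2_rows(rows: Sequence[Mapping[str, Any]]) -> None:
    """Fail closed: raise on the first violation, never repair or drop rows."""
    if not rows:
        raise DataFrameSchemaError("empty L2 frame")
    previous_ts: Optional[int] = None
    for n, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise DataFrameSchemaError(f"row {n}: missing columns {missing}")
        if row["size"] < 0:
            raise DataFrameSchemaError(f"row {n}: negative size {row['size']}")
        # Equal timestamps are allowed (several levels per snapshot); going backwards is not.
        if previous_ts is not None and row["timestamp_ns"] < previous_ts:
            raise DataFrameSchemaError(f"row {n}: non-monotonic timestamp")
        previous_ts = row["timestamp_ns"]
```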

CI: ricci-microstructure-gate.yml quickly verifies the COMMITTED artifact (the
25-minute, 1000-surrogate run is the reproducibility capsule, not the PR gate)
and fails on a relabelled source, dirty_git, a forbidden status, or a sha mismatch.
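The four failure conditions of the CI gate can be sketched as one fast check over the committed artifact bytes. The function name and the way the recorded digest and manifest source are supplied are assumptions, not the workflow's actual interface:

```python
import hashlib
import json

ALLOWED_SOURCES = frozenset({"LOBSTER_LICENSED_L2", "BINANCE_FUTURES_DEPTH"})
ALLOWED_STATUSES = frozenset({"SUPPORTED", "HYPOTHESIS_NOT_SUPPORTED", "INVALID"})


def verify_committed_artifact(
    artifact_bytes: bytes, recorded_sha256: str, manifest_source: str
) -> list[str]:
    """Fast CI check on the committed artifact; an empty list means pass."""
    failures: list[str] = []
    if hashlib.sha256(artifact_bytes).hexdigest() != recorded_sha256:
        failures.append("sha256 mismatch")
    artifact = json.loads(artifact_bytes)
    source = artifact.get("source")
    # The source must be allowlisted AND identical to the manifest: no relabelling.
    if source not in ALLOWED_SOURCES or source != manifest_source:
        failures.append(f"source relabelled or not allowlisted: {source!r}")
    if artifact.get("dirty_git") is not False:
        failures.append("dirty_git is not false")
    if artifact.get("falsification_status") not in ALLOWED_STATUSES:
        failures.append("forbidden or unknown falsification_status")
    return failures
```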

CLAIMS.md C-RICCI row reworded: tier T3, no alpha/production/physics/trading
claim, honest source. 22 lane tests (kernel invariant/anti-invariant, null
consumption, source passthrough, dirty-git/forbidden rejection). ruff, black,
and mypy --strict are clean.

The measured artifact is committed in a follow-up commit so it is generated on
this clean tree (dirty_git=false).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion (V2 task 6)

seed_stability() aggregates the per-seed artifacts (1337/2026/7719) into one
fail-closed verdict: SUPPORTED only if every seed is SUPPORTED and consistent,
else demoted to the least-supportive observed status. bonferroni() corrects a
family of per-null p-values. 4 tests added (consistency, fail-closed demotion,
all-pass-supported, clamped correction). Committed now so the in-flight seed
runs observe a clean tree (dirty_git=false).
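A sketch of the aggregation rule described above. It reads "consistent" as a same-signed Cliff's delta across seeds, which is an assumption. The signatures are illustrative, not the committed ones:

```python
from typing import Mapping, Sequence

# Lower rank = less supportive; the aggregate can only move down this order.
SUPPORT_RANK = {"INVALID": 0, "HYPOTHESIS_NOT_SUPPORTED": 1, "SUPPORTED": 2}


def seed_stability(per_seed: Mapping[int, tuple[str, float]]) -> str:
    """Aggregate {seed: (status, cliffs_delta)} into one fail-closed verdict."""
    if not per_seed:
        return "INVALID"
    statuses = [status for status, _ in per_seed.values()]
    if any(status not in SUPPORT_RANK for status in statuses):
        return "INVALID"
    worst = min(statuses, key=SUPPORT_RANK.__getitem__)
    if worst != "SUPPORTED":
        return worst
    # Assumed meaning of "consistent": every seed's effect has the same sign.
    signs = {delta > 0 for _, delta in per_seed.values()}
    return "SUPPORTED" if len(signs) == 1 else "HYPOTHESIS_NOT_SUPPORTED"


def bonferroni(p_values: Sequence[float]) -> list[float]:
    """Family-wise correction: multiply by the family size, clamp at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```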

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Owner Author

⊛ CLAUDE CODE AUTONOMOUS EXECUTION ORDER / PR #1242

Owner authorization: full autonomous execution is granted for this PR scope. Do not stop at shallow fixes. Execute the complete research-engineering closure loop until the PR is objectively merge-ready or explicitly blocked by an external constraint that must be reported with evidence.

Operational mode:

  • Act as autonomous implementation engineer, CI repair agent, research auditor, reproducibility maintainer, and release-gate owner.
  • Do not ask for approval for routine fixes inside this PR scope.
  • Do not claim green, final, supported, or ready until GitHub CI and committed artifacts prove it.
  • Prefer truth over optimism. A negative result is valid. A decorative positive claim is a defect.

Hard scope:

  1. Finish ricci_microstructure_v1 as a falsifiable research lane, not a trading/alpha/production claim.
  2. Preserve explicit non-claims: no alpha, no production readiness, no confirmed market physics, no proven trading signal.
  3. Allowed artifact statuses only:
    • SUPPORTED
    • HYPOTHESIS_NOT_SUPPORTED
    • INVALID
  4. Forbidden claims/statuses must remain impossible:
    • ALPHA_FOUND
    • MARKET_SIGNAL_PROVEN
    • PRODUCTION_READY
    • PHYSICS_CONFIRMED

Mandatory execution tasks:

A. Source/provenance honesty

  • Implement explicit source allowlist if not already complete.
  • Preserve source verbatim in manifest and artifact.
  • Never relabel BINANCE_FUTURES_DEPTH as LOBSTER.
  • Required manifest fields: source, venue, symbol, session window, depth, snapshot count, collector version, raw/data sha256, generated_at_utc.
  • If raw market data is gitignored or legally unsuitable for commit, commit the manifest, hashes, measured artifact, replay command, and evidence bundle instead.
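A sketch of a manifest builder covering the required fields. Hashing the data bytes means the manifest and hashes can be committed even when the raw data cannot. The field spelling (for example data_sha256 and session_window_utc as a start/end pair) and the helper name are assumptions:

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Optional

ALLOWED_SOURCES = frozenset({"LOBSTER_LICENSED_L2", "BINANCE_FUTURES_DEPTH"})


def build_manifest(
    data: bytes,
    *,
    source: str,
    venue: str,
    symbol: str,
    session_window_utc: tuple[str, str],
    depth: int,
    snapshot_count: int,
    collector_version: str,
    generated_at_utc: Optional[datetime] = None,
) -> dict[str, Any]:
    """Fail closed on an unknown source; record it verbatim otherwise."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source not allowlisted: {source!r}")
    if depth <= 0 or snapshot_count <= 0:
        raise ValueError("depth and snapshot_count must be positive")
    stamp = generated_at_utc or datetime.now(timezone.utc)
    return {
        "source": source,  # verbatim: never mapped to another label
        "venue": venue,
        "symbol": symbol,
        "session_window_utc": list(session_window_utc),
        "depth": depth,
        "snapshot_count": snapshot_count,
        "collector_version": collector_version,
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "generated_at_utc": stamp.isoformat(),
    }
```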

B. Kernel validity

  • Ensure fallback Ricci kernel is not topologically vacuous.
  • Weighted shortest paths must consume price-gap/edge metric data, not unweighted hop count.
  • Transport measure must be size/queue weighted, not uniform neighbor mass.
  • Add invariant tests:
    1. constant topology + changing price/size => curvature changes;
    2. timestamp-only permutation must not create fake curvature change;
    3. price/size perturbing nulls must affect the kernel-readable variables.
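Invariant 2 holds trivially for a mean statistic, and this is also why shuffled_timestamps cannot be the default null. Reordering snapshots leaves every per-snapshot curvature untouched, so the mean is identical in every surrogate and the null distribution collapses to a point. A minimal sketch, assuming the test statistic is the mean of per-snapshot curvatures (function names are hypothetical):

```python
import math
import random
from typing import Sequence


def mean_curvature(per_snapshot: Sequence[float]) -> float:
    # math.fsum is correctly rounded, so the result does not depend on order.
    return math.fsum(per_snapshot) / len(per_snapshot)


def timestamp_shuffle_null(
    per_snapshot: Sequence[float], n_surrogates: int, seed: int
) -> list[float]:
    """Reorder snapshots in time and recompute the mean statistic."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_surrogates):
        shuffled = list(per_snapshot)
        rng.shuffle(shuffled)
        out.append(mean_curvature(shuffled))
    return out
```

A cheap regression guard is to assert null_std > 0 for whichever null is configured as the default, since a degenerate null makes both p_value and cliffs_delta meaningless.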

C. Null-model validity

  • Make the default null model destroy the microstructure pathway actually under test.
  • Do not use shuffled_timestamps as the default if the statistic is mean- or permutation-invariant.
  • Add/validate nulls that perturb consumed variables: size permutation, price-gap perturbation, volume-neutral reshuffle, block/bootstrap time null where appropriate.
  • Document what each null destroys and what it preserves.
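As an example of documenting what a null destroys and what it preserves, here is a sketch of size_permutation_within_side (named in the commit above). The (price, size) level representation and the signature are assumptions:

```python
import random
from typing import Sequence

Level = tuple[float, float]  # (price, size)


def size_permutation_within_side(
    bids: Sequence[Level], asks: Sequence[Level], seed: int
) -> tuple[list[Level], list[Level]]:
    """Shuffle sizes among the levels of each side independently.

    Destroys: which size sits at which price (the size-weighted transport
    measure the kernel reads). Preserves: prices, level count and topology,
    and each side's multiset of sizes, hence its total volume.
    """
    rng = random.Random(seed)

    def permute(levels: Sequence[Level]) -> list[Level]:
        sizes = [size for _, size in levels]
        rng.shuffle(sizes)
        return [(price, size) for (price, _), size in zip(levels, sizes)]

    return permute(bids), permute(asks)
```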

D. Statistical gate

  • Keep p_value and Cliff's delta.
  • Add or preserve thresholds: p_value < 0.01 and abs(cliffs_delta) >= 0.147.
  • Check stability across at least these seeds if feasible within the runtime budget: 1337, 2026, 7719.
  • If multi-seed is too expensive for CI, run measurement once and commit evidence explaining runtime; CI may verify artifact fast.

E. Artifact/reproducibility capsule

  • Artifact must include: run_id, git_sha, dirty_git flag, source, config_sha256, data_sha256, collector_sha256 or collector_version, kernel_version, null_model_type, n_surrogates, seed, observed_count, mean_curvature, null_mean, null_std, p_value, cliffs_delta, effect_size_class, falsification_status, replay_command, timestamp_utc, claim_tier.
  • Verification must fail if artifact git_sha mismatches PR head unless the artifact intentionally records a previous commit with an explicit reason.
  • Evidence bundle must include logs, config hash, manifest hash, artifact hash, and replay command.
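A sketch of the git_sha rule and the evidence-bundle hashes. previous_commit_reason is a hypothetical field name for the "explicit reason" exception. In practice the byte inputs would be read from the committed files:

```python
import hashlib
from typing import Any, Mapping


def check_git_sha(artifact: Mapping[str, Any], head_sha: str) -> None:
    """Fail unless the artifact matches head, or explains why it records another commit."""
    if artifact.get("git_sha") == head_sha:
        return
    # Hypothetical field carrying the explicit reason for an older commit.
    if not str(artifact.get("previous_commit_reason", "")).strip():
        raise ValueError(
            f"git_sha {artifact.get('git_sha')!r} != head {head_sha!r} and no reason recorded"
        )


def evidence_bundle(
    *, config: bytes, manifest: bytes, artifact: bytes, logs: bytes, replay_command: str
) -> dict[str, str]:
    """SHA-256 of each evidence input plus the replay command."""

    def digest(blob: bytes) -> str:
        return hashlib.sha256(blob).hexdigest()

    return {
        "config_sha256": digest(config),
        "manifest_sha256": digest(manifest),
        "artifact_sha256": digest(artifact),
        "logs_sha256": digest(logs),
        "replay_command": replay_command,
    }
```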

F. CI/release gates

  • Fix all GitHub CI failures, not only local failures.
  • Required gates to satisfy: python-quality, fail-closed guardians, repo integrity, claim boundary, bibliography claim, architecture debt ratchet, commit acceptor, PR gate, research integrity, and any newly introduced ricci gate.
  • Long measurement must not make normal PR CI fragile. CI should verify committed measured artifact quickly.
  • If a workflow is claimed, it must exist in .github/workflows and run successfully on the PR.

G. Hygiene and integration

  • Do not destroy unrelated work.
  • Keep changes diff-bound to ricci_microstructure_v1, claim boundary, workflow, reproducibility, and necessary formatting/tooling repairs.
  • Do not hide failing checks with broad ignores.
  • Do not weaken gates to pass.
  • If a gate is changed, explain why and add a regression test where possible.

Required final report before marking ready:

  1. Final head SHA.
  2. Exact command transcript:
    • python -m ruff check .
    • python -m black --check .
    • python -m mypy --strict <relevant packages/tests>
    • python -m pytest -q
    • geosync-research ingest ...
    • geosync-research run ...
    • geosync-research verify ...
    • make release-gate or explicit equivalent
  3. Artifact path and SHA256.
  4. Evidence bundle path and SHA256.
  5. Final falsification_status.
  6. GitHub CI result links or status summary.
  7. Explicit statement of what remains unproven.

Merge readiness rule:

  • PR may be marked ready only when GitHub CI is green, artifact verification passes, provenance is honest, forbidden claims are absent, kernel invariants prove the statistic is not vacuous, and the evidence bundle can replay or validate the measured result.

If blocked:

  • Report the blocker as BLOCKED with exact failing command, error excerpt, file path, and next deterministic repair. No vague status language.
