Poseidon2 opt #153
KyrinCode wants to merge 22 commits into
Gas-optimized Poseidon2 implementations in Solidity and Circom (BN254,
x^5 S-box), with a benchmark matrix vs Poseidon1 and other Poseidon2
libraries. Source: local vibe/poseidon2-opt as a single snapshot.
- src/solidity: Poseidon2T2 / T2FF / T3 / T4 / T4Sponge / T8 libraries
- src/circom: matching circuits for t in {2, 3, 4, 8}
- bench/solidity: Foundry gas benchmark (FullBenchmark.t.sol)
- bench/circom: snarkjs constraint + Groth16 proving benchmark
- test: correctness vectors + Solidity <-> Circom cross-check
Third-party libraries (forge-std, zemse/poseidon2-evm, V-k-h/
poseidon2-solidity) are gitignored; populate lib/ locally per the URLs
commented in test/poseidon2-opt/.gitignore.
Add scripts/setup-libs.sh, an idempotent helper that clones the three
third-party dependencies into lib/ if they are missing. Pins each to a
known-good ref so benchmark results stay reproducible:
- forge-std v1.15.0
- zemse/poseidon2-evm v1.0.0
- V-k-h/poseidon2-solidity f48a837 (main @ import time)
Entry-point shell scripts (test/cross_check.sh and
bench/circom/scripts/bench_full.sh) now call setup-libs.sh at startup, so
a fresh checkout of xlayer-toolkit can run them directly without a manual
clone step.
Note: forge-based commands (forge build / forge test) still require a
one-time `bash scripts/setup-libs.sh` because Foundry has no pre-build
hook.
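The idempotent clone-and-pin step could look roughly like the bash sketch below. The function name `clone_pinned`, its arguments, and its messages are hypothetical and not taken from the actual setup-libs.sh; the sketch only assumes `git` is on PATH.

```shell
# clone_pinned URL REF DEST
# Hypothetical helper: clone URL into DEST and check out REF, unless DEST
# already holds a git checkout, which makes repeated runs a no-op.
clone_pinned() {
  local url="$1" ref="$2" dest="$3"
  if [ -e "$dest/.git" ]; then
    echo "$dest: already present, skipping"
    return 0
  fi
  git clone --quiet "$url" "$dest" &&
    git -C "$dest" checkout --quiet "$ref" &&
    echo "$dest: pinned to $ref"
}
```

A call would take the form `clone_pinned <repo-url> v1.15.0 lib/forge-std`, where the destination path is an assumption and the URL comes from the comments in test/poseidon2-opt/.gitignore.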
- setup-libs.sh now also downloads bench/circom/pot12.ptau (~4.6 MB) on
first run, keeping Groth16 benchmarks self-bootstrapping.
- Makefile wraps the forge workflow with the setup prerequisite, so
`make test` and `make bench` work on a fresh clone without any manual
steps. Circom scripts already call setup-libs internally.
Smoke-tested from a fresh make invocation: forge build compiles cleanly,
and the 10-test Correctness suite passes.
Add a Quick Start section with the make targets (test/bench/cross-check/
bench-circom). Replace the raw forge-command usage section with a table
mapping each make target to what it runs under the hood.
Clarify that lib/ and pot12.ptau are gitignored and populated on first
run by scripts/setup-libs.sh, so a fresh clone needs no manual install
step beyond foundry + circom toolchain.
Surface the pinned dependency versions in the Dependencies section.
…errors
bench_full.sh had a broken r1cs cache guard: `${DIR##*/}` produced the
directory basename like "p2t2_h1", but snarkjs / circom write the
artifact at `${NAME}.r1cs`, where NAME is the circuit file's basename,
e.g. "bench_t2_hash1.r1cs". The two strings never matched, so every run
unconditionally recompiled the circuit and, worse, regenerated the
trusted-setup zkey each time. Compute NAME once up-front and guard both
the compile and the setup/vkey-export steps behind it, making re-runs
essentially free for unchanged circuits.
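The mismatch and the fix can be illustrated with a small bash sketch. The paths and the helper name `r1cs_cached` are hypothetical; only the two example basenames come from the commit message.

```shell
DIR="build_full/p2t2_h1"                  # hypothetical build directory
CIRCUIT="circuits/bench_t2_hash1.circom"  # hypothetical circuit path

OLD_KEY="${DIR##*/}"                      # directory basename: p2t2_h1
NAME="$(basename "$CIRCUIT" .circom)"     # circuit basename: bench_t2_hash1

# r1cs_cached DIR NAME: true when DIR/NAME.r1cs already exists.
r1cs_cached() {
  [ -f "$1/$2.r1cs" ]
}

# Guard the expensive step behind the key that matches the artifact name.
if r1cs_cached "$DIR" "$NAME"; then
  echo "cached: $DIR/$NAME.r1cs"
else
  echo "would compile: $CIRCUIT"
fi
```

Checking for `$OLD_KEY.r1cs` instead of `$NAME.r1cs` can never succeed here, which is why the old guard always fell through to a recompile.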
Also:
- Let stderr pass through for circom, snarkjs, and node calls (was
"2>/dev/null"). They were running under "set -e", so a failure would
abort the whole script with no diagnostic — painful in a 10-minute
benchmark. Keep stdout redirected so the bench table stays clean.
- Anchor REPO_ROOT on BASH_SOURCE[0] instead of "$0" so the script can
be sourced or called via non-trivial paths.
- Add "setup" as an explicit prerequisite of the "cross-check" and
"bench-circom" Makefile targets, matching build/test/bench and making
the dependency visible in the Makefile rather than implicit inside
the shell scripts.
ZemseYulWrapper was being used as a Solidity-ABI convenience layer for the
raw-calldata Yul contract, but it introduced an extra STATICCALL hop that
inflated the zemse gas measurements by ~3,100 gas (one cold-address access
plus one call frame). That is wrapper overhead the benchmark was
attributing to zemse's implementation rather than to the harness.
Deploy Poseidon2Yul directly in FullBenchmark.setUp and staticcall it from
new _gasZemseYul{1,2,3} helpers — the exact call path a real user contract
would take for the raw 32-byte calldata ABI. This gives zemse one-staticcall
parity with the P1-circomlibjs measurements (which are also direct).
Result: P2-zemse hash1/2/3 drop from 22,182 / 22,184 / 22,280 to
19,019 / 19,037 / 19,056 gas. Rankings are unchanged; zemse is still
undeployable (32 KB, exceeds EIP-170) and T2/T4S/T8 remain row-best.
Correctness suite (10 tests) still passes. ZemseYulWrapper is kept for
Correctness.t.sol where ~3 K gas is irrelevant and the wrapper's tidy
Solidity ABI reads better.
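As a quick check of the "~3,100 gas" figure, the per-hash difference between the old and new measurements can be computed directly. The helper name `gas_delta` is illustrative only; the inputs are the numbers quoted above.

```shell
# gas_delta BEFORE AFTER: gas removed by dropping the wrapper hop.
gas_delta() {
  echo "$(( $1 - $2 ))"
}

echo "hash1: $(gas_delta 22182 19019)"
echo "hash2: $(gas_delta 22184 19037)"
echo "hash3: $(gas_delta 22280 19056)"
```

Each delta is a little over 3,100 gas, which fits the explanation of one cold-address access (2,600 gas under EIP-2929) plus the overhead of an extra call frame.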
The hermez.s3-eu-west-1.amazonaws.com URL that setup-libs.sh was using
started returning 403 Forbidden in early 2026, breaking bench-circom /
cross-check on any fresh checkout. Caught while testing fresh-clone by
renaming lib/ and pot12.ptau to simulate a new-developer setup.
Switch to the storage.googleapis.com/zkevm/ptau/ mirror (verified
byte-identical to the historical hermez file by sha256) and pin the
checksum so a future silent mirror-drift is detected immediately rather
than surfacing as a corrupt-circuit compile failure minutes later inside
snarkjs.
Verified end-to-end: with both lib/ and pot12.ptau removed, `make test`
auto-clones the three dependencies, downloads the ptau file, verifies the
checksum, compiles, and passes the 10-test Correctness suite in ~2 minutes
wall-clock on a cold machine.
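A checksum pin of this kind can be sketched in bash as follows. The function names `sha256_of` and `verify_sha256` are hypothetical, the real pinned digest is not reproduced here, and the sketch assumes either `sha256sum` or `shasum` is installed.

```shell
# Print the sha256 hex digest of a file, using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# verify_sha256 FILE EXPECTED: succeed only when the digests match.
verify_sha256() {
  local actual
  actual="$(sha256_of "$1")"
  if [ "$actual" != "$2" ]; then
    echo "sha256 mismatch for $1" >&2
    echo "  expected: $2" >&2
    echo "  actual:   $actual" >&2
    return 1
  fi
}
```

Running the check immediately after the download means a changed mirror fails at setup time instead of surfacing later inside snarkjs.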
The bench/PLAN.md and bench/BENCHMARK_PLAN.md files were early-phase
artifacts. PLAN.md self-labels as "the initial benchmark before the
comprehensive suite was built" and its numbers (zemse Yul 23K gas, V-k-h
45K) are now 20-50% off after the dirty-value optimizations and
methodology fixes. BENCHMARK_PLAN.md's test matrix was 100% duplicated by
the Lark reference doc and one cell (the zemse impl description) became
stale after the Finding A staticcall refactor.
Also drop the two orphan Solidity benchmark files that the refactor left
behind:
- bench/solidity/wrappers/InlineWrapper.sol: no longer imported by any
test (FullBenchmark now measures each impl through its own wrapper
pattern).
- bench/solidity/vendored/LibPoseidon2Yul.sol: only referenced by the
now-deleted InlineWrapper. Zemse-yul is measured via Poseidon2Yul
(standalone contract) as of the Finding A fix.
In return, add an "Adding a New Implementation" section to README.md
documenting the workflow for wiring a new Poseidon-family competitor into
the Solidity and Circom benchmark matrices.
Verified: 10-test Correctness suite still passes after deletions.
Drop entries that were either strictly redundant or covering scenarios
this project never hits:
Root .gitignore:
- /broadcast/*/31337/ and /broadcast/*/1/ were already covered by the
preceding /broadcast line
- node_modules/ is irrelevant (no package.json; snarkjs is installed
globally via npm -g)
- /broadcast itself is also removed: script/ is empty and we run no
`forge script`, so broadcast logs are never written
bench/circom/.gitignore:
- Collapse 22 hand-maintained per-circuit build_<name>/ entries into a
single build_*/ glob; the current scripts only write to build_full/,
build_crosscheck/ and build_debug/, all caught by the glob.
Result: root ignore shrinks from 20 to 11 lines, circom ignore shrinks
from 26 to 2 lines. Confirmed `git status` still correctly blocks all
three lib/ subdirs and all three active build_* directories.
bench/circom/circuits/bench_nm_t4_hash3.circom was never referenced by
bench_full.sh or cross_check.sh; it was an early-phase alternative
wrapper for the NethermindEth t=4 target. Its constraint count is already
reported via the sibling bench_nm_t4_perm.circom (which IS in
bench_full.sh line 115) and matches the 648 figure in the Lark doc.
Audited the full tree (90+ tracked files) by cross-referencing every
Solidity import and Circom include: this was the only orphan left.
The four un-prefixed poseidon2_{const,perm,hash,compress}.circom files
under bench/circom/vendored/ are NethermindEth's implementation
(stellar-private-payments), but they looked ambiguously like our own
src/circom/poseidon2_*.circom because every other vendored source in
the directory carries an origin prefix (bk_, circomlib_, worldcoin_).
Rename to nm_poseidon2_{const,perm,hash,compress}.circom and update
the three bench circuits that include them (bench_compress.circom,
bench_hash2.circom, bench_nm_t4_perm.circom). Intra-vendored includes
also updated.
Verified via direct circom compile: constraint counts unchanged —
nm_t4_perm still 648, hash2 still 515, compress still 420. All
nine Solidity <-> Circom correctness cross-checks pass.
Two complementary layers of property-based testing:
1. test/Fuzz.t.sol — Foundry-native fuzz, 256 runs/test by default. Six
tests, one per library (T2 / T2FF / T3 / T4 / T4Sponge / T8). Each
asserts two internal invariants on random uint256 input:
a. Output in field range: hash(...) < PRIME
b. Input modular invariance: hash(a) == hash(a % PRIME)
Invariant (b) is especially valuable for T8 — the only library
without explicit entry mod(input, P), relying on the first matmulM4
addmod for implicit reduction. The fuzz proves that path is sound
on 256 random uint256 inputs.
2. test/cross_check.sh — extended with optional CROSS_CHECK_FUZZ=N env
var. When set, after the existing 9 fixed-input comparisons it
generates N random uint256 values per library and runs the full
Solidity <-> Circom byte-equal check on each. Validates two
simultaneous properties: cross-language implementation parity and
alignment of input-reduction semantics across both runtimes.
Makefile gains a `cross-fuzz` target (defaults to N=4, ~4 minutes
wall-clock; override via `CROSS_CHECK_FUZZ=10 make cross-fuzz`).
`make test` now reports 16 tests passing (10 fixed correctness +
6 fuzz, 256 runs each = 1,536 random assertions, ~0.2 s). Smoke-
tested CROSS_CHECK_FUZZ=1 end-to-end: 15/15 pass including the
6 fuzz comparisons across all libraries.
…iterals
Pure-uniform random sampling over uint256 has near-zero probability of
hitting boundary values like PRIME, PRIME-1, type(uint256).max, which are
exactly the inputs most likely to expose dirty-value tracking and
entry-mod bugs. Foundry's native fuzzer auto-includes such values via
contract-constant extraction; bash had no such mechanism.
Add a deterministic boundary sweep that runs whenever fuzz mode is
enabled (CROSS_CHECK_FUZZ env var set, including =0). Six boundary inputs
× six libraries = 36 additional compares per run, exercising:
- PRIME-1, PRIME, PRIME+1: boundary around the field modulus
- 3*PRIME: multi-period reduction
- uint256_max, uint256_max-1: uint256 overflow boundary
Also: switch every Solidity input from hex literal `0x$H` to decimal
literal `$D`. Solidity natively accepts arbitrary-size decimal literals,
which makes the bash-side hex constants entirely redundant and removes a
real bug just caught: a hand-computed PRIME_X3_HEX was wrong, causing 6
silent boundary mismatches against the Circom side. Decimal-only on both
sides means the same string flows through to both runtimes.
Smoke-tested CROSS_CHECK_FUZZ=0: 57 passed, 0 failed (9 fixed + 36
boundary + 12 randomized due to bash seq quirk).
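One way to avoid hand-computed boundary constants is to derive them from the decimal modulus in the script itself. The sketch below is an assumption about how that could be done, not the method cross_check.sh uses: `dec_add` is a hypothetical schoolbook adder on decimal strings (bash integers are too narrow for 256-bit values), and PRIME is assumed to be the BN254 scalar field modulus.

```shell
# dec_add A B: add two non-negative decimal strings of any length.
dec_add() {
  local a="$1" b="$2" out="" carry=0 da db sum
  while [ -n "$a" ] || [ -n "$b" ] || [ "$carry" -ne 0 ]; do
    da=0; db=0
    # Peel the last digit off each operand, if any digits remain.
    if [ -n "$a" ]; then da="${a#"${a%?}"}"; a="${a%?}"; fi
    if [ -n "$b" ]; then db="${b#"${b%?}"}"; b="${b%?}"; fi
    sum=$((da + db + carry))
    out="$((sum % 10))$out"
    carry=$((sum / 10))
  done
  printf '%s\n' "${out:-0}"
}

# BN254 scalar field modulus in decimal (assumed to be the commit's PRIME).
PRIME=21888242871839275222246405745257275088548364400416034343698204186575808495617
PRIME_PLUS_1="$(dec_add "$PRIME" 1)"
PRIME_X3="$(dec_add "$(dec_add "$PRIME" "$PRIME")" "$PRIME")"
```

Because the results are plain decimal strings, the same value can be passed unchanged to both the Solidity and the Circom side.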
…iterations
macOS BSD `seq 1 0` outputs "1 0" (reversed descending sequence) rather
than the empty string GNU seq produces, so the random loop ran 2 extra
iterations whenever CROSS_CHECK_FUZZ was set to 0. Switch to bash
arithmetic `for ((i=1; i<=N; i++))`, which evaluates the condition
correctly at 0 and negative N on every platform.
This means `CROSS_CHECK_FUZZ=0` now runs exactly the boundary sweep (36
compares) without any random iterations, which is useful for CI where you
want deterministic boundary coverage without nondeterministic random runs.
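The behaviour of the arithmetic loop at the edge cases can be checked with a minimal bash sketch. The function name `fuzz_iterations` is hypothetical; it only counts how often the loop body would run for a given N.

```shell
# fuzz_iterations N: count how many times the random-fuzz loop body runs.
fuzz_iterations() {
  local n="$1" count=0 i
  for ((i = 1; i <= n; i++)); do
    count=$((count + 1))
  done
  echo "$count"
}

fuzz_iterations 0    # loop body never runs
fuzz_iterations 4    # loop body runs four times
```

Unlike a `seq`-driven loop, the result does not depend on which `seq` implementation the host ships.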
README updated in four places to reflect the testing layers added in
recent commits (a2c8063, ea5aba9, 7e53f3e):
- Quick Start: replace single-line `make test` description with the new
16-test count (10 correctness + 6 fuzz × 256 runs), and add `make
cross-fuzz` row.
- Project Structure: list `test/Fuzz.t.sol` alongside
`Correctness.t.sol`.
- Make targets table: add a wall-clock column so users can see the cost
difference between fixed cross-check (~2 min) and cross-fuzz (~12 min at
N=4); add the `cross-fuzz` row with its env-var override syntax.
- Adding a New Implementation: document where to add fuzz coverage when a
new in-house library variant lands under src/solidity/.
External Lark and vault source docs (optimized-implementations.md) are
updated separately to extend their methodology sections from a 2-step
list (Correctness + cross_check) to a 4-step list including FuzzTest and
the boundary-and-random fuzz mode in cross_check.sh.
…mponent
cross_check.sh was using the human-readable LABEL ("T2 hash1(0)",
"T2FF compress(1,2)", etc.) directly as a build-directory name. The
embedded spaces and parentheses survived shell quoting on macOS with
older Node, but newer Node versions (22.22 reported by a colleague,
likely also future Node releases on Linux) fail to resolve relative
requires inside generate_witness.js when its working directory contains
those characters — the failure surfaces minutes later as
`Cannot find module '<path>/witness.json'` because the witness step
silently aborted.
Add a `slugify` helper that maps any non `[A-Za-z0-9._-]` character to
`_`, and use the slug as the on-disk directory name. The display label
is unchanged — users still see "T2 hash1(0)" in pass/fail output —
but the path becomes "T2_hash1_0__" which is safe across every tool
in the witness/snarkjs chain.
Verified: 9/9 fixed tests still pass after `rm -rf build_crosscheck`
and a fresh run.
The script was wrapping circom, generate_witness.js, snarkjs wtns
export, and forge build with `>/dev/null 2>&1` (or `2>/dev/null`),
which silently swallowed any actionable error. The colleague's recent
"Cannot find module witness.json" failure was a downstream symptom of
generate_witness.js erroring earlier on a path with spaces — that
underlying error message was hidden.
Drop the `2>&1` half on the four pipeline calls so stderr passes
through to the user. stdout stays redirected (the bench/compare table
needs to remain clean). Two `2>/dev/null` instances are intentional
and kept:
- solidity_output() merges stderr into stdout so it can grep the
XCHECK: marker line — a forge test failure should still produce
a recognisable diagnostic.
- The `[ "$CROSS_CHECK_FUZZ" -ge 0 ]` input validation discards the
"integer expression expected" noise when the env var is unset or
contains a non-numeric value.
Verified: 9/9 fixed cross-check tests still pass cleanly.
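The redirect pattern described here, silent on stdout and loud on stderr, can be sketched as a small bash wrapper. The name `run_quiet` is hypothetical; the actual scripts apply the redirect inline on each call.

```shell
# run_quiet CMD [ARGS...]: discard stdout, let stderr reach the user.
run_quiet() {
  "$@" >/dev/null
}

# stdout ("out") is dropped, stderr ("err") still appears on the terminal.
run_quiet sh -c 'echo out; echo err >&2'
```

Keeping stdout redirected preserves the clean comparison table, while any tool diagnostic now shows up at the point of failure.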
…peline
A colleague's `make cross-check` failed deep inside the Node module
loader with `Cannot find module .../witness.json`. The real cause was
two missing host dependencies: their `snarkjs` was not on PATH (the
script abort point) and the hardcoded `CIRCOM=~/.cargo/bin/circom`
silently pointed at a nonexistent binary because they had circom
installed elsewhere. Both produced cryptic downstream errors instead
of a clear "install this" message.
Add a preflight block at the top of `cross_check.sh` and
`bench_full.sh` that:
1. Probes `~/.cargo/bin/circom` first (the cargo-install default), then
falls back to whatever circom is on PATH — no more hardcoded path.
2. Verifies `snarkjs`, `node`, and (for cross_check) `forge` are on PATH.
3. On miss, prints a one-line ERROR + the exact install command, then
exits 1 before any tool runs.
README's prerequisites section is upgraded from a paragraph to a table
with concrete install commands for each tool.
Verified: 9/9 cross-check still passes on a fully-equipped host;
simulating a missing snarkjs by removing node from PATH produces the
new error path:
ERROR: cross_check.sh prerequisite missing: snarkjs
Install via: npm install -g snarkjs
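A preflight block along these lines could be sketched as below. The helper names mirror those mentioned elsewhere in this PR (`preflight_fail`, `require_command`, `detect_circom`), but the bodies are assumptions written for illustration, not the committed code.

```shell
# Print a one-line error plus the install hint, then signal failure.
preflight_fail() {
  echo "ERROR: ${0##*/} prerequisite missing: $1" >&2
  echo "  Install via: $2" >&2
  return 1
}

# require_command NAME INSTALL_HINT: fail early if NAME is not on PATH.
require_command() {
  command -v "$1" >/dev/null 2>&1 || preflight_fail "$1" "$2"
}

# Prefer the cargo-install location, then fall back to whatever is on PATH.
detect_circom() {
  if [ -x "$HOME/.cargo/bin/circom" ]; then
    echo "$HOME/.cargo/bin/circom"
  else
    command -v circom
  fi
}
```

A script running under `set -e` would call, for example, `require_command snarkjs "npm install -g snarkjs"` before invoking any tool, so a missing dependency stops the run with a named cause.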
Three-agent /simplify review of the recent preflight + slugify commits
flagged consistent issues:
1. The preflight block (`preflight_fail` + circom autodetect + 3-4
`command -v` checks) was 25 lines copy-pasted between
`cross_check.sh` and `bench_full.sh`, with an asymmetric `forge`
check only in cross_check. A future edit to one would silently
diverge from the other.
2. `cross_check.sh` used `$0` for path resolution; `bench_full.sh`
used `${BASH_SOURCE[0]}`. The former breaks under `source` and on
some PATH invocations.
3. `preflight_fail` called `exit 1` directly, which works under the
current `cmd || preflight_fail` pattern only because of an implicit
contract that the function never returns. Subtle to refactor.
4. `slugify` used `echo "$1" | tr -c ...` which (a) injected a
trailing newline that became a spurious final `_` and (b) did not
collapse runs, leaving directory names like `T2_hash1_0__` with a
redundant trailing double-underscore.
5. The README Prerequisites table labelled the row as "Foundry" while
the preflight error message says "forge" — small but real
indirection cost when a user needs to look up the install command.
Fixes:
- New `scripts/lib.sh` with `preflight_fail` (returns non-zero,
letting `set -e` propagate explicitly), `require_command`,
`detect_circom`, and `slugify` (`printf '%s' | tr -c | tr -s '_'`,
collision-safer and trailing-clean).
- Both cross_check.sh and bench_full.sh now `. "$REPO_ROOT/scripts/lib.sh"`
and call the helpers; ~30 lines of duplication go away. Both use
`${BASH_SOURCE[0]}` for REPO_ROOT.
- README table renames the Foundry row to `forge (from Foundry)` to
match the user-visible error string.
Verified: 9/9 cross-check still passes with cleaner slug names
(`T2_hash1_0_` instead of `T2_hash1_0__`); preflight still aborts
loudly when snarkjs is missing from PATH, and the error message still
correctly names "cross_check.sh" as the failing entry point thanks to
`${0##*/}` preserving the caller's `$0` across the function boundary.
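The difference between the two `slugify` forms can be reproduced with a short bash sketch. The function bodies follow the pipelines quoted in this commit message; the name `slugify_old` is only a label for the earlier form.

```shell
# Earlier form: echo appends a newline, which tr -c also maps to "_".
slugify_old() {
  echo "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Refactored form: no trailing newline, and runs of "_" are squeezed.
slugify() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_' | tr -s '_'
}

slugify_old 'T2 hash1(0)'; echo   # T2_hash1_0__
slugify 'T2 hash1(0)'; echo       # T2_hash1_0_
```

The remaining trailing underscore in the new form comes from the closing parenthesis of the label itself, not from a newline.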
…-170)
Item 7: root .gitignore was self-incomplete. It listed cache/, out/,
lib/, .env but not pot12.ptau or build_*/, leaving readers to discover
those exclusions only by reading the nested bench/circom/.gitignore. Add
the two paths to root with a comment pointing at the nested file.
Item 8a: cross_check.sh wrote a temp test file test/_CrossCheckTmp.t.sol
via solidity_output() and `rm -f`'d it at function exit, but a Ctrl+C,
SIGTERM, or `set -e` abort mid-call would leave the file behind to
confuse the next run. Hoist the path to TMP_SOL and add
`trap 'rm -f "$TMP_SOL"' EXIT INT TERM` near script top.
Item 8b: both cross_check.sh and bench_full.sh used `set -e` while the
sibling setup-libs.sh used the stricter `set -euo pipefail`. Adopt
`set -euo pipefail` in both scripts. Three pipelines that intentionally
tolerate partial failure (forge test → grep, snarkjs ri → grep,
/usr/bin/time → grep) now end in `|| true` so pipefail does not abort on
missing-line cases; these surface as empty output upstream, preserving
the existing soft-fail UX.
Item 9: README claimed every src/solidity/ library was deployable
(`Deployable: Yes`) without runtime validation. Add
test_deployable_wrappers_within_eip170 to Correctness.t.sol that deploys
all six of our wrappers (T2 / T2FF / T3 / T4 / T4Sponge / T8) and asserts
each `address(wrapper).code.length` is below the 24,576-byte EIP-170
deployment limit. Locks the README claim into CI; any future code growth
that pushes a wrapper over the limit fails the test.
Bonus: setup-libs.sh `pot12.ptau` download now retries 3 times with
2-second backoff (curl --retry, wget --tries) for resilience against
transient network blips during fresh-clone setup.
Verified: 17 forge tests pass (10 correctness + 1 EIP-170 + 6 fuzz); 9/9
cross_check; temp file cleaned up after run.
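Items 8a and 8b can be illustrated together in a bash sketch. The helper names `with_scratch_file` and `grep_marker` are hypothetical, and `mktemp` stands in for the fixed test/_CrossCheckTmp.t.sol path the real script uses.

```shell
# Run a command with a scratch file that is removed when the subshell
# exits, whether it finishes normally or aborts under set -e.
with_scratch_file() {
  (
    set -euo pipefail
    TMP_SOL="$(mktemp)"
    trap 'rm -f "$TMP_SOL"' EXIT INT TERM
    "$@" "$TMP_SOL"
  )
}

# Filter stdin for lines starting with a marker; an empty match yields
# empty output instead of a non-zero status that pipefail would propagate.
grep_marker() {
  grep "^$1" || true
}

with_scratch_file echo                     # prints the scratch path
printf 'XCHECK:42\n' | grep_marker XCHECK: # prints the marker line
```

The `|| true` is what lets a pipeline with no matching line surface as empty output rather than ending the whole script.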
…butors
Walking through the cold-clone onboarding path uncovered three friction
points that all hit a new contributor on their very first command:
1. Wasted 4.8 MB pot12.ptau download on `make test`. The ptau file is
needed by `bench-circom` and (transitively) by `cross-check` /
`cross-fuzz`, but `make test` only runs forge against src/solidity/
and has no use for it. Add `SKIP_PTAU=1` env-var support to
`setup-libs.sh` and split the Makefile into:
- `setup-light` → lib/ only (used by build / test / bench)
- `setup` → lib/ + ptau (used by cross-check / cross-fuzz /
bench-circom, and surfaced as the
manual full-bootstrap target)
2. Three implicit host dependencies — `bc`, `/usr/bin/time`, `python3`
— were used inside the scripts but neither preflighted nor mentioned
in the README. A cold machine without these would fail mid-pipeline
with cryptic errors. Add preflight checks (via lib.sh's
require_command) where they're actually needed:
- bench_full.sh: `bc`, `/usr/bin/time`
- cross_check.sh fuzz block: `python3`
And surface all three in the README Prerequisites table with their
per-target scope.
3. README's Project Structure tree was missing scripts/lib.sh (added
in commit 35b5c9e but the doc tree wasn't updated) and the Quick
Start phrasing implied every make target unconditionally fetches
pot12.ptau, which is no longer accurate. Both fixed.
Verified:
- `rm -f bench/circom/pot12.ptau && make test` now reports
`pot12.ptau: skipped (SKIP_PTAU=1)` and the file does not appear.
- `make cross-check` after the same setup re-downloads the file
(with sha256 verification) before running the actual cross-check.
- `make help` lists all targets with their `lib/ only` vs full-setup
scope explicit.
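The `SKIP_PTAU=1` switch could be implemented along the lines of this bash sketch. Both function names are hypothetical, and `fetch_ptau` is a stub standing in for the real download and checksum step.

```shell
# Hypothetical stand-in for the real download + sha256 verification.
fetch_ptau() {
  echo "pot12.ptau: would download and verify"
}

# Skip the ptau download when SKIP_PTAU=1, as the lighter targets do.
setup_ptau() {
  if [ "${SKIP_PTAU:-0}" = "1" ]; then
    echo "pot12.ptau: skipped (SKIP_PTAU=1)"
    return 0
  fi
  fetch_ptau
}

SKIP_PTAU=1 setup_ptau
```

Under this scheme a `setup-light` target would export `SKIP_PTAU=1` before calling the setup script, while `setup` would leave it unset.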
…nload diagnostics
Three lower-priority new-contributor friction fixes from the onboarding
review:
(G) bench-circom was silent for 8-15 minutes — the user could not tell
whether the script was making progress or stuck. bench_full.sh now
emits ` [N] <LABEL> ...` to stderr at the start of every bench()
call. The benchmark table on stdout is untouched, so redirecting
stdout to a file still produces a clean machine-readable result.
(I) Foundry minimum version was implicit (foundry.toml pins solc to
0.8.30, which not every old foundryup install can fetch). Note
"2024-08+ recommended" inline in the README Prerequisites table
so users on stale binaries see it before hitting the failure.
(J) setup-libs.sh's download path collapsed three failure modes (curl
network error, truncated mirror response, sha256 mismatch) into a
single `set -e`-driven abort whose visible output came from
whichever tool happened to fail first. Now:
- `curl/wget … || download_failed` produces a hand-tailored
"network / mirror issue" message when the transfer aborts.
- A size sanity check rejects tiny / empty downloads (e.g. an
HTML error page that came back with HTTP 200) before the
sha256 step, with a "truncated download — retry" hint.
- Genuine sha256 mismatches keep their existing message, but
gain an explicit "expected vs actual" pair plus a hint that
the mirror itself may have changed file contents.
Verified:
- `make test` still 17/17 pass.
- bench_full.sh syntax OK; progress lines (`[ 1] P1-circomlib(t=2) ...`,
`[ 2] P2-T2 hash1 ...`, …) appear immediately on stderr in a 10-sec
sample run.
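Two of the diagnostics above, the stderr progress line and the size sanity check, can be sketched in bash as follows. The names `bench_progress` and `check_min_size`, the counter variable, and the message wording are assumptions for illustration.

```shell
BENCH_INDEX=0

# Announce the next benchmark on stderr so stdout stays table-only.
bench_progress() {
  BENCH_INDEX=$((BENCH_INDEX + 1))
  printf ' [%2d] %s ...\n' "$BENCH_INDEX" "$1" >&2
}

# check_min_size FILE MIN_BYTES: reject empty or suspiciously small files,
# such as an HTML error page returned with HTTP 200.
check_min_size() {
  local size
  size=$(( $(wc -c < "$1") ))
  if [ "$size" -lt "$2" ]; then
    echo "truncated download ($size bytes): retry" >&2
    return 1
  fi
}

bench_progress 'P2-T2 hash1'
```

Running the size check before the checksum step lets a truncated transfer be reported as such instead of as a sha256 mismatch.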
test/poseidon2-opt: gas-optimized Poseidon2 benchmark suite
What this directory provides
A self-contained Foundry + Circom sub-project under
test/poseidon2-opt/:
- src/solidity/: Poseidon2T2 / T2FF / T3 / T4 / T4Sponge / T8, BN254 + x⁵ S-box, shipped as `internal pure` libraries (inlined into callers at compile time).
- src/circom/: matching Poseidon2 R1CS circuits for t ∈ {2, 3, 4, 8}.
- bench/: EVM gas + R1CS constraint counts, compared against Poseidon1 (chancehudson / circomlibjs / circomlib) and other Poseidon2 implementations (NethermindEth / Worldcoin / V-k-h / zemse / bkomuves / sserrano44).
- `PRIME − 1` boundary + 9 Solidity ↔ Circom output-equality cross-checks across all t-values.
- `make test / bench / cross-check / bench-circom`. `lib/` and `pot12.ptau` are fetched on first run with sha256 pinning, never tracked in git.
Impact
On-chain gas (best deployable implementation per scenario):
ZK circuit constraints (Circom / R1CS):
To our knowledge, this is the only Poseidon2 implementation covering both Solidity and Circom across t ∈ {2, 3, 4, 8}.