bench(python): micro-benchmark PackageData construction paths (#88) by doubleailes · Pull Request #90 · doubleailes/rer

doubleailes · 2026-05-17T08:18:46Z

Summary

Adds `scripts/bench_python_construction.py`, a stdlib-`timeit`-based micro-benchmark that measures the three `PackageData` construction paths (`__new__`, `from_strings` from #89, `from_rez`) plus their share of end-to-end `pyrer.solve()` time.
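For readers unfamiliar with the approach, a median-of-repeats measurement with stdlib `timeit` can be sketched roughly as follows. This is not the script itself: `median_us_per_call` and `build_stand_in` are hypothetical names, and the stand-in callable replaces the real `PackageData` constructors so the snippet runs without `pyrer` installed.

```python
import statistics
import timeit


def median_us_per_call(fn, number=1000, repeat=7):
    """Return the median cost of one fn() call in microseconds."""
    # timeit.repeat returns the total seconds for `number` calls, once per repeat.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call_seconds = [total / number for total in totals]
    return statistics.median(per_call_seconds) * 1e6


def build_stand_in():
    # Hypothetical stand-in for a single PackageData construction call.
    return ("foo", "1.0.0", ["bar-2+", "baz"], [["python-3.11"]])


if __name__ == "__main__":
    print(f"stand-in: {median_us_per_call(build_stand_in):.2f} us/call")
```

Taking the median over several repeats, rather than the mean, keeps a single noisy repeat from skewing the reported figure.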

Companion to #89. Stacked on feat/issue-88-from-strings — the bench references `from_strings` so this PR depends on #89 landing first.

Why

The PR description on #89 states that `from_strings` is functionally equivalent to the four-arg constructor. The perf win over `from_rez` is real, but its size had not yet been quantified. This script answers: what is the actual delta, and how does it scale?

Results on this machine

Intel Xeon E5-2699 v4 @ 2.20 GHz, CPython 3.13.9, pyrer 0.1.0-rc.8.

Per-call (median μs)

| Path | μs/pkg | vs `from_rez` |
| --- | --- | --- |
| `PackageData(name, ver, requires, variants)` | 4.57 | -34.3% |
| `PackageData.from_strings(...)` | 4.93 | -29.1% |
| `PackageData.from_rez(fake_pkg)` | 6.96 | (baseline) |

`__new__` ≈ `from_strings`, which confirms the "same fast PyO3 extraction path" claim from #89.
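The relative figures in the per-call table can be rechecked from the medians with a few lines of arithmetic. The sketch below uses the rounded values from the table, so the last digit may differ slightly from the reported percentages, which were presumably computed from unrounded timings. `delta_vs_baseline` is a hypothetical helper name.

```python
# Median per-call costs in microseconds, copied from the per-call table.
per_call_us = {
    "__new__": 4.57,
    "from_strings": 4.93,
    "from_rez": 6.96,
}


def delta_vs_baseline(path, baseline="from_rez"):
    """Return (absolute delta in us, relative delta in percent) vs the baseline."""
    base = per_call_us[baseline]
    delta = per_call_us[path] - base
    return delta, 100.0 * delta / base


for path in ("__new__", "from_strings"):
    delta_us, delta_pct = delta_vs_baseline(path)
    print(f"{path:>12}: {delta_us:+.2f} us / {delta_pct:+.1f}%")
```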

End-to-end `pyrer.solve(['app'], pkgs)`

N=50 packages:

| | ms | construction share |
| --- | --- | --- |
| pre-built solve only | 0.625 | n/a |
| build (`from_strings`) + solve | 0.892 | 29.9% |
| build (`from_rez`) + solve | 0.996 | 37.2% |
| Savings `from_rez` → `from_strings` | 0.104 ms (10.4%) | |

N=500 packages:

| | ms | construction share |
| --- | --- | --- |
| pre-built solve only | 2.590 | n/a |
| build (`from_strings`) + solve | 5.091 | 49.1% |
| build (`from_rez`) + solve | 5.698 | 54.5% |
| Savings `from_rez` → `from_strings` | 0.607 ms (10.6%) | |
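The "construction share" and "savings" columns follow from the three timings per size. A minimal sketch of that arithmetic is below, assuming share is defined as (build + solve minus solve-only) divided by build + solve, and savings as the difference between the two totals divided by the `from_rez` total. Those definitions match the reported numbers to within rounding, but they are inferred rather than taken from the script.

```python
# End-to-end timings in milliseconds, copied from the two tables.
timings_ms = {
    50: {"solve_only": 0.625, "from_strings": 0.892, "from_rez": 0.996},
    500: {"solve_only": 2.590, "from_strings": 5.091, "from_rez": 5.698},
}


def construction_share(total_ms, solve_only_ms):
    """Fraction of build+solve wall time spent building packages."""
    return (total_ms - solve_only_ms) / total_ms


def end_to_end_savings(from_rez_ms, from_strings_ms):
    """Fraction of the from_rez total saved by switching to from_strings."""
    return (from_rez_ms - from_strings_ms) / from_rez_ms


for n, t in timings_ms.items():
    share_fs = construction_share(t["from_strings"], t["solve_only"])
    share_fr = construction_share(t["from_rez"], t["solve_only"])
    saved = end_to_end_savings(t["from_rez"], t["from_strings"])
    print(f"N={n}: from_strings {share_fs:.1%}, from_rez {share_fr:.1%}, saved {saved:.1%}")
```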

Honest framing

- The savings scale linearly with package count (~10% end-to-end across sizes, ~2 μs/pkg construction delta).
- `FakeRezPackage` is a lower bound on `from_rez` cost. It uses plain `__slots__` and skips rez's `AttributeForwardMeta`, the late-bound check, and the `Requirement` parse. Real rez Packages will be slower; the `from_strings` line is realistic for the fast path.
- Construction is 30-55% of end-to-end on this synthetic resolve. Not negligible.
- The bigger remaining cost after `from_rez → from_strings` is the PyO3 wrapper overhead itself (~4.5 μs/pkg). A `solve_from_raw(requests, raw_tuples)` that skips `PackageData` entirely would save another ~5-10%. That is worth doing if `from_strings` itself isn't enough on real workloads.
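To make the "lower bound" point concrete, here is a guess at the shape of such a fake package. The script's actual `FakeRezPackage` source is not shown in this PR description, so the classes below are hypothetical reconstructions: plain `__slots__`, with requirements surfaced as objects that must be stringified one by one.

```python
class FakeRequirement:
    """Hypothetical stand-in for rez's Requirement: wraps a string, exposes __str__."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw

    def __str__(self):
        return self._raw


class FakeRezPackage:
    """Hypothetical stand-in for a rez Package, using plain __slots__ only."""

    __slots__ = ("name", "version", "requires", "variants")

    def __init__(self, name, version, requires=(), variants=()):
        self.name = name
        self.version = version
        # Attributes surface requirement objects rather than raw str.
        self.requires = [FakeRequirement(r) for r in requires]
        self.variants = [[FakeRequirement(r) for r in v] for v in variants]


def stringify_requirements(pkg):
    """Round-trip every requirement object back to str, one call per element."""
    requires = [str(r) for r in pkg.requires]
    variants = [[str(r) for r in v] for v in pkg.variants]
    return pkg.name, str(pkg.version), requires, variants


pkg = FakeRezPackage("app", "1.0.0", ["lib-2+"], [["python-3.11"]])
print(stringify_requirements(pkg))
```

A real rez Package would add metaclass attribute forwarding and requirement parsing on top of this, which is why the fake understates the `from_rez` cost.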

What this is not

- Not a CI test: perf is machine-dependent. Run on demand.
- Not a measurement against real rez. The number to confirm in production is what `from_strings` saves on the actual shim against the actual rez repo. This script tells you the upper bound, not the studio answer.

Test plan

- Script runs on default `--packages 50 --iters 500` settings (~6 seconds)
- Script runs on `--packages 500 --iters 100` (~10 seconds)
- Output is deterministic enough for PR-to-PR comparison (timing noise within ~5%)

Files

`scripts/bench_python_construction.py`: 305-line standalone script. No new project deps; only uses `pyrer` + stdlib.

🤖 Generated with Claude Code

Closes #88.

## Why

`pyrer.PackageData.from_rez(pkg)` is hot in rez integrations: every package the shim materialises pays the full cost. The current path walks rez's `AttributeForwardMeta` per attribute, lets rez parse each requirement string into a `Requirement` object, and then immediately calls `str(req)` on every one to round-trip back to the raw string pyrer wants in the first place. Per the issue, on the 188-case rez benchmark via the downstream shim that's ~50 packages materialised per resolve × ~10–20 round-trips per package. That is a few percent of total end-to-end wall time, completely avoidable when the caller has access to `pkg.resource.data` (which already holds these as raw `list[str]` in the common non-late-bound case).

## What

A new classmethod symmetric with `from_rez`:

```python
pyrer.PackageData.from_strings(
    name: str,
    version: str,
    requires: Iterable[str] | None = None,
    variants: Iterable[Iterable[str]] | None = None,
) -> PackageData
```

Skips:

- `AttributeForwardMeta` lookup (no `pkg.requires` walk).
- The `Requirement` parse (no `Version` / `VersionRange` AST built).
- The `str(Requirement)` round-trip per requirement.

Accepts `None` for `requires` / `variants` to play well with `data.get("requires")`, so no `or ()` boilerplate is needed.

## Honest framing on the perf claim

`from_strings` is **functionally equivalent** to the four-arg constructor `PackageData(name, version, requires, variants)`. Both take the same fast PyO3 extraction path (PyO3 extracts `Vec<String>` directly from any iterable of `PyUnicode`, with no `.str()` round-trip). Any rez-shim caller using the constructor with raw strings from `pkg.resource.data` already gets this perf today.

The classmethod form's value isn't a new fast path, it's:

- A named, documented contract ("raw strings only: no wrapper objects, no late-bound source code"). Mirrors `from_rez`'s naming.
- Discoverability: a place to land in autocomplete and docs when the caller is wondering "what's the fast path?".
- One canonical site to update if we ever add real fast-path specialisations (e.g. interning the family name, pre-allocating the `Vec`s sized to the iterable's `__len__`).

The docs (`docs/content/docs/getting-started/rez-integration.md`) get a worked example showing the recommended shim pattern: try `from_strings` against `pkg.resource.data`, fall back to `from_rez` for `@early` / `@late`-bound attributes where the raw data is a `SourceCode` instance instead of a `list[str]`.

## Tests

7 new tests in `tests/test_rich_api.py`:

- `test_from_strings_basic`: happy path
- `test_from_strings_defaults_to_empty`: `requires` / `variants` default
- `test_from_strings_accepts_none_for_collections`: `dict.get` ergonomics
- `test_from_strings_accepts_tuples_and_iterables`: non-list iterables
- `test_from_strings_matches_constructor`: same `PackageData` as `__new__`
- `test_from_strings_drives_solve_like_from_rez`: end-to-end; a solve fed via `from_strings` resolves identically to one fed via `from_rez` against an equivalent fake-rez Package
- `test_from_strings_rejects_non_string_requires`: contract-strict; passing an object with `__str__` raises `TypeError` rather than silently stringifying. The contract is "raw strings only", so use `from_rez` (or pre-stringify) for object inputs.

## Verification

- `cargo build`: clean.
- `cargo test --lib -p rer-resolver`: **44/44**.
- `pytest tests/`: **94/94** (was 87 + 7 new).
- `cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` (strict 188-case rez differential): **188/188 in 16.52 s**, unchanged.

No new Rust code on the solver hot path, no shape change to `PackageData` itself; this is a one-method addition to the PyO3 bridge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Companion to #89 (`PackageData.from_strings`). Establishes the baseline for the issue's perf claim: how much does `from_rez(pkg)` actually cost versus `from_strings(...)` / the four-arg constructor, and what share of `pyrer.solve()` wall time is construction.

`scripts/bench_python_construction.py` is a stdlib-`timeit`-based script (no extra deps) that measures:

1. Per-call construction (μs/call, isolated).
2. Per-batch construction for N packages (ms/batch).
3. End-to-end `pyrer.solve()` time with packages built each way.
4. Construction-vs-solve share of total wall time.

`FakeRezPackage` mimics rez's `Package` shape: `requires` / `variants` surface `FakeRequirement` objects (not `str`), so `from_rez` pays the per-element `__str__` round-trip the issue talks about.

Not part of CI: perf is machine-dependent and noisy. Run on demand; the script reports its own input parameters in the output.

## Results on this machine

Intel Xeon E5-2699 v4 @ 2.20 GHz, CPython 3.13.9, pyrer 0.1.0-rc.8.

### N=50 packages (1 build/solve iteration ≈ 1 ms)

```
Per-call construction (median μs, one package):
  PackageData(name, version, requires, variants): 4.57 μs
  PackageData.from_strings(...): 4.93 μs
  PackageData.from_rez(fake_rez_pkg): 6.96 μs
  from_strings vs from_rez: -2.02 μs / -29.1%
  __new__ vs from_rez: -2.39 μs / -34.3% (matches from_strings, same PyO3 path)

End-to-end pyrer.solve(['app'], pkgs):
  solve() alone (pre-built): 0.625 ms
  build (from_strings) + solve: 0.892 ms
  build (from_rez) + solve: 0.996 ms
  Build-phase share of total (from_strings): 29.9%
  Build-phase share of total (from_rez): 37.2%
  End-to-end savings (from_rez → from_strings): 0.104 ms (10.4%)
```

### N=500 packages

```
Per-batch construction (502 pkgs/batch):
  via PackageData(...): 2.260 ms / 4.50 μs/pkg
  via from_strings(...): 2.427 ms / 4.83 μs/pkg
  via from_rez(fake_pkg): 3.087 ms / 6.15 μs/pkg
  Savings switching from_rez → from_strings: 0.661 ms per batch

End-to-end:
  solve() alone (pre-built): 2.590 ms
  build (from_strings) + solve: 5.091 ms
  build (from_rez) + solve: 5.698 ms
  Build-phase share of total (from_strings): 49.1%
  Build-phase share of total (from_rez): 54.5%
  End-to-end savings (from_rez → from_strings): 0.607 ms (10.6%)
```

## What the numbers say

- **`__new__` ≈ `from_strings`** (4.57 vs 4.93 μs/pkg). Confirms the claim in #89: they take the same fast PyO3 extraction path. The ~0.4 μs delta is classmethod dispatch overhead.
- **`from_rez` is ~30-40% slower** per package than the raw-string paths. Real cost. Linear in package count.
- **Construction is 30-55% of end-to-end** on these synthetic resolves. Not negligible.
- **End-to-end savings switching to `from_strings`: consistent ~10%** across sizes, on this synthetic workload.
- **Production rez Packages will be slower than FakeRezPackage**: `FakeRezPackage` uses plain `__slots__` and skips rez's `AttributeForwardMeta` chain, the late-bound check, and the `Requirement` parse. The numbers above are a lower bound on the `from_rez` cost; the `from_strings` numbers are realistic for the fast path.

## What this tells us about further optimisation

The 10% end-to-end savings is real but modest. The bigger remaining costs after `from_rez → from_strings` are:

1. The solver itself (50-65% of total here, already 34× rez).
2. PyO3 wrapper overhead per `PackageData` instance (~4.5 μs/pkg even on the fast path). A `solve_from_raw(requests, raw_tuples)` that skips `PackageData` entirely would save another ~1-2 μs/pkg (~5-10% more end-to-end). Worth doing if `from_strings` itself isn't enough on real workloads.

The right next move is to A/B `from_strings` against `from_rez` in the rez shim on real studio gear before pursuing further changes. This script tells us what the upper bound on the win could be; production will confirm where it actually lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
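The per-batch figures (ms/batch alongside μs/pkg) suggest a measurement along these lines. This is a sketch under assumptions, not the script's code: `time_batch` and `build_stand_in` are hypothetical names, and the stand-in builder replaces `PackageData(...)` so the snippet runs without `pyrer`.

```python
import statistics
import timeit


def time_batch(build_one, specs, iters=100, repeat=5):
    """Return (ms per batch, us per package) for building every spec once."""

    def build_batch():
        return [build_one(*spec) for spec in specs]

    # Each entry in `totals` is the seconds taken by `iters` full batches.
    totals = timeit.repeat(build_batch, number=iters, repeat=repeat)
    seconds_per_batch = statistics.median(totals) / iters
    return seconds_per_batch * 1e3, seconds_per_batch * 1e6 / len(specs)


def build_stand_in(name, version, requires, variants):
    # Hypothetical stand-in for PackageData(name, version, requires, variants).
    return (name, version, tuple(requires), tuple(map(tuple, variants)))


# 502 packages per batch, matching the batch size reported for the N=500 run.
specs = [(f"pkg{i}", "1.0.0", [f"dep{i}"], [["python-3.11"]]) for i in range(502)]
ms_per_batch, us_per_pkg = time_batch(build_stand_in, specs, iters=20)
print(f"{ms_per_batch:.3f} ms / {us_per_pkg:.2f} us/pkg")
```

Dividing the batch time by the package count gives the per-package cost, which is how a pair like 2.427 ms / 4.83 μs/pkg relates at 502 packages.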

qodo-code-review · 2026-05-17T08:18:49Z

Qodo reviews are paused for this user.


doubleailes and others added 2 commits May 17, 2026 10:05

Base automatically changed from feat/issue-88-from-strings to main May 17, 2026 08:55

doubleailes merged commit a288dfc into main May 17, 2026
23 checks passed

doubleailes deleted the feat/issue-88-bench-construction branch May 17, 2026 08:56

doubleailes merged 2 commits into main from feat/issue-88-bench-construction
