feat(python): add PackageData.from_strings raw-string fast path (closes #88) #89
Merged
Closes #88.

## Why

`pyrer.PackageData.from_rez(pkg)` is hot in rez integrations: every package the shim materialises pays the full cost. The current path walks rez's `AttributeForwardMeta` per attribute, lets rez parse each requirement string into a `Requirement` object, and then immediately calls `str(req)` on every one to round-trip back to the raw string pyrer wants in the first place.

Per the issue, on the 188-case rez benchmark via the downstream shim that's ~50 packages materialised per resolve × ~10–20 round-trips per package. That is a few percent of total end-to-end wall time, and completely avoidable when the caller has access to `pkg.resource.data` (which already holds these as raw `list[str]` in the common non-late-bound case).

## What

A new classmethod symmetric with `from_rez`:

```python
pyrer.PackageData.from_strings(
    name: str,
    version: str,
    requires: Iterable[str] | None = None,
    variants: Iterable[Iterable[str]] | None = None,
) -> PackageData
```

Skips:

- `AttributeForwardMeta` lookup (no `pkg.requires` walk).
- The `Requirement` parse (no `Version` / `VersionRange` AST built).
- The `str(Requirement)` round-trip per requirement.

Accepts `None` for `requires` / `variants` to play well with `data.get("requires")`, so no `or ()` boilerplate is needed.

## Honest framing on the perf claim

`from_strings` is **functionally equivalent** to the four-arg constructor `PackageData(name, version, requires, variants)`. Both take the same fast PyO3 extraction path (PyO3 extracts `Vec<String>` directly from any iterable of `PyUnicode`, with no `.str()` round-trip). Any rez-shim caller using the constructor with raw strings from `pkg.resource.data` already gets this perf today.

The classmethod form's value isn't a new fast path. It's:

- A named, documented contract ("raw strings only: no wrapper objects, no late-bound source code"). Mirrors `from_rez`'s naming.
- Discoverability: a place to land in autocomplete and docs when the caller is wondering "what's the fast path?".
- One canonical site to update if we ever add real fast-path specialisations (e.g. interning the family name, pre-allocating the `Vec`s sized to the iterable's `__len__`).

The docs (`docs/content/docs/getting-started/rez-integration.md`) get a worked example showing the recommended shim pattern: try `from_strings` against `pkg.resource.data`, and fall back to `from_rez` for `@early` / `@late`-bound attributes, where the raw data is a `SourceCode` instance instead of a `list[str]`.

## Tests

7 new tests in `tests/test_rich_api.py`:

- `test_from_strings_basic`: happy path.
- `test_from_strings_defaults_to_empty`: `requires` / `variants` default.
- `test_from_strings_accepts_none_for_collections`: `dict.get` ergonomics.
- `test_from_strings_accepts_tuples_and_iterables`: non-list iterables.
- `test_from_strings_matches_constructor`: same `PackageData` as `__new__`.
- `test_from_strings_drives_solve_like_from_rez`: end-to-end; a solve fed via `from_strings` resolves identically to one fed via `from_rez` against an equivalent fake-rez Package.
- `test_from_strings_rejects_non_string_requires`: contract-strict; passing an object with `__str__` raises `TypeError` rather than silently stringifying. The contract is "raw strings only", so use `from_rez` (or pre-stringify) for object inputs.

## Verification

- `cargo build`: clean.
- `cargo test --lib -p rer-resolver`: **44/44**.
- `pytest tests/`: **94/94** (was 87 + 7 new).
- `cargo test --release -p rer-resolver --test test_rez_benchmark -- --ignored` (strict 188-case rez differential): **188/188 in 16.52 s**, unchanged.

No new Rust code on the solver hot path and no shape change to `PackageData` itself; this is a one-method addition to the PyO3 bridge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
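The shim pattern the docs describe (raw strings first, `from_rez` as the fallback) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in `rez-integration.md`: it assumes `pkg.resource.data` is a dict-like mapping, and that anything other than a list/tuple of `str` (such as a late-bound `SourceCode` object) should be treated as "not raw". The helper names (`build_package_data`, `_raw_requires`, `_raw_variants`) are hypothetical, and the two constructors are passed in as parameters so the sketch runs without pyrer or rez installed.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any, TypeVar

T = TypeVar("T")


def _as_str_list(value: Any) -> list[str] | None:
    """Return `value` as list[str] if it is a list/tuple of str, else None."""
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return list(value)
    return None


def _raw_requires(value: Any) -> list[str] | None:
    # Missing key -> no requirements; anything non-raw (e.g. SourceCode) -> None.
    return [] if value is None else _as_str_list(value)


def _raw_variants(value: Any) -> list[list[str]] | None:
    if value is None:
        return []
    if not isinstance(value, (list, tuple)):
        return None  # late-bound or otherwise not raw data
    rows = [_as_str_list(row) for row in value]
    if any(row is None for row in rows):
        return None
    return rows  # every row is a list[str] at this point


def build_package_data(
    pkg: Any,
    from_strings: Callable[[str, str, list[str], list[list[str]]], T],
    from_rez: Callable[[Any], T],
) -> T:
    """Prefer the raw-string path; fall back to the wrapper-object path."""
    data: Mapping[str, Any] = pkg.resource.data
    requires = _raw_requires(data.get("requires"))
    variants = _raw_variants(data.get("variants"))
    if requires is None or variants is None:
        # At least one attribute is not plain strings (e.g. @early/@late bound).
        return from_rez(pkg)
    return from_strings(str(pkg.name), str(pkg.version), requires, variants)
```

In a real shim the call would presumably look like `build_package_data(pkg, pyrer.PackageData.from_strings, pyrer.PackageData.from_rez)`. The sketch falls back for the whole package as soon as one attribute is non-raw, because `from_rez` takes the whole package rather than individual attributes.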
Picks up the `PackageData.from_strings` classmethod (issue #88) — a raw-string fast-path constructor symmetric with `from_rez`, intended for rez-integration callers that already have the strings pulled from `pkg.resource.data`. Functionally equivalent to the four-arg constructor; the classmethod exists to give the fast path a name and a stable contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
doubleailes added a commit that referenced this pull request on May 17, 2026
Companion to #89 (`PackageData.from_strings`). Establishes the baseline for the issue's perf claim: how much does `from_rez(pkg)` actually cost versus `from_strings(...)` / the four-arg constructor, and what share of `pyrer.solve()` wall time is construction?

`scripts/bench_python_construction.py` is a stdlib-`timeit`-based script (no extra deps) that measures:

1. Per-call construction (μs/call, isolated).
2. Per-batch construction for N packages (ms/batch).
3. End-to-end `pyrer.solve()` time with packages built each way.
4. Construction-vs-solve share of total wall time.

`FakeRezPackage` mimics rez's `Package` shape: `requires` / `variants` surface `FakeRequirement` objects (not `str`), so `from_rez` pays the per-element `__str__` round-trip the issue talks about.

Not part of CI: perf is machine-dependent and noisy. Run on demand; the script reports its own input parameters in the output.

## Results on this machine

Intel Xeon E5-2699 v4 @ 2.20 GHz, CPython 3.13.9, pyrer 0.1.0-rc.8.

### N=50 packages (1 build/solve iteration ≈ 1 ms)

```
Per-call construction (median μs, one package):
  PackageData(name, version, requires, variants): 4.57 μs
  PackageData.from_strings(...): 4.93 μs
  PackageData.from_rez(fake_rez_pkg): 6.96 μs
  from_strings vs from_rez: -2.02 μs / -29.1%
  __new__ vs from_rez: -2.39 μs / -34.3% (matches from_strings, same PyO3 path)

End-to-end pyrer.solve(['app'], pkgs):
  solve() alone (pre-built): 0.625 ms
  build (from_strings) + solve: 0.892 ms
  build (from_rez) + solve: 0.996 ms
  Build-phase share of total (from_strings): 29.9%
  Build-phase share of total (from_rez): 37.2%
  End-to-end savings (from_rez → from_strings): 0.104 ms (10.4%)
```

### N=500 packages

```
Per-batch construction (502 pkgs/batch):
  via PackageData(...): 2.260 ms / 4.50 μs/pkg
  via from_strings(...): 2.427 ms / 4.83 μs/pkg
  via from_rez(fake_pkg): 3.087 ms / 6.15 μs/pkg
  Savings switching from_rez → from_strings: 0.661 ms per batch

End-to-end:
  solve() alone (pre-built): 2.590 ms
  build (from_strings) + solve: 5.091 ms
  build (from_rez) + solve: 5.698 ms
  Build-phase share of total (from_strings): 49.1%
  Build-phase share of total (from_rez): 54.5%
  End-to-end savings (from_rez → from_strings): 0.607 ms (10.6%)
```

## What the numbers say

- **`__new__` ≈ `from_strings`** (4.57 vs 4.93 μs/pkg). Confirms the claim in #89: they take the same fast PyO3 extraction path. The ~0.4 μs delta is most likely classmethod dispatch overhead.
- **`from_rez` costs roughly 1.3-1.5× the raw-string paths** per package (the raw-string paths take ~21-34% less time across both sizes). Real cost, linear in package count.
- **Construction is ~30-55% of end-to-end** on these synthetic resolves. Not negligible.
- **End-to-end savings switching to `from_strings`: a consistent ~10%** across sizes, on this synthetic workload.
- **Production rez Packages will be slower than `FakeRezPackage`**: `FakeRezPackage` uses plain `__slots__` and skips rez's `AttributeForwardMeta` chain, the late-bound check, and the `Requirement` parse. The numbers above are a lower bound on the `from_rez` cost; the `from_strings` numbers are realistic for the fast path.

## What this tells us about further optimisation

The 10% end-to-end savings is real but modest. The bigger remaining costs after `from_rez → from_strings` are:

1. The solver itself (~50-70% of total here with `from_strings`; already 34× faster than rez).
2. Per-`PackageData` construction cost through the PyO3 bridge (~4.5 μs/pkg even on the fast path). A `solve_from_raw(requests, raw_tuples)` that skips `PackageData` entirely would save another ~1-2 μs/pkg (roughly 5-10% more end-to-end at N=50, 10-20% at N=500). Worth doing if `from_strings` itself isn't enough on real workloads.

The right next move is to A/B `from_strings` against `from_rez` in the rez shim on real studio gear before pursuing further changes. Because `FakeRezPackage` understates `from_rez`'s cost, this script gives a rough floor on the per-package win; production will confirm where it actually lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
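The benchmark script itself isn't reproduced here. As a rough illustration of the measurement approach (median per-call timing with stdlib `timeit`, plus the build-phase share arithmetic), here is a self-contained sketch. `FakeRequirement` and `FakeRezPackage` reuse the names from the description, but their bodies are guesses; `build_from_strings` and `build_from_rez` are hypothetical stand-ins for the pyrer constructors so the sketch runs without pyrer, which means its absolute timings say nothing about pyrer. The share formula is (build + solve − solve alone) / (build + solve); with the N=50 numbers, (0.892 − 0.625) / 0.892 ≈ 29.9%.

```python
import statistics
import timeit


class FakeRequirement:
    """Stand-in for a rez Requirement: only useful via str()."""

    __slots__ = ("_s",)

    def __init__(self, s: str) -> None:
        self._s = s

    def __str__(self) -> str:
        return self._s


class FakeRezPackage:
    """Plain __slots__ object exposing wrapper objects, not raw strings."""

    __slots__ = ("name", "version", "requires", "variants")

    def __init__(self, name, version, requires, variants):
        self.name = name
        self.version = version
        self.requires = [FakeRequirement(r) for r in requires]
        self.variants = [[FakeRequirement(r) for r in row] for row in variants]


def build_from_strings(name, version, requires, variants):
    # Hypothetical stand-in for PackageData(...): just copies raw strings.
    return (name, version, list(requires), [list(v) for v in variants])


def build_from_rez(pkg):
    # Hypothetical stand-in for from_rez: pays one str() per element.
    return build_from_strings(
        pkg.name,
        str(pkg.version),
        [str(r) for r in pkg.requires],
        [[str(r) for r in row] for row in pkg.variants],
    )


def median_us_per_call(fn, number=2000, repeat=7):
    """Median over `repeat` runs of `number` calls, in microseconds per call."""
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    return statistics.median(runs) / number * 1e6


def build_share(build_plus_solve_ms, solve_only_ms):
    """Fraction of end-to-end time spent constructing packages."""
    return (build_plus_solve_ms - solve_only_ms) / build_plus_solve_ms


if __name__ == "__main__":
    reqs = ["python-3.11", "maya-2024+<2025"]
    variants = [["platform-linux"], ["platform-windows"]]
    pkg = FakeRezPackage("app", "1.0.0", reqs, variants)
    fast = median_us_per_call(lambda: build_from_strings("app", "1.0.0", reqs, variants))
    slow = median_us_per_call(lambda: build_from_rez(pkg))
    print(f"raw strings: {fast:.2f} us/call, via wrappers: {slow:.2f} us/call")
    # Reproduces the N=50 share from the results above: 29.9%.
    print(f"build share: {build_share(0.892, 0.625):.1%}")
```

Taking the median of several `timeit.repeat` runs, rather than the mean, keeps one noisy run from skewing the per-call figure.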
## Summary

Adds `pyrer.PackageData.from_strings(name, version, requires=None, variants=None)`, a classmethod constructor symmetric with `from_rez(pkg)`, intended for callers that already have raw strings from `pkg.resource.data` and want to bypass rez's wrapper-object resolution path. Closes #88.

## Honest framing on the perf claim

`from_strings` is functionally equivalent to the four-arg constructor: both take the same fast PyO3 extraction path. Any rez-shim caller using `PackageData(name, version, raw_requires, raw_variants)` already gets this perf today.

The classmethod's value isn't a new fast path, it's:

- A named, documented contract ("raw strings only"). Mirrors `from_rez`'s naming.
- Discoverability in autocomplete and docs.
- One canonical site to update if real fast-path specialisations are added later.

The docs (`rez-integration.md`) get a worked example: try `from_strings` against `pkg.resource.data`, and fall back to `from_rez` for `@early` / `@late`-bound attributes where the raw data is a `SourceCode` instance.

## Tests

7 new tests in `tests/test_rich_api.py`: basic happy path, defaults, `None` for collections, non-list iterables (tuples), equivalence with the four-arg constructor, end-to-end solve equivalence with `from_rez` against fake-rez Packages, and the contract-strict rejection of non-string inputs.

## Test plan

- `cargo build`: clean
- `cargo test --lib -p rer-resolver`: 44/44
- `pytest tests/`: 94/94 (was 87 + 7 new)

## Files

- `crates/rer-python/src/lib.rs`: adds the classmethod and updates the `from_rez` docstring to cross-reference it
- `tests/test_rich_api.py`: 7 new tests
- `docs/content/docs/getting-started/rez-integration.md`: new "Faster construction with `from_strings`" subsection with the recommended shim pattern
- `CHANGELOG.md`: Unreleased entry

🤖 Generated with Claude Code
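To make the "raw strings only" contract concrete, here is a pure-Python model of the behaviour the listed tests describe: `None` becomes empty, any iterable is accepted, and non-`str` elements raise `TypeError` instead of being stringified. `PackageDataModel` and `_strict_str_list` are hypothetical names; this is an illustrative model, not pyrer's PyO3 implementation in `crates/rer-python/src/lib.rs`. Rejecting a bare `str` passed as `requires` is this model's own choice, since a `str` is itself iterable and would otherwise be split into characters.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


def _strict_str_list(values: Iterable[str] | None, what: str) -> list[str]:
    """Copy `values` into a list[str], rejecting anything that is not a str."""
    if values is None:
        return []
    if isinstance(values, str):
        # A bare str is itself iterable; refuse it rather than split it into chars.
        raise TypeError(f"{what}: expected an iterable of str, got a bare str")
    out: list[str] = []
    for v in values:
        if not isinstance(v, str):
            # No implicit str(v): objects with __str__ are an error, by contract.
            raise TypeError(f"{what}: expected str, got {type(v).__name__}")
        out.append(v)
    return out


@dataclass
class PackageDataModel:
    """Hypothetical pure-Python stand-in; not pyrer's PyO3 class."""

    name: str
    version: str
    requires: list[str] = field(default_factory=list)
    variants: list[list[str]] = field(default_factory=list)

    @classmethod
    def from_strings(
        cls,
        name: str,
        version: str,
        requires: Iterable[str] | None = None,
        variants: Iterable[Iterable[str]] | None = None,
    ) -> PackageDataModel:
        rows = [] if variants is None else [_strict_str_list(r, "variants") for r in variants]
        return cls(name, version, _strict_str_list(requires, "requires"), rows)
```

With this shape, `PackageDataModel.from_strings("foo", "1.0", data.get("requires"), data.get("variants"))` works whether or not the keys are present, which is the `dict.get` ergonomics the PR aims for.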