
feat(python): add PackageData.from_strings raw-string fast path (closes #88)#89

Merged
doubleailes merged 2 commits into main from feat/issue-88-from-strings on May 17, 2026

@doubleailes (Owner)

Summary

Adds pyrer.PackageData.from_strings(name, version, requires=None, variants=None) — a classmethod constructor symmetric with from_rez(pkg), intended for callers that already have raw strings from pkg.resource.data and want to bypass rez's wrapper-object resolution path. Closes #88.

Honest framing on the perf claim

from_strings is functionally equivalent to the four-arg constructor — both take the same fast PyO3 extraction path. Any rez-shim caller using `PackageData(name, version, raw_requires, raw_variants)` already gets this perf today.

The classmethod's value isn't a new fast path, it's:

  • A named, documented contract ("raw strings only — no wrapper objects"). Mirrors from_rez's naming.
  • Discoverability — autocomplete + docs steer callers to the right method.
  • One canonical site to land any future fast-path specialisations.

The docs (rez-integration.md) get a worked example: try from_strings against pkg.resource.data, fall back to from_rez for @early / @late-bound attributes where the raw data is a SourceCode instance.

Tests

7 new tests in tests/test_rich_api.py: basic happy path, defaults, None for collections, non-list iterables (tuples), equivalence with the four-arg constructor, end-to-end solve equivalence with `from_rez` against fake-rez Packages, and the contract-strict rejection of non-string inputs.

Test plan

  • cargo build: clean
  • cargo test --lib -p rer-resolver: 44/44
  • pytest tests/: 94/94 (was 87 + 7 new)
  • Strict 188-case rez differential: 188/188 in 16.52 s, unchanged

Files

  • crates/rer-python/src/lib.rs — adds the classmethod, updates the from_rez docstring to cross-reference it
  • tests/test_rich_api.py — 7 new tests
  • docs/content/docs/getting-started/rez-integration.md — new "Faster construction with from_strings" subsection with the recommended shim pattern
  • CHANGELOG.md — Unreleased entry

🤖 Generated with Claude Code

Closes #88.

## Why

`pyrer.PackageData.from_rez(pkg)` is hot in rez integrations — every
package the shim materialises pays the full cost. The current path
walks rez's `AttributeForwardMeta` per attribute, lets rez parse each
requirement string into a `Requirement` object, and then immediately
calls `str(req)` on every one to round-trip back to the raw string
pyrer wants in the first place. Per the issue, on the 188-case rez
benchmark run through the downstream shim, that is ~50 packages
materialised per resolve × ~10–20 round-trips per package: a few percent
of total end-to-end wall time, and completely avoidable when the caller
has access to `pkg.resource.data` (which already holds these as raw
`list[str]` in the common non-late-bound case).

## What

A new classmethod symmetric with `from_rez`:

```python
pyrer.PackageData.from_strings(
    name: str,
    version: str,
    requires: Iterable[str] | None = None,
    variants: Iterable[Iterable[str]] | None = None,
) -> PackageData
```

Skips:
- `AttributeForwardMeta` lookup (no `pkg.requires` walk).
- The `Requirement` parse (no `Version` / `VersionRange` AST built).
- The `str(Requirement)` round-trip per requirement.

Accepts `None` for `requires` / `variants` to play well with
`data.get("requires")` — no `or ()` boilerplate.

## Honest framing on the perf claim

`from_strings` is **functionally equivalent** to the four-arg
constructor `PackageData(name, version, requires, variants)`. Both
take the same fast PyO3 extraction path (PyO3 extracts `Vec<String>`
directly from any iterable of `PyUnicode` — no `.str()` round-trip).
Any rez-shim caller using the constructor with raw strings from
`pkg.resource.data` already gets this perf today.

The classmethod form's value isn't a new fast path, it's:
- A named, documented contract ("raw strings only — no wrapper
  objects, no late-bound source code"). Mirrors `from_rez`'s naming.
- Discoverability — a place to land in autocomplete and docs when
  the caller is wondering "what's the fast path?".
- One canonical site to update if we ever add real fast-path
  specialisations (e.g. interning the family name, pre-allocating
  the `Vec`s sized to the iterable's `__len__`).

The docs (`docs/content/docs/getting-started/rez-integration.md`)
get a worked example showing the recommended shim pattern: try
`from_strings` against `pkg.resource.data`, fall back to `from_rez`
for `@early` / `@late`-bound attributes where the raw data is a
`SourceCode` instance instead of a `list[str]`.
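A minimal sketch of that shim pattern follows. It assumes `pkg.resource.data` supports `.get()` like a dict (as the `data.get("requires")` usage above suggests) and that `pkg.name` / `str(pkg.version)` give the package name and version string. Anything that is not a list or tuple of `str` is treated as a signal to fall back, for example a `SourceCode` object from an `@early` / `@late` attribute. `raw_requirement_fields` and `to_package_data` are hypothetical helper names, not part of pyrer. `pyrer` is imported inside the function so the pure helper can be exercised without it installed.

```python
from typing import Any, Optional, Sequence, Tuple

# (requires, variants) exactly as stored, or None when either is not raw strings.
RawFields = Tuple[Optional[Sequence[str]], Optional[Sequence[Sequence[str]]]]


def _is_str_seq(value: Any) -> bool:
    """True for a list/tuple whose items are all plain str."""
    return isinstance(value, (list, tuple)) and all(isinstance(x, str) for x in value)


def raw_requirement_fields(data: Any) -> Optional[RawFields]:
    """Return (requires, variants) if both are raw strings (or absent), else None.

    None means at least one field holds something else, e.g. a rez SourceCode
    object for an @early / @late-bound attribute, so the caller should use from_rez.
    """
    requires = data.get("requires")
    variants = data.get("variants")
    if requires is not None and not _is_str_seq(requires):
        return None
    if variants is not None:
        if not isinstance(variants, (list, tuple)):
            return None
        if not all(_is_str_seq(v) for v in variants):
            return None
    return requires, variants


def to_package_data(pkg: Any) -> Any:
    """Build a pyrer.PackageData from a rez Package, preferring the raw-string path."""
    import pyrer  # lazy import: raw_requirement_fields stays usable without pyrer

    fields = raw_requirement_fields(pkg.resource.data)
    if fields is None:
        # Late-bound or otherwise non-raw data: let from_rez evaluate it.
        return pyrer.PackageData.from_rez(pkg)
    requires, variants = fields
    return pyrer.PackageData.from_strings(pkg.name, str(pkg.version), requires, variants)
```

This sketch checks the shape of the data instead of importing rez's `SourceCode` class, which keeps the shim independent of rez internals. The trade-off is that any unexpected shape also takes the slower `from_rez` path, which is the safe direction.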

## Tests

7 new tests in `tests/test_rich_api.py`:

- `test_from_strings_basic` — happy path
- `test_from_strings_defaults_to_empty` — `requires` / `variants` default
- `test_from_strings_accepts_none_for_collections` — `dict.get` ergonomics
- `test_from_strings_accepts_tuples_and_iterables` — non-list iterables
- `test_from_strings_matches_constructor` — same `PackageData` as `__new__`
- `test_from_strings_drives_solve_like_from_rez` — end-to-end:
  a solve fed via `from_strings` resolves identically to one fed
  via `from_rez` against an equivalent fake-rez Package
- `test_from_strings_rejects_non_string_requires` — contract-strict:
  passing an object with `__str__` raises `TypeError` rather than
  silently stringifying. The contract is "raw strings only" — use
  `from_rez` (or pre-stringify) for object inputs.
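A few lines of plain Python can model the input contract these tests pin down. This is a hypothetical reference model for readers, not pyrer's implementation, which does the extraction in Rust via PyO3, and the function names are illustrative. The defaults-to-empty and `None` handling follow the test descriptions above. Rejecting a bare `str` is an assumption of this sketch, made so that a single string is not silently split into characters.

```python
from typing import Iterable, List, Optional, Tuple


def _strict_str_list(items: Iterable[str], what: str) -> List[str]:
    # A bare str is itself iterable; refuse it rather than splitting it into characters.
    if isinstance(items, str):
        raise TypeError(f"{what} must be an iterable of str, not a str")
    out: List[str] = []
    for item in items:
        if not isinstance(item, str):
            # No implicit str(item): objects that merely define __str__ are rejected.
            raise TypeError(f"{what} items must be str, got {type(item).__name__}")
        out.append(item)
    return out


def normalise_raw_fields(
    requires: Optional[Iterable[str]] = None,
    variants: Optional[Iterable[Iterable[str]]] = None,
) -> Tuple[List[str], List[List[str]]]:
    """Pure-Python model of the from_strings input contract (not pyrer's code)."""
    req = [] if requires is None else _strict_str_list(requires, "requires")
    var = [] if variants is None else [_strict_str_list(v, "variant") for v in variants]
    return req, var
```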

## Verification

- `cargo build`: clean.
- `cargo test --lib -p rer-resolver`: **44/44**.
- `pytest tests/`: **94/94** (was 87 + 7 new).
- `cargo test --release -p rer-resolver --test test_rez_benchmark
  -- --ignored` (strict 188-case rez differential):
  **188/188 in 16.52 s** — unchanged.

No new Rust code on the solver hot path, no shape change to
`PackageData` itself; this is a one-method addition to the PyO3
bridge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Picks up the `PackageData.from_strings` classmethod (issue #88) — a
raw-string fast-path constructor symmetric with `from_rez`, intended
for rez-integration callers that already have the strings pulled
from `pkg.resource.data`. Functionally equivalent to the four-arg
constructor; the classmethod exists to give the fast path a name and
a stable contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
doubleailes merged commit eb710b3 into main on May 17, 2026
23 checks passed
doubleailes deleted the feat/issue-88-from-strings branch on May 17, 2026 08:55
doubleailes added a commit that referenced this pull request on May 17, 2026
Companion to #89 (`PackageData.from_strings`). Establishes the
baseline for the issue's perf claim — how much does `from_rez(pkg)`
actually cost versus `from_strings(...)` / the four-arg constructor,
and what share of `pyrer.solve()` wall time is construction.

`scripts/bench_python_construction.py` is a stdlib-`timeit`-based
script (no extra deps) that measures:

1. Per-call construction (μs/call, isolated).
2. Per-batch construction for N packages (ms/batch).
3. End-to-end `pyrer.solve()` time with packages built each way.
4. Construction-vs-solve share of total wall time.
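Items 1 and 2 both come down to a median over repeated `timeit` runs. The following is a stdlib-only sketch of that measurement. It assumes the script reports the median of `timeit.repeat` totals divided by the number of calls per run. The helper names are hypothetical, and the real script's structure may differ.

```python
import statistics
import timeit
from typing import Callable


def median_us_per_call(fn: Callable[[], object], number: int = 1000, repeat: int = 7) -> float:
    """Median cost of one fn() call, in microseconds."""
    # timeit.repeat returns one total (in seconds) per run, each covering `number` calls.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return statistics.median(totals) / number * 1e6


def median_ms_per_batch(build_batch: Callable[[], object], repeat: int = 7) -> float:
    """Median cost of building one whole batch (e.g. N packages), in milliseconds."""
    totals = timeit.repeat(build_batch, number=1, repeat=repeat)
    return statistics.median(totals) * 1e3
```

The `timeit` docs suggest `min` for micro-benchmarks. A median reports a typical run instead of the best case, which fits a comparison meant to predict real shim behaviour.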

`FakeRezPackage` mimics rez's `Package` shape: `requires` / `variants`
surface `FakeRequirement` objects (not `str`), so `from_rez` pays the
per-element `__str__` round-trip the issue talks about.
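Below is a hypothetical reconstruction of that shape, using plain `__slots__` as the results section below describes. Because `requires` / `variants` hold objects rather than `str`, any `from_rez`-style path has to call `str()` on each element. `stringify_like_from_rez` is an illustrative name for that round-trip, not pyrer code.

```python
class FakeRequirement:
    """Stand-in for rez's Requirement: an object, not a str."""

    __slots__ = ("_text",)

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


class FakeRezPackage:
    """Mimics the attributes a from_rez-style constructor reads."""

    __slots__ = ("name", "version", "requires", "variants")

    def __init__(self, name, version, requires=(), variants=()):
        self.name = name
        self.version = version
        # Wrap every string so consumers must round-trip through __str__.
        self.requires = [FakeRequirement(r) for r in requires]
        self.variants = [[FakeRequirement(r) for r in v] for v in variants]


def stringify_like_from_rez(pkg):
    """The per-element str() round-trip that the raw-string paths avoid."""
    requires = [str(r) for r in pkg.requires]
    variants = [[str(r) for r in v] for v in pkg.variants]
    return requires, variants
```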

Not part of CI — perf is machine-dependent and noisy. Run on
demand; the script reports its own input parameters in the output.

## Results on this machine

Intel Xeon E5-2699 v4 @ 2.20 GHz, CPython 3.13.9, pyrer 0.1.0-rc.8.

### N=50 packages (1 build/solve iteration ≈ 1 ms)

```
Per-call construction (median μs, one package):
  PackageData(name, version, requires, variants):     4.57 μs
  PackageData.from_strings(...):                       4.93 μs
  PackageData.from_rez(fake_rez_pkg):                  6.96 μs

  from_strings vs from_rez:  -2.02 μs / -29.1%
  __new__       vs from_rez: -2.39 μs / -34.3% (matches from_strings, same PyO3 path)

End-to-end pyrer.solve(['app'], pkgs):
  solve() alone (pre-built):                  0.625 ms
  build (from_strings) + solve:               0.892 ms
  build (from_rez)     + solve:               0.996 ms

  Build-phase share of total (from_strings):   29.9%
  Build-phase share of total (from_rez):       37.2%

  End-to-end savings (from_rez → from_strings): 0.104 ms (10.4%)
```
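The derived lines follow from the three end-to-end timings alone. Build time is the end-to-end time minus `solve()` alone, and the savings are the gap between the two end-to-end totals. The small sketch below (the `summarise` name is hypothetical) reproduces the N=50 figures from the printed values. Because it starts from rounded numbers, its last digit can differ from the script's own output for other runs.

```python
from typing import Dict


def summarise(solve_ms: float, strings_total_ms: float, rez_total_ms: float) -> Dict[str, float]:
    """Derive build-phase shares and savings from three end-to-end timings (ms)."""

    def build_share(total_ms: float) -> float:
        # Build time is whatever the end-to-end run spent beyond solve() alone.
        return 100.0 * (total_ms - solve_ms) / total_ms

    savings_ms = rez_total_ms - strings_total_ms
    return {
        "build_share_from_strings_pct": build_share(strings_total_ms),
        "build_share_from_rez_pct": build_share(rez_total_ms),
        "savings_ms": savings_ms,
        "savings_pct": 100.0 * savings_ms / rez_total_ms,
    }
```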

### N=500 packages

```
Per-batch construction (502 pkgs/batch):
  via PackageData(...):     2.260 ms /  4.50 μs/pkg
  via from_strings(...):    2.427 ms /  4.83 μs/pkg
  via from_rez(fake_pkg):   3.087 ms /  6.15 μs/pkg

  Savings switching from_rez → from_strings: 0.661 ms per batch

End-to-end:
  solve() alone (pre-built):                  2.590 ms
  build (from_strings) + solve:               5.091 ms
  build (from_rez)     + solve:               5.698 ms

  Build-phase share of total (from_strings):   49.1%
  Build-phase share of total (from_rez):       54.5%

  End-to-end savings (from_rez → from_strings): 0.607 ms (10.6%)
```

## What the numbers say

- **`__new__` ≈ `from_strings`** (4.57 vs 4.93 μs/pkg). Consistent with
  the claim in #89 that they take the same fast PyO3 extraction path.
  The ~0.36 μs delta is most likely classmethod dispatch overhead.
- **The raw-string paths take ~20-35% less time per package than
  `from_rez`** (equivalently, `from_rez` costs ~1.3-1.5× as much).
  Real cost. Linear in package count.
- **Construction is 30-55% of end-to-end** on these synthetic
  resolves. Not negligible.
- **End-to-end savings switching to `from_strings`: consistent ~10%**
  across sizes, on this synthetic workload.
- **Production rez Packages will be slower than FakeRezPackage** —
  `FakeRezPackage` uses plain `__slots__` and skips rez's
  `AttributeForwardMeta` chain, the late-bound check, and the
  `Requirement` parse. The numbers above are a lower bound on the
  `from_rez` cost; the `from_strings` numbers are realistic for the
  fast path.

## What this tells us about further optimisation

The 10% end-to-end savings is real but modest. The bigger remaining
costs after `from_rez → from_strings` are:

1. The solver itself (roughly 45-70% of total here, depending on size
   and construction path; already 34× faster than rez).
2. PyO3 wrapper overhead per `PackageData` instance (~4.5 μs/pkg
   even on the fast path). A hypothetical `solve_from_raw(requests, raw_tuples)`
   that skips `PackageData` entirely could save another ~1-2 μs/pkg
   (~5-10% more end-to-end at N=50, ~10-20% at N=500). Worth doing if
   `from_strings` itself isn't enough on real workloads.
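The end-to-end estimate for a `PackageData`-free path is plain arithmetic: per-package savings × package count, divided by the current `from_strings` end-to-end total. Below is a sketch with a hypothetical helper name. The μs-per-package savings is an assumed input, not a measurement.

```python
def projected_savings_pct(saved_us_per_pkg: float, n_pkgs: int, current_total_ms: float) -> float:
    """End-to-end % saved if each package got `saved_us_per_pkg` cheaper to build."""
    saved_ms = saved_us_per_pkg * n_pkgs / 1000.0  # μs -> ms
    return 100.0 * saved_ms / current_total_ms
```

With 1-2 μs/pkg saved, this gives roughly 5.6-11.2% at N=50 (0.892 ms total) and 9.9-19.7% at 502 packages (5.091 ms total).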

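The right next move is to A/B `from_strings` against `from_rez` in
the rez shim on real studio gear before pursuing further changes.
This script gives a synthetic estimate of the win. Since
`FakeRezPackage` understates the real `from_rez` cost, the per-package
savings measured here are more likely a floor than a ceiling.
Production will confirm where it actually lands.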

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


Linked issue: Faster PackageData construction by skipping the rez wrapper layer
