feat(resolver,python): lazy package discovery via load_family callback (closes #86)#87
Merged
Merged
Conversation
Closes #86. ## Why `pyrer.solve()` previously required the complete `list[PackageData]` up front, so any integration had to materialise every package the solve *might* touch before the Rust algorithm started. For loader-driven integrations against rez — where each `package.py` is arbitrary Python that's only AST-evaluated on attribute access — this defeats rez's own lazy-load advantage: rez can bail on an early conflict after touching two families while the rest of the dep graph stays on disk. ## Resolver: `PackageRepo` becomes a struct `PackageRepo` was a type alias for `HashMap<String, HashMap<String, PackageData>>` — eager and immutable. It's now a struct holding: RefCell<HashMap<String, Option<Rc<FamilyMap>>>> # cache Option<Box<dyn Fn(&str) -> Vec<(String, PackageData)>>> # loader `get_family(name) -> Option<Rc<FamilyMap>>` routes every lookup through the cache; on miss it calls the loader (if any), memoising both the hit and the "no such family" answer so the loader fires at most once per family per repo. Constructors: PackageRepo::from_map(HashMap<…>) # eager, no loader (back-compat) HashMap<…>::into() → PackageRepo # same, via From PackageRepo::with_loader(loader) # lazy PackageRepo::insert_family(name, fam) # pre-seed a lazy repo Plus `family_count()` for the bench reporter (was `.len()` on the old alias). The two repo-access sites in the codebase change cleanly: - `PackageVariantList::new` (`variant.rs:372`) now does `ctx.repo.get_family(name)?` and stores `Rc<FamilyMap>` for the one family it represents (was `Rc<PackageRepo>` covering all of them). - The version-data lookup inside `get_intersection` becomes `self.versions[&version_str]` — single index, no longer double-indexed through the whole repo. The cache (`PackageVariantCache`) is unchanged: it still memoises the parsed `PackageVariantList` per family, on top of the new repo's own cache. Two cheap memos for two distinct things (parsed variants vs. raw family map); no double work. ## Python: `load_family` kwarg ```python result = pyrer.solve( requests, packages=None, # eager seed, optional load_family=my_loader, # NEW: Callable[[str], list[PackageData]] ) ``` - The eager `packages` arg now defaults to `None`. When `load_family` is given, anything in `packages` is pre-seeded into the cache so the loader is never called for those families. - The loader returns a `list[PackageData]` for the requested family. Empty list ⇒ "no such family" (cached, never re-asked). - Defensive: entries whose `name` ≠ the requested family are dropped (a misbehaving loader can't poison the repo for other families). Duplicate versions inside one loader response surface as `status='error'`. - If the Python callback raises, the loader stores the exception's message in a shared `RefCell`, returns empty (so the solver doesn't pile up errors), and the outer `solve()` surfaces it as `status='error'` before any other status. Never a Python exception out of pyrer. - GIL: kept held throughout the solve for v1. `Solver` is `!Send` because it holds `Rc`, so `Python::allow_threads` can't move it across the GIL release boundary. Practical effect: blocks other Python threads during the resolve. Worth revisiting after the internal `Rc → Arc` switch (separate change; not load-bearing for this issue's wins). ## Tests Rust (in `solver::tests`): - `test_loader_called_only_for_needed_families` — a loader-backed repo solves correctly and the loader is *not* called for an unrelated family the solver never touches. - `test_loader_called_once_per_family` — a diamond (`app → lib & util; util → lib`) loads `lib` exactly once. - `test_loader_empty_means_missing_family` — empty result for an unknown name produces a failed resolve, not a panic. Python (in `tests/test_rich_api.py`): - `test_load_family_lazy_only_touched_families` - `test_load_family_called_at_most_once_per_family` - `test_load_family_empty_means_no_such_family` - `test_load_family_works_with_eager_seed` - `test_load_family_callback_exception_surfaces_as_error` - `test_load_family_filters_mismatched_name` - `test_load_family_duplicate_versions_reports_error` ## Verification - `cargo test --lib -p rer-resolver`: **44/44** (was 41 + 3 new). - `pytest tests/`: **87/87** (was 80 + 7 new). - `cargo test --release … --ignored` (strict 188-case differential against recorded rez resolves): **188/188** in 17.37 s — unchanged. - Eager-path benchmark on the same machine as the README reference (Intel Xeon E5-2699 v4): total 11.29 s, mean 60.07 ms, 33.82× rez. README reference is 11.35 s / 60 ms / 34.1× — within noise. The `RefCell<HashMap>` hop pays nothing because each family's `Rc<FamilyMap>` is borrowed once during `PackageVariantList::new` and the cache holds a direct `Rc` thereafter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Picks up the lazy package-discovery feature (issue #86): `pyrer.solve` gains an optional `load_family` callback, and `PackageRepo` becomes a struct with cache + optional loader instead of a `HashMap` type alias. Eager-path perf and the strict 188-case rez differential are unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the `load_family` story to the user-facing docs alongside the #86 implementation: - `docs/getting-started/rez-integration.md` — new "Lazy package discovery on cold caches" section covering API, semantics, when the win is real vs flat, a worked Windows+CIFS example, the lazy variant of the monkey-patch shim, and what lazy loading does *not* fix (cross-invocation cost, GIL contention, solve-phase CPU). The pre-existing eager note picks up a forward pointer. - `docs/getting-started/quick-start.md` — short callout with the basic shape and a link to the integration page. - `docs/help/faq.md` — "Where does rer get package data from?" updated to mention both eager and lazy supply. - `CHANGELOG.md` — Unreleased section gets `load_family`, `resolved_ephemerals`, the borrowing-iterator forms, and the `PackageRepo` struct conversion. The Windows+CIFS framing matches the actual canonical motivating case: Samba-served package stores, no Windows-side page cache for SMB, every `rez env` invocation pays full network roundtrips for every reachable family. Lazy loading is the right primitive to close that gap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The bench file built its repo as `HashMap<String, HashMap<String, PackageData>>` and wrapped it in `Rc::new(...)`, which matched the old `PackageRepo` type alias. After the issue #86 conversion to a struct, two construction sites (`slice_for` and `bench_solve`) needed to wrap via `PackageRepo::from_map`. Same shape, no behaviour change — just keeps `cargo bench` compiling. Last run on this machine, criterion comparing against the prior local baseline: every bench neutral or faster, none slower. `reduce_by(fast-path)` -5.97% (p=0.00), `Solver/triple-with-pin` -3.56% (p=0.01); the rest in the noise band on the improvement side. Consistent with the `RefCell<HashMap>` indirection in `PackageRepo` being cold-pathed (one borrow per family at variant-list construction, never in the hot loop). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements issue #86 — lazy, on-demand family loading via a Python
callback. Pyrer integrations against rez can now skip the up-front BFS
that materialises every reachable family before the Rust algorithm
runs; the loader is consulted only for families the solver actually
touches, at most once each.
What changed
PackageRepois now a struct (was aHashMaptype alias) with acache plus an optional loader callback. Routes every lookup through
get_family, memoising both hits and "no such family" answers.Constructors:
from_map(eager, back-compat),with_loader(lazy),insert_family(pre-seed).FamilyLoader,FamilyMapre-exported.PackageVariantListnow holds the single family'sRc<FamilyMap>rather than the whole
Rc<PackageRepo>— the version-data lookupis a single index instead of double-indexed through the repo.
solve(...)gains an optionalload_familykwarg(
Callable[[str], list[PackageData]]);packagesbecomesoptional. The loader is invoked under the GIL; if it raises, the
error surfaces as
status='error'— never a Python exception outof pyrer.
name≠ the requested family are dropped;duplicate versions in one loader response produce
status='error'.Tests
cargo test --lib -p rer-resolverpytest tests/New Rust tests cover: loader called only for needed families, called
at most once per family, empty-list-as-missing. New Python tests cover
the same plus eager-seed integration, callback-exception surfacing,
mismatched-name filtering, and duplicate-version detection.
Test plan
Perf
Eager path: indistinguishable from README reference. The
RefCell<HashMap>indirection inPackageRepocosts essentiallynothing because each family's
Rc<FamilyMap>is borrowed once duringPackageVariantList::newand the cache holds a directRcthereafter.
Lazy path: the real win is downstream on cold caches / network
package stores — the loader is invoked only for families the solver
actually exercises. On the bundled in-memory benchmark there's no
loader to skip so this PR doesn't show that win directly; the rez
shim will.
Known limitation
The GIL is held throughout the resolve in v1. `Solver` is `!Send`
(holds `Rc`), so `Python::allow_threads` can't move it across the
GIL-release boundary. Practical effect: blocks other Python threads
during a resolve, which is the same behaviour pyrer had pre-PR.
Worth revisiting alongside an `Rc → Arc` switch, separately.
🤖 Generated with Claude Code