Conversation
…u/two-queue-filtered-search
Pull request overview
Adds a new “two-queue” filtered graph search path intended to improve recall for low-selectivity filtered queries by decoupling exploration from filter acceptance, and extends the Garnet FFI to support per-candidate callback-based filtering (in addition to bitmap filtering). Also wires the new search mode into the benchmark tooling/config.
Changes:
- Introduces `TwoQueueSearch` (DiskANN) plus scratch support (two heaps: exploration candidates + filtered results) and exports it from `diskann::graph::search`.
- Adds Garnet callback filtering via `GarnetFilterProvider` and a unified `GarnetFilter` enum to select bitmap vs. callback filtering.
- Adds a benchmark phase and a benchmark-core search wrapper for running two-queue filtered search experiments.
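To make the bitmap-vs-callback selection concrete, here is a minimal sketch of what such a unified filter enum could look like. The names `GarnetFilter` and `GarnetFilterProvider` come from the summary above, but the field types, the callback signature, and the `accepts` method are all assumptions, not the crate's actual API:

```rust
// Hypothetical per-candidate filter callback signature (assumed shape).
type FilterCandidateCallback = fn(candidate_id: u64) -> bool;

/// Assumed wrapper around a per-candidate FFI callback.
pub struct GarnetFilterProvider {
    pub callback: FilterCandidateCallback,
}

/// Unified selector: precomputed bitmap vs. lazy per-candidate callback.
pub enum GarnetFilter {
    /// Bit i set => internal id i passes the filter.
    Bitmap(Vec<bool>),
    /// Callback evaluated lazily for each candidate during search.
    Callback(GarnetFilterProvider),
}

impl GarnetFilter {
    pub fn accepts(&self, id: u64) -> bool {
        match self {
            GarnetFilter::Bitmap(bits) => bits.get(id as usize).copied().unwrap_or(false),
            GarnetFilter::Callback(p) => (p.callback)(id),
        }
    }
}
```

The enum lets one search path handle both filter kinds; the bitmap variant pays its cost up front, while the callback variant pays per visited candidate.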
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/neighbor/queue.rs | Adds “unbounded” variants of not-visited queue traversal helpers. |
| diskann/src/graph/search/two_queue_search.rs | New two-queue filtered search implementation + termination reporting. |
| diskann/src/graph/search/scratch.rs | Extends SearchScratch with candidates + filtered_results heaps and adds new_two_queue. |
| diskann/src/graph/search/mod.rs | Registers/exports the new two_queue_search module and public types. |
| diskann/src/graph/search/diverse_search.rs | Updates manual SearchScratch { ... } initializer to include new fields. |
| diskann/src/graph/config/defaults.rs | Adds RESULT_SIZE_FACTOR default for two-queue result heap sizing. |
| diskann-garnet/src/test_utils.rs | Updates test callbacks to include a no-op filter callback. |
| diskann-garnet/src/lib.rs | Extends FFI (create_index, search_*) to support callback-based filtering and max effort. |
| diskann-garnet/src/labels.rs | Adds GarnetFilterProvider and GarnetFilter enum. |
| diskann-garnet/src/garnet.rs | Extends callback bundle (Callbacks) with FilterCandidateCallback. |
| diskann-garnet/src/ffi_tests.rs | Updates FFI tests for new create_index signature. |
| diskann-garnet/src/ffi_recall_tests.rs | Updates recall tests for new create_index signature. |
| diskann-garnet/src/dyn_index.rs | Routes callback filtering to TwoQueueSearch; keeps bitmap filtering via beta-filter path. |
| diskann-benchmark/src/inputs/async_.rs | Adds TopkTwoQueueFilter phase config schema. |
| diskann-benchmark/src/backend/index/spherical.rs | Adds execution path for two-queue filtered benchmark phase (spherical backend). |
| diskann-benchmark/src/backend/index/search/knn.rs | Adds Knn runner integration for the benchmark-core TwoQueue searcher. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Adds generic backend execution path for TopkTwoQueueFilter. |
| diskann-benchmark/example/async-two-queue-filter-ground-truth-small.json | Adds an example benchmark input for two-queue filtered search. |
| diskann-benchmark-core/src/search/graph/two_queue.rs | Adds benchmark-core TwoQueue search wrapper built on diskann::TwoQueueSearch. |
| diskann-benchmark-core/src/search/graph/mod.rs | Exports the new benchmark-core TwoQueue searcher. |
```rust
    write_callback: WriteCallback,
    delete_callback: DeleteCallback,
    rmw_callback: ReadModifyWriteCallback,
    filter_callback: FilterCandidateCallback,
) -> *const c_void {
```
create_index gained a new filter_callback parameter, which changes the exported C ABI for an existing symbol. Any external callers not updated will pass the wrong arguments and can crash/UB. Consider providing a versioned entry point (e.g., create_index_v2) and keeping the old signature delegating to a default/no-op filter, or otherwise ensuring backward compatibility for existing FFI consumers.
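The versioned-entry-point suggestion could be sketched as below. Everything here is illustrative: the real `create_index` takes many more parameters (elided), and `create_index_v2` is a hypothetical name from the comment above, not an existing symbol:

```rust
use std::ffi::c_void;

// Assumed callback shape for illustration.
type FilterCandidateCallback = extern "C" fn(candidate_id: u64) -> bool;

// Default no-op filter: every candidate passes.
extern "C" fn accept_all(_candidate_id: u64) -> bool {
    true
}

#[no_mangle]
pub extern "C" fn create_index_v2(
    // ... original write/delete/rmw callbacks elided ...
    filter_callback: FilterCandidateCallback,
) -> *const c_void {
    let _ = filter_callback;
    std::ptr::null() // a real implementation would build and return the index
}

/// Old symbol keeps its original parameter list (elided here) and delegates,
/// so existing FFI consumers keep working unchanged.
#[no_mangle]
pub extern "C" fn create_index(/* original parameters elided */) -> *const c_void {
    create_index_v2(accept_all)
}
```

The key property is that the exported ABI of the existing symbol never changes; new capability is only reachable through the new symbol.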
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #929      +/-   ##
==========================================
+ Coverage   89.44%   90.25%   +0.81%
==========================================
  Files         449      451       +2
  Lines       83779    84208     +429
==========================================
+ Hits        74932    76004    +1072
+ Misses       8847     8204     -643
```

Flags with carried forward coverage won't be shown.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
K-means in `diskann-providers` was the last consumer of the old BLAS-based clustering path; PQ training has since migrated to `diskann-quantization`. The only active call site remaining was disk-index partitioning in `diskann-disk`. We will keep `diskann-providers`'s implementation for now and move it to `diskann-disk`, rather than switching to the one in `diskann-quantization`, for the following reasons:

- K-means in `diskann-providers` performs better at higher dimensions (>100): <img width="618" height="507" alt="image" src="https://github.com/user-attachments/assets/1e483411-18ae-4cc7-aa59-d9df05f4e0cf" />
- K-means in `diskann-providers` supports multi-threading: <img width="612" height="503" alt="image" src="https://github.com/user-attachments/assets/f5219632-6223-45fe-b0e0-18d40f0e2a1d" />

We will work on closing these performance gaps and converging the two implementations in separate PRs.

# Changes in this PR

## diskann-disk

- Added `src/utils/kmeans.rs` — k-means implementation moved from `diskann-providers`
- Added `src/utils/math_util.rs` — mathematical utilities (`compute_vecs_l2sq`, `compute_closest_centers`, `compute_closest_centers_in_block`, and helpers) extracted from `diskann-providers` and deduplicated
- Exported `k_means_clustering`, `k_meanspp_selecting_pivots`, `run_lloyds`, `compute_vecs_l2sq`, `compute_closest_centers`, `compute_closest_centers_in_block` from `utils/mod.rs`
- Updated `utils/partition.rs` to import kmeans functions and math utilities from local modules instead of `diskann-providers`
- Moved kmeans criterion and iai-callgrind benchmarks from `diskann-providers/benches` to `diskann-disk/benches`
- Added `proptest` and `approx` to dev-dependencies

## diskann-providers

- Deleted `src/utils/kmeans.rs`
- Removed `k_means_clustering`, `k_meanspp_selecting_pivots`, `run_lloyds`, `compute_vecs_l2sq`, `compute_vec_l2sq` from the public API
- Removed the now-deduplicated math utility implementations from `math_util.rs`
- Removed dead OPQ code: `generate_optimized_pq_pivots`, `opq_quantize_all_chunks`, `copy_chunk_centroids_to_full_table`, their test, and unused imports/constants — these were the sole remaining callers of k-means in this crate and were already gated behind `#[allow(dead_code)]`

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
I missed inlining the distance computation path for minmax... again.
Our current `ci.yml` manually specifies a Rust version instead of using `rust-toolchain.toml`. While this separation may be maintainable while we have just a single CI yaml, it will quickly spiral out of control if we have more CI jobs. The [rust-toolchain](dtolnay/rust-toolchain#133) action seems uninterested in supporting a workflow where `rust-toolchain.toml` is the source of truth. But reading the issue led me to discover that `rustup` (which **does** respect `rust-toolchain.toml`) is already installed on GitHub Actions runners. This changes our CI to use the native `rustup`, using `rustup show` to trigger fetching of the toolchain before any caching occurs.
Branch protection rules prevent merging PRs until certain gates have passed. Unfortunately, the blocking gates need to be specified explicitly. With a large number of gates like we have in our repo, this can be a little tedious (and we need to remember to update the ruleset whenever the set of gates changes). This PR is based on [this article](https://devopsdirective.com/posts/2025/08/github-actions-required-checks-for-conditional-jobs/), which takes advantage of GitHub marking skipped pipelines as [successes](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks). Quoting from the docs:

```
The check run was skipped. This is treated as a success for dependent checks in GitHub Actions.
```

To that end, this new job only runs if any of its dependent jobs failed and is skipped if all dependent jobs succeed. Our branch protection rule can then just be this singular gate.
`math_util.rs` in `diskann-providers` was a grab-bag module containing unused utilities and a vector-generation helper. This PR removes the file by deleting unused functions and replacing the remaining usage with the equivalent from `diskann-utils`.

## Changes

- **Deleted unused functions** from `diskann-providers/src/utils/math_util.rs`:
  - `process_residuals` — no callers
  - `convert_usize_to_u64` — no callers
- **Replaced `generate_vectors_with_norm`** in `diskann-tools/src/utils/random_data_generator.rs` with `f32::with_approximate_norm()` from `diskann-utils`.
- **Replaced `generate_vectors_with_norm`** in the `diskann_async.rs` sphere tests with inline Gaussian sphere sampling (using `rand_distr::StandardNormal`) and a local `CastSphericalF32` helper trait. This preserves the original Gaussian distribution required for Cosine-metric ANN tests to pass — `WithApproximateNorm` uses uniform sampling, which caused test failures.
- **Deleted `diskann-providers/src/utils/math_util.rs`** entirely.
- **Updated `diskann-providers/src/utils/mod.rs`**: removed the `pub mod math_util` declaration and its re-exports.
- **Merged `origin/main`**: resolved a rename-collision conflict where `main` moved `diskann-providers/src/utils/math_util.rs` → `diskann-disk/src/utils/math_util.rs` while this branch had already deleted it from `diskann-providers`.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
Bumps [rand](https://github.com/rust-random/rand) from 0.9.2 to 0.9.3.

**Changelog** — sourced from [rand's changelog](https://github.com/rust-random/rand/blob/0.9.3/CHANGELOG.md):

> ## [0.9.3] — 2026-02-11
>
> This release back-ports a fix from v0.10. See also [#1763](https://redirect.github.com/rust-random/rand/issues/1763).
>
> ### Changes
>
> - Deprecate feature `log` ([#1764](https://redirect.github.com/rust-random/rand/issues/1764))
> - Replace usages of `doc_auto_cfg` ([#1764](https://redirect.github.com/rust-random/rand/issues/1764))

Commits are viewable in the [compare view](https://github.com/rust-random/rand/compare/rand_core-0.9.2...0.9.3). Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR removes support for OPQ and all related code in the repo. The existing implementation is a legacy one, and the current complexity doesn't justify the benefit from this feature (if there is any at all). From what we can tell, no consumer is using this path. This should resolve issue #922.

# Changes

Deleted functions:

- `pq_construction.rs`: `generate_optimized_pq_pivots`, `opq_quantize_all_chunks`, `copy_chunk_centroids_to_full_table`
- `opq_rotation_matrix` field from `FixedChunkPQTable` and associated calls to it
- `write_rotation_matrix_data`, `read_opq_rotation_matrix`, `get_rotation_matrix_path` from `PQStorage`
- `DistanceComputerConstructionError` was removed. `ANNError::OPQError` was kept just to avoid disturbing the variant numbering of errors.

Some noteworthy API changes (please review):

- `DistanceComputer::new` and `MultiDistanceComputer::new` now return `Self` directly instead of `Result<Self, DistanceComputerConstructionError>` (construction is infallible). As a result, the quant providers now return `DistanceComputer` directly instead of a result. This affects `bf_tree`, `multi_pq`, and the in-mem PQ provider.
- Apart from that, used Copilot to fix docs that mention OPQ and remove some OPQ-specific tests in `pq_construction` and `fixed_chunk_pq_table`.

## `diskann_linalg` dependency

Deleted the benchmark for dead code paths — `chunking_closest_centers_benchmarks.rs` in `diskann-providers/bench`. As a result, was able to remove `diskann_linalg` as a dependency for `diskann-providers`. This has reduced compile time for this crate from 200s to 51s.
This is in service of the #927 work item. After talking with @hildebrandmw, `test_flaky_consolidate` was identified as an "easy" target to migrate. This change creates a `consolidate.rs` test file and implements the new flaky test.

- Create `FlakyPruneStrategy` using `test_provider::Accessor::flaky()` to produce transient errors for specific IDs during consolidation
- Remove `test_flaky_consolidate` and `SuperFlaky` from `diskann-providers`
- Clean up unused imports (`ConsolidateKind`, `workingset::self`)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Summary

Addresses the `graph_data_types.rs` cleanup item from #899.

- **Decouple diskann-providers internal code**: Replace `GraphDataType` bounds with direct type params (`T: VectorRepr`, `A`) in `VectorDataIterator`, `MemoryVectorProviderAsync`, `FastMemoryVectorProviderAsync`. Eliminates the `AdHoc<T>` wrapper from all internal usage and inmem provider type aliases.
- **Move trait to diskann-disk**: `GraphDataType` + `AdHoc` now live in `diskann-disk::data_model`. Delete trait definition, test types, and empty `traits/` module from diskann-providers.
- **Replace boilerplate**: Concrete `GraphDataType` impls in diskann-tools replaced with `type Alias = AdHoc<T>`. Benchmark's `GraphData<T>` replaced with `AdHoc<T>` directly.
- **Decouple relative_contrast.rs**: Use `T: VectorRepr` instead of `Data: GraphDataType` since only `VectorDataType` is accessed.

After this PR, `GraphDataType` exists only in `diskann-disk` (where it belongs as a disk-index concept) and `diskann-tools` (which calls disk-index APIs).
hildebrandmw
left a comment
Thanks Haiyang. There was some mention of fusing predicate checks with element retrieval. Is that something that's still planned?
```rust
                result.append(AggregatedSearchResults::Topk(search_results));
                Ok(result)
            }
            SearchPhase::TopkTwoQueueFilter(search_phase) => {
```
I'm a little worried about what adding this universally will do for compile times. Ideally, we'd have a more focused way to add extensions like this that don't encroach on our hard-won compile-time reduction efforts. It would require some backend shuffling, but I think there is a world where search phases behave more like plugins rather than enums, so we can target specific monomorphizations instead of forcing this on all instances.
What've you observed in terms of compile time differences here?
```rust
/// Default result queue capacity factor for two-queue filtered search.
/// The result queue capacity is k * this factor.
pub const RESULT_SIZE_FACTOR: usize = 10;
```
Shouldn't this live in two_queue? That's what it's meant to control, right? Things that live here should be focused on graph-building configuration. To that end, I'm not sure what FILTER_BETA is doing here either, but at least for this PR, can this constant be moved nearer to what it's meant to influence?
:) I think we wanted the defaults to be in a single place. **FILTER_BETA** was moved here last time at a previous review comment's request.
I would prefer we keep a central place for defaults. If you think this file should hold only graph-build-related defaults, then we could create a separate defaults module for everything search-related.
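Wherever the constant ends up living, its only job is sizing the result heap. A minimal sketch of its use, with the name and value taken from the diff above and the helper function being purely illustrative:

```rust
/// Default result queue capacity factor for two-queue filtered search
/// (value from the diff under review).
pub const RESULT_SIZE_FACTOR: usize = 10;

/// Hypothetical helper: capacity of the filtered-results heap for a top-k query.
pub fn result_heap_capacity(k: usize) -> usize {
    k * RESULT_SIZE_FACTOR
}
```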
```rust
    /// Filtered results for two-queue search.
    /// Max-heap of filter-passing neighbors (worst/largest distance on top for pruning).
    /// Only used during two-queue filtered search.
    pub filtered_results: BinaryHeap<Neighbor<I>>,
```
This adds two BinaryHeaps for all users of SearchScratch, just for use in two-queue algorithm. Can we instead create these two queues manually in the place where they are needed instead of putting it in a central location? Really, SearchScratch needs to get pared down.
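Allocating the heaps locally in the two-queue routine, as suggested here, might look roughly like the sketch below. `Neighbor` is modeled as a minimal integer-distance stand-in, and the function is a toy that pretends every candidate passes the filter, just to show the two-heap shape:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Minimal stand-in for the crate's Neighbor type (real one compares float distances).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Neighbor {
    distance: u32, // ordered by distance first
    id: u32,
}

/// Toy version of the suggestion: both heaps live in the function, not in SearchScratch.
fn local_two_queue_heaps(k: usize, result_size_factor: usize) -> Vec<u32> {
    // Min-heap of exploration candidates (closest on top).
    let mut candidates: BinaryHeap<Reverse<Neighbor>> = BinaryHeap::new();
    // Max-heap of filter-passing results (worst distance on top for pruning).
    let mut filtered_results: BinaryHeap<Neighbor> =
        BinaryHeap::with_capacity(k * result_size_factor);

    for (distance, id) in [(5, 1), (2, 2), (9, 3)] {
        candidates.push(Reverse(Neighbor { distance, id }));
    }
    // Pop candidates closest-first; pretend every candidate passes the filter.
    while let Some(Reverse(n)) = candidates.pop() {
        filtered_results.push(n);
        if filtered_results.len() > k {
            filtered_results.pop(); // evict the worst (largest-distance) result
        }
    }
    let mut ids: Vec<u32> = filtered_results.into_iter().map(|n| n.id).collect();
    ids.sort();
    ids
}
```

Since the heaps are plain `BinaryHeap`s, creating them at the call site costs one allocation each per search (or they could live in a two-queue-specific scratch type) rather than bloating every `SearchScratch`.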
```rust
    /// Like [`closest_notvisited`](Self::closest_notvisited), but ignores the `search_param_l`
    /// bound and considers all entries in the queue. Use this for resizable/unbounded queues
    /// (e.g. two-queue filtered search) where exploration should not be capped at L.
    pub fn closest_notvisited_unbounded(&mut self) -> Option<Neighbor<I>> {
```
Can these be removed? It doesn't look like they are used?
```rust
    /// Filter evaluator for determining node matches.
    pub filter: &'q dyn QueryLabelProvider<InternalId>,
    /// Maximum number of hops before stopping search.
    pub max_candidates: usize,
```
Are there invariants that exist among these fields? If so - they shouldn't be public and this struct should have a dedicated constructor.
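If such invariants exist, the usual Rust pattern is private fields plus a fallible constructor. The sketch below is illustrative only: the struct name, fields, and the specific invariant (`max_candidates >= k`, so the budget can ever fill the result set) are assumptions, not taken from the PR:

```rust
/// Hypothetical parameter struct with a dedicated constructor.
pub struct TwoQueueParams {
    k: usize,              // private: callers can't break the invariant
    max_candidates: usize, // private for the same reason
}

impl TwoQueueParams {
    /// Enforce an assumed invariant at construction time.
    pub fn new(k: usize, max_candidates: usize) -> Result<Self, String> {
        if max_candidates < k {
            return Err(format!(
                "max_candidates ({max_candidates}) must be >= k ({k})"
            ));
        }
        Ok(Self { k, max_candidates })
    }

    pub fn k(&self) -> usize {
        self.k
    }

    pub fn max_candidates(&self) -> usize {
        self.max_candidates
    }
}
```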
```rust
#[derive(Debug)]
pub struct TwoQueueSearch<'q, InternalId> {
    /// Base graph search parameters (k, ef/l_value, beam_width).
    pub inner: Knn,
```
It doesn't look like Knn::search_l() is used at all in the two-queue method (outside of a size hint). If that's the case, then this shouldn't wrap Knn and instead capture just the parameters that actually affect the algorithm.
```rust
        // Check filter on start point
        if let QueryVisitDecision::Accept(n) = filter.on_visit(neighbor) {
            scratch.filtered_results.push(n);
        }
```
What about QueryVisitDecision::Terminate?
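One way to cover all variants is an exhaustive `match` instead of `if let`. The sketch below models `QueryVisitDecision` locally; only `Accept` and `Terminate` are confirmed by this thread, and `Reject` plus the exact semantics of `Terminate` are assumptions:

```rust
// Local model of the decision enum; variant set beyond Accept/Terminate is assumed.
enum QueryVisitDecision<N> {
    Accept(N),
    Reject,
    Terminate,
}

/// Returns false when the search should stop immediately (assumed Terminate semantics).
fn visit_start_point(
    decision: QueryVisitDecision<u32>,
    filtered_results: &mut Vec<u32>,
) -> bool {
    match decision {
        QueryVisitDecision::Accept(n) => {
            filtered_results.push(n); // record the filter-passing start point
            true
        }
        QueryVisitDecision::Reject => true,     // keep exploring without recording
        QueryVisitDecision::Terminate => false, // abort the whole search
    }
}
```

An exhaustive `match` also makes future variants a compile error rather than a silently ignored case, which is exactly the bug this comment is pointing at.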
Summary
- Adds a two-queue filtered graph search that decouples exploration from filter evaluation, improving recall for low-selectivity filtered queries
- Extends the Garnet FFI to support per-candidate FFI filter evaluation from Garnet/C#
Motivation
The existing beta-filtered search works well when filters are moderately selective, but struggles with low-selectivity
filters where most candidates are rejected. In those cases, the search converges prematurely because pruning is based
on distance to filtered results that haven't been found yet. The two-queue approach keeps exploration broad until
enough filtered results are accumulated.
Design
Two-Queue Search (two_queue_search.rs)
- Maintains a separate max-heap (filtered_results) for filter-passing neighbors
- Keeps exploring while candidates are closer than the worst filtered result
- Bounds the result heap at a fixed capacity (k * result_size_factor)
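The design bullets above can be sketched as a small self-contained loop. This is a heavily simplified illustration, not the PR's implementation: distances are precomputed integers, the graph is an adjacency list, the filter is a plain predicate, and `max_candidates` is the exploration budget:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Toy two-queue filtered search: explore by distance, but prune only
/// against accepted (filter-passing) results.
fn two_queue_search(
    graph: &[Vec<usize>],            // adjacency list
    dist: &[u32],                    // distance of each node to the query
    filter: impl Fn(usize) -> bool,  // per-candidate filter predicate
    start: usize,
    k: usize,
    max_candidates: usize,
) -> Vec<usize> {
    let mut candidates = BinaryHeap::new(); // min-heap of (distance, node)
    let mut results: BinaryHeap<(u32, usize)> = BinaryHeap::new(); // max-heap
    let mut visited = HashSet::new();
    candidates.push(Reverse((dist[start], start)));
    let mut expanded = 0;

    while let Some(Reverse((d, node))) = candidates.pop() {
        if expanded >= max_candidates {
            break; // exploration budget exhausted
        }
        // Prune only once k filtered results exist and this candidate is worse
        // than the worst of them; otherwise keep exploration broad.
        if results.len() >= k && d > results.peek().unwrap().0 {
            break;
        }
        if !visited.insert(node) {
            continue;
        }
        expanded += 1;
        if filter(node) {
            results.push((d, node));
            if results.len() > k {
                results.pop(); // drop the worst filtered result
            }
        }
        for &nbr in &graph[node] {
            if !visited.contains(&nbr) {
                candidates.push(Reverse((dist[nbr], nbr)));
            }
        }
    }

    let mut out: Vec<usize> = results.into_iter().map(|(_, n)| n).collect();
    out.sort_by_key(|&n| dist[n]);
    out
}
```

The key difference from single-queue filtered search is visible in the prune condition: rejected nodes never tighten the bound, so a low-selectivity filter cannot cause premature convergence, only the candidate budget can stop the walk.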