Conversation
…u/two-queue-filtered-search
Pull request overview
Adds a new “two-queue” filtered graph search path intended to improve recall for low-selectivity filtered queries by decoupling exploration from filter acceptance, and extends the Garnet FFI to support per-candidate callback-based filtering (in addition to bitmap filtering). Also wires the new search mode into the benchmark tooling/config.
Changes:
- Introduces `TwoQueueSearch` (DiskANN) plus scratch support (two heaps: exploration candidates + filtered results) and exports it from `diskann::graph::search`.
- Adds Garnet callback filtering via `GarnetFilterProvider` and a unified `GarnetFilter` enum to select bitmap vs. callback filtering.
- Adds a benchmark phase and a benchmark-core search wrapper for running two-queue filtered search experiments.
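To make the bitmap-vs-callback selection concrete, here is a minimal sketch of what such a unified filter enum could look like. The names `GarnetFilter` and `GarnetFilterProvider` come from the summary above, but the field types, the callback signature, and the `accepts` method are all assumptions, not the crate's actual API:

```rust
// Hypothetical per-candidate filter callback signature (assumed shape).
type FilterCandidateCallback = fn(candidate_id: u64) -> bool;

/// Assumed wrapper around a per-candidate FFI callback.
pub struct GarnetFilterProvider {
    pub callback: FilterCandidateCallback,
}

/// Unified selector: precomputed bitmap vs. lazy per-candidate callback.
pub enum GarnetFilter {
    /// Bit i set => internal id i passes the filter.
    Bitmap(Vec<bool>),
    /// Callback evaluated lazily for each candidate during search.
    Callback(GarnetFilterProvider),
}

impl GarnetFilter {
    pub fn accepts(&self, id: u64) -> bool {
        match self {
            GarnetFilter::Bitmap(bits) => bits.get(id as usize).copied().unwrap_or(false),
            GarnetFilter::Callback(p) => (p.callback)(id),
        }
    }
}
```

The enum lets one search path handle both filter kinds; the bitmap variant pays its cost up front, while the callback variant pays per visited candidate.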
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/neighbor/queue.rs | Adds “unbounded” variants of not-visited queue traversal helpers. |
| diskann/src/graph/search/two_queue_search.rs | New two-queue filtered search implementation + termination reporting. |
| diskann/src/graph/search/scratch.rs | Extends SearchScratch with candidates + filtered_results heaps and adds new_two_queue. |
| diskann/src/graph/search/mod.rs | Registers/exports the new two_queue_search module and public types. |
| diskann/src/graph/search/diverse_search.rs | Updates manual SearchScratch { ... } initializer to include new fields. |
| diskann/src/graph/config/defaults.rs | Adds RESULT_SIZE_FACTOR default for two-queue result heap sizing. |
| diskann-garnet/src/test_utils.rs | Updates test callbacks to include a no-op filter callback. |
| diskann-garnet/src/lib.rs | Extends FFI (create_index, search_*) to support callback-based filtering and max effort. |
| diskann-garnet/src/labels.rs | Adds GarnetFilterProvider and GarnetFilter enum. |
| diskann-garnet/src/garnet.rs | Extends callback bundle (Callbacks) with FilterCandidateCallback. |
| diskann-garnet/src/ffi_tests.rs | Updates FFI tests for new create_index signature. |
| diskann-garnet/src/ffi_recall_tests.rs | Updates recall tests for new create_index signature. |
| diskann-garnet/src/dyn_index.rs | Routes callback filtering to TwoQueueSearch; keeps bitmap filtering via beta-filter path. |
| diskann-benchmark/src/inputs/async_.rs | Adds TopkTwoQueueFilter phase config schema. |
| diskann-benchmark/src/backend/index/spherical.rs | Adds execution path for two-queue filtered benchmark phase (spherical backend). |
| diskann-benchmark/src/backend/index/search/knn.rs | Adds Knn runner integration for the benchmark-core TwoQueue searcher. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Adds generic backend execution path for TopkTwoQueueFilter. |
| diskann-benchmark/example/async-two-queue-filter-ground-truth-small.json | Adds an example benchmark input for two-queue filtered search. |
| diskann-benchmark-core/src/search/graph/two_queue.rs | Adds benchmark-core TwoQueue search wrapper built on diskann::TwoQueueSearch. |
| diskann-benchmark-core/src/search/graph/mod.rs | Exports the new benchmark-core TwoQueue searcher. |
```rust
    write_callback: WriteCallback,
    delete_callback: DeleteCallback,
    rmw_callback: ReadModifyWriteCallback,
    filter_callback: FilterCandidateCallback,
) -> *const c_void {
```
create_index gained a new filter_callback parameter, which changes the exported C ABI for an existing symbol. Any external callers not updated will pass the wrong arguments and can crash/UB. Consider providing a versioned entry point (e.g., create_index_v2) and keeping the old signature delegating to a default/no-op filter, or otherwise ensuring backward compatibility for existing FFI consumers.
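The versioned-entry-point suggestion could be sketched as below. Everything here is illustrative: the real `create_index` takes many more parameters (elided), and `create_index_v2` is a hypothetical name from the comment above, not an existing symbol:

```rust
use std::ffi::c_void;

// Assumed callback shape for illustration.
type FilterCandidateCallback = extern "C" fn(candidate_id: u64) -> bool;

// Default no-op filter: every candidate passes.
extern "C" fn accept_all(_candidate_id: u64) -> bool {
    true
}

#[no_mangle]
pub extern "C" fn create_index_v2(
    // ... original write/delete/rmw callbacks elided ...
    filter_callback: FilterCandidateCallback,
) -> *const c_void {
    let _ = filter_callback;
    std::ptr::null() // a real implementation would build and return the index
}

/// Old symbol keeps its original parameter list (elided here) and delegates,
/// so existing FFI consumers keep working unchanged.
#[no_mangle]
pub extern "C" fn create_index(/* original parameters elided */) -> *const c_void {
    create_index_v2(accept_all)
}
```

The key property is that the exported ABI of the existing symbol never changes; new capability is only reachable through the new symbol.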
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #929      +/-   ##
==========================================
+ Coverage   89.44%   90.25%   +0.81%
==========================================
  Files         449      451       +2
  Lines       83779    84208     +429
==========================================
+ Hits        74932    76004    +1072
+ Misses       8847     8204     -643
```

Flags with carried forward coverage won't be shown.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
K-means in `diskann-providers` was the last consumer of the old BLAS-based clustering path; PQ training has since migrated to `diskann-quantization`. The only active call site remaining was disk-index partitioning in `diskann-disk`. We will keep `diskann-providers`'s implementation for now and move it to `diskann-disk`, rather than switching to the one in `diskann-quantization`, for the following reasons:

- K-means in `diskann-providers` performs better at higher dimensions (>100): <img width="618" height="507" alt="image" src="https://github.com/user-attachments/assets/1e483411-18ae-4cc7-aa59-d9df05f4e0cf" />
- K-means in `diskann-providers` supports multi-threading: <img width="612" height="503" alt="image" src="https://github.com/user-attachments/assets/f5219632-6223-45fe-b0e0-18d40f0e2a1d" />

We will work on closing these performance gaps and converging the two implementations in separate PRs.

# Changes in this PR

## diskann-disk

- Added `src/utils/kmeans.rs` — k-means implementation moved from `diskann-providers`
- Added `src/utils/math_util.rs` — mathematical utilities (`compute_vecs_l2sq`, `compute_closest_centers`, `compute_closest_centers_in_block`, and helpers) extracted from `diskann-providers` and deduplicated
- Exported `k_means_clustering`, `k_meanspp_selecting_pivots`, `run_lloyds`, `compute_vecs_l2sq`, `compute_closest_centers`, `compute_closest_centers_in_block` from `utils/mod.rs`
- Updated `utils/partition.rs` to import kmeans functions and math utilities from local modules instead of `diskann-providers`
- Moved kmeans criterion and iai-callgrind benchmarks from `diskann-providers/benches` to `diskann-disk/benches`
- Added `proptest` and `approx` to dev-dependencies

## diskann-providers

- Deleted `src/utils/kmeans.rs`
- Removed `k_means_clustering`, `k_meanspp_selecting_pivots`, `run_lloyds`, `compute_vecs_l2sq`, `compute_vec_l2sq` from the public API
- Removed the now-deduplicated math utility implementations from `math_util.rs`
- Removed dead OPQ code: `generate_optimized_pq_pivots`, `opq_quantize_all_chunks`, `copy_chunk_centroids_to_full_table`, their test, and unused imports/constants — these were the sole remaining callers of k-means in this crate and were already gated behind `#[allow(dead_code)]`

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
I missed inlining the distance computation path for minmax... again.
Our current `ci.yml` manually specifies a Rust version instead of using `rust-toolchain.toml`. While this separation may be maintainable while we have just a single CI yaml, it will quickly spiral out of control if we have more CI jobs. The [rust-toolchain](dtolnay/rust-toolchain#133) action seems uninterested in supporting a workflow where `rust-toolchain.toml` is the source of truth. But reading the issue led me to discover that `rustup` (which **does** respect `rust-toolchain.toml`) is already installed on GitHub Actions runners. This changes our CI to use the native `rustup`, using `rustup show` to trigger fetching of the toolchain before any caching occurs.
Branch protection rules prevent merging PRs until certain gates have passed. Unfortunately, the blocking gates need to be specified explicitly. With a large number of gates like we have in our repo, this can be a little tedious (and we need to remember to update the ruleset whenever the set of gates changes). This PR is based on [this article](https://devopsdirective.com/posts/2025/08/github-actions-required-checks-for-conditional-jobs/), which takes advantage of GitHub marking skipped pipelines as [successes](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks). Quoting from the docs:

```
The check run was skipped. This is treated as a success for dependent checks in GitHub Actions.
```

To that end, this new job only runs if any of its dependent jobs failed and is skipped if all dependent jobs succeed. Our branch protection rule can then just be this singular gate.
`math_util.rs` in `diskann-providers` was a grab-bag module containing unused utilities and a vector-generation helper. This PR removes the file by deleting unused functions and replacing the remaining usage with the equivalent from `diskann-utils`.

## Changes

- **Deleted unused functions** from `diskann-providers/src/utils/math_util.rs`:
  - `process_residuals` — no callers
  - `convert_usize_to_u64` — no callers
- **Replaced `generate_vectors_with_norm`** in `diskann-tools/src/utils/random_data_generator.rs` with `f32::with_approximate_norm()` from `diskann-utils`.
- **Replaced `generate_vectors_with_norm`** in the `diskann_async.rs` sphere tests with inline Gaussian sphere sampling (using `rand_distr::StandardNormal`) and a local `CastSphericalF32` helper trait. This preserves the original Gaussian distribution required for Cosine-metric ANN tests to pass — `WithApproximateNorm` uses uniform sampling, which caused test failures.
- **Deleted `diskann-providers/src/utils/math_util.rs`** entirely.
- **Updated `diskann-providers/src/utils/mod.rs`**: removed the `pub mod math_util` declaration and its re-exports.
- **Merged `origin/main`**: resolved a rename-collision conflict where `main` moved `diskann-providers/src/utils/math_util.rs` → `diskann-disk/src/utils/math_util.rs` while this branch had already deleted it from `diskann-providers`.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
Bumps [rand](https://github.com/rust-random/rand) from 0.9.2 to 0.9.3.

**Changelog** — sourced from [rand's changelog](https://github.com/rust-random/rand/blob/0.9.3/CHANGELOG.md):

> ## [0.9.3] — 2026-02-11
>
> This release back-ports a fix from v0.10. See also [#1763](https://redirect.github.com/rust-random/rand/issues/1763).
>
> ### Changes
>
> - Deprecate feature `log` ([#1764](https://redirect.github.com/rust-random/rand/issues/1764))
> - Replace usages of `doc_auto_cfg` ([#1764](https://redirect.github.com/rust-random/rand/issues/1764))

Commits are viewable in the [compare view](https://github.com/rust-random/rand/compare/rand_core-0.9.2...0.9.3). Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR removes support for OPQ and all related code in the repo. The existing implementation is a legacy one, and the current complexity doesn't justify the benefit from this feature (if there is any at all). From what we can tell, no consumer is using this path. This should resolve issue #922.

# Changes

Deleted functions:

- `pq_construction.rs`: `generate_optimized_pq_pivots`, `opq_quantize_all_chunks`, `copy_chunk_centroids_to_full_table`
- `opq_rotation_matrix` field from `FixedChunkPQTable` and associated calls to it
- `write_rotation_matrix_data`, `read_opq_rotation_matrix`, `get_rotation_matrix_path` from `PQStorage`
- `DistanceComputerConstructionError` was removed. `ANNError::OPQError` was kept just to avoid disturbing the variant numbering of errors.

Some noteworthy API changes (please review):

- `DistanceComputer::new` and `MultiDistanceComputer::new` now return `Self` directly instead of `Result<Self, DistanceComputerConstructionError>` (construction is infallible). As a result, the quant providers now return `DistanceComputer` directly instead of a result. This affects `bf_tree`, `multi_pq`, and the in-mem PQ provider.
- Apart from that, used Copilot to fix docs that mention OPQ and remove some OPQ-specific tests in `pq_construction` and `fixed_chunk_pq_table`.

## `diskann_linalg` dependency

Deleted the benchmark for dead code paths — `chunking_closest_centers_benchmarks.rs` in `diskann-providers/bench`. As a result, was able to remove `diskann_linalg` as a dependency for `diskann-providers`. This has reduced compile time for this crate from 200s to 51s.
This is in service of the #927 work item. After talking with @hildebrandmw, `test_flaky_consolidate` was identified as an "easy" target to migrate. This change creates a `consolidate.rs` test file and implements the new flaky test.

- Create `FlakyPruneStrategy` using `test_provider::Accessor::flaky()` to produce transient errors for specific IDs during consolidation
- Remove `test_flaky_consolidate` and `SuperFlaky` from `diskann-providers`
- Clean up unused imports (`ConsolidateKind`, `workingset::self`)

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Summary

Addresses the `graph_data_types.rs` cleanup item from #899.

- **Decouple diskann-providers internal code**: Replace `GraphDataType` bounds with direct type params (`T: VectorRepr`, `A`) in `VectorDataIterator`, `MemoryVectorProviderAsync`, `FastMemoryVectorProviderAsync`. Eliminates the `AdHoc<T>` wrapper from all internal usage and inmem provider type aliases.
- **Move trait to diskann-disk**: `GraphDataType` + `AdHoc` now live in `diskann-disk::data_model`. Delete trait definition, test types, and empty `traits/` module from diskann-providers.
- **Replace boilerplate**: Concrete `GraphDataType` impls in diskann-tools replaced with `type Alias = AdHoc<T>`. Benchmark's `GraphData<T>` replaced with `AdHoc<T>` directly.
- **Decouple relative_contrast.rs**: Use `T: VectorRepr` instead of `Data: GraphDataType` since only `VectorDataType` is accessed.

After this PR, `GraphDataType` exists only in `diskann-disk` (where it belongs as a disk-index concept) and `diskann-tools` (which calls disk-index APIs).
hildebrandmw
left a comment
Thanks Haiyang. There was some mention of fusing predicate checks with element retrieval. Is that something that's still planned?
```rust
                result.append(AggregatedSearchResults::Topk(search_results));
                Ok(result)
            }
            SearchPhase::TopkTwoQueueFilter(search_phase) => {
```
I'm a little worried about what adding this universally will do for compile times. Ideally, we'd have a more focused way to add extensions like this that don't encroach on our hard-won compile-time reduction efforts. It would require some backend shuffling, but I think there is a world where search phases behave more like plugins rather than enums, so we can target specific monomorphizations instead of forcing this on all instances.
What've you observed in terms of compile time differences here?
```rust
/// Default result queue capacity factor for two-queue filtered search.
/// The result queue capacity is k * this factor.
pub const RESULT_SIZE_FACTOR: usize = 10;
```
Shouldn't this live in two_queue? That's what it's meant to control, right? Things that live here should be focused on graph-building configuration. To that end, I'm not sure what FILTER_BETA is doing here either, but at least for this PR, can this constant be moved nearer to what it's meant to influence?
:) I think we wanted the defaults to be in a single place. **FILTER_BETA** was moved here last time at a previous review comment's request.
I would prefer we keep a central place for defaults. If you think this file should hold only graph-build-related defaults, then we could create a separate defaults module for everything search-related.
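Wherever the constant ends up living, its only job is sizing the result heap. A minimal sketch of its use, with the name and value taken from the diff above and the helper function being purely illustrative:

```rust
/// Default result queue capacity factor for two-queue filtered search
/// (value from the diff under review).
pub const RESULT_SIZE_FACTOR: usize = 10;

/// Hypothetical helper: capacity of the filtered-results heap for a top-k query.
pub fn result_heap_capacity(k: usize) -> usize {
    k * RESULT_SIZE_FACTOR
}
```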
```rust
    /// Filtered results for two-queue search.
    /// Max-heap of filter-passing neighbors (worst/largest distance on top for pruning).
    /// Only used during two-queue filtered search.
    pub filtered_results: BinaryHeap<Neighbor<I>>,
```
This adds two BinaryHeaps for all users of SearchScratch, just for use in two-queue algorithm. Can we instead create these two queues manually in the place where they are needed instead of putting it in a central location? Really, SearchScratch needs to get pared down.
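Allocating the heaps locally in the two-queue routine, as suggested here, might look roughly like the sketch below. `Neighbor` is modeled as a minimal integer-distance stand-in, and the function is a toy that pretends every candidate passes the filter, just to show the two-heap shape:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Minimal stand-in for the crate's Neighbor type (real one compares float distances).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Neighbor {
    distance: u32, // ordered by distance first
    id: u32,
}

/// Toy version of the suggestion: both heaps live in the function, not in SearchScratch.
fn local_two_queue_heaps(k: usize, result_size_factor: usize) -> Vec<u32> {
    // Min-heap of exploration candidates (closest on top).
    let mut candidates: BinaryHeap<Reverse<Neighbor>> = BinaryHeap::new();
    // Max-heap of filter-passing results (worst distance on top for pruning).
    let mut filtered_results: BinaryHeap<Neighbor> =
        BinaryHeap::with_capacity(k * result_size_factor);

    for (distance, id) in [(5, 1), (2, 2), (9, 3)] {
        candidates.push(Reverse(Neighbor { distance, id }));
    }
    // Pop candidates closest-first; pretend every candidate passes the filter.
    while let Some(Reverse(n)) = candidates.pop() {
        filtered_results.push(n);
        if filtered_results.len() > k {
            filtered_results.pop(); // evict the worst (largest-distance) result
        }
    }
    let mut ids: Vec<u32> = filtered_results.into_iter().map(|n| n.id).collect();
    ids.sort();
    ids
}
```

Since the heaps are plain `BinaryHeap`s, creating them at the call site costs one allocation each per search (or they could live in a two-queue-specific scratch type) rather than bloating every `SearchScratch`.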
```rust
    /// Like [`closest_notvisited`](Self::closest_notvisited), but ignores the `search_param_l`
    /// bound and considers all entries in the queue. Use this for resizable/unbounded queues
    /// (e.g. two-queue filtered search) where exploration should not be capped at L.
    pub fn closest_notvisited_unbounded(&mut self) -> Option<Neighbor<I>> {
```
Can these be removed? It doesn't look like they are used?
```rust
    /// Filter evaluator for determining node matches.
    pub filter: &'q dyn QueryLabelProvider<InternalId>,
    /// Maximum number of hops before stopping search.
    pub max_candidates: usize,
```
Are there invariants that exist among these fields? If so - they shouldn't be public and this struct should have a dedicated constructor.
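If such invariants exist, the usual Rust pattern is private fields plus a fallible constructor. The sketch below is illustrative only: the struct name, fields, and the specific invariant (`max_candidates >= k`, so the budget can ever fill the result set) are assumptions, not taken from the PR:

```rust
/// Hypothetical parameter struct with a dedicated constructor.
pub struct TwoQueueParams {
    k: usize,              // private: callers can't break the invariant
    max_candidates: usize, // private for the same reason
}

impl TwoQueueParams {
    /// Enforce an assumed invariant at construction time.
    pub fn new(k: usize, max_candidates: usize) -> Result<Self, String> {
        if max_candidates < k {
            return Err(format!(
                "max_candidates ({max_candidates}) must be >= k ({k})"
            ));
        }
        Ok(Self { k, max_candidates })
    }

    pub fn k(&self) -> usize {
        self.k
    }

    pub fn max_candidates(&self) -> usize {
        self.max_candidates
    }
}
```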
```rust
#[derive(Debug)]
pub struct TwoQueueSearch<'q, InternalId> {
    /// Base graph search parameters (k, ef/l_value, beam_width).
    pub inner: Knn,
```
It doesn't look like Knn::search_l() is used at all in the two-queue method (outside of a size hint). If that's the case, then this shouldn't wrap Knn and instead capture just the parameters that actually affect the algorithm.
```rust
        // Check filter on start point
        if let QueryVisitDecision::Accept(n) = filter.on_visit(neighbor) {
            scratch.filtered_results.push(n);
        }
```
What about QueryVisitDecision::Terminate?
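One way to cover all variants is an exhaustive `match` instead of `if let`. The sketch below models `QueryVisitDecision` locally; only `Accept` and `Terminate` are confirmed by this thread, and `Reject` plus the exact semantics of `Terminate` are assumptions:

```rust
// Local model of the decision enum; variant set beyond Accept/Terminate is assumed.
enum QueryVisitDecision<N> {
    Accept(N),
    Reject,
    Terminate,
}

/// Returns false when the search should stop immediately (assumed Terminate semantics).
fn visit_start_point(
    decision: QueryVisitDecision<u32>,
    filtered_results: &mut Vec<u32>,
) -> bool {
    match decision {
        QueryVisitDecision::Accept(n) => {
            filtered_results.push(n); // record the filter-passing start point
            true
        }
        QueryVisitDecision::Reject => true,     // keep exploring without recording
        QueryVisitDecision::Terminate => false, // abort the whole search
    }
}
```

An exhaustive `match` also makes future variants a compile error rather than a silently ignored case, which is exactly the bug this comment is pointing at.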
Summary
- Adds a two-queue filtered graph search that decouples exploration from filter evaluation, improving recall for low-selectivity filtered queries
- Extends the Garnet FFI to support per-candidate FFI filter evaluation from Garnet/C#
Motivation
The existing beta-filtered search works well when filters are moderately selective, but struggles with low-selectivity
filters where most candidates are rejected. In those cases, the search converges prematurely because pruning is based
on distance to filtered results that haven't been found yet. The two-queue approach keeps exploration broad until
enough filtered results are accumulated.
Design
Two-Queue Search (two_queue_search.rs)
- Maintains a separate max-heap (filtered_results) for filter-passing neighbors
- Keeps exploring while candidates are closer than the worst filtered result
- Bounds the result heap at a fixed capacity (k * result_size_factor)
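The design bullets above can be sketched as a small self-contained loop. This is a heavily simplified illustration, not the PR's implementation: distances are precomputed integers, the graph is an adjacency list, the filter is a plain predicate, and `max_candidates` is the exploration budget:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Toy two-queue filtered search: explore by distance, but prune only
/// against accepted (filter-passing) results.
fn two_queue_search(
    graph: &[Vec<usize>],            // adjacency list
    dist: &[u32],                    // distance of each node to the query
    filter: impl Fn(usize) -> bool,  // per-candidate filter predicate
    start: usize,
    k: usize,
    max_candidates: usize,
) -> Vec<usize> {
    let mut candidates = BinaryHeap::new(); // min-heap of (distance, node)
    let mut results: BinaryHeap<(u32, usize)> = BinaryHeap::new(); // max-heap
    let mut visited = HashSet::new();
    candidates.push(Reverse((dist[start], start)));
    let mut expanded = 0;

    while let Some(Reverse((d, node))) = candidates.pop() {
        if expanded >= max_candidates {
            break; // exploration budget exhausted
        }
        // Prune only once k filtered results exist and this candidate is worse
        // than the worst of them; otherwise keep exploration broad.
        if results.len() >= k && d > results.peek().unwrap().0 {
            break;
        }
        if !visited.insert(node) {
            continue;
        }
        expanded += 1;
        if filter(node) {
            results.push((d, node));
            if results.len() > k {
                results.pop(); // drop the worst filtered result
            }
        }
        for &nbr in &graph[node] {
            if !visited.contains(&nbr) {
                candidates.push(Reverse((dist[nbr], nbr)));
            }
        }
    }

    let mut out: Vec<usize> = results.into_iter().map(|(_, n)| n).collect();
    out.sort_by_key(|&n| dist[n]);
    out
}
```

The key difference from single-queue filtered search is visible in the prune condition: rejected nodes never tighten the bound, so a low-selectivity filter cannot cause premature convergence, only the candidate budget can stop the walk.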