Skip to content

Adding tests for diverse_search, diverse_priority_queue, and a small transient suite.#972

Open
JordanMaples wants to merge 6 commits intomainfrom
jordanmaples/misc_remaining_tests
Open

Adding tests for diverse_search, diverse_priority_queue, and a small transient suite.#972
JordanMaples wants to merge 6 commits intomainfrom
jordanmaples/misc_remaining_tests

Conversation

@JordanMaples
Copy link
Copy Markdown
Contributor

I had Copilot take a look at some of the test coverage numbers and write up some unit tests to address some
gaps.

  • diverse_search was only gaining test coverage in diskann-providers, so unit tests were added directly in
    diskann. The test in providers will remain as it is an integration test and requires tight coupling with inmem.
  • Added some transient error tests that were gaps highlighted following the implementation of the flaky tests
    added in Migrate test_flaky_consolidate from diskann_providers to diskann #942.
  • Added some extra validation in diverse_priority_queue.

JordanMaples and others added 5 commits April 22, 2026 14:18
- Amend test_insert_inner_queue_full to verify Case 3 local queue cleanup:
  assert evicted item is removed from its attribute's local queue, not just
  from the global queue.
- Add test_post_process_empty_queue: confirm post_process is a no-op on an
  empty queue without panicking.
- Add test_post_process_no_trimming_needed: confirm post_process leaves queues
  unchanged when all local queues are already within diverse_results_k.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Enable flaky search_accessor in test provider Strategy so transient_ids
  are respected during search (matching existing prune_accessor behavior).
- Add search_tolerates_transient_errors: exercises allow_transient in
  ExpandBeam during beam expansion with flaky get_element calls.
- Add insert_tolerates_transient_prune_errors: exercises transient error
  handling during robust_prune_list when building an index one-by-one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 6 tests exercising Diverse search through DiskANNIndex::search using
the in-memory test provider and synthetic Grid data:

- diverse_params_accessors: verify Diverse/DiverseSearchParams getters
- diverse_search_returns_results: basic search produces sorted distances
- diverse_search_produces_multiple_labels_2d: results span multiple labels
- diverse_search_produces_multiple_labels_3d: same for 3D grid
- diverse_search_limits_per_label_count: diversity reduces per-label concentration
- diverse_search_sorted_distances_various_params: sorted distances across configs

Tests are feature-gated behind experimental_diversity_search.

Note: strict per-attribute cardinality (diverse_results_k) is tested at
the queue level in diverse_priority_queue_test. Search-level tests verify
end-to-end behavior because NeighborPriorityQueue::remove can leave
orphaned items in the global queue during Case 2 eviction, weakening
the strict guarantee at the search integration level.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Beam width variations (1, 2, 4) mirroring grid_search patterns
- Empty attribute provider: all get() → None yields 0 results
- diverse_results_l = 0 (div_k=0): documents panic via #[should_panic]
- diverse_results_l = 1: tightest non-degenerate local queue config
- diverse_results_k > k: over-provisioned diversity (no-op post_process)
- result_count <= k invariant across many (k, l, div_k) combos
- Refactored run_diverse_search to support optional beam_width param

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands unit test coverage in diskann for experimental diverse search and for transient (non-critical) error handling paths, and adjusts the in-memory test provider strategy to optionally inject transient accessor failures during search.

Changes:

  • Added/expanded tests for DiverseNeighborQueue post-processing and eviction bookkeeping.
  • Added a new transient error test suite covering search beam-expansion and insert-time pruning tolerance.
  • Added crate-level diverse search tests (behind experimental_diversity_search) and enabled flaky accessors in the test provider strategy.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
diskann/src/neighbor/diverse_priority_queue.rs Adds stronger assertions and new post_process edge-case tests for the diverse priority queue.
diskann/src/graph/test/provider.rs Enables transient/flaky accessor injection via Strategy::with_transient for search/prune accessors.
diskann/src/graph/test/cases/transient_errors.rs New tests ensuring transient accessor failures are tolerated during search and insert pruning.
diskann/src/graph/test/cases/mod.rs Registers the new test modules (diverse_search behind feature flag; transient_errors always).
diskann/src/graph/test/cases/diverse_search.rs New crate-level tests exercising diverse search behavior and parameter accessors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann/src/graph/test/cases/transient_errors.rs Outdated
Comment thread diskann/src/neighbor/diverse_priority_queue.rs Outdated
Comment on lines +379 to +392
// ─── diverse_results_l boundary: div_l = 0 panics ────────────────────────────

#[test]
#[should_panic]
fn diverse_search_div_l_zero_panics() {
// When diverse_results_k * l_value / k_value rounds to 0, local queues
// have capacity 0. The first insert with an attribute triggers Case 2
// (local queue "full" at size 0 == capacity 0), which indexes at
// diverse_results_l - 1 = usize::MAX. This is an unchecked edge case.
let (index, attr) = setup_diverse_grid(Grid::Two, 4, 3);

let query = vec![0.0f32; 2];
// k=10, l=5, div_k=1 → diverse_results_l = 1 * 5 / 10 = 0
let _result = run_diverse_search(&index, &query, 10, 10, 0, attr);
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diverse_search_div_l_zero_panics is currently driven by diverse_results_k = 0 (run_diverse_search(..., 10, 10, 0, ...)), not by the documented rounding case (diverse_results_k * l / k == 0 with diverse_results_k > 0). This makes the test misleading and effectively encodes “invalid input causes an internal underflow panic”. Prefer adding/using input validation (e.g., reject diverse_results_k == 0 with a clear error) and updating this test to assert the validation behavior rather than #[should_panic] on an internal implementation detail.

Suggested change
// ─── diverse_results_l boundary: div_l = 0 panics ────────────────────────────
#[test]
#[should_panic]
fn diverse_search_div_l_zero_panics() {
// When diverse_results_k * l_value / k_value rounds to 0, local queues
// have capacity 0. The first insert with an attribute triggers Case 2
// (local queue "full" at size 0 == capacity 0), which indexes at
// diverse_results_l - 1 = usize::MAX. This is an unchecked edge case.
let (index, attr) = setup_diverse_grid(Grid::Two, 4, 3);
let query = vec![0.0f32; 2];
// k=10, l=5, div_k=1 → diverse_results_l = 1 * 5 / 10 = 0
let _result = run_diverse_search(&index, &query, 10, 10, 0, attr);
// ─── Input validation: diverse_results_k must be non-zero ────────────────────
#[test]
fn diverse_search_rejects_zero_diverse_results_k() {
let (index, attr) = setup_diverse_grid(Grid::Two, 4, 3);
let query = vec![0.0f32; 2];
let panic = std::panic::catch_unwind(|| {
let _result = run_diverse_search(&index, &query, 10, 10, 0, attr);
})
.expect_err("diverse_results_k == 0 should be rejected by input validation");
let panic_message = if let Some(message) = panic.downcast_ref::<&str>() {
*message
} else if let Some(message) = panic.downcast_ref::<String>() {
message.as_str()
} else {
panic!("validation panic should use a string message");
};
assert!(
panic_message.contains("diverse_results_k")
&& (panic_message.contains("non-zero") || panic_message.contains("zero")),
"unexpected panic message for invalid diverse_results_k: {panic_message}",
);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a pretty valid comment. it seems to me that diverse_results_k should be a NonZeroUsize, it's not really clear to me why we'd allow 0 as a value.

- Fix inaccurate comment about transient IDs in prune test
- Relax over-constrained assertion in diverse queue eviction test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.46835% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.44%. Comparing base (56e3b7f) to head (912b605).

Files with missing lines Patch % Lines
diskann/src/graph/test/cases/transient_errors.rs 97.36% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #972      +/-   ##
==========================================
- Coverage   90.58%   89.44%   -1.15%     
==========================================
  Files         449      450       +1     
  Lines       83779    83857      +78     
==========================================
- Hits        75891    75002     -889     
- Misses       7888     8855     +967     
Flag Coverage Δ
miri 89.44% <97.46%> (-1.15%) ⬇️
unittests 89.28% <97.46%> (-1.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann/src/graph/test/provider.rs 94.41% <100.00%> (+0.01%) ⬆️
diskann/src/graph/test/cases/transient_errors.rs 97.36% <97.36%> (ø)

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants