Add A/A benchmark documentation by YuanyuanTian-hh · Pull Request #974 · microsoft/DiskANN

YuanyuanTian-hh · 2026-04-23T02:49:18Z

Summary

Add diskann-benchmark/AA_BENCHMARK.md documenting how the daily A/A benchmark stability test is conducted and scheduled.

Content Covered

Purpose: Detecting environment noise on CI runners (not code regressions)
Schedule: Daily at 9 AM UTC via cron, plus manual `workflow_dispatch`
Datasets: `wikipedia-100K` and `openai-100K`
Execution flow: Step-by-step description of the benchmark pipeline
Tolerance thresholds: Build time, QPS, recall, mean I/Os, mean comparisons, mean/P95 latency
Failure notification: Auto-created GitHub issue tagging `@microsoft/diskann-disk-maintainers`
Comparison: How A/A differs from the A/B benchmark workflow
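The tolerance check listed above compares a baseline run against a target run metric by metric. The following is a minimal Python sketch of that idea, assuming each metric has a relative-change tolerance and a known direction (higher is better for QPS and recall, lower is better for build time, I/Os, comparisons and latency). The metric keys, the `Tolerance` type, the `regressions` function and every numeric threshold are hypothetical placeholders, not the values or code used by the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    max_rel_change: float   # allowed relative deviation in the "worse" direction, e.g. 0.10 == 10%
    higher_is_better: bool  # True for QPS/recall, False for time, latency, I/Os, comparisons


# Hypothetical thresholds for illustration only; the real values are defined
# by the A/A workflow and its documentation.
TOLERANCES = {
    "build_time_s": Tolerance(0.10, higher_is_better=False),
    "qps": Tolerance(0.10, higher_is_better=True),
    "recall": Tolerance(0.01, higher_is_better=True),
    "mean_ios": Tolerance(0.05, higher_is_better=False),
    "mean_comparisons": Tolerance(0.05, higher_is_better=False),
    "mean_latency_us": Tolerance(0.15, higher_is_better=False),
    "p95_latency_us": Tolerance(0.20, higher_is_better=False),
}


def regressions(baseline: dict, target: dict) -> list:
    """Return the metrics where the target is worse than the baseline by more than its tolerance."""
    failed = []
    for name, tol in TOLERANCES.items():
        base, cur = baseline[name], target[name]
        rel = (cur - base) / base  # signed relative change; assumes baseline values are positive
        # A "worse" change is a drop for higher-is-better metrics and a rise otherwise.
        worse_by = -rel if tol.higher_is_better else rel
        if worse_by > tol.max_rel_change:
            failed.append(name)
    return failed
```

In an A/A run both sides build the same commit, so any metric this flags points at runner noise rather than a code change. This sketch is one-sided (improvements never fail); a two-sided variant would compare `abs(rel)` against the tolerance instead.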

Motivation

This makes it easier for new contributors and maintainers to understand how the A/A stability test works without having to read the YAML workflow directly.

Copilot

Pull request overview

Adds standalone documentation for the daily DiskANN A/A benchmark stability workflow, so maintainers can understand what it does and how to interpret failures without reading the workflow YAML.

Changes:

Add diskann-benchmark/AA_BENCHMARK.md describing the A/A benchmark purpose, schedule, datasets, and pipeline steps.
Document the tolerance thresholds used for baseline vs target comparisons.
Document failure notification behavior and how A/A differs from the A/B benchmark workflow.


codecov-commenter · 2026-04-23T03:08:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.43%. Comparing base (cb52a9f) to head (64c1e48).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #974      +/-   ##
==========================================
- Coverage   89.43%   89.43%   -0.01%     
==========================================
  Files         449      449              
  Lines       83779    83779              
==========================================
- Hits        74928    74926       -2     
- Misses       8851     8853       +2

Flag	Coverage Δ
miri	`89.43% <ø> (-0.01%)`	⬇️
unittests	`89.27% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.
See 1 file with indirect coverage changes.


Address review feedback from arrayka:
- Place doc in .github/docs/disk-benchmarks-aa.md (lowercase, next to workflows)
- Remove redundant Steps section to keep the doc concise
- Add back-link from disk-benchmarks-aa.yml to the doc
- Keep tolerance thresholds, failure notification, and A/B comparison sections

Address review feedback: the A/A benchmark has a 95% reliability promise, meaning 1 failure in 20 runs is expected. The notify-on-failure job now checks the last 20 completed runs and only creates a GitHub issue if the failure rate exceeds 5%. This avoids unnecessary noise from expected environment variability. Also update disk-benchmarks-aa.md to document this behavior.
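The notification rule described in this commit can be sketched as follows. This is a minimal Python illustration, assuming the job receives the conclusions of completed runs newest-first as strings such as `"success"` and `"failure"` (the values GitHub Actions reports as a run's conclusion). `should_open_issue`, `WINDOW` and `MAX_FAILURE_RATE` are hypothetical names, and how the real job fetches its run history is not shown.

```python
# Hypothetical constants mirroring the commit message: a 95% reliability
# promise evaluated over the last 20 completed runs.
WINDOW = 20
MAX_FAILURE_RATE = 0.05


def should_open_issue(conclusions: list) -> bool:
    """Decide whether to file an issue from newest-first conclusions of completed runs."""
    recent = conclusions[:WINDOW]  # only the newest WINDOW runs count
    if not recent:
        return False
    failures = sum(1 for c in recent if c == "failure")
    # Assumption: with fewer than WINDOW runs, the rate is computed over what is available.
    return failures / len(recent) > MAX_FAILURE_RATE
```

With a full window of 20 runs, a single failure gives exactly 5% and does not open an issue, while two failures (10%) do.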

YuanyuanTian-hh requested review from a team and Copilot April 23, 2026 02:49

Copilot started reviewing on behalf of YuanyuanTian-hh April 23, 2026 02:49

Copilot AI reviewed Apr 23, 2026


arrayka requested changes Apr 23, 2026


Comment thread on diskann-benchmark/AA_BENCHMARK.md (outdated)

Comment thread on diskann-benchmark/AA_BENCHMARK.md (outdated)

Comment thread on .github/docs/disk-benchmarks-aa.md (outdated)

Comment thread on .github/docs/disk-benchmarks-aa.md (outdated)

YuanyuanTian-hh force-pushed the tianyuanyuan/add-aa-benchmark-docs branch from 419230d to 49554d3 April 28, 2026 08:25

YuanyuanTian-hh force-pushed the tianyuanyuan/add-aa-benchmark-docs branch from 49554d3 to d53e542 April 28, 2026 08:31

Yuanyuan Tian (from Dev Box) and others added 2 commits April 28, 2026 16:52

Merge branch 'main' into tianyuanyuan/add-aa-benchmark-docs

64c1e48

YuanyuanTian-hh requested a review from arrayka April 28, 2026 08:53

YuanyuanTian-hh wants to merge 3 commits into main from tianyuanyuan/add-aa-benchmark-docs

4 participants

Conversation

YuanyuanTian-hh commented Apr 23, 2026

Summary

Content Covered

Motivation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

codecov-commenter commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-commenter commented Apr 23, 2026 •

edited

Loading