Introduce robust metrics #379
Quartiles are computed using nearest rank method.
Two implementations are provided:
1. Sort-based:
a. sort array
b. extract values at ranks of interest
2. Selection based:
a. Run nth_element to find median on whole range
b. Run nth_element on left side to find first quartile
c. Run nth_element on right side to find third quartile
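The two strategies above can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `quartiles`, `nearest_rank_index`, `quartiles_by_sort`, and `quartiles_by_selection` are hypothetical, the rank is assumed to be `ceil(p * N / 100)` clamped to `[1, N]`, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type; not the type used by NVBench.
struct quartiles
{
  double q1;
  double median;
  double q3;
};

// Zero-based index of the nearest-rank percentile p for n >= 1 samples.
// Assumption: rank = ceil(p * n / 100), clamped to [1, n].
inline std::size_t nearest_rank_index(double p, std::size_t n)
{
  const auto rank = static_cast<std::size_t>(std::ceil(p * static_cast<double>(n) / 100.0));
  return std::min(std::max<std::size_t>(rank, 1), n) - 1;
}

// Sort-based: sort a copy, then read the values at the ranks of interest.
inline quartiles quartiles_by_sort(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  const auto n = v.size();
  return {v[nearest_rank_index(25, n)], v[nearest_rank_index(50, n)], v[nearest_rank_index(75, n)]};
}

// Selection-based: three partial orderings instead of a full sort.
inline quartiles quartiles_by_selection(std::vector<double> v)
{
  const auto n  = v.size();
  const auto i1 = nearest_rank_index(25, n);
  const auto i2 = nearest_rank_index(50, n);
  const auto i3 = nearest_rank_index(75, n);
  const auto first = v.begin();

  // Median on the whole range; smaller values end up left of i2, larger ones right.
  std::nth_element(first, first + i2, v.end());
  // First quartile on the left side (skipped when it coincides with the median).
  if (i1 < i2) { std::nth_element(first, first + i1, first + i2); }
  // Third quartile on the right side (skipped when it coincides with the median).
  if (i3 > i2) { std::nth_element(first + i2 + 1, first + i3, v.end()); }

  return {v[i1], v[i2], v[i3]};
}
```

Both functions take the vector by value, which mirrors the "copy input into a temporary vector and mutate it" approach described in this commit message.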
Public API copies input into temporary vector which is mutated as needed.
Public API uses sort-based implementation for small arrays (<= 4096 elements),
and selection-based implementation for larger arrays.
Sort-based implementation can support computation of arbitrary percentiles,
which could be useful later if more extreme statistics are needed.
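To illustrate why the sort-based path generalizes to arbitrary percentiles, here is a small sketch. The function name `percentile_nearest_rank`, the clamping of the rank to `[1, N]`, and throwing on empty input or out-of-range `p` are all assumptions made for this example; the PR's actual helper may behave differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: nearest-rank percentile of the range [first, last).
// Works with input iterators because the range is copied before sorting.
template <typename InputIt>
double percentile_nearest_rank(InputIt first, InputIt last, double p)
{
  std::vector<double> sorted(first, last);
  if (sorted.empty()) { throw std::invalid_argument("percentile of an empty range"); }
  if (!(p >= 0.0 && p <= 100.0)) { throw std::invalid_argument("p must be in [0, 100]"); }

  std::sort(sorted.begin(), sorted.end());

  const auto n = sorted.size();
  // Assumption: rank = ceil(p * n / 100), clamped to [1, n].
  const auto rank = static_cast<std::size_t>(std::ceil(p * static_cast<double>(n) / 100.0));
  return sorted[std::min(std::max<std::size_t>(rank, 1), n) - 1];
}
```

Once the copy is sorted, each additional percentile costs only an index computation, which is what makes more extreme statistics (say, the 99th percentile) cheap to add on this path.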
Add tests covering percentile and quartile edge cases, input iterators,
selection-vs-sorting agreement, empty and singleton inputs, and relative
dispersion validation.
Use the quartile helpers to report robust cold and CPU-only timing summaries:
Q1, median, Q3, interquartile range, and relative interquartile range.
These values stay hidden.
Summary tags are
nv/cold/time/gpu/q1,
nv/cold/time/gpu/median,
nv/cold/time/gpu/q3,
nv/cold/time/gpu/ir/absolute,
nv/cold/time/gpu/ir/relative
where
ir/absolute = q3 - q1,
ir/relative = (q3 - q1)/median
Similar tags added for nv/cold/time/cpu and for CPU-only measures.
Validate relative-dispersion calculations before publishing relative noise
summaries so invalid centers or dispersion values do not produce misleading
summary entries.
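A rough sketch of the two dispersion values and the validation step described above. The type `dispersion`, the function `interquartile_range`, and the specific validity checks (finite values, non-negative range, strictly positive median) are assumptions for illustration; the commit does not spell out the exact conditions it uses.

```cpp
#include <cmath>
#include <optional>

// Hypothetical result type: the relative value is absent when it would be misleading.
struct dispersion
{
  double absolute;                // ir/absolute = q3 - q1
  std::optional<double> relative; // ir/relative = (q3 - q1) / median, if valid
};

inline dispersion interquartile_range(double q1, double median, double q3)
{
  const double absolute = q3 - q1;

  std::optional<double> relative;
  // Assumed validity rule: finite, non-negative dispersion and a finite, positive center.
  const bool dispersion_ok = std::isfinite(absolute) && absolute >= 0.0;
  const bool center_ok     = std::isfinite(median) && median > 0.0;
  if (dispersion_ok && center_ok) { relative = absolute / median; }

  return {absolute, relative};
}
```

A caller would then publish the relative summary entry only when `relative.has_value()` is true, so a zero or non-finite median never turns into an infinite or NaN noise figure.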
Only flip visibility for nv/cold/cpu/time, nv/cold/gpu/time, and nv/cpu_only/time:
- hide mean
- hide stdev/relative
- show median
- show ir/relative

@coderabbitai full review

✅ Actions performed: Full review triggered.

This PR supersedes #348.

📝 Walkthrough (summary by CodeRabbit)
This PR replaces mean/standard-deviation statistics with robust quartile-based measures across NVBench's CPU and GPU timing reports. New percentile/quartile utilities compute first quartile, median, third quartile, absolute interquartile range, and relative interquartile range (gated on sample count). These replace direct stdev/mean ratios in measurement summaries and timeout-warning thresholds.
Changes: Robust Statistics Implementation and Integration
Actionable comments posted: 3
📒 Files selected for processing (4)
- nvbench/detail/measure_cold.cu
- nvbench/detail/measure_cpu_only.cxx
- nvbench/detail/statistics.cuh
- testing/statistics.cu

I separated commits adding computation of quartiles and outputting them to summaries from the commit that makes
Technically, this change is not necessary for #313, since
Pros of using robust metrics:
Cons:
Perhaps, Winsorized mean and Winsorized standard deviation should be added to the summaries and displayed instead. These would be regular mean and standard deviation computed on the sample dataset where top
Additionally, summaries may contain Winsorized values for different values of
The choice of what to replace displayed values with is to be deferred to a different PR. For this PR, we need to decide whether to keep displaying mean/standard-deviation or replace them with median/interquartile-range.

Ok, I think the right thing to do is to revert the change implementing item 3 and open it up as a separate PR.

This reverts commit 9a0afc3. All robust-statistics summary entries are hidden again, and mean and stdev/relative are back to being the entries displayed by default.
1. Add statistics utilities to compute quartiles using nearest rank method and tests.
   float64_t elements)
2. Add quartile information for "nv/cold/cpu/time", "nv/cold/gpu/time", and "nv/cpu_only/time" summaries. Tags added are:
   - "*/median": median
   - "*/q1": first quartile
   - "*/q3": third quartile
   - "*/ir/absolute": absolute interquartile range =
   - "*/ir/relative": relative interquartile range,
3. Make "*/mean" and "*/stdev/relative" hidden, replaced by "*/median" and "*/ir/relative".

Closes #342.

Technically, due to the change described in item 3, the `"CPU Time"`/`"Noise"` and `"GPU Time"`/`"Noise"` entries in the summary tables output by NVBench-instrumented benchmarks change from being based on (`mean`, `standard_dev`) to being based on (`median`, `interquartile_range`).
This change only affects printed summaries, i.e., --markdown and --csv outputs. Behavior of nvbench_compare won't change, as JSON data still contains mean and standard deviation entries, albeit hidden by default.

Update
The change implementing item 3 has been reverted. See comments below.