Add frequency measurement to the stopping criterion #372

Open
mfranzrebsal wants to merge 1 commit into NVIDIA:main from mfranzrebsal:add-frequency-to-criterion
Conversation

@mfranzrebsal
Contributor

This PR makes each iteration's measured average frequency available to the stopping criterion, in case the criterion wants to use it when making decisions. The new virtual function has a default no-op implementation, so existing classes that do not override it do not break. If you think it would be better to implement this some other way, please let me know!

On another note: I noticed that the frequency is stored as FP32. The value is in Hz, so it is always an integer, and for values in the GHz range FP32 has a granularity of around 128 Hz. I don't think this is especially problematic, since a discrepancy of 128 Hz is insignificant when the frequency is on the order of GHz, but I wanted to bring it to your attention just in case.
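
To make the granularity remark concrete, the spacing between adjacent FP32 values can be measured directly. The following is a small standalone sketch, not NVBench code; `fp32_spacing_at` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not part of NVBench): distance from `hz` to the next
// representable float above it, i.e. one unit in the last place at `hz`.
inline float fp32_spacing_at(float hz)
{
  assert(std::isfinite(hz) && hz > 0.0f);
  return std::nextafter(hz, std::numeric_limits<float>::infinity()) - hz;
}
```

For inputs between 2^30 Hz (about 1.07 GHz) and 2^31 Hz (about 2.15 GHz) the helper returns 128. Just below 2^30 Hz, for example at exactly 1 GHz, it returns 64.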

@copy-pr-bot

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jrhemstad jrhemstad requested a review from gevtushenko May 26, 2026 15:28
@oleksandr-pavlyk oleksandr-pavlyk self-requested a review May 26, 2026 16:04
@gevtushenko gevtushenko removed their request for review May 26, 2026 16:04
@oleksandr-pavlyk
Collaborator

/ok to test 023e330

@oleksandr-pavlyk oleksandr-pavlyk added the type: enhancement New feature or request. label May 26, 2026
@oleksandr-pavlyk
Collaborator

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 26, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0e67c7b8-e771-4128-958b-ebfdfaab7a8d

📥 Commits

Reviewing files that changed from the base of the PR and between 4a33a61 and 023e330.

📒 Files selected for processing (2)
  • nvbench/detail/measure_cold.cu
  • nvbench/stopping_criterion.cuh

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • GPU SM clock frequency monitoring enhanced: now unconditionally captures frequency data for each measurement sample throughout benchmark execution.
    • Measurement framework extended to support per-sample GPU clock frequency tracking, improving throttling detection and measurement analysis capabilities.

Walkthrough

Stopping criteria now receive GPU clock frequency measurements. The stopping_criterion_base interface declares a new public add_frequency method forwarding to a protected virtual hook with a default no-op body. The measurement loop unconditionally captures SM clock rate and reports it per sample after timing and count updates.

Changes

Frequency Tracking in Stopping Criteria

Layer / File(s) | Summary
--- | ---
Stopping criterion frequency interface (nvbench/stopping_criterion.cuh) | stopping_criterion_base receives a public add_frequency(float32_t) method that delegates to a protected virtual do_add_frequency(float32_t) hook with a default no-op implementation.
Frequency collection in measurement loop (nvbench/detail/measure_cold.cu) | record_measurements() now unconditionally captures the SM clock rate before throttling checks and passes each sample's frequency to the stopping criterion via add_frequency().
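
The interface change summarized above can be sketched roughly as follows. This is a trimmed-down illustration under stated assumptions, not the actual NVBench source: everything in the `sketch` namespace is hypothetical, the type aliases stand in for the NVBench typedefs, and the `float64_t` timing parameter is an assumption. Only the shape (a public `add_frequency` forwarding to a protected virtual `do_add_frequency` with a default no-op body) comes from the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
using float32_t = float;  // stand-in for the NVBench typedef
using float64_t = double; // assumed type for timing samples

// Simplified stand-in for stopping_criterion_base.
class criterion_base
{
public:
  virtual ~criterion_base() = default;

  // Public entry points forward to the protected virtual hooks.
  void add_frequency(float32_t hz) { this->do_add_frequency(hz); }
  void add_measurement(float64_t seconds) { this->do_add_measurement(seconds); }

protected:
  // Default no-op: criteria that ignore frequency need no changes.
  virtual void do_add_frequency(float32_t /*hz*/) {}
  virtual void do_add_measurement(float64_t seconds) = 0;
};

// A criterion written before the hook existed; it only counts timing samples.
class timing_only : public criterion_base
{
public:
  std::size_t samples = 0;

protected:
  void do_add_measurement(float64_t /*seconds*/) override { ++samples; }
};

// A criterion that opts in by overriding the frequency hook.
class frequency_aware : public criterion_base
{
public:
  std::vector<float32_t> frequencies;
  std::size_t samples = 0;

protected:
  void do_add_frequency(float32_t hz) override
  {
    assert(hz >= 0.0f); // clock rates are non-negative
    frequencies.push_back(hz);
  }
  void do_add_measurement(float64_t /*seconds*/) override { ++samples; }
};

// Reports one accepted sample: frequency first, then the timing.
inline void report_sample(criterion_base &c, float32_t hz, float64_t seconds)
{
  c.add_frequency(hz);
  c.add_measurement(seconds);
}
} // namespace sketch
```

In this sketch `timing_only` compiles and behaves the same whether or not `add_frequency` is called, which is the point of giving the hook a default no-op body.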

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.



Comment thread nvbench/detail/measure_cold.cu Outdated
Comment on lines 168 to 169
m_stopping_criterion.add_frequency(current_clock_rate);
m_stopping_criterion.add_measurement(cur_cuda_time);
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk May 26, 2026

stopping_criterion_base::add_frequency() is added as a separate callback.

The interface does not document the ordering/pairing contract.

The cold measurement calls add_frequency() before add_measurement() only for accepted samples, and CPU-only measurement does not call it at all.

Ideally, all stopping criterion classes should work with all measurement classes.

Contributor Author

I have added more documentation in the last commit. What exactly do you mean by "all stopping criterion classes should work with all measurement classes"? Is that not the case currently?

Collaborator

What exactly do you mean by "all stopping criterion classes should work with all measurement classes"?

An author implementing a stopping criterion that takes advantage of frequency information:

  1. Must ensure that the criterion handles the case where frequency data is not available. This occurs when a benchmark uses the CPU-only timer, e.g. state.exec(nvbench::exec_tag::cpu_only, ...);.
  2. Must handle GPU frequency data, which requires that:
    • The author is assured that the number of frequency measurements processed is either 0 or equal to the number of sample measurements processed.
    • The author is informed of any consistency conditions that NVBench expects of the criterion. For example, we might expect that, given a stream of sample measurements with no GPU frequency data, the criterion behaves as if frequency data were available but held a constant value.
    • The author is told of a mechanism by which the stopping criterion can enforce that frequency data is provided. For example, it might throw an exception if the frequency and sample counts disagree. We need to make sure that an NVBench-instrumented benchmark handles such a situation well.
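
One way to picture the requirements above is a small bookkeeping helper that a frequency-aware criterion could embed. This is only a sketch under assumptions: `pairing_tracker` and its members are hypothetical names, nothing here is NVBench API, and it assumes that a sample's frequency, when reported at all, arrives before that sample's timing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

namespace sketch
{
// Hypothetical helper: tracks how many frequency and timing samples were seen.
class pairing_tracker
{
public:
  void add_frequency(float /*hz*/) { ++m_num_frequencies; }

  // Assumes the frequency for a sample, if any, was reported before its timing.
  void add_measurement(double /*seconds*/)
  {
    ++m_num_measurements;
    // Valid states: no frequency data at all, or exactly one per timing sample.
    if (m_num_frequencies != 0 && m_num_frequencies != m_num_measurements)
    {
      throw std::logic_error("frequency and timing sample counts disagree");
    }
  }

  // False for runs where no frequency is ever reported (e.g. CPU-only timing).
  bool has_frequency_data() const { return m_num_frequencies != 0; }

  std::size_t num_frequencies() const { return m_num_frequencies; }
  std::size_t num_measurements() const { return m_num_measurements; }

private:
  std::size_t m_num_frequencies  = 0;
  std::size_t m_num_measurements = 0;
};
} // namespace sketch
```

A criterion built on such a helper could fall back to timing-only logic while `has_frequency_data()` is false. Whether throwing from inside a criterion is handled gracefully by an NVBench-instrumented benchmark is exactly the open question raised in this thread, so the exception here is illustrative rather than a recommendation.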

Contributor Author

Re 1: I agree, but that is fully the user's responsibility, correct?
Re 2:

  • As I see it, only two things can currently happen: either we never add any frequency measurements, or we add exactly as many as we add timing measurements. So if the number of frequency measurements is larger than 0 but differs from the number of timing measurements, that would be a bug.
  • What do you mean by "we might expect that absent GPU data and a stream of sample measurements, criterion should behave as if GPU frequency data is available but contain a constant value."?
  • From a user's perspective, creating a stopping criterion that only works when frequency data is provided is brittle IMO, and should not be encouraged or supported.

Of course, I trust your judgement on this more. What changes do you think are still necessary?

@mfranzrebsal mfranzrebsal force-pushed the add-frequency-to-criterion branch from d9aadad to 67ff75a Compare May 28, 2026 07:52
