Add frequency measurement to the stopping criterion#372
Conversation
|
/ok to test 023e330 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughStopping criteria now receive GPU clock frequency measurements. The ChangesFrequency Tracking in Stopping Criteria
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
| m_stopping_criterion.add_frequency(current_clock_rate); | ||
| m_stopping_criterion.add_measurement(cur_cuda_time); |
There was a problem hiding this comment.
stopping_criterion_base::add_frequency() is added as a separate callback.
The interface does not document the ordering/pairing contract.
The cold measurement calls add_frequency() before add_measurement() only for accepted samples, and CPU-only measurement does not call it at all.
Ideally, all stopping criterion classes should work with all measurement classes.
There was a problem hiding this comment.
I have added more documentation in the last commit. What exactly do you mean by "all stopping criterion classes should work with all measurement classes"? Is that not the case currently?
There was a problem hiding this comment.
What exactly do you mean by "all stopping criterion classes should work with all measurement classes"?
Author implementing a stopping criterion that takes advantage of frequency information:
- Ensures that criterion handles situation where frequency data is not available. That would occur if a benchmark uses CPU-only timer, e.g. benchmark uses
state.exec(nvbench::exec_tag::cpu_only, ...);. - Handling GPU frequency data:
- Author must be assured that number of frequency measurements processed is either 0 or is equal to the number of sample measurements processed.
- Author must be informed of any consistency conditions expected by NVBench of the criterion. For example, we might expect that absent GPU data and a stream of sample measurements, criterion should behave as if GPU frequency data is available but contain a constant value.
- Author must be instructed of a mechanism for stopping criterion to enforce that frequency data is provided. For example, it might throw an exception if frequency and sample counts disagree. We need to make sure that an NVBench-instrumented benchmark handles such situation well.
There was a problem hiding this comment.
Re 1: I agree, but that is fully the users responsibility, correct?
Re 2:
- How I see it is that currently only two things can happen: either we never add any frequency measurements, or we add as many as we do timing measurements. So if the number of frequency measurements is larger than 0 but not the same as the number of timing measurements, that would be a bug.
- What do you mean by "we might expect that absent GPU data and a stream of sample measurements, criterion should behave as if GPU frequency data is available but contain a constant value."?
- From a user's perspective, creating a stopping criterion that only works when frequency data is provided is brittle IMO, and should not be encouraged or supported.
Of course, I trust your judgement on this more. What changes do you think are still necessary?
d9aadad to
67ff75a
Compare
This MR makes every iteration's measured average frequency available to the stopping criterion, in case it wants to use it to make decisions. The virtual function is implemented as a no-op so that classes that do not have it implemented do not break. If you think it would be a better idea to implement it some other way, please let me know!
On another note: I have noticed that you are using FP32 for the frequency, which is always going to be an integer since it is in Hz, and for values in the GHz range, FP32 has a granularity of around 128. I don't think this is especially problematic given that when the frequency is in the order of GHz, a discrepancy of 128 Hz is insignificant, but wanted to bring it to your attention just in case.