Cranelift: robustify timing infrastructure against mis-use and/or system clock bugs.#13273
Conversation
…tem clock bugs. In bytecodealliance#12692, it was observed that the computation of time spent in nested timing spans, minus child time, was underflowing. Correct operation of the handling of nested spans depends on the invariant that the accumulated time for child spans is less than or equal to a parent span once timing is completed. This property should hold as long as the system clock is monotonic, and as long as timing tokens are dropped in-order, so that the elapsed time of a parent truly is computed after the elapsed time of a child ends. The timing state may also temporarily violate this invariant whenever a token is pending (still on stack and timing): the child time of any completed child spans will be counted, but the parent has not yet been. Hence, taking a snapshot of the state and computing "parent minus children" on that snapshot may observe cases that yield underflow. This PR makes the infrastructure more robust along a few different dimensions: - It hardens the clock source we use to have a locally-ensured guarantee of monotonicity, since we rely on this for logical correctness. In particular, for each thread (since timing spans never move between threads), we track the last `Instant` that was used by the timing infrastructure, and use that value (zero time passed) if the system clock moves backward. - It hardens the assert about pass-timing token drop order from a `debug_assert` to an `assert`. If this invariant is being violated, we want to know about it noisily, rather than observing a subtraction underflow or other inconsistency later. - It adds an assert in `take_current()` to ensure that a snapshot is never taken when any pass timing is in progress. This should address any theoretically possible sources of bytecodealliance#12692, as far as I can tell. (cherry picked from commit 2f8644f)
|
Thanks! Can you add a section to You can include a release date of today, and copy the note about this change from other releases that have included it. I'm happy to trigger the point-release once this is in. |
|
Could this additionally cherry-pick #13253 too to keep it working? |
|
Ah just saw that one too! Working on cherry-picking now |
A typo in bytecodealliance#12709 accidentally led to all passes clocking in at 0ns. Swap the order of arguments to get true timing information. (cherry picked from commit 8b664c3)
|
I pointed the release notes to this PR, but let me know if I should point to the original |
|
Ah, generally we'd want to refer to the original PRs -- thanks! |
|
(And, make sure you follow the format you see below in other bullet-points, with Markdown links that will link back to GitHub when rendered) |
|
Agh sorry 🙈 fixed! |
|
Another day that ends in "y", another GitHub outage: CI jobs failing with "The job was not acquired by Runner of type hosted even after multiple attempts" and an active incident with Actions. I'll retry the merge once things clear up... |
d48730f
into
bytecodealliance:release-36.0.0
|
Ah, great, the jobs were eventually picked up -- will trigger the release now. |
|
@benbrandt the point-release is now merging in #13276; when that merges it'll trigger post-merge CI to upload the crates and artifacts. Thanks again! |
Backport #12709 to v36
(cherry picked from commit 2f8644f)