Lock-free scheduler with per-priority queues by TrentHouliston · Pull Request #193 · Fastcode/NUClear

TrentHouliston · 2026-05-27T03:57:30Z

Summary

Rework Scheduler / Pool / Group to remove the global mutex contention that showed up on the existing implementation under load:

Per-priority lock-free task queues. Each Pool now owns five independent MPMC TaskQueues (one per priority bucket) instead of a single mutex-protected priority queue. Submitting and dequeuing on different priorities no longer contend at all, and same-priority submits use a block-based lock-free FIFO with a graveyard-style block reclamation scheme to avoid ABA.
MPSC specialisation for single-consumer pools. Pools with concurrency == 1 (e.g. MainThread, TraceController) now construct MPSCQueue instead of TaskQueue, giving a non-atomic consumer side and removing the dequeue CAS entirely for those pools. Both queues sit behind a Queue<T> polymorphic interface so Pool can hold an array of buckets and dispatch the right type at construction.
Counting-semaphore Group fast path. Single-Sync groups now acquire via a signed std::atomic<int> tokens plus a lock-free TaskQueue per priority for waiters, falling back to the mutex-backed GroupLock only for multi-group submissions. Sync ordering inside a group is strict; equal-priority tasks across pools see relaxed global FIFO.
Idle bookkeeping off the pool mutex. Pool idle tracking moved off the main pool mutex so wake-ups no longer block submitters.

New lock-free primitives (TaskQueue, MPSCQueue, Semaphore) and the Group counting fast path are covered by Catch2 BDD-style tests, plus a scheduler microbenchmark (tests/tests/Benchmark.cpp, hidden behind the [.benchmark] tag).

TSAN race fixes (pre-existing)

While validating the changes under TSAN on macOS clang and Linux gcc 13 three pre-existing data races surfaced and are fixed in this PR:

IOController_Posix.ipp — pollfd::events. The IOFinished handler mutated watches[].events while only holding tasks_mutex, and the poll thread read the same field from inside ::poll() while only holding notifier.mutex. bump() woke poll but released notifier.mutex before the mutation, leaving the race window open. The handler now writes the wake byte inline and holds notifier.mutex across the watches update and the follow-up fire_event call (which can also touch watches[].events).
Scheduler::submit — Reaction::scheduler_data. The cached pool was read/written from any submitting thread without synchronisation. It is now a non-owning raw Pool* cached in a std::atomic<void*> (release store / acquire load) rather than a std::shared_ptr<void> accessed via std::atomic_load/atomic_store — the shared_ptr atomics fall back to a small global mutex pool on libstdc++ and become a contention point on hot submission paths. Pools outlive all reactions (PowerPlant tears reactors down before the scheduler), so a raw pointer is safe, and all racers resolve the same pool, so the benign store is last-writer-wins.
Watchdog data store. The service time_point was read by the chrono controller while being mutated by user threads emitting a service event, and the void specialisation returned a reference through a temporary shared_ptr (latent dangling reference). Centralised reads/writes through a per-(WatchdogGroup, RuntimeType) std::mutex, made get return by value, and routed WatchdogServicer::service through WatchdogDataStore::service so writes share the read mutex.

Other

.gitignore now matches build-*/ so out-of-tree TSAN/ASAN/Release build directories don't appear in git status.
SonarCloud suppressions for the deliberate lock-free / placement-new idioms live in sonar-project.properties.

Test plan

macOS clang TSAN: dsl/IO, dsl/Inline, dsl/Watchdog 30/30 clean each; full suite ✓
macOS clang Release: full suite ✓
Linux gcc 13 TSAN (Docker): same three tests 30/30 clean; hot-path tests under repeated runs with no TSAN warnings
Linux gcc 9 Release (Docker): full suite ✓
Lock-free primitives have BDD-style unit tests (including queue-destructor leak coverage)
Scheduler microbenchmark exercises submit / dequeue / group acquire under contention

Made with Cursor

…ce fixes Rework Scheduler/Pool/Group to avoid the global mutex contention that showed up under load. Tasks are now placed into one of five lock-free queues (one per priority bucket) so producers and consumers no longer serialise on a single lock, and Group acquisition uses a counting-semaphore fast path with a lock-free waiter queue per priority. Pools whose concurrency is one (MainThread, TraceController, etc.) now use a specialised MPSC queue with a non-atomic consumer side. Tasks within a Sync group retain strict in-order delivery; equal-priority tasks across pools see relaxed global FIFO. The new lock-free primitives (TaskQueue, MPSCQueue, Semaphore) and the Group counting-lock are covered by Catch2 BDD-style tests plus a scheduler benchmark. During TSAN validation across macOS clang and Linux gcc 13 three pre-existing data races were uncovered and fixed: * IOController (POSIX): the IOFinished handler mutated watches[].events under tasks_mutex while the poll thread read the same field from inside ::poll() under notifier.mutex. The bump() call closed the wake window but released notifier.mutex before the mutation, leaving the race. Inline the wake and hold notifier.mutex across the watches mutation and follow-up fire_event. * Scheduler::submit: the cached Pool pointer on Reaction (scheduler_data) was read/written from any submitting thread without synchronisation. Switch the cache to std::atomic_load/store on the shared_ptr; the worst case is two submitters racing both compute the identical pool and last-writer-wins. * Watchdog data store: the service time_point was read by the chrono controller while being mutated by user threads emitting a service event, and the void specialisation returned a reference through a temporary shared_ptr. Centralise reads/writes through a per-(WatchdogGroup, RuntimeType) std::mutex, return the time_point by value, and route WatchdogServicer::service through WatchdogDataStore::service so writes share the read mutex. Validation: * macOS clang TSAN: dsl/IO, dsl/Inline, dsl/Watchdog 30/30 clean each; full suite 63/63. * Linux gcc 13 TSAN: same three tests 30/30 clean; 16 hot-path tests x 3 runs serially with no TSAN warnings. * macOS Release: 64/64. Also ignore build-*/ directories so out-of-tree build folders don't show up in git status. Co-authored-by: Cursor <cursoragent@cursor.com>

The Reaction::scheduler_data cache previously held an std::shared_ptr<void> read/written via std::atomic_load/atomic_store. On libstdc++ those fall back to a small global pool of mutexes (selected by pointer hash), which becomes a contention point on hot submission paths. Change scheduler_data to std::atomic<void*>{nullptr}. Pools live for the lifetime of the Scheduler and the PowerPlant tears reactors down before the scheduler, so a non-owning raw pointer is safe. Group::try_submit and WaitEntry are switched to Pool* accordingly, and the Scheduler field declaration order is changed so that pools outlive groups on destruction. Also fix the clang-tidy errors that were blocking the lint job: switch the queue Slot/Block backing storage to std::array (avoid-c-arrays, member init), explicit-base BLOCK_SIZE, do-while -> while, use auto with new, RunningLock special members, Semaphore destructor, missing direct includes, unused-and-kept includes, and a couple of small test cleanups (reserve before emplace, explicit lvalue MPSCQueue enqueue overload to work around an MSVC overload-resolution quirk on int(i)). Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Reworks the Scheduler / Pool / Group machinery to remove global mutex contention by introducing per-priority lock-free task queues, an MPSC specialisation for single-consumer pools, and a counting-semaphore fast path for single-Sync groups. The change set also fixes three pre-existing TSAN races (IO poll/pollfd::events, Reaction::scheduler_data cache, and Watchdog data store) and adds BDD-style unit tests plus a scheduler microbenchmark.

Changes:

New lock-free primitives (Queue/TaskQueue/MPSCQueue/Semaphore/Priority) and a Pool that owns five priority-bucketed queues with off-mutex idle bookkeeping.
Group gains a lock-free fast path (signed tokens + per-priority wait_buckets of WaitEntry) and a RunningLock, with the existing mutex-based path retained for multi-group submissions.
TSAN race fixes in IOController_Posix.ipp, Scheduler::submit (atomic scheduler_data raw pointer), and WatchdogDataStore (mutex-protected reads/writes, get returns by value, new service entry point).

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
.gitignore	Ignore `build-*/` out-of-tree build directories.
src/extension/IOController_Posix.ipp	Hold `notifier.mutex` across `watches[].events` mutation to close TSAN race with the poll thread.
src/dsl/word/Watchdog.hpp	Add per-store mutex; `get` returns by value; new `service` static.
src/dsl/word/emit/Watchdog.hpp	Delegate `WatchdogServicer::service` to the mutex-protected `WatchdogDataStore::service`.
src/threading/Reaction.hpp	Change `scheduler_data` to `std::atomic<void*>` to avoid shared_ptr atomic-load contention.
src/threading/scheduler/Scheduler.{hpp,cpp}	Switch cached pool to raw pointer; reorder members so groups destroy before pools; fast-path single-group submit.
src/threading/scheduler/Pool.{hpp,cpp}	Replace single mutex-protected queue with per-priority buckets (MPMC or MPSC), atomic `accept`/`pending_tasks`/`external_waiters`, external waiter tracking.
src/threading/scheduler/Group.{hpp,cpp}	Add `try_submit`/`try_acquire_running_lock` lock-free fast path with signed `tokens`, `slow_pending`, `wait_buckets`, and `RunningLock`.
src/threading/scheduler/queue/{Queue,TaskQueue,MPSCQueue,Semaphore,Priority}.hpp	New lock-free queue interface and concrete MPMC/MPSC implementations, counting semaphore, priority bucket helper.
tests/tests/threading/{TaskQueue,MPSCQueue,Semaphore,Group}.cpp	BDD-style unit tests for the new primitives; Group test gains a forward-declaration include.
tests/tests/Benchmark.cpp	New scheduler microbenchmark over no-sync / single-sync / two-sync configurations.
tests/tests/util/serialise/xxhash.cpp	Add missing `<cstring>` include.

* TaskQueue/MPSCQueue::link_next_block: hold the freshly allocated Block in a std::unique_ptr so a lost CAS race no longer leaks the block. * Group: drain the wait_buckets and call unregister_external_waiter for every parked entry in ~Group so the Pool::external_waiters counter is balanced on Scheduler teardown (per declaration order Pools outlive Groups, so the raw Pool* pointers in WaitEntry remain valid). * Group.hpp: drop the orphan try_submit doc block that was sitting above try_acquire_running_lock and add the rule-of-five deletes that come with declaring an explicit destructor. Co-authored-by: Cursor <cursoragent@cursor.com>

In the lock-free fast path a task that can't take its group token is parked in the Group's wait_buckets instead of sitting in the destination pool's queue with a failing lock. The old scheduler relied on that queued-but- unrunnable task to force the pool worker to poll, fail the lock, and fall through to get_idle_task, so a parked waiter always produced exactly one idle fire on its destination pool. Without that, a worker preempted past its natural idle window could pick up the drained (lock-OK) task directly and silently swallow the idle epoch (flaky IdleSingle under load). Restore the invariant with a per-pool pending_idle latch: register_external_ waiter sets it and wakes one worker, and get_task consumes it to fire one idle epoch before dispatching the next task. Gate the whole mechanism on idle_relevant() (any idle reaction bound to this pool, or any global idle reaction) so the hot Sync-contended submission path pays nothing when no idle reaction exists. Without this gate the latch oscillated every iteration and regressed single-sync by ~70% and two-syncs by ~28%; with it, contended scheduling is back within noise of the no-fix baseline. Co-authored-by: Cursor <cursoragent@cursor.com>

ysims · 2026-06-06T05:54:08Z

trace.pftrace.txt

Seems to work, heres a trace file that I can't view if someone wants to check.

Edit: perfetto website cooperating today, trace looks fine and can see providers.

MacOS and Windows CI set NUCLEAR_TEST_TIME_UNIT_DEN=10, so the default TestBase timeout of 20 units is only 2000ms. UDP runs more network cases than TCP and routinely exceeds that on loaded runners, then hangs in the IO poll loop after the forced timeout and consumes the whole 5-minute test step. TCP already uses TimeUnit(50) for this reason; apply the same budget here. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 8 comments.

Fix the two BUG-severity destructor-throw findings (S1048/M23_201) by marking Group::release_token, drain_one_to_pool, and notify_slow_path noexcept. Apply stylistic refactors: remove continue/break from lock-free retry loops in TaskQueue/MPSCQueue and Group; make Pool::drain_queues const; rename Binder ctor param to avoid shadowing; drop redundant BLOCK_SIZE casts. Add C++14 odr-definitions for template static constexpr BLOCK_SIZE members used from Pool.cpp/Group.cpp instantiations. Add sonar-project.properties to scope-ignore S8417 (memory_order) under src/threading/** and src/extension/**, and S5025/S3630/S3432 (placement-new queue idioms) under **/scheduler/queue/*.hpp. Co-authored-by: Cursor <cursoragent@cursor.com>

…top, benchmark gating - TaskQueue/MPSCQueue destructors now run ~T on still-enqueued items so a non-empty queue torn down (e.g. holding Task's unique_ptr<ReactionTask>) no longer skips element destructors. Added red/green tests that fill the queue across multiple blocks, partially drain, then assert all live elements are destroyed on teardown. - TaskQueue::try_dequeue now retires the block when its read-path wins the head-advance CAS. Previously only try_reclaim_block retired on that CAS, so a race between the two left a fully-drained block unreachable from both head and the graveyard, leaking it under sustained contention. - Pool::stop(FORCE) now clears `accept` so persistent pools cannot repopulate the queues after a force stop drains them and stops the threads. - Benchmark cases are hidden behind the [.benchmark] tag so the slow, timing-sensitive matrix no longer runs in the default CTest suite; test runners pass --allow-running-no-tests so an all-hidden binary still exits 0. - Corrected the Reaction::scheduler_data doc comment to describe the actual plain release/acquire store (not a CAS / set-once). Co-authored-by: Cursor <cursoragent@cursor.com>

Tag fast-path waiters as handback vs normal parkers so the GroupLock opportunistic drain only keeps its pre-reserved token for handbacks. Add a deterministic Catch2 scenario that reproduces the publish/reconcile interleaving and verifies tokens are not permanently lost. Co-authored-by: Cursor <cursoragent@cursor.com>

…test Replace the unreliable handback-detection heuristic in park_reconcile (handback_parker && wait_buckets_empty()) with a per-waiter arbiter slot (shared_ptr<atomic<bool>>). Whoever flips the slot false->true -- the parker's own park_reconcile or a racing opportunistic drainer -- owns that waiter's single token decrement, so the keep/hand-back decision is exact regardless of how many other waiters are parked. This closes the token-leak hole where another parked waiter made wait_buckets_empty() return false and the phantom fetch_sub was never undone. Preserve slow-path priority: when a token is free but slow_pending > 0 the parker hands it straight back and stays parked uncounted (slot left unclaimed) so an older multi-group waiter is not jumped by a single-group fast submit. Clear Pool::current_pool in ~Scheduler. The constructor installs a non-owning pointer to the main thread pool; without a matching reset it dangles once the Scheduler is destroyed and any later ReactionTask built on that thread trips a bad_weak_ptr via Pool::current(). Reset it (only when it still points at one of our pools) so its lifetime is bounded by the Scheduler. Add a randomized multi-threaded stress test that mixes single-group fast-path submits with multi-group slow-path lock/unlock across several threads and concurrency levels, asserting no deadlock (bounded wait) and that tokens return to concurrency with a fresh submit still scheduling after quiescence. Co-authored-by: Cursor <cursoragent@cursor.com>

Remove redundant member init on WaitEntry::slot and explicitly delete Scheduler copy/move operations to satisfy cppcoreguidelines-special-member-functions. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 5 comments.

Delete the unused Semaphore primitive and its test rather than fixing its multi-waiter wakeup bug, since it is dead code referenced only by its own test. Increment pending_tasks before publishing the task in Pool::submit and in the get_task re-enqueue path, so a worker can never dequeue a task before its count is registered (over-counting is safe; under-counting could transiently underflow). Reword the ~Scheduler comments to describe the real failure mode: a dangling current_pool leads to shared_from_this() on a destroyed pool (undefined behaviour), not a guaranteed bad_weak_ptr throw. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 4 comments.

Defer drained task destruction until after Pool::mutex is released so group-lock destructors cannot re-enter submit() under the lock. Keep NUCLEAR_GROUP_TEST_API off nuclear's public interface by making it PRIVATE on the library and PUBLIC on test_util for ODR-safe test TUs. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Assess which enqueue/dequeue paths are wait-free vs lock-free CAS vs brief spinning; add assessment doc and accurate header commentary. Use pause-and-yield on single-iteration stall paths and spin_until on the committed handoff without changing try_dequeue semantics. Co-authored-by: Cursor <cursoragent@cursor.com>

NUCLEAR_GROUP_TEST_API was never set when relying on the default ON option because src/CMakeLists.txt evaluated BUILD_TESTS before the option was declared in the root CMakeLists.txt. Co-authored-by: Cursor <cursoragent@cursor.com>

Use const Pred& since the predicate is only invoked locally. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Use uint8_t for PriorityLevel to satisfy clang-tidy performance-enum-size. Scale UDP TestBase timeout with active test count so the full matrix still fits CI windows (NUCLEAR_TEST_TIME_UNIT_DEN=10) without restoring the blanket Windows skip. Co-authored-by: Cursor <cursoragent@cursor.com>

Scale with active test count using factor 20 so a full matrix matches the prior TimeUnit(200) ceiling under NUCLEAR_TEST_TIME_UNIT_DEN=10. Co-authored-by: Cursor <cursoragent@cursor.com>

Full matrix on Windows CI finishes all but the last multicast-v6 receive at 20s; scale timeout with active_tests * 25. Co-authored-by: Cursor <cursoragent@cursor.com>

30s scaled timeout leaves headroom after the last receive for idle shutdown; 25s still tripped TestBase timeout while events were landing. Co-authored-by: Cursor <cursoragent@cursor.com>

40s scaled budget (active_tests * 40 under DEN=10) covers last receive plus Finished/idle shutdown on slow Windows runners. Co-authored-by: Cursor <cursoragent@cursor.com>

Multicast availability is gated by the round-trip probe on all platforms, but the NUClear UDP DSL matrix intermittently stalls on Windows CI runners (40s timeout with no receives). Skip on CI Windows only; matrix runs on Linux, macOS, and local Windows. Co-authored-by: Cursor <cursoragent@cursor.com>

Document pool routing, priority buckets, lock-free queues, group tokens, idle behaviour, and shutdown so PR #193 can merge with maintainer-facing architecture coverage on ReadTheDocs. Co-authored-by: Cursor <cursoragent@cursor.com>

Apply mdformat to docs/explanation/scheduler.md so the Markdown Formatting CI check passes. Co-authored-by: Cursor <cursoragent@cursor.com>

Spike write-ups belong in spike worktrees only; keep progress-guarantee content inline in scheduler docs and TaskQueue header. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace manual atomic flag management in bump(), the unmask path, and the poll loop with NotifierWakeGuard and NotifierPollScope so the wake-then-lock handoff cannot be skipped on error or early return. Co-authored-by: Cursor <cursoragent@cursor.com>

Platform pause intrinsics showed no measurable win in a handoff microbench and add ifdef complexity; plain yield loops match pre-spike behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston

🤖 Doxygen pass across new scheduler code in 5735023d and 42a79531: queue classes (TaskQueue, MPSCQueue, Queue), block_ops.hpp helpers, ExternalWaiterRegistration, PriorityLevel mapping, and queue_live_tracker.hpp now have @tparam/@param/@return blocks matching existing NUClear style. TaskQueue.hpp also documents progress guarantees inline. Added docs/explanation/scheduler.md for the design overview. Group headers/sources are intentionally deferred for a follow-up pass as noted in your review.

Delete move operations to satisfy cppcoreguidelines-special-member-functions and mark signal() const. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

Replace white-box Group tests with black-box helpers that observe behavior through the public try_acquire_running_lock and try_submit APIs only. Co-authored-by: Cursor <cursoragent@cursor.com>

Correct the scheduler_data comment (atomic stores are not a data race), cache Pool::current() on the submit hot path, and reflow scheduler.md with semantic line breaks. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Rebased onto main after lock-free scheduler merge (#193). Introduce typed PriorityLevel, ThreadPriority RAII for set/restore, and integrate priority handling with the lock-free scheduler pools. Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston and others added 3 commits May 27, 2026 13:56

Formatting

c03ffd2

TrentHouliston requested a review from Copilot May 27, 2026 05:35

Copilot started reviewing on behalf of TrentHouliston May 27, 2026 05:35 View session

Copilot AI reviewed May 27, 2026

View reviewed changes

Comment thread src/threading/scheduler/queue/TaskQueue.hpp Outdated

Comment thread src/threading/scheduler/queue/MPSCQueue.hpp Outdated

Comment thread src/threading/scheduler/Group.hpp

Comment thread src/threading/scheduler/Group.cpp

TrentHouliston and others added 3 commits May 28, 2026 13:50

Merge branch 'main' into houliston/scheduler

64b9879

TrentHouliston requested a review from Copilot June 15, 2026 08:59

Copilot AI reviewed Jun 15, 2026

TrentHouliston requested a review from Copilot June 15, 2026 09:01

Copilot started reviewing on behalf of TrentHouliston June 15, 2026 09:01 View session

Copilot AI reviewed Jun 15, 2026

View reviewed changes

TrentHouliston and others added 5 commits June 15, 2026 19:32

Fix clang-tidy findings in scheduler headers.

fb14e22

Remove redundant member init on WaitEntry::slot and explicitly delete Scheduler copy/move operations to satisfy cppcoreguidelines-special-member-functions. Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston requested a review from Copilot June 16, 2026 00:42

Copilot started reviewing on behalf of TrentHouliston June 16, 2026 00:42 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

TrentHouliston requested a review from Copilot June 16, 2026 03:34

Copilot started reviewing on behalf of TrentHouliston June 16, 2026 03:35 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

Comment thread src/threading/scheduler/Pool.cpp Outdated

Comment thread tests/CMakeLists.txt

Comment thread cmake/TestRunner.cmake

Comment thread src/CMakeLists.txt Outdated

TrentHouliston and others added 2 commits June 16, 2026 13:49

Fix clang-tidy diagnostics in threading unit tests.

e032424

Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston and others added 15 commits June 16, 2026 21:16

Define BUILD_TESTS before add_subdirectory(src).

a550cad

NUCLEAR_GROUP_TEST_API was never set when relying on the default ON option because src/CMakeLists.txt evaluated BUILD_TESTS before the option was declared in the root CMakeLists.txt. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix clang-tidy forward-ref warning in spin_until.

94c3ac2

Use const Pred& since the predicate is only invoked locally. Co-authored-by: Cursor <cursoragent@cursor.com>

Format taskqueue wait-free assessment for mdformat.

a35eca3

Co-authored-by: Cursor <cursoragent@cursor.com>

Restore 20s UDP timeout budget on Windows CI.

db584f1

Scale with active test count using factor 20 so a full matrix matches the prior TimeUnit(200) ceiling under NUCLEAR_TEST_TIME_UNIT_DEN=10. Co-authored-by: Cursor <cursoragent@cursor.com>

Give Windows UDP matrix 25s CI timeout headroom.

166a903

Full matrix on Windows CI finishes all but the last multicast-v6 receive at 20s; scale timeout with active_tests * 25. Co-authored-by: Cursor <cursoragent@cursor.com>

Allow UDP matrix to finish and shut down on Windows CI.

31962a2

30s scaled timeout leaves headroom after the last receive for idle shutdown; 25s still tripped TestBase timeout while events were landing. Co-authored-by: Cursor <cursoragent@cursor.com>

Extend UDP CI timeout for Windows shutdown pipeline.

3568c44

40s scaled budget (active_tests * 40 under DEN=10) covers last receive plus Finished/idle shutdown on slow Windows runners. Co-authored-by: Cursor <cursoragent@cursor.com>

Add explanation docs for the lock-free scheduler.

42a7953

Document pool routing, priority buckets, lock-free queues, group tokens, idle behaviour, and shutdown so PR #193 can merge with maintainer-facing architecture coverage on ReadTheDocs. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix mdformat violations in scheduler docs.

c33d48b

Apply mdformat to docs/explanation/scheduler.md so the Markdown Formatting CI check passes. Co-authored-by: Cursor <cursoragent@cursor.com>

Remove R&D spike docs from houliston/scheduler branch.

3b3ca4b

Spike write-ups belong in spike worktrees only; keep progress-guarantee content inline in scheduler docs and TaskQueue header. Co-authored-by: Cursor <cursoragent@cursor.com>

Revert queue spin backoff to std::this_thread::yield().

c49d925

Platform pause intrinsics showed no measurable win in a handoff microbench and add ifdef complexity; plain yield loops match pre-spike behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston commented Jun 17, 2026

View reviewed changes

TrentHouliston requested a review from Copilot June 17, 2026 01:11

Copilot started reviewing on behalf of TrentHouliston June 17, 2026 01:11 View session

Fix clang-tidy on NotifierWakeGuard RAII helpers.

dff42d6

Delete move operations to satisfy cppcoreguidelines-special-member-functions and mark signal() const. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI reviewed Jun 17, 2026

View reviewed changes

Comment thread src/threading/Reaction.hpp Outdated

Comment thread docs/explanation/scheduler.md

Comment thread src/threading/scheduler/Scheduler.cpp Outdated

TrentHouliston and others added 2 commits June 17, 2026 11:50

Remove NUCLEAR_GROUP_TEST_API hooks from Group production code.

4591e31

Replace white-box Group tests with black-box helpers that observe behavior through the public try_acquire_running_lock and try_submit APIs only. Co-authored-by: Cursor <cursoragent@cursor.com>

Address latest Copilot review on PR #193.

c53c2a9

Correct the scheduler_data comment (atomic stores are not a data race), cache Pool::current() on the submit hot path, and reflow scheduler.md with semantic line breaks. Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston enabled auto-merge (squash) June 17, 2026 01:55

Fix clang-tidy include-cleaner in Group test

5612b90

Co-authored-by: Cursor <cursoragent@cursor.com>

TrentHouliston disabled auto-merge June 17, 2026 02:49

TrentHouliston merged commit 30ee1cd into main Jun 17, 2026
15 checks passed

TrentHouliston deleted the houliston/scheduler branch June 17, 2026 02:49

This was referenced Jun 17, 2026

Scheduler-driven IOController poll on POSIX #194

Open

Improve how thread priority is applied #162

Open

Conversation

TrentHouliston commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

TSAN race fixes (pre-existing)

Other

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ysims commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TrentHouliston left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TrentHouliston commented May 27, 2026 •

edited

Loading

ysims commented Jun 6, 2026 •

edited

Loading