Merged
50 commits
3f052ee
Lock-free scheduler with per-priority queues and pre-existing TSAN ra…
TrentHouliston May 27, 2026
c03ffd2
Formatting
TrentHouliston May 27, 2026
3d2ebfd
Replace atomic shared_ptr cache with atomic raw pointer; fix clang-tidy
TrentHouliston May 27, 2026
1268385
Address Copilot review: leak fix + group drain + doc cleanup
TrentHouliston May 28, 2026
64b9879
Merge branch 'main' into houliston/scheduler
TrentHouliston May 28, 2026
9159be4
Fix idle epoch lost when a Sync waiter is parked off-pool
TrentHouliston Jun 3, 2026
105cefe
Give UDP test the same CI timeout budget as TCP
TrentHouliston Jun 15, 2026
84da938
Address SonarCloud PR 193 issues: code fixes and scoped suppressions
TrentHouliston Jun 15, 2026
3244882
Address Copilot review: queue dtor leaks, block reclaim race, force-s…
TrentHouliston Jun 15, 2026
aa8b9a2
Fix group token leak when opportunistic drain races park publish.
TrentHouliston Jun 15, 2026
5f74b95
Make group token handback accounting exact and add contention stress …
TrentHouliston Jun 15, 2026
fb14e22
Fix clang-tidy findings in scheduler headers.
TrentHouliston Jun 16, 2026
28a0ad9
Remove unused Semaphore and fix pending_tasks publish ordering
TrentHouliston Jun 16, 2026
04de2ed
Fix force-stop deadlock hazard and scope test API define
TrentHouliston Jun 16, 2026
e032424
Fix clang-tidy diagnostics in threading unit tests.
TrentHouliston Jun 16, 2026
31377e2
Fix SonarCloud findings in pool teardown and scheduler dtor.
TrentHouliston Jun 16, 2026
054d399
Improve scheduler new-code coverage via deletion and BDD tests.
TrentHouliston Jun 16, 2026
4fb157a
Fix CI failures in scheduler tests and multicast probing.
TrentHouliston Jun 16, 2026
307691f
Extract shared queue BDD helpers to cut SonarCloud duplication.
TrentHouliston Jun 16, 2026
da737ff
Adopt PR #190 multicast round-trip probe for UDP CI stability.
TrentHouliston Jun 16, 2026
42d386a
Consolidate shared queue tests into templated Catch2 cases.
TrentHouliston Jun 16, 2026
aecf5c0
Extract shared lock-free block helpers to cut SonarCloud duplication.
TrentHouliston Jun 16, 2026
1708045
Delegate MPSC force-stop drain to the consumer thread.
TrentHouliston Jun 16, 2026
d500e32
Fix MPSC force-stop hangs after the consumer thread exits.
TrentHouliston Jun 16, 2026
5735023
Apply PR review queue cleanups for docs, BLOCK_SIZE, and test constants.
TrentHouliston Jun 16, 2026
eb13426
Type Reaction scheduler cache as atomic Pool* for type safety.
TrentHouliston Jun 16, 2026
e92f9bb
Replace Pool external-waiter register/unregister with RAII handle.
TrentHouliston Jun 16, 2026
d5fdd39
Document why Catch2 BENCHMARK was not adopted for emit ping-pong matrix.
TrentHouliston Jun 16, 2026
b78ea74
Introduce PriorityLevel enum for scheduler bucket mapping.
TrentHouliston Jun 16, 2026
1e8a509
Close IOController wake-then-lock race with wake_requested handoff.
TrentHouliston Jun 16, 2026
15ee1ea
Drop PR #190 UDP CI workarounds; rely on multicast probe.
TrentHouliston Jun 16, 2026
a7b63ea
Spike: document TaskQueue wait-freedom and tighten spin back-off
TrentHouliston Jun 16, 2026
a550cad
Define BUILD_TESTS before add_subdirectory(src).
TrentHouliston Jun 16, 2026
94c3ac2
Fix clang-tidy forward-ref warning in spin_until.
TrentHouliston Jun 16, 2026
a35eca3
Format taskqueue wait-free assessment for mdformat.
TrentHouliston Jun 16, 2026
030a98a
Fix CI: PriorityLevel enum size and UDP timeout scaling.
TrentHouliston Jun 16, 2026
db584f1
Restore 20s UDP timeout budget on Windows CI.
TrentHouliston Jun 16, 2026
166a903
Give Windows UDP matrix 25s CI timeout headroom.
TrentHouliston Jun 16, 2026
31962a2
Allow UDP matrix to finish and shut down on Windows CI.
TrentHouliston Jun 16, 2026
3568c44
Extend UDP CI timeout for Windows shutdown pipeline.
TrentHouliston Jun 16, 2026
fe9fdba
Restore Windows CI UDP skip; IOController path still flaky on GHA.
TrentHouliston Jun 16, 2026
42a7953
Add explanation docs for the lock-free scheduler.
TrentHouliston Jun 17, 2026
c33d48b
Fix mdformat violations in scheduler docs.
TrentHouliston Jun 17, 2026
3b3ca4b
Remove R&D spike docs from houliston/scheduler branch.
TrentHouliston Jun 17, 2026
b7534f6
Refactor IO notifier wake_requested to RAII guards.
TrentHouliston Jun 17, 2026
c49d925
Revert queue spin backoff to std::this_thread::yield().
TrentHouliston Jun 17, 2026
dff42d6
Fix clang-tidy on NotifierWakeGuard RAII helpers.
TrentHouliston Jun 17, 2026
4591e31
Remove NUCLEAR_GROUP_TEST_API hooks from Group production code.
TrentHouliston Jun 17, 2026
c53c2a9
Address latest Copilot review on PR #193.
TrentHouliston Jun 17, 2026
5612b90
Fix clang-tidy include-cleaner in Group test
TrentHouliston Jun 17, 2026
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@

# Build & CMake files
build/
build-*/
CMakeCache.txt
CMakeFiles
Makefile
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -81,11 +81,13 @@ if(CI_BUILD)
endif()
endif(CI_BUILD)

+# Tests must be declared before src/ so NUClear can expose test-only APIs when enabled.
+option(BUILD_TESTS "Builds all of the NUClear unit tests." ON)
+
# Add the src directory
add_subdirectory(src)

# Add the tests directory
-option(BUILD_TESTS "Builds all of the NUClear unit tests." ON)
if(BUILD_TESTS)
enable_testing()
add_subdirectory(tests)
3 changes: 2 additions & 1 deletion cmake/TestRunner.cmake
@@ -54,7 +54,8 @@ foreach(target ${all_targets})
list(APPEND report_outputs ${junit_report_file})
add_custom_command(
OUTPUT ${junit_report_file} ${raw_coverage}
-COMMAND ${command_prefix} $<TARGET_FILE:${target}> --reporter console --reporter JUnit::out=${junit_report_file}
+COMMAND ${command_prefix} $<TARGET_FILE:${target}> --allow-running-no-tests --reporter console
+--reporter JUnit::out=${junit_report_file}
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
DEPENDS ${target}
USES_TERMINAL
1 change: 1 addition & 0 deletions docs/explanation/index.md
@@ -11,6 +11,7 @@ If you've already followed the tutorials and know how to use NUClear, this is wh
| --------------------------------- | --------------------------------------------------------------------------------------------- |
| [Architecture](architecture.md) | Why NUClear exists, the problems it solves, and the event-driven reactive pattern at its core |
| [Threading Model](threading.md) | How tasks are scheduled across thread pools, priority queues, and group constraints |
| [Scheduler](scheduler.md) | Internal design of the lock-free scheduler: pools, queues, groups, idle tasks, and shutdown |
| [Lifecycle](lifecycle.md) | The three phases of a NUClear system: initialisation, execution, and shutdown |
| [The DSL System](dsl-system.md) | How `on<>().then()` works from top to bottom — template metaprogramming in action |
| [Message Flow](message-flow.md) | What happens when you emit data, from call site to reaction execution |
297 changes: 297 additions & 0 deletions docs/explanation/scheduler.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/explanation/threading.md
@@ -3,6 +3,8 @@
NUClear's threading model is designed around a simple goal: **you should never have to write a mutex**.
The framework handles concurrency for you through immutable messages, thread pools, and a priority-based scheduler.

For the internal design of the scheduler (lock-free queues, group tokens, idle detection, shutdown), see [Scheduler](scheduler.md).

## Thread Pool Architecture

NUClear uses multiple thread pools, each serving a different purpose:
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -181,6 +181,7 @@ nav:
- explanation/index.md
- Architecture: explanation/architecture.md
- Threading Model: explanation/threading.md
- Scheduler: explanation/scheduler.md
- Lifecycle: explanation/lifecycle.md
- The DSL System: explanation/dsl-system.md
- Message Flow: explanation/message-flow.md
26 changes: 26 additions & 0 deletions sonar-project.properties
@@ -0,0 +1,26 @@
# SonarCloud issue suppressions for deliberate lock-free / placement-new code.
# projectKey, organization, sources, tests and coverage settings are passed on
# the scanner CLI in .github/workflows/sonarcloud.yaml; only the ignore rules
# below are configured here.

sonar.issue.ignore.multicriteria=e1,e2,e3,e4,e5

# S8417: explicit memory_order arguments are intentional in this concurrency
# framework; the carefully chosen relaxed/acquire/release/acq_rel orderings are
# required for performance and must not be forced to seq_cst.
sonar.issue.ignore.multicriteria.e1.ruleKey=cpp:S8417
sonar.issue.ignore.multicriteria.e1.resourceKey=src/threading/**
sonar.issue.ignore.multicriteria.e2.ruleKey=cpp:S8417
sonar.issue.ignore.multicriteria.e2.resourceKey=src/extension/**

# S5025 (manual new/delete), S3630 (reinterpret_cast) and S3432 (explicit
# destructor call) are unavoidable in the lock-free queues: manual Block
# lifetime is dictated by the graveyard reclamation scheme and the
# reinterpret_cast + explicit ~T() are the placement-new idiom for the aligned
# slot storage. Scope these to the queue files only.
sonar.issue.ignore.multicriteria.e3.ruleKey=cpp:S5025
sonar.issue.ignore.multicriteria.e3.resourceKey=**/scheduler/queue/*.hpp
sonar.issue.ignore.multicriteria.e4.ruleKey=cpp:S3630
sonar.issue.ignore.multicriteria.e4.resourceKey=**/scheduler/queue/*.hpp
sonar.issue.ignore.multicriteria.e5.ruleKey=cpp:S3432
sonar.issue.ignore.multicriteria.e5.resourceKey=**/scheduler/queue/*.hpp
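
The suppression comment above mentions the placement-new idiom for aligned slot storage, which is what triggers S3630 and S3432. The queue headers themselves are not shown in this diff, so the following is only a minimal sketch of that idiom under stated assumptions: `Slot<T>` and its members are hypothetical names invented for illustration, not the scheduler's actual `Block` or slot types.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical single-element slot: raw aligned bytes that may or may not hold a live T.
template <typename T>
class Slot {
public:
    Slot() = default;
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;
    ~Slot() {
        if (occupied) {
            destroy();
        }
    }

    // Construct a T in place inside the raw storage (placement new).
    template <typename... Args>
    T& emplace(Args&&... args) {
        assert(!occupied);
        T* p = ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
        occupied = true;
        return *p;
    }

    // View the raw bytes as the T that was constructed there (the reinterpret_cast Sonar flags).
    T& get() {
        assert(occupied);
        return *reinterpret_cast<T*>(storage);
    }

    // End the object's lifetime without freeing the storage (the explicit destructor call Sonar flags).
    void destroy() {
        assert(occupied);
        get().~T();
        occupied = false;
    }

    bool has_value() const {
        return occupied;
    }

private:
    alignas(T) unsigned char storage[sizeof(T)];
    bool occupied = false;
};
```

The storage outlives any single `T`, which is why construction and destruction are spelled out by hand rather than left to `new`/`delete` or a smart pointer.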
2 changes: 1 addition & 1 deletion src/Reactor.hpp
@@ -390,7 +390,7 @@ class Reactor {

public:
template <typename... Args>
-Binder(Reactor& r, Args&&... args) : reactor(r), args(std::forward<Args>(args)...) {}
+Binder(Reactor& r, Args&&... args_) : reactor(r), args(std::forward<Args>(args_)...) {}

template <typename Label, typename Function>
auto then(Label&& label, Function&& callback) {
65 changes: 62 additions & 3 deletions src/dsl/word/Watchdog.hpp
@@ -23,6 +23,7 @@
#ifndef NUCLEAR_DSL_WORD_WATCHDOG_HPP
#define NUCLEAR_DSL_WORD_WATCHDOG_HPP

#include <mutex>
#include <stdexcept>

#include "../../threading/Reaction.hpp"
@@ -52,12 +53,25 @@ namespace dsl {
using MapType = std::remove_cv_t<RuntimeType>;
using WatchdogStore = util::TypeMap<WatchdogGroup, MapType, std::map<MapType, NUClear::clock::time_point>>;

/**
* Mutex protecting structural and value updates to the underlying map for this
* (WatchdogGroup, RuntimeType) pair. Watchdog timers are read by the chrono controller
* thread (via @ref get) while being written by user threads that emit a service event
* (via @ref service), and the underlying std::map is also mutated by init/unbind, so a
* single shared mutex serialises all of those operations.
*/
static std::mutex& mutex() {
static std::mutex m; // NOLINT(cppcoreguidelines-avoid-non-const-global-variables)
return m;
}

/**
* Ensures the data store is initialised correctly.
*
* @param data The runtime argument for the current watchdog in the WatchdogGroup/RuntimeType group
*/
static void init(const RuntimeType& data) {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr) {
WatchdogStore::set(std::make_shared<std::map<MapType, NUClear::clock::time_point>>());
}
@@ -67,11 +81,15 @@ namespace dsl {
}

/**
-* Gets the current service time for the WatchdogGroup/RuntimeType/data watchdog
+* Gets the current service time for the WatchdogGroup/RuntimeType/data watchdog.
*
+* Returned by value so the caller never holds a reference into the (mutex-protected)
+* map. The time_point is small and trivially copyable so the copy is essentially free.
+*
* @param data The runtime argument for the current watchdog in the WatchdogGroup/RuntimeType group
*/
-static const NUClear::clock::time_point& get(const RuntimeType& data) {
+static NUClear::clock::time_point get(const RuntimeType& data) {
+const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr || WatchdogStore::get()->count(data) == 0) {
throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name()) + ", "
+ util::demangle(typeid(MapType).name())
@@ -80,12 +98,29 @@ namespace dsl {
return WatchdogStore::get()->at(data);
}

/**
* Atomically updates the service time for the WatchdogGroup/RuntimeType/data watchdog.
*
* Called by @ref emit::WatchdogServicer::service to keep the write under the same
* mutex that @ref get uses for reads.
*/
static void service(const RuntimeType& data, const NUClear::clock::time_point& when) {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr || WatchdogStore::get()->count(data) == 0) {
throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name()) + ", "
+ util::demangle(typeid(MapType).name())
+ "> has not been created yet or no watchdog has been set up");
}
WatchdogStore::get()->at(data) = when;
}

/**
* Cleans up any allocated storage for the WatchdogGroup/RuntimeType/data watchdog
*
* @param data The runtime argument for the current watchdog in the WatchdogGroup/RuntimeType group
*/
static void unbind(const RuntimeType& data) {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() != nullptr) {
WatchdogStore::get()->erase(data);
}
@@ -105,30 +140,54 @@ namespace dsl {
struct WatchdogDataStore<WatchdogGroup, void> {
using WatchdogStore = util::TypeMap<WatchdogGroup, void, NUClear::clock::time_point>;

/// See the documentation on the runtime-arg specialisation.
static std::mutex& mutex() {
static std::mutex m; // NOLINT(cppcoreguidelines-avoid-non-const-global-variables)
return m;
}

/**
* Ensures the data store is initialised correctly.
*/
static void init() {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr) {
WatchdogStore::set(std::make_shared<NUClear::clock::time_point>(NUClear::clock::now()));
}
}

/**
* Gets the current service time for the WatchdogGroup watchdog.
*
* Returned by value so the caller never reads from the time_point while it is being
* mutated by @ref service on another thread.
*/
-static const NUClear::clock::time_point& get() {
+static NUClear::clock::time_point get() {
+const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr) {
throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name())
+ "> is trying to field a service call for an unknown data type");
}
return *WatchdogStore::get();
}

/**
* Atomically updates the service time for the WatchdogGroup watchdog.
*/
static void service(const NUClear::clock::time_point& when) {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() == nullptr) {
throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name())
+ "> has not been created yet or no watchdog has been set up");
}
*WatchdogStore::get() = when;
}

/**
* Cleans up any allocated storage for the WatchdogGroup watchdog.
*/
static void unbind() {
const std::lock_guard<std::mutex> lock(mutex());
if (WatchdogStore::get() != nullptr) {
WatchdogStore::get().reset();
}
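
The change above puts every access to the watchdog store behind one function-local static mutex and returns the service time by value, so no caller holds a reference into the map once the lock is released. A minimal sketch of that pattern follows. It assumes a simplified, hypothetical `ServiceTimes<Key>` class and a `TimePoint` alias in place of NUClear's `WatchdogDataStore`, `util::TypeMap` and `NUClear::clock`, and it throws `std::out_of_range` rather than the `std::domain_error` used in the real code.

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <stdexcept>

using TimePoint = std::chrono::steady_clock::time_point;

// Hypothetical simplified store; one map and one mutex per Key type.
template <typename Key>
class ServiceTimes {
public:
    // Create the entry if it does not exist yet; an existing entry is left untouched.
    static void init(const Key& key, TimePoint now) {
        const std::lock_guard<std::mutex> lock(mutex());
        store().emplace(key, now);
    }

    // Returned by value: the copy is made while the lock is still held.
    // Throws std::out_of_range if the key was never initialised.
    static TimePoint get(const Key& key) {
        const std::lock_guard<std::mutex> lock(mutex());
        return store().at(key);
    }

    // Writes go through the same mutex as reads.
    static void service(const Key& key, TimePoint when) {
        const std::lock_guard<std::mutex> lock(mutex());
        store().at(key) = when;
    }

    // Structural changes to the map are serialised as well.
    static void unbind(const Key& key) {
        const std::lock_guard<std::mutex> lock(mutex());
        store().erase(key);
    }

private:
    // Function-local statics avoid static initialisation order problems.
    static std::mutex& mutex() {
        static std::mutex m;
        return m;
    }
    static std::map<Key, TimePoint>& store() {
        static std::map<Key, TimePoint> s;
        return s;
    }
};
```

Returning a reference from `get` would let the caller read the `time_point` after the lock is gone, which is the unsynchronised read the by-value signature avoids.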
33 changes: 10 additions & 23 deletions src/dsl/word/emit/Watchdog.hpp
@@ -23,11 +23,8 @@
#ifndef NUCLEAR_DSL_WORD_EMIT_WATCHDOG_HPP
#define NUCLEAR_DSL_WORD_EMIT_WATCHDOG_HPP

-#include <stdexcept>
-
#include "../../../PowerPlant.hpp"
-#include "../../../util/TypeMap.hpp"
-#include "../../../util/demangle.hpp"
+#include "../Watchdog.hpp"

namespace NUClear {
namespace dsl {
@@ -47,8 +44,6 @@ namespace dsl {
template <typename WatchdogGroup, typename RuntimeType = void>
struct WatchdogServicer {
using MapType = std::remove_cv_t<RuntimeType>;
-using WatchdogStore =
-util::TypeMap<WatchdogGroup, MapType, std::map<MapType, NUClear::clock::time_point>>;

/**
* Construct a new Watchdog Servicer object
@@ -63,18 +58,14 @@ namespace dsl {
explicit WatchdogServicer(const RuntimeType& data) : data(data) {}

/**
-* Services the watchdog
+* Services the watchdog.
*
-* The watchdog timer that is specified by the WatchdogGroup/RuntimeType/data combination will have its
-* service time updated to whatever is stored in when.
+* Delegates to @ref word::WatchdogDataStore::service so the write happens under the
+* same mutex that guards reads in the chrono controller; otherwise the time_point
+* would be torn-read / torn-written across threads.
*/
void service() {
-if (WatchdogStore::get() == nullptr || WatchdogStore::get()->count(data) == 0) {
-throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name()) + ", "
-+ util::demangle(typeid(RuntimeType).name())
-+ "> has not been created yet or no watchdog has been set up");
-}
-WatchdogStore::get()->at(data) = when;
+word::WatchdogDataStore<WatchdogGroup, RuntimeType>::service(data, when);
}

private:
@@ -94,19 +85,15 @@ namespace dsl {
*/
template <typename WatchdogGroup>
struct WatchdogServicer<WatchdogGroup, void> {
-using WatchdogStore = util::TypeMap<WatchdogGroup, void, NUClear::clock::time_point>;

/**
-* Services the watchdog
+* Services the watchdog.
*
-* The watchdog timer for WatchdogGroup will have its service time updated to whatever is stored in when
+* Delegates to @ref word::WatchdogDataStore::service so the write happens under the
+* same mutex that guards reads in the chrono controller.
*/
void service() {
-if (WatchdogStore::get() == nullptr) {
-throw std::domain_error("Store for <" + util::demangle(typeid(WatchdogGroup).name())
-+ "> has not been created yet or no watchdog has been set up");
-}
-WatchdogStore::set(std::make_shared<NUClear::clock::time_point>(when));
+word::WatchdogDataStore<WatchdogGroup, void>::service(when);
}

private:
4 changes: 4 additions & 0 deletions src/extension/IOController.hpp
@@ -27,6 +27,8 @@
#include "../dsl/word/IO.hpp"
#include "../util/platform.hpp"

#include <atomic>

namespace NUClear {
namespace extension {

@@ -51,6 +53,8 @@ namespace extension {
fd_t recv{-1}; ///< This is the file descriptor that is waited on by poll
fd_t send{-1}; ///< This is the file descriptor that is written to to wake up the poll command
std::mutex mutex; ///< This mutex is used to ensure that a write to poll has worked
/// Armed by NotifierWakeGuard during the wake-then-lock handoff; checked under mutex before ::poll().
std::atomic<bool> wake_requested{false};
};
#endif
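
The new `wake_requested` field is described only by its comment: it is armed during the wake-then-lock handoff and checked under the mutex before `::poll()`. The guard and the poll loop are not part of this diff, so the sketch below is a guess at the shape of such a handoff rather than the actual `NotifierWakeGuard`. `Notifier`, `WakeGuard` and `poll_once` are hypothetical names, the wake file descriptor write is left as a comment, and a single waker at a time is assumed.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the notifier; only the two fields used here are modelled.
struct Notifier {
    std::mutex mutex;
    std::atomic<bool> wake_requested{false};
};

// Hypothetical guard: announce the wake first, then wait for the lock.
class WakeGuard {
public:
    explicit WakeGuard(Notifier& n) : notifier(n) {
        notifier.wake_requested.store(true, std::memory_order_release);
        // A real implementation would write to the wake file descriptor here
        // so that a poller already blocked in ::poll() returns.
        lock = std::unique_lock<std::mutex>(notifier.mutex);
    }
    // The flag is cleared while the lock is still held; `lock` is released afterwards.
    ~WakeGuard() {
        notifier.wake_requested.store(false, std::memory_order_release);
    }
    WakeGuard(const WakeGuard&) = delete;
    WakeGuard& operator=(const WakeGuard&) = delete;

private:
    Notifier& notifier;
    std::unique_lock<std::mutex> lock;
};

// Poller side: one loop iteration. `do_poll` stands in for the blocking ::poll() call.
// Returns false when a waker is waiting, so the caller can yield and retry.
template <typename PollFn>
bool poll_once(Notifier& n, PollFn&& do_poll) {
    const std::lock_guard<std::mutex> lock(n.mutex);
    if (n.wake_requested.load(std::memory_order_acquire)) {
        return false;  // back off instead of re-entering poll ahead of the waker
    }
    do_poll();
    return true;
}
```

Without the flag, a poller that has just been woken could release the mutex, reacquire it and block in poll again before the waker obtains the lock. Checking the flag under the mutex gives the waker priority.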
