Cross-platform stability fixes and FP-contract standardization#283
Cross-platform stability fixes and FP-contract standardization#283Madreag wants to merge 13 commits into
Conversation
Swap the per-tick hitMOAtoms thread-locals from unordered_map to std::map so impulse accumulation iterates MOID-ascending, removing a source of cross-run bit-noise in the physics step.
Add deviceId as a secondary sort key in WeaponSearch and ToolSearch so the pickup ordering is stable when scores collide; scheduler-dependent async callback order otherwise leaves the table order non-deterministic.
On macOS, route RTEError message/abort/assert boxes off worker threads to stderr: Cocoa dispatches the dialog onto the main thread, which deadlocks when that thread is blocked waiting on the worker. Windows/Linux tolerate cross-thread boxes and keep their behaviour. Guard the save-list file_clock -> system_clock conversion on macOS libc++, whose __int128 file_clock rep is rejected when constructing the time_point directly; an explicit duration_cast lands it in range.
CalculatePathAsync runs on the parallel ThreadedUpdateAI workers (AI scripts call Scene:CalculatePathAsync from HumanBehaviors); the static-int increment could hand two concurrent requests the same id and silently drop one Lua callback slot. Make the counter std::atomic with relaxed fetch_add.
Pin floating-point so same-arch Win/Linux/macOS builds produce bit-identical results. meson: -ffp-contract=off plus the no-fast-math family for GCC/Clang, /fp:precise + /arch:SSE2 for MSVC. RTEA.vcxproj: FloatingPointModel=Precise on every config (was Fast). Document the FPU + libm path in Source/CI.
The release branch reassigned extra_args to ['-w'], discarding -ffp-contract=off and the rest of the cross-platform FP block. Append instead of overwrite.
Atom::Clear left m_IntPos and the Bresenham step fields uninitialized. SetupPos branches on m_IntPos, and StepForward steps on the Bresenham state, before SetupSeg runs for a freshly-pooled Atom, so the stale pool value (which varies with allocation order) made collision stepping nondeterministic run to run. This was the dominant source of the cross-platform determinism divergence on the combat scenarios.
The sim-frame counter was incremented each tick but never initialized, so HitWhatMOID and HitWhatTerrMaterial read an uninitialized value when comparing a per-MO collision frame against it.
A SoundContainer's FMOD channels keep its address in userData, so the positional and channel-ended passes read freed memory once the container is destroyed. Null the back-reference on destroy and skip channels whose container is gone.
When every atom rasterizes to zero steps, stepsOnSeg is 0 and the step ratio divided 0/0. Use 0 in that case.
broadcastMsg.activeTime goes negative when the broadcast timestamp predates m_epoch, tripping Tracy's own assert(activeTime >= 0) on startup. Clamp to 0.
Vendored luabind 0.7.1 and boost 1.75 still use the C++17-removed APIs (auto_ptr, unary/binary_function, binders) that libc++ 19+ hides behind these macros, so the macOS build fails to compile without them.
| lastBroadcast = t; | ||
| const auto ts = std::chrono::duration_cast<std::chrono::seconds>( std::chrono::system_clock::now().time_since_epoch() ).count(); | ||
| broadcastMsg.activeTime = int32_t( ts - m_epoch ); | ||
| broadcastMsg.activeTime = ts > m_epoch ? int32_t( ts - m_epoch ) : 0; |
There was a problem hiding this comment.
update from upstream instead. Also unnecessary, can't realistically happen until the 2038 bug hits.
There was a problem hiding this comment.
I just verified master and v0.13.1 still have the bare int32_t(ts - m_epoch) + assert, same as our 0.12.2, so updating wouldn't fix it. It's not really 2038 either. m_epoch is uint64_t, so a backwards clock step at startup (ntp/vm resync) makes ts drop below it and underflow, tripping the assert. Windows shouldn't hit it (NDEBUG compiles it out).
It's a profiler thing so I'll remove it. I could send the guard upstream to tracy separately later.
There was a problem hiding this comment.
upstream if possible then
| add_global_arguments( | ||
| '-D_LIBCPP_ENABLE_CXX17_REMOVED_AUTO_PTR', | ||
| '-D_LIBCPP_ENABLE_CXX17_REMOVED_UNARY_BINARY_FUNCTION', | ||
| '-D_LIBCPP_ENABLE_CXX17_REMOVED_BINDERS', | ||
| '-D_LIBCPP_ENABLE_CXX17_REMOVED_RANDOM_SHUFFLE', | ||
| '-D_LIBCPP_ENABLE_CXX17_REMOVED_UNEXPECTED_FUNCTIONS', | ||
| '-D_LIBCPP_ENABLE_CXX20_REMOVED_BINDER_TYPEDEFS', | ||
| '-D_LIBCPP_ENABLE_CXX20_REMOVED_NEGATORS', | ||
| '-D_LIBCPP_ENABLE_EXPERIMENTAL', | ||
| '-D_LIBCPP_PSTL_BACKEND_SERIAL', | ||
| language: 'cpp') |
There was a problem hiding this comment.
Remove. Not necessary, already in place in the luabind meson build files. Should also not be global arguments.
There was a problem hiding this comment.
Agreed on most of these. cxx17 and cxx20 ones get pulled in through luabind/boost, and nothing outside the lua binding TUs includes boost, so luabind's own args cover them (already in place, like you stated), so I'll remove them all and move the block from add_global_arguments to extra_args . I agree it shouldn't be global.
_LIBCPP_ENABLE_EXPERIMENTAL and _LIBCPP_PSTL_BACKEND_SERIAL are needed though. They're not removed-api shims, they're what turns on <execution>. libc++ sets _LIBCPP_HAS_NO_INCOMPLETE_PSTL and drops the parallel algorithms unless _LIBCPP_ENABLE_EXPERIMENTAL is defined, and the engine uses std::execution::par_unseq in PathFinder, SLTerrain, ActivityMan, and SaveLoad. None of those see luabind's args, so on a libc++ build those TUs won't compile without them. (Upstream CI builds macos through osxcross/GCC, and these don't do anything there, so it's the native clang/libc++ build that needs them.)
Also, did a bit of research on the luabind side: The _LIBCPP_ENABLE_CXX17_REMOVED_FEATURES umbrella it uses got removed in libc++ 19, so on a recent xcode it doesn't do anything anymore. _LIBCPP_ENABLE_CXX17_REMOVED_* macros still work, so if that ever starts breaking the build I could move luabind onto those. That would be another PR though.
There was a problem hiding this comment.
yup, I see. Apple really seems to like weird clang configurations. This literally works just fine on any normal clang build (x86 or arm). Just put it under the clang compiler, since this pollutes the command line quite a bit. And add a comment that it's for apple support, because I will probably forget why it's there otherwise.
| void AudioMan::DisownSoundContainerPlayingChannels(const SoundContainer* soundContainer) { | ||
| if (!m_AudioEnabled || !soundContainer || !soundContainer->IsBeingPlayed()) { | ||
| return; | ||
| } | ||
| // Leave the channels playing, but null their back-reference so the positional/ended passes don't read the freed container. | ||
| FMOD::Channel* soundChannel; | ||
| for (int channelIndex: *soundContainer->GetPlayingChannels()) { | ||
| if (m_AudioSystem->getChannel(channelIndex, &soundChannel) == FMOD_OK) { | ||
| soundChannel->setUserData(nullptr); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Hm, this technically works around the crashes ig. Not sure this behaves correctly tho, and will still crash from lua (much rarer, can still happen).
There was a problem hiding this comment.
you're right... it'll probably still crash.
The destructor disown covers the C++ side, but CreateSoundContainer is adopt-bound. When async GC collects one it runs ~SoundContainer on a threadpool worker while AudioMan::Update() reads that channel's userData on the main thread. The null checks don't close the window since the container can get freed after the check passes. Also, the looping case looks wrong (disowned loop -1 channels would keep playing with nothing left to stop them).
I'll remove this from the PR rather than do a partial fix. It's out of scope for the determinism and cross-platform work I should be focusing on.
These are cross platform stability fixes that came up while bringing the determinism work over to Mac and Linux. Most are small correctness issues that Windows happened to hide (its heap fill pattern and thread scheduling masked a few), so they only showed up once the same code ran on the other two platforms. see commit messages.
AtomGroupcollision iteration now runs in MOID order (the two thread localhitMOAtomsmaps go fromunordered_maptomap;MOIgnoreMapis left alone since it's lookup only and never iterated). Plus a guard against a 0/0 step ratio when every atom in a segment rasterizes to zero steps, which left a dead NaN that still trippedFE_INVALID.HumanBehaviorsweapon and tool pickup sort gets adeviceIdtie break, since the async pickup callbacks fire in scheduler dependent order and equal scores need a stable total order.FP contract pinned across toolchains:
-ffp-contract=offon GCC/Clang andFloatingPointModel=Preciseon every MSVC config, so the compiler can't fold FMA or reassociate floats differently per platform. A follow on fixes a Meson bug where the release branch reassignedextra_argsand silently dropped those flags on every release TU (checked incompile_commands.json: 203 TUs now, was 0).LuaAdaptersasync callback id counter is now astd::atomic<int>, closing a race under parallel AI.Two uninitialized reads off the pooled allocator, both returning prior occupant bytes so the value tracked allocation order:
Atom's integer step state, andMovableMan::m_SimUpdateFrameNumber. Caught with Valgrind/ASan.An
AudioManuse after free: an MO gibbed mid tick frees itsSoundContainer, but its still playing FMOD channels keep the freed address inuserData. The channels now get disowned on destroy (they keep playing) and the readers skip a container that's gone.A few Mac only fixes: message boxes raised from a worker thread now log to stderr instead of popping a dialog (a cross thread Cocoa alert deadlocks), an explicit
file_time_typetosystem_clockconversion in the save menu, and the libc++ removed-API macros that luabind 0.7.1 and boost 1.75 still need to build on libc++ 19+.a one line guard in Tracy (
TracyProfiler.cpp) that clampsbroadcastMsg.activeTimeto 0 when the timestamp predates the profiler epoch, otherwise it trips Tracy's own assert on startup.Built and ran clean on Windows, Linux, and macOS arm64. A few of these (the iteration order, FP contract, and uninitialized reads) are prerequisites for the determinism series this is part of, but they all stand on their own as fixes.