Description
I'm experiencing a deadlock with .NET 6.x on macOS when running under Rosetta. It's a really weird one:
- It doesn't always happen.
- It usually only happens within the first few minutes of application start.
You can find a full sample here: https://gist.github.com/CharlieEriksen/2f04ec835a72adfe2643a2a0ffdf7679
Here's what seems to be the most relevant part of the sample. It's always the same pattern:
Thread 0xdf373 1001 samples (1-1001) priority 31 (base 31)
1001 thread_start + 15 (libsystem_pthread.dylib + 7123) [0x7ff801266bd3]
1001 _pthread_start + 125 (libsystem_pthread.dylib + 25043) [0x7ff80126b1d3]
1001 CorUnix::CPalThread::ThreadEntry(void*) + 407 (jswzl.Server + 7158375) [0x1013dda67]
1001 (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) + 116 (jswzl.Server + 1892484) [0x100ed8084]
1001 WKS::gc_heap::bgc_thread_function() + 257 (jswzl.Server + 3310881) [0x101032521]
1001 WKS::gc_heap::gc1() + 836 (jswzl.Server + 3201796) [0x101017b04]
1001 WKS::gc_heap::background_mark_phase() + 858 (jswzl.Server + 3205786) [0x101018a9a]
1001 WKS::gc_heap::revisit_written_pages(int, int) + 1011 (jswzl.Server + 3315155) [0x1010335d3]
1001 SoftwareWriteWatch::GetDirty(void*, unsigned long, void**, unsigned long*, bool, bool) + 65 (jswzl.Server + 3378033) [0x101042b71]
1001 FlushProcessWriteBuffers + 147 (jswzl.Server + 7152259) [0x1013dc283]
1001 thread_get_register_pointer_values + 144 (libsystem_kernel.dylib + 72297) [0x7ff80123ca69]
1001 thread_get_state + 140 (libsystem_kernel.dylib + 35498) [0x7ff801233aaa]
1001 mach_msg2_trap + 10 (libsystem_kernel.dylib + 5554) [0x7ff80122c5b2]
*1001 ??? (kernel.release.t8112 + 5328152) [0xfffffe0008900d18] (blocked by turnstile waiting for jswzl.Server [45872] thread 0xdf374)
Thread 0xdf374 1001 samples (1-1001) priority 31 (base 31)
1001 thread_start + 15 (libsystem_pthread.dylib + 7123) [0x7ff801266bd3]
1001 _pthread_start + 125 (libsystem_pthread.dylib + 25043) [0x7ff80126b1d3]
1001 CorUnix::CPalThread::ThreadEntry(void*) + 407 (jswzl.Server + 7158375) [0x1013dda67]
1001 ThreadNative::KickOffThread(void*) + 170 (jswzl.Server + 1690938) [0x100ea6d3a]
1001 ManagedThreadBase::KickOff(void (*)(void*), void*) + 32 (jswzl.Server + 1369424) [0x100e58550]
1001 ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 296 (jswzl.Server + 1368040) [0x100e57fe8]
1001 ThreadNative::KickOffThread_Worker(void*) + 136 (jswzl.Server + 1690728) [0x100ea6c68]
1001 DispatchCallSimple(unsigned long*, unsigned int, unsigned long, unsigned int) + 223 (jswzl.Server + 1596143) [0x100e8faef]
1001 CallDescrWorkerInternal + 124 (jswzl.Server + 3382761) [0x101043de9]
1001 ??? [0x1273f7ff1]
1001 ??? [0x1274034eb]
1001 ??? [0x1273f92a8]
1001 ??? [0x12d962257]
1001 ??? [0x12d962946]
1001 ??? [0x12d9631c8]
1001 ??? [0x127e7ba36]
1001 ??? [0x127e7d5c3]
1001 ??? [0x127e7d8ba]
1001 ??? [0x127e76c1e]
1001 ??? [0x1282af899]
1001 SystemNative_ForkAndExecProcess + 848 (jswzl.Server + 6841936) [0x101390650]
1001 __fork + 11 (libsystem_kernel.dylib + 30175) [0x7ff8012325df]
*1001 ??? (kernel.release.t8112 + 5328152) [0xfffffe0008900d18] (blocked by turnstile waiting for jswzl.Server [45872] thread 0xdf373)
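If I'm reading this correctly, the background GC thread (0xdf373) is blocked in FlushProcessWriteBuffers, inside thread_get_state, waiting on thread 0xdf374, while thread 0xdf374 is blocked in fork (called from SystemNative_ForkAndExecProcess to spawn a new process) waiting on thread 0xdf373. In other words, the GC and the fork call are deadlocking each other. Based on this theory, I've tried to avoid spawning a process when a full GC pass is approaching, but that doesn't prevent the deadlock.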
Reproduction Steps
- Try spawning a lot of processes at a rapid pace
- Trigger a GC pass
- Once in a while, you will get a deadlock
- If it doesn't happen relatively quickly, restart the process and try again
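The steps above could be sketched roughly as follows. This is not the code from my application, just a minimal harness under some assumptions: the class and method names (ForkGcRepro, SpawnBurst, ChurnGc), the child binary /usr/bin/true, the burst size of 50, and the two-minute time limit are all made up for illustration. It only makes sense when built for x64 and run under Rosetta, and I can't promise it triggers the deadlock on every machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ForkGcRepro
{
    // Spawns `count` short-lived child processes back to back and
    // returns how many were started and waited on.
    // On Unix, Process.Start goes through SystemNative_ForkAndExecProcess,
    // which is the frame visible in the blocked thread above.
    public static int SpawnBurst(int count, string fileName)
    {
        int completed = 0;
        for (int i = 0; i < count; i++)
        {
            var startInfo = new ProcessStartInfo(fileName) { UseShellExecute = false };
            using var child = Process.Start(startInfo);
            if (child is null)
            {
                continue;
            }
            child.WaitForExit();
            completed++;
        }
        return completed;
    }

    // Allocates garbage and keeps requesting non-blocking gen2 collections
    // until cancelled, so that GC work overlaps with the process spawning.
    public static void ChurnGc(CancellationToken token)
    {
        var rng = new Random(0);
        while (!token.IsCancellationRequested)
        {
            var junk = new byte[rng.Next(1, 256) * 1024];
            junk[0] = 1;
            GC.Collect(2, GCCollectionMode.Optimized, blocking: false);
        }
    }

    public static void Main()
    {
        // Hypothetical time limit: the hang usually shows up early, if at all.
        using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
        Task gcTask = Task.Run(() => ChurnGc(cts.Token));

        int total = 0;
        while (!cts.IsCancellationRequested)
        {
            total += SpawnBurst(50, "/usr/bin/true");
        }

        gcTask.Wait();
        Console.WriteLine($"Spawned {total} processes without hanging.");
    }
}
```

If the process stops making progress and never prints the final line, taking a sample of it at that point should show whether the two threads are stuck in the same pattern as above.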
Expected behavior
The process does not deadlock
Actual behavior
The process deadlocks
Regression?
No response
Known Workarounds
No response
Configuration
- .NET 6.0.13
- macOS Ventura 13.4.1 (c)
- Apple Silicon (M1/M2)
- More prone to happen on lower-memory configs
- Running x64 under Rosetta
Other information
No response