GC deadlock on spawning new process on OSX under Rosetta #89272

Description

@CharlieEriksen

I'm experiencing a deadlock with .NET 6.x on macOS when running under Rosetta. It's a strange one:

  • It doesn't always happen.
  • It usually only happens within the first few minutes of application start.

You can find a full sample here: https://gist.github.com/CharlieEriksen/2f04ec835a72adfe2643a2a0ffdf7679

Here's what seems to be the most relevant part of the sample. The pattern is always the same:


  Thread 0xdf373    1001 samples (1-1001)    priority 31 (base 31)
  1001  thread_start + 15 (libsystem_pthread.dylib + 7123) [0x7ff801266bd3]
    1001  _pthread_start + 125 (libsystem_pthread.dylib + 25043) [0x7ff80126b1d3]
      1001  CorUnix::CPalThread::ThreadEntry(void*) + 407 (jswzl.Server + 7158375) [0x1013dda67]
        1001  (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) + 116 (jswzl.Server + 1892484) [0x100ed8084]
          1001  WKS::gc_heap::bgc_thread_function() + 257 (jswzl.Server + 3310881) [0x101032521]
            1001  WKS::gc_heap::gc1() + 836 (jswzl.Server + 3201796) [0x101017b04]
              1001  WKS::gc_heap::background_mark_phase() + 858 (jswzl.Server + 3205786) [0x101018a9a]
                1001  WKS::gc_heap::revisit_written_pages(int, int) + 1011 (jswzl.Server + 3315155) [0x1010335d3]
                  1001  SoftwareWriteWatch::GetDirty(void*, unsigned long, void**, unsigned long*, bool, bool) + 65 (jswzl.Server + 3378033) [0x101042b71]
                    1001  FlushProcessWriteBuffers + 147 (jswzl.Server + 7152259) [0x1013dc283]
                      1001  thread_get_register_pointer_values + 144 (libsystem_kernel.dylib + 72297) [0x7ff80123ca69]
                        1001  thread_get_state + 140 (libsystem_kernel.dylib + 35498) [0x7ff801233aaa]
                          1001  mach_msg2_trap + 10 (libsystem_kernel.dylib + 5554) [0x7ff80122c5b2]
                           *1001  ??? (kernel.release.t8112 + 5328152) [0xfffffe0008900d18] (blocked by turnstile waiting for jswzl.Server [45872] thread 0xdf374)

  Thread 0xdf374    1001 samples (1-1001)    priority 31 (base 31)
  1001  thread_start + 15 (libsystem_pthread.dylib + 7123) [0x7ff801266bd3]
    1001  _pthread_start + 125 (libsystem_pthread.dylib + 25043) [0x7ff80126b1d3]
      1001  CorUnix::CPalThread::ThreadEntry(void*) + 407 (jswzl.Server + 7158375) [0x1013dda67]
        1001  ThreadNative::KickOffThread(void*) + 170 (jswzl.Server + 1690938) [0x100ea6d3a]
          1001  ManagedThreadBase::KickOff(void (*)(void*), void*) + 32 (jswzl.Server + 1369424) [0x100e58550]
            1001  ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 296 (jswzl.Server + 1368040) [0x100e57fe8]
              1001  ThreadNative::KickOffThread_Worker(void*) + 136 (jswzl.Server + 1690728) [0x100ea6c68]
                1001  DispatchCallSimple(unsigned long*, unsigned int, unsigned long, unsigned int) + 223 (jswzl.Server + 1596143) [0x100e8faef]
                  1001  CallDescrWorkerInternal + 124 (jswzl.Server + 3382761) [0x101043de9]
                    1001  ??? [0x1273f7ff1]
                      1001  ??? [0x1274034eb]
                        1001  ??? [0x1273f92a8]
                          1001  ??? [0x12d962257]
                            1001  ??? [0x12d962946]
                              1001  ??? [0x12d9631c8]
                                1001  ??? [0x127e7ba36]
                                  1001  ??? [0x127e7d5c3]
                                    1001  ??? [0x127e7d8ba]
                                      1001  ??? [0x127e76c1e]
                                        1001  ??? [0x1282af899]
                                          1001  SystemNative_ForkAndExecProcess + 848 (jswzl.Server + 6841936) [0x101390650]
                                            1001  __fork + 11 (libsystem_kernel.dylib + 30175) [0x7ff8012325df]
                                             *1001  ??? (kernel.release.t8112 + 5328152) [0xfffffe0008900d18] (blocked by turnstile waiting for jswzl.Server [45872] thread 0xdf373)



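If I understand this correctly, the background GC thread and the thread calling fork to spawn a new process are blocking each other: the GC thread (0xdf373) is stuck in FlushProcessWriteBuffers waiting on thread 0xdf374, while that thread is stuck in __fork waiting on thread 0xdf373. Based on this theory, I've tried not to spawn a process when a full GC pass is approaching, but that doesn't seem to prevent the deadlock.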

Reproduction Steps

  1. Spawn a lot of processes in rapid succession
  2. Trigger a GC pass
  3. Once in a while, the process deadlocks
  4. If it doesn't deadlock within the first few minutes, restart the process and try again
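
The steps above can be sketched as a small harness. This is a hypothetical reconstruction and not the code from the affected application: the class name `ForkGcRepro`, the child binary `/usr/bin/true`, and the allocation sizes are all assumptions chosen for illustration. It also assumes that a non-blocking gen2 `GC.Collect` request results in a background GC, which the runtime only does when it can. Running it is not guaranteed to hit the deadlock.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ForkGcRepro
{
    // Spawns `spawnCount` short-lived children one after another while a second
    // thread keeps allocating and requesting non-blocking gen2 collections.
    // Returns the number of children that exited with code 0.
    // `childPath` is an assumed no-op binary; adjust for your system.
    public static int Run(int spawnCount, string childPath = "/usr/bin/true")
    {
        using var stop = new CancellationTokenSource();

        var gcThread = new Thread(() =>
        {
            while (!stop.IsCancellationRequested)
            {
                // Allocate short-lived garbage so the heap keeps changing.
                var junk = new byte[64][];
                for (int i = 0; i < junk.Length; i++)
                {
                    junk[i] = new byte[16 * 1024];
                }
                GC.KeepAlive(junk);

                // Ask for a gen2 collection without blocking the caller.
                // Assumption: with concurrent GC enabled this may run as a
                // background GC, as seen in the sampled bgc_thread_function.
                GC.Collect(2, GCCollectionMode.Forced, blocking: false);
                Thread.Sleep(1);
            }
        });
        gcThread.IsBackground = true;
        gcThread.Start();

        int cleanExits = 0;
        for (int i = 0; i < spawnCount; i++)
        {
            var startInfo = new ProcessStartInfo(childPath) { UseShellExecute = false };
            using var child = Process.Start(startInfo);
            if (child is null)
            {
                continue;
            }
            child.WaitForExit();
            if (child.ExitCode == 0)
            {
                cleanExits++;
            }
        }

        stop.Cancel();
        gcThread.Join();
        return cleanExits;
    }
}
```

Calling `ForkGcRepro.Run(200)` from `Main` in an x64 build running under Rosetta would exercise both code paths from the sample at once. If the deadlock occurs, the call never returns, which matches step 3.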

Expected behavior

The process does not deadlock

Actual behavior

The process deadlocks

Regression?

No response

Known Workarounds

No response

Configuration

  • .NET 6.0.13
  • macOS Ventura 13.4.1 (c)
  • Apple silicon (M1/M2)
  • More prone to happen on lower-memory configs
  • Running x64 under Rosetta

Other information

No response
