macOS SOS-in-lldb: speed up ReadVirtual and fix arm64 SpecialDiagInfo #5823

Merged
hoyosjs merged 7 commits into dotnet:main from steveisok:steveisok/macos-arm64-sos-fixes on May 12, 2026
@steveisok steveisok commented May 2, 2026

Two independent fixes to SOS-in-lldb on macOS. They surfaced together
while debugging extremely slow / failing clrthreads -managedexception
on Apple Silicon, but each stands on its own.

1. Speed up LLDBServices::ReadVirtual with a cached section table — applies to both arm64 and x86_64 macOS

When lldb's process.ReadMemory can't satisfy a read, the plugin falls
back to reading from on-disk module sections. For MachO core dumps this
is the common path, not the rare one: .text/code segments aren't
in the dump, so almost all of the DAC's metadata reads hit the fallback.

The previous implementation iterated numModules × numSections per
call. Measured on a typical .NET process: ~3500 entries × multiple SWIG
calls each = ~44 µs per fallback. With ~5M ReadVirtual calls during a
single clrthreads -managedexception, that's ~200 s of pure iteration.

This change builds a sorted SectionRange table on first use and
binary-searches it with std::upper_bound. The table is invalidated
when the target's module count changes, and through the existing
ClearCache() path. The fix is arch-agnostic — x86_64 macOS sees the
same speedup since the underlying lldb MachO core behavior is identical.
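The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the field layout of `SectionRange`, the `moduleIdx` member, and the helper names `SortRanges` and `FindSection` are assumptions made for the example, and it assumes section ranges do not overlap.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical layout; the real SectionRange in services.h may differ.
struct SectionRange {
    uint64_t start;     // load address of the section
    uint64_t end;       // one past the last byte (exclusive)
    int      moduleIdx; // stand-in for whatever handle the plugin stores
};

// Sort once by start address so later lookups can binary-search.
inline void SortRanges(std::vector<SectionRange>& table) {
    std::sort(table.begin(), table.end(),
              [](const SectionRange& a, const SectionRange& b) {
                  return a.start < b.start;
              });
}

// Return the range containing addr, or nullptr if no section covers it.
inline const SectionRange* FindSection(const std::vector<SectionRange>& table,
                                       uint64_t addr) {
    // upper_bound yields the first range whose start is greater than addr,
    // so the only possible container is the entry just before it.
    auto it = std::upper_bound(
        table.begin(), table.end(), addr,
        [](uint64_t value, const SectionRange& r) { return value < r.start; });
    if (it == table.begin()) {
        return nullptr; // addr is below every section
    }
    --it;
    return (addr < it->end) ? &*it : nullptr;
}
```

With the table sorted, each fallback read costs O(log n) comparisons instead of a walk over every module and section.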

Measured impact (macOS arm64, SOS.LineNums)

| Config | Before | After |
|---|---|---|
| singlefile.* (live debug) | ~35 s | 15–21 s |
| prebuilt.11 (dump load) | 118 s | 28 s (~4×) |
| prebuilt.{10,9,8} (dump load) | ~120 s each | 39–42 s each (~3×) |
| Total LineNums wall time | ~13 min | ~3.7 min (~3.5×) |

Not yet measured on x86_64 macOS, but the same code path is exercised
and the same root cause applies.

2. Fix SpecialDiagInfo address for arm64 macOS — arm64 only

The legacy address 0x7fffffff10000000 is beyond Apple Silicon's
47-bit user-space VM limit. createdump still writes a segment at that
address into the core file, but lldb's MachO core reader refuses reads
above 0x7FFF_FFFFFFFF, so SOS reports

Special diagnostics info read failed

and falls through to the slower managed-side path (or fails outright on
older builds without that fallback). On x86_64 macOS this address
remains valid, so the legacy constant is preserved there — no behavior
change for Intel Macs.
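The address arithmetic can be checked at compile time. The sketch below only encodes the numbers quoted in this description; the constant and function names (`kUserVaLimit47`, `kLegacyAddress`, `kArm64Address`, `IsBelowLimit`) are made up for illustration and are not the identifiers used in specialdiaginfo.h.

```cpp
#include <cstdint>

// Highest address reachable with a 47-bit user-space limit: 2^47 - 1.
constexpr uint64_t kUserVaLimit47 = (uint64_t{1} << 47) - 1; // 0x00007FFFFFFFFFFF

// Addresses quoted in the PR description (names are hypothetical).
constexpr uint64_t kLegacyAddress = 0x7fffffff10000000ULL;
constexpr uint64_t kArm64Address  = 0x00007ffffff10000ULL;

constexpr bool IsBelowLimit(uint64_t addr) { return addr <= kUserVaLimit47; }

// The legacy constant needs 63 bits, so it lies above the limit;
// the arm64 constant fits within 47 bits.
static_assert(!IsBelowLimit(kLegacyAddress), "legacy address exceeds 47 bits");
static_assert(IsBelowLimit(kArm64Address), "arm64 address fits in 47 bits");
```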

This change uses 0x00007ffffff10000 on arm64 macOS — the same address
already used on Linux/non-Apple 64-bit — and probes the legacy x86_64
address as a fallback so dumps produced by older createdump on x86_64
Macs continue to be recognized. The managed SpecialDiagInfo.cs path
is mirrored.
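A rough sketch of the probe-then-fall-back order, under stated assumptions: `ReadFn`, `Header`, `kSignature`, and `FindHeader` are hypothetical stand-ins, the signature string is a placeholder, and the real header carries more fields than shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <initializer_list>
#include <optional>

// Hypothetical reader: returns true only if all `size` bytes were read.
using ReadFn = std::function<bool(uint64_t addr, void* buffer, std::size_t size)>;

// Placeholder signature and header layout, for illustration only.
constexpr char kSignature[] = "EXAMPLESIG";
struct Header {
    char signature[16];
};

// Try each candidate address in order; return the first one whose header
// can be read and carries the expected signature.
inline std::optional<uint64_t> FindHeader(const ReadFn& read,
                                          std::initializer_list<uint64_t> candidates) {
    for (uint64_t addr : candidates) {
        Header h{};
        if (!read(addr, &h, sizeof(h))) {
            continue; // unreadable at this address, try the next candidate
        }
        if (std::memcmp(h.signature, kSignature, sizeof(kSignature)) != 0) {
            continue; // readable but not a diagnostics header
        }
        return addr;
    }
    return std::nullopt;
}
```

Calling `FindHeader(read, {0x00007ffffff10000, 0x7fffffff10000000})` would probe the 47-bit-valid address first and the legacy address second, matching the order described above.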

Note: full coverage on arm64 macOS requires the matching createdump
change in dotnet/runtime to flow through. Until then, dumps from
old createdump remain unreadable on arm64 macOS (the data sits at an
address lldb won't return) — same behavior as today, no regression.


CI on Windows/Linux is unaffected — the section cache only kicks in
when lldb's primary read fails (rare on those platforms), and the
SpecialDiagInfo legacy fallback preserves the previous address on every
non-arm64-macOS configuration.

steveisok and others added 2 commits May 2, 2026 12:09
The legacy SpecialDiagInfo address 0x7fffffff10000000 is beyond Apple
Silicon's 47-bit user-space VM limit. While createdump writes this
address into the core file's segment list, lldb's MachO core reader
rejects reads above 0x7FFF_FFFFFFFF, so SOS reports
'Special diagnostics info read failed' and falls back to the slower
managed-side path.

Use 0x00007ffffff10000 on arm64 macOS (matching the Linux/non-Apple
64-bit address) and probe the legacy x86_64 address as a fallback so
older dumps continue to be recognized on platforms where the legacy
address is readable.

Mirrored on the managed side (SpecialDiagInfo.cs) for the
DataTarget-based path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

LLDBServices::ReadVirtual falls back to reading directly from native
module sections when lldb's process.ReadMemory cannot satisfy a read.
For dumps that don't include code/data segments (notably MachO core
files on macOS, where the .text segments aren't in the dump) this is
hit on the vast majority of reads.

The previous fallback iterated numModules x numSections per call
(~3500 entries on a typical .NET process; ~44 microseconds per fallback)
making clrthreads -managedexception and similar DAC-heavy commands
take ~100s on macOS arm64.

Build a sorted SectionRange table on first use and binary-search it
with std::upper_bound. The cache is invalidated when the target's
module count changes, and through the existing ClearCache() path.

Measured on macOS arm64 SOS LineNums tests with prebuilt runtime
configs: ~118s -> ~28s for net11 (~4x), and total LineNums wall
time from ~13 minutes to ~3.7 minutes (~3.5x).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 2, 2026 16:15
@steveisok steveisok requested a review from a team as a code owner May 2, 2026 16:15
Copilot AI left a comment

Pull request overview

Fixes SOS-in-lldb behavior on macOS arm64 by moving SpecialDiagInfo to a 47-bit-valid address (with legacy fallback) and by significantly reducing ReadVirtual fallback overhead via a cached, binary-searched section table.

Changes:

  • Update SpecialDiagInfo address on Apple Silicon macOS and add legacy-address probing for older dumps.
  • Add a cached SectionRange table to speed up LLDBServices::ReadVirtual section-backed fallback reads.
  • Mirror the SpecialDiagInfo address fallback behavior in the managed SpecialDiagInfo implementation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/SOS/lldbplugin/services.h | Introduces section-range cache fields and helpers for faster section fallback reads. |
| src/SOS/lldbplugin/services.cpp | Implements legacy SpecialDiagInfo fallback and the section-range cache (sort + upper_bound) for ReadVirtual. |
| src/SOS/inc/specialdiaginfo.h | Adjusts arm64 macOS SpecialDiagInfo address and defines legacy address constant. |
| src/Microsoft.Diagnostics.DebugServices.Implementation/SpecialDiagInfo.cs | Adds macOS arm64-valid address probing + legacy fallback in managed reader. |

@steveisok steveisok changed the title from "macOS arm64 SOS-in-lldb: fix SpecialDiagInfo address and speed up ReadVirtual" to "macOS SOS-in-lldb: speed up ReadVirtual and fix arm64 SpecialDiagInfo" on May 2, 2026
* Section cache: guard endAddr/endOffset against uint64 overflow when
  building the SectionRange table and when computing the containment
  check, so a pathologically large section can't wrap and produce
  spurious cache hits.

* GetExceptionRecord: when the SpecialDiagInfo signature matches but
  the exception record can't be read (Version too low,
  ExceptionRecordAddress=0, or read fails), continue to the next
  candidate address instead of returning eagerly. Cheap and avoids
  pinning behavior to the first matching address.

* specialdiaginfo.h: fix cross-reference comment to point at the actual
  reader (LLDBServices::GetLastEventInformation), not a non-existent
  SOSReadDiagInfoHeader symbol.

* ExtensionCommands SpecialDiagInfoHeader: add GetCandidateAddresses
  alongside the existing GetAddress so the OSX path probes the
  47-bit-valid address first then falls back to the legacy x86_64
  address. CommandFormatHelpers.DisplaySpecialInfo now iterates the
  candidates so 'runtimes' / extension commands recognize Apple Silicon
  dumps the same way the lldb plugin and managed reader already do.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
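The overflow guard in the first bullet can be illustrated with the sketch below. It shows one common way to avoid uint64 wraparound and is not necessarily how the PR implements it; `SaturatingEnd` and `Contains` are hypothetical helper names.

```cpp
#include <cstdint>
#include <limits>

// Compute start + size, clamping to UINT64_MAX instead of wrapping.
constexpr uint64_t SaturatingEnd(uint64_t start, uint64_t size) {
    return (size > std::numeric_limits<uint64_t>::max() - start)
               ? std::numeric_limits<uint64_t>::max()
               : start + size;
}

// Containment test written as a subtraction, so no addition can overflow.
// A naive `addr < start + size` would wrap for a huge section and could
// report hits for unrelated low addresses.
constexpr bool Contains(uint64_t start, uint64_t size, uint64_t addr) {
    return addr >= start && (addr - start) < size;
}
```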
return SpecialDiagInfoAddressMacOS64;
// Try the arm64-valid address first (also valid on x86_64 macOS for newer
// createdump output); fall back to the legacy x86_64 address.
yield return SpecialDiagInfoAddressMacOSArm64;

Follow-up change: let's remove the fallback. 8.0 already had this address.

{
Span<byte> headerBuffer = stackalloc byte[Unsafe.SizeOf<SpecialDiagInfoHeader>()];
if (_memoryService.ReadMemory(SpecialDiagInfoAddress, headerBuffer, out int bytesRead) && bytesRead == headerBuffer.Length)
Span<byte> exceptionRecordBuffer = stackalloc byte[Unsafe.SizeOf<EXCEPTION_RECORD64>()];

Follow up - encapsulate getting the SpecialDiagInfoHeader into a helper that this and HasDiagnosticInfo can use

@hoyosjs hoyosjs force-pushed the steveisok/macos-arm64-sos-fixes branch from 609e546 to 38d8dba Compare May 11, 2026 09:26
hoyosjs and others added 3 commits May 11, 2026 10:13
The flag was load-bearing only for the degenerate 'target with 0 modules
is a valid cached state' case. Module-count equality alone is sufficient
to detect cache freshness, and matches the previous behavior in every
real-world scenario.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
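As a sketch of why module-count equality can stand in for a separate validity flag, consider the hypothetical cache below. `SectionCache`, `EnsureFresh`, `Clear`, and the `rebuilds` counter are invented for this example; the real fields live in services.h and may look different.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical cache keyed only on the module count it was built for.
struct SectionCache {
    std::vector<uint64_t> starts;      // stand-in for the real section table
    uint32_t cachedModuleCount = 0;    // module count at last build
    int rebuilds = 0;                  // instrumentation for the example only

    // Rebuild only when the target's module count differs from the cached one.
    void EnsureFresh(uint32_t currentModuleCount,
                     const std::function<std::vector<uint64_t>()>& build) {
        if (currentModuleCount == cachedModuleCount) {
            return; // same module count: treat the table as current
        }
        starts = build();
        cachedModuleCount = currentModuleCount;
        ++rebuilds;
    }

    // Explicit invalidation, analogous to going through a ClearCache() path.
    void Clear() {
        starts.clear();
        cachedModuleCount = 0;
    }
};
```

In this sketch a target with zero modules never triggers a build, but the empty table is already the correct answer for that case, so no extra flag is needed.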
@hoyosjs hoyosjs enabled auto-merge (squash) May 11, 2026 23:28
@tommcdon tommcdon left a comment

LGTM!

@hoyosjs hoyosjs merged commit 9c40506 into dotnet:main May 12, 2026
19 checks passed

4 participants