metal : wind down leftover residency sets at teardown instead of aborting #3870

Open
AlexCherrypi wants to merge 1 commit into ggml-org:master from AlexCherrypi:metal-no-abort-on-quit

AlexCherrypi commented Jun 9, 2026

What

On macOS 15+ (Apple Silicon), ggml_metal_rsets_free() does GGML_ASSERT([rsets->data count] == 0), which calls abort() when the Metal device is torn down while residency sets are still registered. The device is freed from a C++ static destructor at process exit (ggml_metal_device_get's function-local static vector), so any app embedding the Metal backend that exits without freeing every Metal buffer first crashes on every quit.
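
To make the lifetime issue concrete, here is a minimal C++ sketch of a function-local static whose destructor runs at process exit. `DeviceModel` and `device_model_get` are hypothetical stand-ins, not the ggml types; the real device lives in a static vector and its teardown asserts rather than logging.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the Metal device; not the ggml type.
struct DeviceModel {
    std::vector<int> rsets; // ids of residency sets that are still registered

    ~DeviceModel() {
        // Runs during static destruction, i.e. after main() returns or exit() is called.
        // The real teardown asserts that the collection is empty, which ends in abort();
        // this model only reports the condition.
        if (!rsets.empty()) {
            std::fprintf(stderr, "teardown: %zu residency set(s) still registered\n",
                         rsets.size());
        }
    }
};

// Models the accessor pattern: the object is constructed on first call and
// destroyed at process exit, long after the application's own cleanup had a chance to run.
DeviceModel & device_model_get() {
    static DeviceModel dev;
    return dev;
}
```

Every caller gets the same instance, and nothing the application does after `main()` returns can empty `rsets` before the destructor sees it.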

Why it happens

A residency set is added in ggml_metal_buffer_init_* and removed from the collection in exactly one place — ggml_metal_buffer_free(). An application that lets the OS reclaim its model/weights on exit (a common, historically fine pattern) never calls ggml_backend_buffer_free for those buffers, so the collection is non-empty when the device's static destructor runs ggml_metal_rsets_free(), and the assert fires.

The device does not own those buffers and cannot free them from its destructor, so the assert can't be made to legitimately hold from within ggml_metal_rsets_free().
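
The ownership problem can be sketched in plain C++ as follows. All names here (`RsetModel`, `RsetsModel`, `BufferModel` and the three functions) are hypothetical and only mirror the roles described above; they are not the ggml API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins; not the ggml types.
struct RsetModel { bool resident = true; };

struct RsetsModel {
    std::vector<RsetModel *> data; // plays the role of rsets->data
};

struct BufferModel {
    RsetsModel * registry; // collection owned by the device
    RsetModel  * rset;     // set owned by this buffer
};

// Plays the role of ggml_metal_buffer_init_*: creates a set and registers it.
BufferModel * buffer_model_init(RsetsModel & registry) {
    BufferModel * buf = new BufferModel{&registry, new RsetModel{}};
    registry.data.push_back(buf->rset);
    return buf;
}

// Plays the role of ggml_metal_buffer_free(): the only place a set is unregistered.
void buffer_model_free(BufferModel * buf) {
    std::vector<RsetModel *> & d = buf->registry->data;
    d.erase(std::remove(d.begin(), d.end(), buf->rset), d.end());
    delete buf->rset;
    delete buf;
}

// The condition the assert checks at device teardown.
bool teardown_precondition_holds(const RsetsModel & registry) {
    return registry.data.empty();
}
```

In this model, a buffer that is never passed to `buffer_model_free` keeps its entry in `data`, so `teardown_precondition_holds` stays false no matter what the registry's owner does.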

Fix

Make teardown defensive instead of aborting:

  1. stop the keep-alive heartbeat (existing d_stop + dispatch_group_wait),
  2. wind down residency on any leftover sets — endResidency + removeAllAllocations, mirroring ggml_metal_buffer_rset_free() but without -release (each set is still owned by its not-yet-freed buffer, so releasing here would over-release),
  3. then release the collection as before.

The backing buffers are reclaimed by the OS as the process exits. No behavior change when all buffers were freed — the array is empty and the loop is a no-op. Guarded by the existing GGML_METAL_HAS_RESIDENCY_SETS + @available(macOS 15.0, …).
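
For illustration, the three steps can be modeled in plain C++ like this. It is a sketch under assumptions, not the patch itself: the types and `rsets_model_free` are hypothetical, the heartbeat is reduced to a flag (the real code uses d_stop and dispatch_group_wait), and the two member functions stand in for the Objective-C endResidency and removeAllAllocations calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a residency set; not the Metal or ggml type.
struct RsetModel {
    int  refcount    = 1;    // reference held by the owning, not-yet-freed buffer
    bool resident    = true;
    int  allocations = 1;

    void end_residency()          { resident = false; } // stands in for endResidency
    void remove_all_allocations() { allocations = 0; }  // stands in for removeAllAllocations
};

// Hypothetical stand-in for the keep-alive heartbeat.
struct HeartbeatModel {
    bool running = true;
    void stop_and_wait() { running = false; } // stands in for d_stop + dispatch_group_wait
};

struct RsetsModel {
    std::vector<RsetModel *> data; // plays the role of rsets->data
    HeartbeatModel           heartbeat;
};

void rsets_model_free(RsetsModel * rsets) {
    if (rsets == nullptr) {
        return;
    }

    // 1. Stop the heartbeat first so nothing touches the sets concurrently.
    rsets->heartbeat.stop_and_wait();

    // 2. Wind down residency on leftover sets. No release here: each set is
    //    still owned by its buffer, so dropping a reference would over-release.
    for (RsetModel * rset : rsets->data) {
        rset->end_residency();
        rset->remove_all_allocations();
    }

    // 3. Release the collection itself, as before.
    rsets->data.clear();
    delete rsets;
}
```

With an empty `data` the loop body never runs, which matches the "no behavior change" claim for applications that already free every buffer.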

Notes

  • The assert is recent — added in 322903f ("metal : add residency sets keep-alive heartbeat", llama/17766). It's a teardown sanity check, not a correctness/security guard: nothing reads rsets->data after this point, so leftover entries cause no UB — only the abort.
  • This file is synced from llama.cpp/ggml; the same fix is submitted there as llama.cpp#24368 ("metal : wind down leftover residency sets at teardown instead of aborting").
  • Repro: drive the Metal backend (e.g. load a whisper model) on macOS 15+ Apple Silicon and quit the process without explicitly freeing the backend buffers → abort on quit. Surfaced in a real macOS app (live dictation).

Happy to adjust — feedback welcome.

…ting

ggml_metal_rsets_free() did GGML_ASSERT([rsets->data count] == 0) and so called
abort() when the Metal device was torn down (a C++ static destructor at process
exit) while residency sets were still registered. On macOS 15+ this crashes the
app on every quit: a residency set is removed from the collection only by
ggml_metal_buffer_free(), so an app that exits without freeing every buffer
(letting the OS reclaim the model on quit) leaves sets registered.

The device does not own the buffers and cannot free them from its destructor, so
make teardown defensive instead: stop the keep-alive heartbeat, then wind down
residency on any leftover sets (endResidency + removeAllAllocations, mirroring
ggml_metal_buffer_rset_free but without -release, since each set is still owned
by its not-yet-freed buffer) before releasing the collection. The backing
buffers are reclaimed by the OS as the process exits. No behavior change when all
buffers were freed (the array is empty).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>