Vulkan: export allocator for custom contexts#8871

Open
xFile3160 wants to merge 1 commit into halide:main from xFile3160:main

Conversation

@xFile3160

@xFile3160 xFile3160 commented Nov 17, 2025

This adds Vulkan allocator lifecycle APIs for applications that provide their own VkInstance, VkDevice, and queue.
Halide still owns internal Vulkan allocator/shader-module state for that external context. The application can acquire that Halide allocator state, keep it with its Vulkan context, and release it before destroying its VkDevice.
The shader cache stays keyed by VkDevice for sharing. Releasing an allocator now removes only the cache entries owned by that allocator.
Added Vulkan acquire/release AOT coverage.

AI disclosure:
I used OpenAI Codex to help inspect the code and draft/revise the patch. I reviewed the final code and take responsibility for it.

@xFile3160
Author

Full discussion available here: #8715 (comment)

@alexreinking
Member

@derek-gerstmann could you review this PR? Seems you were discussing the related issue.

@xFile3160
Author

Any news about this? Should I rebase?

@alexreinking
Member

Hi @xFile3160 -- happy new year! Yes, please rebase (looks like you just did). I'll ping @derek-gerstmann again to look at this since he has in-depth knowledge of the Vulkan backend.

@derek-gerstmann
Contributor

Thanks for the reminder! I'll look this over this week!

Comment thread src/runtime/HalideRuntimeVulkan.h Outdated
// with the same locking used by the custom acquire/release implementations. This allows the allocator to be
// saved for future halide_vulkan_acquire_context calls that Halide will automatically issue to retrieve
// the custom context.
extern int halide_vulkan_export_memory_allocator(void *user_context,
Contributor

I don't understand the need for this method, or for the corresponding release method. The allocator should be stored in your custom context, and held onto for the lifetime of the context. The context manages lifespan of the allocator.

Comment thread src/runtime/HalideRuntimeVulkan.h Outdated
// - halide_vulkan_memory_allocator_release
// releases the internally allocated memory allocator, important for proper memory cleanup. Must have overridden halide_vulkan_acquire_context
// and halide_vulkan_release_context, and must coordinate with the same locking as the custom implementations.
extern int halide_vulkan_memory_allocator_release(void *user_context,
Contributor

See above comment.

Comment thread src/runtime/vulkan.cpp Outdated
return is_initialized;
}

WEAK int halide_vulkan_export_memory_allocator(void *user_context, halide_vulkan_memory_allocator *allocator) {
Contributor

This doesn't actually do anything other than check to see if the allocator is null.

Comment thread src/runtime/vulkan.cpp Outdated
return destroy_status;
}

WEAK int halide_vulkan_memory_allocator_release(void *user_context,
Contributor

Not sure I understand the intent ... was it to have a public method to invoke the destructor for the allocator?

Comment thread src/runtime/vulkan_context.h Outdated
error = halide_error_code_device_interface_no_device;
halide_error_no_device_interface(user_context);
}
// If user overrode halide_vulkan_acquire_context and returned nullptr for allocator,
Contributor

This class shouldn't be doing anything other than holding a lock on the context. It's just a convenient wrapper for the internal methods to have a lock that lives within a scope.
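
A scope-only lock wrapper of the kind described here might look like the following. This is a minimal sketch, not the actual `VulkanContext` class: `ScopedContext`, `fake_acquire_context`, `fake_release_context`, and `lock_depth` are hypothetical names standing in for the runtime's real acquire/release calls.

```cpp
#include <cassert>

// Hypothetical stand-ins for the acquire/release context entry points.
// lock_depth counts how many scopes currently hold the context.
static int lock_depth = 0;
static int fake_acquire_context() { ++lock_depth; return 0; }
static int fake_release_context() { --lock_depth; return 0; }

// Scope guard: acquires in the constructor, releases in the destructor,
// and does nothing else (no allocator creation, no function loading).
class ScopedContext {
public:
    ScopedContext() : error(fake_acquire_context()) {}
    ~ScopedContext() {
        // Only release what was successfully acquired.
        if (error == 0) {
            fake_release_context();
        }
    }
    ScopedContext(const ScopedContext &) = delete;
    ScopedContext &operator=(const ScopedContext &) = delete;

    const int error;  // 0 on success, mirroring an int error code
};

// Example internal method: the context is held for exactly this scope.
static int depth_inside_scope() {
    ScopedContext ctx;
    return lock_depth;
}
```

Keeping the wrapper this small means any allocator or dispatch-table setup has to live in the acquire path itself, which is where the review asks for it.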

Comment thread src/runtime/HalideRuntimeVulkan.h Outdated
// with the same locking used by the custom acquire/release implementations. This allows the allocator to be
// saved for future halide_vulkan_acquire_context calls that Halide will automatically issue to retrieve
// the custom context.
extern int halide_vulkan_export_memory_allocator(void *user_context,
Contributor

I'd suggest following the conventions of the context methods and naming this halide_vulkan_acquire_memory_allocator.

Comment thread src/runtime/HalideRuntimeVulkan.h Outdated
// - halide_vulkan_memory_allocator_release
// releases the internally allocated memory allocator, important for proper memory cleanup. Must have overridden halide_vulkan_acquire_context
// and halide_vulkan_release_context, and must coordinate with the same locking as the custom implementations.
extern int halide_vulkan_memory_allocator_release(void *user_context,
Contributor

Same as above. I'd suggest naming this halide_vulkan_release_memory_allocator.

Comment thread src/runtime/vulkan.cpp Outdated
}

WEAK int halide_vulkan_export_memory_allocator(void *user_context, halide_vulkan_memory_allocator *allocator) {
halide_mutex_lock(&thread_lock);
Contributor

This default implementation doesn't actually do anything ... shouldn't it return the allocator associated with the context?

Comment thread src/runtime/vulkan.cpp Outdated
return halide_error_code_buffer_argument_is_null;
}

return vk_release_memory_allocator(user_context, (VulkanMemoryAllocator *)allocator,
Contributor

Lifetime management is an issue here. How do we know there are no remaining uses for the allocator? Also, allocators are specific to the context, so we need to make sure the given allocator matches the one associated with the given context.
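
One way to picture the two checks raised here (the allocator belongs to the given context, and nothing allocated through it is still alive) is the sketch below. It is illustrative only: `FakeDevice`, `FakeAllocator`, `ReleaseStatus`, and `try_release_allocator` are hypothetical, and the real runtime would track uses in its own way.

```cpp
#include <cassert>

// Hypothetical stand-in for a VkDevice handle.
struct FakeDevice { int id; };

// Hypothetical allocator record: remembers its device and counts live uses.
struct FakeAllocator {
    FakeDevice *device;
    int live_allocations;
    bool released;
};

enum ReleaseStatus {
    kReleased = 0,
    kWrongContext = 1,
    kStillInUse = 2,
    kNullArgument = 3
};

// Refuse to release unless the allocator belongs to this device and
// nothing allocated through it is still alive.
static ReleaseStatus try_release_allocator(FakeDevice *device, FakeAllocator *allocator) {
    if (device == nullptr || allocator == nullptr) {
        return kNullArgument;
    }
    if (allocator->device != device) {
        return kWrongContext;
    }
    if (allocator->live_allocations > 0) {
        return kStillInUse;
    }
    allocator->released = true;
    return kReleased;
}
```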

Comment thread src/runtime/vulkan_context.h Outdated
halide_start_clock(user_context);
#endif
// make sure halide vulkan is loaded BEFORE creating allocator
debug(user_context) << "VulkanContext: Loading Vulkan function pointers for context override...\n";
Contributor

This is not the right place to initialize device function pointers. They are specific to the context, and should only be initialized once, which is why they are only done in the acquire_context method.

@xFile3160
Author

I've changed this patch quite a bit, actually. But @derek-gerstmann, your comments make absolute sense, and I'm going to address them and explain the intent and the new API better soon. Sorry I haven't followed up by updating this patch yet.

Contributor

@derek-gerstmann derek-gerstmann left a comment

This seems like it was vibe coded ... there are too many unnecessary/unrelated changes and it doesn't match what we discussed.

The proposal was to add two methods halide_vulkan_acquire_memory_allocator and halide_vulkan_release_memory_allocator.

This would allow you to override halide_vulkan_acquire_context() and halide_vulkan_release_context() by declaring them in your code base and relying upon the weak linking to override the default.

In your custom halide_vulkan_acquire_context() you have the ability to now call halide_vulkan_acquire_memory_allocator() to create an allocator instance, and return it. In your custom halide_vulkan_release_context() you can now call halide_vulkan_release_memory_allocator().

Likewise, the existing halide_vulkan_acquire_context() would need to be modified to also call the newly added halide_vulkan_acquire_memory_allocator() to create the allocator, and return it. And then the default halide_vulkan_release_context() would need to be modified to call halide_vulkan_release_memory_allocator().
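
The flow described above could look roughly like the following. This is a minimal sketch with stand-in types, not the Halide runtime API: `FakeDevice`, `FakeAllocator`, `AppContext`, and the `app_*` functions are hypothetical, and the real `halide_vulkan_acquire_context()` takes more parameters. It also assumes the allocator is cached across acquire/release pairs and released once at context teardown, which is only one reading of the proposal.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for VkDevice and Halide's allocator handle.
struct FakeDevice { int id; };
struct FakeAllocator { FakeDevice *device; };

// Stand-ins for the proposed halide_vulkan_acquire_memory_allocator() and
// halide_vulkan_release_memory_allocator(); the real signatures may differ.
static int live_allocators = 0;
static FakeAllocator *acquire_memory_allocator(FakeDevice *device) {
    ++live_allocators;
    return new FakeAllocator{device};
}
static void release_memory_allocator(FakeAllocator *allocator) {
    --live_allocators;
    delete allocator;
}

// Application-owned context: it owns the device and keeps the Halide
// allocator alongside it for the lifetime of the context.
struct AppContext {
    std::mutex lock;
    FakeDevice device{1};
    FakeAllocator *allocator = nullptr;
};

// Models a custom acquire_context override: take the lock, create the
// allocator on first use, and hand back the cached one afterwards.
static FakeAllocator *app_acquire_context(AppContext &ctx) {
    ctx.lock.lock();
    if (ctx.allocator == nullptr) {
        ctx.allocator = acquire_memory_allocator(&ctx.device);
    }
    return ctx.allocator;
}

// Models a custom release_context override: only drops the lock.
static void app_release_context(AppContext &ctx) {
    ctx.lock.unlock();
}

// Teardown: release the Halide allocator before the app destroys its device.
static void app_destroy_context(AppContext &ctx) {
    std::lock_guard<std::mutex> guard(ctx.lock);
    if (ctx.allocator != nullptr) {
        release_memory_allocator(ctx.allocator);
        ctx.allocator = nullptr;
    }
}
```

The point of the sketch is the ordering: the allocator outlives individual dispatches but is gone before the application-owned device is destroyed.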


  for (int i = 0; i < (1 << log2_compilations_size); i++) {
- if (compilations[i].kernel_id > kInvalidId &&
+ if (compilations[i].kernel_id > kDeletedId &&
Contributor

Why are you changing things in the GPUCompilationCache?

Author

Shader modules cached in GPUCompilationCache are owned by the Halide runtime state associated with the allocator used to create/destroy them. For externally managed contexts, VkDevice lifetime and Halide allocator lifetime are not the same boundary. Keying by allocator lets release_memory_allocator delete only the cache entries owned by that allocator. I did this to prevent stale shader-module cache entries when an external context tears down the Halide allocator state without destroying the VkDevice, which Halide does not own.

Contributor

There are two separate issues here. The first is the line I commented on in gpu_context_common.h. Why are you changing anything in this file?

The second issue is the change in the type definition for the GPUCompilationCache Key being used in the Vulkan runtime. The reason it was specified with the Device pointer was to allow sharing across contexts for the same devices created by the same instance to minimize kernel launch overhead.

Changing this to the allocator pointer now means the compilation cache isn't shared for all contexts for the same device, since the allocator pointer is created dynamically for the context.

I'd suggest leaving it as it is, and releasing the compilation cache inside of halide_vulkan_release_memory_allocator() to detach the external VkDevice.
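
The sharing argument can be illustrated with a toy cache. This is a sketch under simplifying assumptions, not the real `GPUCompilationCache`: `ToyCache`, `FakeDevice`, and `FakeAllocator` are hypothetical, and a `std::map` stands in for whatever storage the runtime uses.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for VkDevice and the per-context allocator.
struct FakeDevice { int id; };
struct FakeAllocator { FakeDevice *device; };

// Toy compilation cache: maps (key, kernel id) to a compiled-module handle
// and counts how many times a "compile" actually happened.
template<typename Key>
struct ToyCache {
    std::map<std::pair<Key, int>, int> entries;
    int compile_count = 0;

    int lookup_or_compile(Key key, int kernel_id) {
        auto it = entries.find(std::make_pair(key, kernel_id));
        if (it != entries.end()) {
            return it->second;  // cache hit, no compile
        }
        int module = ++compile_count;  // pretend this is an expensive compile
        entries[std::make_pair(key, kernel_id)] = module;
        return module;
    }
};
```

With two contexts on the same device, keying by device compiles a kernel once, while keying by the per-context allocator compiles it once per context:

```cpp
FakeDevice dev{1};
FakeAllocator a{&dev}, b{&dev};

ToyCache<FakeDevice *> by_device;
by_device.lookup_or_compile(a.device, 7);
by_device.lookup_or_compile(b.device, 7);      // hit: same device

ToyCache<FakeAllocator *> by_allocator;
by_allocator.lookup_or_compile(&a, 7);
by_allocator.lookup_or_compile(&b, 7);         // miss: different allocator
```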

/** Override the Vulkan context acquisition callback. Returns the previous
* handler. If unset, Halide uses its built-in Vulkan context management.
*/
extern halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler);
Contributor

No ... I don't think we can allow this. This doesn't match the runtime interface design. These methods are overridden via weak linking.

Author

I added the setter callbacks to support embedding environments where weak-symbol interposition is unreliable or unavailable, especially Windows-style linkage. Isn't Vulkan cross-platform? If you don't want this in this PR I can move it out, but I think this is important to discuss.

Contributor

@derek-gerstmann derek-gerstmann May 4, 2026

Yes, let's move the setter callbacks into a separate PR.

I'll raise this at the next community dev meeting.

I believe the CUDA get/set acquire/release context methods were added to support JIT compilation many years ago, but we really don't want to force an indirect call for everyone if we don't have to. With AOT, you can always override this method yourself regardless of the weak symbols.
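
For readers unfamiliar with the setter pattern under discussion, here is a minimal sketch of a handler that can be swapped at run time and that returns the previous handler. The names and the one-argument signature are hypothetical simplifications; the actual Vulkan acquire callback takes many more parameters.

```cpp
#include <cassert>

// Hypothetical, simplified handler type.
typedef int (*acquire_handler_t)(void *user_context);

// Built-in behavior used when no handler has been installed.
static int default_acquire(void *) { return 0; }

// Current handler. Every acquire goes through this pointer, which is the
// indirect call the review is concerned about.
static acquire_handler_t current_acquire = default_acquire;

// Install a new handler and return the previous one so it can be restored.
static acquire_handler_t set_acquire_handler(acquire_handler_t handler) {
    acquire_handler_t previous = current_acquire;
    current_acquire = handler;
    return previous;
}

// Entry point called by the runtime: dispatches to the current handler.
static int acquire(void *user_context) {
    return current_acquire(user_context);
}

// Example application-provided handler.
static int app_acquire(void *) { return 42; }
```

Unlike a weak-symbol override, which is resolved at link time, this costs one pointer load and an indirect call on every acquire, but it works on toolchains where weak linkage is awkward.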

extern halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler);

/** Override the Vulkan context release callback. Returns the previous handler. */
extern halide_vulkan_release_context_t halide_set_vulkan_release_context(halide_vulkan_release_context_t handler);
Contributor

Same as above.

Author

Windows doesn't like WEAK. Vulkan should eventually be supported there, or am I mistaken?
To give you some context: I'm building a cross-platform studio that uses Halide as the recommended way to implement image-processing kernels. The studio owns the VkDevice, the VkInstance, and related state, but the intention is to safely leverage the memory allocator inside Halide. This leads me to:

  1. First, introduce these APIs without weak linkage to support Windows, as I think was done for CUDA?
  2. Key the GPU compilation cache by allocator instead of VkDevice, because Halide doesn't own the device.

Contributor

This is a design decision for the Halide runtime more than anything else. All the runtimes follow the same interface, which has been very stable for a long, long time. Yes, the MSVS toolchain is a pain to deal with for weak linking, but that doesn't prevent you from writing your own custom runtime, which is usually what most integrators do when they wish to customize the behavior of the runtime for their app/framework.

My main concern is forcing an indirect call for all acquire/release context invocations. I'll raise this at the dev meeting this week and let you know how to proceed!

Contributor

@xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!

Comment thread src/runtime/vulkan.cpp Outdated
namespace Vulkan {

// --------------------------------------------------------------------------
ALWAYS_INLINE int vk_load_external_context_functions(void *user_context, VkInstance instance, VkDevice device) {
Contributor

This isn't really specific to external contexts ... just call it vk_load_vulkan_interface

Contributor

This should only be done once per context creation, not repeatedly.

Author

This helper was introduced because, with the new API to acquire an external context, the caller returns an already created VkInstance/VkDevice etc. Halide still needs the device functions internally. The helper makes sure Halide's dispatch table is initialized for the supplied external instance/device. But you're right, vk_load_vulkan_interface is probably a better name, and it should not be called every time.

Contributor

We really don't want to be reloading dispatch tables. They should be loaded once, when the context is created.
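
A load-once guard along these lines might look like the sketch below. It is illustrative only: `FakeContext`, `load_interface`, and `ensure_interface_loaded` are hypothetical, and `load_interface` merely stands in for a helper such as `vk_load_vulkan_interface`.

```cpp
#include <cassert>

// Hypothetical stand-in for a VkDevice handle.
struct FakeDevice { int id; };

// Hypothetical per-context state: function pointers are resolved once and
// the flag prevents reloading on later acquisitions.
struct FakeContext {
    FakeDevice device;
    bool interface_loaded;
    int load_count;  // how many times the (expensive) load actually ran
};

// Stand-in for the dispatch-table load; expensive, so it should run once.
static void load_interface(FakeContext &ctx) {
    ++ctx.load_count;
    ctx.interface_loaded = true;
}

// Called at context creation (or first allocator acquisition); later calls
// are no-ops because the flag is already set.
static void ensure_interface_loaded(FakeContext &ctx) {
    if (!ctx.interface_loaded) {
        load_interface(ctx);
    }
}
```

The sketch ignores thread safety; in the runtime the call would sit behind the same lock that guards context acquisition.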

Comment thread src/runtime/vulkan.cpp
// call to halide_release_vulkan_context. halide_acquire_vulkan_context
// should block while a previous call (if any) has not yet been
// released via halide_release_vulkan_context.
WEAK int halide_vulkan_acquire_context(void *user_context,
Contributor

@derek-gerstmann derek-gerstmann May 4, 2026

These can't be changed ... they need to match all the other runtimes.

Comment thread src/runtime/vulkan.cpp Outdated
return halide_error_code_internal_error;
}

int error_code = vk_load_external_context_functions(user_context, instance, device);
Contributor

Again, this should only be called during context creation, and only once.

Comment thread src/runtime/vulkan.cpp
return halide_error_code_symbol_not_found;
}

vk_destroy_shader_modules(user_context, runtime_allocator);
Contributor

Why are you destroying shader modules in this method?

return;
}

if (shader_module->descriptor_set_layouts) {
Contributor

Why did you reorder this set of statements and move them down below?

Comment thread src/runtime/vulkan_resources.h Outdated
};

- WEAK Halide::Internal::GPUCompilationCache<VkDevice, VulkanCompilationCacheEntry *> compilation_cache;
+ WEAK Halide::Internal::GPUCompilationCache<VulkanMemoryAllocator *, VulkanCompilationCacheEntry *> compilation_cache;
Contributor

Why did you change the cache entry to use an Allocator pointer?

Comment thread src/runtime/vulkan.cpp
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create) {
return vulkan_acquire_context_handler(user_context, allocator, instance, device,
Contributor

@derek-gerstmann derek-gerstmann May 4, 2026

This shouldn't be changed.

@xFile3160
Author

> This seems like it was vibe coded ... there are too many unnecessary/unrelated changes and it doesn't match what we discussed.
>
> The proposal was to add two methods halide_vulkan_acquire_memory_allocator and halide_vulkan_release_memory_allocator.
>
> This would allow you to override halide_vulkan_acquire_context() and halide_vulkan_release_context() by declaring them in your code base and relying upon the weak linking to override the default.
>
> In your custom halide_vulkan_acquire_context() you have the ability to now call halide_vulkan_acquire_memory_allocator() to create an allocator instance, and return it. In your custom halide_vulkan_release_context() you can now call halide_vulkan_release_memory_allocator().
>
> Likewise, the existing halide_vulkan_acquire_context() would need to be modified to also call the newly added halide_vulkan_acquire_memory_allocator() to create the allocator, and return it. And then the default halide_vulkan_release_context() would need to be modified to call halide_vulkan_release_memory_allocator().

Thanks a lot for reviewing, first of all. I didn't exactly vibe code this, but I definitely leaned heavily on AI tools. I think I took too much liberty in changing things without properly explaining what led to these changes.

The usage I’m trying to support is an embedder-owned Vulkan context: the application owns the VkInstance, VkDevice, and queue, but Halide still needs its own runtime allocator for shader modules, staging buffers, descriptor resources, and other internal Vulkan state.

That allocator needs to live with the external context and be released before that context/device is torn down, without Halide destroying the application-owned Vulkan handles. At least, that is how I'm using this, which I think should be pretty common?

I can rework this PR down to just the allocator-lifecycle changes. I can preserve the WEAK approach, but please let me know if my Windows concern is real. I had problems cross-compiling my code for Windows, mainly due to unsupported WEAK linkage. I saw that the CUDA runtime takes an approach similar to the one I took here.
I can remove the setters and keep only acquire_memory_allocator and release_memory_allocator. I can also keep the allocator release scoped to the Halide-owned allocator and the Vulkan instance/device/queue. The issue with the GPUCompilationCache is that those shader modules are Halide-owned resources associated with the allocator used to create/destroy them. When I manage the context externally, the VkDevice is not the right lifetime boundary: the device is owned by my application, while the Halide allocator/cache needs explicit teardown. Keying by the allocator lets release_memory_allocator release only the cache entries owned by that allocator.

@derek-gerstmann derek-gerstmann added the dev_meeting Topic to be discussed at the next dev meeting label May 8, 2026
@derek-gerstmann
Contributor

@xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!

The remaining issues are making sure the dispatch tables are only loaded once, resolving how to cache the shader modules, how to cleanup the shader modules, and how to resolve allocator lifetime issues.

@xFile3160
Author

> @xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!
>
> The remaining issues are making sure the dispatch tables are only loaded once, resolving how to cache the shader modules, how to cleanup the shader modules, and how to resolve allocator lifetime issues.

Thanks Derek and all the reviewers/devs. I'm going to take a stab at this as soon as I can, and will request a review once it's ready.

@xFile3160
Author

> @xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!
>
> The remaining issues are making sure the dispatch tables are only loaded once, resolving how to cache the shader modules, how to cleanup the shader modules, and how to resolve allocator lifetime issues.

Hey Derek, I fixed the dispatch table loading so it only happens during allocator/context acquisition.
For the shader cache, I think I understand your concern now: VkDevice is the right cache key for sharing across contexts on the same device, so I can keep that.

The part I still need to solve is cleanup. The cached Vulkan module state stores resources allocated/destroyed through a specific VulkanMemoryAllocator. When an embedder owns the VkInstance/VkDevice but asks Halide to create the allocator, halide_vulkan_release_memory_allocator() needs to remove only the cache entries owned by that allocator.

Would you be open to a small targeted cleanup API on GPUCompilationCache, e.g. delete_if(predicate, free_fn), so Vulkan can keep lookup keyed by VkDevice but cleanup entries where entry->allocator == allocator?
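
The shape of such a `delete_if(predicate, free_fn)` helper can be sketched as follows. This is a hypothetical illustration over a plain vector, not the actual `GPUCompilationCache` storage or API: `ToyCache`, `ToyEntry`, and `release_entries_for` are made-up names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for VkDevice and the Halide allocator.
struct FakeDevice { int id; };
struct FakeAllocator { int id; };

// Hypothetical cache entry: looked up by device, but it remembers which
// allocator created its resources.
struct ToyEntry {
    FakeDevice *device;
    int kernel_id;
    FakeAllocator *allocator;
};

struct ToyCache {
    std::vector<ToyEntry> entries;

    // Remove every entry matching `predicate`, calling `free_fn` on each
    // removed entry first. Returns the number of entries removed.
    int delete_if(const std::function<bool(const ToyEntry &)> &predicate,
                  const std::function<void(ToyEntry &)> &free_fn) {
        int removed = 0;
        for (auto it = entries.begin(); it != entries.end();) {
            if (predicate(*it)) {
                free_fn(*it);
                it = entries.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }
};

// Drop only the entries created through `allocator`; `freed` counts how
// many times the free callback ran.
static int release_entries_for(ToyCache &cache, FakeAllocator *allocator, int &freed) {
    return cache.delete_if(
        [allocator](const ToyEntry &e) { return e.allocator == allocator; },
        [&freed](ToyEntry &) { ++freed; });
}
```

Lookup stays keyed by device, so entries created through other allocators on the same device survive the cleanup.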

@xFile3160
Copy link
Copy Markdown
Author

@derek-gerstmann I force-pushed an update to this PR, rebased on current main.
I kept the shader cache keyed by VkDevice, as you suggested.
I've added allocator cleanup so halide_vulkan_release_memory_allocator() only removes the shader cache entries created with that allocator. The main thing I'd like feedback on is the GPUCompilationCache::delete_context_if function I've added: it lets Vulkan keep the VkDevice as the cache key while still cleaning up the allocator-owned state when an embedder (in this case, an external app that provides the device/instance and acquires the allocator) releases the Halide allocator before destroying its own device. In this case the device is not managed by Halide.
I've also kept the default Vulkan context behavior unchanged: release_context is just a per-dispatch unlock, and the full teardown still happens in halide_vulkan_device_release.
I moved the dispatch table loading into allocator acquisition rather than per-dispatch, so it should happen once. The default Halide-owned contexts are not changed, so their dispatch loading is unchanged. vk_load_vulkan_interface is for an externally supplied VkInstance/VkDevice, when halide_vulkan_acquire_memory_allocator creates the Halide allocator for the external context. It's skipped if the allocator already exists. I've also added test coverage, which passes on my local machine. I will also update the other PR to address your review comments (thanks a lot for them).

Please do not hesitate to provide feedback; it's my first contribution, after all, as I'm hands-deep in Vulkan lately.

@codecov

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@d58798a). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8871   +/-   ##
=======================================
  Coverage        ?   69.61%           
=======================================
  Files           ?      255           
  Lines           ?    77525           
  Branches        ?    18534           
=======================================
  Hits            ?    53966           
  Misses          ?    17989           
  Partials        ?     5570           

