39 changes: 39 additions & 0 deletions src/runtime/HalideRuntimeVulkan.h
@@ -105,6 +105,45 @@ extern int halide_vulkan_release_context(void *user_context,
VkDevice device,
VkQueue queue,
VkDebugUtilsMessengerEXT messenger);

typedef int (*halide_vulkan_acquire_context_t)(void *user_context,
struct halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create);
typedef int (*halide_vulkan_release_context_t)(void *user_context,
VkInstance instance,
VkDevice device,
VkQueue queue,
VkDebugUtilsMessengerEXT messenger);

/** Override the Vulkan context acquisition callback. Returns the previous handler. */
extern halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler);
Contributor

No, I don't think we can allow this. This doesn't match the runtime interface design. These methods are overridden via weak linking.

Author

I added the setter callbacks to support embedding environments where weak-symbol interposition is unreliable or unavailable, especially Windows-style linkage. Isn't Vulkan cross-platform? If you don't want this in this PR I can move it out, but I think it is important to discuss.

Contributor (@derek-gerstmann, May 4, 2026)

Yes, let's move the setter callbacks into a separate PR.

I'll raise this at the next community dev meeting.

I believe the CUDA get/set acquire/release context methods were added to support JIT compilation many years ago, but we really don't want to force an indirect call for everyone if we don't have to. With AOT, you can always override this method yourself regardless of the weak symbols.
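
For readers weighing the trade-off discussed above, here is a minimal sketch of the setter-with-default pattern, assuming simplified hypothetical names (`acquire_fn`, `acquire_context`, `set_acquire_context`, `embedder_acquire`) rather than the real Halide signatures. It only illustrates why every call becomes an indirect call through a function pointer; it is not the runtime's implementation.

```cpp
#include <cassert>

// Hypothetical, simplified signature for illustration only.
using acquire_fn = int (*)(void *user_context, bool create);

// Plays the role of the runtime's built-in handler.
int default_acquire(void *, bool) { return 0; }

// Currently installed handler. Every call is routed through this pointer,
// which is the indirect call the review is concerned about.
acquire_fn acquire_handler = default_acquire;

// Public entry point: forwards to whichever handler is installed.
int acquire_context(void *user_context, bool create) {
    return acquire_handler(user_context, create);
}

// Install a new handler and return the previous one.
// Passing nullptr restores the default handler.
acquire_fn set_acquire_context(acquire_fn handler) {
    acquire_fn previous = acquire_handler;
    acquire_handler = handler ? handler : default_acquire;
    return previous;
}

// A hypothetical embedder-supplied handler.
int embedder_acquire(void *, bool) { return 42; }

// Install, call, then restore the default.
void run_example() {
    assert(acquire_context(nullptr, true) == 0);
    acquire_fn previous = set_acquire_context(embedder_acquire);
    assert(previous == default_acquire);
    assert(acquire_context(nullptr, true) == 42);
    set_acquire_context(nullptr);
    assert(acquire_context(nullptr, true) == 0);
}
```

With weak linking, by contrast, the override is resolved at link time and the call stays direct, which is the property the reviewer wants to preserve where possible.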


/** Override the Vulkan context release callback. Returns the previous handler. */
extern halide_vulkan_release_context_t halide_set_vulkan_release_context(halide_vulkan_release_context_t handler);
Contributor

Same as above.

Author

Windows doesn't handle WEAK well. Vulkan should eventually be supported there, or am I mistaken?
To give you some context: I'm building a cross-platform studio that uses Halide as the recommended way to implement image-processing kernels. The studio owns the Vulkan objects (VkInstance, VkDevice, and so on), but the intention is to safely leverage the memory allocator inside Halide. This leads me to:

  1. Introduce these APIs first, as I think was done for CUDA, without relying on weak linkage, to support Windows.
  2. Key the GPU compilation cache by allocator instead of VkDevice, because Halide doesn't own the device.

Contributor

This is more of a design decision for the Halide runtime than anything else. All the runtimes follow the same interface, which has been very stable for a long time. Yes, the MSVC toolchain is a pain to deal with for weak linking, but that doesn't prevent you from writing your own custom runtime, which is what most integrators do when they wish to customize the behavior of the runtime for their app or framework.

My main concern is forcing an indirect call for all acquire/release context invocations. I'll raise this at the dev meeting this week and let you know how to proceed!

Contributor

@xFile3160 Okay, chatted with the rest of the team, and adding the get/set acquire/release access methods is fine if it makes things easier to use. Feel free to leave them in this PR!


/** Create or validate Halide allocator state for an external Vulkan context.
* The embedder owns the Vulkan handles and the returned allocator lifetime.
*/
extern int halide_vulkan_acquire_memory_allocator(void *user_context,
struct halide_vulkan_memory_allocator **allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device);

/** Release Halide allocator state for an external Vulkan context.
* The embedder must ensure no Halide work is still using it.
*/
extern int halide_vulkan_release_memory_allocator(void *user_context,
struct halide_vulkan_memory_allocator *allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device);
// --

// Override the default allocation callbacks (default uses Vulkan runtime implementation)
27 changes: 26 additions & 1 deletion src/runtime/gpu_context_common.h
@@ -127,7 +127,7 @@ class GPUCompilationCache {
}

for (int i = 0; i < (1 << log2_compilations_size); i++) {
- if (compilations[i].kernel_id > kInvalidId &&
+ if (compilations[i].kernel_id > kDeletedId &&
Contributor

Why are you changing things in the GPUCompilationCache?

Author

Shader modules cached in GPUCompilationCache are owned by the Halide runtime state associated with the allocator used to create and destroy them. For externally managed contexts, the VkDevice lifetime and the Halide allocator lifetime are not the same boundary. Keying by allocator lets release_memory_allocator delete only the cache entries owned by that allocator. I did this to prevent stale shader modules and cache entries when an external context tears down the Halide allocator state without destroying the VkDevice, which Halide does not own.
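
The predicate-based deletion described here can be sketched roughly as follows. This is a simplified illustration, assuming a flat table and assumed sentinel values (0 for a never-used slot, 1 for a tombstone); `Slot`, `make_slot`, and the `int *` module stand-in are hypothetical and not the Halide types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed sentinel values for illustration: 0 marks a never-used slot,
// 1 marks a tombstone, and live entries carry larger ids.
constexpr std::uintptr_t kInvalidId = 0;
constexpr std::uintptr_t kDeletedId = 1;

struct Slot {
    const void *context = nullptr;  // cache key (device or allocator)
    int *module_state = nullptr;    // stand-in for a compiled module
    std::uintptr_t kernel_id = kInvalidId;
    int use_count = 0;
};

// Hypothetical helper that builds a populated slot.
Slot make_slot(const void *context, int *module, std::uintptr_t id, int use_count) {
    Slot s;
    s.context = context;
    s.module_state = module;
    s.kernel_id = id;
    s.use_count = use_count;
    return s;
}

// Tombstone every idle entry for `context` that the predicate selects.
// Returns the number of entries released.
template<typename ShouldDelete, typename FreeModule>
int delete_context_if(std::vector<Slot> &table, const void *context,
                      ShouldDelete should_delete, FreeModule free_module) {
    int released = 0;
    for (Slot &s : table) {
        // `> kDeletedId` skips both empty slots and existing tombstones,
        // so an already-released entry is never freed twice.
        if (s.kernel_id > kDeletedId && s.context == context &&
            s.use_count == 0 && should_delete(s.module_state)) {
            free_module(s.module_state);
            s.module_state = nullptr;
            s.kernel_id = kDeletedId;
            released++;
        }
    }
    return released;
}

// Runs the deletion twice and returns how many modules were freed in total.
int run_example() {
    int module_a = 1, module_b = 2, module_c = 3;
    int ctx1 = 0, ctx2 = 0;
    std::vector<Slot> table(4);
    table[0] = make_slot(&ctx1, &module_a, 2, 0);  // idle, eligible
    table[1] = make_slot(&ctx1, &module_b, 3, 1);  // still in use
    table[2] = make_slot(&ctx2, &module_c, 4, 0);  // different context
    int freed = 0;
    auto always = [](int *) { return true; };
    auto free_module = [&freed](int *) { freed++; };
    int first = delete_context_if(table, &ctx1, always, free_module);
    int second = delete_context_if(table, &ctx1, always, free_module);
    assert(first == 1 && second == 0);
    return freed;
}
```

Under these assumed sentinel values, a comparison against `kInvalidId` would also match tombstoned slots, which may be why the release loop in the diff was switched to compare against `kDeletedId`.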

Contributor

There are two separate issues here. The first is the line I commented on in gpu_context_common.h. Why are you changing anything in this file?

The second issue is the change in the type definition for the GPUCompilationCache Key being used in the Vulkan runtime. The reason it was specified with the Device pointer was to allow sharing across contexts for the same devices created by the same instance to minimize kernel launch overhead.

Changing this to the allocator pointer now means the compilation cache isn't shared for all contexts for the same device, since the allocator pointer is created dynamically for the context.

I'd suggest leaving it as it is, and release the compilation cache inside of halide_vulkan_release_allocator() to detach the external vkDevice.
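
To make the sharing argument concrete, here is a small sketch comparing the two key choices. `Device`, `Allocator`, and `Cache` are hypothetical stand-ins, and the "compilation" is just a counter; it only shows how many compilations occur when two contexts on one device request the same kernel.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins; not the Halide runtime types.
struct Device {};
struct Allocator {
    Device *device;
};

// A toy cache that counts how many compilations it has to perform.
template<typename KeyT>
struct Cache {
    std::map<KeyT, int> modules;  // key -> compiled module (stand-in)
    int compile_count = 0;

    // Return the cached module for `key`, compiling on a miss.
    int lookup(KeyT key) {
        auto it = modules.find(key);
        if (it == modules.end()) {
            compile_count++;
            it = modules.emplace(key, compile_count).first;
        }
        return it->second;
    }

    // Drop the entry owned by `key`, as a release call might do.
    void release(KeyT key) {
        modules.erase(key);
    }
};

// Two contexts on one device, cache keyed by device: one shared entry.
int compiles_keyed_by_device() {
    Device dev;
    Allocator a{&dev}, b{&dev};
    Cache<Device *> cache;
    cache.lookup(a.device);
    cache.lookup(b.device);
    return cache.compile_count;
}

// Same setup, cache keyed by allocator: each context compiles separately.
int compiles_keyed_by_allocator() {
    Device dev;
    Allocator a{&dev}, b{&dev};
    Cache<Allocator *> cache;
    cache.lookup(&a);
    cache.lookup(&b);
    return cache.compile_count;
}
```

In this toy model, keying by device compiles once and keying by allocator compiles twice, which is the extra overhead the reviewer is pointing at.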

(all || (compilations[i].context == context)) &&
compilations[i].use_count == 0) {
debug(user_context) << "Releasing cached compilation: " << compilations[i].module_state
@@ -168,6 +168,31 @@ class GPUCompilationCache {
release_context_already_locked(user_context, false, context, f);
}

template<typename ShouldDeleteModuleT, typename FreeModuleT>
void delete_context_if(void *user_context, ContextT context,
ShouldDeleteModuleT &should_delete, FreeModuleT &f) {
ScopedMutexLock lock_guard(&mutex);

if (count == 0) {
return;
}

for (int i = 0; i < (1 << log2_compilations_size); i++) {
if (compilations[i].kernel_id > kDeletedId &&
compilations[i].context == context &&
compilations[i].use_count == 0 &&
should_delete(compilations[i].module_state)) {
debug(user_context) << "Releasing cached compilation: " << compilations[i].module_state
<< " id " << compilations[i].kernel_id
<< " context " << compilations[i].context << "\n";
f(compilations[i].module_state);
compilations[i].module_state = nullptr;
compilations[i].kernel_id = kDeletedId;
count--;
}
}
}

template<typename FreeModuleT>
void release_all(void *user_context, FreeModuleT &f) {
ScopedMutexLock lock_guard(&mutex);
4 changes: 4 additions & 0 deletions src/runtime/runtime_api.cpp
@@ -213,10 +213,14 @@ extern "C" __attribute__((used)) void *halide_runtime_api_functions[] = {
(void *)&halide_d3d12compute_release_context,
(void *)&halide_d3d12compute_run,
(void *)&halide_vulkan_acquire_context,
(void *)&halide_vulkan_acquire_memory_allocator,
(void *)&halide_vulkan_device_interface,
(void *)&halide_vulkan_initialize_kernels,
(void *)&halide_vulkan_release_memory_allocator,
(void *)&halide_vulkan_release_context,
(void *)&halide_vulkan_run,
(void *)&halide_set_vulkan_acquire_context,
(void *)&halide_set_vulkan_release_context,
(void *)&halide_webgpu_device_interface,
(void *)&halide_webgpu_initialize_kernels,
(void *)&halide_webgpu_finalize_kernels,
168 changes: 156 additions & 12 deletions src/runtime/vulkan.cpp
@@ -13,9 +13,34 @@ using namespace Halide::Runtime::Internal::Vulkan;

// --------------------------------------------------------------------------

extern "C" {
namespace Halide {
namespace Runtime {
namespace Internal {
namespace Vulkan {

// --------------------------------------------------------------------------
ALWAYS_INLINE int vk_load_vulkan_interface(void *user_context, VkInstance instance, VkDevice device) {
if (vkGetInstanceProcAddr == nullptr) {
vk_load_vulkan_loader_functions(user_context);
if (vkGetInstanceProcAddr == nullptr) {
error(user_context) << "Vulkan: Failed to resolve loader functions!\n";
return halide_error_code_symbol_not_found;
}
}

vk_load_vulkan_instance_functions(user_context, instance);
if (vkGetPhysicalDeviceProperties == nullptr || vkGetDeviceProcAddr == nullptr) {
error(user_context) << "Vulkan: Failed to resolve instance functions!\n";
return halide_error_code_symbol_not_found;
}

vk_load_vulkan_device_functions(user_context, device);
if (vkCreateBuffer == nullptr || vkAllocateMemory == nullptr) {
error(user_context) << "Vulkan: Failed to resolve device functions!\n";
return halide_error_code_symbol_not_found;
}

return halide_error_code_success;
}

// The default implementation of halide_acquire_vulkan_context uses
// the global pointers above, and serializes access with a spin lock.
@@ -29,15 +54,15 @@ extern "C" {
// call to halide_release_vulkan_context. halide_acquire_vulkan_context
// should block while a previous call (if any) has not yet been
// released via halide_release_vulkan_context.
WEAK int halide_vulkan_acquire_context(void *user_context,
Contributor (@derek-gerstmann, May 4, 2026)

These can't be changed ... they need to match all the other runtimes.

halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create) {
WEAK int default_vulkan_acquire_context(void *user_context,
halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create) {
#ifdef DEBUG_RUNTIME
halide_start_clock(user_context);
#endif
@@ -74,11 +99,130 @@ WEAK int halide_vulkan_acquire_context(void *user_context,
return halide_error_code_success;
}

- WEAK int halide_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
+ WEAK int default_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
halide_mutex_unlock(&thread_lock);
return halide_error_code_success;
}

WEAK halide_vulkan_acquire_context_t vulkan_acquire_context_handler =
default_vulkan_acquire_context;
WEAK halide_vulkan_release_context_t vulkan_release_context_handler =
default_vulkan_release_context;

} // namespace Vulkan
} // namespace Internal
} // namespace Runtime
} // namespace Halide

// --------------------------------------------------------------------------

extern "C" {

// --------------------------------------------------------------------------

WEAK int halide_vulkan_acquire_context(void *user_context,
halide_vulkan_memory_allocator **allocator,
VkInstance *instance,
VkDevice *device,
VkPhysicalDevice *physical_device,
VkQueue *queue,
uint32_t *queue_family_index,
VkDebugUtilsMessengerEXT *messenger,
bool create) {
return vulkan_acquire_context_handler(user_context, allocator, instance, device,
Contributor (@derek-gerstmann, May 4, 2026)

This shouldn't be changed.

physical_device, queue, queue_family_index,
messenger, create);
}

WEAK int halide_vulkan_release_context(void *user_context, VkInstance instance, VkDevice device, VkQueue queue, VkDebugUtilsMessengerEXT messenger) {
return vulkan_release_context_handler(user_context, instance, device, queue, messenger);
}

WEAK halide_vulkan_acquire_context_t halide_set_vulkan_acquire_context(halide_vulkan_acquire_context_t handler) {
Contributor

This isn't allowed. Overriding is currently handled via weak linking, and this needs to be the same across all runtimes.

halide_vulkan_acquire_context_t result = vulkan_acquire_context_handler;
vulkan_acquire_context_handler = handler ? handler : default_vulkan_acquire_context;
return result;
}

WEAK halide_vulkan_release_context_t halide_set_vulkan_release_context(halide_vulkan_release_context_t handler) {
Contributor

Same as above.

halide_vulkan_release_context_t result = vulkan_release_context_handler;
vulkan_release_context_handler = handler ? handler : default_vulkan_release_context;
return result;
}

WEAK int halide_vulkan_acquire_memory_allocator(void *user_context,
halide_vulkan_memory_allocator **allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device) {
if (allocator == nullptr) {
error(user_context) << "Vulkan: allocator output pointer is null!\n";
return halide_error_code_buffer_argument_is_null;
}
if (instance == VK_NULL_HANDLE || device == VK_NULL_HANDLE || physical_device == VK_NULL_HANDLE) {
error(user_context) << "Vulkan: invalid external context handles for allocator acquisition!\n";
return halide_error_code_device_interface_no_device;
}

VulkanMemoryAllocator *runtime_allocator =
reinterpret_cast<VulkanMemoryAllocator *>(*allocator);
if (runtime_allocator != nullptr) {
if (runtime_allocator->current_device() != device ||
runtime_allocator->current_physical_device() != physical_device) {
error(user_context) << "Vulkan: external allocator does not match supplied device handles!\n";
return halide_error_code_internal_error;
}
return halide_error_code_success;
Contributor

This doesn't actually return the allocator pointer. What's the intent?
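
One possible reading of the code under review is a create-or-validate pattern on an in/out pointer: a non-null `*allocator` is checked against the supplied handles and left untouched, while a null one is filled in. The sketch below illustrates that reading only, using hypothetical types (`Device`, `Allocator`, `Status`); it does not speak for the author's intent.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the Halide runtime types.
struct Device {};
struct Allocator {
    Device *device;
};

enum Status { kSuccess = 0, kNullArgument = 1, kMismatch = 2 };

// Create-or-validate through an in/out parameter.
// - *allocator == nullptr: create a new allocator and store it.
// - *allocator != nullptr: only check that it belongs to `device`;
//   the caller's pointer is left unchanged.
Status acquire_allocator(Allocator **allocator, Device *device) {
    if (allocator == nullptr || device == nullptr) {
        return kNullArgument;
    }
    if (*allocator != nullptr) {
        return ((*allocator)->device == device) ? kSuccess : kMismatch;
    }
    *allocator = new Allocator{device};
    return kSuccess;
}

// Release an allocator previously created for `device`.
// A null allocator is treated as already released.
Status release_allocator(Allocator *allocator, Device *device) {
    if (allocator == nullptr) {
        return kSuccess;
    }
    if (allocator->device != device) {
        return kMismatch;
    }
    delete allocator;
    return kSuccess;
}
```

Under this reading the validate path has nothing new to return, since the caller already holds the pointer, but documenting that in the header comment would address the question.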

}

const VkAllocationCallbacks *alloc_callbacks =
halide_vulkan_get_allocation_callbacks(user_context);

int error_code = vk_load_vulkan_interface(user_context, instance, device);
if (error_code != halide_error_code_success) {
return error_code;
}

runtime_allocator =
vk_create_memory_allocator(user_context, device, physical_device, alloc_callbacks);
if (runtime_allocator == nullptr) {
error(user_context) << "Vulkan: Failed to create memory allocator for external context!\n";
return halide_error_code_out_of_memory;
}

*allocator = reinterpret_cast<halide_vulkan_memory_allocator *>(runtime_allocator);
return halide_error_code_success;
}

WEAK int halide_vulkan_release_memory_allocator(void *user_context,
halide_vulkan_memory_allocator *allocator,
VkInstance instance,
VkDevice device,
VkPhysicalDevice physical_device) {
VulkanMemoryAllocator *runtime_allocator =
reinterpret_cast<VulkanMemoryAllocator *>(allocator);
if (runtime_allocator == nullptr) {
return halide_error_code_success;
}
if (instance == VK_NULL_HANDLE || device == VK_NULL_HANDLE || physical_device == VK_NULL_HANDLE) {
error(user_context) << "Vulkan: invalid external context handles for allocator release!\n";
return halide_error_code_device_interface_no_device;
}
if (runtime_allocator->current_device() != device ||
runtime_allocator->current_physical_device() != physical_device) {
error(user_context) << "Vulkan: external allocator does not match supplied device handles during release!\n";
return halide_error_code_internal_error;
}

if (vkDestroyShaderModule == nullptr || vkFreeMemory == nullptr) {
error(user_context) << "Vulkan: Failed to resolve device functions for external allocator release!\n";
return halide_error_code_symbol_not_found;
}

vk_destroy_shader_modules(user_context, runtime_allocator);
Contributor

Why are you destroying shader modules in this method?

return vk_destroy_memory_allocator(user_context, runtime_allocator);
}

WEAK bool halide_vulkan_is_initialized() {
halide_mutex_lock(&thread_lock);
bool is_initialized = (cached_instance != nullptr) && (cached_device != nullptr);