Adding qemu vm driver support with GPU pass-through #992
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -150,6 +150,7 @@ impl KubernetesComputeDriver {` |
| | | `driver_version: openshell_core::VERSION.to_string(),` |
| | | `default_image: self.config.default_image.clone(),` |
| | | `supports_gpu: self.has_gpu_capacity().await.unwrap_or(false),` |
| | | `gpu_count: 0,` |
| | | `})` |
| | | `}` |

Member (on `gpu_count: 0,`)
Question: is a raw int rich enough here? Should a driver expose the valid names of devices that are available, for example?

Collaborator
Author
Agreed, a raw int is limited. A richer repeated `GpuDeviceInfo` message (with BDF, device name, availability) on `GetCapabilitiesResponse` would let the CLI show available devices and validate `--gpu-device` client-side. I propose to address this, along with the previous one, in a follow-up PR.
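To make the proposal above concrete, here is a minimal sketch of what a richer capability entry and the client-side check might look like on the Rust side. The struct `GpuDeviceInfo` mirrors the message name from the reply, but its field names (`bdf`, `name`, `available`) and the helper `validate_gpu_device` are assumptions for illustration only, not code from this PR.

```rust
/// Hypothetical Rust mirror of the proposed `GpuDeviceInfo` message.
/// The thread only names BDF, device name, and availability; the exact
/// field names and types here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuDeviceInfo {
    /// PCI address in domain:bus:device.function form, e.g. "0000:65:00.0".
    pub bdf: String,
    /// Human-readable device name reported by the driver.
    pub name: String,
    /// Whether the device is currently free to be assigned to a sandbox.
    pub available: bool,
}

/// Client-side check a CLI could run on a `--gpu-device` value before
/// sending the request. Returns the matching device, or an error message
/// when the device is unknown or already in use.
pub fn validate_gpu_device<'a>(
    devices: &'a [GpuDeviceInfo],
    requested: &str,
) -> Result<&'a GpuDeviceInfo, String> {
    let device = devices
        .iter()
        .find(|d| d.bdf == requested)
        .ok_or_else(|| format!("unknown GPU device: {requested}"))?;
    if device.available {
        Ok(device)
    } else {
        Err(format!("GPU device {requested} is not available"))
    }
}
```

With a list like this, a plain count would no longer need to be a separate source of truth, since it could be derived from the number of entries.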
Just to clarify, this is not specific to the VM driver and could be mapped to requests in k8s, Docker, or Podman?
As a follow-up question: does it make sense to allow `gpu_device` to be specified multiple times to allow for multiple devices, or should validation (e.g. a comma-separated list) be delegated to the driver?
Good question on both points. Yes, `--gpu` and `--gpu-device` are intentionally driver-agnostic: the proto defines them on `CreateSandboxRequest` and `DriverSandboxSpec`, so k8s/Docker/Podman drivers can map them to their native GPU request mechanisms. For multi-device: today the proto field is a single string, so multi-GPU per sandbox would need a proto change (`repeated string gpu_devices`) plus inventory updates. I propose to update this in a follow-up PR.
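For the multi-device question, one possible shape is for the CLI to accept both repeated flags and comma-separated values and flatten them into the list that a `repeated string gpu_devices` field would carry. The following is only a sketch under that assumption; the function name `collect_gpu_devices` is hypothetical, and whether this normalization belongs in the CLI or in each driver is exactly the open question in this thread.

```rust
/// Flattens raw `--gpu-device` values into a list of device IDs.
/// Each value may hold one ID or a comma-separated list of IDs, so
/// `--gpu-device a --gpu-device b` and `--gpu-device a,b` give the same result.
/// Empty and duplicate IDs are rejected. IDs are otherwise treated as opaque,
/// leaving driver-specific validation to the driver.
pub fn collect_gpu_devices(values: &[&str]) -> Result<Vec<String>, String> {
    let mut devices: Vec<String> = Vec::new();
    for value in values {
        for part in value.split(',') {
            let id = part.trim();
            if id.is_empty() {
                return Err(format!("empty GPU device ID in {value:?}"));
            }
            if devices.iter().any(|d| d.as_str() == id) {
                return Err(format!("GPU device {id} requested more than once"));
            }
            devices.push(id.to_string());
        }
    }
    Ok(devices)
}
```

Keeping the IDs opaque at this layer means the same request path could serve a PCI address for the VM driver or a different identifier for another driver.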
I'm fine with a follow-up. Would an issue to discuss how users are expected to request GPUs be a good place to have a follow-up discussion? Some of the basic use cases that I can see are:
A more advanced use case that one could also start discussing is when a user wants a sandbox with access to one or more GPUs with specific properties. I would assume that this could also be reduced to a set of driver-specific IDs though, so maybe it is sufficient to demonstrate this transform.
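As a starting point for that discussion, the transform from a property-based request to a set of driver-specific IDs might be sketched as follows. Everything here is hypothetical: the types `GpuInventoryEntry` and `GpuRequirements`, their fields, and the function `resolve_gpu_ids` are illustrative assumptions, not part of the PR or the proto.

```rust
/// Hypothetical inventory entry a driver might keep for each GPU.
#[derive(Debug, Clone)]
pub struct GpuInventoryEntry {
    /// Driver-specific ID, e.g. a PCI address for a VM driver.
    pub id: String,
    pub model: String,
    pub memory_mib: u64,
    pub available: bool,
}

/// Hypothetical property-based request: "N GPUs with at least this much
/// memory, optionally matching a model substring".
#[derive(Debug, Clone)]
pub struct GpuRequirements {
    pub count: usize,
    pub min_memory_mib: u64,
    pub model_contains: Option<String>,
}

/// Reduces a property-based request to a concrete list of driver-specific IDs.
/// Returns `None` when fewer than `count` matching devices are available.
pub fn resolve_gpu_ids(
    inventory: &[GpuInventoryEntry],
    req: &GpuRequirements,
) -> Option<Vec<String>> {
    let ids: Vec<String> = inventory
        .iter()
        .filter(|g| g.available && g.memory_mib >= req.min_memory_mib)
        .filter(|g| {
            req.model_contains
                .as_ref()
                .map_or(true, |m| g.model.contains(m.as_str()))
        })
        .take(req.count)
        .map(|g| g.id.clone())
        .collect();
    if ids.len() == req.count {
        Some(ids)
    } else {
        None
    }
}
```

Under this shape the request API itself would only ever carry IDs, and property matching would stay a driver-side or CLI-side convenience layered on top.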