GPUMan is a reliability tool designed for shared GPU environments. It prevents a single misbehaving process from triggering a catastrophic, system-wide CUDA Out-of-Memory (OOM) failure.
TL;DR: GPUMan prevents a single CUDA process from crashing all other GPU workloads by enforcing per-process VRAM limits using NVML.
- Per-Process Limits: Enforces strict memory ceilings for individual CUDA workloads.
- Proactive Enforcement: Monitors usage via NVML and terminates offending processes before they can destabilize the GPU.
- Non-Invasive: Provides a practical safeguard without requiring specialized hardware features (like MIG) or complex code changes.
- Operational Safety: Protects shared workstations, CI runners, and ad-hoc services from "noisy neighbor" crashes.
Modern GPU infrastructure is increasingly shared — whether across multiple users on a single workstation, concurrent jobs on a cluster node, or automated CI/CD runners. Despite this, CUDA provides no native per-process memory isolation on consumer or mid-range enterprise hardware.
This lack of isolation leads to a critical operational failure mode:
- Global Instability: When a single misbehaving process exhausts available VRAM, the resulting Out-of-Memory (OOM) error is not isolated. It often causes all active CUDA contexts on that GPU to fail simultaneously.
- Workload Disruption: Unrelated processes—including long-running training jobs or production inference services—can crash or become unresponsive, leading to significant data loss and downtime.
- The "Noisy Neighbor" Effect: In a shared environment, a single user’s error can effectively destabilize the entire system, rendering the hardware unusable for others.
GPUMan addresses these risks by providing a practical enforcement layer. By monitoring memory via NVIDIA’s NVML interface and enforcing strict per-process limits, it terminates offending workloads before they can reach the threshold of a catastrophic, GPU-wide failure.
GPUMan is intended for:
- Shared research workstations
- CI runners with GPU access
- Multi-user development servers
- Ad-hoc inference or training environments
It is not a replacement for hardware isolation (MIG) or a full scheduler.
GPUMan consists of two executables:
- A user-facing CLI
- A long-running daemon
The CLI translates user intent into structured messages sent to the daemon via IPC (Unix Domain Sockets). It also uses the information to update a manifest of managed processes stored on disk.
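As a rough illustration of "structured messages over a Unix domain socket", here is a minimal sketch. The wire format (one newline-terminated JSON object per request) and the field names `action`, `tag`, `command`, and `memory` are assumptions made for this example; the source does not describe GPUMan's actual protocol. `socket.socketpair()` stands in for the CLI and daemon ends so the sketch runs without a socket file, and the command path is a placeholder.

```python
import json
import socket


def encode_request(action, **fields):
    """Serialize a request as one newline-terminated JSON object (assumed format)."""
    return (json.dumps({"action": action, **fields}) + "\n").encode("utf-8")


def decode_request(raw):
    """Parse one newline-terminated JSON request back into a dict."""
    return json.loads(raw.decode("utf-8").strip())


# Two connected Unix domain sockets: one plays the CLI, the other the daemon.
cli_end, daemon_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    cli_end.sendall(
        encode_request(
            "run",
            tag="test_cuda_app",
            command="/path/to/cuda_mem_stable",  # placeholder path
            memory=536870912,
        )
    )
    received = decode_request(daemon_end.recv(4096))
finally:
    cli_end.close()
    daemon_end.close()

print(received["action"], received["tag"], received["memory"])
```

A line-delimited text format keeps the daemon side easy to debug by hand, at the cost of needing explicit framing if messages ever grow beyond a single read.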
The daemon owns all privileged operations such as process supervision and GPU monitoring. On startup, the daemon restores and supervises all processes defined in the on-disk manifest. It periodically polls NVML for memory usage statistics and compares them against the limits specified by the user. If a process exceeds its configured limit, the daemon terminates it before a global CUDA OOM can occur.
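The enforcement step described above can be sketched as a single pass over usage samples. This is an illustrative sketch under stated assumptions, not GPUMan's code: `Sample`, `find_offenders`, and `enforce_once` are hypothetical names, and the sampler is injected so the example runs without a GPU. A real sampler would read per-process usage from NVML (for instance through `nvmlDeviceGetComputeRunningProcesses`, though the source does not say which NVML call GPUMan uses).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One per-process reading: PID and GPU memory in use, in bytes."""
    pid: int
    used_bytes: int


def find_offenders(samples, limits):
    """Return PIDs of managed processes whose usage exceeds their limit.

    PIDs that are absent from `limits` are unmanaged and are ignored.
    """
    return [
        s.pid for s in samples
        if s.pid in limits and s.used_bytes > limits[s.pid]
    ]


def enforce_once(sampler, limits, terminate):
    """Run one polling pass: sample usage, then terminate every offender."""
    offenders = find_offenders(sampler(), limits)
    for pid in offenders:
        terminate(pid)
    return offenders


# Demo with fake data: PID 101 is over its 512 MiB limit, PID 102 is within
# its limit, and PID 103 is not managed at all.
limits = {101: 512 * 1024**2, 102: 1024**3}
fake_sampler = lambda: [
    Sample(101, 600 * 1024**2),
    Sample(102, 300 * 1024**2),
    Sample(103, 4 * 1024**3),
]
killed = []
enforce_once(fake_sampler, limits, killed.append)
print(killed)
```

In a daemon, `enforce_once` would be called on a timer, and `terminate` would send a signal to the process rather than append to a list.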
The daemon also listens for commands from the CLI to add, remove, restart, or update managed processes. It can also report current GPU memory usage and the list of managed processes. The CLI provides a user-friendly interface to these features.
On my development system, I am using Ubuntu 24.04 LTS with an NVIDIA RTX 4050 GPU and the proprietary NVIDIA drivers installed.
The project depends on NVML (NVIDIA Management Library) and CUDA, though CUDA is only needed to build the test binaries for the main program. To install these dependencies on a Debian-based system, run the following command:
sudo apt install libnvidia-ml-dev nvidia-cuda-toolkit
GPUMan is built using CMake. To build the project, ensure you have CMake and a compatible C++ compiler installed. This can be done on a Debian-based system with the following command:
sudo apt install cmake build-essential clang ninja-build
I aim to keep the build compiler- and generator-agnostic, but I test with Clang and Ninja and will do my best to keep at least that configuration working. To clone and build the project, run the following commands:
git clone https://github.com/Nalin-Angrish/GPUMan.git
cd GPUMan
cmake --preset ninja-clang
cd build
ninja
sudo ninja install
cd ..
chmod +x postinstall.sh
sudo ./postinstall.sh
The GPUMan executable and the daemon are installed in the system directories after running ninja install. To run the GPUMan daemon, use the following commands:
sudo systemctl daemon-reload
sudo systemctl start gpumand.service
And to enable it to start on boot, use:
sudo systemctl enable gpumand.service
To use the GPUMan CLI, you will need to add your user to the gpuman group. This can be done with the following command:
sudo usermod -aG gpuman $USER
You will need to log out and log back in for the group change to take effect.
To run a CUDA application with GPUMan enforcing a memory limit, use the following command:
gpuman run --tag <a_tag_for_the_process> --command <command_to_execute_in_terminal> --memory <max_memory_in_bytes>
Note that an executable that is not on the PATH must be specified by its full path. For example:
gpuman run --tag test_cuda_app --command $PWD/build/cuda_mem_stable --memory 536870912
Usage: gpuman [--help] [--version] {proclist,remove,restart,run,status,update}
GPU memory manager for multi-tenant CUDA workloads
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
Subcommands:
proclist View all processes using GPU memory
remove Remove a running application from GPU memory management
restart Restart a running application from GPU memory management
run Run a CUDA application with GPU memory management
status Show the status of GPU memory usage
update Update the GPU memory management configuration for a running application
Detailed help for each subcommand and the arguments that can be passed can be accessed using gpuman <subcommand> --help.
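Because `--memory` takes a raw byte count, the value 536870912 in the example above is 512 MiB (512 × 1024 × 1024). The helper below is a small convenience sketch for computing such values; `to_bytes` is a hypothetical name and not part of GPUMan, and it accepts only whole numbers with binary units.

```python
import re

# Binary unit multipliers, keyed by upper-cased suffix.
_UNITS = {"B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3}


def to_bytes(text):
    """Convert a string such as '512MiB' or '1 GiB' to an integer byte count.

    A bare number is interpreted as bytes.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    unit = (unit or "B").upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(number) * _UNITS[unit]


print(to_bytes("512MiB"))
```

The printed number can be passed directly as the `--memory` argument.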
- Enforcement is polling-based (not instantaneous)
- Memory spikes between polls may briefly exceed limits
- No true isolation (that requires hardware MIG, which is not supported on consumer GPUs)
- No support for multi-GPU scheduling in v1
These tradeoffs are explicit and documented by design.
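To make the polling tradeoff concrete, here is a toy simulation. The usage timeline, the limit, and the poll interval are invented numbers for illustration only; they are not measurements from GPUMan. It compares the moment a process first exceeds its limit with the moment a poller that samples every `poll_every` ticks first notices.

```python
def first_over_limit(usage, limit):
    """Index of the first tick where usage exceeds the limit, or None."""
    for t, used in enumerate(usage):
        if used > limit:
            return t
    return None


def first_detected(usage, limit, poll_every):
    """Index of the first poll (at ticks 0, poll_every, 2*poll_every, ...)
    that observes usage above the limit, or None if no poll sees it."""
    for t in range(0, len(usage), poll_every):
        if usage[t] > limit:
            return t
    return None


LIMIT = 512  # MiB, invented for the example
# Invented per-tick usage in MiB: a short spike at ticks 2-3, then a
# sustained overrun from tick 6 onward.
usage = [400, 450, 600, 700, 480, 470, 650, 660, 670]

exceeded_at = first_over_limit(usage, LIMIT)
seen_coarse = first_detected(usage, LIMIT, poll_every=4)
seen_fine = first_detected(usage, LIMIT, poll_every=1)
print(exceeded_at, seen_coarse, seen_fine)
```

With a poll every 4 ticks, the spike at ticks 2-3 falls between polls and goes unseen; the overrun is only caught at tick 8. Polling every tick catches it at tick 2, at the cost of more frequent NVML queries.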
GPUMan is intentionally scoped, but designed to evolve. Planned future features include:
- Allocation Interception
  - Hook cudaMalloc/cudaMallocAsync
  - Enforce limits at allocation time
  - Reduce reliance on polling
- Kubernetes Integration
  - GPUMan as a device-plugin companion
  - Per-pod GPU memory quotas
  - Node-level GPU protection
- Soft Limits & Throttling
  - Warning thresholds
  - Graceful degradation
  - Priority-aware enforcement
- NVIDIA MIG Awareness
  - Support MIG-partitioned GPUs
  - Enforce limits per MIG slice
  - Stronger isolation guarantees
  - I really want this but don't have hardware to test on, so I'm leaving it out for now.