Somewhere in your cluster, right now, a GPU that costs more per hour than a nice dinner is rendering exactly nothing. It is warm. It is allocated. It is, by every billing definition, hard at work. It is also idle.
This is a detective story. The victim is your budget. The suspects are everywhere.
The usual suspects
After enough investigations, the same culprits keep showing up.
Most idle-GPU waste traces back to one of four characters. Learn their faces.
The Ghost Reservation. A job finished hours ago but never released its allocation. The GPU is held open for a process that no longer exists.
The Optimistic Over-Provisioner. Someone requested eight GPUs "to be safe" for a workload that uses one and a half.
The Phantom of the Dev Notebook. A Jupyter kernel opened on Tuesday, attached to an A100, and never closed. It is Thursday.
The Warm Standby That Forgot To Stand. Capacity pre-warmed for a traffic spike that, this time, politely declined to arrive.
Collecting evidence
You cannot convict on vibes. The tell is the gap between allocated and actually used:
```shell
# GPUs allocated but averaging < 5% utilization over the last hour
modelplane gpu list --allocated \
  | mp-filter 'util_1h < 0.05' \
  | sort-by cost_per_hour --desc
```

A healthy fleet has a small, explainable gap. A leaky one has a chasm.
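To put a number on that gap, here is a minimal Python sketch. It assumes you can export per-GPU allocation hours, mean utilization, and hourly cost from whatever telemetry you have. The `GpuSample` type, its field names, and the sample figures are hypothetical and are not part of the modelplane CLI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuSample:
    """Hypothetical record; field names are assumptions, not a real schema."""
    gpu_id: str
    allocated_hours: float  # hours the GPU was held by an allocation
    mean_util: float        # mean utilization over those hours, 0.0 to 1.0
    cost_per_hour: float


def idle_fraction(samples: List[GpuSample], idle_threshold: float = 0.05) -> float:
    """Share of allocated GPU-hours held by GPUs averaging below the threshold."""
    total = sum(s.allocated_hours for s in samples)
    if total == 0:
        return 0.0
    idle = sum(s.allocated_hours for s in samples if s.mean_util < idle_threshold)
    return idle / total


def idle_cost(samples: List[GpuSample], idle_threshold: float = 0.05) -> float:
    """Money spent on GPU-hours that fell below the idle threshold."""
    return sum(
        s.allocated_hours * s.cost_per_hour
        for s in samples
        if s.mean_util < idle_threshold
    )


# Made-up fleet for illustration only.
fleet = [
    GpuSample("gpu-a", allocated_hours=10, mean_util=0.82, cost_per_hour=3.0),
    GpuSample("gpu-b", allocated_hours=10, mean_util=0.01, cost_per_hour=3.0),
    GpuSample("gpu-c", allocated_hours=20, mean_util=0.70, cost_per_hour=3.0),
    GpuSample("gpu-d", allocated_hours=10, mean_util=0.02, cost_per_hour=3.0),
]

print(f"idle share of allocated hours: {idle_fraction(fleet):.0%}")
print(f"idle spend: {idle_cost(fleet):.2f}")
```

The 5% threshold mirrors the query above. Where you draw the line between "small, explainable gap" and "chasm" is a judgment call for your own fleet.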
| Signal | Busy GPU | Idle suspect |
|---|---|---|
| SM utilization (1h) | 60–95% | < 5% |
| Memory in use | High | A few MB |
| Owning process | Alive | Long gone |
| Last activity | Seconds ago | Hours ago |
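The table reads naturally as a checklist. The sketch below is one way to turn it into code, assuming the four signals are already collected somewhere. `GpuSignals` and its fields are hypothetical names, and apart from the 5% utilization cutoff, the thresholds (100 MB for "a few MB", one hour for "hours") are placeholder assumptions to tune.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuSignals:
    """Hypothetical snapshot of the four signals in the table."""
    sm_util_1h: float              # mean SM utilization over the last hour, 0.0 to 1.0
    memory_used_mb: float          # device memory currently in use
    owner_alive: bool              # is the owning process still running?
    seconds_since_activity: float  # time since the last observed activity


def idle_evidence(
    sig: GpuSignals,
    util_max: float = 0.05,         # from the table: < 5%
    memory_max_mb: float = 100.0,   # assumption standing in for "a few MB"
    quiet_seconds: float = 3600.0,  # assumption standing in for "hours"
) -> List[str]:
    """Return the idle signals this GPU trips, as human-readable labels."""
    evidence = []
    if sig.sm_util_1h < util_max:
        evidence.append("low SM utilization")
    if sig.memory_used_mb < memory_max_mb:
        evidence.append("almost no memory in use")
    if not sig.owner_alive:
        evidence.append("owning process is gone")
    if sig.seconds_since_activity >= quiet_seconds:
        evidence.append("no recent activity")
    return evidence


def is_idle_suspect(sig: GpuSignals, min_signals: int = 3) -> bool:
    """Flag a GPU only when several independent signals agree."""
    return len(idle_evidence(sig)) >= min_signals


busy = GpuSignals(sm_util_1h=0.80, memory_used_mb=30_000, owner_alive=True,
                  seconds_since_activity=2)
ghost = GpuSignals(sm_util_1h=0.01, memory_used_mb=4, owner_alive=False,
                   seconds_since_activity=7_200)

print(is_idle_suspect(busy), idle_evidence(busy))
print(is_idle_suspect(ghost), idle_evidence(ghost))
```

Requiring several signals to agree is a deliberate choice in this sketch: a single low reading is a hunch, not evidence.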
Making the arrest
The fun part about a control plane is that it can close the case automatically. Idle reclamation is just a reconciliation rule: if a GPU's utilization stays below a threshold for longer than a grace window, and its owning workload is gone, take it back.
Always pair reclamation with a grace window and a notification. The goal is to recover obviously wasted capacity, not to kill a job that's mid-checkpoint.
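As a rough illustration of that rule, here is a sketch of a single reconciliation pass written as a pure function. It is not how any particular control plane implements reclamation. `GpuState`, `Action`, and the default threshold and grace window are all assumptions made for the example, and this version starts the grace window when the warning is sent, so the owner always gets the full window to object.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    KEEP = "keep"
    NOTIFY = "notify"
    RECLAIM = "reclaim"


@dataclass(frozen=True)
class GpuState:
    """Hypothetical per-GPU state carried between reconciliation passes."""
    util: float                        # recent utilization, 0.0 to 1.0
    owner_alive: bool                  # is the owning workload still running?
    warned_at: Optional[float] = None  # when the owner was notified, if ever


def reconcile(
    state: GpuState,
    now: float,
    threshold: float = 0.05,       # assumed idle cutoff
    grace_seconds: float = 1800.0,  # assumed grace window
) -> Tuple[Action, GpuState]:
    """Run one pass for one GPU and return (action, next_state)."""
    idle_and_orphaned = state.util < threshold and not state.owner_alive
    if not idle_and_orphaned:
        # Activity or a live owner clears the case and resets the clock,
        # so a job that is quiet while checkpointing is never touched.
        return Action.KEEP, replace(state, warned_at=None)
    if state.warned_at is None:
        # First sighting: send the warning and start the grace window.
        return Action.NOTIFY, replace(state, warned_at=now)
    if now - state.warned_at > grace_seconds:
        # Still idle and orphaned after the full grace window: take it back.
        return Action.RECLAIM, state
    return Action.KEEP, state


# Walk one ghost reservation through three passes (timestamps in seconds).
state = GpuState(util=0.01, owner_alive=False)
for now in (0.0, 600.0, 1801.0):
    action, state = reconcile(state, now)
    print(now, action.value)
```

The function only decides. Sending the notification and releasing the allocation are left to the caller, which keeps the rule easy to test before it is allowed anywhere near a real cluster.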
Case closed
We pointed this at one of our own clusters expecting a rounding error. We found that, on a quiet weekend, roughly a fifth of allocated GPU-hours were going to ghosts, phantoms, and optimists. Reclaiming them did not require buying anything. It required noticing.
So: go dust your fleet for fingerprints. The most expensive idle GPU is the one nobody is looking for.