The Idle GPU Detective: Hunting Down Wasted Accelerators

A field guide to the most expensive silicon in your cluster doing absolutely nothing. Bring a magnifying glass and a budget alert.

Somewhere in your cluster, right now, a GPU that costs more per hour than a nice dinner is rendering exactly nothing. It is warm. It is allocated. It is, by every billing definition, hard at work. It is also idle.

This is a detective story. The victim is your budget. The suspects are everywhere.

The usual suspects

After enough investigations, the same culprits keep showing up.

The line-up

Most idle-GPU waste is one of four characters. Learn their faces.

The Ghost Reservation. A job finished hours ago but never released its allocation. The GPU is held open for a process that no longer exists.

The Optimistic Over-Provisioner. Someone requested eight GPUs "to be safe" for a workload that uses one and a half.

The Phantom of the Dev Notebook. A Jupyter kernel opened on Tuesday, attached to an A100, and never closed. It is Thursday.

The Warm Standby That Forgot To Stand. Capacity pre-warmed for a traffic spike that, this time, politely declined to arrive.

Collecting evidence

You cannot convict on vibes. The tell is the gap between allocated and actually used:

dust-for-fingerprints.sh
# GPUs allocated but averaging < 5% utilization over the last hour
modelplane gpu list --allocated \
  | mp-filter 'util_1h < 0.05' \
  | sort-by cost_per_hour --desc

A healthy fleet has a small, explainable gap. A leaky one has a chasm.

Signal                 Busy GPU    Idle suspect
SM utilization (1h)    60–95%      < 5%
Memory in use          High        A few MB
Owning process         Alive       Long gone
Last activity          Seconds     Hours
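
For the curious, the four signals in the table above can be combined into a rough suspect check. This is a minimal Python sketch, not Modelplane's implementation: `GpuSample`, `idle_signals`, and `is_idle_suspect` are hypothetical names, the 5% utilization cutoff comes from the query above, and the memory and staleness cutoffs (64 MB, one hour) are assumptions standing in for "a few MB" and "hours".

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One observation of an allocated GPU (hypothetical shape)."""
    gpu_id: str
    sm_util_1h: float    # mean SM utilization over the last hour, 0.0 to 1.0
    mem_used_mb: float   # device memory currently in use
    owner_alive: bool    # is the owning process still running?
    idle_seconds: float  # seconds since the last observed activity


def idle_signals(s, util_max=0.05, mem_max_mb=64.0, idle_min_s=3600.0):
    """Return the names of the idle signals this sample trips.

    util_max mirrors the 5% threshold in the text; mem_max_mb and
    idle_min_s are assumed values, tune them for your fleet.
    """
    hits = []
    if s.sm_util_1h < util_max:
        hits.append("low_util")
    if s.mem_used_mb < mem_max_mb:
        hits.append("low_memory")
    if not s.owner_alive:
        hits.append("owner_gone")
    if s.idle_seconds >= idle_min_s:
        hits.append("stale")
    return hits


def is_idle_suspect(s, min_signals=2):
    """A GPU is a suspect (not yet a convict) once enough signals agree."""
    return len(idle_signals(s)) >= min_signals
```

Requiring at least two signals is a design choice: a Tuesday notebook still has a live kernel and a loaded model, so it only trips the utilization and staleness signals, while a busy training job trips none.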

Making the arrest

The fun part about a control plane is that it can close the case automatically. Idle reclamation is just a reconciliation rule: if a GPU's utilization stays below a threshold for longer than a grace window, and its owning workload is gone, take it back.
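
As a rough illustration, the rule reads almost directly as code. The sketch below is an assumption-laden Python version, not the actual Modelplane reconciler: `GpuState` and `reconcile` are hypothetical names, and the 5% threshold and 30-minute grace window are placeholder defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuState:
    """Hypothetical per-GPU record the reconciler keeps between passes."""
    gpu_id: str
    utilization: float                   # latest utilization, 0.0 to 1.0
    owner_alive: bool                    # does the owning workload still exist?
    below_since: Optional[float] = None  # when utilization first dropped below threshold


def reconcile(gpu, now, threshold=0.05, grace_seconds=1800.0):
    """Run one reconciliation pass; return True if the GPU should be reclaimed.

    `now` is a timestamp in seconds. Both defaults are assumed values.
    """
    if gpu.utilization >= threshold:
        gpu.below_since = None  # any real activity resets the clock
        return False
    if gpu.below_since is None:
        gpu.below_since = now   # start timing the idle stretch
    idle_for = now - gpu.below_since
    # Reclaim only when both conditions hold: idle past the grace
    # window AND the owning workload is gone.
    return idle_for > grace_seconds and not gpu.owner_alive
```

Calling `reconcile` on every pass with the current time keeps the rule stateless apart from `below_since`, so a restarted reconciler loses, at worst, one grace window of history.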

Grace, not guillotine

Always pair reclamation with a grace window and a notification. The goal is to recover obviously-wasted capacity, not to kill a job that's mid-checkpoint.
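
One way to picture "grace, not guillotine" is a two-step decision: warn first, reclaim only after the notice period has passed, and never touch a job that is checkpointing. The following Python sketch assumes the caller already knows how long the GPU has been idle and whether a warning was sent; `Action`, `next_action`, and the one-hour and 15-minute defaults are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    NOTIFY = "notify"
    RECLAIM = "reclaim"


def next_action(idle_seconds, notified_seconds_ago, checkpointing,
                idle_threshold_s=3600.0, notice_period_s=900.0):
    """Decide what to do with one GPU on this pass.

    notified_seconds_ago is None if the owner has not been warned yet.
    Both thresholds are assumed values, not recommendations.
    """
    # A job that is mid-checkpoint, or simply not idle long enough, is left alone.
    if checkpointing or idle_seconds < idle_threshold_s:
        return Action.NONE
    # First offence: tell the owner before doing anything.
    if notified_seconds_ago is None:
        return Action.NOTIFY
    # The owner was warned and the notice period has elapsed.
    if notified_seconds_ago >= notice_period_s:
        return Action.RECLAIM
    return Action.NONE
```

The caller is expected to clear the "notified" marker whenever the GPU becomes active again, so an old warning cannot be used to reclaim a GPU that has since gone back to work.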

Case closed

We pointed this at one of our own clusters expecting a rounding error. We found that, on a quiet weekend, roughly a fifth of allocated GPU-hours were going to ghosts, phantoms, and optimists. Reclaiming them did not require buying anything. It required noticing.

So: go dust your fleet for fingerprints. The most expensive idle GPU is the one nobody is looking for.
