The Idle GPU Detective: Hunting Down Wasted Accelerators

Somewhere in your cluster, right now, a GPU that costs more per hour than a nice dinner is rendering exactly nothing. It is warm. It is allocated. It is, by every billing definition, hard at work. It is also idle.

This is a detective story. The victim is your budget. The suspects are everywhere.

The usual suspects

After enough investigations, the same culprits keep showing up.

The line-up

Most idle-GPU waste is one of four characters. Learn their faces.

The Ghost Reservation. A job finished hours ago but never released its allocation. The GPU is held open for a process that no longer exists.

The Optimistic Over-Provisioner. Someone requested eight GPUs "to be safe" for a workload that uses one and a half.

The Phantom of the Dev Notebook. A Jupyter kernel opened on Tuesday, attached to an A100, and never closed. It is Thursday.

The Warm Standby That Forgot To Stand. Capacity pre-warmed for a traffic spike that, this time, politely declined to arrive.

Collecting evidence

You cannot convict on vibes. The tell is the gap between what is allocated and what is actually used:

dust-for-fingerprints.sh

# GPUs allocated but averaging < 5% utilization over the last hour
modelplane gpu list --allocated \
  | mp-filter 'util_1h < 0.05' \
  | sort-by cost_per_hour --desc

A healthy fleet has a small, explainable gap. A leaky one has a chasm.
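To put a number on that gap, one rough option is to weight each GPU's allocated hours by its average utilization and see what fraction is left over. The sketch below assumes you already have per-GPU usage records from somewhere; `GpuUsage`, its fields, and `allocation_gap` are hypothetical names for illustration, not part of any Modelplane API.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    # Hypothetical record shape; adapt to whatever your metrics source exports.
    gpu_id: str
    allocated_hours: float    # hours the GPU was held by some workload
    mean_utilization: float   # average utilization over those hours, 0.0 to 1.0


def allocation_gap(fleet: list[GpuUsage]) -> float:
    """Return the fraction of allocated GPU-hours that did no work."""
    allocated = sum(g.allocated_hours for g in fleet)
    if allocated == 0:
        return 0.0  # nothing allocated, so nothing wasted
    used = sum(g.allocated_hours * g.mean_utilization for g in fleet)
    return 1.0 - used / allocated


# Made-up numbers, purely to show the arithmetic.
fleet = [
    GpuUsage("gpu-0", allocated_hours=10.0, mean_utilization=0.80),
    GpuUsage("gpu-1", allocated_hours=10.0, mean_utilization=0.00),
]
print(f"gap: {allocation_gap(fleet):.0%}")
```

Utilization-weighted hours are a blunt instrument (a GPU at 50% may be perfectly healthy), so treat the result as a smoke detector rather than a verdict.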

Signal	Busy GPU	Idle suspect
SM utilization (1h average)	60–95%	< 5%
Memory in use	High	A few MB
Owning process	Alive	Long gone
Time since last activity	Seconds	Hours
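The four signals above can be turned into a simple evidence checklist. This is a minimal sketch, not how any particular tool does it: the `GpuSignals` fields and the memory and quiet-time thresholds are assumptions made for illustration, and only the 5% utilization figure comes from the table.

```python
from dataclasses import dataclass

# Only the 5% utilization threshold comes from the table above;
# the other two are hypothetical placeholders you should tune.
UTIL_THRESHOLD = 0.05
MEMORY_THRESHOLD_MIB = 64
QUIET_THRESHOLD_S = 3600


@dataclass
class GpuSignals:
    util_1h: float                 # average SM utilization over the last hour, 0.0 to 1.0
    memory_used_mib: float         # device memory currently in use
    owner_alive: bool              # is the owning process still running?
    seconds_since_activity: float  # time since the last observed work


def idle_evidence(s: GpuSignals) -> list[str]:
    """Collect every signal that points toward an idle GPU."""
    evidence = []
    if s.util_1h < UTIL_THRESHOLD:
        evidence.append("low utilization")
    if s.memory_used_mib < MEMORY_THRESHOLD_MIB:
        evidence.append("almost no memory in use")
    if not s.owner_alive:
        evidence.append("owning process gone")
    if s.seconds_since_activity > QUIET_THRESHOLD_S:
        evidence.append("no recent activity")
    return evidence


ghost = GpuSignals(util_1h=0.01, memory_used_mib=3,
                   owner_alive=False, seconds_since_activity=4 * 3600)
busy = GpuSignals(util_1h=0.85, memory_used_mib=60_000,
                  owner_alive=True, seconds_since_activity=2)

print(idle_evidence(ghost))
print(idle_evidence(busy))
```

Returning the list of reasons, rather than a bare yes or no, means the eventual notification can tell the owner exactly why their GPU is under suspicion.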

Making the arrest

The fun part about a control plane is that it can close the case automatically. Idle reclamation is just a reconciliation rule: if a GPU's utilization stays below a threshold for longer than a grace window, and its owning workload is gone, take it back.

Grace, not guillotine

Always pair reclamation with a grace window and a notification. The goal is to recover obviously-wasted capacity, not to kill a job that's mid-checkpoint.
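As a rough illustration of that rule, here is one way the decision could look for a single GPU on a single reconciliation pass. Everything named here is hypothetical: the `GpuState` fields, the `Action` values, and the 30-minute grace window and 15-minute notice period are assumptions for the sketch, not Modelplane's actual behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical settings; pick values that suit your workloads.
UTIL_THRESHOLD = 0.05      # below this, the GPU counts as idle
GRACE_WINDOW_S = 30 * 60   # how long it must stay idle before anyone is warned
NOTICE_PERIOD_S = 15 * 60  # how long the owner gets after the warning


class Action(Enum):
    KEEP = "keep"
    NOTIFY = "notify"
    RECLAIM = "reclaim"


@dataclass
class GpuState:
    util_1h: float                # average utilization over the last hour
    owner_alive: bool             # is the owning workload still running?
    idle_since: Optional[float]   # when utilization first dropped, None if busy
    notified_at: Optional[float]  # when the owner was warned, None if not yet


def decide(gpu: GpuState, now: float) -> Action:
    """Decide what to do with one GPU on one reconciliation pass."""
    # Never touch a GPU that is working or whose workload is still alive.
    if gpu.util_1h >= UTIL_THRESHOLD or gpu.owner_alive or gpu.idle_since is None:
        return Action.KEEP
    # Grace window: a short lull is not a crime.
    if now - gpu.idle_since < GRACE_WINDOW_S:
        return Action.KEEP
    # Warn first, then wait out the notice period before reclaiming.
    if gpu.notified_at is None:
        return Action.NOTIFY
    if now - gpu.notified_at < NOTICE_PERIOD_S:
        return Action.KEEP
    return Action.RECLAIM


now = 10_000.0
ghost = GpuState(util_1h=0.0, owner_alive=False,
                 idle_since=now - 40 * 60, notified_at=None)
print(decide(ghost, now))
```

Because the function only returns a decision and never acts, it can run in a dry-run mode for a week before anything is actually reclaimed. Requiring the owning workload to be gone is also what keeps a job that is quietly writing a checkpoint out of the dock.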

Case closed

We pointed this at one of our own clusters expecting a rounding error. We found that, on a quiet weekend, roughly a fifth of allocated GPU-hours were going to ghosts, phantoms, and optimists. Reclaiming them did not require buying anything. It required noticing.

So: go dust your fleet for fingerprints. The most expensive idle GPU is the one nobody is looking for.
