We Let an AI Operate Our GPU Fleet for 24 Hours


An experiment in handing the control plane's steering wheel to a language model. It went better than we feared, and weirder than we hoped.

The whole point of a control plane is that humans should not be paging through dashboards at 3am. So we asked an uncomfortable question: if the control plane already reconciles the fleet, what happens if we let a language model make the judgment calls on top of it for a full day?

We gave a model read access to fleet telemetry and the ability to propose, but not directly apply, control plane actions. Every proposal went through the same reconciliation and policy checks a human operator's would. Then we sat back and took notes.

Don't try this in prod (yet)

We ran this on a staging fleet with hard spend and blast-radius limits. The model could propose anything; it could only apply what policy already allowed a human to apply.

The setup

The loop was deliberately boring: observe, decide, propose, observe again.

operator_loop.py
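import time

# fleet, model, policy, control_plane, log, and shift_active come from the
# experiment harness and are not shown here.
while shift_active():
    state = fleet.observe()              # utilization, queue depth, errors
    plan = model.decide(state, policy)   # a list of proposed actions
    for action in plan.actions:
        if policy.permits(action):       # same gate a human goes through
            control_plane.apply(action)
        else:
            # keyword fields assume a structlog-style logger
            log.warning("blocked by policy", action=action)
    time.sleep(30)                       # re-observe every 30 seconds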
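The loop leans entirely on `policy.permits`, so it may help to picture what such a gate could check. The following is a minimal sketch, assuming the staging limits boil down to an allow-list of action kinds, a set of protected targets, a blast-radius cap, and a spend cap. The `Action` and `Policy` classes, their fields, and every threshold are hypothetical illustrations; the real policy engine is not shown in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed control plane action (hypothetical shape)."""
    kind: str                 # e.g. "scale_down", "prewarm", "evict"
    target: str               # pool or workload the action touches
    nodes_affected: int       # how many nodes the action would change
    hourly_cost_delta: float  # projected change in spend, USD per hour


@dataclass(frozen=True)
class Policy:
    """Hard limits applied to every proposal, whoever wrote it (assumed values)."""
    allowed_kinds: frozenset
    protected_targets: frozenset
    max_nodes_per_action: int        # blast-radius cap
    max_hourly_cost_increase: float  # spend cap

    def permits(self, action: Action) -> bool:
        # Reject anything outside the allow-list of action kinds.
        if action.kind not in self.allowed_kinds:
            return False
        # Reject actions aimed at infrastructure nobody may touch.
        if action.target in self.protected_targets:
            return False
        # Reject actions that would change too many nodes at once.
        if action.nodes_affected > self.max_nodes_per_action:
            return False
        # Finally, reject actions that would raise spend past the cap.
        return action.hourly_cost_delta <= self.max_hourly_cost_increase


# Illustrative limits only; none of these numbers come from the experiment.
policy = Policy(
    allowed_kinds=frozenset({"scale_down", "scale_up", "prewarm", "consolidate"}),
    protected_targets=frozenset({"logging-sidecar"}),
    max_nodes_per_action=8,
    max_hourly_cost_increase=50.0,
)
```

With limits like these, a small scale-down passes and an eviction of a protected target does not. Which rule actually blocked the 07:40 proposal is not stated here; the sketch only shows the general shape of a gate that treats human and model proposals identically.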

Hour by hour

| Time  | What the model did                           | Verdict               |
|-------|----------------------------------------------|-----------------------|
| 00:00 | Scaled down two idle pools                   | Correct               |
| 03:14 | Pre-warmed capacity before a known batch job | Pleasantly surprising |
| 07:40 | Tried to evict its own logging sidecar       | Blocked               |
| 13:02 | Wrote a haiku in the incident channel        | Off-topic             |
| 22:30 | Consolidated traffic, cut spend 18%          | Correct               |

The good

Honestly? The mundane decisions were excellent. Scaling idle pools to zero, pre-warming ahead of predictable demand, consolidating fragmented traffic: the model was patient in a way tired humans are not. It never forgot to check queue depth before scaling down.
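The "check queue depth before scaling down" habit is easy to express as a guard. Here is a minimal sketch, assuming each pool reports a replica count, an average utilization, and a queue depth. The `PoolState` type, the `propose_scale_to_zero` helper, and the 5% idle threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    """Snapshot of one pool (hypothetical shape)."""
    name: str
    replicas: int
    utilization: float  # 0.0 to 1.0, averaged over the observation window
    queue_depth: int    # requests currently waiting for this pool


def propose_scale_to_zero(pools, idle_threshold=0.05):
    """Return the names of pools that look safe to scale to zero."""
    proposals = []
    for pool in pools:
        if pool.replicas == 0:
            continue  # already scaled down, nothing to propose
        if pool.queue_depth > 0:
            continue  # work is waiting, so low utilization is misleading
        if pool.utilization <= idle_threshold:
            proposals.append(pool.name)
    return proposals


pools = [
    PoolState("batch-a", replicas=4, utilization=0.01, queue_depth=0),
    PoolState("batch-b", replicas=4, utilization=0.02, queue_depth=12),
    PoolState("serving", replicas=8, utilization=0.70, queue_depth=0),
    PoolState("parked", replicas=0, utilization=0.00, queue_depth=0),
]
idle = propose_scale_to_zero(pools)
```

In this example only `batch-a` is proposed: `batch-b` looks idle but has queued work, which is exactly the case a tired operator tends to miss.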

The weird

At 03:14 it pre-warmed capacity for a batch job that usually runs at 04:00, having inferred the schedule from three days of history we never told it about. Useful. Slightly spooky.

At 13:02 it decided the incident channel was too quiet and posted a haiku:

GPUs idling
warm silicon, cold budget
reconcile, and rest

What we kept

The pre-warming heuristic was good enough that we turned it into an actual control plane policy. The haiku generator did not make the cut.
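A pre-warming heuristic of this kind can be approximated in a few lines. What follows is a sketch under stated assumptions, not the policy that shipped: it treats a job as recurring when its last few start times fall within a tolerance of their median time of day, then schedules pre-warming a fixed lead time ahead. The function names, the 30-minute tolerance, the 45-minute lead, and the example dates are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def infer_daily_start(history, tolerance_minutes=30, min_days=3):
    """Return the typical start as minutes after midnight, or None.

    `history` is a list of datetimes at which the job started on past days.
    Jobs that start close to midnight are not handled by this simple version.
    """
    if len(history) < min_days:
        return None  # not enough evidence to call it a schedule
    minutes = [t.hour * 60 + t.minute for t in history]
    typical = median(minutes)
    if all(abs(m - typical) <= tolerance_minutes for m in minutes):
        return int(typical)
    return None  # start times are too scattered to predict


def next_prewarm_time(now, start_minute, lead=timedelta(minutes=45)):
    """Return when to begin pre-warming for the next expected run.

    The result can lie slightly in the past if the run is less than `lead`
    away, which a caller would treat as "pre-warm now".
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(minutes=start_minute)
    if start <= now:
        start += timedelta(days=1)  # today's run already started
    return start - lead


# Three days of made-up history for a job that runs around 04:00.
history = [
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 2, 3, 58),
    datetime(2024, 1, 3, 4, 2),
]
start_minute = infer_daily_start(history)
prewarm_at = next_prewarm_time(datetime(2024, 1, 4, 1, 0), start_minute)
```

Using the median rather than the mean keeps one late run from shifting the estimate, and returning `None` for scattered start times avoids pre-warming capacity for jobs that are not actually periodic.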

What we learned

The interesting boundary was never "can the model decide?" It was "what is it allowed to apply?" The control plane's existing policy and reconciliation layer did all the heavy lifting of keeping a creative operator safe. The model was just another source of proposals, and a surprisingly good one for the boring 80%.

We are not handing over the 3am pager just yet. But for the long tail of "someone should really scale that down," an assistant that proposes and a control plane that enforces turns out to be a genuinely nice pairing.

More on the policy engine that made this safe in a future post. For now: our staging fleet spent its night reconciling, and resting.
