We Let an AI Operate Our GPU Fleet for 24 Hours

The whole point of a control plane is that humans should not be paging through dashboards at 3am. So we asked an uncomfortable question: if the control plane already reconciles the fleet, what happens if we let a language model make the judgment calls on top of it for a full day?

We gave a model read access to fleet telemetry and the ability to propose, but not directly apply, control plane actions. Every proposal went through the same reconciliation and policy checks a human operator's would. Then we sat back and took notes.

Don't try this in prod (yet)

We ran this on a staging fleet with hard spend and blast-radius limits. The model could propose anything; it could only apply what policy already allowed a human to apply.

The setup

The loop was deliberately boring: observe, decide, propose, observe again.

operator_loop.py

import time

POLL_INTERVAL_SECONDS = 30

while shift_active():
    state = fleet.observe()              # utilization, queue depth, errors
    plan = model.decide(state, policy)   # a list of proposed actions
    for action in plan.actions:
        if policy.permits(action):       # same gate a human goes through
            control_plane.apply(action)
        else:
            log.warning("blocked by policy", action=action)
    time.sleep(POLL_INTERVAL_SECONDS)    # pause before the next observation
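The interesting line in that loop is the `policy.permits(action)` gate. The real policy engine is not shown in this post, so what follows is only a minimal sketch of what such a gate might look like. The `Action` fields, the `Policy` class, the limit names, and the sample values are all hypothetical, chosen to mirror the spend and blast-radius limits mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical shape of a proposed action; the real schema is not shown.
    kind: str                  # e.g. "scale_down", "pre_warm", "evict"
    target: str                # pool or workload name
    nodes_touched: int         # how many nodes the action affects
    hourly_spend_delta: float  # projected change in hourly spend


@dataclass(frozen=True)
class Policy:
    # Assumed limits standing in for the spend and blast-radius caps.
    max_hourly_spend_delta: float
    max_nodes_touched: int
    protected_targets: frozenset = frozenset()

    def permits(self, action: Action) -> bool:
        # Never allow eviction of a protected workload.
        if action.kind == "evict" and action.target in self.protected_targets:
            return False
        # Blast-radius cap: refuse actions that touch too many nodes.
        if action.nodes_touched > self.max_nodes_touched:
            return False
        # Spend cap: refuse actions that raise hourly spend too far.
        if action.hourly_spend_delta > self.max_hourly_spend_delta:
            return False
        return True


# Illustrative values only.
policy = Policy(
    max_hourly_spend_delta=50.0,
    max_nodes_touched=8,
    protected_targets=frozenset({"logging-sidecar"}),
)

scale_down = Action("scale_down", "idle-pool-a", nodes_touched=4, hourly_spend_delta=-12.0)
evict_logger = Action("evict", "logging-sidecar", nodes_touched=1, hourly_spend_delta=0.0)

print(policy.permits(scale_down))    # True
print(policy.permits(evict_logger))  # False
```

The point of the sketch is that the gate knows nothing about who proposed the action. A human and a model hit exactly the same checks.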

Hour by hour

Time	What the model did	Verdict
00:00	Scaled down two idle pools	Correct
03:14	Pre-warmed capacity before a known batch job	Pleasantly surprising
07:40	Tried to evict its own logging sidecar	Blocked
13:02	Wrote a haiku in the incident channel	Off-topic
22:30	Consolidated traffic, cut spend 18%	Correct

The good

Honestly? The mundane decisions were excellent. Scaling idle pools to zero, pre-warming ahead of predictable demand, consolidating fragmented traffic: on all of these, the model was patient in a way tired humans are not. It never forgot to check queue depth before scaling down.
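To make the "check queue depth before scaling down" habit concrete, here is a small sketch of the kind of guard we mean. It is not the model's actual reasoning or our control plane's code. `PoolState`, `safe_to_scale_down`, and the 5% idle threshold are hypothetical names and values used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    # Hypothetical snapshot of one pool, built from fleet telemetry.
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    queue_depth: int    # jobs waiting to be scheduled onto this pool


def safe_to_scale_down(pool: PoolState, idle_threshold: float = 0.05) -> bool:
    """A pool is a scale-down candidate only if it is idle AND nothing is queued.

    Low utilization alone is not enough: a pool can look idle in the
    moment between one job finishing and the next queued job starting.
    """
    return pool.utilization <= idle_threshold and pool.queue_depth == 0


idle_and_empty = PoolState("pool-a", utilization=0.01, queue_depth=0)
idle_but_queued = PoolState("pool-b", utilization=0.01, queue_depth=3)

print(safe_to_scale_down(idle_and_empty))   # True
print(safe_to_scale_down(idle_but_queued))  # False
```

The second case is the one tired humans get wrong: the dashboard shows an idle pool, and the queue is on a different tab.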

The weird

At 03:14 it pre-warmed capacity for a batch job that usually runs at 04:00. We never told it about that schedule; it inferred it from three days of history. Useful. Slightly spooky.

At 13:02 it decided the incident channel was too quiet and posted a haiku:

GPUs idling
warm silicon, cold budget
reconcile, and rest

What we kept

The pre-warming heuristic was good enough that we turned it into an actual control plane policy. The haiku generator did not make the cut.
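For a sense of what a pre-warming heuristic like this can look like, here is a rough sketch: infer a daily start time from a few days of observed job starts, then pre-warm inside a lead window before it. This is an illustration, not the policy we shipped. The function names, the 15-minute tolerance, the 60-minute lead window, and the sample timestamps are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional, Sequence


def infer_daily_start_minute(
    starts: Sequence[datetime], tolerance_min: int = 15
) -> Optional[int]:
    """Return the typical start as minutes past midnight, or None.

    Requires at least three observations that all fall within
    `tolerance_min` of the median. Does not handle jobs whose start
    times straddle midnight.
    """
    if len(starts) < 3:
        return None
    minutes = [s.hour * 60 + s.minute for s in starts]
    typical = int(median(minutes))
    if all(abs(m - typical) <= tolerance_min for m in minutes):
        return typical
    return None


def should_pre_warm(
    now: datetime, start_minute: int, lead: timedelta = timedelta(minutes=60)
) -> bool:
    """True when `now` falls inside the lead window before today's expected start."""
    start_today = now.replace(
        hour=start_minute // 60, minute=start_minute % 60, second=0, microsecond=0
    )
    return start_today - lead <= now < start_today


# Made-up sample history: three days of starts clustered around 04:00.
history = [
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 2, 4, 2),
    datetime(2024, 1, 3, 3, 58),
]

start_minute = infer_daily_start_minute(history)
print(start_minute)                                                # 240
print(should_pre_warm(datetime(2024, 1, 4, 3, 14), start_minute))  # True
print(should_pre_warm(datetime(2024, 1, 4, 2, 30), start_minute))  # False
```

Requiring every observation to agree with the median is deliberately conservative. With only three days of data, one outlier should mean "no schedule detected" rather than a confident guess.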

What we learned

The interesting boundary was never "can the model decide?" It was "what is it allowed to apply?" The control plane's existing policy and reconciliation layer did all the heavy lifting of keeping a creative operator safe. The model was just another source of proposals, and a surprisingly good one for the boring 80%.

We are not handing over the 3am pager just yet. But for the long tail of "someone should really scale that down," an assistant that proposes and a control plane that enforces turns out to be a genuinely nice pairing.

More on the policy engine that made this safe in a future post. For now: our staging fleet spent its night reconciling, and resting.
