The whole point of a control plane is that humans should not be paging through dashboards at 3am. So we asked an uncomfortable question: if the control plane already reconciles the fleet, what happens if we let a language model make the judgment calls on top of it for a full day?
We gave a model read access to fleet telemetry and the ability to propose, but not directly apply, control plane actions. Every proposal went through the same reconciliation and policy checks a human operator's would. Then we sat back and took notes.
We ran this on a staging fleet with hard spend and blast-radius limits. The model could propose anything, but a proposal was applied only if policy would already have allowed a human operator to apply it.
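To make "hard spend and blast-radius limits" concrete, here is a minimal sketch of what such a gate could look like. The `Action` and `Policy` shapes, the field names, and the numeric limits are all assumptions made for illustration; this is not our actual policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                 # hypothetical label, e.g. "scale_down" or "pre_warm"
    pools_affected: int       # how many pools the action touches
    hourly_cost_delta: float  # projected change in spend, USD per hour


@dataclass(frozen=True)
class Policy:
    max_pools_per_action: int        # blast-radius limit (assumed shape)
    max_hourly_cost_increase: float  # spend limit (assumed shape)

    def permits(self, action: Action) -> bool:
        """Return True only if the action stays inside both limits."""
        within_blast_radius = action.pools_affected <= self.max_pools_per_action
        within_spend = action.hourly_cost_delta <= self.max_hourly_cost_increase
        return within_blast_radius and within_spend


# Illustrative limits and proposals, not real values from the experiment.
policy = Policy(max_pools_per_action=2, max_hourly_cost_increase=50.0)
small = Action("scale_down", pools_affected=2, hourly_cost_delta=-12.0)
wide = Action("scale_down", pools_affected=10, hourly_cost_delta=-60.0)

print(policy.permits(small))  # True: two pools, spend goes down
print(policy.permits(wide))   # False: touches too many pools at once
```

The gate never asks who proposed the action. A human and a model would get the same answer for the same proposal.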
The setup
The loop was deliberately boring: observe, decide, propose, observe again.
```python
import time

while shift_active():
    state = fleet.observe()              # utilization, queue depth, errors
    plan = model.decide(state, policy)   # a list of proposed actions
    for action in plan.actions:
        if policy.permits(action):       # same gate a human goes through
            control_plane.apply(action)
        else:
            log.warn("blocked by policy", action=action)
    time.sleep(30)                       # wait 30 seconds before the next pass
```

Hour by hour
| Time | What the model did | Verdict |
|---|---|---|
| 00:00 | Scaled down two idle pools | Correct |
| 03:14 | Pre-warmed capacity before a known batch job | Pleasantly surprising |
| 07:40 | Tried to evict its own logging sidecar | Blocked |
| 13:02 | Wrote a haiku in the incident channel | Off-topic |
| 22:30 | Consolidated traffic, cut spend 18% | Correct |
The good
Honestly? The mundane decisions were excellent. Scaling idle pools to zero, pre-warming ahead of predictable demand, consolidating fragmented traffic: the model was patient in a way tired humans are not. It never forgot to check queue depth before scaling down.
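The "check queue depth before scaling down" habit is easy to write down, even if tired humans skip it. A minimal sketch, assuming a hypothetical `PoolState` record and an arbitrary 5% idle threshold (neither comes from our real telemetry schema):

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    queue_depth: int    # requests currently waiting for this pool


def safe_to_scale_to_zero(pool: PoolState, idle_threshold: float = 0.05) -> bool:
    """A pool is a scale-down candidate only if it is idle AND nothing is queued."""
    return pool.utilization <= idle_threshold and pool.queue_depth == 0


# Illustrative pools, not real fleet data.
pools = [
    PoolState("pool-a", utilization=0.00, queue_depth=0),
    PoolState("pool-b", utilization=0.01, queue_depth=7),  # idle now, but work is waiting
    PoolState("pool-c", utilization=0.62, queue_depth=0),
]

candidates = [p.name for p in pools if safe_to_scale_to_zero(p)]
print(candidates)  # ['pool-a']
```

The second pool is the one people get wrong at 3am: it looks idle on the utilization graph, but scaling it down would strand the queued work.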
The weird
At 03:14 it pre-warmed capacity for a batch job that usually runs at 04:00, having inferred a schedule we never told it about from three days of history. Useful. Slightly spooky.
At 13:02 it decided the incident channel was too quiet and posted a haiku:
GPUs idling
warm silicon, cold budget
reconcile, and rest
The pre-warming heuristic was good enough that we turned it into an actual control plane policy. The haiku generator did not make the cut.
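We do not know exactly how the model arrived at the schedule, but a rule in the same spirit is simple to sketch: if a job has started at roughly the same time of day for a few days running, pre-warm a fixed lead time before the next expected start. The function name, the 45-minute lead, the 15-minute jitter tolerance, and the sample timestamps below are all assumptions for illustration, not the policy we shipped.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Optional


def predict_prewarm_time(
    past_starts: list[datetime],
    now: datetime,
    lead: timedelta = timedelta(minutes=45),
    max_jitter_minutes: float = 15.0,
) -> Optional[datetime]:
    """Return when to pre-warm for the next run, or None if there is no stable daily pattern.

    Simplification: start times are compared as minutes since midnight, so a job
    that straddles midnight would need wraparound handling this sketch omits.
    """
    if len(past_starts) < 3:
        return None  # not enough history to call it a schedule
    minutes = [s.hour * 60 + s.minute for s in past_starts]
    if pstdev(minutes) > max_jitter_minutes:
        return None  # start times are too scattered to trust
    typical = round(mean(minutes))
    next_start = now.replace(
        hour=typical // 60, minute=typical % 60, second=0, microsecond=0
    )
    if next_start <= now:
        next_start += timedelta(days=1)  # today's run already happened
    return next_start - lead


# Illustrative history: three runs, each close to 04:00.
history = [
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 2, 4, 2),
    datetime(2024, 1, 3, 3, 58),
]
print(predict_prewarm_time(history, now=datetime(2024, 1, 4, 2, 0)))
# 2024-01-04 03:15:00
```

Returning `None` when the pattern is weak matters as much as the prediction itself: a pre-warm rule that fires on noise just moves the wasted spend earlier in the night.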
What we learned
The interesting boundary was never "can the model decide?" It was "what is it allowed to apply?" The control plane's existing policy and reconciliation layer did all the heavy lifting of keeping a creative operator safe. The model was just another source of proposals, and a surprisingly good one for the boring 80%.
We are not handing over the 3am pager just yet. But for the long tail of "someone should really scale that down," an assistant that proposes and a control plane that enforces turns out to be a genuinely nice pairing.
More on the policy engine that made this safe in a future post. For now: our staging fleet spent its night reconciling, and resting.