Originally published at devopsdiary.blog
The first time I asked ChatGPT to write a deployment runbook, it did. That was the problem.
The output was close enough to be dangerous: kubectl steps, rollback sequence, health check endpoints. Structured, clear, apparently professional. But it had no idea whether any of it belonged to us. Whether our tooling matched what it described. Whether the service was subject to SOX controls or just basic SLO monitoring.
It wrote a competent generic runbook for a deployment process that didn't exist at our org. We already had something for that. It was called Google.
What December 2023 looked like from a platform engineering chair
By late 2023, ChatGPT had been public for a year. Engineers had stopped being surprised by it and started depending on it. The use cases I was seeing were real: code generation, IaC explanations, documentation drafts, commit messages, Jira ticket descriptions. (I'll confess I was writing Jira tickets with it too, so the skepticism was not entirely consistent.) The productivity numbers were positive. Nobody was exaggerating those.
What was quietly getting ignored was the distribution problem.
AI tools are generative. They produce outputs shaped by training data, not by your specific environment. A runbook they write doesn't know your Kubernetes version, your FluxCD reconciliation loop, or how your org defines "deployment-ready." A code snippet they generate doesn't know that your team settled a particular pattern dispute two years ago and the outcome is buried in a Confluence page nobody's touched since.
The outputs weren't randomly wrong. They were wrong in the specific, consistent way that generic things are wrong when applied to specific contexts. Plausible on the surface. Missing the thing that mattered underneath.
I'd watched this pattern before
Every major technology shift I'd lived through followed the same arc: a capability arrives, adoption spreads before the infrastructure does, and governance shows up late to clean up what enthusiasm left behind.
CASE tools in the 90s. Offshore development in the early 2000s. Agile in the 2010s. Each one had a legitimate productivity case. Each one also created a class of problems nobody had built the infrastructure to handle, because everyone was busy with the first wave of adoption.
DevOps was different, because DevOps was about the infrastructure. You couldn't do GitOps without pipelines, and pipelines forced the governance questions into the open: Who owns this? What's the approval gate? What does rollback look like? The tooling made the governance visible whether you wanted it to be or not.
AI code generation skips that forcing function. The output is a file. You can add it to a repo without any pipeline knowing the difference. You can ship it without triggering a single question about where it came from or who reviewed it. The capability arrived with a lower barrier to adoption than anything I'd seen before.
That observation sits differently when you've watched the same movie play out three times.
| Era | Technology | Outcome |
|---|---|---|
| 1990s | CASE tools | Fast generation. Governance: years later. |
| 2000s | Offshore development | Scale arrived. Governance: years later. |
| 2010s | Agile | Velocity up. Governance: sometimes never. |
| 2015+ | DevOps | Exception — pipelines forced governance from day one. |
| 2023+ | AI code generation | The output is a file. Pipeline optional. |
The question nobody was asking
By the end of 2023, the industry conversation was focused on capability: what can the model do, how accurate is the output, what's the context window. Reasonable things to care about, if you're evaluating whether to adopt the tool.
The question I kept waiting for someone to ask was the governance question. Not in a compliance-checkbox sense. In the same way platform engineers ask it about any new delivery mechanism.
Who reviews AI-generated code before it ships? If a model hallucinates a library dependency that doesn't exist, what catches it before it reaches a build? When AI writes your runbooks, how does your audit process know those runbooks were AI-assisted? If something breaks at 2am and the runbook was written by a model that's never seen your infrastructure, what does your on-call engineer actually do with it?
None of these are theoretical. They're the same class of problems platform teams have always solved for human engineers. We just hadn't started solving them for AI output yet.
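To make the hallucinated-dependency question concrete, here is a minimal sketch of the kind of gate that could sit in front of a build. It is an illustration, not something we ran: the allowlist contents, the function names, and the package `fast-kube-deployer` are all hypothetical, and a real check would more likely query an internal registry mirror than a hard-coded set.

```python
import re

# Hypothetical allowlist. In practice this would probably come from an
# internal registry mirror or a reviewed lockfile, not a literal set.
APPROVED_PACKAGES = {"requests", "pyyaml", "boto3"}


def normalize(name: str) -> str:
    """Lowercase a package name and collapse runs of '-', '_', '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement_names(requirements_text: str) -> list[str]:
    """Extract normalized package names from requirements.txt-style text."""
    names = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is the leading run of name characters, before any
        # version specifier, extras bracket, or environment marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(normalize(match.group(0)))
    return names


def find_unapproved(requirements_text: str, approved=APPROVED_PACKAGES) -> list[str]:
    """Return declared packages that are not on the approved list."""
    approved_normalized = {normalize(name) for name in approved}
    return [
        name
        for name in parse_requirement_names(requirements_text)
        if name not in approved_normalized
    ]
```

Fed a requirements file that lists `requests`, `PyYAML`, and the made-up `fast-kube-deployer`, `find_unapproved` returns only the last one, which a pipeline step could turn into a failed build. The point is less the ten lines of parsing than where the check lives: in the pipeline, where nobody has to remember to ask.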
The gap wasn't surprising. It was predictable. The surprise was how few people seemed bothered by it.
I started writing things down.