I was staring at roughly 10,000 lines of network rules spread across a live cloud estate: two environments, dev and prod, two regions each, with each combination handled by its own configuration file. The task was to cross-check what had already been imported into Terraform and what hadn't, and then split the rules correctly across all those files. That kind of task could easily take weeks to do carefully by hand. With an LLM doing the heavy lifting, I was done in three hours.
That was the moment the mental model clicked for me.
And yes, these were network security rules. But here is the thing: in a Terraform import workflow, the tooling itself is the safety net. The goal is a 1:1 match between your IaC and the actual state of the environment. If the AI-generated configuration has any drift from reality, Terraform tells you immediately when you run the import. You are not trusting the AI blindly, you are using it to do the repetitive work and then letting Terraform verify the result. That is a very different risk profile from asking an LLM to design your network security from scratch.
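That verification loop looks roughly like this (the resource address and ID below are illustrative, not from the actual project):

```sh
# Pull the real resource into Terraform state
terraform import aws_security_group_rule.allow_https \
  "sg-0abc123_ingress_tcp_443_443_0.0.0.0/0"

# Then let Terraform compare the AI-written config against reality.
# Exit code 0 means in sync; exit code 2 means drift you must fix
# before trusting the import.
terraform plan -detailed-exitcode
```

Any mismatch between the generated configuration and the live environment shows up as a planned change, so hallucinated attributes are caught mechanically rather than by eyeball.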
I have been using AI tools as a regular part of my DevOps work for about a year now, not just occasionally but daily, across hobby projects, volunteer work and professional infrastructure and development work. I come at this as a lead DevOps consultant with over 20 years in IT, so I have a pretty good baseline for what good looks like and what bad looks like.
After a year of this, I have some clear opinions about what these tools are actually good for, where they fall apart, and what way of thinking about them helps in practice.
This is not a model comparison and not a benchmark. Those exist already. This is just what I have noticed from using LLMs in real DevOps work.
A year of real use
The work has covered:
- coding in JavaScript, TypeScript, Golang and Java
- IaC with Terraform and Terragrunt
- configuration management with Ansible
- CI/CD work on GitLab and GitHub, plus Docker Compose and Helm
I have tried several models, including Claude (Sonnet, Opus, Haiku), ChatGPT, Gemini and Grok, and different IDEs like VS Code and Cursor. The models do have different strengths and there are clear gaps between them, but I am not going to get into that here. What I want to talk about is what using all of them has taught me about AI-assisted DevOps work in general.
The pattern that kept repeating
Across all that work, one pattern kept showing up.
When I gave the model clear context, a well-scoped task and some constraints to work within, the output was fast and impressively good. When I gave it an open-ended problem or let things run without much correction, the quality dropped quickly. Dead code started accumulating, inconsistent patterns appeared and the model started looping through variations of the same wrong answer.
The difference was not which model I was using. The difference was how much structure I brought to the interaction.
And that structure comes directly from your own maturity and experience in the domain. The more you know, the more precisely you can specify what you want, and the better the output gets. This is probably the most underappreciated factor in how well LLMs actually perform in practice.
Compare these two prompts for the same task:
"Create pipeline that deploys my nodejs app"
versus:
"Create CI/CD pipelines for pull requests and deploying on main branch. Add quality gates to the PR pipeline: format, lint, security, build and docker build. In the main pipeline do docker builds and use the registry for cached images to make builds faster. On the Dockerfiles use multi-stage builds where possible to keep the final image small, and make sure we are not running as root. Make the pipelines DRY on the sections that overlap"
The second prompt does not just describe what to build. It reflects years of experience with CI/CD, Docker best practices and security thinking. Someone without that background would not even know to ask for those things. The model cannot supply that knowledge from its own side, it can only work with what you give it.
It is not that the model is bad. It just has no stake in the outcome and no experience to fall back on. It will produce output either way. The quality of that output depends almost entirely on the quality of the guidance behind it.
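To make that concrete, here is a sketch of the kind of GitLab pipeline skeleton the second prompt tends to produce. Job names, image tags and stage names are placeholders I chose for illustration, and only two of the quality gates are shown:

```yaml
stages: [quality, build]

# Shared build logic lives in a hidden job; real jobs extend it (DRY)
.docker-build:
  image: docker:27
  services: [docker:27-dind]
  script:
    - docker pull "$CI_REGISTRY_IMAGE:latest" || true  # warm the layer cache
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .

lint:
  stage: quality
  image: node:22
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - npm ci
    - npm run lint

docker-build-pr:
  stage: build
  extends: .docker-build
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

docker-build-main:
  stage: build
  extends: .docker-build
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  script:
    - !reference [.docker-build, script]
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"  # push on main only
```

The point is not this exact file. It is that `extends`, `!reference` and branch-scoped `rules` are the kind of structure you only ask for if you already know GitLab CI well enough to want it.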
A very fast junior engineer
The mental model that finally made this click for me: an LLM behaves like a very fast junior engineer.
A good junior can produce a lot of work quickly and they follow clear instructions well. But they struggle with architectural decisions, tend to go with the most obvious approach rather than the most appropriate one, and need supervision.
They act this way not because they are useless but because they lack the context and experience to make the right call on their own. Leave them unsupervised long enough and small decisions start to compound into bigger problems.
LLMs behave exactly like this, just at roughly ten times the speed. The speed is real and genuinely useful, but it does not change the underlying dynamic.
There is an important flip side to this that is worth saying directly: the analogy only works if you actually are the senior. If you jump into a domain you know nothing about, the dynamic inverts. The model becomes the one with more apparent knowledge and you have no real basis to supervise it. You cannot catch the bad architectural decisions because you do not recognise them. That is when you get the worst outcomes: confident-sounding output that is quietly wrong in ways that take a long time to find and fix.
When you accept this framing, a few things shift in how you work:
- Your job becomes that of an architect instead of a typist. You define the structure, the constraints, the approach. The model handles the execution.
- Structuring the problem well matters more than prompting well. A well-defined task with clear context will beat a cleverly worded prompt for an undefined problem every time.
- You still need to know your domain. The better you understand the area you are working in, the better you can guide the model and catch its mistakes. Domain expertise is not optional, it is what makes the supervision possible in the first place.
What this looks like in practice
One thing that has helped quite a lot is writing documentation and conventions that both humans and the model can use. Not AI-specific memory tools or special prompting tricks, but actual documentation that would exist for your team anyway. Things like guidelines in a Terraform modules folder, pipeline conventions, naming rules.
When that structure exists and you give the model access to it, it follows the established patterns instead of inventing new ones. The corrections get smaller and the output actually fits the system you are building.
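As a hypothetical example of what such a file might contain (the path and rules below are invented for illustration):

```markdown
<!-- terraform/modules/CONVENTIONS.md -->
# Terraform module conventions
- One module per directory, named <provider>-<purpose> (e.g. aws-vpc)
- Variables use snake_case; every variable has a `description` and a `type`
- Provider versions are pinned in versions.tf, nowhere else
- Every resource is tagged with `environment` and `owner`
```

A document like this pays off twice: humans onboard faster, and a model given it as context stops inventing its own naming schemes.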
The other thing is knowing when to stop iterating with the model and just fix something yourself. Sometimes two or three rounds of back and forth are not making progress and the model is just looping. At that point the fastest path forward is usually to step in, fix the specific issue yourself, and re-engage the model for the work around it.
The takeaway
AI is already genuinely useful in DevOps workflows. The value you get out of it scales with the quality of the supervision and structure you bring as the engineer. The model is the junior. You are the senior. That dynamic does not disappear as the tools get faster or more capable.
The rest of this series goes into the details. The next piece looks at where LLMs consistently struggle and why the failures are harder to catch than they look. After that, a concrete GitLab pipeline example that most DevOps engineers will recognise. And then the positive story: why importing existing infrastructure into Terraform is one of the best use cases for LLMs I have found.
If you have been using LLMs in DevOps or platform engineering work, I am curious what mental model you have settled on. Does the junior engineer analogy match your experience or have you found a better way to think about it?