From 9 hours a week babysitting deployments to 20 minutes reviewing what the agent already did. Here’s the exact stack.
It was a regular evening. Movie on. Phone face-down. Then Slack buzzed.
“API is down. Users can’t login.”
I knew the fix immediately: connection pool exhausted, classic. Restart the service, bump the pool size, redeploy. Twenty minutes max. Except my deployment script decided that was a great night to die halfway through. SSH connection dropped. Restarted from scratch. Config edited directly in prod because I was panicking. Finally back up forty-five minutes later.
The movie had moved on without me. So had my will to live.
The next morning I pulled my deployment history for the last month. Forty-seven manual deployments. Average time per deploy: thirty-eight minutes. Total time spent: roughly thirty hours. That’s almost a full work week. Just deploying code. Manually. Like it’s 2015.
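That total is simple arithmetic worth spelling out (numbers straight from the deploy history):

```shell
# 47 manual deploys at 38 minutes each
deploys=47
avg_min=38
total_min=$((deploys * avg_min))   # 1786 minutes
total_hr=$((total_min / 60))       # ~29 hours, call it thirty
echo "~${total_hr} hours/month on manual deploys"
```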
I didn’t have a DevOps problem. I had an “I never designed this workflow” problem. The deployments worked; they just required me to babysit every single one like a process that couldn’t be trusted to run unsupervised. Spoiler: it couldn’t. Because nothing was automated and everything depended on me not making a typo at eleven PM.
That month I started wiring up Claude Code with seven tools. Not all at once: one integration at a time over about ten weeks. What I ended up with is a deployment pipeline that runs itself, monitors itself, and tells me what happened in Slack while I’m doing literally anything else.
This is the exact stack, how it fits together, and the mistakes I made building it so you don’t have to repeat them.
TL;DR: Claude Code isn’t a chatbot you paste kubectl commands into. It’s a terminal-native agent. Give it real tool access and it becomes a DevOps co-pilot that actually operates your stack.
What Claude Code actually is (and what most devs get wrong)
Most people hear “Claude Code” and picture a smarter Copilot. Autocomplete with better vibes. A chatbot that knows what kubectl means. That mental model will make you use it wrong.
Claude Code is a terminal-native agentic AI. It runs in your shell, has access to your filesystem, executes commands, reads outputs, reacts to errors, and chains multiple actions together without you holding its hand through every step. It doesn’t suggest; it does. You give it a goal, it figures out the steps, runs them, checks the output, and adjusts.
The unlock isn’t the AI. It’s what you give the AI access to.
Most devs use it like a search engine. “Hey Claude, how do I write a GitHub Actions workflow for Node?” Cool, you got an answer. You could’ve Googled that. The actual unlock is when you stop asking it things and start giving it access to things: your repo, your CI config, your cluster credentials, your alerting setup. That’s when it stops being a productivity boost and starts being an actual workflow layer.
The first time it clicked for me: a GitHub Actions pipeline failed on a dependency conflict. Instead of me digging through logs, Claude Code read the failure output, identified the version mismatch, updated package.json, re-ran the workflow, and posted a Slack summary. I was in another tab. Didn't touch it once. That felt genuinely strange, the good kind of strange, like the first time a cron job ran and you weren't sure whether to feel proud or nervous.
That’s the tool. What you wire it to determines what it’s capable of.
The 7 tools that make it a real DevOps co-pilot
Not a “here are some tools to explore” list. These are the seven things I connected, what each one does, and what Claude Code does with access to them.
1. Docker
The foundation. Claude Code writes the Dockerfile, builds the image, reads errors mid-build, and fixes them in the same session. Hand it your app structure and it handles multi-stage builds, layer caching, and base image optimization without you reading docs.
Dockerfile
# Claude Code generated this after reading the repo structure
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
I asked it to optimize my existing Dockerfile for size. Went from 850MB to 180MB by switching base images and adding multi-stage builds. Took three minutes.
Before: Manual builds, manual pushes, “works on my machine” debugging sessions. After: Claude Code builds, tags, and pushes. I review the output.
2. GitHub Actions
Your CI/CD pipeline lives in YAML that nobody enjoys writing and everyone breaks at least once a sprint. Claude Code reads your existing workflows, edits them, adds jobs, and when a run fails — reads the logs, finds the problem, patches the file, and pushes the fix.
# Claude Code added this after a deploy left no rollback path
- name: Rollback on failure
  if: failure()
  run: |
    echo "Deploy failed. Rolling back..."
    kubectl rollout undo deployment/myapp
    kubectl rollout status deployment/myapp
Before: Hours debugging YAML indentation and missing environment variables. After: Claude Code generated 90% of my workflows. I just review and merge.
3. Kubernetes
Manifests are verbose, unforgiving, and somehow always hiding one wrong indent. Claude Code writes them from scratch, applies them via kubectl, reads pod status, checks logs, and rolls back when things go sideways.
Shell
# Claude Code runs this sequence after a failed health check
kubectl rollout status deployment/myapp --timeout=60s
kubectl logs deployment/myapp --tail=50
kubectl rollout undo deployment/myapp
Before: Manual kubectl commands, SSH tunnels, VPN connections, pain. After: Claude Code manages the manifests. I review before apply.
4. Terraform
Infra-as-code is powerful and also a great way to accidentally delete a database. Claude Code generates configs, runs plans, reads the diff output, and applies changes. Excellent at scaffolding new resources and catching config drift.
# Claude Code generated this after "spin up a staging RDS instance"
resource "aws_db_instance" "staging" {
  identifier          = "myapp-staging"
  engine              = "postgres"
  engine_version      = "15.3"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  skip_final_snapshot = true
}
Always review the plan output before apply. Claude Code will show you the diff; read it. This is the one place where “looks good” is not sufficient review.
Before: Manual AWS console clicking, forgetting what I configured, breaking things. After: terraform apply → infrastructure deployed. Version controlled. Reproducible.
5. ArgoCD
This one changed how I think about deployments entirely. ArgoCD watches your Git repo: when you push a new Kubernetes manifest, it automatically syncs it to your cluster. Your Git repo becomes the single source of truth. No manual kubectl apply. No "did I deploy the right version?" confusion.
# ArgoCD application config — Claude Code generated this
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
spec:
  source:
    repoURL: https://github.com/myorg/myapp
    path: k8s/
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Before: Manual kubectl commands, wondering if staging and prod were in sync. After: Push to Git → ArgoCD syncs → deployed. Claude Code generates all the manifests.
6. Datadog
Observability is only useful if someone’s actually reading it. Claude Code connected to Datadog reads active alerts, pulls recent metrics, correlates a spike with a recent deploy, and suggests whether to roll back or hold. It caught a memory leak mid-deploy before PagerDuty even fired.
# Claude Code queried this after spotting a latency anomaly
curl -G "https://api.datadoghq.com/api/v1/events" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -d "start=$(date -d '30 minutes ago' +%s)" \
  -d "end=$(date +%s)"
Before: Found out about problems from angry users in support tickets. After: Found out about problems before users noticed. Sometimes before I noticed.
7. Slack + PagerDuty
Incident response is 40% fixing things and 60% telling people what’s happening. With Claude Code wired into Slack, when something breaks it’s already posting the incident summary, updating the right channels, and drafting the runbook while you’re still figuring out what’s on fire.
# Auto-posted to #incidents during a recent outage
curl -X POST https://slack.com/api/chat.postMessage \
  -H "Authorization: Bearer $SLACK_TOKEN" \
  -d "channel=#incidents" \
  -d "text=🚨 Deploy myapp:v2.3.1 caused 500 spike. Rolling back. ETA 3 min."
I also built a Slack bot with Claude’s help that answers questions like “what’s the current error rate?” by querying Datadog and replying inline. Took an afternoon to set up. Saved dozens of dashboard context-switches since.
Before: Check GitHub. Check Datadog. Check ArgoCD. Check logs. Repeat. After: Everything surfaces in Slack. One place. One thread per incident.
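The bot’s core logic is small. Here’s a minimal sketch of the arithmetic and the reply it formats; the helper names are mine, and in the real bot the counts come from Datadog’s metrics query API and the reply goes out via Slack’s chat.postMessage instead of stdout:

```shell
# Hypothetical helpers sketching the bot's core. In production the two counts
# are fetched from Datadog and the formatted string is posted back to Slack.
error_rate() {
  # errors, total -> percentage with two decimals
  awk -v e="$1" -v t="$2" 'BEGIN { printf "%.2f", e / t * 100 }'
}

format_reply() {
  echo "Current error rate: $(error_rate "$1" "$2")% (${1} errors / ${2} requests)"
}

format_reply 12 4800   # "Current error rate: 0.25% (12 errors / 4800 requests)"
```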

A real deployment flow, start to finish
Let me walk you through an actual deploy. Not a sanitized demo, a real one, including the part where it almost did something stupid.
The stack: Node.js API, GitHub repo, Docker builds, Kubernetes cluster on AWS, ArgoCD for GitOps sync, Datadog for monitoring, Slack for everything else. Standard mid-size setup.
I pushed a feature branch, opened a PR, and from that point Claude Code handled the rest.
Step 1: PR opened, CI triggered, test failure caught
Claude Code detected the new PR via the GitHub MCP server and checked the Actions workflow status. The first run failed: a null reference in the auth middleware I’d missed locally.
# Claude Code read the Actions log and identified the failure
gh run view 8842931 --log-failed
# Found the issue, patched it, pushed the fix
git add src/middleware/auth.js
git commit -m "fix: null check on req.user before role validation"
git push origin feature/user-permissions
No ping. No “hey can you check this.” It just fixed it and moved on.
Step 2: Tests passed, image built and pushed
Once the workflow went green, Claude Code built the Docker image and pushed it to ECR.
docker build -t myapp:v2.4.0 .
docker tag myapp:v2.4.0 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:v2.4.0
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:v2.4.0
Build time: three minutes. My involvement: zero.
Step 3: Manifest updated, ArgoCD synced to cluster
Claude Code updated the image tag in the Kubernetes deployment manifest and pushed it to the repo. ArgoCD detected the change and synced it to the cluster automatically.
# Claude Code updated this before pushing to Git
spec:
  containers:
    - name: myapp
      image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:v2.4.0
# ArgoCD sync status check
argocd app get myapp --refresh
argocd app wait myapp --health
Staging looked clean. Health checks passed. Two minutes of log monitoring, no anomalies. Moved to prod.
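“Two minutes of log monitoring” is itself just a grep. A sketch of the check, where the pattern and helper name are mine; in practice its stdin is fed by `kubectl logs deployment/myapp --since=2m`:

```shell
# Scan a log stream for obvious anomalies: prints "clean" or "anomalies".
# The keyword list is a simplification of what the agent actually looks for.
scan_logs() {
  if grep -qiE "error|fatal|panic"; then
    echo "anomalies"
  else
    echo "clean"
  fi
}
# Usage: kubectl logs deployment/myapp --since=2m | scan_logs
```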
Step 4: The moment I had to step in
Datadog flagged a latency spike mid-rollout: response times jumping from 120ms to 800ms. Claude Code caught the alert, paused the rollout, and queued up a response. Here’s what it was about to run:
# This is what it queued — I cancelled it before it executed
kubectl scale deployment/myapp --replicas=2 # was 4
kubectl rollout undo deployment/myapp
Scaling down during a latency spike is exactly backwards; it would’ve made things significantly worse. I cancelled the scale-down, let the rollback run on its own, and the latency resolved in under two minutes: a cold-start spike from the new image, totally normal for this service. Claude Code didn’t have that context. I did. That’s the job now.
Step 5: Slack summary, automatic
Once metrics stabilized, this landed in #deployments without me asking:
✅ myapp deploy v2.4.0 → rolled back to v2.3.8
Reason: Latency spike detected (120ms → 800ms)
Rollback time: 2 minutes
Next step: Re-deploy with extended warm-up period
Clean. Accurate. The kind of update that normally takes five minutes to write while you’re already stressed about something else.
Total time from PR open to resolved rollback: twenty minutes. My actual involvement: cancelling two kubectl commands and making one judgment call about a cold-start spike.
That’s the workflow. Not magic, just well-connected tooling with an agent in the middle that actually understands your stack.

The numbers (because vibes aren’t a deployment metric)
Three months of this stack. Here’s what actually changed.
Deployment metrics

The deploy frequency jump is the one people don’t expect. When deployments are painful you batch changes to minimize how often you do them. When they’re automated and four minutes long you ship smaller, ship faster, and catch issues earlier. The whole engineering rhythm changes.
Monthly cost breakdown

The ROI math
Thirty-six hours saved per month. If your hourly rate is $50, that’s $1,800 in reclaimed time every month. Spent $51 to get there.
That’s a roughly 3,400% ROI. A dedicated DevOps hire at market rate runs $120K+ annually. This stack costs $612 a year and covers the bulk of what that role was doing: the repetitive, process-heavy, script-running parts that consumed the most clock time.
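If you want to sanity-check that percentage, the arithmetic is this (same numbers as above):

```shell
# ROI = (value reclaimed - cost) / cost, as a percentage
hours_saved=36
hourly_rate=50
monthly_cost=51
value=$((hours_saved * hourly_rate))   # $1800 reclaimed per month
roi=$(awk -v v="$value" -v c="$monthly_cost" \
  'BEGIN { printf "%.0f", (v - c) / c * 100 }')
echo "Reclaimed: \$${value}/mo, ROI: ${roi}%"   # 3429%, i.e. ~3,400%
```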
The parts it doesn’t cover (architecture decisions, incident judgment calls, the scale-down situation from the last section) still need a human. But that’s maybe 20% of what the role actually looked like day to day.
The honest version: the $120K number is provocative but not wrong. One senior engineer with this stack can operate infrastructure that previously needed a dedicated ops person. Whether that’s exciting or uncomfortable depends entirely on which side of that equation you’re sitting on.
Mistakes I made (so you don’t have to repeat them)
Three months of building this taught me more through failures than wins. Here are three that cost real time, including one that almost cost a prod database.
Mistake 1: Trying to automate everything at once
Week one I sat down and tried to wire up all seven tools over a single weekend. GitHub Actions, Docker, Kubernetes, Terraform, ArgoCD, Datadog, Slack: all of it, simultaneously, from scratch.
By Sunday night I had seven half-working integrations, a broken cluster, and genuine regret.
What actually worked was one tool at a time with a week of real usage before adding the next:

Each tool compounded on the last. By week ten the stack felt coherent because I actually understood every layer of it. The weekend approach would’ve given me a fragile house of cards I didn’t understand and couldn’t debug.
Mistake 2: Trusting AI-generated configs blindly
Claude Code once generated a Kubernetes deployment manifest with zero resource limits. Looked completely valid. Passed syntax checks. I deployed it without reading it carefully.
It consumed all available cluster memory inside twenty minutes and took down two other services running on the same nodes.
# What Claude Code generated — notice what's missing
spec:
  containers:
    - name: myapp
      image: myapp:v1.2.0
      ports:
        - containerPort: 3000
      # No resources block. No limits. No requests.
      # This will eat your cluster alive.

# What it should have included (under the container spec)
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
The same thing happened with a Terraform config that referenced a resource argument that doesn’t exist in the current provider version: generated confidently, failed on apply. Always run terraform plan. Always read the Kubernetes manifest before kubectl apply. The AI doesn't know what it doesn't know, and it won't flag its own blind spots.
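My habit now is to gate every apply on plan’s exit code. `terraform plan -detailed-exitcode` exits 0 when there’s nothing to do and 2 when changes are pending; the `plan_gate` wrapper name is mine:

```shell
# Map terraform plan's -detailed-exitcode result to an explicit action.
# Usage: terraform plan -detailed-exitcode -out=tfplan; plan_gate $?
plan_gate() {
  case "$1" in
    0) echo "no-changes" ;;    # nothing to apply
    2) echo "review-diff" ;;   # run `terraform show tfplan` and read it before applying
    *) echo "plan-failed" ;;   # syntax or provider error, do not apply
  esac
}
```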
Mistake 3: No rollback plan in the early setup
First two months I had automated deployments but no automated rollback trigger. Which meant a bad deploy looked like this: automated push to prod, health checks fail, panic, manual kubectl rollout undo, five minutes of scrambling that felt like thirty.
The fix was simple and I should have built it on day one:
# Added to every GitHub Actions deploy job
- name: Verify deployment health
  run: |
    kubectl rollout status deployment/myapp --timeout=120s || \
      (kubectl rollout undo deployment/myapp && exit 1)
- name: Post rollback notice to Slack
  if: failure()
  run: |
    curl -X POST https://slack.com/api/chat.postMessage \
      -H "Authorization: Bearer $SLACK_TOKEN" \
      -d "channel=#deployments" \
      -d "text=⚠️ Deploy failed. Auto-rolled back to previous version."
Every deployment now has an automatic rollback trigger if health checks don’t pass within two minutes. Slack gets notified either way. I find out what happened from a clean summary, not from a user complaint.
The pattern across all three mistakes is the same: moving faster than your understanding. The stack rewards patience. Add one thing, break it, understand why, fix it, then add the next thing. The engineers who try to skip that process are the ones who end up with automation they’re afraid to touch.
The DevOps role isn’t dying. It’s collapsing into you.
Here’s the take: the traditional DevOps engineer role, the one that’s mostly deployment pipelines, manual runbooks, and knowing which kubectl flag to toggle at midnight, is getting absorbed into the senior engineer. The skills aren’t disappearing. They’re becoming table stakes for everyone who ships software.
Automation doesn’t eliminate work. It eliminates bullshit.
I still write code. I still handle incidents. I still make judgment calls like the scale-down situation that would’ve made a latency spike significantly worse. What I don’t do anymore is spend nine hours a week babysitting deployment scripts that were never reliable enough to trust unsupervised anyway.
The $120K number from the ROI section is real, but it’s also the wrong frame. This isn’t about replacing a person. It’s about finally treating your deployment workflow like a system worth designing instead of a chore worth tolerating. One engineer with this stack can operate infrastructure that previously needed a dedicated ops role. Whether that’s exciting or uncomfortable depends on where you’re sitting.
What comes next is more interesting: full IAM integration, end-to-end autonomous deploys, agents with scoped cloud permissions handling routine operations without a human in the approval chain. Some teams are already there. Most aren’t ready for the conversation about who owns the incident when the agent breaks something.
I don’t have a clean answer for that yet. I suspect nobody does.
Drop your current deployment setup or your hot take in the comments. Especially curious how teams are handling the governance question because that’s the conversation the industry is quietly avoiding.