Nimesh Kulkarni

Posted on May 23

From YAML to AI Agents: Building Smarter DevOps Pipelines with MCP

#devops #ai #mcp #cicd

DevOps teams have spent years turning manual work into YAML.

That helped. CI runs on every pull request. Deployments can be triggered from a commit. Kubernetes can reconcile desired state. Terraform can plan infrastructure before it changes anything.

But a lot of DevOps work still sits outside the pipeline:

reading failed CI logs
checking whether a deployment is safe
connecting traces, alerts, recent commits, and infra changes
deciding whether to roll forward or roll back
writing the same runbook steps again and again
asking five tools for the same incident context

This is where AI automation gets interesting. Not as a magic replacement for DevOps engineers, but as a better interface for operational work.

The strongest version of this stack is not just "AI in CI/CD." It is an AI-native DevOps layer built around three pieces:

MCP servers for tool access
Skills for repeatable expert workflows
Plugins for company-specific infrastructure actions

If you build it well, the pipeline gets faster because the boring glue work disappears. If you build it badly, you get an AI bot with production credentials and vague judgment. That is not automation. That is a future incident report.

Why MCP matters for DevOps

MCP, or Model Context Protocol, gives AI applications a standard way to connect to external systems.

The official MCP docs describe three main server-side primitives:

tools: functions an AI app can call, like file operations, API calls, database queries, or deployment actions
resources: context an AI app can read, like docs, schemas, logs, runbooks, or service metadata
prompts: reusable templates for structured workflows

That maps cleanly to DevOps.
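To make the three primitives concrete, here is a minimal sketch in plain Python. It is a toy in-process registry, not the MCP SDK or the wire protocol. `ToyServer`, `get_failed_jobs`, and the `runbook://` URI are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyServer:
    """Toy stand-in for an MCP server: holds tools, resources, and prompts."""
    name: str
    tools: dict[str, Callable[..., object]] = field(default_factory=dict)
    resources: dict[str, str] = field(default_factory=dict)
    prompts: dict[str, str] = field(default_factory=dict)

    def tool(self, fn):
        """Register a callable the agent is allowed to invoke."""
        self.tools[fn.__name__] = fn
        return fn


ci = ToyServer("ci-logs")


@ci.tool
def get_failed_jobs(run_id: int) -> list[str]:
    # Hypothetical: a real server would call the CI provider's API here.
    return [f"run {run_id}: unit-tests"]


# Resource: read-only context the agent can load.
ci.resources["runbook://ci/flaky-tests"] = "Rerun once, then quarantine the test."

# Prompt: a reusable template for a structured workflow.
ci.prompts["explain_failure"] = "Summarize why job {job} failed and cite log lines."

print(ci.tools["get_failed_jobs"](42))
print(ci.prompts["explain_failure"].format(job="unit-tests"))
```

The point of the split is that tools act, resources inform, and prompts structure the request. A real server would add schemas and transport on top.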

A platform team could expose separate MCP servers for:

GitHub or GitLab
CI/CD logs
Kubernetes
Terraform or OpenTofu
Argo CD
Prometheus, Grafana, Datadog, or OpenTelemetry backends
cloud cost data
incident management
internal service catalog

The AI agent does not need to scrape random dashboards or guess from partial screenshots. It can ask real tools for real state.

For example:

User: Why did the production deploy fail?

Agent flow:
1. Read the failed GitHub Actions job logs.
2. Check the changed files in the pull request.
3. Query Argo CD for sync status.
4. Read Kubernetes events for the affected namespace.
5. Pull recent error traces from observability.
6. Summarize the likely failure and suggest the smallest safe fix.

That is not replacing the DevOps engineer. It is removing the tab-hopping tax.
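The agent flow above could be sketched as a read-only context-gathering function. Every tool function here is a hypothetical stub with canned return values. In a real setup each one would be a call to an MCP server, and step 6 (the summary) would be the model's job, not this function's.

```python
# Hypothetical read-only tools. In practice each would call an MCP server.
def read_ci_logs(run_id: str) -> str:
    return "deploy step failed: ImagePullBackOff"

def list_changed_files(pr_number: int) -> list[str]:
    return ["k8s/deployment.yaml"]

def argocd_sync_status(app: str) -> str:
    return "OutOfSync"

def k8s_events(namespace: str) -> list[str]:
    return ["Failed to pull image: manifest unknown"]

def recent_error_traces(service: str) -> list[str]:
    # Simulate one backend being down to show partial results still come back.
    raise TimeoutError("observability backend timed out")


def gather_deploy_context(run_id: str, pr_number: int, app: str, namespace: str) -> dict[str, object]:
    """Collect evidence from each system in a fixed order; never mutate anything."""
    steps = [
        ("ci_logs", lambda: read_ci_logs(run_id)),
        ("changed_files", lambda: list_changed_files(pr_number)),
        ("argocd_sync", lambda: argocd_sync_status(app)),
        ("k8s_events", lambda: k8s_events(namespace)),
        ("error_traces", lambda: recent_error_traces(app)),
    ]
    context: dict[str, object] = {}
    for name, call in steps:
        try:
            context[name] = call()
        except Exception as exc:
            # One missing source should not hide the others.
            context[name] = f"unavailable: {exc}"
    return context


context = gather_deploy_context("123", 7, "checkout-api", "checkout")
for source, evidence in context.items():
    print(source, "->", evidence)
```

Recording a failed lookup as "unavailable" instead of raising keeps the summary honest about which systems the agent could not see.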

Skills are where the real expertise lives

MCP gives the agent access. Skills tell it how to work.

A skill is a reusable procedure for a specific job. In DevOps, that matters because production work has rules. You do not want an agent inventing a deployment strategy every time someone asks a question.

Good DevOps skills could look like this:

skill: debug_failed_ci
steps:
  - fetch failed jobs
  - group logs by failure type
  - check if failure is test, lint, dependency, infra, or runner-related
  - compare against recent commits
  - suggest the smallest code or config fix
  - never rerun expensive jobs more than once without approval

skill: safe_kubernetes_rollout
steps:
  - check current deployment health
  - verify image tag and git SHA
  - check recent incidents for the service
  - confirm SLO status before rollout
  - deploy to one environment first
  - watch error rate, latency, and pod readiness
  - stop if guardrail thresholds fail

skill: terraform_plan_review
steps:
  - read the Terraform plan
  - classify adds, changes, and destroys
  - flag IAM, networking, database, and public exposure changes
  - check cost-sensitive resources
  - summarize blast radius
  - require human approval for destructive or privilege-expanding changes

This is the part I think most people miss. The value is not just that an AI can call tools. The value is that it can call tools through a workflow your team already trusts.
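As one example, part of the terraform_plan_review skill could be sketched like this. It assumes the plan has been exported with `terraform show -json`, which lists each resource under `resource_changes` with a `change.actions` array. The `SENSITIVE_PREFIXES` tuple is my own assumption (AWS resource types), and flagging every IAM, security group, or database change is deliberately conservative.

```python
# Assumption: AWS provider resource types that always deserve a second look.
SENSITIVE_PREFIXES = ("aws_iam_", "aws_security_group", "aws_db_")


def review_plan(plan: dict) -> dict:
    """Classify adds, changes, and destroys, and flag sensitive resources."""
    summary = {"add": [], "change": [], "destroy": [], "flagged": []}
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        address = rc["address"]
        if "create" in actions:
            summary["add"].append(address)
        if "update" in actions:
            summary["change"].append(address)
        if "delete" in actions:  # includes replacements (delete + create)
            summary["destroy"].append(address)
        touched = bool(actions - {"no-op", "read"})
        if touched and rc["type"].startswith(SENSITIVE_PREFIXES):
            summary["flagged"].append(address)
    summary["needs_human_approval"] = bool(summary["destroy"] or summary["flagged"])
    return summary


# Hand-written sample in the shape of the plan JSON, not real plan output.
sample_plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"]}},
    {"address": "aws_iam_role.deployer", "type": "aws_iam_role",
     "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.orders", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
]}

print(review_plan(sample_plan))
```

The deterministic part (classification and the approval flag) lives in code. The agent's job is to explain the blast radius in plain language on top of it.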

Plugins make it fit your company

Every company has weird infrastructure.

Maybe your deploys go through Argo CD, but production still needs a Slack approval. Maybe your Terraform state is split across workspaces. Maybe the service catalog is internal. Maybe your rollback process depends on a custom CLI that only three people understand.

Plugins are how you expose that reality safely.

A plugin can wrap a company-specific action like:

get_service_owner(service_name)
fetch_deploy_risk_score(pr_number)
create_change_request(environment, service, sha)
run_internal_canary(service, image_tag)
open_incident_with_context(summary, traces, logs)
estimate_cloud_cost_diff(terraform_plan_id)

The plugin should not give the agent unlimited shell access and vibes. It should expose narrow, typed actions with logs, permissions, and guardrails.

A good internal DevOps plugin feels boring:

{
  "name": "request_production_deploy",
  "input": {
    "service": "checkout-api",
    "image_tag": "2026.05.23.4",
    "change_summary": "Fix timeout handling in payment gateway client",
    "risk_level": "medium"
  },
  "requires_approval": true,
  "audit_log": true
}

Boring is good here. Boring means it can survive production.
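A minimal sketch of what sits behind a request like that, assuming a plain Python implementation: a typed input that rejects bad values, an approval check, and an audit entry for every call. The class name, the allowed risk levels, and the status strings are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_RISK = {"low", "medium", "high"}  # assumption: three risk levels


@dataclass(frozen=True)
class ProductionDeployRequest:
    service: str
    image_tag: str
    change_summary: str
    risk_level: str

    def __post_init__(self):
        # Reject malformed input before it reaches any deploy system.
        if not self.service or not self.image_tag:
            raise ValueError("service and image_tag are required")
        if self.risk_level not in ALLOWED_RISK:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")


def request_production_deploy(req: ProductionDeployRequest,
                              approved_by: Optional[str],
                              audit_log: list[str]) -> str:
    """Record the request; only queue it when a human approver is named."""
    status = "queued" if approved_by else "pending_approval"
    audit_log.append(json.dumps({
        "action": "request_production_deploy",
        "input": asdict(req),
        "approved_by": approved_by,
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
    }))
    return status


log: list[str] = []
req = ProductionDeployRequest(
    "checkout-api", "2026.05.23.4",
    "Fix timeout handling in payment gateway client", "medium")
print(request_production_deploy(req, None, log))
```

Note that the function never deploys anything itself. It only records intent and hands off to whatever system owns execution.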

Who should use this?

This stack is useful for a bunch of specialists, but each one should use it differently.

DevOps engineers can use it to debug CI/CD failures faster, generate release notes, identify flaky jobs, and automate repetitive deployment checks.

Platform engineers can turn internal developer platforms into agent-accessible systems. Instead of making every developer learn five dashboards, they can expose safe workflows through MCP servers and skills.

SREs can use it for incident triage: correlate alerts, attach traces, find recent deployments, pull service ownership, and suggest runbooks.

Cloud infrastructure engineers can use it to review Terraform plans, detect risky IAM changes, estimate cost impact, and standardize provisioning workflows.

Release engineers can use it to decide whether a release is ready, what changed, what failed, what needs approval, and what rollback path exists.

DevSecOps engineers can connect security checks into the pipeline: secret scanning, policy checks, dependency review, artifact provenance, image scanning, and permission drift.

AI infrastructure engineers can use the same pattern to manage model-serving deployments, GPU capacity, eval gates, prompt/version rollouts, and inference observability.

The common thread is simple: if your job involves reading state from multiple systems and taking careful action, AI agents can help. But only if you give them structured tools and clear operating procedures.

A practical AI-native CI/CD pipeline

Here is a realistic pipeline architecture.

The shape is simple: pull request, CI checks, AI agent, MCP tool layer, guardrails, then GitOps/deploy automation. The agent speeds up context gathering. The pipeline still owns execution, approval, and audit history.

Pull request opened
        |
        v
CI runs tests, lint, security checks
        |
        v
Agent reads CI result through MCP
        |
        v
If failed:
  - summarize failure
  - identify likely owner
  - suggest fix
  - open comment with exact logs and files

If passed:
  - read diff
  - check Terraform or Kubernetes changes
  - classify deployment risk
  - verify service ownership and runbook
  - prepare deploy summary
        |
        v
Human approval for production
        |
        v
Argo CD / deploy tool syncs desired state
        |
        v
Agent watches rollout health
        |
        v
If healthy:
  - close deploy task
  - attach release summary

If unhealthy:
  - collect logs, traces, events
  - recommend rollback or roll-forward
  - require approval before mutation

This is faster because the agent handles context collection. It is safer because the agent does not blindly mutate production.

That balance matters.
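The branching in that pipeline can be sketched as a small decision function. This is only an illustration of the flow, with hypothetical action names. Every branch is either read-only or ends in a request for approval.

```python
from typing import Optional


def agent_actions(ci_passed: bool,
                  approved: bool = False,
                  rollout_healthy: Optional[bool] = None) -> list[str]:
    """Return what the agent should do at the current point in the pipeline."""
    if not ci_passed:
        return ["summarize failure", "identify likely owner",
                "suggest fix", "comment on PR with logs and files"]
    if not approved:
        return ["read diff", "classify deployment risk",
                "verify service ownership and runbook",
                "prepare deploy summary", "wait for human approval"]
    if rollout_healthy is None:
        # Deploy tool is still syncing; the agent only observes.
        return ["watch rollout health"]
    if rollout_healthy:
        return ["close deploy task", "attach release summary"]
    return ["collect logs, traces, events",
            "recommend rollback or roll-forward",
            "request approval before mutation"]


print(agent_actions(ci_passed=True, approved=True, rollout_healthy=False))
```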

Where Kubernetes and GitOps fit

Kubernetes already works like an automation platform. Controllers watch desired state and reconcile actual state. The Kubernetes docs describe the controller pattern as programs that read an object's spec, act on it, and update status.

GitOps tools like Argo CD build on that idea. Argo CD treats Git as the source of truth, compares live cluster state against desired state, and syncs when needed.
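If the control loop idea is new to you, here is a toy version in plain Python. It does not use the Kubernetes API or Argo CD. It just diffs a desired replica count against an in-memory "actual" state and converges it, which is the core of the pattern.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """One pass of a toy control loop: make actual state match desired state."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have != want:
            actions.append(f"scale {name} {have}->{want}")
            actual[name] = want
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"delete {name}")
        del actual[name]
    return actions


cluster = {"web": 1, "old-job": 1}
print(reconcile({"web": 3}, cluster))  # first pass does the work
print(reconcile({"web": 3}, cluster))  # second pass finds nothing to do
```

The second call returning no actions is the property that matters: the loop is deterministic and idempotent, which is exactly what you do not get from free-form model output.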

AI should not replace that control loop.

It should sit above it.

The agent can explain what changed, detect risk, connect symptoms to recent deploys, and recommend action. Kubernetes and Argo CD should still do the actual reconciliation with clear audit history.

That gives you the best of both worlds:

deterministic infrastructure control loops
human-readable operational reasoning
faster triage
safer approvals

Observability is the agent's fuel

An AI DevOps agent is only as good as the context it can retrieve.

OpenTelemetry matters here because it gives teams a common way to collect traces, metrics, and logs. Traces are especially useful because they show the path of a request across services.

For an agent, this context can answer questions like:

Did the error start after a deployment?
Which dependency is adding latency?
Is a single service failing, or an entire user journey?
Are we seeing infrastructure failure, code failure, or traffic shape change?
Did the rollback actually improve user-facing symptoms?

Without observability, the agent is just guessing politely.
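Take the first question, "Did the error start after a deployment?" A minimal sketch of how that could be answered from request samples is below. The sample shape (timestamp, succeeded), the 10-minute window, and the 5-point threshold are all assumptions for illustration. A real version would query your observability backend instead of a list.

```python
from datetime import datetime, timedelta
from typing import Optional

Sample = tuple[datetime, bool]  # (timestamp, request succeeded)


def error_rate(samples: list[Sample], start: datetime, end: datetime) -> Optional[float]:
    """Fraction of failed requests in [start, end), or None if there is no data."""
    window = [ok for ts, ok in samples if start <= ts < end]
    if not window:
        return None
    return sum(1 for ok in window if not ok) / len(window)


def regressed_after_deploy(samples: list[Sample],
                           deploy_at: datetime,
                           window: timedelta = timedelta(minutes=10),
                           min_increase: float = 0.05) -> Optional[bool]:
    """Compare error rate before and after the deploy; None means not enough data."""
    before = error_rate(samples, deploy_at - window, deploy_at)
    after = error_rate(samples, deploy_at, deploy_at + window)
    if before is None or after is None:
        return None
    return after - before >= min_increase


deploy = datetime(2026, 5, 23, 12, 0)
samples = [(deploy - timedelta(minutes=m), True) for m in range(1, 11)]
samples += [(deploy + timedelta(minutes=m), m % 2 == 0) for m in range(10)]
print(regressed_after_deploy(samples, deploy))
```

Returning None when a window is empty matters. An agent that says "no regression" when it simply had no data is worse than one that says "I cannot tell."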

Guardrails that should exist before production actions

If an AI agent can touch production, the guardrails need to be boring and strict.

Start with these:

read-only by default
separate permissions per environment
mandatory approval for production mutations
audit logs for every tool call
allowlists for safe actions
dry-run support for infrastructure changes
policy checks before apply
rollback plans attached to deploy actions
rate limits for repeated retries
no secret exposure in prompts or logs
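Several of these guardrails can be enforced by one small authorization check that runs before every tool call. This is a sketch under my own assumptions: the action names, the per-environment allowlists, and the retry limit are hypothetical and would come from your own policy.

```python
from dataclasses import dataclass

# Hypothetical policy data.
READ_ONLY_ACTIONS = {"get_logs", "get_rollout_status", "plan"}
MUTATING_ALLOWLIST = {
    "staging": {"restart_deployment", "rollback"},
    "production": {"rollback"},
}
MAX_ATTEMPTS = 2


@dataclass
class ToolCall:
    action: str
    environment: str
    approved: bool = False
    attempt: int = 1


def authorize(call: ToolCall) -> tuple[bool, str]:
    """Decide whether a tool call may run, and say why."""
    if call.attempt > MAX_ATTEMPTS:
        return False, "retry limit reached"
    if call.action in READ_ONLY_ACTIONS:
        return True, "read-only"
    if call.action not in MUTATING_ALLOWLIST.get(call.environment, set()):
        return False, "action not on allowlist for this environment"
    if call.environment == "production" and not call.approved:
        return False, "production mutation requires approval"
    return True, "allowed"


print(authorize(ToolCall("rollback", "production")))
print(authorize(ToolCall("rollback", "production", approved=True)))
```

Anything not explicitly listed is denied, so a new tool starts out blocked until someone adds it to the policy on purpose.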

Terraform run tasks are a good mental model. HCP Terraform can call external systems at defined stages of a run, including between plan and apply, show their messages in the run pipeline, and block the run from proceeding when a mandatory task fails. That is exactly the kind of control point AI automation should respect.

Do not start by letting an agent run kubectl delete in prod.

Start by letting it explain what it would do, why it would do it, what it needs approval for, and how to undo it.

What to build first

If I were building this for a real DevOps team, I would not start with autonomous deployment.

I would start with three narrow workflows.

First: CI failure explanation.

The agent reads the failed job logs, groups errors by type, identifies the likely cause, links the exact log lines, and comments on the PR. Low risk, high value.
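The grouping step can start out as plain pattern matching before any model is involved. Here is a minimal sketch. The patterns are illustrative heuristics I picked, not a complete taxonomy, and the sample log is made up.

```python
import re
from collections import defaultdict

# Illustrative heuristics; first matching pattern wins.
PATTERNS = [
    ("dependency", re.compile(r"ModuleNotFoundError|Could not resolve|npm ERR!")),
    ("test", re.compile(r"\bFAILED\b|AssertionError")),
    ("lint", re.compile(r"eslint|flake8", re.IGNORECASE)),
    ("runner", re.compile(r"No space left on device")),
]


def group_failures(log: str) -> dict[str, list[tuple[int, str]]]:
    """Map failure type to the (line number, text) pairs that matched it."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for number, line in enumerate(log.splitlines(), start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                groups[label].append((number, line.strip()))
                break
    return dict(groups)


sample_log = "\n".join([
    "Collecting requests",
    "tests/test_cart.py::test_total FAILED",
    "E   AssertionError: expected 3, got 2",
    "ModuleNotFoundError: No module named 'yaml'",
])

for label, hits in group_failures(sample_log).items():
    print(label, hits)
```

Keeping the line numbers is what lets the PR comment link to exact lines instead of paraphrasing the log.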

Second: Terraform plan review.

The agent summarizes infrastructure changes, flags destructive actions, points out IAM/network/database risk, and asks for review before apply.

Third: deployment health summary.

The agent watches rollout status, error rate, latency, pod readiness, recent traces, and recent alerts. It posts one clean summary instead of making someone manually check six tools.
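A sketch of what "one clean summary" could look like in code is below. The metric names and threshold values are hypothetical. Real limits would come from your SLOs.

```python
# Hypothetical limits; in practice these would come from the service's SLOs.
THRESHOLDS = {"error_rate": 0.01, "p95_latency_ms": 500.0}


def summarize_rollout(service: str, metrics: dict[str, float],
                      ready_pods: int, desired_pods: int) -> str:
    """Fold several health signals into one short, postable summary."""
    problems = [
        f"{name} {metrics[name]} exceeds {limit}"
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0.0) > limit
    ]
    if ready_pods < desired_pods:
        problems.append(f"only {ready_pods}/{desired_pods} pods ready")
    status = "unhealthy" if problems else "healthy"
    return "\n".join([f"{service}: {status}"] + [f"- {p}" for p in problems])


print(summarize_rollout("checkout-api",
                        {"error_rate": 0.04, "p95_latency_ms": 320.0},
                        ready_pods=2, desired_pods=3))
```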

Once those work, you can add more automation.

A good maturity path looks like this:

Level 1: Read-only assistant
Level 2: Suggests fixes and runbooks
Level 3: Opens tickets, comments, and summaries
Level 4: Runs approved low-risk actions
Level 5: Handles narrow autonomous remediation with hard guardrails

Most teams should live at Level 2 or Level 3 for a while. That is not slow. That is how trust gets built.
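One way to make the levels enforceable is to tie each capability to the minimum level that unlocks it. This is a hedged sketch. The capability names and their mapping to levels are my assumptions, not a standard.

```python
from enum import IntEnum


class Maturity(IntEnum):
    READ_ONLY = 1
    SUGGEST = 2
    OPEN_TICKETS = 3
    APPROVED_LOW_RISK = 4
    NARROW_AUTONOMY = 5


# Hypothetical capabilities and the level each one requires.
REQUIRED_LEVEL = {
    "read_logs": Maturity.READ_ONLY,
    "suggest_fix": Maturity.SUGGEST,
    "comment_on_pr": Maturity.OPEN_TICKETS,
    "restart_staging_pod": Maturity.APPROVED_LOW_RISK,
    "auto_rollback": Maturity.NARROW_AUTONOMY,
}


def is_permitted(capability: str, team_level: Maturity) -> bool:
    """Unknown capabilities are denied; known ones need a high enough level."""
    required = REQUIRED_LEVEL.get(capability)
    return required is not None and team_level >= required


print(is_permitted("comment_on_pr", Maturity.OPEN_TICKETS))
print(is_permitted("auto_rollback", Maturity.OPEN_TICKETS))
```

Moving up a level then becomes a one-line config change that someone has to review, not a gradual drift in what the agent is allowed to do.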

The main takeaway

AI-native DevOps is not about replacing YAML with a chatbot.

It is about giving DevOps specialists a faster way to move through the work they already do: gather context, understand risk, apply a known workflow, and take the next safe action.

MCP gives the agent a standard way to reach tools. Skills give it repeatable expert behavior. Plugins make it fit the company's real infrastructure.

The result is a better pipeline:

faster CI/CD debugging
cleaner infrastructure reviews
safer releases
better incident context
less repetitive manual work

The best DevOps AI systems will not be the ones that act the most independently. They will be the ones that know when not to act.

Start with read-only context. Add skills. Wrap dangerous actions in plugins with approvals. Then automate the boring work first.

That is how AI makes DevOps faster without making production scarier.

References

Model Context Protocol, Architecture overview https://modelcontextprotocol.io/docs/learn
Model Context Protocol, Understanding MCP servers https://modelcontextprotocol.io/docs/learn/server-concepts
GitHub Docs, GitHub Actions documentation https://docs.github.com/actions
GitHub Docs, Workflows https://docs.github.com/en/actions/concepts/workflows-and-actions/workflows
Argo CD Docs, Declarative GitOps CD for Kubernetes https://argo-cd.readthedocs.io/en/latest/
Kubernetes Docs, Extending Kubernetes https://kubernetes.io/docs/concepts/extend-kubernetes/
HashiCorp Developer, Set up HCP Terraform run task integrations https://developer.hashicorp.com/terraform/cloud-docs/integrations/run-tasks
OpenTelemetry Docs, OpenTelemetry Concepts https://opentelemetry.io/docs/concepts/
OpenTelemetry Docs, Traces https://opentelemetry.io/docs/concepts/signals/traces/
