DEV Community: Sergey Byvshev

Building an AI First-Line for DevOps Support on n8n for $250/Month - Part 2

Sergey Byvshev — Tue, 09 Jun 2026 06:30:35 +0000

In Part 1, I walked through how chat requests in Slack land in n8n, get classified by category, and are routed into the right processing branch. Under the hood of the classifier sits an LLM with access to Slack over MCP — it reads the thread context and decides what the new request is about. Around it run a few helper sub-workflows:

attachmentsAnalyzer — parses screenshots and text logs;
httpProbeTool — performs endpoint availability checks correctly, without taking the agent chain down with it;
errorReporter — covers us when the workflow itself fails so the requester isn't left in the dark.

Using the CI/CD assistant as an example, I showed how those building blocks compose into a processing branch. Today — the three remaining branches. Each handles a different class of requests, but architecturally they follow the same template. Thanks to that, adding a new branch takes a couple of hours rather than reinventing the wheel from scratch.

In this part:

Incident investigator — the most temperamental category, with the lowest rate of fully autonomous resolution and the most interesting edge cases;
Task manager — handling infrastructure modification requests with automatic ticket creation in Jira;
Infrastructure knowledge assistant — answers to "where is X configured?" with a small trick involving auto-generated READMEs in IaC repos.

All workflows and system prompts are published separately — link at the end.

Incident Investigator

This is probably the most "temperamental" branch of them all. The incident category catches pretty much anything that means "something just broke": a hung Postgres, 502s on the ingress controller, sudden OOMs in some consumer, mysterious "all my requests are slow but my neighbor's are fine." The spectrum is huge, there's no universal recipe — so the rate of fully autonomous resolution here is the lowest across all branches.

But even when the automation doesn't close the problem end-to-end, the dossier it puts together saves the on-call engineer 10–15 minutes at the start: firing alerts are already pulled, metrics and logs are already eyeballed, hypotheses are already framed. When you get paged on a Saturday morning, those 10 minutes are sometimes the difference between "I had time to wake up" and "I'm already answering in chat with coffee in one hand."

Input data

The sub-workflow expects the same input structure as the CI/CD assistant from Part 1. To save you from jumping between articles — a short description of the fields:

{
  "message": "Chat request text",
  "post_id": "Message ID — needed to reply into that exact thread",
  "channel_id": "Slack channel ID",
  "channel_name": "Channel name, passed into the prompt for context",
  "user_name": "Author of the request, mentioned in the final reply",
  "user_id": "User ID",
  "file_ids": ["Attachment IDs, if any"],
  "category": "incident",
  "confidence": 0.95,
  "summary": "Short summary from the classifier",
  "is_thread": true,
  "thread_root_id": "ID of the thread's root message",
  "on_call_user": "On-call engineer name — comes in handy for escalation"
}

First, attachmentsAnalyzer2 runs — the same sub-workflow familiar from Part 1 that parses screenshots and logs. Triggering it only requires passing file_ids. If there are no attachments, the empty-attachments branch goes through a Merge node, and the pipeline doesn't crash on missing data.

Collecting variables in SetVars

Next, the SetVars node assembles everything that'll be needed both in the agent's system prompt and when posting the reply back to Slack. This part is worth pausing on, because these variables effectively parameterize the system prompt without forcing you to edit it by hand:

K8S_CLUSTERS — list of available contexts (we have two: dev and prod, both in DigitalOcean Frankfurt);
K8S_NAMESPACE — the main namespace with production workload;
GITHUB_ORG — GitHub organization name, so the agent doesn't try to search code across the entire internet;
prometheus_uid and loki_uid — Grafana datasource UIDs; without them, the agent has no idea where to knock for metrics and logs;
reply_root_id — ID of the message the agent will reply into (either the thread's root post or the original request post itself).

These values then get substituted into the system prompt via {{ $('SetVars').first().json.* }}. When a new cluster comes up or the namespace changes, you just edit values in a single node — instead of crawling through the big block of prompt text.
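
As a rough illustration of the idea (not the actual n8n node), the same parameterization can be sketched in Python. Only the variable names come from the list above; the values, the prompt fragment, and the helpers `render_prompt` and `reply_root_id` are assumptions made up for this example.

```python
from string import Template

# Hypothetical values; in the workflow these live in the SetVars node.
SET_VARS = {
    "K8S_CLUSTERS": "dev, prod",
    "K8S_NAMESPACE": "production",   # placeholder namespace name
    "GITHUB_ORG": "example-org",     # placeholder organization
    "prometheus_uid": "prom-uid",    # placeholder Grafana datasource UID
    "loki_uid": "loki-uid",          # placeholder Grafana datasource UID
}

# Illustrative fragment, not the published system prompt.
SYSTEM_PROMPT = Template(
    "Clusters: $K8S_CLUSTERS. Main namespace: $K8S_NAMESPACE.\n"
    "Search code only in the GitHub organization $GITHUB_ORG.\n"
    "Grafana datasources: Prometheus=$prometheus_uid, Loki=$loki_uid."
)


def reply_root_id(request: dict) -> str:
    """Reply into the thread root for thread messages, else into the request post."""
    if request.get("is_thread") and request.get("thread_root_id"):
        return request["thread_root_id"]
    return request["post_id"]


def render_prompt(variables: dict) -> str:
    # substitute() raises KeyError when a variable is missing, which surfaces
    # a forgotten value early instead of sending a half-filled prompt.
    return SYSTEM_PROMPT.substitute(variables)
```

Keeping every environment-specific value in one dictionary is the same design choice as the single SetVars node: the prompt text itself never has to change when a cluster or namespace does.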

Querying the agent

The user prompt is built with the same template as the CI/CD assistant:

Investigate incident from {{ $json.user_name }}
in channel {{ $json.channel_name }}{{ $json.is_thread ? ' (message in thread, thread_ts=' + $json.thread_root_id + ' — read the history through Slack conversations.replies first)' : '' }}

{{ $json.message }}{{ $json.attachments_context && $json.attachments_context.trim().length > 0 ? '\n\nAdditional information from attachments:\n' + $json.attachments_context : '' }}

Three blocks inside it: the original message, an explicit instruction to read the thread history (if the request came from a thread), and context from attachments if any were present.
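
For readers who find the inline n8n expression hard to scan, here is a minimal Python sketch of the same assembly logic. The function name `build_user_prompt` is an assumption, and the wording of the thread hint is slightly simplified compared to the template above.

```python
def build_user_prompt(req: dict) -> str:
    """Assemble the user prompt from the same three blocks as the n8n template."""
    header = (
        f"Investigate incident from {req['user_name']}\n"
        f"in channel {req['channel_name']}"
    )
    # Block 2: the thread hint is added only for messages that came from a thread.
    if req.get("is_thread"):
        header += (
            f" (message in thread, thread_ts={req['thread_root_id']};"
            " read the history through Slack conversations.replies first)"
        )
    # Block 1: the original message.
    prompt = f"{header}\n\n{req['message']}"
    # Block 3: attachment context is appended only when it is non-blank.
    attachments = req.get("attachments_context") or ""
    if attachments.strip():
        prompt += f"\n\nAdditional information from attachments:\n{attachments}"
    return prompt
```

The two conditionals matter more than the wording: a prompt that mentions a thread which does not exist, or an empty attachments section, gives the agent a reason to make extra tool calls.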

I use GPT-5.5 from OpenAI as the model. On this task, Sonnet and Gemini show comparable quality — the choice is more out of habit. What actually moves the needle isn't the model but the completeness of the system prompt and the toolset.

Tools the agent has access to

For the agent to make sense of an incident, it has to see the infrastructure through an engineer's eyes. When our on-call goes through a breakdown, the usual path is: "what do the alerts say → what's in the service logs → what's in the metrics → what releases went out → what's in the IaC." A corresponding MCP tool is wired up for each step:

Kubernetes MCP — look at pods, events, read container logs. The system prompt explicitly says: don't call pods_log for the same pod more than twice. Without that constraint, the agent loves to loop trying to "double-check just one more time."
Grafana MCP — queries into Prometheus (query_prometheus) and Loki (query_loki_logs). Same tool also covers dashboard search, in case the agent wants to drop a link to a ready-made panel into the reply.
DigitalOcean MCP — needed when the incident touches the infra layer: App Platform, droplets, the DOKS cluster, load balancers.
GitHub MCP — look at the latest commits, Actions runs, open PRs. Especially useful for the "everything broke right after deploy" scenario — and these scenarios, like for everyone else, aren't rare.
Slack MCP — read-only. Mostly used for conversations.replies at the very start of the investigation. We don't trust the agent with sending the reply itself — that's done by a separate node after the output is received. If something goes wrong at the posting step, the thread just stays without a final reply, but the execution doesn't crash.
Qdrant Vector Store — knowledge base on our infrastructure: component descriptions, service relationships, naming conventions, useful labels. Used when the agent runs into the name of an unfamiliar service and wants to figure out where it lives and what it talks to.

And as a separate line item — prometheusAlertSearch. This is a custom sub-workflow that the agent reaches for almost always within the first 2–3 steps of an investigation. It's worth describing in more detail.

Alert tool: prometheusAlertSearch

The idea is simple: the lion's share of complaints from developers essentially mirror an alert that's already firing in Prometheus. "Postgres is kinda slow" usually arrives at the exact moment when PostgresHighLatency has already been firing for ten minutes. It's logical to first check whether the cause is sitting right on the surface — and only then go digging through logs.

The sub-workflow accepts keywords and a match mode:

{
  "keywords": ["postgres", "pgbouncer"],
  "match_mode": "any"
}

Inside, it queries Prometheus's /api/v1/alerts and, among the firing alerts, picks the ones where the keywords appear in the name, labels, or annotations. match_mode: "any" (default) is an OR across the keywords; "all" is AND.

It connects to the agent through the toolWorkflow node, so from the agent's side it looks like any other tool. The description for the agent is critically important: without it, the agent doesn't understand when and how to use this tool:

Search currently firing alerts in Prometheus by keywords. Use early in
investigation to find correlated active alerts across the platform.

Input:

keywords (array of strings, required): lowercase substrings matched against alert names, all label keys/values, and all annotation values. Examples: ["postgres"], ["http","5xx"], ["kafka","redpanda"].
match_mode (string, optional): "any" (default, OR) or "all" (AND).

Returns: { ok, total_firing, matched_count, returned_count, truncated,
alerts:[...] }
Each alert has alertname, severity, labels, summary, description,
activeAt, value. Output capped at 25 alerts — if truncated=true,
refine keywords.
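
The matching logic described above is small enough to sketch in Python. This is not the published sub-workflow, just an approximation under a few assumptions: the input is the already-parsed JSON body of Prometheus's /api/v1/alerts, the function name `search_alerts` is made up, and the error shape for bad input is invented for the example.

```python
from typing import Any

MAX_ALERTS = 25  # output cap stated in the tool description


def _haystack(alert: dict[str, Any]) -> str:
    """Lowercased label keys/values and annotation values (alertname is a label)."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    parts = list(labels.keys()) + list(labels.values()) + list(annotations.values())
    return "\n".join(str(p).lower() for p in parts)


def search_alerts(payload: dict[str, Any], keywords: list[str],
                  match_mode: str = "any") -> dict[str, Any]:
    """Filter a parsed /api/v1/alerts response down to matching firing alerts."""
    if not keywords or match_mode not in ("any", "all"):
        # Hypothetical error shape; the article does not describe one.
        return {"ok": False, "error": "need keywords and match_mode 'any' or 'all'"}
    alerts = payload.get("data", {}).get("alerts", [])
    firing = [a for a in alerts if a.get("state") == "firing"]
    combine = any if match_mode == "any" else all  # OR versus AND
    needles = [k.lower() for k in keywords]
    matched = [a for a in firing if combine(n in _haystack(a) for n in needles)]
    returned = matched[:MAX_ALERTS]
    return {
        "ok": True,
        "total_firing": len(firing),
        "matched_count": len(matched),
        "returned_count": len(returned),
        "truncated": len(matched) > len(returned),
        "alerts": [
            {
                "alertname": a.get("labels", {}).get("alertname"),
                "severity": a.get("labels", {}).get("severity"),
                "labels": a.get("labels", {}),
                "summary": a.get("annotations", {}).get("summary"),
                "description": a.get("annotations", {}).get("description"),
                "activeAt": a.get("activeAt"),
                "value": a.get("value"),
            }
            for a in returned
        ],
    }
```

Returning the counts alongside a capped list lets the agent notice that it saw only part of the picture and narrow the keywords, instead of silently reasoning over a truncated result.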

Without an explicit "call this in the first 2–3 steps," the agent likes to first wander into logs, then into cluster events, then somewhere else — and only at the end remember about alerts. A hard hint in the tool description visibly changes the behavior.

Response format

The system prompt defines a strict output format:

What happened

Short description of the incident

Timeline of events

Events from 10 minutes before the problem appeared

Likely causes

At most two hypotheses

What to try

Step-by-step actions for each hypothesis

The structure mirrors the familiar incident-review format, so it is easy to lift into a postmortem without rewriting if the incident turns out to be significant.

Average processing time is under 2 minutes end-to-end, and token consumption is 40–50K per request, depending on how deep the agent has to go. Cost-wise, that is peanuts compared to engineer time, especially considering that some of these requests used to require a call rather than just a chat exchange.

Task Manager

The "infrastructure modification" category covers requests like "roll us out a new service," "give us access to Grafana," "add a bucket for analytics." Full automation here is off the table: nearly every such request needs approval, estimation, and just plain human attention. But what definitely can be automated is turning free-form text into a properly formatted ticket.

The previous typical scenario looked like this: a developer writes in chat, the on-call reads it, asks clarifying questions, and writes all of it into Jira in their own words. Now Jira gets the request straight away — and the on-call gets a notification with a ready-made link.

The input is the same structure with fields from the classifier. Then — the familiar sequence: attachment parsing through attachmentsAnalyzer2, variable collection in SetVars, handoff to the LLM agent.

Output format

The distinctive part here is the strictly defined response structure. The agent must return JSON:

{
  "summary": "<area>: <what needs to be done>",
  "description": "<1-3 sentences with details>",
  "label": "<one of the predefined directions>"
}

label is picked from a constrained dictionary: kubernetes, monitoring, network, access, database, ci-cd, and so on. This simplifies further task routing by engineer competence and gathering analytics on "who's working on what and how much of it."

For the agent to write decent summary and description, it has access to:

Qdrant Vector Store — to see how the area the request belongs to is generally structured at our place. This is needed so the agent doesn't invent a new name where an existing one already exists.
Slack MCP — if the request came in a thread, read the backstory. People often clarify details exactly in the thread, while the first message is a single line.
HTTP Request — in case the request contains a link to someone's PR, a Confluence doc, or an external spec.

Validation and task creation

After receiving the JSON, validation runs: we check for required fields and a permitted label value. If something's off — a fallback message goes out ("couldn't formulate the task automatically, please take a look"), and the on-call handles it manually. If everything's fine — the task is filed in Jira through n8n's built-in node.
A short message comes back to Slack with the task title and a link to it. The on-call then sees a ready ticket with the right label and decides: take it on, reassign, or ask for clarifications.
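
The validation step can be sketched roughly as follows. It assumes the agent's output arrives as a raw JSON string; the label set contains only the values named above (the real dictionary is longer), and the function name `validate_task` and the return shape are made up for this example.

```python
import json

REQUIRED_FIELDS = ("summary", "description", "label")
# Only the labels named in the article; the real dictionary has more entries.
ALLOWED_LABELS = {"kubernetes", "monitoring", "network", "access", "database", "ci-cd"}
FALLBACK_MESSAGE = "couldn't formulate the task automatically, please take a look"


def validate_task(raw_output: str) -> dict:
    """Return {"ok": True, "task": {...}} or {"ok": False, "message": FALLBACK_MESSAGE}."""
    fallback = {"ok": False, "message": FALLBACK_MESSAGE}
    try:
        task = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return fallback  # not JSON at all
    if not isinstance(task, dict):
        return fallback  # valid JSON, but not an object
    for field in REQUIRED_FIELDS:
        value = task.get(field)
        if not isinstance(value, str) or not value.strip():
            return fallback  # missing or empty required field
    if task["label"] not in ALLOWED_LABELS:
        return fallback  # label outside the constrained dictionary
    return {"ok": True, "task": {f: task[f].strip() for f in REQUIRED_FIELDS}}
```

Every failure path collapses into the same fallback message on purpose: the requester does not need to know which check failed, only that a human will pick the request up.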

An important point: automatic task creation doesn't cancel manual validation. Sometimes the classifier gets it wrong and reads an incident as a modification (especially when the author writes in the "we need X to work" style instead of "X is not working"). That's why the system prompt explicitly carries a rule: if it's unclear from the text whether new work is actually needed or whether this is about something existing — ask "could you clarify what exactly needs to be done?" as a separate message, and don't create the task. A cheap measure that noticeably cuts down on junk tickets.

Infrastructure Knowledge Assistant

The last workflow for today is the "calmest" one. The question category catches requests like "where is the connection limit configured in pgbouncer?", "where do alloy logs from droplets go?", "what region is the assets-prod bucket in?". Sometimes from new joiners on the team, sometimes from the same DevOps engineers who forgot where what lives. (It happens, I won't lie.)

The structure is almost identical to the incident investigator: the same input JSON structure, the same chain with attachment parsing, the same SetVars. Toolset: GitHub MCP, Slack MCP, Kubernetes MCP, Grafana MCP, DigitalOcean MCP, Qdrant Vector Store, and HTTP Request for pulling official component documentation when the required parameter isn't in the code and you have to look up vendor defaults.

I won't dwell on the same nodes — let me tell you about one trick without which the agent would hit a wall pretty quickly.

A skill that writes READMEs in IaC repositories

The initial hypothesis was: give the agent GitHub MCP and a knowledge base in Qdrant — and it'll figure it out. In practice, it turned out that the structure in IaC repositories is almost always non-obvious: somewhere Terragrunt is mixed with Helmfile, somewhere Ansible playbooks live with inventories two subdirectories deep, somewhere Terraform modules are laid out under names that only make sense to us. The agent burned a ton of tokens and time just to figure out where to look in the first place.

The solution came from the Claude Code Skills format: I wrote a separate skill that runs locally in the IDE and generates/updates the README in every infrastructure repo. The skill reads the directory structure, identifies entry points, and describes them in a unified format. The output looks like this:

# infra

Main IaC repository: terragrunt code for cloud resources,
Ansible inventories and playbooks for VMs, Helm releases and
manifests for k8s.

## Directory structure

- `terraform/` — terragrunt code, organized by cloud and region:
  - `do/fra1/` — DigitalOcean Frankfurt resources (DOKS clusters,
    buckets, load balancers, droplets for DBs).
  - `yc/ru-central1/` — Yandex Cloud resources.
- `ansible/` — playbooks and inventories for Droplet management:
  - `playbooks/` — entry points (`postgres.yml`, `redpanda.yml`, ...).
  - `inventories/{stage,prod}/` — per-environment inventories with
    `group_vars/` next to them.
- `helm/` — Helmfile releases of infrastructure components in k8s
  (ingress-nginx, cert-manager, kube-prometheus-stack, etc.).
- `manifests/<cluster>/` — manifests applied on top of Helm releases:
  alerts, ServiceMonitors, standalone CRDs.

## Naming conventions

- VM: `<project>-<env>-<kind>-[<purpose>]-<index>.example.com`
- clusters: `<env>-<region>-01` (e.g. `prod-fra1-01`)
- buckets: `s3-<project>-<purpose>-<env>` (e.g. `s3-project-assets-prod`)

From the agent's perspective, this changes everything: the first thing it does is read README.md through get_file_contents, understand the structure, and then go into the right subdirectory for a specific file. The number of GitHub MCP calls dropped roughly 3×, and answer quality went up noticeably — especially on questions of the "where is X configured" type.

I'll publish the skill itself in the repo — it's simple and easy to adapt to someone else's structure.

What we got in total

After all three branches rolled into production and lived through a couple of months of real traffic:

Incident investigator — closes about a quarter of requests fully; in the rest, the on-call gets a ready-made breakdown with hypotheses and saves 10–15 minutes at the start.
Task manager — practically every modification request makes it into Jira with a meaningful summary and the right label set; manual fixes are needed rarely, usually around the description wording.
Knowledge assistant — closes roughly half of the questions without an engineer; for the rest, the agent's answer is still useful as a starting point for the on-call.
At our request volume, combined LLM spend stays around $250/month. Considering the system works 24/7 and doesn't take vacations, that is laughably little money.

What hasn't worked out yet

To avoid creating the illusion that everything's smooth — a few rough edges we're still living with:

Objective quality metrics — for now, the only feedback channel is occasional comments along the lines of "no, the agent guessed wrong here." I want to wire up more structured feedback — for example, through emoji reactions in Slack with automatic stats collection.
Long incident threads — the agent reliably gets lost when a thread already has 30+ messages. Right now, I cap the context depth hard in the prompt, but it's a tradeoff: sometimes important context from the start gets dropped.
The "security" category — it doesn't exist in the classifier, but requests like "does this comply with GDPR?" come in periodically. For now, I shove them into question, but answer quality there is shaky.

Plans for the future

What's in the queue for the coming months:

Expand the agent toolset to "actions, not just reads" — carefully give access to restarting pods, applying pre-vetted manifests, restarting systemd services. Obviously, this is minefield territory, so it'll be done through explicit engineer confirmation in Slack — without confirmation, no changes get applied.
Add handling for maintenance announcements — for now, such messages just get tagged and ignored, but they could be pushed into a separate digest channel with an automatic "what's planned for this week" summary.
Wire up a dedicated branch for security questions — with its own knowledge base on our compliance docs and security policy.
Add analytics over processed requests: which categories are growing, what the autonomous resolution rate is in each, and how many tokens go to which category. Without numbers, it's hard to understand what should actually be improved first.

Wrap-up

In short: AI agents driven by n8n with MCP tools turned out to be a very workable way to offload a significant chunk of routine work from the on-call engineer. Not a silver bullet — but something close to a modest, hardworking intern who works around the clock and costs about as much as a couple of lunches.

The key is not to try automating everything at once. Better to ship one branch, live closely with it, understand its weak spots — and only then extend the approach to the other categories. From my first working CI/CD assistant to the full set of all branches took about three months — and I have no regrets about that pacing.

Link to the repo with workflows: https://github.com/javdet/automagicops-workflows
Which category of requests automates best in your setup? And have you ever had the situation where an AI agent confidently nailed the wrong diagnosis and led the engineer down the wrong path? Share in the comments — especially curious to compare where everyone’s stepping on rakes.

Building an AI First-Line for DevOps Support on n8n for $250/Month — Part 1

Sergey Byvshev — Tue, 02 Jun 2026 10:07:28 +0000

How a small DevOps team offloaded chat triage, CI/CD diagnostics, and attachment parsing to an AI agent — and what’s still rough about it.

If you, like me, run infrastructure for a small team, you’ve probably been in this spot: the engineering org keeps growing while the DevOps headcount stays the same. With the rise of vibe-coding, that imbalance became especially obvious: our dev team at the studio grew roughly 1.5× in a couple of months, because every product manager wanted their own mini-application. On top of that, we got an extra headache from increasingly frequent availability issues from certain regions.

As a result, the flow of messages in our Slack support channel grew to the point where a significant chunk of an engineer’s day was spent triaging them. And the most frustrating part: not every request actually fell under DevOps responsibility, but each one still required at least a shallow diagnostic to figure that out.

That’s how the idea to offload the first line to an AI agent came up. By that point, we’d already automated incident analysis triggered by alerts and the approach had proven itself. Extending the same pattern to manual chat requests was the logical next step.
In this article — the first of two — I’ll walk through how we:

classified the past year’s request flow and picked categories worth automating;
built a classifier in n8n with Slack integration over MCP;
implemented a CI/CD incident assistant as the first production-ready branch;
added attachment parsing (error screenshots, log files);
set up proper error reporting for the workflows themselves.

Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant for routine questions, and the handler for infrastructure modification tasks.

All workflows and system prompts are published separately — link at the end of the article.

Preparation: classifying requests

Before automating anything, you need to understand what. I exported the request history from our Slack channels for the past year and bucketed it into categories. The result:

1. Infrastructure modification: changes to existing infrastructure, adding standard resources.
2. New installation: deploying new systems and integrations that didn't exist before.
3. Incident: something stopped working in the current infrastructure.
4. CI/CD: failing builds, broken tests, broken deploys.
5. Question: general questions about our infrastructure.
6. Announcement: informational messages, for example planned maintenance.
7. Other: anything that didn't fit above.

The bulk of requests fell into categories 3–5 — those were the obvious starting point. Categories 1–2 require approvals and almost always need an engineer in the loop, so there's no point automating them. Announcements don't need agent handling at all — it's enough to recognize them correctly and not page the on-call.

But before building any handling branches, we needed to reliably identify which category a new request belongs to.

The classifier

At first glance, the setup looked straightforward: create a Slack app with a bot user, subscribe to app_mention events, point them at an n8n webhook, run the payload through an LLM, get the category.

The first nuance surfaced quickly: an incoming message can either start a new thread or be a reply inside an existing one. In the second case, without thread context the classifier will misfire — a one-liner like "same problem on my end" makes no sense in isolation.

Instead of calling the Slack API directly from n8n, I offloaded this to the agent — we already have slack-mcp, which can read messages from channels and threads. The agent itself decides whether to pull the thread history and does so when the context calls for it. The system prompt needs to describe how to do this, plus a few other things:

category descriptions with examples;
the expected output format:

{
  "category": "<category_key>",
  "confidence": <0.0-1.0>,
  "summary": "<one-sentence summary in the same language as the user message>",
  "acknowledge": "<a short response that you accepted the request and started working on it>",
  "is_thread": <true|false>,
  "parent_thread_ts": "<thread_ts to use when replying — ALWAYS set>"
}

In the user prompt, I additionally pass channel_id, channel_name, message_ts, and user_name — this helps the classifier orient itself in the message.

For the model I use Sonnet or GPT-5 Codex — on classification, both show comparable quality.

At this stage I don't yet touch attachments — screenshots and log files come into play further down, inside the logic of specific branches.

Once the agent's response is received and its fields are validated, we need to determine the on-call engineer — they may be needed if automated handling can't close the request. On-call rotations live in Google Calendar, so I had to configure OAuth2 access to it following the n8n docs.
After the category and on-call are determined, the corresponding sub-workflow kicks off. In parallel, an acknowledge message goes to Slack so the author can see the request was received and is being worked on. That's an important detail — without it, the person keeps typing "hey, is anyone looking at this?" into the thread, which defeats the whole point.
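
The field validation mentioned above could look roughly like this in Python. The category keys are an assumption (the articles only show "incident" and mention "question"), and `parse_classification` is a made-up name; the checks themselves follow the output format given in the system prompt.

```python
import json

# Hypothetical category keys derived from the category list; the real keys may differ.
CATEGORIES = {
    "modification", "installation", "incident",
    "ci-cd", "question", "announcement", "other",
}


def parse_classification(raw: str) -> dict:
    """Parse the classifier's JSON; raise ValueError if it violates the format."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("classifier output must be a JSON object")
    category = data.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (isinstance(confidence, bool)
            or not isinstance(confidence, (int, float))
            or not 0.0 <= confidence <= 1.0):
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    for field in ("summary", "acknowledge", "parent_thread_ts"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{field} must be a non-empty string")
    if not isinstance(data.get("is_thread"), bool):
        raise ValueError("is_thread must be true or false")
    return data
```

Raising on any violation, rather than patching the output, fits the setup described later in the article: a failed execution is picked up by the error workflow, so the requester still gets a reply.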

CI/CD assistant

CI/CD is one of the most common categories, so that's where I started. A solid share of these issues can be resolved without an engineer: builds that fell over because of a temporarily unreachable repository, flaky tests, misconfigured pipelines, expired tokens.

The sub-workflow expects an input structure with the request data:

{
  "message": "Chat request text",
  "message_ts": "Slack message timestamp",
  "channel_id": "Slack channel ID",
  "channel_name": "Slack channel name",
  "user_name": "Sender's display name",
  "user_id": "Sender's Slack user ID",
  "file_ids": ["List of attachment IDs"],
  "category": "One of the categories",
  "confidence": "Confidence score",
  "summary": "Short request description",
  "is_thread": "Whether the message came from inside a thread",
  "thread_ts": "Parent message timestamp, if this is a thread reply",
  "on_call_user": "On-call engineer's name"
}

Parsing attachments

Most CI requests arrive in the format "build failed" + an error screenshot. That description clearly isn't enough to identify a specific build, so before the main agent runs, a helper sub-workflow — attachmentsAnalyzer — kicks off first.

It processes attached files:

Images (error screenshots) — sent to gpt-4o-mini to extract text and describe context.
Text files (logs) — if the size doesn't exceed the limit set in the Config node, the content is passed along as extra context.

The output is a compact text block:

{
  "attachments_context": "...human-readable block...",
  "attachments_count": 1
}

I deliberately split this out into its own workflow — it's reused in other handling branches. If attachmentsAnalyzer fails, the main workflow keeps going without the extra context instead of falling over entirely.
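
Below is a minimal Python sketch of the text-file half of this logic together with the "keep going on failure" behavior. Several things here are assumptions: the size limit value, the input shape (`name` plus already-downloaded `content` bytes), the layout of the context block, and the meaning of `attachments_count` (here, the number of files that made it into the context). The image branch is omitted because it depends on a model call.

```python
MAX_TEXT_BYTES = 20_000  # hypothetical limit; the real value lives in the Config node


def build_attachments_context(files: list) -> dict:
    """Turn downloaded text attachments into the compact block passed to the agent.

    Each item is assumed to look like {"name": str, "content": bytes}.
    """
    blocks = []
    for f in files:
        content = f.get("content", b"")
        if len(content) > MAX_TEXT_BYTES:
            continue  # files over the limit are not passed along
        text = content.decode("utf-8", errors="replace").strip()
        if text:
            blocks.append(f"--- {f.get('name', 'attachment')} ---\n{text}")
    return {
        "attachments_context": "\n\n".join(blocks),
        "attachments_count": len(blocks),
    }


def analyze_safely(files) -> dict:
    """Mirror the fallback: any failure yields an empty context, not an exception."""
    try:
        return build_attachments_context(files)
    except Exception:
        return {"attachments_context": "", "attachments_count": 0}
```

Because the empty result has the same shape as a successful one, the calling branch does not need a special case for "analyzer failed" versus "no attachments".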

Gathering context and calling the agent

Before forming the LLM request, the SetVars node assembles everything needed:

the raw request data;
the output of attachmentsAnalyzer;
auxiliary context for the system prompt: GitHub organization names, Kubernetes contexts and namespaces, Grafana data source names.

The agent itself works with a set of MCP tools:

GitHub MCP — access to Actions build logs, PRs, source code.
Slack MCP — reading messages in a thread when the initial request doesn't carry enough context.
Grafana / Kubernetes MCP — looking up cluster logs and events on deploy-related issues.

The system prompt should cover:

which teams exist and which repository groups belong to them — speeds up identifying the right repo;
how to pull additional context from a Slack thread;
the DevOps team's scope of responsibility — if an error falls within it, the assistant additionally tags the on-call at the end of the investigation;
a general description of the available tools and when to use each;
a few worked examples;
the output format.

The user prompt is templated like this:

Investigate the issue from {{ $json.user_name }} in channel {{ $json.channel_name }}{{ $json.is_thread ? ' (message is in a thread, thread_ts=' + $json.thread_ts + ' — first read the history via Slack conversations.replies)' : '' }}
{{ $json.message }}{{ $json.attachments_context && $json.attachments_context.trim().length > 0 ? '\n\nAdditional information from attachments:\n' + $json.attachments_context : '' }}

Alongside the message itself, this passes who sent it and which channel it came from, whether it's a thread reply, and any attachments if present.

The HTTP-probe workaround

One of the common reasons builds fail is an unreachable external resource: a dependency repository, a proxy, a registry. The natural move would be to give the agent a built-in HTTP Request tool and let it check availability. In practice, that didn't work — the built-in n8n node doesn't handle timeouts and network errors gracefully, and on a failure it brings down the whole agent chain.

So I wrapped the check in a separate sub-workflow httpProbeTool that always returns a structured result: success, failure with a reason, or timeout. The agent uses it like any other tool.
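
The contract of such a probe ("always return a structured result, never raise") is easy to show outside n8n. The following Python sketch uses only the standard library; the function name `http_probe`, the field names, and the default timeout are assumptions, not the shape of the real httpProbeTool.

```python
import http.client
import socket
import urllib.error
import urllib.request


def http_probe(url: str, timeout_s: float = 5.0) -> dict:
    """Check endpoint availability; always return a dict and never raise."""
    timeout_result = {
        "status": "timeout",
        "http_status": None,
        "reason": f"no response within {timeout_s}s",
    }
    try:
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return {"status": "success", "http_status": response.status, "reason": None}
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return {"status": "failure", "http_status": exc.code, "reason": f"HTTP {exc.code}"}
    except urllib.error.URLError as exc:
        # DNS errors, refused connections, TLS problems; connect timeouts
        # arrive wrapped in URLError as well.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return timeout_result
        return {"status": "failure", "http_status": None, "reason": str(exc.reason)}
    except (socket.timeout, TimeoutError):
        # Timeouts while reading the response are raised directly.
        return timeout_result
    except (ValueError, OSError, http.client.HTTPException) as exc:
        # Malformed URLs and remaining low-level socket or protocol errors.
        return {"status": "failure", "http_status": None, "reason": str(exc)}
```

The order of the `except` clauses matters: `HTTPError` is a subclass of `URLError`, which in turn is a subclass of `OSError`, so the most specific handlers have to come first.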

Once the agent responds, there's a short format validation step, and the message gets posted into the Slack thread.

Handling workflow errors

When you build a system that handles real user requests, reliability is critical. If a workflow falls over for any reason — LLM quota exhausted, MCP server unreachable, invalid JSON — nobody in chat will know, and the request just hangs there.

This is especially relevant in the first weeks after launch, when you're constantly tweaking things.

The solution is simple: a dedicated workflow specified in the main workflows' settings as the Error Workflow.

What it does:

Pulls information about the failed execution from the n8n API (you'll need to generate an API key for this).
Through the Extract Thread Context node, determines channel_id and thread_ts of the original message.
Posts a short error message directly into the request's thread, so the author isn't left in the dark. A more verbose error report also goes into the DevOps team's internal channel — this lets us react quickly to regressions.

What we got

After a couple of months running in production, the picture looks like this:

Average response time — up to 3 minutes from the moment a message appears in the channel.
~25% of requests are fully closed without an engineer.
~40% of requests are resolved faster than usual — the agent does a preliminary diagnostic, and the on-call gets ready-made context.
Cost at our volume (several dozen requests per week) — up to $250/month on LLM usage.

[Image: example of an issue created from a Slack request]

[Image: example of a CI request resolved without an engineer]

[Image: example of investigating a build crash based on a screenshot]

Compared to an engineer's hourly rate, this looks like a very cost-effective team addition — especially given the agent works around the clock and doesn't pull the on-call away from their main work.

What's still rough

To avoid leaving the impression that everything is smooth, here are the rough edges we still live with:

The agent sometimes gets lost in long threads with dozens of messages — we have to limit context depth in the system prompt explicitly.
The Incident category has the lowest autonomous resolution rate so far — too many non-standard situations. We're working on expanding the MCP toolset.
It's hard to objectively evaluate the quality of answers to "infrastructure questions" — we need a feedback mechanism from engineers (planning Slack reactions as the simplest signal).

Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant with RAG search over our docs, and the handler for modification tasks with automatic ticket creation.

How do you offload first-line support on your team? Are you using off-the-shelf products (like PagerDuty AIOps) or building your own? Which request categories automate best in your environment — share in the comments, I'm curious to compare the distribution.

Repository with workflows and system prompts: https://github.com/javdet/automagicops-workflows

MCP servers for the entire team: from local launch to centralized access

Sergey Byvshev — Wed, 15 Apr 2026 06:50:14 +0000

When you have six MCP servers and ten colleagues, "just run npx locally" stops working. Not everyone wants to install Node.js, managers don't have Docker, and your local claude_desktop_config.json starts looking like a secrets vault for every production system.

I went from remote MCP → local setup → Docker → Kubernetes with a universal Helm chart and JWT auth via Envoy. Here's what I hit along the way, what worked, and what's still unsolved.

Level 1: Remote MCP — When the Vendor Did the Work

My first MCP experience was dead simple. I added the Atlassian MCP server to Claude as a remote MCP, authenticated, and it just worked:

{
  "mcpServers": {
    "atlassian": {
      "type": "http",
      "url": "https://mcp.atlassian.com/v1/sse"
    }
  }
}

The problem? Very few SaaS products offer this. Everything self-hosted or without native MCP support is a different story.

Level 2: Local Setup — The Dependency Zoo

Next, I wanted to connect my IDE to Kubernetes. No built-in MCP support here, so dependencies it is:

{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "kubernetes-mcp-server@latest"]
    }
  }
}

It worked, but one server needs Node.js, another needs Python and uvx, a third needs a Go binary. The runtime zoo on your machine grows with every new MCP server. Not great when you're not even a developer.

Level 3: Docker — Isolation Without the Mess

The logical next step — containers. Each MCP server with its own runtime, no host pollution:

{
  "mcpServers": {
    "grafana": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "GRAFANA_URL",
        "-e", "GRAFANA_SERVICE_ACCOUNT_TOKEN",
        "grafana/mcp-grafana",
        "-t", "stdio"
      ],
      "env": {
        "GRAFANA_URL": "https://grafana.example.com",
        "GRAFANA_SERVICE_ACCOUNT_TOKEN": "<token>"
      }
    }
  }
}

For one engineer on one machine — enough. But when ten people need access, questions pile up:

Production tokens are scattered across laptops.
Automated workflows (n8n, CI/CD) need MCP access too — and they run remotely.
Managers and analysts want AI tools but aren't ready to deal with docker run.

One conclusion: MCP servers need to move into shared infrastructure.

Level 4: Kubernetes — Centralized Deployment

The initial idea was straightforward: deploy remote MCP servers inside your infrastructure perimeter. At minimum, you can restrict access via corporate VPN.

Anyone who's tackled this has hit the same wall: most MCP servers communicate via stdio (stdin/stdout). You can't reach them over HTTP directly.

This is where MCP Gateway comes in — a proxy that translates Streamable HTTP to stdio and back.

The flow: client (Claude Desktop, IDE, n8n) → HTTPS → Ingress → Kubernetes Service → Pod with MCP Gateway sidecar (HTTP → stdin) → MCP server process.
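
To make the "HTTP → stdin" hop concrete, here is a minimal Python sketch of the stdio side only. It is not how MCP Gateway is implemented (the article does not show that); it only illustrates that the stdio transport is newline-delimited JSON-RPC written to a child process and read back from its stdout. `FAKE_SERVER` is a stand-in process invented for the example, and a real MCP server would expect an `initialize` handshake first.

```python
import json
import subprocess
import sys


def call_stdio_server(command: list[str], request: dict) -> dict:
    """Send one JSON-RPC request to a stdio process and return its one-line reply."""
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        # One JSON message per line, terminated by a newline.
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())


# Stand-in "server" (hypothetical): reads one request and echoes its id back.
FAKE_SERVER = [
    sys.executable,
    "-c",
    "import sys, json\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': {'ok': True}}), flush=True)\n",
]

if __name__ == "__main__":
    print(call_stdio_server(FAKE_SERVER, {"jsonrpc": "2.0", "id": 1, "method": "ping"}))
```

A real gateway keeps the child process alive across requests and maps HTTP sessions onto it; this sketch spawns one process per call purely to stay short.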

Universal Helm Chart

To avoid writing manifests for every MCP server, I built a universal Helm chart: mcp-helm-chart on ArtifactHub.

What it supports:

mode: proxy — runs MCP Gateway as a sidecar, translating HTTP ↔ stdio
mode: native — for servers that already support HTTP (no sidecar needed)
Vault and ExternalSecrets integration for secrets management
Gateway API and classic Ingress support
HPA for horizontal scaling

Installation with Ingress-nginx (no auth):

helm repo add mcp https://javdet.github.io/mcp-helm-chart
helm install my-mcp mcp/mcp -f values.yaml

Key sections of values.yaml for deploying DigitalOcean MCP:

mode: proxy

proxy:
  image:
    repository: node
    tag: "20-bookworm"
    pullPolicy: IfNotPresent
  gateway:
    package: "@michlyn/mcpgateway"
    stdioCommand: "npx -y @digitalocean/mcp --services apps,droplets,doks,networking"
    outputTransport: streamable-http
    port: 8080
    httpPath: /mcp

# Token stored in HashiCorp Vault, injected via Vault Webhook
vault:
  enabled: true
  role: "mcp"
  path: "kubernetes_dev-fra1-01"

env:
  - name: DIGITALOCEAN_API_TOKEN
    value: vault:devops/data/ai/mcp/digitalocean#token

ingress:
  enabled: true
  className: "internal"
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  hosts:
    - host: aitool.example.com
      paths:
        - path: /digitalocean(/|$)(.*)
          pathType: ImplementationSpecific
  tls:
    - secretName: ssl-certificate
      hosts:
        - aitool.example.com

MCP servers in Streamable HTTP mode are stateless. They scale horizontally with a standard HPA without any issues.

The most pressing question here is authentication — or better yet, authorization. Most MCP servers don't support incoming authentication, so you have to handle it yourself.

Authentication: JWT via Envoy

Basic auth is barely better than nothing, so — straight to JWT. I used Envoy API Gateway since it natively supports JWT validation and was already in our stack.

Key and Token Generation

# 1. Generate RSA keys
openssl genrsa -out mcp-jwt-private.pem 4096
openssl rsa -in mcp-jwt-private.pem -pubout -out mcp-jwt-public.pem

# 2. Generate Key ID (the same value must appear as "kid" in the JWKS)
KID=$(openssl rand -hex 16)

# Helper: base64url-encode stdin without padding (GNU coreutils base64)
b64url() { base64 -w0 | tr '+/' '-_' | tr -d '='; }

# 3. Build JWT header (base64url)
HEADER=$(printf '%s' "{\"alg\":\"RS256\",\"typ\":\"JWT\",\"kid\":\"${KID}\"}" | b64url)

# 4. Build JWT payload (1 year expiry).
#    "iss" and "aud" must match the issuer and audiences configured in Envoy below.
NOW=$(date +%s)
PAYLOAD=$(printf '%s' "{\"sub\":\"claude-desktop\",\"aud\":\"mcptools.example.com\",\"iss\":\"mcp-issuer\",\"iat\":${NOW},\"exp\":$(( NOW + 31536000 ))}" | b64url)

# 5. Sign header.payload with the private key (RS256 = RSA + SHA-256)
SIGNATURE=$(printf '%s' "${HEADER}.${PAYLOAD}" \
  | openssl dgst -sha256 -sign mcp-jwt-private.pem | b64url)

# 6. Final token
echo "${HEADER}.${PAYLOAD}.${SIGNATURE}"

The public key is packaged into JWKS and stored in a ConfigMap. Envoy validates every incoming request by checking issuer, audience, and signature.
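
Packaging the public key into JWKS might look like the sketch below. It assumes you already have the RSA modulus and public exponent as integers (for example, parsed from the output of `openssl rsa -pubin -in mcp-jwt-public.pem -noout -modulus`); the function names are illustrative and not part of the chart.

```python
import base64
import json


def b64url_uint(value: int) -> str:
    """Base64url-encode a non-negative integer as big-endian bytes, without padding."""
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_jwks(modulus: int, exponent: int, kid: str) -> str:
    """Return a JWKS document holding one RSA public key for RS256 verification."""
    jwk = {
        "kty": "RSA",
        "use": "sig",
        "alg": "RS256",
        "kid": kid,  # must equal the "kid" in the JWT header
        "n": b64url_uint(modulus),
        "e": b64url_uint(exponent),
    }
    return json.dumps({"keys": [jwk]}, indent=2)
```

The resulting JSON string is what would go into the ConfigMap that the JWT provider references.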

Auth configuration in the chart values (Gateway API variant):

gatewayApi:
  enabled: true
  parentRefs:
    - name: internal
      namespace: ai-infra
      sectionName: https
  hostnames:
    - mcptools.example.com
  timeouts:
    request: "3600s"
    backendRequest: "3600s"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /digitalocean
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
  auth:
    type: jwt
    jwt:
      providers:
        - name: mcp-jwt-auth
          issuer: mcp-issuer
          audiences:
            - mcptools.example.com
          localJWKS:
            type: ValueRef
            valueRef:
              group: ""
              kind: ConfigMap
              name: jwks-config

# If you use External Secrets Operator, secrets can be fetched through it
externalSecrets:
  enabled: true
  refreshInterval: 1h
  secretStoreRef:
    name: aws
    kind: ClusterSecretStore
  target:
    creationPolicy: Owner
  dataFrom:
    - extract:
        key: infra/mcp/digitalocean

Currently, access to target systems (DigitalOcean, Grafana, Kubernetes) goes through a single service account. For read-only tasks (monitoring, diagnostics, fetching info) this is enough. For write operations, the question remains open.

Automated Access

Periodic tasks (n8n workflows, CI/CD pipelines) connect to the same MCP servers over Streamable HTTP with separate service JWT tokens. The setup is identical — only the subject in the token payload differs, and optionally the access scope at the Gateway level.
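
When issuing several service tokens, it helps to sanity-check their claims before handing them out. The Python sketch below builds a payload per subject and checks issuer, audience, and expiry. It deliberately does not verify the signature (Envoy does that), and the helper names and the `n8n-workflows` subject are assumptions made for the example.

```python
import base64
import json


def encode_segment(data: dict) -> str:
    """Base64url-encode a JSON object the way JWT header/payload segments are encoded."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment, restoring the stripped padding first."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def service_payload(subject: str, issuer: str, audience: str, ttl_seconds: int, now: int) -> dict:
    """Payload for one service token; only `subject` differs between consumers."""
    return {"sub": subject, "iss": issuer, "aud": audience, "iat": now, "exp": now + ttl_seconds}


def claim_problems(token: str, issuer: str, audience: str, now: float) -> list[str]:
    """List claim problems in a token. Does NOT verify the signature."""
    _header, payload_b64, _signature = token.split(".")
    payload = decode_segment(payload_b64)
    problems = []
    if payload.get("iss") != issuer:
        problems.append("issuer mismatch")
    aud = payload.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        problems.append("audience mismatch")
    if payload.get("exp", 0) <= now:
        problems.append("expired")
    return problems
```

An empty list from `claim_problems` only means the claims line up with the gateway configuration; a token with a bad signature would still be rejected by Envoy.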

What's Working, What's Not

MCP tooling and infrastructure still have a few steps to take toward each other before usage becomes truly simple, reliable, and secure.

The current setup works: six MCP servers in Kubernetes, one Helm chart, JWT auth via Envoy, secrets in Vault. Colleagues connect to remote MCP servers with zero local dependencies, automation uses the same endpoints.

What's still missing:

Per-user authorization. The MCP protocol doesn't support passing user context. We're living with service accounts for now.
Audit logging. Who called which tool with what parameters — not logged at the MCP level. You can collect this at the Envoy layer, but without call context.
Auth standard. Every vendor does it differently. OAuth, API Key, Bearer — no unified approach.

The Helm chart is open source: ArtifactHub.

How do you handle per-user authorization for MCP? We're still on a single service account — would love to hear from anyone who's moved past that.

AI Is for DevOps: How a Neural Network Debugs Failed Pipelines

Sergey Byvshev — Wed, 01 Apr 2026 14:38:51 +0000

How often does someone rush to you wide-eyed, begging for help with a broken pipeline? Or you find yourself staring at a red status in Slack on a Friday evening, knowing the next 15–20 minutes will be spent on routine work: open the log, find the error line, compare with the last commit, check dependencies…

The work is straightforward. And that's exactly why it's boring — a perfect candidate for automation.

Fortunately, neural networks can now handle this for us and provide solid advice (not all of them, but some definitely can).

What to Think Through Beforehand

Before writing any code, it's worth answering four questions. They'll define the architecture of the entire solution:

What events, and in what format, should be provided to the agent?
What sources and data might be needed?
How should this data be analyzed to identify the root cause?
What should the diagnostic output look like in form and content?

What events trigger the analysis? In our case, a job that finished unsuccessfully in CI/CD. To start diagnostics, it's enough to pass the agent the last 50 lines of the build log and the pipeline file contents.

What data sources will be needed? The main ones are the version control system (repository access), CI/CD (full log, related jobs), an endpoint availability checker, and CI agent resource consumption metrics.

How to analyze the data? This is the most interesting part, because there are many scenarios depending on the job type:

Build jobs — dependency issues (missing, incorrectly specified, unavailable), code errors, insufficient build resources.
Test jobs — code errors, incorrectly written tests.
Deploy jobs — manifest errors, issues on the target platform side.
Common problems — errors in the pipeline/workflow itself, missing utilities on the agent, agent initialization issues.

What should the report format be? Most often, such a report is read by human eyes in a chat, so it should be written in plain language. Concise, facts only: what was found, most probable causes, specific steps to fix. A convenient place for such a report is a thread under the corresponding error message or a dedicated channel.

Solution Architecture

At a high level, the flow works like this: an event arrives about a job completing unsuccessfully. Then we request data for analysis: build logs, pipeline description. Based on this data and its system prompt, the AI agent performs the failure analysis. During the process, the assistant can independently check the repository, see what changed, and so on. In case of external endpoint unavailability errors, it can verify this. On a failed deploy, it can check application logs and metrics. As a result, several most probable failure causes and remediation steps are generated. This data is then sent to the team chat.

Time to Implement

Some implementation aspects are covered in more detail in a separate article. The stack used here:

n8n: you can quickly launch n8n with the MCP update using docker-compose (see below)
GitLab
Grafana
Loki
Prometheus
Slack

Setting Up Incoming Events

The first step is to create a webhook in n8n. GitLab uses the X-Gitlab-Token header for authentication, so in n8n we select Header Auth and specify the corresponding credential.

In GitLab, we configure webhook delivery. This can be done for an individual repository or for an entire group. We specify the webhook address and the secret token, and from the event types we select Pipeline events.

Then, using the If node, we filter out all non-failed events — we don't need them.
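
The condition behind that If node can be sketched in Python as follows. The field names (`object_kind`, `object_attributes.status`, `builds`) follow GitLab's pipeline webhook payload; the function names are made up for the example, and the workflow itself expresses this as an n8n condition rather than code.

```python
def is_failed_pipeline(event: dict) -> bool:
    """True only for pipeline events whose overall status is 'failed'."""
    return (
        event.get("object_kind") == "pipeline"
        and event.get("object_attributes", {}).get("status") == "failed"
    )


def failed_jobs(event: dict) -> list[dict]:
    """Pick the failed jobs out of the event's `builds` list."""
    return [
        {"id": build["id"], "name": build["name"], "stage": build["stage"]}
        for build in event.get("builds", [])
        if build.get("status") == "failed"
    ]
```

Filtering this early keeps successful and still-running pipelines from ever reaching the LLM, so they cost nothing.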

Data Collection

As soon as we receive a job failure event, we request details from GitLab. For this, you'll need to create a GitLab access token (I recommend read-only) and the corresponding credential in n8n.
Then we merge all the collected data using the Merge node.
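
For reference, fetching the tail of a failed job's log outside n8n could look like this Python sketch. It uses GitLab's job trace endpoint (`GET /projects/:id/jobs/:job_id/trace`) with a `PRIVATE-TOKEN` header; the base URL, IDs, and function names are placeholders, and the workflow itself does this with n8n nodes.

```python
import urllib.request


def job_trace_url(base_url: str, project_id: int, job_id: int) -> str:
    """Build the GitLab API URL that returns a job's raw log."""
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/jobs/{job_id}/trace"


def tail_lines(text: str, count: int = 50) -> str:
    """Keep only the last `count` lines of a log."""
    return "\n".join(text.splitlines()[-count:])


def fetch_job_log_tail(base_url: str, project_id: int, job_id: int, token: str, count: int = 50) -> str:
    """Download the job log and return its tail (requires network access)."""
    request = urllib.request.Request(
        job_trace_url(base_url, project_id, job_id),
        headers={"PRIVATE-TOKEN": token},  # read-only token is enough
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return tail_lines(response.read().decode("utf-8", errors="replace"), count)
```

Trimming to the last 50 lines before the data reaches the agent keeps token usage low; the agent can still pull the full log through its tools when it needs more.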

Error Analysis

Data arrives at the agent in the following format:

[
  { "job_log": "Last 50 lines of failed job" },
  { "data": "Content of .gitlab-ci.yml" },
  { "pipeline": {} },
  { "failed_job": {} }
]

This format should be communicated to the agent in advance via the system prompt. There we also describe the available tools, the investigation strategy (based on the considerations outlined above), and the desired output report format.
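
If you would rather hand the agent one labeled text block than raw items, a small reshaping step is enough. The Python sketch below is one possible way to do it, assuming the four keys shown above; the section titles and the function name are arbitrary choices, not something the workflow prescribes.

```python
import json


def build_user_prompt(items: list[dict]) -> str:
    """Merge the collected items into one prompt with a labeled section per input."""
    merged: dict = {}
    for item in items:
        merged.update(item)
    sections = [
        "## Failed job log (last lines)\n" + merged.get("job_log", ""),
        "## .gitlab-ci.yml\n" + merged.get("data", ""),
        "## Pipeline\n" + json.dumps(merged.get("pipeline", {}), indent=2),
        "## Failed job\n" + json.dumps(merged.get("failed_job", {}), indent=2),
    ]
    return "\n\n".join(sections)
```

Whichever shape you choose, the system prompt has to describe exactly that shape, otherwise the agent has to guess which blob is the log and which is the pipeline file.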

It's best to use the latest model versions, as they handle MCP tool use significantly better. We don't connect memory here, since each build failure is an independent event for the agent.

MCP Tools

The agent has access to three tools:

Gitlab MCP — for retrieving additional information about the failed job, code changes, etc.
Grafana MCP — for retrieving CI agent metrics, as well as failed deploy logs.
HTTP Request — n8n's built-in tool for checking endpoint availability.

Important note: make sure your MCP servers are running in remote mode. If an MCP server doesn't support remote mode out of the box, you can solve this with mcpgateway, which proxies HTTP to stdio. For the transport method, Streamable HTTP is the best choice.

Posting to Chat

The final step is sending the generated report to Slack. The report goes to the selected channel or thread.

Testing and Real-World Examples

The final workflow looks like this.

Example 1: Failed Build

Gradle can't resolve a dependency. The agent determines that this is a dependency resolution issue, not a compilation error. It provides specific causes: the artifact isn't published in the repository, or credentials are unavailable inside the Docker build context. For each cause — concrete steps to fix.

Example 2: Infrastructure Change Errors

Terraform plan fails with Unsupported argument errors. The agent recognizes that the HCL configuration contains attributes not supported by the current DigitalOcean provider schema. It provides three probable causes — from the wrong resource type to provider version mismatch — with specific remediation steps for each.

Conclusion

We've built an assistant that performs full error analysis in approximately 30 seconds. This allows the team to respond to failed jobs significantly faster and spend their time on real engineering tasks rather than routine log analysis.

Token consumption stays at the level of a few thousand per analysis.

The base workflow version is here.
The full tutorial with all scripts is available here.

AI Alert Assistant: How n8n + LLM Replace Routine Diagnostics

Sergey Byvshev — Tue, 24 Mar 2026 06:35:05 +0000

Anyone who has dealt with keeping services running knows how exhausting and unpredictably time-consuming incident diagnostics and resolution can be.

Over the years, I've watched the evolution of incident response processes — from "whoever spots the problem first owns it" to strictly defined 24/7 on-call rotations, SLA-driven response times, runbook adherence, and separation of responsibility across platforms.

One thing has remained constant:

Gathering data from multiple sources: metrics, logs, traces, and release and maintenance timelines
Analysis based on personal knowledge and experience
Formulating possible solutions

If you have a documented procedure for every situation, that simplifies things somewhat — but it doesn't teach the investigative mindset needed for real troubleshooting.

Writing and maintaining a runbook for every alert is tedious work, which is exactly why an experienced engineer will always outperform a library of hundreds of runbooks.

But what if an engineer's function could be performed even when no engineer is physically present?

Designing the Assistant: What to Define Upfront

Before writing any code, four questions need to be answered:

What events, and in what format, should be provided to the agent?
What data sources might it need?
How should it manipulate that data to identify the root cause?
What should the diagnostic report look like in terms of form and content?

Let's break down each one. A link to the workflow itself can be found below.

Events and Format

Typically, what's sufficient to kick off diagnostics is an event containing:

Alertname
Description
Labels: job_name, namespace, pod, env, region
Grafana Dashboard
Runbook URL

Data Sources

Most frequently, we turn to:

Metrics that have breached acceptable thresholds
Resource consumption and load metrics
Error logs
The platform — Kubernetes or a standalone server
Related CI/CD releases
Alert definitions and firing conditions

Data Analysis

There is arguably no canonical sequence of steps for analysis. The diagnostic process is inherently variable — which is why no one has yet managed to write a single script that covers every possible scenario. But we'll give it a shot.

First, let's consider how we ourselves approach incident diagnosis:

Examine what's happening with the metric that triggered the alert: determine the nature of the anomaly — a spike, monotonic growth, or a persistently critical value
Determine whether this is a software-level failure or caused by issues at a lower layer
Check infrastructure metrics: resources, networking, system limits
Inspect logs at the point where the problem is occurring
Determine how recently the affected components were updated and what changed
Attempt to interact with the components directly — through the orchestrator or a Linux shell

Report Format

In most cases, this kind of report is meant to be read by humans, so it should be written in plain, natural language. Concise — just the discovered facts, a list of hypotheses, and possible remediation steps. The most convenient place for such a report is a thread under the corresponding alert in the team chat.

Solution Architecture

Here's the desired flow: when an alert fires, the event is sent to a webhook that extracts the relevant data and assembles a clear, well-structured prompt for the AI agent.

The AI agent, guided by its system prompt and the available MCP tools, performs diagnostics and generates a report in a predefined format.

The report is then posted to the team chat.

Implementation

If you're in a hurry, you can view the finished workflow below.

As the execution environment for workflows like this, I chose n8n because it:

Lets you build easily readable automations fairly quickly
Makes it simple to share your work
Separates logic from secrets and other hardcoded values
Has a free self-hosted version
Has an enormous community

Personally, it reminds me of Jenkins about ten years ago, and Jenkins was great.

You can install n8n using any of the methods described in the documentation, for example with docker-compose.

From here, the implementation will depend on the systems you use. In my case:

Preprocessing Incoming Events

Alertmanager can send alerts to a custom webhook. In n8n, all you need to do is create a Webhook trigger node; you can also specify authentication parameters there.

Add the created webhook's details to a new receiver named n8n in Alertmanager.
After this, Alertmanager can send alerts to our workflow. However, the received messages contain unnecessary data, and the format is not well suited to the task. This makes it harder for the LLM to understand what's being asked of it and increases token consumption. Therefore, we'll make a small modification using a Code node.
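
The kind of trimming that Code node performs can be sketched in Python as below. The input fields (`alerts`, `status`, `labels`, `annotations`, `startsAt`) follow Alertmanager's webhook payload; the kept labels come from the event list above, while the `dashboard_url` and `runbook_url` annotation names are assumptions about how your alert rules are annotated.

```python
# Labels worth passing to the agent; everything else is dropped to save tokens.
KEEP_LABELS = ("job_name", "namespace", "pod", "env", "region")


def compact_alert(alert: dict) -> dict:
    """Reduce one Alertmanager alert to the fields the agent actually needs."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        "alertname": labels.get("alertname"),
        "status": alert.get("status"),
        "description": annotations.get("description") or annotations.get("summary"),
        "labels": {name: labels[name] for name in KEEP_LABELS if name in labels},
        "started_at": alert.get("startsAt"),
        "dashboard": annotations.get("dashboard_url"),  # assumed annotation name
        "runbook": annotations.get("runbook_url"),      # assumed annotation name
    }


def compact_payload(payload: dict) -> list[dict]:
    """Keep only firing alerts from a webhook payload, each in compact form."""
    return [
        compact_alert(alert)
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
    ]
```

Dropping resolved alerts here is a design choice for this sketch: the assistant diagnoses active problems, so a "resolved" notification has nothing for it to investigate.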

You'll likely want to store certain values as variables — for instance, the UID of your Prometheus datasource in Grafana.

AI Agent

A prerequisite for the AI Agent node to operate is a connected LLM. Almost any neural network can be connected, but in my experience, Codex and Opus perform best.

We don't use Memory here, since each alert is an independent event unrelated to others.
One of the key aspects is writing the system prompt. What should it include?

Agent purpose — what it's supposed to do
Brief description of your infrastructure and the type of service you provide
Description of each MCP tool — e.g., use the Kubernetes MCP to get pod status, related events, etc.
Important rules to follow and pitfalls to avoid — e.g., never ask questions, write the response in a specific language, never make any changes to the infrastructure
Diagnostic guidelines — essentially what we discussed in the Data Analysis section above

MCP — The Agent's Eyes and Ears

MCP tools serve as the agent's eyes and ears, giving it the ability to interact with the subject of diagnosis. The specific list may vary depending on your infrastructure, but the core categories of data sources (which we outlined earlier) remain the same. In my case, the list looks like this:

Metrics — mcp-grafana
Logs — mcp-grafana
Platform — kubernetes-mcp, digitalocean-mcp
CI/CD releases — gitlab-mcp
Alert descriptions — vector store

When running your MCP servers, make sure they run in remote Streamable HTTP mode.

Vector Store

The knowledge base deserves separate attention. It allows you to store large volumes of information and perform fast lookups. This saves tokens and reduces the time spent on external system queries. I use Qdrant as this knowledge base. I strongly recommend setting a service API token for authentication.

Next, you need to create a collection where your knowledge will be stored. You can do this through the web interface at http://<qdrant-host>:6333/dashboard.

Create a QdrantApi credential in n8n and use it to connect.

Once the database is connected to the agent as a tool, it's time to load it with knowledge. I use a separate workflow for this.

Simply run this workflow and upload your knowledge file(s) through the form that appears — they'll be saved to the database.
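
The loading workflow is not shown here, but before text is embedded and stored it is normally split into chunks. The Python sketch below shows a plain character-based splitter with overlap; the chunk size and overlap values are arbitrary assumptions, and n8n's own text splitter nodes may behave differently.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of up to `chunk_size` characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.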

Posting to Chat

After the AI agent completes its work, we need to send the results to the chat where engineers will see them. The delivery chain consists of three nodes:

Search for recent messages in the alerts channel. Unfortunately, not all group chats support keyword search via API, so the last 10 messages are retrieved instead.
Find the message that corresponds to our alert.
Post the diagnostic results as a thread reply to that message.
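
The matching step in the middle of that chain can be sketched in Python like this. It assumes the alert message carries the alert name and label values in its plain `text` field (alerts rendered only as attachments or blocks would need a different lookup), and that messages arrive newest first, as Slack's `conversations.history` returns them. The function names are illustrative.

```python
from typing import Optional


def find_alert_message_ts(messages: list[dict], alertname: str, labels: dict) -> Optional[str]:
    """Return the `ts` of the newest message mentioning the alert name and all label values."""
    needles = [alertname, *[str(value) for value in labels.values()]]
    for message in messages:  # newest first
        text = message.get("text", "")
        if all(needle in text for needle in needles):
            return message.get("ts")
    return None


def thread_reply(channel: str, thread_ts: str, report: str) -> dict:
    """Arguments for a threaded reply (same shape as Slack's chat.postMessage)."""
    return {"channel": channel, "thread_ts": thread_ts, "text": report}
```

If no message matches within the last 10, a reasonable fallback is to post the report as a new top-level message rather than drop it.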

For Slack integration, you'll need to set up authentication following the official Slack API documentation.

Testing and Examples

Here's what the final workflow looks like.

I've tested minor variations of this workflow across several projects, and here are the results.

On average, analyzing an alert takes 30 seconds. In that time, the agent manages to inspect metrics, review logs, assess the state of the K8s cluster, and deliver a verdict.

Conclusion

What we end up with is an assistant that gets to work the instant an alert fires. The analysis time is minimal, which guarantees that by the time an engineer sees the alert, the initial diagnostics will have already been completed.

This is just one of the directions where AI can meaningfully simplify life for infrastructure teams — and for development teams who are forced to handle their own support. The agent doesn't replace the engineer, but it takes on the first-response diagnostics and shortens the gap between "alert fired" and "we understand what's going on." And at night — when the on-call engineer is asleep — that can be invaluable.

The base version of the workflow is here: https://github.com/javdet/automagicops-workflows/tree/main/workflows/AlertAssistant
Want to quickly implement a similar flow for yourself? Read the full Patreon guide with detailed examples and practical tips.
Author of the article: https://linkedin.com/in/sergeybyvshev