<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergey Byvshev</title>
    <description>The latest articles on DEV Community by Sergey Byvshev (@javdet).</description>
    <link>https://dev.to/javdet</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830322%2Fb6d584b2-9826-428d-a744-0a9dd730ad89.jpeg</url>
      <title>DEV Community: Sergey Byvshev</title>
      <link>https://dev.to/javdet</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/javdet"/>
    <language>en</language>
    <item>
      <title>Building an AI First-Line for DevOps Support on n8n for $250/Month - Part 2</title>
      <dc:creator>Sergey Byvshev</dc:creator>
      <pubDate>Tue, 09 Jun 2026 06:30:35 +0000</pubDate>
      <link>https://dev.to/javdet/building-an-ai-first-line-for-devops-support-on-n8n-for-250month-part-2-5g9b</link>
      <guid>https://dev.to/javdet/building-an-ai-first-line-for-devops-support-on-n8n-for-250month-part-2-5g9b</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/javdet/building-an-ai-first-line-for-devops-support-on-n8n-for-250month-part-1-1m8j"&gt;Part 1&lt;/a&gt;, I walked through how chat requests in Slack land in n8n, get classified by category, and are routed into the right processing branch. Under the hood of the classifier sits an LLM with access to Slack over MCP — it reads the thread context and decides what the new request is about. Around it run a few helper sub-workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;attachmentsAnalyzer&lt;/code&gt; — parses screenshots and text logs;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;httpProbeTool&lt;/code&gt; — performs endpoint availability checks correctly, without taking the agent chain down with it;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;errorReporter&lt;/code&gt; — covers us when the workflow itself fails so the requester isn't left in the dark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the CI/CD assistant as an example, I showed how those building blocks compose into a processing branch. Today I cover the three remaining branches. Each handles a different class of requests, but architecturally they all follow the same template, so adding a new branch takes a couple of hours instead of a design from scratch.&lt;/p&gt;

&lt;p&gt;In this part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Incident investigator&lt;/strong&gt; — the most temperamental category, with the lowest rate of fully autonomous resolution and the most interesting edge cases;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task manager&lt;/strong&gt; — handling infrastructure modification requests with automatic ticket creation in Jira;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure knowledge assistant&lt;/strong&gt; — answers to "where is X configured?" with a small trick involving auto-generated READMEs in IaC repos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All workflows and system prompts are published separately — link at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Incident Investigator
&lt;/h2&gt;

&lt;p&gt;This is probably the most "temperamental" branch of them all. The &lt;code&gt;incident&lt;/code&gt; category catches pretty much anything that means "something just broke": a hung Postgres, 502s on the ingress controller, sudden OOMs in some consumer, mysterious "all my requests are slow but my neighbor's are fine." The spectrum is huge, there's no universal recipe — so the rate of fully autonomous resolution here is the lowest across all branches.&lt;/p&gt;

&lt;p&gt;But even when the automation doesn't close the problem end-to-end, the dossier it puts together saves the on-call engineer 10–15 minutes at the start: firing alerts are already pulled, metrics and logs are already checked, hypotheses are already framed. When you get pulled into an incident on a Saturday morning, those 10 minutes are sometimes the difference between "I had time to wake up" and "I'm already answering in chat with coffee in one hand."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bb2e9nby0ysfksxk3g3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bb2e9nby0ysfksxk3g3.png" alt="Incident" width="799" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Input data
&lt;/h3&gt;

&lt;p&gt;The sub-workflow expects the same input structure as the CI/CD assistant from Part 1. To save you from jumping between articles — a short description of the fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Chat request text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"post_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Message ID — needed to reply into that exact thread"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channel_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slack channel ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channel_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Channel name, passed into the prompt for context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Author of the request, mentioned in the final reply"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"file_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Attachment IDs, if any"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Short summary from the classifier"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_thread"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"thread_root_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ID of the thread's root message"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"on_call_user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"On-call engineer name — comes in handy for escalation"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, &lt;code&gt;attachmentsAnalyzer2&lt;/code&gt; runs — the same sub-workflow familiar from Part 1 that parses screenshots and logs. Triggering it only requires passing &lt;code&gt;file_ids&lt;/code&gt;. If there are no attachments, the empty-attachments branch goes through a Merge node, and the pipeline doesn't crash on missing data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collecting variables in SetVars
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;code&gt;SetVars&lt;/code&gt; node assembles everything that'll be needed both in the agent's system prompt and when posting the reply back to Slack. This part is worth pausing on, because these variables effectively parameterize the system prompt without forcing you to edit it by hand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;K8S_CLUSTERS&lt;/code&gt; — list of available contexts (we have two: dev and prod, both in DigitalOcean Frankfurt);&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;K8S_NAMESPACE&lt;/code&gt; — the main namespace with production workload;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GITHUB_ORG&lt;/code&gt; — GitHub organization name, so the agent doesn't try to search code across the entire internet;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prometheus_uid&lt;/code&gt; and &lt;code&gt;loki_uid&lt;/code&gt; — Grafana datasource UIDs; without them, the agent has no idea where to knock for metrics and logs;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reply_root_id&lt;/code&gt; — ID of the message the agent will reply into (either the thread's root post or the original request post itself).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These values then get substituted into the system prompt via &lt;code&gt;{{ $('SetVars').first().json.* }}&lt;/code&gt;. When a new cluster comes up or the namespace changes, you just edit values in a single node — instead of crawling through the big block of prompt text.&lt;/p&gt;
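
&lt;p&gt;Outside n8n, the same idea can be sketched in a few lines of Python. This is only an illustration of "one node holds the values, the prompt holds the placeholders": the variable values, the prompt wording, and the helper names &lt;code&gt;render_system_prompt&lt;/code&gt; and &lt;code&gt;pick_reply_root_id&lt;/code&gt; are assumptions made for the example, not the contents of the real workflow.&lt;/p&gt;

```python
# Hypothetical stand-in for the SetVars node: a single place that holds
# every environment-specific value the system prompt needs.
SET_VARS = {
    "K8S_CLUSTERS": "dev, prod",
    "K8S_NAMESPACE": "production",  # assumed value
    "GITHUB_ORG": "example-org",    # assumed value
    "prometheus_uid": "prom-uid",   # assumed value
    "loki_uid": "loki-uid",         # assumed value
}

# Assumed prompt fragment; the real system prompt is much longer.
SYSTEM_PROMPT_TEMPLATE = (
    "Clusters available: {K8S_CLUSTERS}. Main namespace: {K8S_NAMESPACE}. "
    "Search code only in the {GITHUB_ORG} GitHub organization. "
    "Prometheus datasource UID: {prometheus_uid}; Loki datasource UID: {loki_uid}."
)


def pick_reply_root_id(item):
    """Reply into the thread root if the request came from a thread, else into the post itself."""
    if item.get("is_thread") and item.get("thread_root_id"):
        return item["thread_root_id"]
    return item["post_id"]


def render_system_prompt(template, variables):
    """Fill the placeholders; a missing variable raises KeyError instead of leaving a hole."""
    return template.format(**variables)
```

&lt;p&gt;Failing loudly on a missing variable is a deliberate choice in this sketch: a prompt with an unfilled placeholder is harder to notice than an execution error.&lt;/p&gt;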

&lt;h3&gt;
  
  
  Querying the agent
&lt;/h3&gt;

&lt;p&gt;The user prompt is built with the same template as the CI/CD assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight handlebars"&gt;&lt;code&gt;Investigate incident from &lt;span class="k"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;user_name&lt;/span&gt; &lt;span class="k"&gt;}}&lt;/span&gt;
in channel &lt;span class="k"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;channel_name&lt;/span&gt; &lt;span class="k"&gt;}}{{&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;is_thread&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;' (message in thread, thread_ts='&lt;/span&gt; &lt;span class="err"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;thread_root_id&lt;/span&gt; &lt;span class="err"&gt;+&lt;/span&gt; &lt;span class="s1"&gt;' — read the history through Slack conversations.replies first)'&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;message&lt;/span&gt; &lt;span class="k"&gt;}}{{&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;attachments_context&lt;/span&gt; &lt;span class="err"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;attachments_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nv"&gt;length&lt;/span&gt; &lt;span class="err"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;0&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'\n\nAdditional information from attachments:\n'&lt;/span&gt; &lt;span class="err"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;attachments_context&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three blocks inside it: the original message, an explicit instruction to read the thread history (if the request came from a thread), and context from attachments if any were present.&lt;/p&gt;
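
&lt;p&gt;For readers who find the inline n8n expression hard to parse, here is roughly the same assembly logic as a plain Python function. It is a sketch that mirrors the template above, not code from the workflow: the function name &lt;code&gt;build_user_prompt&lt;/code&gt; is made up, and the field names are the ones from the input JSON plus &lt;code&gt;attachments_context&lt;/code&gt; from the template.&lt;/p&gt;

```python
def build_user_prompt(item):
    """Assemble the user prompt from the classifier output and the attachment analysis."""
    prompt = "Investigate incident from {}\nin channel {}".format(
        item["user_name"], item["channel_name"]
    )

    # Block 2: tell the agent to read the thread history first, if there is a thread.
    if item.get("is_thread"):
        prompt += (
            " (message in thread, thread_ts={}; read the history through "
            "Slack conversations.replies first)".format(item["thread_root_id"])
        )

    # Block 1: the original message.
    prompt += "\n\n" + item["message"]

    # Block 3: attachment context, only when it is not empty or whitespace.
    attachments = item.get("attachments_context") or ""
    if attachments.strip():
        prompt += "\n\nAdditional information from attachments:\n" + attachments

    return prompt
```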

&lt;p&gt;I use GPT-5.5 from OpenAI as the model. On this task, Sonnet and Gemini show comparable quality — the choice is more out of habit. What actually moves the needle isn't the model but the completeness of the system prompt and the toolset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools the agent has access to
&lt;/h3&gt;

&lt;p&gt;For the agent to make sense of an incident, it has to see the infrastructure through an engineer's eyes. When our on-call works through an outage, the usual path is: "what do the alerts say → what's in the service logs → what's in the metrics → what releases went out → what's in the IaC." A corresponding tool is wired up for each step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes MCP&lt;/strong&gt; — look at pods, events, read container logs. The system prompt explicitly says: don't call &lt;code&gt;pods_log&lt;/code&gt; for the same pod more than twice. Without that constraint, the agent loves to loop trying to "double-check just one more time."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana MCP&lt;/strong&gt; — queries into Prometheus (&lt;code&gt;query_prometheus&lt;/code&gt;) and Loki (&lt;code&gt;query_loki_logs&lt;/code&gt;). Same tool also covers dashboard search, in case the agent wants to drop a link to a ready-made panel into the reply.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DigitalOcean MCP&lt;/strong&gt; — needed when the incident touches the infra layer: App Platform, droplets, the DOKS cluster, load balancers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub MCP&lt;/strong&gt; — look at the latest commits, Actions runs, open PRs. Especially useful for the "everything broke right after deploy" scenario — and these scenarios, like for everyone else, aren't rare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack MCP&lt;/strong&gt; — read-only. Mostly used for &lt;code&gt;conversations.replies&lt;/code&gt; at the very start of the investigation. We don't trust the agent with sending the reply itself — that's done by a separate node after the output is received. If something goes wrong at the posting step, the thread just stays without a final reply, but the execution doesn't crash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant Vector Store&lt;/strong&gt; — knowledge base on our infrastructure: component descriptions, service relationships, naming conventions, useful labels. Used when the agent runs into the name of an unfamiliar service and wants to figure out where it lives and what it talks to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And as a separate line item — &lt;code&gt;prometheusAlertSearch&lt;/code&gt;. This is a custom sub-workflow that the agent reaches for almost always within the first 2–3 steps of an investigation. It's worth describing in more detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alert tool: prometheusAlertSearch
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2gvyekxy8pnvuy3q7fn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2gvyekxy8pnvuy3q7fn.png" alt="alerts" width="800" height="448"&gt;&lt;/a&gt;&lt;br&gt;
The idea is simple: the lion's share of developer complaints essentially mirrors an alert that's already firing in Prometheus. "Postgres is kinda slow" usually arrives at the exact moment when &lt;code&gt;PostgresHighLatency&lt;/code&gt; has already been firing for ten minutes. So it makes sense to check first whether the cause is sitting right on the surface, and only then go digging through logs.&lt;/p&gt;

&lt;p&gt;The sub-workflow accepts keywords and a match mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"keywords"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pgbouncer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"match_mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"any"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside, it queries Prometheus's &lt;code&gt;/api/v1/alerts&lt;/code&gt; and, among the firing alerts, picks the ones where the keywords appear in the name, labels, or annotations. &lt;code&gt;match_mode: "any"&lt;/code&gt; (default) is an OR across the keywords; &lt;code&gt;"all"&lt;/code&gt; is AND.&lt;/p&gt;

&lt;p&gt;It connects to the agent through the &lt;code&gt;toolWorkflow&lt;/code&gt; node, so the agent sees it as a regular tool alongside the MCP ones. The description for the agent is critically important, because without it the agent doesn't understand when and how to use the tool. Here it is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Search currently firing alerts in Prometheus by keywords. Use early in investigation to find correlated active alerts across the platform.&lt;/p&gt;

&lt;p&gt;Input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keywords (array of strings, required): lowercase substrings matched against alert names, all label keys/values, and all annotation values. Examples: ["postgres"], ["http","5xx"], ["kafka","redpanda"].&lt;/li&gt;
&lt;li&gt;match_mode (string, optional): "any" (default, OR) or "all" (AND).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Returns: { ok, total_firing, matched_count, returned_count, truncated, alerts:[...] }&lt;br&gt;
Each alert has alertname, severity, labels, summary, description, activeAt, value. Output capped at 25 alerts; if truncated=true, refine keywords.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28y6zzzbk9gmlmjev30z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28y6zzzbk9gmlmjev30z.png" alt=" " width="748" height="1576"&gt;&lt;/a&gt;&lt;br&gt;
Without an explicit "call this in the first 2–3 steps," the agent likes to first wander into logs, then into cluster events, then somewhere else — and only at the end remember about alerts. A hard hint in the tool description visibly changes the behavior.&lt;/p&gt;
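
&lt;p&gt;The matching rules and the response envelope described above fit into a short function. The sketch below is in Python rather than n8n nodes and takes an already-fetched list of alerts in the shape Prometheus returns from &lt;code&gt;/api/v1/alerts&lt;/code&gt; (&lt;code&gt;state&lt;/code&gt;, &lt;code&gt;labels&lt;/code&gt;, &lt;code&gt;annotations&lt;/code&gt;). The function names are made up, and unlike the real tool it returns the raw alert objects instead of reshaping them into &lt;code&gt;alertname&lt;/code&gt;, &lt;code&gt;severity&lt;/code&gt;, and so on.&lt;/p&gt;

```python
MAX_ALERTS = 25  # output cap named in the tool description


def alert_haystack(alert):
    """Lowercased text to search: label keys and values (alertname is a label) plus annotation values."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    chunks = list(labels.keys()) + list(labels.values()) + list(annotations.values())
    return " ".join(str(chunk) for chunk in chunks).lower()


def search_alerts(alerts, keywords, match_mode="any"):
    """Keep firing alerts that contain any (OR) or all (AND) of the keywords."""
    if match_mode not in ("any", "all"):
        raise ValueError("match_mode must be 'any' or 'all'")
    needles = [k.lower() for k in keywords if k.strip()]
    combine = any if match_mode == "any" else all

    firing = [a for a in alerts if a.get("state") == "firing"]
    # The "needles and" guard matters: all() over zero keywords would match everything.
    matched = [
        a for a in firing
        if needles and combine(n in alert_haystack(a) for n in needles)
    ]
    returned = matched[:MAX_ALERTS]
    return {
        "ok": True,
        "total_firing": len(firing),
        "matched_count": len(matched),
        "returned_count": len(returned),
        "truncated": len(matched) != len(returned),
        "alerts": returned,
    }
```

&lt;p&gt;Returning &lt;code&gt;matched_count&lt;/code&gt; next to &lt;code&gt;returned_count&lt;/code&gt; is what lets the agent act on the "refine keywords" hint: it can tell a narrow result from a truncated one.&lt;/p&gt;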
&lt;h3&gt;
  
  
  Response format
&lt;/h3&gt;

&lt;p&gt;The system prompt defines a strict output format:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;Short description of the incident&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeline of events
&lt;/h3&gt;

&lt;p&gt;Events from 10 minutes before the problem appeared&lt;/p&gt;

&lt;h3&gt;
  
  
  Likely causes
&lt;/h3&gt;

&lt;p&gt;At most two hypotheses&lt;/p&gt;

&lt;h3&gt;
  
  
  What to try
&lt;/h3&gt;

&lt;p&gt;Step-by-step actions for each hypothesis&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The structure mirrors the familiar incident-review format, so it can be lifted into a postmortem without rewriting if the incident turns out to be significant.&lt;/p&gt;

&lt;p&gt;On average, a request takes under 2 minutes end-to-end and consumes 40–50K tokens, depending on how deep the agent has to go. Cost-wise, that's peanuts compared to engineer time, especially considering that some of these requests used to require a call rather than just a chat exchange.&lt;/p&gt;
&lt;h2&gt;
  
  
  Task Manager
&lt;/h2&gt;

&lt;p&gt;The "infrastructure modification" category covers requests like "roll us out a new service," "give us access to Grafana," "add a bucket for analytics." Full automation here is off the table: nearly every such request needs approval, estimation, and just plain human attention. But what definitely &lt;em&gt;can&lt;/em&gt; be automated is turning free-form text into a properly formatted ticket.&lt;/p&gt;

&lt;p&gt;Previously, the typical scenario looked like this: a developer writes in chat, the on-call reads it, asks clarifying questions, and writes all of it into Jira in their own words. Now the request lands in Jira straight away, and the on-call gets a notification with a ready-made link.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipurczzes06ogs8gkic7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipurczzes06ogs8gkic7.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;br&gt;
The input is the same structure with fields from the classifier. Then — the familiar sequence: attachment parsing through &lt;code&gt;attachmentsAnalyzer2&lt;/code&gt;, variable collection in &lt;code&gt;SetVars&lt;/code&gt;, handoff to the LLM agent.&lt;/p&gt;
&lt;h3&gt;
  
  
  Output format
&lt;/h3&gt;

&lt;p&gt;The distinctive part here is the strictly defined response structure. The agent must return JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;area&amp;gt;: &amp;lt;what needs to be done&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;1-3 sentences with details&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;one of the predefined directions&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;label&lt;/code&gt; is picked from a constrained dictionary: &lt;code&gt;kubernetes&lt;/code&gt;, &lt;code&gt;monitoring&lt;/code&gt;, &lt;code&gt;network&lt;/code&gt;, &lt;code&gt;access&lt;/code&gt;, &lt;code&gt;database&lt;/code&gt;, &lt;code&gt;ci-cd&lt;/code&gt;, and so on. This simplifies further task routing by engineer competence and gathering analytics on "who's working on what and how much of it."&lt;/p&gt;

&lt;p&gt;For the agent to write decent &lt;code&gt;summary&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt;, it has access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant Vector Store&lt;/strong&gt;: to see how the area the request belongs to is set up in our infrastructure, so the agent doesn't invent a new name where an established one already exists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack MCP&lt;/strong&gt;: to read the backstory if the request came in a thread. People often clarify details right in the thread, while the first message is a single line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Request&lt;/strong&gt;: in case the request contains a link to someone's PR, a Confluence doc, or an external spec.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Validation and task creation
&lt;/h3&gt;

&lt;p&gt;After receiving the JSON, validation runs: we check for required fields and a permitted &lt;code&gt;label&lt;/code&gt; value. If something's off, a fallback message goes out ("couldn't formulate the task automatically, please take a look"), and the on-call handles it manually. If everything's fine, the task is filed in Jira through n8n's built-in node, and a short message comes back to Slack with the task title and a link to it. The on-call then sees a ready ticket with the right label and decides whether to take it on, reassign it, or ask for clarifications.&lt;/p&gt;
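
&lt;p&gt;The validation step is small enough to show in full. Below is a minimal Python sketch of the checks described above, assuming the agent output arrives as a raw string. The function name &lt;code&gt;validate_task&lt;/code&gt; and the error texts are invented for the example, and &lt;code&gt;ALLOWED_LABELS&lt;/code&gt; contains only the labels named in this article, while the real dictionary is longer.&lt;/p&gt;

```python
import json

# Assumed subset of the label dictionary: only the labels named in the article.
ALLOWED_LABELS = {"kubernetes", "monitoring", "network", "access", "database", "ci-cd"}
REQUIRED_FIELDS = ("summary", "description", "label")


def validate_task(raw_output):
    """Return (task, None) if the agent output is usable, else (None, reason)."""
    try:
        task = json.loads(raw_output)
    except (TypeError, ValueError):
        return None, "output is not valid JSON"
    if not isinstance(task, dict):
        return None, "output is not a JSON object"

    # Every required field must be a non-empty string.
    for field in REQUIRED_FIELDS:
        value = task.get(field)
        if not isinstance(value, str) or not value.strip():
            return None, "missing or empty field: " + field

    if task["label"] not in ALLOWED_LABELS:
        return None, "label not in the allowed dictionary: " + task["label"]
    return task, None
```

&lt;p&gt;Any non-empty reason would route the request to the fallback message; only a clean result goes on to the Jira node.&lt;/p&gt;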

&lt;p&gt;An important point: automatic task creation doesn't replace manual review. Sometimes the classifier gets it wrong and reads an incident as a modification (especially when the author writes in the "we need X to work" style instead of "X is not working"). That's why the system prompt carries an explicit rule: if it's unclear from the text whether new work is actually needed or whether this is about something that already exists, the agent must not create the task and instead asks "could you clarify what exactly needs to be done?" as a separate message. It's a cheap measure that noticeably cuts down on junk tickets.&lt;/p&gt;
&lt;h2&gt;
  
  
  Infrastructure Knowledge Assistant
&lt;/h2&gt;

&lt;p&gt;The last workflow for today is the "calmest" one. The &lt;code&gt;question&lt;/code&gt; category catches requests like "where is the connection limit configured in pgbouncer?", "where do alloy logs from droplets go?", "what region is the &lt;code&gt;assets-prod&lt;/code&gt; bucket in?". Sometimes from new joiners on the team, sometimes from the same DevOps engineers who forgot where what lives. (It happens, I won't lie.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jex68qirk2x9isa2i3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jex68qirk2x9isa2i3p.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
The workflow is almost identical to the incident investigator: the same input JSON, the same attachment-parsing chain, the same &lt;code&gt;SetVars&lt;/code&gt;. Toolset: GitHub MCP, Slack MCP, Kubernetes MCP, Grafana MCP, DigitalOcean MCP, Qdrant Vector Store, and HTTP Request for pulling official component documentation when the required parameter isn't in the code and you have to look up vendor defaults.&lt;/p&gt;

&lt;p&gt;I won't dwell on the same nodes — let me tell you about one trick without which the agent would hit a wall pretty quickly.&lt;/p&gt;
&lt;h3&gt;
  
  
  A skill that writes READMEs in IaC repositories
&lt;/h3&gt;

&lt;p&gt;The initial hypothesis was: give the agent GitHub MCP and a knowledge base in Qdrant — and it'll figure it out. In practice, it turned out that the structure in IaC repositories is almost always non-obvious: somewhere Terragrunt is mixed with Helmfile, somewhere Ansible playbooks live with inventories two subdirectories deep, somewhere Terraform modules are laid out under names that only make sense to us. The agent burned a ton of tokens and time just to figure out where to look in the first place.&lt;/p&gt;

&lt;p&gt;The solution came from the Claude Code Skills format: I wrote a separate skill that runs locally in the IDE and generates/updates the README in every infrastructure repo. The skill reads the directory structure, identifies entry points, and describes them in a unified format. The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# infra

Main IaC repository: terragrunt code for cloud resources,
Ansible inventories and playbooks for VMs, Helm releases and
manifests for k8s.

## Directory structure

- `terraform/` — terragrunt code, organized by cloud and region:
  - `do/fra1/` — DigitalOcean Frankfurt resources (DOKS clusters,
    buckets, load balancers, droplets for DBs).
  - `yc/ru-central1/` — Yandex Cloud resources.
- `ansible/` — playbooks and inventories for Droplet management:
  - `playbooks/` — entry points (`postgres.yml`, `redpanda.yml`, ...).
  - `inventories/{stage,prod}/` — per-environment inventories with
    `group_vars/` next to them.
- `helm/` — Helmfile releases of infrastructure components in k8s
  (ingress-nginx, cert-manager, kube-prometheus-stack, etc.).
- `manifests/&amp;lt;cluster&amp;gt;/` — manifests applied on top of Helm releases:
  alerts, ServiceMonitors, standalone CRDs.

## Naming conventions

- VM: `&amp;lt;project&amp;gt;-&amp;lt;env&amp;gt;-&amp;lt;kind&amp;gt;-[&amp;lt;purpose&amp;gt;]-&amp;lt;index&amp;gt;.example.com`
- clusters: `&amp;lt;env&amp;gt;-&amp;lt;region&amp;gt;-01` (e.g. `prod-fra1-01`)
- buckets: `s3-&amp;lt;project&amp;gt;-&amp;lt;purpose&amp;gt;-&amp;lt;env&amp;gt;` (e.g. `s3-project-assets-prod`)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the agent's perspective, this changes everything: it first reads &lt;code&gt;README.md&lt;/code&gt; through &lt;code&gt;get_file_contents&lt;/code&gt;, learns the structure, and then goes straight to the right subdirectory for a specific file. The number of GitHub MCP calls dropped roughly 3×, and answer quality went up noticeably, especially on questions of the "where is X configured" type.&lt;/p&gt;
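
&lt;p&gt;The skill itself is driven by an LLM, but its first step, reading the directory structure, is purely mechanical. As a rough illustration of that step only, here is a Python sketch that turns a repository tree into the kind of bullet outline used under "Directory structure". The function name &lt;code&gt;directory_outline&lt;/code&gt; and the two-level depth are assumptions for the example; writing the descriptions next to each entry is the part that stays with the skill.&lt;/p&gt;

```python
from pathlib import Path


def directory_outline(root, max_depth=2):
    """Return markdown bullets for the directories under root, down to max_depth levels."""
    lines = []

    def walk(directory, depth):
        if depth == max_depth:
            return
        # Skip hidden directories such as .git; sort for a stable README diff.
        children = sorted(
            p for p in directory.iterdir()
            if p.is_dir() and not p.name.startswith(".")
        )
        for child in children:
            lines.append("{}- `{}/`".format("  " * depth, child.name))
            walk(child, depth + 1)

    walk(Path(root), 0)
    return "\n".join(lines)
```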

&lt;p&gt;I'll publish the skill itself in the repo — it's simple and easy to adapt to someone else's structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we got in total
&lt;/h2&gt;

&lt;p&gt;After all three branches rolled into production and lived through a couple of months of real traffic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Incident investigator&lt;/strong&gt; — closes about a quarter of requests fully; in the rest, the on-call gets a ready-made breakdown with hypotheses and saves 10–15 minutes at the start.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task manager&lt;/strong&gt; — practically every modification request makes it into Jira with a meaningful summary and the right label set; manual fixes are needed rarely, usually around the description wording.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge assistant&lt;/strong&gt; — closes roughly half of the questions without an engineer; for the rest, the agent's answer is still useful as a starting point for the on-call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combined cost&lt;/strong&gt; at our request volume stays around $250/month in LLM spend. Considering the system works 24/7 and doesn't take vacations, that's laughably little money.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What hasn't worked out yet
&lt;/h2&gt;

&lt;p&gt;To avoid creating the illusion that everything is smooth, here are a few rough edges we're still living with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Objective quality metrics&lt;/strong&gt; — for now, the only feedback channel is occasional comments along the lines of "no, the agent guessed wrong here." I want to wire up more structured feedback — for example, through emoji reactions in Slack with automatic stats collection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long incident threads&lt;/strong&gt; — the agent reliably gets lost when a thread already has 30+ messages. Right now, I cap the context depth hard in the prompt, but it's a tradeoff: sometimes important context from the start gets dropped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "security" category&lt;/strong&gt; — it doesn't exist in the classifier, but requests like "does this comply with GDPR?" come in periodically. For now, I shove them into &lt;code&gt;question&lt;/code&gt;, but answer quality there is shaky.&lt;/li&gt;
&lt;/ul&gt;
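
&lt;p&gt;One way to soften the long-thread tradeoff is to keep the opening message plus the most recent replies, rather than only the tail. This is a minimal Python sketch, not what the workflows currently do: it assumes the thread arrives as a list of message texts ordered oldest first, and &lt;code&gt;trim_thread&lt;/code&gt; and &lt;code&gt;max_recent&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
def trim_thread(messages, max_recent=20):
    """Keep the first message and the last `max_recent` replies of a thread."""
    if max_recent + 1 >= len(messages):
        return list(messages)  # short thread: nothing to drop
    head = messages[:1]                              # the original request
    tail = messages[len(messages) - max_recent:]     # the most recent replies
    dropped = len(messages) - len(head) - len(tail)
    # Tell the model explicitly that part of the history is missing.
    marker = f"[{dropped} earlier messages omitted]"
    return head + [marker] + tail


thread = [f"msg {i}" for i in range(35)]
trimmed = trim_thread(thread, max_recent=5)
print(trimmed)
```

&lt;p&gt;The explicit marker matters: without it the agent has no way to know that the middle of the conversation was cut.&lt;/p&gt;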

&lt;h2&gt;
  
  
  Plans for the future
&lt;/h2&gt;

&lt;p&gt;What's in the queue for the coming months:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expand the agent toolset to "actions, not just reads"&lt;/strong&gt; — carefully give access to restarting pods, applying pre-vetted manifests, restarting systemd services. Obviously, this is minefield territory, so it'll be done through explicit engineer confirmation in Slack — without confirmation, no changes get applied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add handling for maintenance announcements&lt;/strong&gt; — for now, such messages just get tagged and ignored, but they could be pushed into a separate digest channel with an automatic "what's planned for this week" summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire up a dedicated branch for security questions&lt;/strong&gt; — with its own knowledge base on our compliance docs and security policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add analytics over processed requests&lt;/strong&gt;: which categories are growing, what the autonomous resolution rate is in each, and how many tokens each category consumes. Without numbers, it's hard to understand what should actually be improved first.&lt;/li&gt;
&lt;/ul&gt;
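
&lt;p&gt;The confirmation gate planned for write actions can be sketched in a few lines of Python. Everything here is an assumption for illustration: &lt;code&gt;PendingAction&lt;/code&gt;, &lt;code&gt;apply_if_confirmed&lt;/code&gt;, and the idea of storing the approver's Slack user ID are hypothetical and not taken from the published workflows.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class PendingAction:
    description: str        # human-readable summary shown to the engineer
    approved_by: str = ""   # Slack user ID of the approver; empty until confirmed


def apply_if_confirmed(action, execute):
    """Run `execute` only when an engineer has explicitly approved the action."""
    if not action.approved_by:
        # Default path: nothing is changed without a confirmation.
        return {"applied": False, "reason": "awaiting engineer confirmation"}
    execute(action)
    return {"applied": True, "approved_by": action.approved_by}


calls = []
pending = PendingAction("restart a pod in the staging namespace")
print(apply_if_confirmed(pending, calls.append))
```

&lt;p&gt;The point of the design is that the default path applies nothing: the change runs only after an approver is recorded.&lt;/p&gt;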

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;In short: AI agents driven by n8n with MCP tools turned out to be a very workable way to offload a significant chunk of routine work from the on-call engineer. Not a silver bullet — but something close to a modest, hardworking intern who works around the clock and costs about as much as a couple of lunches.&lt;/p&gt;

&lt;p&gt;The key is not to try automating everything at once. Better to ship one branch, live closely with it, understand its weak spots, and only then extend the approach to the other categories. Going from my first working CI/CD assistant to the full set of branches took about three months, and I have no regrets about that pacing.&lt;/p&gt;




&lt;p&gt;Link to the repo with workflows: &lt;a href="https://github.com/javdet/automagicops-workflows" rel="noopener noreferrer"&gt;https://github.com/javdet/automagicops-workflows&lt;/a&gt;&lt;br&gt;
Which category of requests automates best in your setup? And have you ever had the situation where an AI agent confidently nailed the wrong diagnosis and led the engineer down the wrong path? Share in the comments — especially curious to compare where everyone’s stepping on rakes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building an AI First-Line for DevOps Support on n8n for $250/Month — Part 1</title>
      <dc:creator>Sergey Byvshev</dc:creator>
      <pubDate>Tue, 02 Jun 2026 10:07:28 +0000</pubDate>
      <link>https://dev.to/javdet/building-an-ai-first-line-for-devops-support-on-n8n-for-250month-part-1-1m8j</link>
      <guid>https://dev.to/javdet/building-an-ai-first-line-for-devops-support-on-n8n-for-250month-part-1-1m8j</guid>
      <description>&lt;p&gt;&lt;strong&gt;How a small DevOps team offloaded chat triage, CI/CD diagnostics, and attachment parsing to an AI agent — and what’s still rough about it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you, like me, run infrastructure for a small team, you’ve probably been in this spot: the engineering org keeps growing while the DevOps headcount stays the same. With the rise of vibe-coding, that imbalance became especially obvious: our dev team at the studio grew roughly 1.5× in a couple of months, because every product manager wanted their own mini-application. On top of that, we got an extra headache from increasingly frequent availability issues in certain regions.&lt;/p&gt;

&lt;p&gt;As a result, the flow of messages in our Slack support channel grew to the point where a significant chunk of an engineer’s day was spent triaging them. And the most frustrating part: not every request actually fell under DevOps responsibility, but each one still required at least a shallow diagnostic to figure that out.&lt;/p&gt;

&lt;p&gt;That’s how the idea to offload the first line to an AI agent came up. By that point, we’d already &lt;a href="https://dev.to/javdet/ai-alert-assistant-how-n8n-llm-replace-routine-diagnostics-a9b"&gt;automated incident analysis triggered by alerts&lt;/a&gt; and the approach had proven itself. Extending the same pattern to manual chat requests was the logical next step.&lt;br&gt;
In this article — the first of two — I’ll walk through how we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classified the past year’s request flow and picked categories worth automating;&lt;/li&gt;
&lt;li&gt;built a classifier in n8n with Slack integration over MCP;&lt;/li&gt;
&lt;li&gt;implemented a CI/CD incident assistant as the first production-ready branch;&lt;/li&gt;
&lt;li&gt;added attachment parsing (error screenshots, log files);&lt;/li&gt;
&lt;li&gt;set up proper error reporting for the workflows themselves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant for routine questions, and the handler for infrastructure modification tasks.&lt;/p&gt;

&lt;p&gt;All workflows and system prompts are published separately — link at the end of the article.&lt;/p&gt;
&lt;h2&gt;
  
  
  Preparation: classifying requests
&lt;/h2&gt;

&lt;p&gt;Before automating anything, you need to understand &lt;em&gt;what&lt;/em&gt;. I exported the request history from our Slack channels for the past year and bucketed it into categories. The result:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure modification&lt;/strong&gt; — changes to existing infrastructure, adding standard resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New installation&lt;/strong&gt; — deploying new systems and integrations that didn't exist before.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incident&lt;/strong&gt; — something stopped working in the current infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt; — failing builds, broken tests, broken deploys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Question&lt;/strong&gt; — general questions about our infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Announcement&lt;/strong&gt; — informational messages: planned maintenance, for example.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other&lt;/strong&gt; — anything that didn't fit above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The bulk of requests fell into categories 3–5 — those were the obvious starting point. Categories 1–2 require approvals and almost always need an engineer in the loop, so there's no point automating them. Announcements don't need agent handling at all — it's enough to recognize them correctly and not page the on-call.&lt;/p&gt;

&lt;p&gt;But before building any handling branches, we needed to reliably identify which category a new request belongs to.&lt;/p&gt;
&lt;h2&gt;
  
  
  The classifier
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fioa0ip8qa6zoozkd83xb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fioa0ip8qa6zoozkd83xb.png" alt="classifier" width="800" height="347"&gt;&lt;/a&gt;&lt;br&gt;
At first glance, the setup looked straightforward: create a Slack app with a bot user, subscribe to &lt;code&gt;app_mention&lt;/code&gt; events, point them at an n8n webhook, run the payload through an LLM, get the category.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28emx5igq22tse67nipi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28emx5igq22tse67nipi.png" alt=" " width="800" height="834"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wheyfzbhs3vgp2f0xwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wheyfzbhs3vgp2f0xwz.png" alt=" " width="800" height="992"&gt;&lt;/a&gt;&lt;br&gt;
The first nuance surfaced quickly: an incoming message can either start a new thread or be a reply inside an existing one. In the second case, without thread context the classifier will misfire — a one-liner like "same problem on my end" makes no sense in isolation.&lt;/p&gt;

&lt;p&gt;Instead of calling the Slack API directly from n8n, I offloaded this to the agent — we already have &lt;strong&gt;slack-mcp&lt;/strong&gt;, which can read messages from channels and threads. The agent itself decides whether to pull the thread history and does so when the context calls for it. The system prompt needs to describe how to do this, plus a few other things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;category descriptions with examples;&lt;/li&gt;
&lt;li&gt;the expected output format:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;category_key&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;0.0-1.0&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;one-sentence summary in the same language as the user message&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"acknowledge"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;a short response that you accepted the request and started working on it&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_thread"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parent_thread_ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;thread_ts to use when replying — ALWAYS set&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In the user prompt, I additionally pass &lt;code&gt;channel_id&lt;/code&gt;, &lt;code&gt;channel_name&lt;/code&gt;, &lt;code&gt;message_ts&lt;/code&gt;, and &lt;code&gt;user_name&lt;/code&gt; — this helps the classifier orient itself in the message.&lt;/p&gt;

&lt;p&gt;For the model I use Sonnet or GPT-5 Codex — on classification, both show comparable quality.&lt;/p&gt;

&lt;p&gt;At this stage I don't yet touch attachments — screenshots and log files come into play further down, inside the logic of specific branches.&lt;/p&gt;

&lt;p&gt;Once the agent's response is received and its fields are validated, we need to determine the on-call engineer — they may be needed if automated handling can't close the request. On-call rotations live in Google Calendar, so I had to configure OAuth2 access to it following the &lt;a href="https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/" rel="noopener noreferrer"&gt;n8n docs&lt;/a&gt;.&lt;br&gt;
After the category and on-call are determined, the corresponding sub-workflow kicks off. In parallel, an &lt;em&gt;acknowledge&lt;/em&gt; message goes to Slack so the author can see the request was received and is being worked on. That's an important detail — without it, the person keeps typing "hey, is anyone looking at this?" into the thread, which defeats the whole point.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bhkcffygq2bu9h8avqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bhkcffygq2bu9h8avqi.png" alt="acknowledge" width="746" height="234"&gt;&lt;/a&gt;&lt;/p&gt;
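
&lt;p&gt;The field validation that runs on the classifier's reply can be sketched roughly as follows. This is a standalone Python illustration rather than the actual n8n node: the field names come from the output format shown earlier, while &lt;code&gt;validate_classification&lt;/code&gt; and every category key except &lt;code&gt;question&lt;/code&gt; are assumptions.&lt;/p&gt;

```python
import json

REQUIRED = ("category", "confidence", "summary", "acknowledge",
            "is_thread", "parent_thread_ts")
# Hypothetical category keys; the real ones live in the classifier's system prompt.
KNOWN_CATEGORIES = {"modification", "installation", "incident", "ci_cd",
                    "question", "announcement", "other"}


def validate_classification(raw):
    """Parse the classifier's JSON reply and return (data, errors)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["reply is not a JSON object"]
    errors = [f"missing field: {name}" for name in REQUIRED if name not in data]
    if errors:
        return None, errors
    category = data["category"]
    if not isinstance(category, str) or category not in KNOWN_CATEGORIES:
        errors.append(f"unknown category: {category}")
    conf = data["confidence"]
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not (1.0 >= conf >= 0.0):
        errors.append("confidence must be a number between 0.0 and 1.0")
    if not isinstance(data["is_thread"], bool):
        errors.append("is_thread must be true or false")
    if not data["parent_thread_ts"]:
        errors.append("parent_thread_ts must always be set")
    return (None, errors) if errors else (data, [])


reply = json.dumps({
    "category": "ci_cd", "confidence": 0.9, "summary": "Build failed",
    "acknowledge": "Looking into it", "is_thread": False,
    "parent_thread_ts": "1717000000.000100",
})
print(validate_classification(reply))
```

&lt;p&gt;Returning a list of errors instead of raising on the first one makes it easier to see which fields the model tends to get wrong.&lt;/p&gt;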
&lt;h2&gt;
  
  
  CI/CD assistant
&lt;/h2&gt;

&lt;p&gt;CI/CD is one of the most common categories, so that's where I started. A solid share of these issues can be resolved without an engineer: builds that fell over because of a temporarily unreachable repository, flaky tests, misconfigured pipelines, expired tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73ilo9rzesyf906yqa9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73ilo9rzesyf906yqa9y.png" alt="cicd-assistant" width="800" height="375"&gt;&lt;/a&gt;&lt;br&gt;
The sub-workflow expects an input structure with the request data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Chat request text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message_ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slack message timestamp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channel_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slack channel ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channel_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slack channel name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sender's display name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sender's Slack user ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"file_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"List of attachment IDs"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"One of the categories"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Confidence score"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Short request description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_thread"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Whether the message came from inside a thread"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"thread_ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Parent message timestamp, if this is a thread reply"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"on_call_user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"On-call engineer's name"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Parsing attachments
&lt;/h3&gt;

&lt;p&gt;Most CI requests arrive in the format "build failed" + an error screenshot. That description clearly isn't enough to identify a specific build, so before the main agent runs, a helper sub-workflow — &lt;code&gt;attachmentsAnalyzer&lt;/code&gt; — kicks off first.&lt;/p&gt;

&lt;p&gt;It processes attached files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt; (error screenshots) — sent to &lt;code&gt;gpt-4o-mini&lt;/code&gt; to extract text and describe context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text files&lt;/strong&gt; (logs) — if the size doesn't exceed the limit set in the &lt;code&gt;Config&lt;/code&gt; node, the content is passed along as extra context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is a compact text block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attachments_context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"...human-readable block..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attachments_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I deliberately split this out into its own workflow — it's reused in other handling branches. If &lt;code&gt;attachmentsAnalyzer&lt;/code&gt; fails, the main workflow keeps going without the extra context instead of falling over entirely.&lt;/p&gt;
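
&lt;p&gt;In plain Python terms, the "keep going without extra context" behaviour looks something like the sketch below. The output keys match the block shown above; &lt;code&gt;analyze_attachments_safely&lt;/code&gt; and the &lt;code&gt;analyze&lt;/code&gt; callable are hypothetical stand-ins for the sub-workflow call.&lt;/p&gt;

```python
def analyze_attachments_safely(analyze, file_ids):
    """Call the analyzer; on any failure return an empty context instead of raising."""
    empty = {"attachments_context": "", "attachments_count": 0}
    if not file_ids:
        return empty  # nothing attached, nothing to analyze
    try:
        return analyze(file_ids)
    except Exception:
        # A broken analyzer must not take the main investigation down with it.
        return empty


def broken_analyzer(file_ids):
    raise RuntimeError("vision model unavailable")


print(analyze_attachments_safely(broken_analyzer, ["F123"]))
```

&lt;p&gt;The main agent then simply sees an empty &lt;code&gt;attachments_context&lt;/code&gt; and works from the message text alone.&lt;/p&gt;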

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehcs98hrak8zft8ibz9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehcs98hrak8zft8ibz9z.png" alt="attachment-analizer" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Gathering context and calling the agent
&lt;/h3&gt;

&lt;p&gt;Before forming the LLM request, the &lt;code&gt;SetVars&lt;/code&gt; node assembles everything needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the raw request data;&lt;/li&gt;
&lt;li&gt;the output of &lt;code&gt;attachmentsAnalyzer&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;auxiliary context for the system prompt: GitHub organization names, Kubernetes contexts and namespaces, Grafana data source names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent itself works with a set of MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub MCP&lt;/strong&gt; — access to Actions build logs, PRs, source code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack MCP&lt;/strong&gt; — reading messages in a thread when the initial request doesn't carry enough context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana / Kubernetes MCP&lt;/strong&gt; — looking up cluster logs and events on deploy-related issues.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt should cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which teams exist and which repository groups belong to them — speeds up identifying the right repo;&lt;/li&gt;
&lt;li&gt;how to pull additional context from a Slack thread;&lt;/li&gt;
&lt;li&gt;the DevOps team's scope of responsibility — if an error falls within it, the assistant additionally tags the on-call at the end of the investigation;&lt;/li&gt;
&lt;li&gt;a general description of the available tools and when to use each;&lt;/li&gt;
&lt;li&gt;a few worked examples;&lt;/li&gt;
&lt;li&gt;the output format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user prompt is templated like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Investigate the issue from {{ $json.user_name }} in channel {{ $json.channel_name }}{{ $json.is_thread ? ' (message is in a thread, thread_ts=' + $json.thread_ts + ' — first read the history via Slack conversations.replies)' : '' }}&lt;br&gt;
{{ $json.message }}{{ $json.attachments_context &amp;amp;&amp;amp; $json.attachments_context.trim().length &amp;gt; 0 ? '\n\nAdditional information from attachments:\n' + $json.attachments_context : '' }}&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Alongside the message itself, this passes who sent it and which channel it came from, whether it's a thread reply, and any attachments if present.&lt;/p&gt;
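
&lt;p&gt;For readers who find the inline n8n expression hard to follow, here is the same branching logic as a small Python function. It is only an illustration: &lt;code&gt;build_user_prompt&lt;/code&gt; is a hypothetical name, the punctuation of the thread hint is slightly adapted, and the input is assumed to be a dict with the fields from the sub-workflow input structure.&lt;/p&gt;

```python
def build_user_prompt(item):
    """Assemble the user prompt from the request fields, mirroring the template."""
    prompt = (
        f"Investigate the issue from {item['user_name']} "
        f"in channel {item['channel_name']}"
    )
    if item.get("is_thread"):
        # Thread replies get an explicit hint to read the history first.
        prompt += (
            f" (message is in a thread, thread_ts={item['thread_ts']}; "
            "first read the history via Slack conversations.replies)"
        )
    prompt += "\n" + item["message"]
    context = item.get("attachments_context") or ""
    if context.strip():
        # Attachment context is appended only when it is non-empty.
        prompt += "\n\nAdditional information from attachments:\n" + context
    return prompt


print(build_user_prompt({
    "user_name": "alice",
    "channel_name": "devops-support",
    "is_thread": False,
    "message": "build failed",
}))
```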

&lt;h3&gt;
  
  
  The HTTP-probe workaround
&lt;/h3&gt;

&lt;p&gt;One of the common reasons builds fail is an unreachable external resource: a dependency repository, a proxy, a registry. The natural move would be to give the agent a built-in &lt;code&gt;HTTP Request&lt;/code&gt; tool and let it check availability. In practice, that didn't work — the built-in n8n node doesn't handle timeouts and network errors gracefully, and on a failure it brings down the whole agent chain.&lt;/p&gt;

&lt;p&gt;So I wrapped the check in a separate sub-workflow &lt;code&gt;httpProbeTool&lt;/code&gt; that always returns a structured result: success, failure with a reason, or timeout. The agent uses it like any other tool.&lt;/p&gt;
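
&lt;p&gt;The contract of such a probe is easy to show outside n8n. The sketch below uses only the Python standard library and is an assumption about the shape of the result, not a copy of &lt;code&gt;httpProbeTool&lt;/code&gt;: the &lt;code&gt;http_probe&lt;/code&gt; name and the &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;reason&lt;/code&gt; keys are hypothetical.&lt;/p&gt;

```python
import socket
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Check a URL and always return a dict; never raise to the caller."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return {"status": "success", "http_code": response.status}
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx code.
        return {"status": "failure", "reason": f"HTTP {exc.code}"}
    except urllib.error.URLError as exc:
        # DNS errors, refused connections, and connect timeouts end up here.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return {"status": "timeout", "reason": f"no response within {timeout}s"}
        return {"status": "failure", "reason": str(exc.reason)}
    except (socket.timeout, TimeoutError):
        # Timeouts while reading the response are raised directly.
        return {"status": "timeout", "reason": f"no response within {timeout}s"}
    except Exception as exc:
        # Anything else (for example a malformed URL) is still reported, not raised.
        return {"status": "failure", "reason": f"{type(exc).__name__}: {exc}"}


print(http_probe("not-a-valid-url"))
```

&lt;p&gt;Because every path returns a dict, the agent can reason about a failed check instead of having its whole run aborted by it.&lt;/p&gt;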

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F127dt7q6r5ozbf8ehio3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F127dt7q6r5ozbf8ehio3.png" alt="http-probe-tool" width="800" height="728"&gt;&lt;/a&gt;&lt;br&gt;
Once the agent responds, there's a short format validation step, and the message gets posted into the Slack thread.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling workflow errors
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ace98fg5td81aibml0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ace98fg5td81aibml0u.png" alt="error-reporter" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
When you build a system that handles real user requests, reliability is critical. If a workflow falls over for any reason — LLM quota exhausted, MCP server unreachable, invalid JSON — nobody in chat will know, and the request just hangs there.&lt;/p&gt;

&lt;p&gt;This is especially relevant in the first weeks after launch, when you're constantly tweaking things.&lt;/p&gt;

&lt;p&gt;The solution is simple: a dedicated workflow specified in the main workflows' settings as the &lt;code&gt;Error Workflow&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gucq3wmvadxk0bhqgxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gucq3wmvadxk0bhqgxd.png" alt=" " width="796" height="82"&gt;&lt;/a&gt;&lt;br&gt;
What it does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pulls information about the failed execution from the n8n API (you'll need to generate an API key for this).&lt;/li&gt;
&lt;li&gt;Through the &lt;code&gt;Extract Thread Context&lt;/code&gt; node, determines &lt;code&gt;channel_id&lt;/code&gt; and &lt;code&gt;thread_ts&lt;/code&gt; of the original message.&lt;/li&gt;
&lt;li&gt;Posts a short error message directly into the request's thread, so the author isn't left in the dark.
A more verbose error report also goes into the DevOps team's internal channel — this lets us react quickly to regressions.&lt;/li&gt;
&lt;/ol&gt;
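
&lt;p&gt;The two-message pattern from step 3 can be sketched like this. The shape of the &lt;code&gt;failure&lt;/code&gt; dict is an assumption made for illustration (it is not the real n8n execution payload), and &lt;code&gt;build_error_messages&lt;/code&gt; and the message wording are hypothetical.&lt;/p&gt;

```python
def build_error_messages(failure, internal_channel):
    """Return the Slack posts to send: a short thread reply and a verbose report."""
    verbose = (
        f"Workflow '{failure['workflow_name']}' failed "
        f"(execution {failure['execution_id']}): {failure['error_message']}"
    )
    # The detailed report always goes to the team's internal channel.
    posts = [{"channel": internal_channel, "thread_ts": None, "text": verbose}]
    if failure.get("channel_id") and failure.get("thread_ts"):
        # The requester only gets a short note, posted into their own thread.
        posts.insert(0, {
            "channel": failure["channel_id"],
            "thread_ts": failure["thread_ts"],
            "text": "Something went wrong while processing this request. "
                    "The DevOps team has been notified.",
        })
    return posts


example = {
    "workflow_name": "cicd-assistant",
    "execution_id": "4711",
    "error_message": "MCP server unreachable",
    "channel_id": "C0123456",
    "thread_ts": "1717000000.000100",
}
for post in build_error_messages(example, "devops-internal"):
    print(post)
```

&lt;p&gt;If the thread context cannot be recovered, only the internal report is produced, so a failure is never silent on the team's side.&lt;/p&gt;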

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyd6tkgc2unqwla5u4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyd6tkgc2unqwla5u4ss.png" alt=" " width="799" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3l5i6c5xu7jg30yd7av.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3l5i6c5xu7jg30yd7av.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l83njd5ab128me08x0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l83njd5ab128me08x0g.png" alt=" " width="748" height="1614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What we got
&lt;/h2&gt;

&lt;p&gt;After a couple of months running in production, the picture looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average response time&lt;/strong&gt; — up to 3 minutes from the moment a message appears in the channel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~25% of requests&lt;/strong&gt; are fully closed without an engineer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~40% of requests&lt;/strong&gt; are resolved faster than usual — the agent does a preliminary diagnostic, and the on-call gets ready-made context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; at our volume (several dozen requests per week) — up to $250/month on LLM usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: creating an issue from a Slack request&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zkdgy0uvwtfjagjq5nj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zkdgy0uvwtfjagjq5nj.png" alt="create-task" width="747" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ju9lpc6olnqi17c9flk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ju9lpc6olnqi17c9flk.png" alt="jira-task" width="800" height="351"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Example: a CI request resolved without an engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3unmnde09lza4azazqx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3unmnde09lza4azazqx.png" alt="build-failed" width="748" height="784"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Example: investigating a build crash from a screenshot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkntclmre8fcf5j7y5y20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkntclmre8fcf5j7y5y20.png" alt=" " width="800" height="855"&gt;&lt;/a&gt;&lt;br&gt;
Compared to an engineer's hourly rate, the agent is a very cost-effective addition to the team, especially since it works around the clock and doesn't pull the on-call engineer away from their main work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's still rough
&lt;/h2&gt;

&lt;p&gt;To avoid leaving the impression that everything is smooth, here are the rough edges we still live with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent sometimes gets lost in long threads with dozens of messages, so we have to explicitly limit context depth in the system prompt.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Incident&lt;/code&gt; category has the lowest autonomous resolution rate so far — too many non-standard situations. We're working on expanding the MCP toolset.&lt;/li&gt;
&lt;li&gt;It's hard to objectively evaluate the quality of answers to "infrastructure questions" — we need a feedback mechanism from engineers (planning Slack reactions as the simplest signal).&lt;/li&gt;
&lt;/ul&gt;
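
&lt;p&gt;As a rough illustration of the reaction-based feedback idea, here is a minimal Python sketch. It is not part of the current workflow. The &lt;code&gt;feedback_counts&lt;/code&gt; function and the positive/negative emoji sets are hypothetical, and the message is assumed to be a Slack message dict whose &lt;code&gt;reactions&lt;/code&gt; list carries &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt; fields.&lt;/p&gt;

```python
from collections import Counter

# Assumed mapping from reaction names to a signal; adjust to your team's habits.
POSITIVE = {"+1", "white_check_mark"}
NEGATIVE = {"-1", "x"}


def feedback_counts(message):
    """Count positive and negative reactions on one Slack message dict."""
    totals = Counter()
    for reaction in message.get("reactions", []):
        # Skin-tone variants arrive as "+1::skin-tone-2"; keep the base name.
        name = reaction["name"].split("::")[0]
        if name in POSITIVE:
            totals["positive"] += reaction["count"]
        elif name in NEGATIVE:
            totals["negative"] += reaction["count"]
    return totals["positive"], totals["negative"]
```

&lt;p&gt;Aggregating these pairs per request category would give a first, coarse quality signal without asking engineers to fill in anything.&lt;/p&gt;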

&lt;p&gt;Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant with RAG search over our docs, and the handler for modification tasks with automatic ticket creation.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;How do you offload first-line support on your team?&lt;/strong&gt; Are you using off-the-shelf products (like PagerDuty AIOps) or building your own? Which request categories automate best in your environment? Share in the comments; I'm curious to compare the distribution.&lt;/p&gt;

&lt;p&gt;Repository with workflows and system prompts: &lt;a href="https://github.com/javdet/automagicops-workflows" rel="noopener noreferrer"&gt;https://github.com/javdet/automagicops-workflows&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>development</category>
      <category>devops</category>
    </item>
    <item>
      <title>MCP servers for the entire team: from local launch to centralized access</title>
      <dc:creator>Sergey Byvshev</dc:creator>
      <pubDate>Wed, 15 Apr 2026 06:50:14 +0000</pubDate>
      <link>https://dev.to/javdet/mcp-servers-for-the-entire-team-from-local-launch-to-centralized-access-20e7</link>
      <guid>https://dev.to/javdet/mcp-servers-for-the-entire-team-from-local-launch-to-centralized-access-20e7</guid>
      <description>&lt;p&gt;When you have six MCP servers and ten colleagues, "just run npx locally" stops working. Not everyone wants to install Node.js, managers don't have Docker, and your local &lt;code&gt;claude_desktop_config.json&lt;/code&gt; starts looking like a secrets vault for every production system.&lt;/p&gt;

&lt;p&gt;I went from remote MCP → local setup → Docker → Kubernetes with a universal Helm chart and JWT auth via Envoy. Here's what I hit along the way, what worked, and what's still unsolved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 1: Remote MCP — When the Vendor Did the Work
&lt;/h2&gt;

&lt;p&gt;My first MCP experience was dead simple. I added the Atlassian MCP server to Claude as a remote MCP, authenticated, and it just worked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"atlassian"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp.atlassian.com/v1/sse"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem? Very few SaaS products offer this. Everything self-hosted or without native MCP support is a different story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2: Local Setup — The Dependency Zoo
&lt;/h2&gt;

&lt;p&gt;Next, I wanted to connect my IDE to Kubernetes. No built-in MCP support here, so dependencies it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"kubernetes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kubernetes-mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdkr883on91tcen8rfxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdkr883on91tcen8rfxd.png" alt="claude-mcp" width="733" height="833"&gt;&lt;/a&gt;&lt;br&gt;
It worked, but one server needs Node.js, another needs Python and &lt;code&gt;uvx&lt;/code&gt;, a third needs a Go binary. The runtime zoo on your machine grows with every new MCP server. Not great when you're not even a developer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Level 3: Docker — Isolation Without the Mess
&lt;/h2&gt;

&lt;p&gt;The logical next step — containers. Each MCP server with its own runtime, no host pollution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"grafana"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--rm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GRAFANA_URL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GRAFANA_SERVICE_ACCOUNT_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"grafana/mcp-grafana"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-t"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"GRAFANA_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://grafana.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"GRAFANA_SERVICE_ACCOUNT_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;token&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For one engineer on one machine — enough. But when ten people need access, questions pile up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production tokens are scattered across laptops.&lt;/li&gt;
&lt;li&gt;Automated workflows (n8n, CI/CD) need MCP access too — and they run remotely.&lt;/li&gt;
&lt;li&gt;Managers and analysts want AI tools but aren't ready to deal with &lt;code&gt;docker run&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One conclusion: MCP servers need to move into shared infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 4: Kubernetes — Centralized Deployment
&lt;/h2&gt;

&lt;p&gt;The initial idea was straightforward: deploy remote MCP servers inside your infrastructure perimeter. At minimum, you can restrict access via corporate VPN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3gf7ya3m83wgf0p0veg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3gf7ya3m83wgf0p0veg.png" alt="Balancing" width="800" height="310"&gt;&lt;/a&gt;&lt;br&gt;
Anyone who's tackled this has hit the same wall: most MCP servers communicate via stdio (stdin/stdout). You can't reach them over HTTP directly.&lt;/p&gt;

&lt;p&gt;This is where &lt;a href="https://github.com/michlyn/mcpgateway" rel="noopener noreferrer"&gt;MCP Gateway&lt;/a&gt; comes in — a proxy that translates Streamable HTTP to stdio and back.&lt;/p&gt;

&lt;p&gt;The flow: client (Claude Desktop, IDE, n8n) → HTTPS → Ingress → Kubernetes Service → Pod with MCP Gateway sidecar (HTTP → stdin) → MCP server process.&lt;/p&gt;
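
&lt;p&gt;To make the HTTP-to-stdio hop concrete, here is a minimal Python sketch of the stdio half of such a bridge. It is not how MCP Gateway is implemented: the &lt;code&gt;StdioBridge&lt;/code&gt; class and the stand-in child process are hypothetical, and the HTTP listener is left out. It assumes newline-delimited JSON-RPC messages, which is how the MCP stdio transport frames them.&lt;/p&gt;

```python
import json
import subprocess
import sys

# Stand-in for a stdio MCP server: reads one JSON-RPC request per line
# and replies with a result that echoes the requested method.
CHILD_SOURCE = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    resp = {'jsonrpc': '2.0', 'id': req['id'], 'result': {'echo': req['method']}}\n"
    "    sys.stdout.write(json.dumps(resp) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


class StdioBridge:
    """Forwards one JSON-RPC message at a time to a child process over stdio."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, request):
        # One JSON document per line in, one per line out.
        self.proc.stdin.write(json.dumps(request) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the child's read loop so it can exit.
        self.proc.stdin.close()
        self.proc.wait(timeout=5)
        self.proc.stdout.close()


def demo():
    bridge = StdioBridge([sys.executable, "-c", CHILD_SOURCE])
    try:
        return bridge.call({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    finally:
        bridge.close()
```

&lt;p&gt;A gateway's HTTP handler would pass each request body to &lt;code&gt;call()&lt;/code&gt; and write the return value back as the response. In a real deployment the command would be the actual MCP server rather than the stand-in.&lt;/p&gt;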
&lt;h3&gt;
  
  
  Universal Helm Chart
&lt;/h3&gt;

&lt;p&gt;To avoid writing manifests for every MCP server, I built a universal Helm chart: &lt;a href="https://artifacthub.io/packages/helm/mcp-helm-chart/mcp" rel="noopener noreferrer"&gt;mcp-helm-chart on ArtifactHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What it supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mode: proxy&lt;/code&gt;&lt;/strong&gt; — runs MCP Gateway as a sidecar, translating HTTP ↔ stdio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mode: native&lt;/code&gt;&lt;/strong&gt; — for servers that already support HTTP (no sidecar needed)&lt;/li&gt;
&lt;li&gt;Vault and ExternalSecrets integration for secrets management&lt;/li&gt;
&lt;li&gt;Gateway API and classic Ingress support&lt;/li&gt;
&lt;li&gt;HPA for horizontal scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Installation with Ingress-nginx (no auth):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add mcp https://javdet.github.io/mcp-helm-chart
helm &lt;span class="nb"&gt;install &lt;/span&gt;my-mcp mcp/mcp &lt;span class="nt"&gt;-f&lt;/span&gt; values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key sections of &lt;code&gt;values.yaml&lt;/code&gt; for deploying DigitalOcean MCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;proxy&lt;/span&gt;

&lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node&lt;/span&gt;
    &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20-bookworm"&lt;/span&gt;
    &lt;span class="na"&gt;pullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@michlyn/mcpgateway"&lt;/span&gt;
    &lt;span class="na"&gt;stdioCommand&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;@digitalocean/mcp&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--services&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;apps,droplets,doks,networking"&lt;/span&gt;
    &lt;span class="na"&gt;outputTransport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;streamable-http&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;httpPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/mcp&lt;/span&gt;

&lt;span class="c1"&gt;# Token stored in HashiCorp Vault, injected via Vault Webhook&lt;/span&gt;
&lt;span class="na"&gt;vault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp"&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kubernetes_dev-fra1-01"&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DIGITALOCEAN_API_TOKEN&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault:devops/data/ai/mcp/digitalocean#token&lt;/span&gt;

&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal"&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-buffering&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;off"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-http-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.1"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-read-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-send-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/use-regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/$2&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aitool.example.com&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/digitalocean(/|$)(.*)&lt;/span&gt;
          &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ssl-certificate&lt;/span&gt;
      &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;aitool.example.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g9mhx8zj4kq6ixa39s1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g9mhx8zj4kq6ixa39s1.png" alt="Ingress" width="799" height="647"&gt;&lt;/a&gt;&lt;br&gt;
MCP servers in Streamable HTTP mode are stateless. They scale horizontally with a standard HPA without any issues.&lt;/p&gt;

&lt;p&gt;The most pressing question here is authentication — or better yet, authorization. Most MCP servers don't support incoming authentication, so you have to handle it yourself.&lt;/p&gt;
&lt;h2&gt;
  
  
  Authentication: JWT via Envoy
&lt;/h2&gt;

&lt;p&gt;Basic auth is barely better than nothing, so I went straight to JWT. I used Envoy API Gateway since it natively supports JWT validation and was already in our stack.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key and Token Generation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Generate RSA keys&lt;/span&gt;
openssl genrsa &lt;span class="nt"&gt;-out&lt;/span&gt; mcp-jwt-private.pem 4096
openssl rsa &lt;span class="nt"&gt;-in&lt;/span&gt; mcp-jwt-private.pem &lt;span class="nt"&gt;-pubout&lt;/span&gt; &lt;span class="nt"&gt;-out&lt;/span&gt; mcp-jwt-public.pem

&lt;span class="c"&gt;# 2. Generate Key ID&lt;/span&gt;
&lt;span class="nv"&gt;KID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-hex&lt;/span&gt; 16&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Build JWT header (base64url)&lt;/span&gt;
&lt;span class="nv"&gt;HEADER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;alg&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;RS256&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;typ&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;JWT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;kid&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;KID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-w0&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'+/'&lt;/span&gt; &lt;span class="s1"&gt;'-_'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'='&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 4. Build JWT payload (1 year expiry)&lt;/span&gt;
&lt;span class="nv"&gt;PAYLOAD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;sub&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;claude-desktop&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;aud&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;mcp-servers&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;iss&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;https://your-domain.com&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;iat&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;exp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;31536000&lt;/span&gt; &lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-w0&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'+/'&lt;/span&gt; &lt;span class="s1"&gt;'-_'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'='&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 5. Sign&lt;/span&gt;
&lt;span class="nv"&gt;SIGNATURE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HEADER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PAYLOAD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | openssl dgst &lt;span class="nt"&gt;-sha256&lt;/span&gt; &lt;span class="nt"&gt;-sign&lt;/span&gt; mcp-jwt-private.pem &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-w0&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'+/'&lt;/span&gt; &lt;span class="s1"&gt;'-_'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'='&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 6. Final token&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HEADER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PAYLOAD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SIGNATURE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


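&lt;p&gt;The public key is packaged into a JWKS and stored in a ConfigMap. Envoy validates every incoming request by checking the signature, issuer, and audience, so the &lt;code&gt;iss&lt;/code&gt; and &lt;code&gt;aud&lt;/code&gt; claims in the token must match the &lt;code&gt;issuer&lt;/code&gt; and &lt;code&gt;audiences&lt;/code&gt; values configured for the provider.&lt;/p&gt;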
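
&lt;p&gt;One possible way to package the key, sketched in Python with only the standard library. The &lt;code&gt;build_jwks&lt;/code&gt; helper is hypothetical. It assumes you pass the hex modulus printed by &lt;code&gt;openssl rsa -pubin -in mcp-jwt-public.pem -noout -modulus&lt;/code&gt; (the part after &lt;code&gt;Modulus=&lt;/code&gt;), the same &lt;code&gt;KID&lt;/code&gt; used in the token header, and the default public exponent 65537 that &lt;code&gt;openssl genrsa&lt;/code&gt; produces.&lt;/p&gt;

```python
import base64
import json


def b64url_uint(value):
    """Base64url-encode a non-negative integer as big-endian bytes, unpadded."""
    raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_jwks(modulus_hex, kid, exponent=65537):
    """Wrap one RSA public key in a JWKS document (a dict ready for json.dumps)."""
    jwk = {
        "kty": "RSA",
        "use": "sig",
        "alg": "RS256",
        "kid": kid,
        "n": b64url_uint(int(modulus_hex, 16)),
        "e": b64url_uint(exponent),
    }
    return {"keys": [jwk]}


def jwks_json(modulus_hex, kid):
    """Serialize the JWKS as the string to store in the ConfigMap."""
    return json.dumps(build_jwks(modulus_hex, kid), indent=2)
```

&lt;p&gt;The output of &lt;code&gt;jwks_json&lt;/code&gt; is the string that would go into the ConfigMap. The key name under which the chart expects it is not shown here.&lt;/p&gt;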

&lt;p&gt;Auth configuration in the chart values (Gateway API variant):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;gatewayApi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;parentRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;internal&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-infra&lt;/span&gt;
      &lt;span class="na"&gt;sectionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;
  &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mcptools.example.com&lt;/span&gt;
  &lt;span class="na"&gt;timeouts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600s"&lt;/span&gt;
    &lt;span class="na"&gt;backendRequest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600s"&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PathPrefix&lt;/span&gt;
            &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/digitalocean&lt;/span&gt;
      &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;URLRewrite&lt;/span&gt;
          &lt;span class="na"&gt;urlRewrite&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReplacePrefixMatch&lt;/span&gt;
              &lt;span class="na"&gt;replacePrefixMatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
  &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jwt&lt;/span&gt;
    &lt;span class="na"&gt;jwt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-jwt-auth&lt;/span&gt;
          &lt;span class="na"&gt;issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-issuer&lt;/span&gt;
          &lt;span class="na"&gt;audiences&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mcptools.example.com&lt;/span&gt;
          &lt;span class="na"&gt;localJWKS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ValueRef&lt;/span&gt;
            &lt;span class="na"&gt;valueRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
              &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jwks-config&lt;/span&gt;

&lt;span class="c1"&gt;# If you use External Secrets Operator, secrets can be fetched through it&lt;/span&gt;
&lt;span class="na"&gt;externalSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;refreshInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1h&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;creationPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Owner&lt;/span&gt;
  &lt;span class="na"&gt;dataFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;extract&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;infra/mcp/digitalocean&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jjyhzjsz4ez5e00uplv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jjyhzjsz4ez5e00uplv.png" alt="API gateway" width="800" height="803"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, access to target systems (DigitalOcean, Grafana, Kubernetes) goes through a single service account. For read-only tasks such as monitoring, diagnostics, and fetching information, this is enough. For write operations, the question remains open.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automated Access
&lt;/h2&gt;

&lt;p&gt;Periodic tasks (n8n workflows, CI/CD pipelines) connect to the same MCP servers over Streamable HTTP with separate service JWT tokens. The setup is identical — only the subject in the token payload differs, and optionally the access scope at the Gateway level.&lt;/p&gt;
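
&lt;p&gt;As a rough illustration of "only the subject differs", here is a minimal Python sketch of the claim sets for two automation clients. The issuer and audience are taken from the Gateway config above; the subject names, the one-hour lifetime, and the &lt;code&gt;RS256&lt;/code&gt; algorithm are assumptions, and the sketch deliberately stops before signing, which would need the private key matching the JWKS.&lt;/p&gt;

```python
import base64
import json
import time


def b64url(data):
    """Base64url-encode bytes without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def service_claims(subject, ttl_seconds=3600, now=None):
    """Build the claim set for one automation client.

    Issuer and audience mirror the Gateway values; the subject is hypothetical.
    """
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": "mcp-issuer",
        "aud": "mcptools.example.com",
        "sub": subject,
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
    }


def unsigned_segments(claims):
    """Return 'header.payload'; a real token appends a signature segment."""
    header = {"alg": "RS256", "typ": "JWT"}  # algorithm is an assumption
    return ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )


# Two hypothetical service identities issued at the same moment.
n8n_claims = service_claims("svc-n8n-workflows", now=1_700_000_000)
ci_claims = service_claims("svc-ci-pipelines", now=1_700_000_000)
```

&lt;p&gt;Keeping the subject as the only varying claim is what lets the Gateway tell automation clients apart without a separate route per client.&lt;/p&gt;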

&lt;h2&gt;
  
  
  What's Working, What's Not
&lt;/h2&gt;

&lt;p&gt;MCP tooling and infrastructure still have a few steps to take toward each other before usage becomes truly simple, reliable, and secure.&lt;/p&gt;

&lt;p&gt;The current setup works: six MCP servers in Kubernetes, one Helm chart, JWT auth via Envoy, secrets in Vault. Colleagues connect to remote MCP servers with zero local dependencies, and automation uses the same endpoints.&lt;/p&gt;

&lt;p&gt;What's still missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-user authorization.&lt;/strong&gt; The MCP protocol doesn't support passing user context. We're living with service accounts for now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logging.&lt;/strong&gt; Who called which tool with what parameters — not logged at the MCP level. You can collect this at the Envoy layer, but without call context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth standard.&lt;/strong&gt; Every vendor does it differently. OAuth, API Key, Bearer — no unified approach.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Helm chart is open source: &lt;a href="https://artifacthub.io/packages/helm/mcp-helm-chart/mcp" rel="noopener noreferrer"&gt;ArtifactHub&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;How do you handle per-user authorization for MCP? We're still on a single service account — would love to hear from anyone who's moved past that.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>AI Is for DevOps: How a Neural Network Debugs Failed Pipelines</title>
      <dc:creator>Sergey Byvshev</dc:creator>
      <pubDate>Wed, 01 Apr 2026 14:38:51 +0000</pubDate>
      <link>https://dev.to/javdet/ai-is-for-devops-how-a-neural-network-debugs-failed-pipelines-h72</link>
      <guid>https://dev.to/javdet/ai-is-for-devops-how-a-neural-network-debugs-failed-pipelines-h72</guid>
      <description>&lt;p&gt;How often does someone rush to you wide-eyed, begging for help with a broken pipeline? Or you find yourself staring at a red status in Slack on a Friday evening, knowing the next 15–20 minutes will be spent on routine work: open the log, find the error line, compare with the last commit, check dependencies…&lt;/p&gt;

&lt;p&gt;The work is straightforward. And that's exactly why it's boring — a perfect candidate for automation.&lt;/p&gt;

&lt;p&gt;Fortunately, neural networks can now handle this for us and provide solid advice (not all of them, but some definitely can).&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Think Through Beforehand
&lt;/h2&gt;

&lt;p&gt;Before writing any code, it's worth answering four questions. They'll define the architecture of the entire solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What events should be passed to the agent, and in what format?&lt;/li&gt;
&lt;li&gt;What data sources might be needed?&lt;/li&gt;
&lt;li&gt;How should this data be analyzed to identify the root cause?&lt;/li&gt;
&lt;li&gt;What should the diagnostic output look like in form and content?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What events trigger the analysis?&lt;/strong&gt; In our case, a CI/CD job that finished unsuccessfully. To start diagnostics, it's enough to pass the agent the last 50 lines of the build log and the pipeline file contents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What data sources will be needed?&lt;/strong&gt; The main ones are the version control system (repository access), CI/CD (full log, related jobs), an endpoint availability checker, and CI agent resource consumption metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to analyze the data?&lt;/strong&gt; This is the most interesting part, because there are many scenarios depending on the job type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build jobs — dependency issues (missing, incorrectly specified, unavailable), code errors, insufficient build resources.&lt;/li&gt;
&lt;li&gt;Test jobs — code errors, incorrectly written tests.&lt;/li&gt;
&lt;li&gt;Deploy jobs — manifest errors, issues on the target platform side.&lt;/li&gt;
&lt;li&gt;Common problems — errors in the pipeline/workflow itself, missing utilities on the agent, agent initialization issues.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What should the report format be?&lt;/strong&gt; Most often, such a report is read by human eyes in a chat, so it should be written in plain language. Concise, facts only: what was found, most probable causes, specific steps to fix. A convenient place for such a report is a thread under the corresponding error message or a dedicated channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Architecture
&lt;/h2&gt;

&lt;p&gt;At a high level, the flow works like this: an event arrives about a job that finished unsuccessfully. We then request the data needed for analysis: build logs and the pipeline description. Based on this data and its system prompt, the AI agent performs the failure analysis. Along the way, the assistant can independently check the repository, see what changed, and so on. If the log points to an unreachable external endpoint, it can verify that. On a failed deploy, it can check application logs and metrics. The result is a short list of the most probable failure causes with remediation steps, which is then sent to the team chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76eu1yjwb3uolirk8sga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76eu1yjwb3uolirk8sga.png" alt="Architecture" width="781" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to Implement
&lt;/h2&gt;

&lt;p&gt;Some implementation aspects are covered in more detail in a separate article.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; — You can quickly launch n8n with the MCP update using docker-compose (see below)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gitlab.com/" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/oss/loki/" rel="noopener noreferrer"&gt;Loki&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slack.com/" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting Up Incoming Events
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnamijbm1k9f5drasnnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnamijbm1k9f5drasnnz.png" alt="Input chain" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first step is to create a webhook in n8n. GitLab uses the &lt;code&gt;X-Gitlab-Token&lt;/code&gt; header for authentication, so in n8n we select Header Auth and specify the corresponding credential.&lt;/p&gt;
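
&lt;p&gt;n8n performs this check for you once Header Auth is configured. For intuition, the equivalent comparison can be sketched in plain Python; the function name and the dict-of-headers input are assumptions made for the example, not part of n8n.&lt;/p&gt;

```python
import hmac


def is_authorized(headers, expected_token):
    """Compare the X-Gitlab-Token header with the configured secret."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-gitlab-token", "")
    # compare_digest runs in constant time, so the secret is not leaked
    # through response-time differences.
    return hmac.compare_digest(
        received.encode("utf-8"), expected_token.encode("utf-8")
    )
```

&lt;p&gt;A request without the header is rejected the same way as one with a wrong value, which keeps the webhook from revealing which of the two happened.&lt;/p&gt;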

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe72bpm3a5mvsidk2orc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe72bpm3a5mvsidk2orc2.png" alt="Webhook" width="800" height="1511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rkarpsyjf6xp5lovxgn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rkarpsyjf6xp5lovxgn.png" alt="Webhook Auth" width="800" height="464"&gt;&lt;/a&gt;&lt;br&gt;
In GitLab, we configure webhook delivery. This can be done for an individual repository or for an entire group. We specify the webhook address and the secret token, and from the event types we select Pipeline events.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdyqvazodc14hz1wia91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdyqvazodc14hz1wia91.png" alt="Gitlab webhook" width="800" height="632"&gt;&lt;/a&gt;&lt;br&gt;
Then, using the &lt;code&gt;If&lt;/code&gt; node, we filter out all non-failed events — we don't need them.&lt;/p&gt;
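
&lt;p&gt;The condition behind that filter can be sketched in Python as follows. It assumes the usual shape of a GitLab pipeline event (&lt;code&gt;object_kind&lt;/code&gt;, &lt;code&gt;object_attributes.status&lt;/code&gt;, and a &lt;code&gt;builds&lt;/code&gt; list); the function name is hypothetical, and in the workflow itself the same test lives in the &lt;code&gt;If&lt;/code&gt; node.&lt;/p&gt;

```python
def failed_builds(event):
    """Return the failed jobs from a pipeline event, or an empty list."""
    # Ignore anything that is not a pipeline event at all.
    if event.get("object_kind") != "pipeline":
        return []
    # Ignore pipelines that are still running or finished successfully.
    if event.get("object_attributes", {}).get("status") != "failed":
        return []
    # Keep only the jobs that actually failed inside the failed pipeline.
    return [
        build
        for build in event.get("builds", [])
        if build.get("status") == "failed"
    ]


# A trimmed, made-up event used only to show the call.
sample_event = {
    "object_kind": "pipeline",
    "object_attributes": {"id": 101, "status": "failed"},
    "builds": [
        {"id": 1, "name": "build", "status": "success"},
        {"id": 2, "name": "test", "status": "failed"},
    ],
}
jobs_to_analyze = failed_builds(sample_event)
```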

&lt;h2&gt;
  
  
  Data Collection
&lt;/h2&gt;

&lt;p&gt;As soon as we receive a job failure event, we request details from GitLab. For this, you'll need to create a GitLab access token (I recommend read-only) and the corresponding credential in n8n.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji0db0u4ritjd7hre4pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji0db0u4ritjd7hre4pk.png" alt="Collection data" width="799" height="492"&gt;&lt;/a&gt;&lt;br&gt;
Then we merge all the collected data using the Merge node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Error Analysis
&lt;/h2&gt;

&lt;p&gt;Data arrives at the agent in the following format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[
  { "job_log": "Last 50 lines of failed job" },
  { "data": "Content .gitlab-ci.yml" },
  { "pipeline": {} },
  { "failed_job": {} }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This format should be communicated to the agent in advance via the system prompt. There we also describe the available tools, the investigation strategy (based on the considerations outlined above), and the desired output report format.&lt;/p&gt;
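
&lt;p&gt;A minimal Python sketch of how that input could be assembled from the collected pieces. The key names follow the format described in this section; the helper names and the sample values are assumptions for illustration, since in the workflow this is done by the Merge node.&lt;/p&gt;

```python
def tail(text, max_lines=50):
    """Keep only the last max_lines lines of a log."""
    return "\n".join(text.splitlines()[-max_lines:])


def build_agent_input(full_log, ci_config, pipeline, failed_job):
    """Assemble the list of items the agent receives."""
    return [
        {"job_log": tail(full_log)},   # last 50 lines of the failed job
        {"data": ci_config},           # contents of .gitlab-ci.yml
        {"pipeline": pipeline},        # pipeline metadata
        {"failed_job": failed_job},    # failed job metadata
    ]


# Made-up inputs, only to show the resulting shape.
example_log = "\n".join(f"line {number}" for number in range(120))
agent_input = build_agent_input(
    example_log,
    "stages: [build, test]",
    {"id": 101, "status": "failed"},
    {"id": 2, "name": "test", "stage": "test"},
)
```

&lt;p&gt;Trimming the log before it reaches the model keeps token usage predictable; the agent can still fetch the full log through its tools when the tail is not enough.&lt;/p&gt;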

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ot5k9cg3bm9vzd4hdn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ot5k9cg3bm9vzd4hdn3.png" alt="ai-asisstant" width="800" height="549"&gt;&lt;/a&gt;&lt;br&gt;
It's best to use the latest model versions, as they handle MCP tool use significantly better. We don't connect memory here, since each build failure is an independent event for the agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Tools
&lt;/h2&gt;

&lt;p&gt;The agent has access to three tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitLab MCP — for retrieving additional information about the failed job, code changes, etc.&lt;/li&gt;
&lt;li&gt;Grafana MCP — for retrieving CI agent metrics, as well as failed deploy logs.&lt;/li&gt;
&lt;li&gt;HTTP Request — n8n's built-in tool for checking endpoint availability.&lt;/li&gt;
&lt;/ul&gt;
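
&lt;p&gt;To show what an availability check can hand back to the agent, here is a small Python sketch outside n8n. It is not how the built-in HTTP Request tool is implemented; the &lt;code&gt;probe&lt;/code&gt; function, its result fields, and the injectable &lt;code&gt;opener&lt;/code&gt; parameter are assumptions. The point is that every failure mode becomes data the agent can reason about rather than an exception that stops the run.&lt;/p&gt;

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Check an endpoint and always return a result dict instead of raising."""
    try:
        with opener(url, timeout=timeout) as response:
            return {
                "url": url,
                "reachable": True,
                "status": response.status,
                "error": None,
            }
    except urllib.error.HTTPError as exc:
        # The server answered, so it is reachable, just with an error status.
        return {
            "url": url,
            "reachable": True,
            "status": exc.code,
            "error": str(exc.reason),
        }
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, and timeouts end up here.
        return {"url": url, "reachable": False, "status": None, "error": str(exc)}
```

&lt;p&gt;Separating "answered with an error" from "did not answer" matters for diagnosis: the first usually points at the service, the second at the network or DNS.&lt;/p&gt;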

&lt;p&gt;&lt;strong&gt;Important note&lt;/strong&gt;: make sure your MCP servers are running in remote mode. If an MCP server doesn't support remote mode out of the box, you can solve this with mcpgateway, which proxies HTTP to the server's stdio. For the transport, Streamable HTTP is the best choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Posting to Chat
&lt;/h2&gt;

&lt;p&gt;The final step is sending the generated report to Slack. The report goes to the selected channel or thread.&lt;/p&gt;
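
&lt;p&gt;The difference between posting to a channel and replying in a thread comes down to one field. The sketch below builds the arguments for Slack's &lt;code&gt;chat.postMessage&lt;/code&gt; method in Python; the function name, channel ID, and timestamp are made up, and in the workflow the Slack node fills these in.&lt;/p&gt;

```python
def slack_report_payload(channel, report, thread_ts=None):
    """Build chat.postMessage arguments for a channel post or a thread reply."""
    payload = {"channel": channel, "text": report}
    if thread_ts is not None:
        # thread_ts is the timestamp of the parent message; setting it
        # turns the post into a reply under that message.
        payload["thread_ts"] = thread_ts
    return payload


# Hypothetical values, only to show both variants.
channel_post = slack_report_payload("C0123456789", "Build failed: dependency not found")
thread_reply = slack_report_payload(
    "C0123456789", "Build failed: dependency not found", thread_ts="1712000000.000100"
)
```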

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcu8mpigda60jxtk9w7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcu8mpigda60jxtk9w7g.png" alt="Output" width="720" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing and Real-World Examples
&lt;/h2&gt;

&lt;p&gt;The final workflow looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag2bbghealcpt5vigua5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag2bbghealcpt5vigua5.png" alt="full-workflow" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 1: Failed Build
&lt;/h2&gt;

&lt;p&gt;Gradle can't resolve a dependency. The agent determines that this is a dependency resolution issue, not a compilation error. It provides specific causes: the artifact isn't published in the repository, or credentials are unavailable inside the Docker build context. For each cause — concrete steps to fix.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawrraja08mehvaosn86s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawrraja08mehvaosn86s.png" alt="Gitlab-logs" width="799" height="344"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivrjyesmjis90gj8mt5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivrjyesmjis90gj8mt5w.png" alt="Slack-message" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2: Infrastructure Change Errors
&lt;/h2&gt;

&lt;p&gt;Terraform plan fails with Unsupported argument errors. The agent recognizes that the HCL configuration contains attributes not supported by the current DigitalOcean provider schema. It provides three probable causes — from the wrong resource type to provider version mismatch — with specific remediation steps for each.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8gpz8eucbk2a0pwoci1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8gpz8eucbk2a0pwoci1.png" alt="gitlab-example" width="799" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs1f3cz89trpgp2sh5st.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs1f3cz89trpgp2sh5st.png" alt="report-example" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We've built an assistant that performs full error analysis in approximately 30 seconds. This allows the team to respond to failed jobs significantly faster and spend their time on real engineering tasks rather than routine log analysis.&lt;/p&gt;

&lt;p&gt;Token consumption stays at the level of a few thousand per analysis.&lt;/p&gt;




&lt;p&gt;The base version of the workflow is &lt;a href="https://github.com/javdet/automagicops-workflows/tree/main/workflows/CI_CDAssistant" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
The full tutorial with all scripts is available &lt;a href="https://www.patreon.com/posts/how-neural-in-30-153563490" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>infrastructure</category>
      <category>cicd</category>
    </item>
    <item>
      <title>AI Alert Assistant: How n8n + LLM Replace Routine Diagnostics</title>
      <dc:creator>Sergey Byvshev</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:35:05 +0000</pubDate>
      <link>https://dev.to/javdet/ai-alert-assistant-how-n8n-llm-replace-routine-diagnostics-a9b</link>
      <guid>https://dev.to/javdet/ai-alert-assistant-how-n8n-llm-replace-routine-diagnostics-a9b</guid>
      <description>&lt;p&gt;Anyone who has dealt with keeping services running knows how exhausting and unpredictably time-consuming incident diagnostics and resolution can be.&lt;/p&gt;

&lt;p&gt;Over the years, I've watched the evolution of incident response processes — from "whoever spots the problem first owns it" to strictly defined 24/7 on-call rotations, SLA-driven response times, runbook adherence, and separation of responsibility across platforms.&lt;/p&gt;

&lt;p&gt;What has remained constant is the diagnostic routine itself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Gathering data from multiple sources:
&lt;ul&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Release and maintenance timelines&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Analysis based on personal knowledge and experience&lt;/li&gt;
&lt;li&gt;Formulating possible solutions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have a documented procedure for every situation, that simplifies things somewhat — but it doesn't teach the investigative mindset needed for real troubleshooting.&lt;/p&gt;

&lt;p&gt;Writing and maintaining a &lt;a href="https://runbooks.prometheus-operator.dev/" rel="noopener noreferrer"&gt;runbook&lt;/a&gt; for every alert is tedious work, which is exactly why an experienced engineer will always outperform a library of hundreds of runbooks.&lt;/p&gt;

&lt;p&gt;But what if an engineer's function could be performed even when no engineer is physically present?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6yyrj5u5xrhsza6ujf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6yyrj5u5xrhsza6ujf6.png" alt=" " width="799" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Designing the Assistant: What to Define Upfront
&lt;/h1&gt;

&lt;p&gt;Before writing any code, four questions need to be answered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What events, and in what format, should be provided to the agent?&lt;/li&gt;
&lt;li&gt;What data sources might it need?&lt;/li&gt;
&lt;li&gt;How should it manipulate that data to identify the root cause?&lt;/li&gt;
&lt;li&gt;What should the diagnostic report look like in terms of form and content?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's break down each one. A link to the workflow itself can be found below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Events and Format
&lt;/h2&gt;

&lt;p&gt;Typically, what's sufficient to kick off diagnostics is an event containing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Alertname&lt;br&gt;
Description&lt;br&gt;
Labels&lt;br&gt;
job_name&lt;br&gt;
namespace&lt;br&gt;
pod&lt;br&gt;
env&lt;br&gt;
region&lt;br&gt;
Grafana Dashboard&lt;br&gt;
Runbook Url&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;p&gt;Most frequently, we turn to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics that have breached acceptable thresholds&lt;/li&gt;
&lt;li&gt;Resource consumption and load metrics&lt;/li&gt;
&lt;li&gt;Error logs&lt;/li&gt;
&lt;li&gt;The platform — Kubernetes or a standalone server&lt;/li&gt;
&lt;li&gt;Related CI/CD releases&lt;/li&gt;
&lt;li&gt;Alert definitions and firing conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Analysis
&lt;/h2&gt;

&lt;p&gt;There is arguably no canonical sequence of steps for analysis. The diagnostic process is inherently variable — which is why no one has yet managed to write a single script that covers every possible scenario. But we'll give it a shot.&lt;/p&gt;

&lt;p&gt;First, let's consider how we ourselves approach incident diagnosis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Examine what's happening with the metric that triggered the alert: determine the nature of the anomaly — a spike, monotonic growth, or a persistently critical value&lt;/li&gt;
&lt;li&gt;Determine whether this is a software-level failure or caused by issues at a lower layer&lt;/li&gt;
&lt;li&gt;Check infrastructure metrics: resources, networking, system limits&lt;/li&gt;
&lt;li&gt;Inspect logs at the point where the problem is occurring&lt;/li&gt;
&lt;li&gt;Determine how recently the affected components were updated and what changed&lt;/li&gt;
&lt;li&gt;Attempt to interact with the components directly — through the orchestrator or a Linux shell&lt;/li&gt;
&lt;/ul&gt;
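
&lt;p&gt;The first step in that list, telling a spike from monotonic growth from a persistently critical value, can be illustrated with a deliberately naive Python sketch. The function name, the plain list of samples, and the single threshold are assumptions; a real agent would reason over the query results it gets from its tools rather than call a fixed rule like this.&lt;/p&gt;

```python
def classify_anomaly(values, threshold):
    """Label a metric window as one of three anomaly shapes, or 'normal'."""
    if not values:
        return "normal"
    breaches = [value >= threshold for value in values]
    # Every sample is at or above the threshold.
    if all(breaches):
        return "persistently critical"
    # Each sample is at least as large as the previous one, and the
    # window ends higher than it started.
    rising = all(later >= earlier for earlier, later in zip(values, values[1:]))
    if rising and values[-1] > values[0]:
        return "monotonic growth"
    # Some samples cross the threshold, but the series is not steadily rising.
    if any(breaches):
        return "spike"
    return "normal"
```

&lt;p&gt;The distinction matters because each shape suggests a different next step: a spike points toward a recent event, steady growth toward a leak or a filling disk, and a persistently critical value toward a capacity or configuration problem.&lt;/p&gt;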

&lt;h2&gt;
  
  
  Report Format
&lt;/h2&gt;

&lt;p&gt;In most cases, this kind of report is meant to be read by humans, so it should be written in plain, natural language. Concise — just the discovered facts, a list of hypotheses, and possible remediation steps. The most convenient place for such a report is a thread under the corresponding alert in the team chat.&lt;/p&gt;

&lt;h1&gt;
  
  
  Solution Architecture
&lt;/h1&gt;

&lt;p&gt;Here's the desired flow: when an alert fires, the event is sent to a webhook that extracts the relevant data and assembles a clear, well-structured prompt for the AI agent.&lt;/p&gt;

&lt;p&gt;The AI agent, guided by its system prompt and the available &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; tools, performs diagnostics and generates a report in a predefined format.&lt;/p&gt;

&lt;p&gt;The report is then posted to the team chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcczx1ggz3w9newtmwmc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcczx1ggz3w9newtmwmc9.png" alt=" " width="691" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;If you're in a hurry, you can view the finished workflow below. &lt;/p&gt;

&lt;p&gt;As the execution environment for workflows like this, I chose n8n because it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lets you build easily readable automations fairly quickly&lt;/li&gt;
&lt;li&gt;Makes it simple to share your work&lt;/li&gt;
&lt;li&gt;Separates logic from secrets and other hardcoded values&lt;/li&gt;
&lt;li&gt;Has a free self-hosted version&lt;/li&gt;
&lt;li&gt;Has an enormous community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Personally, it reminds me of &lt;a href="https://www.jenkins.io/" rel="noopener noreferrer"&gt;Jenkins&lt;/a&gt; about ten years ago — and Jenkins was great.&lt;br&gt;
You can install n8n using any of the methods described in the documentation, for example using &lt;a href="https://docs.n8n.io/hosting/installation/server-setups/docker-compose/" rel="noopener noreferrer"&gt;docker-compose&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From here, the implementation will depend on the systems you use. In my case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; + &lt;a href="https://github.com/prometheus/alertmanager" rel="noopener noreferrer"&gt;Alertmanager&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/oss/loki/" rel="noopener noreferrer"&gt;Grafana Loki&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/" rel="noopener noreferrer"&gt;DigitalOcean&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slack.com/" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Preprocessing Incoming Events
&lt;/h2&gt;

&lt;p&gt;Alertmanager can send alerts to a custom webhook. In n8n, all you need to do is create a Webhook trigger node; you can also specify authentication parameters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7i9h86evv27zjif4b9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7i9h86evv27zjif4b9e.png" alt=" " width="800" height="1486"&gt;&lt;/a&gt;&lt;br&gt;
Add the created webhook's details to a new &lt;code&gt;n8n&lt;/code&gt; receiver in Alertmanager.&lt;br&gt;
After this, we can send alerts from Alertmanager to our workflow. However, the received messages contain unnecessary data, and the format is not ideal. This makes it harder for the LLM to understand what's being asked of it and increases token consumption. Therefore, we'll make a small modification using a Code node.&lt;/p&gt;
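
&lt;p&gt;The kind of trimming such a node does can be sketched in plain Python. This is not the author's node code: it assumes the standard Alertmanager webhook body (an &lt;code&gt;alerts&lt;/code&gt; list whose items carry &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;labels&lt;/code&gt;, &lt;code&gt;annotations&lt;/code&gt;, and &lt;code&gt;startsAt&lt;/code&gt;), and the annotation names &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;dashboard_url&lt;/code&gt;, and &lt;code&gt;runbook_url&lt;/code&gt; are assumptions that depend on how your alert rules are written.&lt;/p&gt;

```python
WANTED_LABELS = ("job_name", "namespace", "pod", "env", "region")


def normalize_alert(alert):
    """Reduce one Alertmanager alert to the fields the agent needs."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        "alertname": labels.get("alertname"),
        "description": annotations.get("description"),
        # Keep only the labels that help locate the affected component.
        "labels": {key: labels[key] for key in WANTED_LABELS if key in labels},
        "grafana_dashboard": annotations.get("dashboard_url"),
        "runbook_url": annotations.get("runbook_url"),
        "started_at": alert.get("startsAt"),
    }


def normalize_webhook(body):
    """Return the normalized firing alerts from a webhook body."""
    return [
        normalize_alert(alert)
        for alert in body.get("alerts", [])
        if alert.get("status") == "firing"
    ]


# A trimmed, made-up webhook body used only to show the call.
sample_body = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "PodCrashLooping", "namespace": "prod", "pod": "api-1", "severity": "critical"},
            "annotations": {"description": "Pod is restarting", "runbook_url": "https://runbooks.example.com/crashloop"},
            "startsAt": "2026-03-24T06:00:00Z",
        },
        {"status": "resolved", "labels": {"alertname": "HighLatency"}, "annotations": {}},
    ],
}
normalized = normalize_webhook(sample_body)
```

&lt;p&gt;Dropping resolved alerts and unrelated labels here means the model sees a compact object per alert instead of the full notification envelope.&lt;/p&gt;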

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejvgyp0a56oac6tors37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejvgyp0a56oac6tors37.png" alt=" " width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You'll likely want to store certain values as variables — for instance, the UID of your Prometheus datasource in Grafana.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qquaey1xourwhmfig24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qquaey1xourwhmfig24.png" alt=" " width="800" height="627"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Agent
&lt;/h2&gt;

&lt;p&gt;A prerequisite for the AI Agent node to operate is a connected LLM. Almost any neural network can be connected, but in my experience, Codex and Opus perform best.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq6wcclbpob7abjpku2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq6wcclbpob7abjpku2a.png" alt=" " width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We don't use Memory here, since each alert is an independent event unrelated to others.&lt;br&gt;
One of the key aspects is writing the system prompt. What should it include?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent purpose — what it's supposed to do&lt;/li&gt;
&lt;li&gt;Brief description of your infrastructure and the type of service you provide&lt;/li&gt;
&lt;li&gt;Description of each MCP tool — e.g., use the Kubernetes MCP to get pod status, related events, etc.&lt;/li&gt;
&lt;li&gt;Important rules to follow and pitfalls to avoid — e.g., never ask questions, write the response in a specific language, never make any changes to the infrastructure&lt;/li&gt;
&lt;li&gt;Diagnostic guidelines — essentially what we discussed in the Data Analysis section above&lt;/li&gt;
&lt;/ul&gt;
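
&lt;p&gt;As a minimal sketch, the five parts listed above can be kept as separate named sections and joined into one prompt, which makes each part easy to edit on its own. Every section text below is a hypothetical placeholder rather than the prompt used in this workflow, and the &lt;code&gt;build_system_prompt&lt;/code&gt; helper is an illustrative name.&lt;/p&gt;

```python
# Minimal sketch: assemble a system prompt from the five sections listed above.
# All section texts are hypothetical placeholders, not the author's real prompt.

SECTIONS = [
    ("Purpose", "You are a first-line alert diagnostics assistant. "
                "Investigate each alert and report the most likely cause."),
    ("Infrastructure", "Services run in a Kubernetes cluster; metrics and logs "
                       "are available through Grafana."),
    ("Tools", "Use the Kubernetes MCP to get pod status and related events. "
              "Use the Grafana MCP to query metrics and logs."),
    ("Rules", "Never ask questions. Write the response in English. "
              "Never make any changes to the infrastructure."),
    ("Diagnostic guidelines", "Describe here the order in which data sources "
                              "should be checked for each alert type."),
]


def build_system_prompt(sections):
    """Join (title, body) pairs into one prompt, skipping empty sections."""
    parts = []
    for title, body in sections:
        body = body.strip()
        if body:
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt(SECTIONS))
```

&lt;p&gt;The resulting string can be pasted into the System Message field of the AI Agent node.&lt;/p&gt;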

&lt;h2&gt;
  
  
  MCP — The Agent's Eyes and Ears
&lt;/h2&gt;

&lt;p&gt;MCP tools serve as the agent's eyes and ears, giving it the ability to interact with the subject of diagnosis. The specific list may vary depending on your infrastructure, but the core categories of data sources (which we outlined earlier) remain the same. In my case, the list looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics — &lt;a href="https://github.com/grafana/mcp-grafana" rel="noopener noreferrer"&gt;mcp-grafana&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Logs — &lt;a href="https://github.com/grafana/mcp-grafana" rel="noopener noreferrer"&gt;mcp-grafana&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Platform — &lt;a href="https://github.com/containers/kubernetes-mcp-server" rel="noopener noreferrer"&gt;kubernetes-mcp&lt;/a&gt;, &lt;a href="https://github.com/digitalocean/digitalocean-mcp" rel="noopener noreferrer"&gt;digitalocean-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CI/CD releases — &lt;a href="https://github.com/zereight/gitlab-mcp" rel="noopener noreferrer"&gt;gitlab-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Alert descriptions — vector store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48zj1bjwnlzolmoane7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48zj1bjwnlzolmoane7y.png" alt=" " width="800" height="434"&gt;&lt;/a&gt;&lt;br&gt;
When running your MCP servers, make sure they run in remote HTTP streaming mode (not local stdio), since n8n connects to them over the network.&lt;/p&gt;
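
&lt;p&gt;One way to check that a server is reachable over HTTP is to send it the MCP &lt;code&gt;initialize&lt;/code&gt; request by hand. The sketch below only builds that request. The endpoint URL, the &lt;code&gt;/mcp&lt;/code&gt; path, the client name, and the protocol version string are assumptions, so adjust them to whatever your server documents.&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical endpoint: the real host, port, and path depend on the MCP server.
MCP_URL = "http://mcp-grafana.internal:8000/mcp"


def build_initialize_request(url):
    """Build (but do not send) an MCP initialize request over HTTP."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use the one your server supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "connectivity-check", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an event stream.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    request = build_initialize_request(MCP_URL)
    # Sending it requires a reachable server:
    # with urllib.request.urlopen(request, timeout=10) as response:
    #     print(response.status, response.read()[:200])
    print(request.get_method(), request.full_url)
```

&lt;p&gt;If the server was started in stdio mode, there is no HTTP endpoint to send this request to, and the MCP Client node in n8n will not be able to connect either.&lt;/p&gt;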

&lt;h2&gt;
  
  
  Vector Store
&lt;/h2&gt;

&lt;p&gt;The knowledge base deserves separate attention. It lets you store large volumes of information and look it up quickly, which saves tokens and reduces the time spent querying external systems. I use Qdrant for this, and I strongly recommend protecting it with an API key.&lt;/p&gt;

&lt;p&gt;Next, you need to create a collection where your knowledge will be stored. You can do this through the web interface at http://YOUR_QDRANT_HOST:6333/dashboard (substitute the address of your own Qdrant instance).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqamyycaucetlgylnqxm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqamyycaucetlgylnqxm.png" alt=" " width="800" height="337"&gt;&lt;/a&gt;&lt;br&gt;
Create a QdrantApi credential in n8n and use it to connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu444rl04e9i9e4hyv407.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu444rl04e9i9e4hyv407.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
Once the database is connected to the agent as a tool, it's time to load it with knowledge. I use a separate workflow for this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjqq71gbwc56ivwpxpsh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjqq71gbwc56ivwpxpsh.png" alt=" " width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Simply run this workflow and upload your knowledge file(s) through the form that appears — they'll be saved to the database.&lt;/p&gt;
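
&lt;p&gt;If you prefer to script the setup, the collection mentioned above can also be created through Qdrant's REST API instead of the dashboard. This is a sketch under assumptions: the host, the collection name, the vector size, and the API key are all hypothetical, and the vector size must match the dimension of the embedding model you use.&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical values: replace with your own host, collection name, and size.
QDRANT_URL = "http://qdrant.internal:6333"
COLLECTION = "alert_descriptions"
VECTOR_SIZE = 1536  # example only; must equal your embedding model's dimension


def build_create_collection_request(base_url, collection, vector_size, api_key):
    """Build (but do not send) a PUT /collections/{name} request for Qdrant."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return urllib.request.Request(
        f"{base_url}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Qdrant reads the API key from the "api-key" header.
            "api-key": api_key,
        },
    )


if __name__ == "__main__":
    request = build_create_collection_request(
        QDRANT_URL, COLLECTION, VECTOR_SIZE, "YOUR_API_KEY"
    )
    # Sending it requires a reachable Qdrant instance:
    # with urllib.request.urlopen(request, timeout=10) as response:
    #     print(response.status, response.read())
    print(request.get_method(), request.full_url)
```

&lt;p&gt;The same API key is what you enter in the QdrantApi credential in n8n.&lt;/p&gt;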

&lt;h2&gt;
  
  
  Posting to Chat
&lt;/h2&gt;

&lt;p&gt;After the AI agent completes its work, we need to send the results to the chat where engineers will see them. The delivery chain consists of three nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for recent messages in the alerts channel. Unfortunately, not all group chats support keyword search via API, so the last 10 messages are retrieved instead.&lt;/li&gt;
&lt;li&gt;Find the message that corresponds to our alert.&lt;/li&gt;
&lt;li&gt;Post the diagnostic results as a thread reply to that message.&lt;/li&gt;
&lt;/ul&gt;
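
&lt;p&gt;The matching and reply steps above can be sketched as follows. The sketch assumes the messages come from Slack's &lt;code&gt;conversations.history&lt;/code&gt; method and that the alert name appears in the message &lt;code&gt;text&lt;/code&gt; field; if your alerting integration puts it into attachments or blocks, the lookup has to be adapted. The function names, channel ID, and sample messages are hypothetical.&lt;/p&gt;

```python
from decimal import Decimal


def find_alert_message(messages, alert_name):
    """Return the newest message whose text mentions the alert name, or None."""
    needle = alert_name.lower()
    matches = [m for m in messages if needle in m.get("text", "").lower()]
    if not matches:
        return None
    # Slack timestamps are strings such as "1718000000.000200";
    # Decimal keeps the comparison exact.
    return max(matches, key=lambda m: Decimal(m["ts"]))


def build_thread_reply(channel_id, message, report):
    """Build a chat.postMessage payload that replies in the message's thread."""
    return {"channel": channel_id, "text": report, "thread_ts": message["ts"]}


if __name__ == "__main__":
    # Hypothetical sample of recent channel messages.
    history = [
        {"ts": "1718000300.000100", "text": "[FIRING] HighErrorRate on checkout"},
        {"ts": "1718000200.000100", "text": "deploy finished"},
        {"ts": "1718000100.000100", "text": "[FIRING] PodCrashLooping in payments"},
    ]
    target = find_alert_message(history, "PodCrashLooping")
    if target is not None:
        print(build_thread_reply("C0123456789", target, "Diagnostic summary goes here"))
```

&lt;p&gt;Setting &lt;code&gt;thread_ts&lt;/code&gt; to the timestamp of the original alert message is what makes Slack post the report as a thread reply instead of a new channel message.&lt;/p&gt;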

&lt;p&gt;For Slack integration, you'll need to set up authentication following the official &lt;a href="https://docs.n8n.io/integrations/builtin/credentials/slack/?utm_source=n8n_app&amp;amp;utm_medium=credential_settings&amp;amp;utm_campaign=create_new_credentials_modal#slack-trigger-configuration" rel="noopener noreferrer"&gt;n8n documentation on Slack credentials&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51mnmg8nmgxjdgdfyee1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51mnmg8nmgxjdgdfyee1.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Testing and Examples
&lt;/h1&gt;

&lt;p&gt;Here's what the final workflow looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpj44qptf9lgz4zi6w9m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpj44qptf9lgz4zi6w9m.png" alt=" " width="799" height="374"&gt;&lt;/a&gt;&lt;br&gt;
I've tested minor variations of this workflow across several projects, and here are the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehzl4movw132nvuo0x4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehzl4movw132nvuo0x4f.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl569fg5ms49qh2w9xr9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl569fg5ms49qh2w9xr9g.png" alt=" " width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On average, analyzing an alert takes 30 seconds. In that time, the agent manages to inspect metrics, review logs, assess the state of the K8s cluster, and deliver a verdict.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqh4cokq5kxwm1ubp6gtv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqh4cokq5kxwm1ubp6gtv.png" alt=" " width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvnnk1snyymtspeot21i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvnnk1snyymtspeot21i.png" alt=" " width="799" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;What we end up with is an assistant that gets to work the instant an alert fires. Analysis takes about 30 seconds on average, so by the time an engineer sees the alert, the initial diagnostics have usually been completed already.&lt;/p&gt;

&lt;p&gt;This is just one of the directions where AI can meaningfully simplify life for infrastructure teams — and for development teams who are forced to handle their own support. The agent doesn't replace the engineer, but it takes on the first-response diagnostics and shortens the gap between "alert fired" and "we understand what's going on." And at night — when the on-call engineer is asleep — that can be invaluable.&lt;/p&gt;




&lt;p&gt;The base version of the workflow is available here: &lt;a href="https://github.com/javdet/automagicops-workflows/tree/main/workflows/AlertAssistant" rel="noopener noreferrer"&gt;https://github.com/javdet/automagicops-workflows/tree/main/workflows/AlertAssistant&lt;/a&gt;&lt;br&gt;
Want to quickly implement a similar flow for yourself? Read the full &lt;a href="https://www.patreon.com/posts/153231436" rel="noopener noreferrer"&gt;Patreon&lt;/a&gt; guide with detailed examples and practical tips.&lt;br&gt;
Author of the article: &lt;a href="https://linkedin.com/in/sergeybyvshev" rel="noopener noreferrer"&gt;https://linkedin.com/in/sergeybyvshev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
