How a small DevOps team offloaded chat triage, CI/CD diagnostics, and attachment parsing to an AI agent — and what’s still rough about it.
If you, like me, run infrastructure for a small team, you’ve probably been in this spot: the engineering org keeps growing while the DevOps headcount stays the same. With the rise of vibe-coding, that imbalance became especially obvious: our dev team at the studio grew roughly 1.5× in a couple of months, because every product manager wanted their own mini-application. On top of that, availability issues from certain regions were becoming more frequent, which added to the headache.
As a result, the flow of messages in our Slack support channel grew to the point where a significant chunk of an engineer’s day was spent triaging them. And the most frustrating part: not every request actually fell under DevOps responsibility, but each one still required at least a shallow diagnostic to figure that out.
That’s how the idea to offload the first line to an AI agent came up. By that point, we’d already automated incident analysis triggered by alerts and the approach had proven itself. Extending the same pattern to manual chat requests was the logical next step.
In this article — the first of two — I’ll walk through how we:
- classified the past year’s request flow and picked categories worth automating;
- built a classifier in n8n with Slack integration over MCP;
- implemented a CI/CD incident assistant as the first production-ready branch;
- added attachment parsing (error screenshots, log files);
- set up proper error reporting for the workflows themselves.
Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant for routine questions, and the handler for infrastructure modification tasks.
All workflows and system prompts are published separately — link at the end of the article.
Preparation: classifying requests
Before automating anything, you need to understand what. I exported the request history from our Slack channels for the past year and bucketed it into categories. The result:
- Infrastructure modification — changes to existing infrastructure, adding standard resources.
- New installation — deploying new systems and integrations that didn't exist before.
- Incident — something stopped working in the current infrastructure.
- CI/CD — failing builds, broken tests, broken deploys.
- Question — general questions about our infrastructure.
- Announcement — informational messages: planned maintenance, for example.
- Other — anything that didn't fit above.
The bulk of requests fell into three categories: Incident, CI/CD, and Question. Those were the obvious starting point. Infrastructure modification and New installation require approvals and almost always need an engineer in the loop, so there's little point in automating them end to end. Announcements don't need agent handling at all: it's enough to recognize them correctly and not page the on-call.
But before building any handling branches, we needed to reliably identify which category a new request belongs to.
The classifier

At first glance, the setup looked straightforward: create a Slack app with a bot user, subscribe to app_mention events, point them at an n8n webhook, run the payload through an LLM, get the category.

The first nuance surfaced quickly: an incoming message can either start a new thread or be a reply inside an existing one. In the second case, without thread context the classifier will misfire — a one-liner like "same problem on my end" makes no sense in isolation.
Instead of calling the Slack API directly from n8n, I offloaded this to the agent — we already have slack-mcp, which can read messages from channels and threads. The agent itself decides whether to pull the thread history and does so when the context calls for it. The system prompt needs to describe how to do this, plus a few other things:
- category descriptions with examples;
- the expected output format:
{
"category": "<category_key>",
"confidence": <0.0-1.0>,
"summary": "<one-sentence summary in the same language as the user message>",
"acknowledge": "<a short response that you accepted the request and started working on it>",
"is_thread": <true|false>,
"parent_thread_ts": "<thread_ts to use when replying — ALWAYS set>"
}
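A reply in this shape is worth checking before anything is routed on it. Below is a minimal sketch of such a check, not the published workflow's logic. The category keys in `ALLOWED_CATEGORIES` and the function name `validate_classification` are assumptions made for illustration, since the article lists the categories but not their keys.

```python
import json

# Hypothetical category keys; the real keys are defined in the system prompt.
ALLOWED_CATEGORIES = {
    "infrastructure_modification", "new_installation", "incident",
    "ci_cd", "question", "announcement", "other",
}


def validate_classification(raw: str) -> dict:
    """Parse the classifier's reply and check every field; raise ValueError on problems."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"classifier did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("classifier reply must be a JSON object")

    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    for key in ("summary", "acknowledge", "parent_thread_ts"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")

    if not isinstance(data.get("is_thread"), bool):
        raise ValueError("is_thread must be true or false")
    return data
```

Raising on the first problem keeps the failure visible, so a malformed reply stops the run instead of being routed to the wrong branch.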
In the user prompt, I additionally pass channel_id, channel_name, message_ts, and user_name — this helps the classifier orient itself in the message.
For the model I use Sonnet or GPT-5 Codex — on classification, both show comparable quality.
At this stage I don't yet touch attachments — screenshots and log files come into play further down, inside the logic of specific branches.
Once the agent's response is received and its fields are validated, we need to determine the on-call engineer — they may be needed if automated handling can't close the request. On-call rotations live in Google Calendar, so I had to configure OAuth2 access to it following the n8n docs.
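Once the calendar events are fetched, picking the on-call comes down to finding the event that covers the current moment. The sketch below assumes a simplified event shape (flat `summary`, `start`, and `end` fields with ISO-8601 timestamps, and the engineer's name in `summary`); the real Google Calendar payload is nested differently, and the function name `pick_on_call` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def pick_on_call(events: list[dict], now: datetime) -> Optional[str]:
    """Return the summary of the first event whose [start, end) range covers `now`."""
    for event in events:
        # Assumed shape: timezone-aware ISO-8601 strings such as 2024-05-13T09:00:00+00:00.
        start = datetime.fromisoformat(event["start"])
        end = datetime.fromisoformat(event["end"])
        if start <= now < end:
            return event["summary"]
    return None  # nobody is scheduled; the caller decides on a fallback


# Example with two made-up weekly shifts.
events = [
    {"summary": "alice", "start": "2024-05-06T09:00:00+00:00", "end": "2024-05-13T09:00:00+00:00"},
    {"summary": "bob", "start": "2024-05-13T09:00:00+00:00", "end": "2024-05-20T09:00:00+00:00"},
]
now = datetime(2024, 5, 14, 12, 0, tzinfo=timezone.utc)
on_call = pick_on_call(events, now)  # "bob" for this sample data
```

Using a half-open range means a handover timestamp belongs to exactly one shift, so two engineers are never selected for the same instant.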
After the category and on-call are determined, the corresponding sub-workflow kicks off. In parallel, an acknowledge message goes to Slack so the author can see the request was received and is being worked on. That's an important detail — without it, the person keeps typing "hey, is anyone looking at this?" into the thread, which defeats the whole point.
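The dispatch step can be pictured roughly as follows. This is a sketch under assumptions rather than the actual n8n routing: the branch names, the category keys, and the `min_confidence` threshold are all hypothetical, and the article does not say how the confidence score is used.

```python
# Hypothetical mapping from category key to sub-workflow name.
BRANCHES = {
    "ci_cd": "ci_cd_assistant",
    "incident": "incident_assistant",
    "question": "knowledge_assistant",
}


def route(classification: dict, min_confidence: float = 0.6) -> dict:
    """Decide which branch runs, whether to acknowledge, and whether to page the on-call."""
    category = classification["category"]
    if category == "announcement":
        # Announcements only need to be recognized; nobody is paged.
        return {"branch": None, "acknowledge": False, "page_on_call": False}

    branch = BRANCHES.get(category)
    if branch is None or classification["confidence"] < min_confidence:
        # No automated branch, or a shaky classification: hand over to a human.
        return {"branch": None, "acknowledge": True, "page_on_call": True}

    return {"branch": branch, "acknowledge": True, "page_on_call": False}
```

Keeping the decision in one small function makes it easy to see that every category ends in exactly one of three outcomes: ignore, automate, or escalate.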

CI/CD assistant
CI/CD is one of the most common categories, so that's where I started. A solid share of these issues can be resolved without an engineer: builds that fell over because of a temporarily unreachable repository, flaky tests, misconfigured pipelines, expired tokens.

The sub-workflow expects an input structure with the request data:
{
"message": "Chat request text",
"message_ts": "Slack message timestamp",
"channel_id": "Slack channel ID",
"channel_name": "Slack channel name",
"user_name": "Sender's display name",
"user_id": "Sender's Slack user ID",
"file_ids": ["List of attachment IDs"],
"category": "One of the categories",
"confidence": "Confidence score",
"summary": "Short request description",
"is_thread": "Whether the message came from inside a thread",
"thread_ts": "Parent message timestamp, if this is a thread reply",
"on_call_user": "On-call engineer's name"
}
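To make the mapping concrete, here is a rough sketch of how such a payload could be assembled from the Slack event, the classifier output, and the on-call lookup. It assumes the event carries `text`, `ts`, `channel`, `user`, and an optional `files` list, and that `channel_name` and `user_name` are resolved elsewhere; the function name `build_subworkflow_input` is hypothetical.

```python
def build_subworkflow_input(event: dict, classification: dict,
                            channel_name: str, user_name: str,
                            on_call_user: str) -> dict:
    """Merge the Slack event, the classifier output, and the on-call into one payload."""
    return {
        "message": event.get("text", ""),
        "message_ts": event["ts"],
        "channel_id": event["channel"],
        "channel_name": channel_name,
        "user_name": user_name,
        "user_id": event["user"],
        # Attachments travel by ID only; the files themselves are fetched later.
        "file_ids": [f["id"] for f in event.get("files", [])],
        "category": classification["category"],
        "confidence": classification["confidence"],
        "summary": classification["summary"],
        "is_thread": classification["is_thread"],
        # The classifier always sets the timestamp to reply under.
        "thread_ts": classification["parent_thread_ts"],
        "on_call_user": on_call_user,
    }
```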
Parsing attachments
Most CI requests arrive in the format "build failed" + an error screenshot. That description clearly isn't enough to identify a specific build, so before the main agent runs, a helper sub-workflow — attachmentsAnalyzer — kicks off first.
It processes attached files:
- Images (error screenshots): sent to gpt-4o-mini to extract text and describe context.
- Text files (logs): if the size doesn't exceed the limit set in the Config node, the content is passed along as extra context.
The output is a compact text block:
{
"attachments_context": "...human-readable block...",
"attachments_count": 1
}
I deliberately split this out into its own workflow — it's reused in other handling branches. If attachmentsAnalyzer fails, the main workflow keeps going without the extra context instead of falling over entirely.
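The behavior described in this section can be sketched in a few lines. This is an illustration under assumptions, not the published workflow: the file dictionaries (`name`, `mimetype`, `content`), the `MAX_TEXT_BYTES` value, and the `describe_image` callable that stands in for the vision model call are all hypothetical.

```python
from typing import Callable

MAX_TEXT_BYTES = 20_000  # hypothetical value; the real limit lives in the Config node


def analyze_attachments(files: list[dict], describe_image: Callable[[bytes], str],
                        max_text_bytes: int = MAX_TEXT_BYTES) -> dict:
    """Turn images and small text files into one compact text block."""
    blocks = []
    for file in files:
        name, content = file["name"], file["content"]
        if file["mimetype"].startswith("image/"):
            # describe_image stands in for the call to the vision model.
            blocks.append(f"[image {name}] {describe_image(content)}")
        elif file["mimetype"].startswith("text/") and len(content) <= max_text_bytes:
            blocks.append(f"[file {name}]\n{content.decode('utf-8', errors='replace')}")
        # Oversized or unsupported files are skipped.
    return {"attachments_context": "\n\n".join(blocks), "attachments_count": len(blocks)}


def analyze_attachments_safely(files: list[dict],
                               describe_image: Callable[[bytes], str]) -> dict:
    """Fail soft: on any error, return an empty context so the caller keeps going."""
    try:
        return analyze_attachments(files, describe_image)
    except Exception:
        return {"attachments_context": "", "attachments_count": 0}
```

The wrapper trades completeness for availability: a failed image description also drops the log text, but the main investigation still runs.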
Gathering context and calling the agent
Before forming the LLM request, the SetVars node assembles everything needed:
- the raw request data;
- the output of attachmentsAnalyzer;
- auxiliary context for the system prompt: GitHub organization names, Kubernetes contexts and namespaces, Grafana data source names.
The agent itself works with a set of MCP tools:
- GitHub MCP — access to Actions build logs, PRs, source code.
- Slack MCP — reading messages in a thread when the initial request doesn't carry enough context.
- Grafana / Kubernetes MCP — looking up cluster logs and events on deploy-related issues.
The system prompt should cover:
- which teams exist and which repository groups belong to them — speeds up identifying the right repo;
- how to pull additional context from a Slack thread;
- the DevOps team's scope of responsibility — if an error falls within it, the assistant additionally tags the on-call at the end of the investigation;
- a general description of the available tools and when to use each;
- a few worked examples;
- the output format.
The user prompt is templated like this:
Investigate the issue from {{ $json.user_name }} in channel {{ $json.channel_name }}{{ $json.is_thread ? ' (message is in a thread, thread_ts=' + $json.thread_ts + ' — first read the history via Slack conversations.replies)' : '' }}
{{ $json.message }}{{ $json.attachments_context && $json.attachments_context.trim().length > 0 ? '\n\nAdditional information from attachments:\n' + $json.attachments_context : '' }}
Alongside the message itself, this passes who sent it and which channel it came from, whether it's a thread reply, and any attachments if present.
The HTTP-probe workaround
One of the common reasons builds fail is an unreachable external resource: a dependency repository, a proxy, a registry. The natural move would be to give the agent a built-in HTTP Request tool and let it check availability. In practice, that didn't work — the built-in n8n node doesn't handle timeouts and network errors gracefully, and on a failure it brings down the whole agent chain.
So I wrapped the check in a separate sub-workflow httpProbeTool that always returns a structured result: success, failure with a reason, or timeout. The agent uses it like any other tool.
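The idea behind such a wrapper, never raising and always returning one of three outcomes, can be shown with the Python standard library. This is only an illustration of the pattern, not the n8n sub-workflow itself; the function name `http_probe` and the result keys are assumptions.

```python
import socket
import urllib.error
import urllib.request


def http_probe(url: str, timeout_s: float = 5.0) -> dict:
    """Probe a URL and always return a structured result instead of raising."""
    try:
        # HEAD keeps the probe cheap; servers that reject HEAD show up as a failure.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return {"status": "success", "http_status": response.status, "reason": None}
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return {"status": "failure", "http_status": exc.code, "reason": str(exc.reason)}
    except urllib.error.URLError as exc:
        # DNS errors, refused connections, TLS problems; connect timeouts arrive here too.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return {"status": "timeout", "http_status": None, "reason": "timed out"}
        return {"status": "failure", "http_status": None, "reason": str(exc.reason)}
    except (socket.timeout, TimeoutError):
        # Timeouts raised while waiting for the response are not wrapped in URLError.
        return {"status": "timeout", "http_status": None, "reason": "timed out"}
    except Exception as exc:
        # Anything else, such as a malformed URL, still yields a structured failure.
        return {"status": "failure", "http_status": None, "reason": str(exc)}
```

Because every path returns the same three keys, the agent can reason about the result ("registry timed out") rather than losing the whole chain to an unhandled error.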

Once the agent responds, there's a short format validation step, and the message gets posted into the Slack thread.
Handling workflow errors

When you build a system that handles real user requests, reliability is critical. If a workflow falls over for any reason — LLM quota exhausted, MCP server unreachable, invalid JSON — nobody in chat will know, and the request just hangs there.
This is especially relevant in the first weeks after launch, when you're constantly tweaking things.
The solution is simple: a dedicated workflow, set as the Error Workflow in the settings of each main workflow. It does three things:
- Pulls information about the failed execution from the n8n API (you'll need to generate an API key for this).
- Through the Extract Thread Context node, determines channel_id and thread_ts of the original message.
- Posts a short error message directly into the request's thread, so the author isn't left in the dark.
A more verbose error report also goes into the DevOps team's internal channel, which lets us react quickly to regressions.
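Recovering the thread coordinates from a failed execution could look roughly like the sketch below. The exact shape of the execution data returned by the n8n API is not shown in the article, so the code makes no assumption about nesting and simply searches the whole structure for the field names used in the workflow inputs (`channel_id`, `thread_ts`, `message_ts`). The function names are hypothetical.

```python
from typing import Any, Optional


def find_first(node: Any, key: str) -> Optional[str]:
    """Depth-first search for the first non-empty string stored under `key`."""
    if isinstance(node, dict):
        value = node.get(key)
        if isinstance(value, str) and value:
            return value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found:
            return found
    return None


def extract_thread_context(execution: dict) -> Optional[dict]:
    """Return where to post the error notice, or None if it cannot be determined."""
    channel_id = find_first(execution, "channel_id")
    # Reply in the parent thread if there is one, otherwise under the message itself.
    thread_ts = find_first(execution, "thread_ts") or find_first(execution, "message_ts")
    if channel_id and thread_ts:
        return {"channel_id": channel_id, "thread_ts": thread_ts}
    return None
```

When the function returns None, only the internal report can be sent, since there is no thread to reply into.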
What we got
After a couple of months running in production, the picture looks like this:
- Average response time — up to 3 minutes from the moment a message appears in the channel.
- ~25% of requests are fully closed without an engineer.
- ~40% of requests are resolved faster than usual — the agent does a preliminary diagnostic, and the on-call gets ready-made context.
- Cost at our volume (several dozen requests per week): up to $250/month on LLM usage.

Example of issue creation from a Slack request

Example CI request resolved without an engineer

Example investigating a build crash based on a screenshot

Compared to an engineer's hourly rate, this looks like a very cost-effective team addition — especially given the agent works around the clock and doesn't pull the on-call away from their main work.
What's still rough
To avoid leaving the impression that everything is smooth, here are the rough edges we still live with:
- The agent sometimes gets lost in long threads with dozens of messages — we have to limit context depth in the system prompt explicitly.
- The Incident category has the lowest autonomous resolution rate so far: too many non-standard situations. We're working on expanding the MCP toolset.
- It's hard to objectively evaluate the quality of answers to "infrastructure questions": we need a feedback mechanism from engineers (planning Slack reactions as the simplest signal).
Part two will cover the remaining branches: the infrastructure incident assistant, the knowledge assistant with RAG search over our docs, and the handler for modification tasks with automatic ticket creation.
How do you offload first-line support on your team? Are you using off-the-shelf products (like PagerDuty AIOps) or building your own? Which request categories automate best in your environment — share in the comments, I'm curious to compare the distribution.
Repository with workflows and system prompts: https://github.com/javdet/automagicops-workflows



