Swrly

Posted on • Originally published at swrly.com

Building AI Workflows for DevOps Teams

DevOps teams are some of the best candidates for multi-agent automation. The work is repetitive, high-stakes, and built on integrations between tools that already have APIs. When a PagerDuty alert fires at 2 AM, the response is always the same sequence: check the alert, pull the recent deploys, look at the error logs, assess severity, notify the right people. Every step is manual, every step costs time, and every step is something an agent can do.

This guide walks through four production-ready DevOps workflows you can build with Swrly. We will go deep on the first one — Incident Response — and then sketch the other three so you can adapt them to your stack.

Use Case 1: Incident Response

This is the workflow teams ask about most. The premise is straightforward: when an incident fires, an AI agent triages it before a human even opens their laptop.

What the Workflow Does

  1. A PagerDuty webhook fires when a new incident is created
  2. The workflow pulls incident details from PagerDuty and recent errors from Sentry in parallel
  3. An AI analyst agent correlates the signals: PagerDuty alert details, Sentry stack traces, and recent GitHub commits
  4. A second AI agent drafts a response runbook with mitigation steps
  5. A condition node checks severity
  6. Critical incidents go to #incidents-critical on Slack; everything else goes to #ops-log

By the time the on-call engineer opens Slack, there is already a root cause hypothesis, a list of affected components, and step-by-step mitigation instructions waiting for them.

Walking Through the Build

Start by creating a new swirl and dragging a Trigger node onto the canvas. Set the trigger type to Webhook. Copy the generated URL and configure it as a PagerDuty webhook — under your PagerDuty service, add a webhook subscription for the incident.triggered event type.

Next, add a PagerDuty integration node connected to the trigger. Configure it with the pagerduty_get_incident action and pass {{trigger.payload.incident_id}} as the incident ID parameter. This fetches the full incident details including title, description, service, urgency, and assigned escalation policy.
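The exact webhook body depends on PagerDuty's v3 webhook schema, but the template resolution itself is easy to picture: a dotted path walked through the JSON payload. A minimal Python sketch, using a simplified (not complete) payload shape:

```python
# Illustrative sketch: how a template variable like
# {{trigger.payload.incident_id}} might resolve against a webhook body.
# The payload below is a simplified example, not the full PagerDuty schema.

def resolve_path(payload: dict, path: str):
    """Walk a dotted path like 'event.data.id' through nested dicts."""
    value = payload
    for key in path.split("."):
        value = value[key]
    return value

sample_event = {
    "event": {
        "event_type": "incident.triggered",
        "data": {"id": "PABC123", "title": "High error rate on api-gateway"},
    }
}

incident_id = resolve_path(sample_event, "event.data.id")
print(incident_id)  # PABC123
```

The same resolution applies to any `{{trigger.payload.*}}` reference downstream.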

Now add two integration nodes in parallel — this is where Swrly's directed graph model shines. Connect the PagerDuty node to both a Sentry node (using sentry_list_issues with the query is:unresolved) and a GitHub node (using github_list_commits to pull recent commits from your main repository). These run simultaneously, cutting the data-gathering time in half.

Connect both integration nodes to an Agent node called "Incident Analyst." This is the core of the workflow. Here is the system prompt:

You are a senior SRE performing incident triage.

PagerDuty incident details:
{{get_incident.output}}

Recent Sentry errors:
{{list_issues.output}}

Recent GitHub commits:
{{list_commits.output}}

Correlate these signals to identify the most likely root cause.
Output a structured analysis with:
1. Root cause hypothesis (with confidence level)
2. Affected components
3. Timeline of events
4. Severity estimate (1 = critical, 2 = major, 3 = minor)

Output the severity as a number in a field called `severity`.

The agent receives data from three sources — PagerDuty, Sentry, and GitHub — and produces a structured analysis. Set maxTurns to 15 and enable accumulateContext so the agent retains all upstream data.

Add a second agent, "Incident Responder," connected to the analyst. Its prompt takes the analyst's output and produces a step-by-step runbook:

Based on the analysis, produce:
1. A step-by-step runbook to resolve the issue
2. Immediate mitigation actions (first 15 minutes)
3. Communication template for stakeholders
4. Estimated time to recovery

Finally, add a Condition node after the responder. Set the field to output, the operator to contains, and the value to severity: 1. Connect the true branch to a Slack node posting to #incidents-critical and the false branch to a Slack node posting to #ops-log.
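The contains check works because the analyst prompt asks for a literal `severity` field in the output; a slightly more defensive version would parse the number out rather than substring-match. A Python sketch of the routing decision (illustrative, not Swrly internals):

```python
# Sketch of the routing decision the condition node makes. The node's
# `contains` operator does a substring match on the agent's output, so
# prompting for a structured `severity` field keeps the match reliable.
import re

def extract_severity(agent_output: str):
    """Pull the numeric severity out of a line like 'severity: 1'."""
    match = re.search(r"severity:\s*(\d+)", agent_output)
    return int(match.group(1)) if match else None

def route_channel(agent_output: str) -> str:
    severity = extract_severity(agent_output)
    return "#incidents-critical" if severity == 1 else "#ops-log"

analysis = "Root cause: bad deploy of auth service.\nseverity: 1"
print(route_channel(analysis))  # #incidents-critical
```

If the agent omits the field entirely, the false branch catches it, which is the safe default for a logging channel.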

Why This Works

The key insight is that incident triage is pattern matching. Given an alert, some errors, and some recent changes, an experienced SRE correlates the signals and forms a hypothesis. An AI agent does the same thing, but in 30 seconds instead of 10 minutes. The human still makes the final call — the agent just gives them a running start.

The condition node adds intelligent routing. Critical incidents get immediate visibility in the high-urgency channel. Lower-severity issues are logged but do not wake anyone up unnecessarily.

Integration Highlights

This workflow uses four integrations:

  • PagerDuty (pagerduty_get_incident) — Fetches full incident details including metadata, assignments, and escalation policy
  • Sentry (sentry_list_issues) — Pulls recent unresolved errors with stack traces and frequency data
  • GitHub (github_list_commits) — Lists recent commits with messages, authors, and timestamps for change correlation
  • Slack (slack_send_message) — Delivers the triage report to the right channel based on severity

All four are available in Swrly's integration library. Connect them once in Settings, and every workflow in your workspace can use them.

Use Case 2: Deployment Monitoring

Trigger: Cron schedule, every 30 minutes

After every deployment, you want to know if things are healthy. This workflow runs on a timer and checks for signs of trouble.

The flow:

  1. Cron trigger fires every 30 minutes
  2. Sentry integration node fetches errors from the last 30 minutes (firstSeen:-30m)
  3. Agent node analyzes error volume, new error types, and spike patterns
  4. Condition node checks if the agent flagged any anomalies
  5. If yes: Slack message to #deploys with the analysis and a recommendation to investigate or rollback
  6. If no: silent — no noise when things are healthy

Why it is useful: Most deployment monitoring is threshold-based. "Alert if error rate exceeds 5%." But threshold alerts miss slow-burn regressions and novel error types. An AI agent can identify patterns that static rules miss: "Three new TypeError exceptions appeared in the auth module 20 minutes after the last deploy. These were not present in the previous 24 hours."
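The core of the "new error type" check is a set difference between error signatures before and after the deploy. A minimal sketch with made-up sample data:

```python
# Sketch of the novelty check the agent performs implicitly: compare
# error signatures in the post-deploy window against a baseline window.
# The signature sets are illustrative sample data.

baseline_errors = {
    "ConnectionError:payments.charge",
    "Timeout:search.query",
}
post_deploy_errors = {
    "ConnectionError:payments.charge",
    "TypeError:auth.middleware",   # not seen in the baseline window
    "TypeError:auth.session",      # not seen in the baseline window
}

new_signatures = post_deploy_errors - baseline_errors
if new_signatures:
    print(f"Anomaly: {len(new_signatures)} new error types since last deploy")
    for sig in sorted(new_signatures):
        print(f"  - {sig}")
```

The agent layers judgment on top of this: it also weighs volume trends and which modules the new errors cluster in.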

Integration highlights: Sentry for error data, Slack for notifications. Add a Datadog node if you want to correlate with infrastructure metrics. Add a GitHub node with github_list_deployments to tie errors to specific deploys.

Use Case 3: Security Audit

Trigger: Daily cron, 6 AM

Security reviews should not be a quarterly event. This workflow runs every morning and checks for security-relevant changes.

The flow:

  1. Daily cron trigger at 6 AM
  2. Two parallel integration nodes: GitHub (github_list_commits for the last 24 hours) and Sentry (sentry_list_issues for new unresolved errors)
  3. Security Analyst agent reviews commits for security-sensitive changes: auth logic modifications, dependency updates, config file changes, new API endpoints, cryptographic code
  4. Security Reporter agent formats findings into a structured report with severity ratings
  5. Condition node checks if any critical findings exist
  6. Critical findings: Slack to #security and PagerDuty incident creation
  7. Non-critical findings: Slack to #security-log for the daily digest

Why it is useful: The security audit template is one of the most popular in Swrly's template library. It catches the kinds of changes that slip through code review: a dependency bump that introduces a known CVE, an environment variable accidentally logged in a new error handler, an auth middleware accidentally removed from a route. The agent checks every commit, every day, without getting tired or distracted.
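One mechanical slice of that review can be pictured as path-based screening; the agent's actual judgment goes further, but the shape is roughly this (the patterns and sample commit are made up for illustration):

```python
# Sketch of path-based screening a Security Analyst agent might be
# prompted to do: flag commit files that touch security-sensitive areas.
# Patterns and sample files are illustrative, not a complete ruleset.
import fnmatch

SENSITIVE_PATTERNS = [
    "*auth*", "*.env*", "*config*", "requirements*.txt",
    "package-lock.json", "*crypto*", "*/middleware/*",
]

def flag_commit(changed_files):
    """Return the files in a commit that match a sensitive pattern."""
    return sorted(
        f for f in changed_files
        if any(fnmatch.fnmatch(f, p) for p in SENSITIVE_PATTERNS)
    )

commit_files = ["src/auth/token.py", "README.md", "config/settings.yaml"]
print(flag_commit(commit_files))  # ['config/settings.yaml', 'src/auth/token.py']
```

A static filter like this only narrows the search; the agent then reads the diffs of flagged files and rates the actual risk.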

Tips for production:

  • Set the GitHub integration to scan all repositories in your organization, not just one
  • Add Snyk or Dependabot integration nodes for automated vulnerability database checks
  • Adjust the analyst's max turns based on your daily commit volume — 20 turns handles most teams

Use Case 4: Standup Reports

Trigger: Daily cron, 8 AM (or Monday/Wednesday/Friday for async standups)

The flow:

  1. Cron trigger fires before standup
  2. Three parallel integration nodes: GitHub (commits and PRs from the last 24 hours), Linear or Jira (recently updated issues), and Slack (messages from #engineering, optional)
  3. Report Writer agent synthesizes everything into a standup-format summary: what shipped, what is in progress, what is blocked
  4. Slack message to #standup with the report

Why it is useful: Standup reports are a tax on engineering time. Every developer spends 5-10 minutes remembering what they did yesterday and writing it up. Multiply by team size and frequency, and it adds up. An AI agent that reads the actual work artifacts — commits, PRs, issue updates — produces a more accurate summary than memory-based self-reporting, and it takes zero developer time.
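The synthesis step boils down to grouping artifacts by author and formatting the result. A small sketch with sample commit records standing in for the GitHub node's output:

```python
# Sketch of the Report Writer's grouping step: collect work artifacts
# per author into a standup-style digest. The commit records are
# illustrative sample data.
from collections import defaultdict

commits = [
    {"author": "maya", "message": "feat: add rate limiting to API"},
    {"author": "maya", "message": "fix: retry logic for webhook delivery"},
    {"author": "dev", "message": "chore: bump postgres client"},
]

by_author = defaultdict(list)
for c in commits:
    by_author[c["author"]].append(c["message"])

lines = ["*Standup digest (last 24h)*"]
for author, messages in sorted(by_author.items()):
    lines.append(f"• {author}: {len(messages)} commit(s)")
    lines.extend(f"    - {m}" for m in messages)

print("\n".join(lines))
```

In the real workflow the agent also folds in Linear/Jira issue updates and writes the "blocked" section from context, which plain grouping cannot do.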

Variations:

  • Add a condition node to flag blockers and route them to #engineering-leads
  • Run it weekly instead of daily for sprint summaries
  • Add a Notion integration to auto-update a team wiki page

How Triggers Work

All four workflows above use one of two trigger types.

Webhook triggers fire when an external service sends an HTTP POST to the workflow's unique URL. The payload is available to all downstream nodes via {{trigger.payload.*}} template variables. PagerDuty, GitHub, Sentry, Stripe, and most modern SaaS tools support webhook configuration. The webhook URL is generated automatically when you add a trigger node — no server configuration required.

Cron triggers fire on a schedule using standard cron syntax. 0 6 * * * runs daily at 6 AM UTC. */30 * * * * runs every 30 minutes. 0 8 * * 1-5 runs at 8 AM on weekdays only. Cron triggers do not carry a payload, so downstream nodes rely on integration calls to fetch current data rather than event-driven input.
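The three examples above expand field by field. A toy parser for a single cron field (handling only `*`, `*/step`, ranges, and plain numbers; real cron implementations support more syntax) shows how:

```python
# Toy expansion of one cron field into the concrete values it matches.
# Supports "*", "*/step", "a-b", and plain numbers only; real cron
# implementations handle lists, names, and combinations of these.

def expand_field(field: str, low: int, high: int) -> list[int]:
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    if "-" in field:
        a, b = map(int, field.split("-"))
        return list(range(a, b + 1))
    return [int(field)]

print(expand_field("*/30", 0, 59))  # minutes 0 and 30
print(expand_field("1-5", 0, 6))    # Monday through Friday
print(expand_field("6", 0, 23))     # 6 AM
```

So `0 8 * * 1-5` matches minute 0, hour 8, any day of month, any month, weekdays 1 through 5.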

Choose webhooks for event-driven workflows where something needs to happen immediately after an external event. Choose cron for polling-based workflows that check on things periodically.

Getting Started

The fastest way to start is to clone a template. Go to the template gallery in Swrly and search for "Incident Response," "Security Audit," or "Standup Report." Click "Use Template" to clone it into your workspace. Then customize the integration connections, adjust the agent prompts for your stack, and run a test.

All templates are free and available on every plan. You can modify them however you want — add nodes, change prompts, swap integrations, add condition branches. They are starting points, not locked configurations.

Clone the Incident Response Template

Sign up at swrly.com, navigate to Templates, and search for "Incident Response." One click clones it into your workspace. Connect your PagerDuty, Sentry, GitHub, and Slack integrations, and you have a production-ready incident triage pipeline in under 5 minutes.

If your team is already drowning in alerts and spending too much time on manual triage, this is the highest-ROI workflow to automate first. Let the agents do the correlation. Let the humans make the decisions.
