If you've ever been paged at 2am, opened Slack, typed 'what broke?' and then spent 20 minutes switching between terminals, dashboards, and GitHub tabs to figure out the answer, this tutorial is for you.
We're going to build a DevOps agent that lives in your Slack channel. When an engineer asks 'what broke in prod?', the agent:
- Pulls recent access logs from your Vercel deployment
- Identifies the error pattern (500s, failed routes, console exceptions)
- Locates the relevant file in your GitHub repo
- Creates a feature branch, commits a fix
- Opens a PR with a clear description
- Posts the PR link back in Slack, in the same thread
All of this happens inside a single Slack thread. No context switching. No separate tools.
What You're Building
This is a Cosmic Team Agent that uses up to four capabilities:
- CMS Read (optional, covered in Step 6) so it can reference your content model if needed
- Code Read to browse your repo, read files, check deployments, and pull access logs
- Code Write to create branches, commit files, and open PRs
- Send Notifications to post back to Slack with structured results
The agent connects to your Slack workspace and a GitHub repo via Cosmic's native integrations. No custom webhooks. No third-party glue.
Prerequisites
Before you start:
- A Cosmic account (free tier works: cosmicjs.com)
- A GitHub repo connected to a Vercel project
- A Slack workspace with the Cosmic Slack integration installed (Bucket Settings > Integrations)
Step 1: Create the Team Agent
Go to your Cosmic project and click Team Agents in the sidebar. Click Create Team Agent and fill in:
Name: DevOps Agent (or give it a human name, 'Morgan', 'Sam', etc.)
Persona prompt:
You are a DevOps agent for [your company]. You have access to production logs,
the GitHub repository, and Slack. When asked about production errors:
1. Call get_access_logs to check for recent 500s, 4xxs, or console errors
2. Identify the most likely root cause based on the error pattern
3. Use list_repository_files and read_file to locate the relevant code
4. Create a fix branch, commit a targeted change, and open a PR
5. Summarize what you found and what you changed in a clear Slack message
Be concise. Engineers are busy. Lead with the finding, then the fix.
If you cannot determine the root cause from logs alone, say so clearly
and describe what additional context you need.
Capabilities to enable:
- Code Read (gives access to get_access_logs, get_deployments, read_file, list_repository_files)
- Code Write (gives access to create_branch, commit_files, create_pull_request)
- Send Notifications (for posting structured Slack messages)
Memory: Set to Persistent so the agent remembers recent incidents across sessions.
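If you want to sanity-check which capabilities a given workflow needs, the mapping above can be written down as data. This is only an illustrative sketch: the capability and tool names come from this tutorial, but the `TOOLS_BY_CAPABILITY` table and the `capabilitiesFor` helper are hypothetical and not part of any Cosmic API.

```typescript
// Capability and tool names are taken from the tutorial text.
// The table and helper below are illustrative, not a Cosmic API.
type Capability = "Code Read" | "Code Write" | "Send Notifications";

const TOOLS_BY_CAPABILITY: Record<Capability, string[]> = {
  "Code Read": ["get_access_logs", "get_deployments", "read_file", "list_repository_files"],
  "Code Write": ["create_branch", "commit_files", "create_pull_request"],
  // The tutorial does not name the tools behind Send Notifications,
  // so this list is intentionally left empty.
  "Send Notifications": [],
};

// Returns the capabilities that would need to be enabled
// for every tool in `tools` that appears in the table above.
function capabilitiesFor(tools: string[]): Capability[] {
  return (Object.keys(TOOLS_BY_CAPABILITY) as Capability[]).filter((capability) =>
    TOOLS_BY_CAPABILITY[capability].some((name) => tools.indexOf(name) !== -1),
  );
}
```

For example, a diagnosis-only workflow that calls just get_access_logs and read_file would need only Code Read, while the full fix flow in Step 5 also needs Code Write.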
Step 2: Connect Your GitHub Repo
In the Team Agent edit form, look for the Repository section. Select your connected GitHub repo from the dropdown. If you haven't connected it yet, go to Project Settings > Integrations and add your GitHub repo first.
Once connected, the agent has access to all Code Read and Code Write tools scoped to that repository.
Step 3: Connect Slack
In the Channels section of the agent form, enable Slack and select the channel you want the agent to live in. A #devops, #engineering, or #incidents channel works well.
Optionally enable Only respond when @mentioned if you want the agent to stay quiet unless explicitly tagged. For a dedicated incidents channel, leave this off so the agent can respond to any message.
Save the agent.
Step 4: Test It
Go to your Slack channel and send a message:
what broke in prod in the last hour?
The agent will:
- Call get_access_logs with time_range: "1h" and status_filter: "errors"
- Parse the response for error patterns
- Reply with a structured summary in Slack
Here's an example of what a response looks like:
Found 14 errors in the last hour:
• POST /api/contact — 11x 500 (11:42–12:03 PT)
Console: TypeError: Cannot read properties of undefined (reading 'email')
File: src/app/api/contact/route.ts, line 34
• GET /blog/[slug] — 3x 404
No fix needed — these are likely stale links or bots probing old URLs.
Working on a fix for the contact route now...
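To make "parse the response for error patterns" more concrete, here is a rough sketch of the kind of grouping that produces a summary like the one above. The agent does this reasoning itself; the code is only an illustration. The `LogEntry` shape is an assumption, since this tutorial does not show the actual get_access_logs response format, and timestamps are assumed to be UTC ISO 8601 strings so they can be compared as text.

```typescript
// Hypothetical shape of one access-log entry. The real get_access_logs
// response format is not shown in this tutorial.
interface LogEntry {
  method: string;
  path: string;
  status: number;
  timestamp: string; // assumed UTC ISO 8601, e.g. "2024-01-01T11:42:00Z"
}

interface ErrorPattern {
  route: string; // e.g. "POST /api/contact"
  status: number;
  count: number;
  firstSeen: string;
  lastSeen: string;
}

// Groups 4xx/5xx entries by method, path, and status, most frequent first.
function summarizeErrors(entries: LogEntry[]): ErrorPattern[] {
  const patterns = new Map<string, ErrorPattern>();
  for (const entry of entries) {
    if (entry.status < 400) continue; // skip successful responses
    const route = `${entry.method} ${entry.path}`;
    const key = `${route} ${entry.status}`;
    const existing = patterns.get(key);
    if (!existing) {
      patterns.set(key, {
        route,
        status: entry.status,
        count: 1,
        firstSeen: entry.timestamp,
        lastSeen: entry.timestamp,
      });
    } else {
      existing.count += 1;
      // String comparison is valid only because all timestamps share one format.
      if (entry.timestamp < existing.firstSeen) existing.firstSeen = entry.timestamp;
      if (entry.timestamp > existing.lastSeen) existing.lastSeen = entry.timestamp;
    }
  }
  return Array.from(patterns.values()).sort((a, b) => b.count - a.count);
}
```

Note that this sketch groups by exact path, so requests to different blog slugs would show up as separate patterns unless paths are first normalized to their route template.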
Step 5: From Diagnosis to PR
After diagnosing the error, ask the agent to fix it:
go ahead and open a PR for the contact route fix
The agent will:
- Call read_file on src/app/api/contact/route.ts
- Identify the null-check that's missing
- Call create_branch to create fix/contact-route-null-check
- Call commit_files with the corrected version of the file
- Call create_pull_request with a clear title and description
- Post the PR URL back in Slack:
PR opened: fix(contact): add null check for email field before send
https://github.com/your-org/your-repo/pull/94
Changed: src/app/api/contact/route.ts
Added optional chaining on req.body.email before passing to the mailer.
Ready for review.
The entire flow (log pull, diagnosis, branch, commit, PR, Slack post) happens in under two minutes without the engineer leaving Slack.
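For a sense of what a fix like this might contain, here is a minimal sketch of guarding the email field before it reaches the mailer. The actual contents of src/app/api/contact/route.ts are not shown in this tutorial, so the `ContactBody` type and the `parseContactBody` helper are hypothetical; the real commit depends on what the agent finds in your file.

```typescript
// Hypothetical request body for the contact route.
// The real handler is not shown in this tutorial.
interface ContactBody {
  email?: string;
  message?: string;
}

type ContactResult =
  | { ok: true; email: string }
  | { ok: false; status: 400; error: string };

// Validates the body before anything is handed to the mailer.
// Optional chaining avoids "Cannot read properties of undefined (reading 'email')"
// when the body is missing entirely.
function parseContactBody(body: ContactBody | undefined): ContactResult {
  const email = body?.email?.trim();
  if (!email) {
    return { ok: false, status: 400, error: "email is required" };
  }
  return { ok: true, email };
}
```

Returning a 400 for a missing email, rather than letting the handler throw, turns the 500s from the log summary into a client error the caller can act on. Whether that is the right behavior for your route is a judgment call to confirm when you review the PR.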
Step 6: Wire Up the Cosmic SDK (Optional)
If your app pulls content from Cosmic (blog posts, product pages, documentation), you can give the agent CMS Read capability so it can cross-reference content changes with production errors. For example:
import { createBucketClient } from '@cosmicjs/sdk';

const cosmic = createBucketClient({
  bucketSlug: process.env.COSMIC_BUCKET_SLUG as string,
  readKey: process.env.COSMIC_READ_KEY as string,
});

// Fetch the five most recent posts so the agent can check
// whether a new publish lines up with an error spike
const { objects } = await cosmic.objects
  .find({ type: 'blog-posts' })
  .props('title,slug,published_date,metadata')
  .sort('-created_at')
  .limit(5);
With CMS Read enabled, you can ask the agent:
did the error spike correlate with any recent content publishes?
It will then cross-reference the access log timestamps with recently published objects to see whether a publish lines up with the spike.
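The cross-referencing idea can be sketched as a simple time-window check. This is an illustration under assumptions, not how the agent is implemented: the `PublishEvent` shape, the `publishesBeforeErrors` helper, and the 15-minute default window are all hypothetical.

```typescript
// Hypothetical shape: a published object reduced to its slug and publish time.
interface PublishEvent {
  slug: string;
  publishedAt: string; // ISO 8601
}

// Returns the publishes that were followed by at least one error
// within `windowMinutes`. The 15-minute default is an arbitrary assumption.
function publishesBeforeErrors(
  publishes: PublishEvent[],
  errorTimestamps: string[],
  windowMinutes = 15,
): PublishEvent[] {
  const windowMs = windowMinutes * 60 * 1000;
  const errorTimes = errorTimestamps.map((t) => Date.parse(t));
  return publishes.filter((publish) => {
    const publishedAt = Date.parse(publish.publishedAt);
    return errorTimes.some((t) => t >= publishedAt && t - publishedAt <= windowMs);
  });
}
```

A match here only means the timing is suspicious. It is a prompt to look at the published object, not proof that the publish caused the errors.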
Extending the Agent
Once the base agent is working, here are three natural extensions:
Heartbeat schedule: Enable a daily heartbeat at 9am that posts a summary of the previous 24 hours of errors to Slack automatically. No one has to ask.
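As a rough picture of what such a daily summary involves, the sketch below filters errors to the previous 24 hours and renders a short message. The `DailyError` shape and `dailySummary` helper are hypothetical; the agent composes its own message from the log data.

```typescript
// Hypothetical shape: one error reduced to its route and time.
interface DailyError {
  route: string; // e.g. "POST /api/contact"
  timestamp: string; // ISO 8601
}

// Builds a plain-text summary of errors in the 24 hours before `now`.
function dailySummary(errors: DailyError[], now: Date): string {
  const cutoff = now.getTime() - 24 * 60 * 60 * 1000;
  const counts = new Map<string, number>();
  let total = 0;
  for (const error of errors) {
    const ms = Date.parse(error.timestamp);
    if (ms < cutoff || ms > now.getTime()) continue; // outside the window
    counts.set(error.route, (counts.get(error.route) ?? 0) + 1);
    total += 1;
  }
  if (total === 0) return "No errors in the last 24 hours.";
  const lines = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // most frequent route first
    .map(([route, count]) => `• ${route}: ${count}x`);
  const noun = total === 1 ? "error" : "errors";
  return `Found ${total} ${noun} in the last 24 hours:\n${lines.join("\n")}`;
}
```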
Event trigger on deploy: Connect an inbound webhook to the agent's webhook channel endpoint. Configure Vercel to POST to that webhook after each deploy. The agent will automatically check post-deploy logs and flag any new errors introduced by the deployment.
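"New errors introduced by the deployment" can be read as routes that fail after the deploy but did not fail before it. A minimal sketch of that comparison follows, assuming a hypothetical `RouteError` shape and a deploy timestamp taken from the webhook payload (the payload format is not covered in this tutorial).

```typescript
// Hypothetical shape: one error reduced to its route and time.
interface RouteError {
  route: string; // e.g. "POST /api/contact"
  timestamp: string; // ISO 8601
}

// Returns routes that have errors after the deploy but none before it.
function newRoutesAfterDeploy(errors: RouteError[], deployedAt: string): string[] {
  const deployMs = Date.parse(deployedAt);
  const seenBefore = new Set<string>();
  const seenAfter = new Set<string>();
  for (const error of errors) {
    const bucket = Date.parse(error.timestamp) < deployMs ? seenBefore : seenAfter;
    bucket.add(error.route);
  }
  return Array.from(seenAfter).filter((route) => !seenBefore.has(route));
}
```

Routes that were already failing before the deploy are deliberately excluded, so the agent's post-deploy message can focus on regressions rather than repeating known noise.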
Escalation logic: Add to the agent prompt: 'If you find more than 20 errors in an hour, send an email to the project owner immediately.' The agent has send_email available and will follow this instruction.
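The escalation rule itself is a plain threshold over a one-hour window. The sketch below shows the check the prompt describes, with "more than 20" read as strictly greater than 20. The `shouldEscalate` helper is hypothetical; in practice the agent applies the rule from the prompt rather than running this code.

```typescript
// Returns true when more than `threshold` errors occurred in the hour before `now`.
// The default of 20 mirrors the example prompt above.
function shouldEscalate(errorTimestamps: string[], now: Date, threshold = 20): boolean {
  const nowMs = now.getTime();
  const oneHourAgo = nowMs - 60 * 60 * 1000;
  const recent = errorTimestamps.filter((t) => {
    const ms = Date.parse(t);
    return ms > oneHourAgo && ms <= nowMs;
  });
  return recent.length > threshold;
}
```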
What This Demonstrates About Cosmic Agents
This tutorial shows a pattern that goes beyond the DevOps use case: a Cosmic Team Agent as an operational interface to your stack.
The agent isn't a chatbot with pre-scripted answers. It reads live data, makes decisions based on what it finds, writes code, and takes action in external systems, all while staying inside the communication channel your team already uses.
The same pattern applies to customer support (reads your docs bucket, escalates in Slack), content operations (monitors your CMS, fires workflows on publish), and competitive intelligence (runs on a schedule, posts findings without being asked).
Agents that act are fundamentally different from agents that answer. This is the distinction that matters.
What to Build Next
Now that your DevOps agent is running, here are the next tutorials in this series:
- Customer Support Agent (WhatsApp/Telegram): A Team Agent connected to WhatsApp with CMS Read on your docs bucket. Tier-1 support automation with human escalation.
- Competitive Intelligence Agent: A Content Agent on a Heartbeat schedule that crawls competitor blogs and posts a Slack summary every Monday morning.
- Localization Pipeline Agent: An event-triggered Content Agent that auto-translates new blog posts into Spanish, French, and German on publish.
Get Started
Ready to build your own DevOps agent?
Sign up free, no credit card required
Already have an account? Go to Team Agents
Want a walkthrough for your specific stack? Book 15 minutes with Tony
Or read the full agent documentation: cosmicjs.com/docs/dashboard/ai/agents