Ernest Essien

How We Built Reef: A Production Incident Agent with Coral, Sentry Webhooks, and Slack

When production breaks, the hardest part is not finding data — it is connecting it.

You open GitHub for recent PRs. Sentry for the error. Slack for what on-call said. Vercel for the deploy that just went out. Then you stare at four tabs and try to line up timestamps in your head.

We built Reef for the Pirates of the Coral-bean hackathon to automate that workflow: one investigation, one report, with optional remediation based on severity.

This post is our Captain's Log — how we built it, how we used Coral, and how you can wire the same pattern yourself.


What Reef does

Reef is a production incident intelligence agent. You trigger it from:

  • The dashboard (natural language or a Vercel deploy link)
  • A Sentry webhook (new issue → auto-investigate)
  • Slack (slash command; full production wiring is coming soon)

Reef runs a stateful investigation loop: plan a query → run it through Coral → judge the evidence → repeat until confident → generate a report → post to Slack if triggered by webhook.

The output includes:

  • Root cause hypothesis
  • Timeline of iterations
  • Suspected PRs
  • Citations for every query (coral://query-run/1)
  • Severity score and remediation mode (autonomous_fix vs human_agent_paired)

Why we chose Coral

Without Coral, cross-tool incident triage usually means:

  1. Four API clients
  2. Normalizing different timestamp formats
  3. Joining in application code
  4. Stuffing large JSON blobs into an LLM

Coral flips that model. It exposes GitHub, Sentry, Slack, and Vercel as SQL tables. You write queries like:

SELECT g.title AS pr_title, g.number AS pr_number,
       s.title AS error_message, s.level AS error_level
FROM github.pulls g
JOIN sentry.issues s ON s.first_seen >= g.merged_at
WHERE g.owner = 'your-org'
  AND g.repo = 'your-repo'
  AND s.level IN ('fatal', 'error')
  AND g.state = 'closed'
ORDER BY s.first_seen DESC
LIMIT 20;

One query. Two sources. No warehouse. No ETL. Credentials stay on your machine — Coral resolves APIs at query time.

That temporal join — errors that appeared after a PR merged — is the core insight Reef automates.
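
To make the temporal join concrete without any credentials, here is a minimal sketch that reproduces the same join shape against an in-memory SQLite database. The table names `github_pulls` and `sentry_issues`, the helper `errors_after_merges`, and the sample rows are all invented for illustration; they are not Coral's schema or Reef's data. Timestamps are ISO 8601 strings in one format, so plain text comparison orders them correctly.

```python
import sqlite3

def errors_after_merges(conn):
    """Pair each fatal/error issue with every PR that merged before it first appeared."""
    return conn.execute(
        """
        SELECT g.number, s.title
        FROM github_pulls g
        JOIN sentry_issues s ON s.first_seen >= g.merged_at
        WHERE s.level IN ('fatal', 'error')
        ORDER BY s.first_seen DESC, g.number
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE github_pulls (number INTEGER, title TEXT, merged_at TEXT);
    CREATE TABLE sentry_issues (title TEXT, level TEXT, first_seen TEXT);
    INSERT INTO github_pulls VALUES
        (1, 'Refactor payment validation', '2026-01-01T10:00:00Z'),
        (2, 'Update README',               '2026-01-01T12:00:00Z');
    INSERT INTO sentry_issues VALUES
        ('TypeError in checkout', 'fatal',   '2026-01-01T11:00:00Z'),
        ('Slow query warning',    'warning', '2026-01-01T13:00:00Z');
    """
)

rows = errors_after_merges(conn)
print(rows)  # only PR 1 merged before the fatal error appeared
```

Because the join condition is an inequality, every error is paired with every earlier merge. On a busy repository you would likely want to bound the window (for example, errors within a few hours of the merge) to keep the candidate list short.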


Architecture at a glance

Trigger (Dashboard / Sentry webhook / Slack)
        ↓
Investigation Orchestrator (max 5 iterations)
        ↓
Planner (Gemini) ──→ Coral SQL ──→ Judge (Groq)
        ↓                    ↓
   Evidence Store      Query citations
        ↓
Escalation + Severity Gate
        ↓
Report → Slack (for webhooks)

Backend: Python 3.11+, FastAPI, SQLAlchemy (SQLite dev / Postgres prod)

Frontend: React 19, TypeScript, Vite, Tailwind

Data layer: Coral CLI (coral sql) or mock mode for demos without Coral installed

AI: Gemini 2.5 Flash (planner) + Groq Llama 3.3 70B (judge). Falls back to a template planner and a rules-based judge when no API keys are set.


Step 1 — Wire Coral sources

Install Coral and register your production tools once:

brew install withcoral/tap/coral

# From your backend directory
cp .env.example .env
# Fill: GITHUB_TOKEN, GITHUB_OWNER, GITHUB_REPO,
#       SENTRY_ORG, SENTRY_TOKEN, SLACK_TOKEN, VERCEL_TOKEN

set -a && source .env && set +a
./scripts/setup_coral_sources.sh

Our setup script adds github, sentry, slack, and vercel (community manifest), then runs smoke queries including the PR↔Sentry join.

Verify with:

coral sql "SELECT schema_name, table_name FROM coral.tables
  WHERE schema_name IN ('github','sentry','slack','vercel') LIMIT 20"

Set CORAL_MODE=cli in .env when you are ready for real data. Use CORAL_MODE=mock for local demos without Coral installed — Reef returns a coherent checkout-failure story (PR #234 + fatal TypeError + Slack thread).
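
If you want the same switch in your own service, one way to read and validate it is sketched below. The function name `coral_mode` and the choice of `mock` as the default when the variable is unset are assumptions made for this example, not something the Reef codebase is known to do.

```python
import os

VALID_MODES = {"cli", "mock"}

def coral_mode(env=None):
    """Return the configured mode; defaults to 'mock' (an assumed default)."""
    env = os.environ if env is None else env
    mode = env.get("CORAL_MODE", "mock").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"CORAL_MODE must be 'cli' or 'mock', got {mode!r}")
    return mode

print(coral_mode({"CORAL_MODE": "cli"}))
```

Failing fast on an unknown value avoids silently serving mock data when someone mistypes the setting in production.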


Step 2 — How Reef calls Coral from Python

Reef does not embed Coral as a library. It shells out:

# Simplified flow in app/clients/coral_runtime_client.py
subprocess.run(["coral", "sql", "--output", "json", sql], ...)

The query executor enforces read-only SQL (SELECT, WITH, EXPLAIN only), normalizes rows, and stores each run in the database with a citation URI.
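
A rough sketch of that guard and the shell-out is below. The `coral sql --output json` invocation comes from the snippet above; everything else is an assumption for illustration: the helper names `is_read_only`, `citation_uri`, and `run_coral_sql`, the 30-second timeout, and the idea that the CLI prints JSON that `json.loads` can parse. The prefix check is deliberately conservative and is not a full SQL parser.

```python
import json
import re
import subprocess

READ_ONLY_PREFIXES = ("SELECT", "WITH", "EXPLAIN")

def is_read_only(sql: str) -> bool:
    """Accept a single statement whose first keyword is SELECT, WITH, or EXPLAIN."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Empty input, or more than one statement. This also rejects
        # semicolons inside string literals, which is the safe direction.
        return False
    first_word = re.split(r"\s+", statement, maxsplit=1)[0].upper()
    return first_word in READ_ONLY_PREFIXES

def citation_uri(run_id: int) -> str:
    """Build a citation in the coral://query-run/<id> form shown in reports."""
    return f"coral://query-run/{run_id}"

def run_coral_sql(sql: str, timeout: float = 30.0):
    """Run a read-only query through the Coral CLI and parse its JSON output."""
    if not is_read_only(sql):
        raise ValueError("only read-only SQL (SELECT, WITH, EXPLAIN) is allowed")
    completed = subprocess.run(
        ["coral", "sql", "--output", "json", sql],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the query as a list element rather than through a shell string means the SQL is never interpreted by the shell, which removes one class of injection problems when the planner is an LLM.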

Typical investigation sequence:

Iteration   Coral query purpose
0           coral.tables: discover connected schemas
1           github.pulls JOIN sentry.issues: correlate deploys and errors
2           slack.messages in #incidents: on-call context
3           vercel.deployments: deployment timeline
4           github.teams or github.collaborators: ownership for remediation

The planner (LLM or template) picks the next query, and the judge scores confidence from 0.0 to 1.0. While confidence stays below 0.6, the loop continues, up to the orchestrator's cap of 5 iterations.
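
The loop itself is small. Here is a minimal sketch of the plan, run, judge cycle using the 0.6 confidence threshold and the 5-iteration cap from the architecture diagram. The `Investigation` dataclass and the `plan`, `run_query`, and `judge` callables are hypothetical stand-ins for Reef's planner, Coral client, and judge, stubbed here so the example runs on its own.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 5          # cap from the architecture diagram
CONFIDENCE_THRESHOLD = 0.6  # keep investigating while below this

@dataclass
class Investigation:
    evidence: list = field(default_factory=list)
    confidence: float = 0.0
    iterations: int = 0

def investigate(plan, run_query, judge) -> Investigation:
    """Plan a query, run it, judge the evidence, and stop once confident."""
    state = Investigation()
    while state.iterations < MAX_ITERATIONS:
        sql = plan(state)                 # planner picks the next query
        rows = run_query(sql)             # Coral (or mock) executes it
        state.evidence.append(
            {"iteration": state.iterations, "sql": sql, "rows": rows}
        )
        state.confidence = judge(state)   # judge scores 0.0 to 1.0
        state.iterations += 1
        if state.confidence >= CONFIDENCE_THRESHOLD:
            break
    return state

# Stubbed run: confidence grows by 0.25 with each piece of evidence.
result = investigate(
    plan=lambda s: f"SELECT {s.iterations}",
    run_query=lambda sql: [{"sql": sql}],
    judge=lambda s: 0.25 * len(s.evidence),
)
print(result.iterations, result.confidence)
```

Keeping every query and its rows in `evidence` is what makes the later report auditable: each entry can be persisted and cited individually.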


Step 3 — Sentry webhook → automatic investigations

This was our favorite demo path.

Flow:

  1. Sentry fires issue.created to Reef
  2. Reef responds 202 Accepted immediately (Sentry will not wait for a full investigation)
  3. Background worker normalizes the payload, resolves the org, runs the orchestrator
  4. Coral queries run across your stack
  5. Reef posts a summary to Slack #incidents
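
The key move in steps 2 and 3 is answering before the work is done. Reef does this with FastAPI and a background worker; the sketch below shows the same accept-then-process pattern with only the standard library so it can run anywhere. The names `handle_webhook`, `worker`, `jobs`, and `completed` are invented for this example, and the `detail` key in the response body is an assumption. Appending to `completed` stands in for running the investigation and posting to Slack.

```python
import queue
import threading

jobs = queue.Queue()
completed = []  # stand-in for "investigate, then post a summary to Slack"

def handle_webhook(payload: dict):
    """Enqueue the payload and answer immediately with 202 Accepted."""
    jobs.put(payload)
    return 202, {
        "detail": "Investigation queued; report will post to Slack when complete."
    }

def worker():
    """Drain the queue in the background, one investigation at a time."""
    while True:
        payload = jobs.get()
        try:
            issue = payload.get("data", {}).get("issue", {})
            completed.append(f"Report for {issue.get('shortId', 'unknown issue')}")
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

status, body = handle_webhook({"data": {"issue": {"shortId": "PYTHON-FASTAPI-1"}}})
jobs.join()  # demo only: wait for the background work before printing
print(status, completed)
```

In a real deployment the in-process queue would not survive a restart, so a durable queue or task table is the safer choice once missed investigations matter.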

Configure in Sentry:

  • Settings → Developer Settings → New Internal Integration
  • Webhook URL: https://your-reef-host/api/v1/webhooks/sentry
  • Subscribe to issue events

Reef .env:

SLACK_BOT_TOKEN=xoxb-...
SLACK_INCIDENT_CHANNEL=incidents
WEBHOOK_ORGANIZATION_ID=your-reef-org-uuid   # optional but recommended

Test locally:

curl -X POST http://127.0.0.1:8000/api/v1/webhooks/sentry \
  -H "Content-Type: application/json" \
  -d '{
    "action": "created",
    "organization": {"slug": "YOUR_SENTRY_ORG"},
    "data": {
      "issue": {
        "id": "123118378",
        "shortId": "PYTHON-FASTAPI-1",
        "title": "TypeError in checkout payment validation",
        "level": "fatal",
        "project": {"slug": "python-fastapi"}
      }
    }
  }'

You should see 202 with "Investigation queued; report will post to Slack when complete." — then watch Slack for the finished report.
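
For the normalization step, a small sketch of flattening that test payload into a typed record follows. It assumes only the fields shown in the curl example; real Sentry payloads carry more fields and may be shaped differently. The `IncidentTrigger` class, the `normalize_sentry_payload` function, and the fallback level of `error` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IncidentTrigger:
    org_slug: str
    issue_id: str
    short_id: str
    title: str
    level: str
    project_slug: str

def normalize_sentry_payload(payload: dict) -> IncidentTrigger:
    """Flatten the nested webhook body into the fields an investigation needs."""
    if payload.get("action") != "created":
        raise ValueError("only newly created issues trigger an investigation")
    issue = payload["data"]["issue"]
    return IncidentTrigger(
        org_slug=payload["organization"]["slug"],
        issue_id=str(issue["id"]),
        short_id=issue["shortId"],
        title=issue["title"],
        level=issue.get("level", "error"),  # assumed fallback when level is absent
        project_slug=issue["project"]["slug"],
    )

sample = {
    "action": "created",
    "organization": {"slug": "YOUR_SENTRY_ORG"},
    "data": {
        "issue": {
            "id": "123118378",
            "shortId": "PYTHON-FASTAPI-1",
            "title": "TypeError in checkout payment validation",
            "level": "fatal",
            "project": {"slug": "python-fastapi"},
        }
    },
}
trigger = normalize_sentry_payload(sample)
print(trigger.short_id, trigger.level)
```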


Step 4 — Severity gate and human-in-the-loop

Not every incident should auto-revert a PR.

Reef scores severity from:

  • Judge confidence
  • Blast radius (affected users from Sentry)
  • Fatal error penalty
  • Missing ownership penalty

Score    Mode                  Behavior
≤ 0.7    autonomous_fix        Agent can proceed with remediation workflow
> 0.7    human_agent_paired    Slack approval required before risky actions

High-severity incidents always keep a human in the loop. Low-risk ones can resolve without paging anyone.
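
As a sketch of how such a gate might look in code: the 0.7 threshold and the two mode names come from the score table, but the weights, the 1,000-user cap on blast radius, and the way judge confidence feeds the score are purely illustrative assumptions. `severity_score` and `remediation_mode` are hypothetical names.

```python
AUTONOMOUS_MAX_SCORE = 0.7  # threshold from the score table

def severity_score(confidence, affected_users, is_fatal, has_owner):
    """Combine the four inputs into a 0.0-1.0 score. All weights are illustrative."""
    blast_radius = min(affected_users / 1000, 1.0)  # assumed cap at 1,000 users
    score = 0.4 * blast_radius + 0.2 * confidence
    if is_fatal:
        score += 0.25   # fatal error penalty
    if not has_owner:
        score += 0.15   # missing ownership penalty
    return round(min(score, 1.0), 3)

def remediation_mode(score):
    """Scores at or below the threshold may proceed without a human."""
    if score <= AUTONOMOUS_MAX_SCORE:
        return "autonomous_fix"
    return "human_agent_paired"

score = severity_score(confidence=0.9, affected_users=800, is_fatal=True, has_owner=True)
print(score, remediation_mode(score))
```

Whatever the real weights are, keeping the gate as a pure function of a few numbers makes it easy to unit-test the boundary at exactly 0.7.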


Step 5 — Run it yourself

Backend:

cd backend
python -m venv .venv && source .venv/bin/activate
pip install -e .[dev]
cp .env.example .env
uvicorn app.main:app --reload

Frontend:

cd frontend
pnpm install && pnpm dev

Trigger from dashboard:

curl -X POST http://127.0.0.1:8000/api/v1/triggers/dashboard \
  -H "Content-Type: application/json" \
  -d '{"query": "Why did checkout fail after the last deploy?"}'

Simulate all scenarios:

./backend/scripts/simulate_triggers.sh all

API docs: http://127.0.0.1:8000/docs


What we learned

Stateful loops beat one-shot prompts. Investigation is inherently iterative. Persisting every Coral query run with citations made the agent auditable — judges and humans can see why Reef concluded what it did.

Coral removed integration busywork. We spent time on orchestration, severity gating, and Slack notifications instead of four bespoke API normalizers.

SQL matches how SREs think. Deploys, PRs, errors, threads, and owners are one timeline. Expressing that as JOINs felt natural.

Async webhooks need async workers. Returning 202 immediately and investigating in the background kept Sentry from waiting on us and let Reef post to Slack once the report was ready.


What's next

  • Full Slack /reef approve remediation flow
  • Richer LLM planner prompts from prior investigations
  • Per-org Coral config isolation at scale
  • Observability on Coral query latency and failures

Try Reef

If you are building an agent that needs data from more than one SaaS tool, start with Coral SQL before you write your fifth API wrapper.

Questions? Drop them in the comments—I'm happy to share webhook payloads and Coral source setup tips.


Built for the WeMakeDevs Coral Hackathon, May 2026. 🏴‍☠️
