Ernest Essien

Posted on May 31

How We Built Reef: A Production Incident Agent with Coral, Sentry Webhooks, and Slack

#coral #sentry #slack #hackathon

When production breaks, the hardest part is not finding data — it is connecting it.

You open GitHub for recent PRs. Sentry for the error. Slack for what on-call said. Vercel for the deploy that just went out. Then you stare at four tabs and try to line up timestamps in your head.

We built Reef for the Pirates of the Coral-bean hackathon to automate that workflow: one investigation, one report, with optional remediation based on severity.

This post is our Captain's Log — how we built it, how we used Coral, and how you can wire the same pattern yourself.

What Reef does

Reef is a production incident intelligence agent. You trigger it from:

The dashboard (natural language or a Vercel deploy link)
A Sentry webhook (new issue → auto-investigate)
Slack (slash command; full production wiring is coming soon)

Reef runs a stateful investigation loop: plan a query → run it through Coral → judge the evidence → repeat until confident → generate a report → post to Slack if triggered by webhook.

The output includes:

Root cause hypothesis
Timeline of iterations
Suspected PRs
Citations for every query (coral://query-run/1)
Severity score and remediation mode (autonomous_fix vs human_agent_paired)

Why we chose Coral

Without Coral, cross-tool incident triage usually means:

Four API clients
Normalizing different timestamp formats
Joining in application code
Stuffing large JSON blobs into an LLM

Coral flips that model. It exposes GitHub, Sentry, Slack, and Vercel as SQL tables. You write queries like:

SELECT g.title AS pr_title, g.number AS pr_number,
       s.title AS error_message, s.level AS error_level
FROM github.pulls g
JOIN sentry.issues s ON s.first_seen >= g.merged_at
WHERE g.owner = 'your-org'
  AND g.repo = 'your-repo'
  AND s.level IN ('fatal', 'error')
  AND g.state = 'closed'
ORDER BY s.first_seen DESC
LIMIT 20;

One query. Two sources. No warehouse. No ETL. Credentials stay on your machine — Coral resolves APIs at query time.

That temporal join — errors that appeared after a PR merged — is the core insight Reef automates.
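To see what that join returns without connecting any real sources, here is a minimal sketch that runs the same query against toy rows in SQLite. It only mimics the schema-qualified table names by attaching in-memory databases called github and sentry; it does not involve Coral, and the PR titles, timestamps, and the helper name build_demo_connection are invented for illustration.

```python
import sqlite3

def build_demo_connection() -> sqlite3.Connection:
    """Create toy github.pulls and sentry.issues tables (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    # Attaching extra in-memory databases lets github.pulls and sentry.issues
    # resolve by the same qualified names used in the post's query.
    conn.execute("ATTACH DATABASE ':memory:' AS github")
    conn.execute("ATTACH DATABASE ':memory:' AS sentry")
    conn.execute(
        "CREATE TABLE github.pulls (owner TEXT, repo TEXT, number INTEGER, "
        "title TEXT, state TEXT, merged_at TEXT)"
    )
    conn.execute("CREATE TABLE sentry.issues (title TEXT, level TEXT, first_seen TEXT)")
    conn.executemany(
        "INSERT INTO github.pulls VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("your-org", "your-repo", 234, "Refactor checkout validation",
             "closed", "2026-05-30T10:00:00Z"),
            ("your-org", "your-repo", 235, "Update README",
             "closed", "2026-05-30T12:00:00Z"),
            # Closed without merging: merged_at is NULL, so the join never matches it.
            ("your-org", "your-repo", 236, "Abandoned experiment", "closed", None),
        ],
    )
    conn.executemany(
        "INSERT INTO sentry.issues VALUES (?, ?, ?)",
        [
            ("TypeError in checkout payment validation", "fatal", "2026-05-30T10:15:00Z"),
            ("Slow response warning", "warning", "2026-05-30T13:00:00Z"),
        ],
    )
    return conn

# Same query as in the post. ISO-8601 strings in one format compare
# chronologically, which is what makes the >= join work on TEXT columns here.
TEMPORAL_JOIN = """
SELECT g.title AS pr_title, g.number AS pr_number,
       s.title AS error_message, s.level AS error_level
FROM github.pulls g
JOIN sentry.issues s ON s.first_seen >= g.merged_at
WHERE g.owner = 'your-org'
  AND g.repo = 'your-repo'
  AND s.level IN ('fatal', 'error')
  AND g.state = 'closed'
ORDER BY s.first_seen DESC
LIMIT 20;
"""

rows = build_demo_connection().execute(TEMPORAL_JOIN).fetchall()
for row in rows:
    print(row)
```

With these toy rows, only PR 234 pairs with the fatal TypeError: PR 235 merged after the error first appeared, PR 236 never merged, and the warning-level issue is filtered out. On real data the join pairs each error with every PR merged before it, so the result is a list of candidates rather than a single culprit.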

Architecture at a glance

Trigger (Dashboard / Sentry webhook / Slack)
        ↓
Investigation Orchestrator (max 5 iterations)
        ↓
Planner (Gemini) ──→ Coral SQL ──→ Judge (Groq)
        ↓                    ↓
   Evidence Store      Query citations
        ↓
Escalation + Severity Gate
        ↓
Report → Slack (for webhooks)

Backend: Python 3.11+, FastAPI, SQLAlchemy (SQLite dev / Postgres prod)

Frontend: React 19, TypeScript, Vite, Tailwind

Data layer: Coral CLI (coral sql) or mock mode for demos without Coral installed

AI: Gemini 2.5 Flash (planner) + Groq Llama 3.3 70B (judge). Reef falls back to a template planner and a rules-based judge if no API keys are set.

Step 1 — Wire Coral sources

Install Coral and register your production tools once:

brew install withcoral/tap/coral

# From your backend directory
cp .env.example .env
# Fill: GITHUB_TOKEN, GITHUB_OWNER, GITHUB_REPO,
#       SENTRY_ORG, SENTRY_TOKEN, SLACK_TOKEN, VERCEL_TOKEN

set -a && source .env && set +a
./scripts/setup_coral_sources.sh

Our setup script adds github, sentry, slack, and vercel (community manifest), then runs smoke queries including the PR↔Sentry join.

Verify with:

coral sql "SELECT schema_name, table_name FROM coral.tables
  WHERE schema_name IN ('github','sentry','slack','vercel') LIMIT 20"

Set CORAL_MODE=cli in .env when you are ready for real data. Use CORAL_MODE=mock for local demos without Coral installed — Reef returns a coherent checkout-failure story (PR #234 + fatal TypeError + Slack thread).

Step 2 — How Reef calls Coral from Python

Reef does not embed Coral as a library. It shells out:

# Simplified flow in app/clients/coral_runtime_client.py
subprocess.run(["coral", "sql", "--output", "json", sql], ...)

The query executor enforces read-only SQL (SELECT, WITH, EXPLAIN only), normalizes rows, and stores each run in the database with a citation URI.
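As a rough illustration of the read-only check and the shell-out, here is a minimal sketch. The function names is_read_only and run_coral_sql are hypothetical, the 30-second timeout is an arbitrary choice, and the sketch assumes that coral sql --output json prints JSON on stdout; Reef's actual executor may differ on all of these.

```python
import json
import re
import subprocess

# Statement types the post says the executor allows.
ALLOWED_KEYWORDS = ("select", "with", "explain")

def is_read_only(sql: str) -> bool:
    """Conservative check: one statement, starting with an allowed keyword.

    This is a first-keyword filter, not a SQL parser. It also rejects any
    query containing a semicolon inside a string literal, which errs on the
    safe side.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement
        return False
    first_word = re.match(r"[A-Za-z]+", statement)
    return first_word is not None and first_word.group(0).lower() in ALLOWED_KEYWORDS

def run_coral_sql(sql: str, timeout: float = 30.0):
    """Run a read-only query through the Coral CLI and parse its output.

    Assumption: stdout is a JSON document when --output json is passed.
    """
    if not is_read_only(sql):
        raise ValueError("only SELECT, WITH, and EXPLAIN statements are allowed")
    completed = subprocess.run(
        ["coral", "sql", "--output", "json", sql],
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # avoid hanging the investigation on a slow source
    )
    return json.loads(completed.stdout)
```

Passing the query as a list element rather than through a shell string means no shell quoting or injection issues, whatever the planner generates.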

Typical investigation sequence:

Iteration	Coral query purpose
0	`coral.tables` — discover connected schemas
1	`github.pulls` JOIN `sentry.issues` — correlate deploys and errors
2	`slack.messages` in `#incidents` — on-call context
3	`vercel.deployments` — deployment timeline
4	`github.teams` or `github.collaborators` — ownership for remediation

The planner (LLM or template) picks the next query, and the judge scores confidence from 0.0 to 1.0. While confidence stays below 0.6, the loop continues, up to the five-iteration cap.
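The loop itself can be sketched in a few lines. This is a minimal outline, not Reef's orchestrator: the Investigation dataclass and the plan, run_query, and judge callables are hypothetical stand-ins for the planner, the Coral executor, and the judge, while the iteration cap and confidence threshold come from the numbers in this post.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ITERATIONS = 5          # cap from the architecture diagram
CONFIDENCE_THRESHOLD = 0.6  # the loop continues while confidence is below this

@dataclass
class Investigation:
    question: str
    evidence: list[dict] = field(default_factory=list)
    confidence: float = 0.0

def investigate(
    question: str,
    plan: Callable[[Investigation, int], str],  # picks the next SQL query
    run_query: Callable[[str], list],           # executes it, returns rows
    judge: Callable[[Investigation], float],    # scores the evidence, 0.0-1.0
) -> Investigation:
    state = Investigation(question)
    for iteration in range(MAX_ITERATIONS):
        sql = plan(state, iteration)
        rows = run_query(sql)
        # Keep every query and its rows so the final report can cite them.
        state.evidence.append({"iteration": iteration, "sql": sql, "rows": rows})
        state.confidence = judge(state)
        if state.confidence >= CONFIDENCE_THRESHOLD:
            break  # confident enough to write the report
    return state
```

Keeping the planner and judge as plain callables is what makes the fallback path cheap: an LLM-backed function and a template or rules-based function can be swapped in without touching the loop.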

Step 3 — Sentry webhook → automatic investigations

This was our favorite demo path.

Flow:

Sentry fires issue.created to Reef
Reef responds 202 Accepted immediately (Sentry will not wait for a full investigation)
Background worker normalizes the payload, resolves the org, runs the orchestrator
Coral queries run across your stack
Reef posts a summary to Slack #incidents

Configure in Sentry:

Settings → Developer Settings → New Internal Integration
Webhook URL: https://your-reef-host/api/v1/webhooks/sentry
Subscribe to issue events

Reef .env:

SLACK_BOT_TOKEN=xoxb-...
SLACK_INCIDENT_CHANNEL=incidents
WEBHOOK_ORGANIZATION_ID=your-reef-org-uuid   # optional but recommended

Test locally:

curl -X POST http://127.0.0.1:8000/api/v1/webhooks/sentry \
  -H "Content-Type: application/json" \
  -d '{
    "action": "created",
    "organization": {"slug": "YOUR_SENTRY_ORG"},
    "data": {
      "issue": {
        "id": "123118378",
        "shortId": "PYTHON-FASTAPI-1",
        "title": "TypeError in checkout payment validation",
        "level": "fatal",
        "project": {"slug": "python-fastapi"}
      }
    }
  }'

You should see 202 with "Investigation queued; report will post to Slack when complete." — then watch Slack for the finished report.
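For the normalization step, a minimal sketch might flatten the test payload above into a small typed record before handing it to the orchestrator. The SentryIssueEvent dataclass and normalize_sentry_payload function are hypothetical names, and the sketch reads only the fields shown in the curl example; real Sentry payloads carry more.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class SentryIssueEvent:
    action: str
    org_slug: str
    issue_id: str
    short_id: str
    title: str
    level: str
    project_slug: str

def normalize_sentry_payload(payload: dict[str, Any]) -> SentryIssueEvent:
    """Flatten the nested webhook body into one record.

    Missing optional fields become empty strings; a missing issue object is
    treated as an invalid payload.
    """
    issue = payload.get("data", {}).get("issue")
    if not isinstance(issue, dict):
        raise ValueError("payload has no data.issue object")
    return SentryIssueEvent(
        action=str(payload.get("action", "")),
        org_slug=str(payload.get("organization", {}).get("slug", "")),
        issue_id=str(issue.get("id", "")),
        short_id=str(issue.get("shortId", "")),
        title=str(issue.get("title", "")),
        level=str(issue.get("level", "")),
        project_slug=str(issue.get("project", {}).get("slug", "")),
    )

# The same body as the curl test above.
example = {
    "action": "created",
    "organization": {"slug": "YOUR_SENTRY_ORG"},
    "data": {
        "issue": {
            "id": "123118378",
            "shortId": "PYTHON-FASTAPI-1",
            "title": "TypeError in checkout payment validation",
            "level": "fatal",
            "project": {"slug": "python-fastapi"},
        }
    },
}
event = normalize_sentry_payload(example)
print(event.short_id, event.level)
```

Validating in the request handler and doing everything else in the background worker keeps the 202 response fast while still rejecting malformed payloads up front.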

Step 4 — Severity gate and human-in-the-loop

Not every incident should auto-revert a PR.

Reef scores severity from:

Judge confidence
Blast radius (affected users from Sentry)
Fatal error penalty
Missing ownership penalty

Score	Mode	Behavior
≤ 0.7	`autonomous_fix`	Agent can proceed with remediation workflow
> 0.7	`human_agent_paired`	Slack approval required before risky actions

High-severity incidents always keep a human in the loop. Low-risk ones can resolve without paging anyone.
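A gate like this can be sketched as a weighted score plus a threshold check. Only the four inputs and the 0.7 cutoff come from this post; the weights, the 1,000-user normalization, the way judge confidence feeds the score, and the names Signals, severity_score, and remediation_mode are assumptions made up for illustration.

```python
from dataclasses import dataclass

HUMAN_REVIEW_THRESHOLD = 0.7  # from the table: scores above this need approval

@dataclass
class Signals:
    judge_confidence: float  # 0.0-1.0, from the judge
    affected_users: int      # blast radius reported by Sentry
    is_fatal: bool
    has_owner: bool          # whether an owning team or person was found

def severity_score(signals: Signals) -> float:
    """Combine the four signals into a 0.0-1.0 score (hypothetical weights)."""
    blast_radius = min(signals.affected_users / 1000, 1.0)  # assumed scale
    score = 0.4 * signals.judge_confidence + 0.3 * blast_radius
    if signals.is_fatal:
        score += 0.2  # fatal error penalty
    if not signals.has_owner:
        score += 0.1  # missing ownership penalty
    return min(score, 1.0)

def remediation_mode(score: float) -> str:
    """Map a severity score to the mode names used in the table."""
    if score <= HUMAN_REVIEW_THRESHOLD:
        return "autonomous_fix"
    return "human_agent_paired"

minor = Signals(judge_confidence=0.5, affected_users=10, is_fatal=False, has_owner=True)
major = Signals(judge_confidence=0.9, affected_users=5000, is_fatal=True, has_owner=False)
print(remediation_mode(severity_score(minor)))
print(remediation_mode(severity_score(major)))
```

Whatever the real weights are, keeping the mode decision in one small function makes the threshold easy to audit and to tighten later.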

Step 5 — Run it yourself

Backend:

cd backend
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
uvicorn app.main:app --reload

Frontend:

cd frontend
pnpm install && pnpm dev

Trigger from dashboard:

curl -X POST http://127.0.0.1:8000/api/v1/triggers/dashboard \
  -H "Content-Type: application/json" \
  -d '{"query": "Why did checkout fail after the last deploy?"}'

Simulate all scenarios:

./backend/scripts/simulate_triggers.sh all

API docs: http://127.0.0.1:8000/docs

What we learned

Stateful loops beat one-shot prompts. Investigation is inherently iterative. Persisting every Coral query run with citations made the agent auditable — judges and humans can see why Reef concluded what it did.

Coral removed integration busywork. We spent time on orchestration, severity gating, and Slack notifications instead of four bespoke API normalizers.

SQL matches how SREs think. Deploys, PRs, errors, threads, and owners are one timeline. Expressing that as JOINs felt natural.

Async webhooks need async workers. Returning 202 immediately and investigating in the background meant Sentry never had to wait on an investigation, and Slack got the report as soon as it was ready.

What's next

Full Slack /reef approve remediation flow
Richer LLM planner prompts from prior investigations
Per-org Coral config isolation at scale
Observability on Coral query latency and failures

Try Reef

Live URL: https://usereef.grandkojo.my
GitHub: https://github.com/Grandkojo/coral_hackers
Coral docs: withcoral.com/docs

If you are building an agent that needs data from more than one SaaS tool, start with Coral SQL before you write your fifth API wrapper.

Questions? Drop them in the comments—I'm happy to share webhook payloads and Coral source setup tips.

Built for the WeMakeDevs Coral Hackathon, May 2026. 🏴‍☠️
