When production breaks, the hardest part is not finding data — it is connecting it.
You open GitHub for recent PRs. Sentry for the error. Slack for what on-call said. Vercel for the deploy that just went out. Then you stare at four tabs and try to line up timestamps in your head.
We built Reef for the Pirates of the Coral-bean hackathon to automate that workflow: one investigation, one report, with optional remediation based on severity.
This post is our Captain's Log — how we built it, how we used Coral, and how you can wire the same pattern yourself.
What Reef does
Reef is a production incident intelligence agent. You trigger it from:
- The dashboard (natural language or a Vercel deploy link)
- A Sentry webhook (new issue → auto-investigate)
- Slack (slash command; full production wiring is coming soon)
Reef runs a stateful investigation loop: plan a query → run it through Coral → judge the evidence → repeat until confident → generate a report → post to Slack if triggered by webhook.
The output includes:
- Root cause hypothesis
- Timeline of iterations
- Suspected PRs
- Citations for every query (coral://query-run/1)
- Severity score and remediation mode (autonomous_fix vs human_agent_paired)
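As a rough illustration, the report fields listed above could be modeled like this. The class and field names are assumptions for the sketch, not Reef's actual schema; only the citation URI format and the two remediation mode names come from the post.

```python
from dataclasses import dataclass, field
from enum import Enum


class RemediationMode(str, Enum):
    AUTONOMOUS_FIX = "autonomous_fix"
    HUMAN_AGENT_PAIRED = "human_agent_paired"


@dataclass
class IterationRecord:
    index: int
    purpose: str   # what the query was meant to find
    citation: str  # e.g. "coral://query-run/1"


@dataclass
class IncidentReport:
    # Hypothetical shape; field names are illustrative.
    root_cause_hypothesis: str
    severity_score: float
    remediation_mode: RemediationMode
    timeline: list[IterationRecord] = field(default_factory=list)
    suspected_prs: list[int] = field(default_factory=list)

    def citations(self) -> list[str]:
        """Every query run referenced by the report, in iteration order."""
        return [step.citation for step in self.timeline]


# Made-up example values, loosely following the mock checkout story.
report = IncidentReport(
    root_cause_hypothesis="Checkout TypeError introduced by a recent merge",
    severity_score=0.82,
    remediation_mode=RemediationMode.HUMAN_AGENT_PAIRED,
    timeline=[IterationRecord(1, "correlate PRs and errors", "coral://query-run/1")],
    suspected_prs=[234],
)
print(report.citations())
```

Keeping the citation on each timeline entry means the report can always be traced back to the query runs that produced it.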
Why we chose Coral
Before Coral, cross-tool incident triage usually means:
- Four API clients
- Normalizing different timestamp formats
- Joining in application code
- Stuffing large JSON blobs into an LLM
Coral flips that model. It exposes GitHub, Sentry, Slack, and Vercel as SQL tables. You write queries like:
SELECT g.title AS pr_title, g.number AS pr_number,
s.title AS error_message, s.level AS error_level
FROM github.pulls g
JOIN sentry.issues s ON s.first_seen >= g.merged_at
WHERE g.owner = 'your-org'
AND g.repo = 'your-repo'
AND s.level IN ('fatal', 'error')
AND g.state = 'closed'
ORDER BY s.first_seen DESC
LIMIT 20;
One query. Two sources. No warehouse. No ETL. Credentials stay on your machine — Coral resolves APIs at query time.
That temporal join — errors that appeared after a PR merged — is the core insight Reef automates.
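To make the join condition concrete, here is a minimal Python sketch of the same correlation done in memory. The `window` bound is an addition for illustration: the SQL above has no upper limit, so each error matches every PR merged before it. The function name, row shapes, and sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def errors_after_merge(prs, issues, window=timedelta(hours=6)):
    """Pair each issue with PRs merged at most `window` before it first appeared."""
    pairs = []
    for issue in issues:
        for pr in prs:
            gap = issue["first_seen"] - pr["merged_at"]
            if timedelta(0) <= gap <= window:
                pairs.append((pr["number"], issue["title"], gap))
    # Smallest gap first: the closest preceding merge is the strongest suspect.
    return sorted(pairs, key=lambda p: p[2])


# Illustrative data only.
prs = [
    {"number": 230, "merged_at": datetime(2026, 5, 1, 9, 0)},
    {"number": 234, "merged_at": datetime(2026, 5, 2, 14, 0)},
]
issues = [
    {"title": "TypeError in checkout", "first_seen": datetime(2026, 5, 2, 14, 20)},
]

for number, title, gap in errors_after_merge(prs, issues):
    print(f"PR #{number} merged {gap} before: {title}")
```

In SQL, the same bound would be an extra upper-limit condition on `s.first_seen` in the JOIN clause; the exact interval syntax depends on the dialect Coral supports.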
Architecture at a glance
Trigger (Dashboard / Sentry webhook / Slack)
↓
Investigation Orchestrator (max 5 iterations)
↓
Planner (Gemini) ──→ Coral SQL ──→ Judge (Groq)
↓ ↓
Evidence Store Query citations
↓
Escalation + Severity Gate
↓
Report → Slack (for webhooks)
Backend: Python 3.11+, FastAPI, SQLAlchemy (SQLite dev / Postgres prod)
Frontend: React 19, TypeScript, Vite, Tailwind
Data layer: Coral CLI (coral sql) or mock mode for demos without Coral installed
AI: Gemini 2.5 Flash (planner) + Groq Llama 3.3 70B (judge). Falls back to a template planner and a rules-based judge when no API keys are set.
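A fallback like this can be as small as a check on the environment. The sketch below assumes the keys live in `GEMINI_API_KEY` and `GROQ_API_KEY` and that each component falls back independently; both are assumptions, not details confirmed by Reef's code.

```python
import os


def choose_components(env=None):
    """Pick planner and judge implementations based on which API keys are set."""
    env = os.environ if env is None else env
    # Env var names are hypothetical; adjust to whatever the project uses.
    planner = "gemini" if env.get("GEMINI_API_KEY") else "template"
    judge = "groq" if env.get("GROQ_API_KEY") else "rules"
    return planner, judge


print(choose_components({}))
print(choose_components({"GEMINI_API_KEY": "example-key"}))
```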
Step 1 — Wire Coral sources
Install Coral and register your production tools once:
brew install withcoral/tap/coral
# From your backend directory
cp .env.example .env
# Fill: GITHUB_TOKEN, GITHUB_OWNER, GITHUB_REPO,
# SENTRY_ORG, SENTRY_TOKEN, SLACK_TOKEN, VERCEL_TOKEN
set -a && source .env && set +a
./scripts/setup_coral_sources.sh
Our setup script adds github, sentry, slack, and vercel (community manifest), then runs smoke queries including the PR↔Sentry join.
Verify with:
coral sql "SELECT schema_name, table_name FROM coral.tables
WHERE schema_name IN ('github','sentry','slack','vercel') LIMIT 20"
Set CORAL_MODE=cli in .env when you are ready for real data. Use CORAL_MODE=mock for local demos without Coral installed — Reef returns a coherent checkout-failure story (PR #234 + fatal TypeError + Slack thread).
Step 2 — How Reef calls Coral from Python
Reef does not embed Coral as a library. It shells out:
# Simplified flow in app/clients/coral_runtime_client.py
subprocess.run(["coral", "sql", "--output", "json", sql], ...)
The query executor enforces read-only SQL (SELECT, WITH, EXPLAIN only), normalizes rows, and stores each run in the database with a citation URI.
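A minimal sketch of such an executor might look like the following. The function names, the timeout, and the returned dictionary shape are assumptions; the `--output json` flag is taken from the snippet above, and the code assumes Coral prints parseable JSON on stdout.

```python
import json
import re
import subprocess

READ_ONLY_PREFIXES = ("SELECT", "WITH", "EXPLAIN")


def is_read_only(sql: str) -> bool:
    """Accept a single statement that starts with SELECT, WITH, or EXPLAIN."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # reject stacked statements such as "SELECT 1; DROP ..."
        return False
    first_word = re.match(r"[A-Za-z]+", stripped)
    return bool(first_word) and first_word.group(0).upper() in READ_ONLY_PREFIXES


def run_coral(sql: str, run_id: int, timeout: int = 30) -> dict:
    """Run one query through the Coral CLI and attach a citation URI."""
    if not is_read_only(sql):
        raise ValueError("only SELECT, WITH, and EXPLAIN statements are allowed")
    proc = subprocess.run(
        ["coral", "sql", "--output", "json", sql],
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit
        timeout=timeout,  # raise TimeoutExpired if the CLI hangs
    )
    return {"citation": f"coral://query-run/{run_id}", "rows": json.loads(proc.stdout)}
```

The guard is deliberately conservative: it also rejects a semicolon inside a string literal. A prefix check is only a first line of defense, so read-only API tokens are still worth using.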
Typical investigation sequence:
| Iteration | Coral query purpose |
|---|---|
| 0 | coral.tables: discover connected schemas |
| 1 | github.pulls JOIN sentry.issues: correlate deploys and errors |
| 2 | slack.messages in #incidents: on-call context |
| 3 | vercel.deployments: deployment timeline |
| 4 | github.teams or github.collaborators: ownership for remediation |
The planner (LLM or template) picks the next query. The judge scores confidence 0.0–1.0. If confidence stays below 0.6, the loop continues until it hits the five-iteration cap.
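The control flow reduces to a bounded loop. The sketch below uses the 0.6 threshold and the five-iteration cap from this post; the function signature, the evidence format, and the stub components are hypothetical.

```python
from typing import Callable

MAX_ITERATIONS = 5          # cap from the architecture diagram
CONFIDENCE_THRESHOLD = 0.6  # the loop continues while confidence is below this


def investigate(
    plan: Callable[[list], str],
    run_query: Callable[[str], list],
    judge: Callable[[list], float],
) -> dict:
    """Plan, query, and judge until confident or out of iterations."""
    evidence: list = []
    confidence = 0.0
    for iteration in range(MAX_ITERATIONS):
        sql = plan(evidence)          # the planner sees the evidence so far
        rows = run_query(sql)
        evidence.append({"iteration": iteration, "sql": sql, "rows": rows})
        confidence = judge(evidence)  # score between 0.0 and 1.0
        if confidence >= CONFIDENCE_THRESHOLD:
            break
    return {"confidence": confidence, "iterations": len(evidence), "evidence": evidence}


# Stub components: confidence grows by 0.25 with each piece of evidence.
result = investigate(
    plan=lambda ev: f"SELECT {len(ev)}",
    run_query=lambda sql: [{"sql": sql}],
    judge=lambda ev: 0.25 * len(ev),
)
print(result["iterations"], result["confidence"])
```

Passing the planner, executor, and judge as callables lets the same loop run with LLM-backed components or with the template and rules fallbacks.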
Step 3 — Sentry webhook → automatic investigations
This was our favorite demo path.
Flow:
- Sentry fires issue.created to Reef
- Reef responds 202 Accepted immediately (Sentry will not wait for a full investigation)
- Background worker normalizes the payload, resolves the org, runs the orchestrator
- Coral queries run across your stack
- Reef posts a summary to Slack #incidents
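The acknowledge-then-work pattern in this flow can be sketched with the standard library alone. Reef itself runs on FastAPI, so treat this as an illustration of the pattern rather than Reef's handler; the function names and the normalized field set are assumptions, and the payload shape follows the local test payload in this post.

```python
import queue
import threading

jobs: "queue.Queue[dict]" = queue.Queue()
completed: list[dict] = []


def normalize(payload: dict) -> dict:
    """Pull the fields the orchestrator needs out of a Sentry issue payload."""
    issue = payload["data"]["issue"]
    return {
        "org": payload["organization"]["slug"],
        "issue_id": issue["id"],
        "title": issue["title"],
        "level": issue["level"],
    }


def handle_webhook(payload: dict) -> tuple[int, dict]:
    """Enqueue the investigation and acknowledge right away with 202."""
    jobs.put(payload)
    return 202, {"detail": "Investigation queued"}


def worker() -> None:
    """Drain the queue in the background."""
    while True:
        payload = jobs.get()
        try:
            completed.append(normalize(payload))  # stand-in for the orchestrator run
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

status, body = handle_webhook({
    "action": "created",
    "organization": {"slug": "YOUR_SENTRY_ORG"},
    "data": {"issue": {"id": "123118378",
                       "title": "TypeError in checkout payment validation",
                       "level": "fatal"}},
})
jobs.join()  # demo only: wait for the background worker to finish
print(status, completed[0]["title"])
```

The handler never touches Coral or the LLMs, so its response time stays constant no matter how long the investigation takes.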
Configure in Sentry:
- Settings → Developer Settings → New Internal Integration
- Webhook URL: https://your-reef-host/api/v1/webhooks/sentry
- Subscribe to issue events
Reef .env:
SLACK_BOT_TOKEN=xoxb-...
SLACK_INCIDENT_CHANNEL=incidents
WEBHOOK_ORGANIZATION_ID=your-reef-org-uuid # optional but recommended
Test locally:
curl -X POST http://127.0.0.1:8000/api/v1/webhooks/sentry \
-H "Content-Type: application/json" \
-d '{
"action": "created",
"organization": {"slug": "YOUR_SENTRY_ORG"},
"data": {
"issue": {
"id": "123118378",
"shortId": "PYTHON-FASTAPI-1",
"title": "TypeError in checkout payment validation",
"level": "fatal",
"project": {"slug": "python-fastapi"}
}
}
}'
You should see 202 with "Investigation queued; report will post to Slack when complete." — then watch Slack for the finished report.
Step 4 — Severity gate and human-in-the-loop
Not every incident should auto-revert a PR.
Reef scores severity from:
- Judge confidence
- Blast radius (affected users from Sentry)
- Fatal error penalty
- Missing ownership penalty
| Score | Mode | Behavior |
|---|---|---|
| ≤ 0.7 | autonomous_fix | Agent can proceed with remediation workflow |
| > 0.7 | human_agent_paired | Slack approval required before risky actions |
High-severity incidents always keep a human in the loop. Low-risk ones can resolve without paging anyone.
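One way to combine those four signals is a weighted sum fed into the 0.7 gate. The weights, the 1,000-user saturation point, and the function names below are illustrative assumptions, not Reef's actual formula; only the threshold and the mode names come from the table above.

```python
def severity_score(confidence: float, affected_users: int,
                   is_fatal: bool, has_owner: bool) -> float:
    """Combine the four signals into a score clamped to 0.0-1.0."""
    blast_radius = min(affected_users / 1000, 1.0)  # saturates at 1,000 users
    score = 0.4 * confidence + 0.3 * blast_radius   # illustrative weights
    if is_fatal:
        score += 0.2  # fatal error penalty
    if not has_owner:
        score += 0.1  # missing ownership penalty
    return round(min(score, 1.0), 2)


def remediation_mode(score: float) -> str:
    """Apply the gate: anything above 0.7 requires a human."""
    return "human_agent_paired" if score > 0.7 else "autonomous_fix"


high = severity_score(confidence=0.9, affected_users=800, is_fatal=True, has_owner=True)
low = severity_score(confidence=0.5, affected_users=10, is_fatal=False, has_owner=True)
print(high, remediation_mode(high))
print(low, remediation_mode(low))
```

Note that a score of exactly 0.7 falls on the autonomous side, matching the ≤ in the table.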
Step 5 — Run it yourself
Backend:
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -e .[dev]
cp .env.example .env
uvicorn app.main:app --reload
Frontend:
cd frontend
pnpm install && pnpm dev
Trigger from dashboard:
curl -X POST http://127.0.0.1:8000/api/v1/triggers/dashboard \
-H "Content-Type: application/json" \
-d '{"query": "Why did checkout fail after the last deploy?"}'
Simulate all scenarios:
./backend/scripts/simulate_triggers.sh all
API docs: http://127.0.0.1:8000/docs
What we learned
Stateful loops beat one-shot prompts. Investigation is inherently iterative. Persisting every Coral query run with citations made the agent auditable — judges and humans can see why Reef concluded what it did.
Coral removed integration busywork. We spent time on orchestration, severity gating, and Slack notifications instead of four bespoke API normalizers.
SQL matches how SREs think. Deploys, PRs, errors, threads, and owners are one timeline. Expressing that as JOINs felt natural.
Async webhooks need async workers. Returning 202 immediately and investigating in the background kept Sentry from waiting on us, and Reef posted to Slack once the report was ready.
What's next
- Full Slack /reef approve remediation flow
- Richer LLM planner prompts from prior investigations
- Per-org Coral config isolation at scale
- Observability on Coral query latency and failures
Try Reef
- Live URL: https://usereef.grandkojo.my
- Coral docs: withcoral.com/docs
If you are building an agent that needs data from more than one SaaS tool, start with Coral SQL before you write your fifth API wrapper.
Questions? Drop them in the comments—I'm happy to share webhook payloads and Coral source setup tips.
Built for the WeMakeDevs Coral Hackathon, May 2026. 🏴‍☠️