Building ReefWatch, a Coral-Powered Production Triage Agent

Siddhant Rai — Sat, 30 May 2026 06:43:00 +0000

Production incidents almost never break in one place.

The alert fires in one tool. The broken deploy is in Netlify. The suspicious
change is in GitHub. The stack trace is in Sentry. The human context is in
Slack. The runbook is in Notion. The "is this actually paging someone?" answer
is in PagerDuty.

A normal chatbot can sound helpful in that situation. It can say things like
"you should check your recent deployments" and "look for related errors in
Sentry."

But that is not triage. That is a polished to-do list.

I wanted something more useful: an agent that could go get the evidence, connect
the dots across sources, show its work, and give an operator-grade answer
grounded in real system data.

The design constraint from the start was simple: no evidence, no answer.

That became ReefWatch, a Coral-powered production triage agent built to
investigate instead of improvise.

It discovers the tools connected to a workspace at runtime, queries them as
evidence, correlates records across systems, and produces a compact answer only
when the facts support one.

Coral became the backbone because it turns the messiest part of agent tooling
into something the model can actually reason about: SQL.

What This Guide Builds

By the end of this route, you will have a blueprint for an agent that can:

discover connected Coral sources at runtime
query production systems through read-only SQL
correlate evidence across code, deploys, errors, alerts, chats, and runbooks
stream every query and row count into an inspectable UI
run the same investigation workflow from a CLI when you want a scriptable path
generate an incident report only when the evidence supports one
stay focused with policy layers instead of a giant prompt blob

In one sentence:

ReefWatch is a Coral-powered investigation workspace that lets an agent discover connected tools at runtime, query them with read-only SQL, stream the evidence trail, and generate an incident report only when the facts actually support one.

Why Coral Belongs At The Center

MCP is excellent as an integration layer. It gives models a way to call tools
with schemas instead of scraping humans through UI glue.

But if every source becomes a separate collection of bespoke tools, a new
problem appears:

the model has to learn many tool shapes
every API has different pagination and filters
joining records across sources becomes a prompt exercise
source-specific errors leak straight into the agent loop
every new integration asks the app to own more integration logic

Coral changes the abstraction.

It still uses MCP, but the agent mostly sees a small set of stable capabilities:
discover catalog, inspect schema, read the guide, and run SQL.

That means a new source is not:

teach ReefWatch another SDK

It becomes:

install a Coral source -> discover the tables -> query evidence with SQL

The practical win is boring in the best way: ReefWatch can stay small.

The app does not own GitHub pagination, Sentry auth, Slack table shapes, or
Netlify deploy schemas.

Coral owns that. ReefWatch owns the investigation behavior.

That split also maps well to how I think reliable agents should be built:
ground the model in real environment feedback, keep tools composable, trace the
work, and wrap the loop with small guardrails instead of hoping one perfect
prompt behaves forever.

MCP gives the agent hands. Coral gives it a map and a query language.

The Build Path

If I were rebuilding ReefWatch from scratch, I would not start with the UI.

I would start with the investigation pipeline and make each layer earn its
place.

Remember, it is tempting to start with the surface, but you should make the surface reflect a system that was already worth trusting.

The project came together in eight slices:

Slice	What I Built	Why It Mattered
1	Coral MCP client	Proved Coral could be the data plane
2	Warm Coral session	Removed repeated MCP startup cost
3	Schema context	Kept prompts aligned with live Coral metadata
4	Minimal agent loop	Exposed the real model failure modes
5	Policy modules	Made the agent reliable without hardcoding a demo
6	Persistence	Made runs debuggable and conversations durable
7	Streaming UI	Made the investigation inspectable
8	Source profiles	Made setup reproducible without requiring every token

The final project shape looks roughly like this:

(a FastAPI backend, with SQLite store and React frontend)

reefwatch/
|-- src/
|   |-- api/
|   |   |-- routes/
|   |   |   |-- coral.py              # Coral health and source setup
|   |   |   |-- conversations.py      # persisted investigation threads
|   |   |   |-- investigations.py     # REST + SSE investigation runs
|   |   |   `-- schema.py             # schema visibility for the UI
|   |   |-- dependencies.py           # shared Settings, store, agent, session
|   |   |-- mappers.py                # domain models to API responses
|   |   `-- schemas.py                # API contracts
|   `-- app/
|       |-- adapters/
|       |   |-- coral_session.py      # long-lived Coral process + warm cache
|       |   |-- mcp_client.py         # JSON-RPC over coral mcp-stdio
|       |   `-- store.py              # SQLite run/conversation persistence
|       |-- agent/
|       |   |-- context.py            # conversation compression
|       |   |-- coverage.py           # evidence lane policy
|       |   |-- events.py             # streamable trace event contracts
|       |   |-- guardrails.py         # evidence-first retries
|       |   |-- intent.py             # structured artifact routing
|       |   |-- execution_policy.py   # duplicate and SQL-shape hygiene
|       |   |-- loop.py               # LLM/tool loop
|       |   |-- policy.py             # budgets and finalization
|       |   |-- prompts.py            # schema-aware operating contract
|       |   |-- schema.py             # Coral table/column context builder
|       |   |-- source_guidance.py    # compact source idiom hints
|       |   |-- taxonomy.py           # source lanes and shared intent vocabulary
|       |   |-- workflow.py           # coverage, correlation, and stop checkpoints
|       |   `-- synthesis.py          # optional incident report synthesis
|       |-- config.py                 # centralized runtime knobs
|       |-- coral_setup.py            # install/test Coral source profiles
|       `-- source_profiles.py        # triage, demo, enterprise profiles
`-- frontend/
    `-- src/
        |-- components/chat/          # chat surface, markdown, evidence trail
        |-- store/                    # conversation/run state
        `-- api/                      # backend client

That structure came from the order of problems I solved.

Slice 1: Prove Coral Can Be The Data Plane

The first backend slice was deliberately small.

I wanted to answer one question:

Can ReefWatch treat Coral as the source of operational truth?

The first proof was:

Launch coral mcp-stdio.
Initialize MCP over JSON-RPC.
Read coral://tables.
Call the sql tool.
Return rows to a plain API endpoint.

At that point, ReefWatch was not an agent yet. It was a thin Coral client.

That was useful because it proved the most important bet: the app could treat
Coral as the data plane instead of building direct SDK integrations for
GitHub, Slack, Sentry, and every other source.

The first reusable module was mcp_client.py.

It owns the boring but essential transport details:

spawn the Coral binary
speak JSON-RPC over stdio
convert MCP tools to OpenAI-compatible function tools
read Coral resources such as coral://guide and coral://tables
surface stderr and decoding errors clearly

Design decision: keep transport boring. Once mcp_client.py worked, the
rest of the app could stop thinking about processes and start thinking about
investigations.

Slice 2: Keep Coral Warm

The naive approach would be:

user asks question -> spawn Coral -> discover schema -> ask model -> run SQL

That is fine for a script.

It feels rough in a product.

So the second slice was coral_session.py. It keeps one Coral process alive,
warms the schema/guide/tool cache, and recreates the process if it dies.

That gave ReefWatch a cleaner runtime shape:

app starts -> warm Coral once -> investigations reuse the session

The session cache stores three things:

the Coral source schema
the Coral guide
the OpenAI-compatible tool definitions

That one decision made the product feel different.

Instead of every user prompt waiting on MCP bootstrapping and catalog discovery,
ReefWatch starts from a warm map of the available sources.

There is still a fallback path. If the process dies, CoralSession can recreate
the client and warm the cache again.

The MCP client reads and writes JSON-RPC over stdio with UTF-8 decoding, drains
stderr on a background thread, and reports useful transport errors rather than
hanging silently.

A production triage agent that randomly waits on its own plumbing is not a production triage agent.

Slice 3: Build Schema Context From Coral, Not Memory

Hardcoding source schemas would defeat the point of using Coral.

The agent must discover what is installed right now.

The temptation was to write a hand-authored prompt like:

GitHub has these tables. Sentry has these tables. Slack uses channel ids.

That would have made ReefWatch brittle and less Coral-native.

This was one of those small choices where the architecture either respects the
tool it is built on, or quietly works around it.

Instead, ReefWatch builds its prompt context from Coral itself.

It reads coral://tables, enriches the result with coral.columns, groups
tables by source, and includes only a compact slice of each source in the
prompt.

SELECT schema_name, table_name, column_name, data_type
FROM coral.columns
ORDER BY schema_name, table_name, ordinal_position

The key idea: the model gets a map, not a maze.

If the source catalog is small, the model sees most of it.

If the catalog is large, the model gets enough to start and can use Coral
discovery tools for the rest.

That keeps the prompt useful without pretending the app has permanent knowledge of every source.

Slice 4: Start With The Smallest Useful Agent Loop

Only after Coral transport and schema context worked did I build the agent loop.

The first version of agent/loop.py had one job:

messages -> LLM tool call -> Coral SQL -> tool result -> final answer

That version was intentionally plain.

It let me see the raw failure modes:

advice instead of action: the model answered with instructions instead of querying
catalog instead of evidence: it listed tables instead of using them
unnecessary clarification: it asked for repo names GitHub auth could reveal
single-lane tunnel vision: it queried one source and claimed it investigated everything
false negatives: it treated a filtered zero-row query as proof of no evidence

Those failures were useful.

They showed which parts belonged in the prompt and which parts deserved
code-level policy.

A bad first agent run is not wasted time if it tells you where the system needs structure.

Slice 5: Add Policy Around The Loop

This was the real turning point.

I stopped trying to make one heroic system prompt do everything.

Instead, I split agent behavior into focused modules:

policy.py decides query budgets and finalization behavior.
guardrails.py handles evidence-first retries and missing-source retries.
coverage.py decides which evidence lanes matter for a request.
workflow.py turns coverage and correlation into small checkpoint prompts.
execution_policy.py skips duplicate/noisy query shapes and catches table/function syntax mistakes before they hit Coral.
context.py compresses conversation history.
synthesis.py decides whether a structured report is appropriate.

This was better than making one giant prompt because each module has a clear
reason to exist and can be tested:

Failure Mode	Layer That Handles It
Agent answers without querying	`guardrails.py`
Agent stops after one source	`coverage.py`
Agent ignores missing evidence lanes	`workflow.py`
Agent skips cross-source correlation	`workflow.py`
Agent repeats query shapes	`execution_policy.py`
Agent loops too long	`policy.py`
Conversation gets too large	`context.py`
Report appears for ordinary questions	`synthesis.py`

The model still has agency. The code does not prescribe exact SQL for a demo
scenario.

The policy layer just keeps the model inside the kind of investigation a
human operator would expect.

Slice 6: Persist Runs Before Building A Fancy Chat

The next slice was persistence.

I started with SQLite because this is a proof-of-concept and local operator
tool, not a multi-tenant SaaS backend.

The important part was not Postgres. The important part was recording:

conversation IDs
user questions
model used
final answer
report payload
every SQL execution
row counts and errors

That made debugging dramatically easier.

When a run looked bad, I could inspect the exact queries and decide whether the
failure was prompt, policy, schema, model, or source setup.

This is also why the frontend can hydrate conversations and show evidence
instead of keeping everything only in Redux memory.

Slice 7: Build The UI Around Evidence

Only then did the chat UI become valuable.

The UI was not designed as "talk to an AI."

It was designed as an investigation workspace:

The input starts the investigation.
The agent streams progress.
SQL queries appear as evidence.
The trail collapses when the final answer arrives.
The final answer renders as Markdown.
Conversations can be revisited because runs are persisted.

That UI decision matters because Coral is visualizable.

The user can see source counts, SQL queries, row counts, and the final
synthesis.

ReefWatch shows the route instead of hiding it behind one polished
paragraph.

Slice 8: Add Source Profiles Last

The last piece was source profiles.

I did not want the default setup to require every possible token. That creates a
bad demo path.

Instead, ReefWatch has profiles:

Profile	Sources	Use Case
`triage`	GitHub, Sentry, Slack, Netlify	lightweight production triage
`demo`	triage + PagerDuty	richer incident response demo
`security`	GitHub, Slack, Notion, OSV	compliance/security route
`enterprise`	demo + Notion + OSV	default hackathon showcase
`observability`	demo + Datadog + StatusGator	deeper ops setup

This keeps the build reproducible.

A reader can start with triage, get a real agent working, then add
Notion/OSV/PagerDuty when they want a stronger story.

The Agent Loop That Made It Work

The main loop is intentionally simple:

Build messages from the user prompt, schema context, Coral guide, shared source taxonomy, and compressed conversation memory.
Ask the LLM for tool calls.
Execute Coral MCP tools.
Record SQL executions.
Stream trace events to the UI.
Apply workflow checkpoints and lightweight execution hygiene.
Stop when evidence is sufficient or the configured budget is reached.
Classify the artifact the current request deserves.
Optionally synthesize a structured incident report.

The important part is not that the loop is complicated.

It is that the loop is surrounded by small pieces of judgment.

Evidence-first guardrail

If the user asks an operational question like "what issues are on my GitHub?"
and the model tries to answer without querying Coral, ReefWatch injects a retry
message:

"You have not queried Coral yet. Do not answer with table recommendations or ask
for repo/org names until you first run metadata/source SQL queries to infer them.""

This fixed the first embarrassing failure mode: the agent giving me instructions
instead of doing the investigation.

Source coverage policy

For production triage, one source is almost never enough.

ReefWatch treats sources as evidence lanes:

Category	Sources
Ops	GitHub, Sentry, Netlify, Slack, PagerDuty, StatusGator
Knowledge	Notion
Security	GitHub, OSV, Notion, Slack
Observability	Datadog

The policy does not say "always query everything."

It checks what is actually installed and what the user asked. If the user asks
specifically about GitHub, the coverage stays GitHub-scoped. If the user asks
for production triage, the agent should cover the available ops lanes before
finalizing.

You have only checked GitHub, but Sentry and Netlify are available, so prefer those lanes next.

That is the kind of judgment I wanted outside the model.

The important refinement: coverage is a guide, not a cage.

If the model just discovered the right Sentry project ID or hit a column error,
it is allowed to inspect Coral metadata and correct that source query before
moving on. That matters because real triage has tiny detours:

find the ID
inspect the columns
fix the table/function shape
then continue the lane plan

Hard-blocking those detours made the agent worse. ReefWatch now nudges the
investigation path without preventing useful schema correction.

The source lane definitions and shared intent vocabulary live in taxonomy.py.

That small file exists for a boring but important reason: coverage, budgets,
and intent classification should not each carry their own slightly different
definition of what "incident" means.

The agent is still dynamic. taxonomy.py does not contain TraceChat queries,
table names for a demo, or source-specific SQL recipes. It only describes the
categories ReefWatch can reason about:

ops evidence
knowledge evidence
security evidence
observability evidence

Coral still discovers the actual tables, functions, filters, and columns at
runtime.

Cross-source correlation checkpoint

This was the final thing I tightened before the demo.

Once multiple evidence lanes return concrete anchors, ReefWatch asks for a
Coral-side correlation query instead of letting the model stitch everything
together in prose.

The preferred shape is:

WITH deploy AS (...),
     errors AS (...),
     notes AS (...)
SELECT ...
FROM deploy
JOIN errors ON ...
LEFT JOIN notes ON ...

or, when the relationship is time-based instead of key-based:

WITH deploy AS (...),
     errors AS (...),
     notes AS (...)
SELECT ...
FROM deploy
CROSS JOIN errors
LEFT JOIN notes ON notes.ts <= errors.first_seen
WHERE errors.first_seen >= deploy.created_at

That checkpoint is still source-agnostic. It does not say "for TraceChat, run
this SQL." It says: if the evidence exposes IDs, URLs, releases, commits,
service names, channel IDs, or timestamps, prove the relationship inside Coral.

If a correlation query fails because of SQL shape, the next instruction is not
"give up." It is:

inspect Coral metadata,
correct the table, function, column, or filter shape,
retry with a smaller join.

This made ReefWatch feel much less like a chatbot and much more like an
investigation workbench.

Decisive evidence, not accidental emptiness

One subtle failure: a model can run a query with a hallucinated timestamp column,
get zero rows, and conclude "Slack had no evidence."

That is bad triage.

ReefWatch treats a filtered zero-row evidence query as not fully decisive until
the model relaxes the filter or inspects the schema.

A broad zero-row data query can satisfy a lane. A narrow zero-row query with
extra WHERE filters cannot automatically close the book.

That small distinction protects against false negatives without hardcoding
Slack or any other source.

Scope discipline

Another failure mode showed up with quiet repositories.

The model would discover the correct repo, then drift into global GitHub
searches anyway.

The fix was not "hardcode this repository"

The fix was a general scope policy.

If ReefWatch has discovered a concrete owner/repo, and the agent keeps
running broad GitHub searches without repo:owner/repo, it nudges the agent
back to scoped checks.

This now lives as workflow guidance rather than a hard execution block. The
point is the same: once a concrete anchor exists, prefer scoped evidence over
another broad search, but still allow a corrective metadata query when the model
needs to fix the route.

Query budgets

The budget is not about limiting Coral.

Coral SQL queries are cheap compared to LLM loops.

The budget is about preventing agent drift and making the product predictable.

ReefWatch uses different budgets by request type:

health checks get a smaller budget
general triage gets a medium budget
incident/root-cause prompts get a larger budget

When the budget is reached, the model must stop querying and produce the best
evidence-backed answer it can, explicitly naming unknowns.

Conversation memory

The UI is conversational, but the product is not trying to become a general chat
companion.

The conversation flow exists for follow-up investigations:

"check the same repo again"
"what about Sentry?"
"show me the deployment angle"
"now make that an incident report"

ReefWatch persists conversations and runs in SQLite.

For the agent prompt, it builds a compact context from recent runs and SQL
executions. If the message history gets too large, ContextWindow compresses
older tool chatter into an execution summary and keeps the latest turns.

That gives the model continuity without stuffing every old row into the prompt.

Intent classification

The first version of ReefWatch used a small keyword policy to decide whether a
run should produce an incident report.

That was useful as a fallback, but it was too blunt for a real conversation.

For example:

What did it find on Slack?

That follow-up might mention "incident chatter" or "deploy errors" in the
answer, but the user did not ask for a new incident report. They asked for a
source-specific explanation.

The fix was a structured intent classifier.

After the evidence answer is drafted, ReefWatch asks a lightweight structured
LLM step to classify the artifact:

answer_only
incident_report
audit_report
follow_up

The prompt is intentionally narrow. It classifies the current user request,
not random words that appear in the answer draft or previous conversation
context.

There are still deterministic policy boundaries:

report_policy=never always disables reports
report_policy=always always enables an incident report
no evidence queries means no report
if the classifier fails, ReefWatch falls back to a conservative heuristic

This is the pattern I ended up liking most: let the model handle semantic
intent, but keep product policy outside the model.

Report synthesis

Not every question deserves a report.

If I ask "are there any open issues on my GitHub?", an incident report would be
the wrong artifact.

If I ask "investigate the production regression," a report is useful.

The intent classifier decides the artifact. Report synthesis only runs when the
mode is incident_report.

The structured synthesizer gets only the findings, SQL summary, and sources
used.

It has to stay grounded in the evidence already collected. If evidence is weak,
it must lower confidence rather than invent a root cause.

The CLI Path

The UI is the best place to watch the investigation unfold.

The CLI is the best place to prove the plumbing works.

That split matters for a production agent. Before I ask the model to connect
GitHub, Sentry, Netlify, Slack, PagerDuty, Notion, and OSV into one answer, I
want a boring setup path that can validate each lane by itself.

ReefWatch exposes that through reefwatch coral:

uv run reefwatch coral doctor
uv run reefwatch coral build
uv run reefwatch coral install-profile
uv run reefwatch coral test-source github
uv run reefwatch coral test-source sentry
uv run reefwatch coral test-source netlify
uv run reefwatch coral test-source slack
uv run reefwatch coral test-source pagerduty
uv run reefwatch coral test-source notion
uv run reefwatch coral sql "SELECT * FROM pagerduty.abilities LIMIT 5"

The important detail is that the CLI does not invent another integration layer.

It uses the same Coral configuration and the same MCP transport that the agent
uses. The difference is intent: the CLI is for setup, validation, and scripted
investigations; the web workspace is for watching evidence appear and reading
the final answer.

For example, a teammate can run:

uv run reefwatch investigate "Investigate the current production issue for tracechat-ledger and tell me what needs attention now." --trace

That gives the project a second interface without splitting the product in two.

Reproducing The Demo

Here is the practical route another developer can follow.

1. Clone Coral and ReefWatch

Build Coral locally and point ReefWatch at the binary:

git clone https://github.com/withcoral/coral.git
cd coral
cargo build

Then configure ReefWatch:

RW_CORAL_EXECUTABLE=../coral/target/debug/coral.exe
RW_CORAL_REPO_PATH=../coral
RW_CORAL_CONFIG_DIR=state/coral
RW_SOURCE_PROFILE=enterprise

2. Choosing a capable LLM

The LLM I went for at the time of making and testing ReefWatch was DeepSeek v4 Pro as it is quite a powerful model for agentic workflows and is very cost efficient for the amount of work it does.

ReefWatch supports multi-modal LLM requests for the different stages, i.e inference, the main agent loop and the synthesis, so depending on your budget and use-case you can customise it!

3. Install the first source set

Start with the sources that give the best incident story without too much setup:

GitHub for code, issues, PRs, workflows
Sentry for runtime errors
Netlify for deployments
Slack for human context
PagerDuty if available

For the security/compliance variant, add:

Notion for runbooks and policies
OSV for vulnerability intelligence

The important UX decision is profiles.

ReefWatch does not force every source into every demo. It has triage, demo,
security, enterprise, and observability profiles so the setup can match
the story.

4. Ask one strong prompt

Use a prompt that gives the agent enough intent but not a scripted path:

Investigate the current production issue for tracechat-ledger and tell me what
needs attention now.

A good run should show:

schema discovery from Coral
GitHub repo/issue resolution
Sentry project and event lookup
Netlify site/deploy lookup
Slack channel/message lookup
optional Notion runbook lookup
an answer that labels lanes as confirmed, checked-empty, partial, blocked, or not-linked
a report only if the incident shape is present

What A Quiet Result Should Look Like

Quiet repos are harder than they look.

A lazy agent says "no issues" after one empty query. A paranoid agent runs 30
searches and still sounds unsure.

The ReefWatch answer I want is calmer:

I did not find an active issue for <repo name>.

GitHub is checked-empty for open issues and PRs on that repository. I did not
find linked deployment/runtime evidence in the installed sources. No incident
report was generated because this looks like a quiet repository check, not an
active production incident.

That is the product philosophy in miniature:

useful
scoped
evidence-backed
not dramatic for no reason

Some Highlights

Coral is not a checkbox

ReefWatch depends on Coral's core strengths:

runtime source discovery
SQL-first querying
source manifests
MCP tool exposure
local execution
cacheable schema/guide/tool metadata
cross-source correlation through common identifiers

The agent does not just "call Coral once."

Coral is the investigation substrate.

The agent is layered

The code is intentionally split:

Layer	Responsibility
MCP adapter	JSON-RPC over Coral stdio, UTF-8 safety, guide/resources/tools
Coral session	Long-lived process and warm cache
Schema model	Compact source/table/column context
Prompt builder	Operating contract and live schema context
Agent loop	LLM/tool loop and execution recording
Policy	Budgets and finalization
Coverage	Evidence lane requirements and source-level completeness
Workflow	Coverage, correlation, correction, and stop checkpoints
Taxonomy	Shared source lanes and investigation vocabulary
Guardrails	Evidence-first and missing-source retries
Context	Conversation compression
Synthesis	Optional structured report
API	Persistence and SSE streaming

The model is guided, not spoon-fed

ReefWatch does not hardcode "for tracechat, query these exact tables."

It gives the model a source-agnostic investigation workflow, then lets Coral's
live catalog expose the actual tables, functions, filters, and source idioms.

Closing The Log

Thanks for reading, if you've reached this part!
My teammate and I built ReefWatch for the Coral Hackathon. The experience taught me so much about building autonomous agents from scratch and shaping ReefWatch into a helpful tool.

The most useful thing Coral gave ReefWatch was not just another integration.

It gave the agent a way to move through operational data with a consistent
mental model:

discover -> inspect -> query -> correlate -> report

That is the difference between a chatbot that knows what tools exist and an
agent that can actually investigate.

ReefWatch is still a proof-of-concept, but the shape feels right: Coral handles
the source layer, ReefWatch handles the investigation behavior, and the UI shows
the route clearly enough that an operator can trust or challenge the answer.

That is the kind of agent I wanted to build.

Not a narrator.

An investigator.

DEV Community: Siddhant Rai