Introduction: The Problem Every Engineer Faces
Every engineering team eventually hits the same painful wall: production incidents.
A service goes down. Alerts fire everywhere. Logs, dashboards, and notifications start flooding in.
And suddenly, engineers are doing this:
- Opening Sentry to check errors
- Jumping to Datadog for metrics
- Searching GitHub for recent deployments
- Scrolling Slack for "what changed?" messages
Each tool holds a piece of the truth, but none of them connect the dots.
The real problem is not lack of data.
It is fragmentation.
Root cause analysis becomes a manual, stressful, time-consuming process that can easily take 30 to 60 minutes per incident.
I wanted to fix that.
The Idea: What if AI Could Do RCA in Seconds?
The core idea behind RootLens is simple:
What if we could automatically connect all engineering signals and identify the root cause of an incident instantly?
Instead of engineers manually correlating data, an AI system should:
- Detect recent deployments
- Match them with error spikes
- Correlate with infrastructure metrics
- Read incident discussions
- Produce a final root cause report
That is how RootLens was born.
What is RootLens?
RootLens is an AI-powered root cause analysis agent that automatically identifies the most likely cause of production incidents.
It connects:
- GitHub → Pull requests & commits
- Sentry → Errors & stack traces
- Datadog → System metrics
- Slack → Incident conversations
And produces a complete incident breakdown in under 10 seconds.
Architecture: How RootLens Works
At a high level, RootLens follows this pipeline:
Incident Triggered
↓
RootLens AI Agent
↓
Coral SQL Layer
↓
GitHub | Sentry | Datadog | Slack
↓
Cross-Source JOIN Query
↓
AI Analysis (LLM)
↓
Root Cause Report + Dashboard
The most important component in this system is Coral.
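The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the RootLens source: `run_pipeline`, `fake_run_query`, and `fake_analyze` are hypothetical names, and the query layer and the LLM are passed in as plain callables so the sketch runs without any external service.

```python
from typing import Any, Callable, Mapping, Optional

Row = Mapping[str, Any]


def run_pipeline(
    run_query: Callable[[str], Optional[Row]],
    analyze: Callable[[Row], str],
    sql: str,
) -> dict:
    """Incident trigger -> cross-source query -> LLM analysis -> report."""
    row = run_query(sql)
    if row is None:
        # No deployment matched the incident window, so there is nothing to analyze.
        return {"status": "no_candidate", "summary": None, "evidence": None}
    return {"status": "ok", "summary": analyze(row), "evidence": dict(row)}


# Stand-ins (assumptions) for the query layer and the LLM call.
def fake_run_query(sql: str) -> Optional[Row]:
    return {"pr_title": "Redis config change", "minutes_to_failure": 4}


def fake_analyze(row: Row) -> str:
    return (
        f"Likely cause: PR '{row['pr_title']}', merged "
        f"{row['minutes_to_failure']} min before the first fatal error"
    )


report = run_pipeline(fake_run_query, fake_analyze, "SELECT ...")
print(report["summary"])
```

Keeping the query and the analysis behind callables means either stage can be swapped or mocked in tests without touching the orchestration.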
The Core Innovation: Coral SQL Layer
Without Coral, building this system would require:
- Writing four separate API integrations
- Handling authentication for each tool
- Managing pagination and rate limits
- Normalizing inconsistent schemas
- Writing custom logic to join data
This is weeks of engineering effort.
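To give a feel for just one item on that list, schema normalization, here is a rough Python sketch of the glue code a hand-rolled integration would need. The payload shapes are simplified and illustrative (assumptions, not a guaranteed match for each vendor's API), and the `Event` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping


@dataclass(frozen=True)
class Event:
    source: str
    kind: str
    timestamp: datetime
    summary: str


def parse_iso(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z", so rewrite it as an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def from_pull_request(pr: Mapping[str, Any]) -> Event:
    return Event("github", "deploy", parse_iso(pr["merged_at"]), pr["title"])


def from_error(issue: Mapping[str, Any]) -> Event:
    return Event("sentry", "error", parse_iso(issue["firstSeen"]), issue["title"])


def from_chat(msg: Mapping[str, Any]) -> Event:
    # Assumes the chat timestamp is a string of epoch seconds.
    when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
    return Event("slack", "message", when, msg["text"])


# Invented sample payloads, one per source.
events = sorted(
    [
        from_chat({"ts": "1714557960.000200", "text": "checkout is down?"}),
        from_error({"firstSeen": "2024-05-01T10:04:00Z", "title": "Redis timeout"}),
        from_pull_request({"merged_at": "2024-05-01T10:00:00Z", "title": "Redis config change"}),
    ],
    key=lambda e: e.timestamp,
)
print([e.source for e in events])  # prints ['github', 'sentry', 'slack']
```

Three timestamp formats and three field-naming styles for three sources, before any authentication, pagination, or join logic is written.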
With Coral, everything changes.
We use a single SQL query across all systems.
Example: Root Cause Query
Here is the core query powering RootLens:
SELECT
    g.title AS pr_title,
    g.author AS pr_author,
    g.merged_at AS deploy_time,
    s.error_message AS first_error,
    s.first_seen AS error_start,
    DATEDIFF('minute', g.merged_at, s.first_seen) AS minutes_to_failure,
    d.cpu_spike AS cpu_at_incident,
    d.error_rate AS error_rate_percent,
    sl.text AS team_discussion,
    sl.author AS who_responded
FROM github.pull_requests g
JOIN sentry.issues s
    ON s.first_seen BETWEEN g.merged_at AND DATEADD('hour', 1, g.merged_at)
    AND s.level = 'fatal'
JOIN datadog.metrics d
    ON d.timestamp BETWEEN g.merged_at AND s.first_seen
    AND d.service = g.repository
JOIN slack.messages sl
    ON sl.channel = '#incidents'
    AND sl.timestamp >= s.first_seen
    AND sl.timestamp <= DATEADD('hour', 2, s.first_seen)
WHERE g.merged_at >= DATEADD('hour', -2, NOW())
ORDER BY minutes_to_failure ASC
LIMIT 1;
This single query:
- Finds recent deployments
- Correlates them with fatal errors
- Matches system metric spikes
- Pulls incident conversation context
- Ranks the most likely root cause
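The deploy-to-error part of that correlation can be approximated in plain Python to make the ranking logic concrete. This is a sketch under assumptions: the dictionary keys mirror the query's column names, the sample data is invented, and `minutes_to_failure` is computed as elapsed whole minutes, which may differ from how a given SQL dialect evaluates `DATEDIFF`.

```python
from datetime import datetime, timedelta


def rank_candidates(deploys, errors, window=timedelta(hours=1)):
    """Pair each deploy with fatal errors first seen within `window` after it."""
    candidates = []
    for deploy in deploys:
        for error in errors:
            if error["level"] != "fatal":
                continue
            delay = error["first_seen"] - deploy["merged_at"]
            if timedelta(0) <= delay <= window:
                candidates.append(
                    {
                        "pr_title": deploy["title"],
                        "first_error": error["error_message"],
                        "minutes_to_failure": int(delay.total_seconds() // 60),
                    }
                )
    # Same idea as ORDER BY minutes_to_failure ASC: the closest deploy ranks first.
    return sorted(candidates, key=lambda c: c["minutes_to_failure"])


t0 = datetime(2024, 5, 1, 10, 0)
deploys = [
    {"title": "Update README", "merged_at": t0 - timedelta(minutes=50)},
    {"title": "Redis config change", "merged_at": t0},
]
errors = [
    {"level": "warning", "error_message": "Slow query", "first_seen": t0 + timedelta(minutes=1)},
    {"level": "fatal", "error_message": "Redis connection timeout", "first_seen": t0 + timedelta(minutes=4)},
]

ranked = rank_candidates(deploys, errors)
print(ranked[0]["pr_title"], ranked[0]["minutes_to_failure"])  # prints: Redis config change 4
```

Both sample deploys fall inside the one-hour window, so it is the ordering by time-to-failure that separates them. Proximity in time is a heuristic for the likely culprit, not proof of causation.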
How Coral Makes This Possible
Coral acts as a cross-source query engine.
It handles:
- Authentication across tools
- Schema mapping between systems
- Pagination automatically
- Cross-source JOIN execution
- Returning clean structured data
Instead of raw API noise, the AI receives ready-to-analyze structured context.
This is critical: without structured data, an LLM would struggle to correlate signals reliably.
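As an illustration of what "ready-to-analyze structured context" might look like on the way into the model, the sketch below turns one result row into a compact prompt. The `build_prompt` function, the field labels, and the prompt wording are assumptions made for this example; only the column names come from the query above.

```python
from typing import Any, Mapping

# (column name from the query, human-readable label for the prompt)
FIELDS = [
    ("pr_title", "Suspect PR"),
    ("pr_author", "PR author"),
    ("minutes_to_failure", "Minutes from merge to first fatal error"),
    ("first_error", "First error"),
    ("cpu_at_incident", "CPU at incident"),
    ("error_rate_percent", "Error rate (%)"),
    ("team_discussion", "Team discussion"),
]


def build_prompt(row: Mapping[str, Any]) -> str:
    lines = [
        "You are assisting with incident root cause analysis.",
        "Evidence from the joined query:",
    ]
    for key, label in FIELDS:
        value = row.get(key)
        # Mark gaps explicitly so the model does not have to guess at missing data.
        lines.append(f"- {label}: {value if value is not None else 'unknown'}")
    lines.append("State the most likely root cause and how confident you are.")
    return "\n".join(lines)


prompt = build_prompt(
    {
        "pr_title": "Redis config change",
        "minutes_to_failure": 4,
        "first_error": "Redis connection timeout",
    }
)
print(prompt)
```

Each signal arrives as a labeled fact rather than a raw API payload, which keeps the prompt short and makes missing evidence visible.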
Demo Flow: What Happens in Real Time
1. A PR is merged (e.g., a Redis config change)
2. The system starts failing
3. Sentry reports fatal errors
4. Datadog shows a CPU spike
5. The Slack channel lights up with alerts
6. RootLens runs the Coral query
7. The AI analyzes the result
8. A root cause report is generated
Output includes:
- Guilty PR
- First error trace
- System metrics spike
- Slack discussion context
- Confidence score
All in under 10 seconds.
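One possible shape for that output, sketched as a Python dataclass. The field names follow the list above, but the class itself, the 0-to-1 confidence range, and the sample values are assumptions for illustration; how RootLens actually computes its confidence score is not covered here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RootCauseReport:
    guilty_pr: str
    first_error_trace: str
    metrics_spike: str
    slack_context: str
    confidence: float  # assumed to be a score between 0.0 and 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Invented sample values, loosely based on the Redis demo scenario.
sample_report = RootCauseReport(
    guilty_pr="Redis config change",
    first_error_trace="Redis connection timeout",
    metrics_spike="CPU spike shortly after merge",
    slack_context="#incidents: 'checkout is down?'",
    confidence=0.9,
)
print(sample_report.to_json())
```

Validating the score at construction time means a malformed model response fails loudly instead of reaching the dashboard.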
Impact: Before vs After RootLens
| Metric | Before | After |
| --- | --- | --- |
| Time to root cause | 30-60 min | < 10 sec |
| Tools opened | 4-6 | 0 |
| Context switching | High | None |
| Postmortem writing | Manual | Auto-generated |
| Engineer stress | High | Low |
Key Learnings
Building RootLens taught me:
- Observability data is powerful but fragmented
Each tool holds critical context, but none of them talk to each other.
- Correlation is harder than detection
Detecting errors is easy. Linking them to deployments is the real challenge.
- AI is only as good as its context
Structured, joined data dramatically improves LLM reasoning.
- Unified query layers change everything
Coral transforms multi-system complexity into a single query interface.
Final Thoughts
RootLens is not just an AI tool.
It is a shift in how we think about debugging production systems.
Instead of manually hunting for root causes, we can now ask:
"What broke and why?"
And get a precise answer in seconds.
That is the future of incident analysis.
Built for
Pirates of the Coral-bean Hackathon
Track: Enterprise Agent
Powered by Coral SQL