What I Built
I built OpsLens, an autonomous incident response orchestrator that uses Notion MCP as its core data layer.
Here is the problem I was solving: when a production incident fires at 3 AM, the on-call engineer has to do six things at once. Triage the alert. Search for past incidents. Find the runbook. Check recent deployments. Notify the team. Document everything for the postmortem. Every step is manual, scattered across different tools, and easy to mess up when you are running on two hours of sleep.
OpsLens takes the alert, runs five AI agents against it, and writes everything back to Notion. The engineer opens their incident page and finds: severity assessment, related past incidents, applicable runbook steps, a draft postmortem, and a list of who to notify. All in one place, all searchable, all generated in seconds.
But the part I am most proud of is that it is not a one-way pipe. OpsLens watches for human edits in Notion. If you disagree with the AI triage and change the severity from P2 to P0, the system detects that within 30 seconds and re-runs the relevant agents with the updated context. The AI proposes. The human decides. The system adapts.
What it actually does
Alert Ingestion: Accepts real webhook payloads from Prometheus AlertManager, Grafana, PagerDuty, or any custom JSON source. Normalizes them into a canonical format, deduplicates, and groups related alerts into a single incident.
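As a sketch of that normalization step (the field names and schema here are illustrative, not the project's actual code):

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class CanonicalAlert:
    fingerprint: str   # stable hash used for deduplication and grouping
    service: str
    severity: str
    summary: str
    labels: dict = field(default_factory=dict)

def normalize_prometheus(payload: dict) -> list[CanonicalAlert]:
    """Flatten an AlertManager webhook body into canonical alerts.

    The fingerprint is a hash of the sorted identifying labels, so a
    repeated firing of the same alert maps back to the same incident.
    """
    alerts = []
    for raw in payload.get("alerts", []):
        labels = raw.get("labels", {})
        key = "|".join(f"{k}={labels[k]}" for k in sorted(labels))
        alerts.append(CanonicalAlert(
            fingerprint=hashlib.sha256(key.encode()).hexdigest()[:16],
            service=labels.get("service", "unknown"),
            severity=labels.get("severity", "warning"),
            summary=raw.get("annotations", {}).get("summary", ""),
            labels=labels,
        ))
    return alerts
```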
Five AI Agents run in sequence on every new incident:
- Triage Agent - Validates severity, identifies the affected service, assesses blast radius
- Correlation Agent - Searches past incidents, Slack conversations, Google Drive docs, Jira tickets via Notion MCP's connected tool search
- Remediation Agent - Finds applicable runbooks, proposes specific commands and rollback steps
- Comms Agent - Orchestrates notifications and escalation
- Postmortem Agent - Generates a blameless postmortem when the incident resolves
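The sequential pipeline can be sketched like this (agent internals reduced to stubs; the real agents prompt an LLM, and the class and field names are assumptions for illustration):

```python
import asyncio

class Agent:
    name = "base"
    async def run(self, incident: dict, context: dict) -> dict:
        raise NotImplementedError

class TriageAgent(Agent):
    name = "triage"
    async def run(self, incident, context):
        # Stub: the real agent prompts an LLM with the alert payload.
        return {"severity": incident.get("severity", "P3"),
                "service": incident.get("service")}

class CorrelationAgent(Agent):
    name = "correlation"
    async def run(self, incident, context):
        # Later agents see earlier agents' output via the shared context.
        return {"query": f"{context['triage']['service']} past incidents"}

async def run_pipeline(incident: dict, agents: list[Agent]) -> dict:
    """Run agents in order; each receives the accumulated context."""
    context: dict = {}
    for agent in agents:
        context[agent.name] = await agent.run(incident, context)
    return context
```

Running the agents in sequence rather than in parallel is what lets the Correlation Agent search using the service the Triage Agent identified.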
Every agent writes its analysis as a structured comment on the Notion incident page. This is not dumped into a database somewhere. It lives in Notion, searchable, shareable, and visible to everyone.
Incident Commander: A contextual AI co-pilot embedded in the dashboard. During an active incident, you can ask it questions like "What changed recently in this service?" or "Find the runbook for this." It searches Notion, fetches pages, checks past incidents, and comes back with specific answers and clickable action buttons (search, escalate, transition status, notify someone, run a remediation step).
Bi-directional Notion Sync: The Notion Watcher polls active incident pages every 30 seconds. It detects when a human changes severity, updates status, adds a root cause, or writes an escalation comment directly in Notion. When it spots a change, it fires the appropriate callback, re-runs agents, and updates the dashboard via WebSocket.
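A minimal sketch of that detection loop (the watched property names are assumptions; the real watcher also dispatches per-field callbacks):

```python
import asyncio

WATCHED = ("severity", "status", "root_cause")

def diff_properties(old: dict, new: dict, watched=WATCHED) -> dict:
    """Return the watched properties that changed between two polls."""
    return {k: new[k] for k in watched if k in new and new.get(k) != old.get(k)}

async def watch_incident(fetch_page, on_change, interval: float = 30.0):
    """Poll a Notion page and invoke a callback when a human edits it."""
    snapshot = await fetch_page()
    while True:
        await asyncio.sleep(interval)
        current = await fetch_page()
        if changes := diff_properties(snapshot, current):
            await on_change(changes)  # e.g. re-run agents, push WebSocket update
        snapshot = current
```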
Real-Time Dashboard: React frontend with live updates. Incident list with filters, full timeline view, agent activity feed, audit trail, semantic search across Notion, a webhook playground for testing, and settings page for configuring integrations, all connected via WebSocket for instant updates.
Enterprise Integrations: Slack war rooms, GitHub deployment correlation, Jira ticket creation, Linear issue tracking, and outbound webhooks with retry logic.
The architecture in one picture
```
Prometheus / Grafana / PagerDuty
         |
         v  (webhooks)
+------------------+    JSON-RPC 2.0     +------------------+
| OpsLens Backend  | <-----------------> |   Notion MCP     |
|    (FastAPI)     |   Streamable HTTP   |  Server (:3100)  |
|                  |                     +--------+---------+
| - Incident Mgr   |                              |
| - 5 AI Agents    |                              v
| - Notion Watcher |                     +------------------+
| - WebSocket Hub  |                     |      Notion      |
+--------+---------+                     | - Incidents DB   |
         |                               | - Runbooks DB    |
         v                               | - Services DB    |
+------------------+                     | - Postmortems    |
| React Dashboard  |                     | - On-Call DB     |
| - Incident List  |                     +------------------+
| - Commander      |
| - Agent Feed     |
| - Audit Trail    |
+------------------+
```
Video Demo
Show us the code
OpsLens
Autonomous Incident Response Orchestrator powered by Notion MCP
OpsLens transforms Notion into an AI-powered incident command center. It ingests alerts from monitoring tools, runs a pipeline of specialized AI agents for triage, correlation, remediation, and postmortem generation, and writes every finding back to Notion as structured, searchable knowledge. Engineers interact through a real-time dashboard or directly in Notion. The system watches for human edits and reacts, creating a true human-in-the-loop incident response workflow.
Built for the Notion MCP Challenge on DEV.to.
Table of Contents
- Problem Statement
- How It Works
- Architecture
- Use Cases
- Features
- Tech Stack
- Project Structure
- Getting Started
- Configuration
- API Reference
- Webhook Integration
- Slash Commands
- Docker Deployment
- Development
- Author
- License
Problem Statement
When a production incident fires at 3 AM, the on-call engineer faces a wall of context switching: triage the alert, search for past incidents, find the runbook, notify stakeholders, check recent deployments, and document everything for…
The full source is on GitHub. Key files if you want to dive in:
- src/notion_mcp/client.py - The async JSON-RPC 2.0 client that talks to Notion MCP
- src/notion_mcp/tools.py - Typed wrappers around every MCP tool (search, fetch, create, comment)
- src/agents/orchestrator.py - The pipeline that coordinates all five agents
- src/agents/commander.py - The Incident Commander with its agentic tool-use loop
- src/sync/notion_watcher.py - Bi-directional sync that detects human edits in Notion
- src/incidents/manager.py - Incident lifecycle, dedup, grouping, and Notion rehydration
- src/webhooks/normalizer.py - Converts Prometheus/Grafana/PagerDuty payloads to a canonical format
How I Used Notion MCP
Notion MCP is not a side feature in OpsLens. It is the foundation. Every piece of data flows through it. Here is how.
The MCP Client
OpsLens communicates with the Notion MCP server over Streamable HTTP using JSON-RPC 2.0. The client at src/notion_mcp/client.py handles session initialization, request/response parsing, and rate limiting (180 requests/min, 30 searches/min). Every call is async. Every error is retried with exponential backoff via tenacity.
```python
# Simplified view of how OpsLens talks to Notion MCP
async def call_tool(self, tool_name: str, arguments: dict) -> Any:
    payload = {
        "jsonrpc": "2.0",
        "id": 1,  # the real client assigns a unique ID per request
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": arguments,
        },
    }
    response = await self._http.post(self.url, json=payload)
    return self._parse_response(response.json())
```
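The retry policy (which the real client delegates to tenacity) amounts to exponential backoff with jitter. A hand-rolled equivalent, to show what the policy actually does:

```python
import asyncio
import random

async def call_with_backoff(coro_fn, attempts: int = 5, base: float = 0.5, cap: float = 8.0):
    """Retry an async callable with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await coro_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Double the delay each attempt, cap it, and jitter to avoid
            # synchronized retry storms against the MCP server.
            delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)
```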
Six MCP Tools in Active Use
I use six of the Notion MCP tools throughout the system:
notion-search - This is the workhorse. Every agent starts by searching the workspace for relevant context. The Triage Agent searches for service documentation. The Correlation Agent searches for past incidents with similar patterns. The Remediation Agent searches for runbooks. The Incident Commander uses it interactively when the engineer asks a question.
The best part: notion-search searches across connected tools too. When Slack, Google Drive, Jira, or Confluence are connected to the Notion workspace, the search results include matches from all of them. The Correlation Agent does not need separate API integrations for each tool. One MCP search call, and it gets Slack threads about the last time this service broke, the Jira ticket from the previous incident, and the Confluence page with the architecture diagram. That is a massive unlock.
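Assuming each search hit carries a source identifier (a simplification of the real MCP response shape), the Correlation Agent can bucket cross-tool results in a few lines:

```python
def group_by_source(results: list[dict]) -> dict[str, list[dict]]:
    """Bucket notion-search hits by originating tool (Notion, Slack, Jira, ...)."""
    grouped: dict[str, list[dict]] = {}
    for hit in results:
        # Hits without an explicit source are treated as native Notion pages.
        grouped.setdefault(hit.get("source", "notion"), []).append(hit)
    return grouped
```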
create-pages - Every new incident gets a structured Notion page. Properties include incident ID, status, severity, alert source, service name, triggered timestamp, and impact description. The content includes a formatted summary with alert details, labels, and linked URLs.
create-comment - Every agent writes its analysis as a comment on the incident page. This was a deliberate design choice. Comments are timestamped, attributed, and visible in the page history. When you open an incident page in Notion, you see the full conversation: the Triage Agent's severity assessment, the Correlation Agent's findings about similar past incidents, the Remediation Agent's suggested fix. It reads like a collaborative investigation.
list-comments - On startup, OpsLens rehydrates its in-memory state from Notion. It queries the Incidents database, then loads all comments for each incident and parses them back into timeline events. This means you can restart the server and lose nothing. The state lives in Notion.
update-page - When agents update severity or status, the incident page properties are updated. The Command Center also uses this to maintain a living dashboard page in Notion with real-time metrics.
query-data-source - Used during startup rehydration to query the Incidents database and rebuild in-memory state. Also used by the Incident Commander to find past incidents by specific criteria.
Notion as the Single Source of Truth
The key insight that shaped the architecture: if every agent writes to Notion, then Notion becomes the knowledge base automatically. When the Correlation Agent searches for "payment service memory leak," it finds not just manually written docs, but also the AI-generated analyses from previous incidents. The system builds institutional memory over time, and it all lives in a place where everyone on the team can see it, search it, and edit it.
This also enables the bi-directional sync. Because the data is in Notion, humans can interact with it using the Notion UI they already know. Change a property, add a comment, update a status. The Notion Watcher picks it up and the system responds. No special tools needed. Just Notion.
Workspace Setup
OpsLens creates six databases automatically on first run via workspace_setup.py:
- Incidents - Structured incident tracking with status, severity, service, and timeline
- Runbooks - Step-by-step remediation procedures, searchable by service
- Services - Service registry with criticality tiers and ownership
- Postmortems - Blameless post-incident reviews linked to their incidents
- On-Call - Rotation schedules for escalation routing
- Confidence Tracker - Agent confidence scores over time for quality monitoring
The setup is idempotent. Run it twice and it finds the existing databases instead of creating duplicates.
What Notion MCP Unlocked
Without Notion MCP, building this would have required maintaining a separate database, building a custom search index, and creating a UI for humans to interact with the data. Notion MCP eliminated all of that.
The agents write to Notion. Humans read and edit in Notion. The search covers the entire workspace and connected tools. The data is structured, queryable, and shareable. The MCP server handles the API complexity. OpsLens just calls tools and gets results.
That is what made this project possible in the scope of a challenge. The MCP layer turned what would have been months of infrastructure work into a few hundred lines of client code.
Tech Stack: Python 3.11, FastAPI, React 18, Vite, Tailwind CSS, Google Gemini API (primary LLM), Anthropic Claude API (fallback), Notion MCP Server (Streamable HTTP), Docker Compose
GitHub: github.com/Sherin-SEF-AI/OpsLens
