Silent Triage: The Zero-Chat War Room Protocol (Algolia Agent Studio Challenge)

"Downtime costs an average of $9,000 per minute. In a 3 AM production crisis, you don't need a conversation. You need a deterministic decision."


💡 The Inspiration: Beyond the Chatbot

Every SRE knows alert fatigue. When a P0 incident hits, every second counts. Traditional AI assistants are too conversational: they waste time on greetings and "How can I help you today?".

I built Silent Triage to implement the Zero-Chat Protocol: a high-speed, RAG-driven interface that turns raw, messy logs into actionable fixes without the small talk.


🛡️ What is Silent Triage?

Silent Triage is a specialized Incident Response agent that acts as a digital first responder.

✅ Analyzes messy logs: Paste raw stack traces or alerts directly.

✅ Classifies Severity: Automatically rates each issue on a scale from P0 (Critical) to P3 (Minor).

✅ Retrieves Ground Truth: Uses Algolia to search for past incident resolutions.

✅ Prescribes Action: Delivers a structured JSON-based remediation plan.


🎥 Watch the 90-second Demo

See how the agent identifies a Database Connection Timeout and suggests a fix based on historical data.


🧩 Technical Architecture: The "Ground Truth" Engine

The system is built on a high-performance RAG (Retrieval-Augmented Generation) stack using Algolia Agent Studio.


1️⃣ The Memory: Algolia Search Index

I populated an Algolia index named incident_history with structured data from past production failures. This ensures the agent's logic is grounded in reality, not hallucinations.

The Knowledge Base (incident_history.json):

[
  {
    "objectID": "inc-001",
    "title": "Database Connection Timeout - Production",
    "description": "Error: SequelizeConnectionError: connect ETIMEDOUT. The database cluster is not responding to heartbeat checks.",
    "severity": "P0",
    "cause": "Database connection pool exhausted due to unclosed connections.",
    "action": "Restart the primary database node and increase the max_connections limit in RDS.",
    "tags": ["database", "timeout", "critical", "backend"]
  },
  {
    "objectID": "inc-002",
    "title": "Slow Page Loads - Frontend Assets",
    "description": "Users reporting 5+ seconds to load the dashboard. Static assets (JS/CSS) are taking too long to download.",
    "severity": "P2",
    "cause": "CDN cache invalidation failed after the last deploy.",
    "action": "Purge CloudFront cache and verify S3 bucket permissions.",
    "tags": ["frontend", "performance", "cdn"]
  },
  {
    "objectID": "inc-003",
    "title": "Failed User Registration - API 500",
    "description": "POST /api/v1/register returning 500 Internal Server Error. Log: null pointer exception at UserService.java:45.",
    "severity": "P1",
    "cause": "Missing validation for null email addresses in the legacy registration flow.",
    "action": "Rollback to the previous stable build (v1.2.4) and add null-check in the UserService.",
    "tags": ["api", "java", "500-error", "auth"]
  },
  {
    "objectID": "inc-004",
    "title": "Broken Images in Product Catalog",
    "description": "Images not rendering in the mobile app. Getting 403 Forbidden when fetching from the media server.",
    "severity": "P3",
    "cause": "Expired SSL certificate on the media subdomain.",
    "action": "Renew the Let's Encrypt certificate via Certbot.",
    "tags": ["images", "ssl", "minor"]
  }
]

Algolia incident_history index dashboard showing 4 production incidents with severity levels and tags
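Because every answer is grounded in these records, it can help to check their shape before indexing them. The following is a minimal sketch, not the project's actual code: the names `IncidentRecord` and `validateIncident` are hypothetical, and the rules (non-empty text fields, a severity between P0 and P3, string tags) are assumptions inferred from the sample data above.

```typescript
// Illustrative types mirroring the fields shown in incident_history.json.
// IncidentRecord and validateIncident are hypothetical names.
type Severity = "P0" | "P1" | "P2" | "P3";

interface IncidentRecord {
  objectID: string;
  title: string;
  description: string;
  severity: Severity;
  cause: string;
  action: string;
  tags: string[];
}

const SEVERITIES: readonly string[] = ["P0", "P1", "P2", "P3"];
const TEXT_FIELDS = ["objectID", "title", "description", "cause", "action"];

// Returns a list of problems; an empty list means the record is well-formed.
function validateIncident(record: unknown): string[] {
  if (typeof record !== "object" || record === null) {
    return ["record must be an object"];
  }
  const fields = record as Record<string, unknown>;
  const problems: string[] = [];

  for (const name of TEXT_FIELDS) {
    const value = fields[name];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${name} must be a non-empty string`);
    }
  }
  const severity = fields["severity"];
  if (typeof severity !== "string" || !SEVERITIES.includes(severity)) {
    problems.push("severity must be one of P0, P1, P2, P3");
  }
  const tags = fields["tags"];
  if (!Array.isArray(tags) || !tags.every((tag) => typeof tag === "string")) {
    problems.push("tags must be an array of strings");
  }
  return problems;
}

// Record inc-004 from the knowledge base above.
const sample: IncidentRecord = {
  objectID: "inc-004",
  title: "Broken Images in Product Catalog",
  description:
    "Images not rendering in the mobile app. Getting 403 Forbidden when fetching from the media server.",
  severity: "P3",
  cause: "Expired SSL certificate on the media subdomain.",
  action: "Renew the Let's Encrypt certificate via Certbot.",
  tags: ["images", "ssl", "minor"],
};

console.log(validateIncident(sample)); // []
```

Returning a list of problems instead of throwing makes it easy to report every issue in a batch of records at once.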


2️⃣ The Brain: Algolia Agent Studio

I used Agent Studio to orchestrate the intelligence layer. By connecting the incident_history index as a Search Tool, the agent "researches" historical data before formulating a response.

Agent Configuration & System Prompt:

Role: Professional SRE & DevOps Incident Triage Expert.
Objective: Analyze the user's error/incident report, SEARCH the 'incident_history' index for context, and provide a structured JSON decision.

Context: You have access to a database of past incidents via the Algolia Search Tool. Use it to find similar patterns.

Instructions:
1. Analyze the user's input.
2. Search the index for similar past issues.
3. Classify severity and recommend actions based on search results.
4. Output ONLY valid JSON. Do not use Markdown formatting.

Algolia Agent Studio configuration screen showing system prompt and search tool integration


🛠️ The "Silent" Protocol: Structured Output

To maintain the "Zero-Chat" standard, I engineered a strict JSON schema. This allows the frontend to render the solution in a tactical HUD (Heads-Up Display) immediately.

Response Schema:

{
  "severity": "P0" | "P1" | "P2" | "P3",
  "probable_cause": "Brief technical explanation (max 1 sentence)",
  "recommended_action": "Concrete steps to fix or mitigate",
  "related_incident_ids": ["List of objectIDs found"],
  "confidence_score": Number (0-100),
  "language": "es" | "en"
}
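On the client side, a schema like this is only useful if the reply is checked before it is rendered. Below is a minimal sketch of such a check, assuming the agent returns its answer as a single string. `TriageResult`, `stripFence`, and `parseTriageResult` are hypothetical names, not part of Agent Studio, and stripping a stray Markdown fence is a defensive assumption in case the model ignores the "no Markdown" instruction.

```typescript
// Hypothetical client-side mirror of the response schema above.
const SEVERITIES = ["P0", "P1", "P2", "P3"] as const;
const LANGUAGES = ["es", "en"] as const;
type Severity = (typeof SEVERITIES)[number];
type Language = (typeof LANGUAGES)[number];

interface TriageResult {
  severity: Severity;
  probable_cause: string;
  recommended_action: string;
  related_incident_ids: string[];
  confidence_score: number; // 0-100
  language: Language;
}

const FENCE = "`".repeat(3);

// Removes a surrounding Markdown code fence if the model added one anyway.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, -FENCE.length);
  }
  return text.trim();
}

function isSeverity(value: unknown): value is Severity {
  return typeof value === "string" && (SEVERITIES as readonly string[]).includes(value);
}

function isLanguage(value: unknown): value is Language {
  return typeof value === "string" && (LANGUAGES as readonly string[]).includes(value);
}

// Parses the raw reply; throws if it is not valid JSON or violates the schema.
function parseTriageResult(raw: string): TriageResult {
  const data: unknown = JSON.parse(stripFence(raw));
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("reply is not a JSON object");
  }
  const d = data as Record<string, unknown>;

  const severity = d["severity"];
  if (!isSeverity(severity)) throw new Error("invalid severity");

  const cause = d["probable_cause"];
  const action = d["recommended_action"];
  if (typeof cause !== "string" || typeof action !== "string") {
    throw new Error("probable_cause and recommended_action must be strings");
  }

  const ids = d["related_incident_ids"];
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === "string")) {
    throw new Error("related_incident_ids must be an array of strings");
  }

  const score = d["confidence_score"];
  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) {
    throw new Error("confidence_score must be a number between 0 and 100");
  }

  const language = d["language"];
  if (!isLanguage(language)) throw new Error("invalid language");

  return {
    severity,
    probable_cause: cause,
    recommended_action: action,
    related_incident_ids: ids,
    confidence_score: score,
    language,
  };
}
```

Failing loudly on a malformed reply keeps the HUD from showing a half-parsed remediation plan during an incident.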

🏗️ Frontend Architecture: React + Custom Hooks

The UI is built with React following a clean architecture pattern:

src/
├── components/     # Reusable UI components
├── hooks/          # Custom React hooks for state management
├── services/       # API integration layer (Algolia Agent Studio)
└── utils/          # Helper functions and parsers

VS Code project structure showing React components, hooks, services, and utils folders

Key technical decisions:

✅ Custom hooks for agent communication

✅ Service layer abstraction for API calls

✅ Component-based architecture for maintainability

✅ Dark theme optimized for high-stress environments


🚀 Why This Wins: Reliability Over Hallucination

Most AI agents "guess" when they encounter an error. Silent Triage is different. By grounding the model with Algolia's Search Tool, the agent retrieves actual historical context.

✅ Tactical HUD: A React-based interface designed for dark "War Room" environments.

✅ Telemetry Extraction: Automatically identifies IPs and endpoints from raw text.

✅ Confidence Score: Transparent scoring based on how well the input matches historical records.
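The post does not show how telemetry extraction is implemented. One simple way to approach it is with regular expressions. The sketch below is an assumption for illustration only: `extractTelemetry`, the `Telemetry` shape, and both patterns are hypothetical, and they cover only IPv4 addresses and "METHOD /path" style endpoints.

```typescript
// Hypothetical shape for telemetry pulled out of a pasted log.
interface Telemetry {
  ips: string[];
  endpoints: string[];
}

// Dotted quads; each octet is range-checked after matching.
const IPV4_PATTERN = /\b(?:\d{1,3}\.){3}\d{1,3}\b/g;
// An HTTP method followed by a path, e.g. "POST /api/v1/register".
const ENDPOINT_PATTERN = /\b(?:GET|POST|PUT|PATCH|DELETE)\s+(\/[^\s"']*)/g;

// Collects unique IPv4 addresses and endpoint paths in order of appearance.
function extractTelemetry(log: string): Telemetry {
  const ips = new Set<string>();
  for (const candidate of log.match(IPV4_PATTERN) ?? []) {
    const octetsInRange = candidate.split(".").every((octet) => Number(octet) <= 255);
    if (octetsInRange) ips.add(candidate);
  }

  const endpoints = new Set<string>();
  for (const match of log.matchAll(ENDPOINT_PATTERN)) {
    const path = match[1];
    if (path) endpoints.add(path);
  }

  return { ips: Array.from(ips), endpoints: Array.from(endpoints) };
}

const telemetry = extractTelemetry(
  "POST /api/v1/register returning 500 from 10.0.3.17; retry from 10.0.3.17 failed. GET /health ok"
);
console.log(telemetry.ips);       // [ '10.0.3.17' ]
console.log(telemetry.endpoints); // [ '/api/v1/register', '/health' ]
```

Version strings such as v1.2.4 are not mistaken for addresses because the pattern requires four numeric groups that start at a word boundary.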

Silent Triage app interface showing a P0 critical alert analysis with severity classification and recommended actions


📋 Post-Incident Automation

Beyond triage, Silent Triage automates the post-mortem workflow:

One-Click PDF Report Generation

After analyzing an incident, the system generates a professional PDF Audit Report using jspdf:

✅ Incident Summary: Severity, timestamp, and confidence score

✅ Root Cause Analysis: Grounded in historical data

✅ Recommended Actions: Step-by-step remediation plan

✅ Related Incidents: References to past similar cases

This eliminates the manual copy-paste process that wastes critical minutes during P0 events.

Jira-Ready Format

The analysis can be exported in Jira wiki markup (a Textile-like syntax), allowing instant ticket creation:

h2. [P0] Database Connection Timeout - Production

*Probable Cause:* Database connection pool exhausted due to unclosed connections.

*Recommended Action:* 
# Restart the primary database node
# Increase max_connections limit in RDS
# Review connection pooling configuration

*Related Incidents:* INC-001
*Confidence:* 92%

This bridges the gap between AI-powered triage and enterprise ticketing systems.
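A minimal sketch of how an export like the one above could be produced from a triage result. The `TriageSummary` shape and `toJiraMarkup` function are hypothetical, and two details are assumptions based on the sample: the remediation plan arrives as a list of steps, and incident IDs are upper-cased for display.

```typescript
// Hypothetical input shape; the field names are illustrative only.
interface TriageSummary {
  title: string;
  severity: "P0" | "P1" | "P2" | "P3";
  probableCause: string;
  steps: string[];
  relatedIncidentIds: string[];
  confidenceScore: number; // 0-100
}

// Renders the summary in Jira wiki markup:
// "h2." is a heading, "*text*" is bold, and "#" starts a numbered list item.
function toJiraMarkup(summary: TriageSummary): string {
  const related = summary.relatedIncidentIds.map((id) => id.toUpperCase()).join(", ");
  const lines = [
    `h2. [${summary.severity}] ${summary.title}`,
    "",
    `*Probable Cause:* ${summary.probableCause}`,
    "",
    "*Recommended Action:*",
    ...summary.steps.map((step) => `# ${step}`),
    "",
    `*Related Incidents:* ${related}`,
    `*Confidence:* ${Math.round(summary.confidenceScore)}%`,
  ];
  return lines.join("\n");
}

const ticket = toJiraMarkup({
  title: "Database Connection Timeout - Production",
  severity: "P0",
  probableCause: "Database connection pool exhausted due to unclosed connections.",
  steps: [
    "Restart the primary database node",
    "Increase max_connections limit in RDS",
  ],
  relatedIncidentIds: ["inc-001"],
  confidenceScore: 92,
});
console.log(ticket);
```

Building the output as an array of lines keeps the blank lines that Jira needs between paragraphs explicit and easy to adjust.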



🔗 Project Links

🌐 Live Demo: silent-triage-hackathon.vercel.app

💻 Source Code: GitHub Repository

👤 Developed by Sherman95 for the Algolia Agent Studio Challenge.

