Ooi Yee Fei

Sentrix: An AI SRE Copilot That Debates Its Own Scaling Decisions

Every SRE team has the same nightmare: it's 3am, traffic spikes, and nobody predicted it. By the time CloudWatch alerts fire, customers are already frustrated and revenue is lost.

I built Sentrix — an AI-powered SRE copilot that predicts infrastructure problems before they happen and autonomously scales your cloud resources. But what makes it different isn't just prediction — it's debate.

Three Agents, One Decision
Instead of a single AI making decisions, three Claude agents running on Amazon Bedrock argue about every scaling call:

AGENT_SRE fights for reliability: "Scale now, we can't risk downtime."
AGENT_FINANCE pushes back on cost: "That's 5x the replicas — do we really need all of them?"
AGENT_ARBITER synthesizes both: "Scale to 3x now, monitor for 5 minutes, then reassess."
The result is decisions that balance reliability and cost. Every decision is scored 5 minutes later by a Step Functions feedback loop, and the scores become "thought signatures" that feed back into future analysis. Over time, the system learns what works for your specific infrastructure.
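To make the three roles concrete, here is a minimal sketch of the debate as plain Python. It is not Sentrix's implementation: in Sentrix each role is a Claude agent on Bedrock, while here each role is a deterministic stand-in function. The names `Proposal`, `sre_proposal`, `finance_proposal`, `arbitrate`, and `debate`, the 1.5x headroom factor, the midpoint rule, and the cap of 10 replicas are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    agent: str
    replicas: int
    rationale: str


def sre_proposal(current: int, load_ratio: float, headroom: float = 1.5) -> Proposal:
    """Reliability stance: cover the demand plus spare headroom (assumed factor)."""
    replicas = math.ceil(current * load_ratio * headroom)
    return Proposal("AGENT_SRE", replicas, "cover demand with headroom")


def finance_proposal(current: int, load_ratio: float) -> Proposal:
    """Cost stance: just enough replicas to cover the demand, no spare capacity."""
    replicas = math.ceil(current * load_ratio)
    return Proposal("AGENT_FINANCE", replicas, "cover demand only")


def arbitrate(sre: Proposal, finance: Proposal, max_replicas: int = 10) -> Proposal:
    """Arbiter stance (assumed rule): split the difference, rounded up, within a cap."""
    midpoint = math.ceil((sre.replicas + finance.replicas) / 2)
    replicas = min(max(midpoint, 1), max_replicas)
    rationale = f"compromise between {finance.replicas} and {sre.replicas}"
    return Proposal("AGENT_ARBITER", replicas, rationale)


def debate(current: int, load_ratio: float) -> Proposal:
    """Run both advocates, then let the arbiter pick the final replica count."""
    sre = sre_proposal(current, load_ratio)
    finance = finance_proposal(current, load_ratio)
    return arbitrate(sre, finance)


# Demand is 3x what the current 2 replicas can handle.
decision = debate(current=2, load_ratio=3.0)
print(decision.agent, decision.replicas, decision.rationale)
```

With these made-up numbers the SRE stand-in asks for 9 replicas, the Finance stand-in asks for 6, and the arbiter lands on 8. The point of the structure is that neither advocate's number is applied directly; only the arbiter's output reaches the scaler.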

What it looks like in practice
I ran Sentrix through a full incident lifecycle: traffic spike → cost optimization → regional degradation → cascading AWS failure → cross-cloud GCP failover → autonomous recovery.

Watch the demo:

During a 4,536% traffic surge, the brain detected the spike in milliseconds and scaled EKS from 2 to 10 pods, with no human in the loop. When traffic normalized, the Finance agent argued for scaling down, and the system settled back from 10 to 5 pods. When failures cascaded across AWS regions, all three agents agreed on failing over to GCP. The feedback loop scored that decision 100/100.
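The post does not say how the feedback loop computes its 0 to 100 score, so the following is only a sketch of one way such a loop could work. It assumes the score is derived from pod utilization measured some minutes after the decision, with a hypothetical healthy band of 40% to 80%. `ThoughtSignature`, `score_decision`, `record`, and `prompt_context` are illustrative names, and the Step Functions wait itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtSignature:
    decision: str
    score: int  # 0 to 100


def score_decision(utilization_pct: int, low: int = 40, high: int = 80) -> int:
    """Score a past decision from utilization observed after it took effect.

    Inside the assumed healthy band the score is 100. Outside it, every
    percentage point of distance from the band costs 2 points.
    """
    if utilization_pct < low:
        gap = low - utilization_pct    # over-provisioned, paying for idle pods
    elif utilization_pct > high:
        gap = utilization_pct - high   # under-provisioned, risking saturation
    else:
        gap = 0
    return max(0, 100 - 2 * gap)


def record(
    history: list[ThoughtSignature],
    decision: str,
    utilization_pct: int,
    keep: int = 20,
) -> list[ThoughtSignature]:
    """Append a scored decision and keep only the most recent `keep` entries."""
    signature = ThoughtSignature(decision, score_decision(utilization_pct))
    return (history + [signature])[-keep:]


def prompt_context(history: list[ThoughtSignature]) -> str:
    """Render past outcomes as text that could be prepended to a future prompt."""
    return "\n".join(f"- {s.decision}: scored {s.score}/100" for s in history)


# Illustrative inputs only, not measurements from the demo.
history: list[ThoughtSignature] = []
history = record(history, "example scale-up", utilization_pct=65)
history = record(history, "example scale-down", utilization_pct=30)
print(prompt_context(history))
```

Feeding the rendered history into later prompts is what would let past outcomes influence future decisions. Whether Sentrix stores signatures this way is not stated in this post.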

The whole system runs on a single AWS CDK stack — Lambda, Bedrock, EKS, DynamoDB, Step Functions, EventBridge, CloudFront — deployed in 5 minutes.

Full writeup
The full post covers the architecture, severity-based model selection (Haiku for low severity, Sonnet for critical), the thought signature self-evolution mechanism, and a phase-by-phase demo walkthrough with screenshots.
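As a rough illustration of severity-based model selection, the sketch below routes an incident to a model tier. Only the two endpoints come from this post (Haiku for low severity, Sonnet for critical). The `Severity` levels, the `select_model` name, and the choice to send `HIGH` to Sonnet and `MEDIUM` to Haiku are assumptions. The function returns a tier label rather than a real Bedrock model ID.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; the post names only "low" and "critical".
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def select_model(severity: Severity) -> str:
    """Pick a model tier: the cheaper tier for routine events, the stronger one for serious ones."""
    # Assumed threshold: HIGH and above go to Sonnet, everything else to Haiku.
    return "sonnet" if severity >= Severity.HIGH else "haiku"


for level in Severity:
    print(level.name, "->", select_model(level))
```

The appeal of this kind of routing is cost control: most events are low severity, so most calls can go to the cheaper tier, with the more capable model reserved for the decisions that matter most.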

Read the full writeup on Build Signals

I submitted Sentrix to the AWS 10,000 AIdeas competition. The top 300 most-liked articles advance to the next round. If you found this interesting, a like on the article would genuinely help — it takes 2 seconds.

Like the article on AWS Builder Center
