DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Disaster Recovery Drills That Actually Work

3 min read
Go Circuit Breakers That Fail Friendly: The 94% Cascade Prevention We Measured

13 min read
How to Compute Zero Trust Effectiveness: Four Metrics That Survive a Breach

5 min read
MCP in Production Reality vs the Spec

3 min read
RAG vs MCP is the wrong debate — here's the right framing for production AI systems

4 min read
“But it worked on my machine.”

1 min read
How I Created a DDoS Protection Engine

11 min read
How a Simple Python Validator Prevents Config Outages

3 min read
AI agents don’t need more autonomy. They need route, boundary, and receipt.

3
Comments
3 min read
I built a reference site for the recurring hard parts of software work

2 min read
CloudFormation in Production: What Breaks and How to Fix It

1
Comments
11 min read
AI Ops Agents Are a New Class of Attack Surface

7 min read
Failure Semantics in Distributed Financial Systems: What Does “Failure” Actually Mean?

4 min read
Service Level Objectives for Complex Microservices

3 min read