DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Zero to Platform: How GKE Autopilot and Google Cloud Redefine Modern SRE and Platform Engineering

7 min read
How I Built an AI Agent That Fixes Production Errors Using Memory — And Why Memory Changes Everything

6 min read
The Economics of Reliability: When to Invest, When to Accept Risk

2 min read
Engineering Design Document: Reusable Observability Platform V2

19 min read
Why Your Status Page Should Be Boring

2 min read
Deployment Isn't the Hard Part. Recovery Is.

3 min read
Hidden Coupling in Distributed Financial Systems: Dependencies You Didn't Know You Had

9 min read
Building Trust with Product Teams as an SRE

2 min read
Moving From Manual Runbooks to Autonomous Root-Cause Analysis

3 min read
What Documentation Looks Like in a Permanently Operated System

2 min read
DNS Monitoring vs. Uptime Monitoring: Why You Need Both

11 min read
Incident Command: The Skills They Don't Teach You

2 min read
Cross-Region Replication Is Not Resilience

6 min read
What Building Software That Runs 24/7 Actually Means Day to Day

2 min read
Drumbeats vs Hyperping: An Honest 2026 Comparison

15 min read