DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The Worlds of Distributed Systems — Align Your Team’s Mental Model

5 min read
Chapter 1 — Thinking About Rollback in Distributed Systems Through Three Worlds (RML-1/2/3)

6 min read
Incident Management Best Practices for SRE

6 min read
The Real Reason AI Agents “Work” in Software

6 min read
Building Reliable Software: Planning for Things to Break

8 min read
Chapter 2: Infrastructure as Code

8 min read
Why My Server Became Slow: The Case of the SMR Disk

2 min read
Your Kubernetes Cluster Shouldn't Need You at 3am

1 min read
Your Traces Look Fine. Your Revenue Isn’t.

2 min read
SLIs, SLOs, SLAs: The Guide to SRE’s Secret Sauce

3 min read
Stop Checking Uptime. Start Checking What Your Users Actually See.

2 min read
I Audited 40 Monitoring Setups. Here Are the 3 Blind Spots That Existed in All of Them

2 min read
Your APM Is Lying to You: 5 Silent Errors Killing Your Uptime Right Now

2 min read
3 Signs Your Monitoring Is Lying to You (Data From Hundreds of Endpoints)

3 min read
I Monitored 10,000 Endpoints for 6 Months — Here's What Broke

7 min read