DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Chapter 2 — RML-1 (Closed World): Build a Room Where Failure Is Safe
7 min read
Building Reliable Software: The Trap of Convenience
7 min read
Google's 4 Golden Signals
5 min read
When Systems Fail, Trust Is the Real Incident: A Practical Guide to Communication for Engineers and Founders
5 min read
How to Get IP, ASN, and Network Information with curl (No API Key Required)
2 min read
The Silent Process
3 min read
Backpressure, Buffers, and Logging Sidecars
5 min read
When Everything Is On Fire: Incident Communication That Engineers (and Users) Can Trust
5 min read
Circuit Breakers for LLM APIs: Applying SRE Patterns to AI Infrastructure
6 min read
The Worlds of Distributed Systems — Align Your Team’s Mental Model
5 min read
Chapter 1 — Thinking About Rollback in Distributed Systems Through Three Worlds (RML-1/2/3)
6 min read
The Real Reason AI Agents “Work” in Software
6 min read
Why is Infrastructure-as-Code so important? Hint: It's correctness
2 min read
Why My Server Became Slow: The Case of the SMR Disk
2 min read
I Monitored 10,000 Endpoints for 6 Months — Here's What Broke
7 min read