DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

AI Agents in Production: The Future of SRE and DevOps

4
Comments 1
3 min read
The Silent Process

1
Comments
3 min read
When Everything Is On Fire: Incident Communication That Engineers (and Users) Can Trust

Comments
5 min read
Circuit Breakers for LLM APIs: Applying SRE Patterns to AI Infrastructure

Comments
6 min read
The Worlds of Distributed Systems — Align Your Team’s Mental Model

Comments
5 min read
Why LeetCode Habits Get Senior Engineers Rejected in Google SRE Coding Rounds

1
Comments
4 min read
Chapter 1 — Thinking About Rollback in Distributed Systems Through Three Worlds (RML-1/2/3)

Comments
6 min read
The Real Reason AI Agents “Work” in Software

Comments
6 min read
Multi-Cloud Cascading Failure Risks: Why Active-Active is a Trap

1
Comments
4 min read
Why is Infrastructure-as-Code so important? Hint: It's correctness

Comments
2 min read
Why My Server Became Slow: The Case of the SMR Disk

Comments
2 min read
OpenTelemetry vs Loki - Choosing the Right Observability Tool

1
Comments
13 min read
OpenTelemetry vs Logstash - Which Logging Tool Is Right for You?

1
Comments
9 min read
OpenTelemetry Events vs Logs - Key Differences Explained

1
Comments
15 min read
Your Kubernetes Cluster Shouldn't Need You at 3am

Comments
1 min read