DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Why LeetCode Habits Get Senior Engineers Rejected in Google SRE Coding Rounds

1
Comments
4 min read
Chapter 1 — Thinking About Rollback in Distributed Systems Through Three Worlds (RML-1/2/3)

Comments
6 min read
Most Kubernetes Clusters Are Over-Engineered

Comments 2
4 min read
A 10% traffic spike took down a stable system in 3 minutes and 47 seconds.

3
Comments
3 min read
The Big Tech Reality Check: Why "Senior" Architecture Fails at Global Scale

Comments 1
3 min read
What Actually Happens When You Put an AI Agent on Call

9
Comments 2
3 min read
Background Jobs in Production: The Problems Queues Don't Solve

2
Comments 1
3 min read
Why is Infrastructure-as-Code so important? Hint: It's correctness

Comments
2 min read
Real-World Incident Automation Using GCP: How I Cut MTTR by 80%

1
Comments
7 min read
Why My Server Became Slow: The Case of the SMR Disk

Comments
2 min read
Scaling SRE Systems with GCP + Kubernetes: Lessons from Running at 10x Traffic

1
Comments
5 min read
OpenTelemetry vs Loki - Choosing the Right Observability Tool

1
Comments
13 min read
OpenTelemetry vs Logstash - Which Logging Tool Is Right for You?

1
Comments
9 min read
OpenTelemetry Events vs Logs - Key Differences Explained

1
Comments
15 min read
Your Kubernetes Cluster Shouldn't Need You at 3am

Comments
1 min read