DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

What 100+ Production Incidents Taught Me About System Design

9
Comments 5
5 min read
🚨 How We Rescued a Dead Azure Linux VM After SSH, Agent, and OS Disk All Broke (A Real Production War Story)

5
Comments
3 min read
When AI Writes Your Code, DevOps Becomes the Last Line of Defense

4
Comments
4 min read
The Hidden Cost of Adding Just One More Feature

1
Comments
5 min read
Fixing Prometheus namespace monitoring

Comments 1
2 min read
AWS SRE's First Day with GCP: 7 Surprising Differences

Comments 3
6 min read
Self-Hosting LLMs: Control and Privacy

1
Comments
13 min read
Kubernetes Storage: Trading a Ferrari for a Reliable Minivan.

1
Comments 2
3 min read
Shift-Left Reliability

Comments
4 min read
🔐Raise your hand if you use SSH every day without actually knowing what it does. Yeah, me too😁 you’re definitely not alone.

9
Comments 3
4 min read
I Monitored 10,000 Endpoints for 6 Months — Here's What Broke

1
Comments 1
7 min read
Breaking Things on Purpose: What I Learned from Netflix’s Chaos Monkey

8
Comments 4
2 min read
No More Surprises: Get Notified on Terraform Deprecations

11
Comments 1
3 min read
The Role Confusion: SRE vs Cloud vs Platform Engineer (And Why "DevOps Engineer" Misses the Point)

3
Comments
5 min read
SRE in Action: Understanding How Real Teams Use SLOs, SLIs, and Error Budgets to Stay Reliable Through Case Studies - Part 1

4
Comments
7 min read