DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Reliability Is a Socio-Technical Problem

1
Comments
11 min read
When Asynchronous Systems Fail Quietly, Reliability Teams Pay the Price

Comments
5 min read
Trust Is a Feature You Can Break

1
Comments
5 min read
What a 60-second war-room scan reveals

Comments
3 min read
Alert Fatigue is Breaking DevOps: Here is the Math

1
Comments
2 min read
Sherlock Holmes: The Case Of AI Brought Down Our Servers

6
Comments 3
6 min read
Telemetry Debt Is Not “Missing Logs” — It’s Missing Proof

1
Comments 1
6 min read
How to Design a DevOps Monitoring Strategy That Actually Works

Comments
3 min read
The "DevOps Engineer" is Dead. Long Live the Platform Architect.

5
Comments
2 min read
Why your developers hate your internal tooling (and how to fix it)

Comments
2 min read
PORT VS SOCKET

2
Comments
3 min read
Debugging Missing Kubernetes Events: A Deep Dive into the Event Spam Filter

Comments
3 min read
Your Identity System Is Your Biggest Single Point of Failure

1
Comments
5 min read
Context Switching Between DevOps Tools Is Costing You More Than You Think

2
Comments
3 min read
Multi-Cloud Cascading Failure Risks: Why Active-Active is a Trap

1
Comments
4 min read