DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The Architecture Drift Nobody Measures

2
Comments 2
9 min read
Reliability Is a Socio-Technical Problem

1
Comments
11 min read
When Asynchronous Systems Fail Quietly, Reliability Teams Pay the Price

Comments
5 min read
Trust Is a Feature You Can Break

1
Comments
5 min read
What a 60-second war-room scan reveals

Comments
3 min read
Alert Fatigue is Breaking DevOps: Here is the Math

1
Comments
2 min read
Sherlock Holmes: The Case Of AI Brought Down Our Servers

6
Comments 3
6 min read
How to Design a DevOps Monitoring Strategy That Actually Works

Comments
3 min read
The "DevOps Engineer" is Dead. Long Live the Platform Architect.

5
Comments
2 min read
Why your developers hate your internal tooling (and how to fix it)

Comments
2 min read
PORT VS SOCKET

2
Comments
3 min read
Debugging Missing Kubernetes Events: A Deep Dive into the Event Spam Filter

Comments
3 min read
Your Identity System Is Your Biggest Single Point of Failure

1
Comments
5 min read
Context Switching Between DevOps Tools Is Costing You More Than You Think

2
Comments
3 min read
Multi-Cloud Cascading Failure Risks: Why Active-Active is a Trap

1
Comments
4 min read