DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Trust Is a Feature You Can Break

Comments
5 min read
Telemetry Debt Is Not “Missing Logs” — It’s Missing Proof

Comments
6 min read
The Economics of Reliability: Cost, Risk, and Architectural Tradeoffs

1
Comments
13 min read
The Old Guard vs. The New Way: Traditional Infrastructure Management vs. Modern DevOps

Comments
5 min read
PORT VS SOCKET

1
Comments
3 min read
Why your developers hate your internal tooling (and how to fix it)

Comments
2 min read
Your Identity System Is Your Biggest Single Point of Failure

1
Comments
5 min read
Why Nobody Completes Postmortem Action Items (and How to Fix It)

1
Comments
1 min read
Your AI Agent Is Available, Fast, and Making Terrible Decisions

1
Comments
6 min read
Hosted control plane: when it simplifies operations and when it adds complexity

Comments
11 min read
The Most Expensive Kubernetes Mistake: Memory Limits

1
Comments 2
3 min read
Chaos by Design: Production Maintenance Drills on Kubernetes

2
Comments
5 min read
OpenTelemetry: the one instrumentation standard to rule them all

1
Comments
2 min read
Chapter 6 — Sagas & Compensating Transactions: Building “Retryable Conversations”

2
Comments
7 min read
Alert Fatigue is Breaking DevOps: Here is the Math

1
Comments
2 min read