DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Factories Without Belts #2 - It Began as a Trickle

7 min read
Factories Without Belts

11 min read
Chapter 11 — A Field Recipe for RML: Start Small, Grow It

4 min read
Pod OOMKilled in Kubernetes: quick diagnosis and a lasting fix

3 min read
Infrastructure dilemma

1 reaction
2 min read
The Big Tech Reality Check: Why "Senior" Architecture Fails at Global Scale

1 comment
3 min read
Scaling SRE Systems with GCP + Kubernetes: Lessons from Running at 10x Traffic

1 reaction
5 min read
What Actually Happens When You Put an AI Agent on Call

7 reactions
1 comment
3 min read
Incident Debugging in Production Systems (Part 2)

3 min read
The DevOps journey: 12 lessons for more stable systems (and fewer late-night on-call shifts)

1 reaction
5 min read
When Monitoring Saves the Day: How We Optimized Our Production Database Without Increasing Costs

3 min read
Preventing Microservice Meltdowns: Adaptive Retries and Circuit Breakers in Go

3 min read
Claude on AWS Bedrock was throttling requests and the billing dashboard showed zero issues

1 min read
The "Google SRE" Interview Process: Why Senior Engineers Fail (2026+ Guide)

1 reaction
10 min read
Syscalls in Kubernetes: The Invisible Layer That Runs Everything

1 reaction
21 min read