DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

LiveOps Rollback Planning: What to Do When a Game Event Goes Wrong

7 min read
The Ultimate Guide to Production-Grade AI Agents

20 min read
When Platform Engineering Drifts into Control: Why your internal platform may be killing engineering judgement

6 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%

2 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use

2 min read
Why did one day of AI cost more than a month of servers?

5 min read
Chaos Engineering for Teams That Aren't Netflix

3 min read
GPUs Demystified: What Every Developer Needs to Know in the AI Era

10 min read
Blameless Postmortems in Practice

3 min read
AI Production Readiness Checklist After the POC: From Sandbox to Real System

7 min read
The Golden Signals: A Practical Implementation Guide

2 min read
Kubernetes 1.36: 8 Features Worth Your Attention

3 min read
Kubernetes Observability: What to Monitor and Why

2 min read
On-Call Wellness: Protecting Your Engineers from Burnout

2 min read
DevOps vs SRE: Key Differences Explained [2026 Guide]

2 min read