DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Hiring SREs: What I Look For After Interviewing 100+ Candidates

Comments
3 min read
What Really Happens When You Type a URL in Your Browser? (Explained Step-by-Step)

Comments
1 min read
The railway went down for 10 hours, and it wasn't their fault. Here's the part nobody is talking about.

1
Comments
5 min read
Part 2: Hands-on tc Framework: Building a Full-Stack Async API with Pages

Comments
7 min read
Log Management at Scale: How We Cut Costs 70% Without Losing Signal

Comments
2 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%

Comments 1
2 min read
How We Handle SSL Certificate Expiration Alerts at Scale

Comments
6 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use

Comments
2 min read
This is what separates teams that scale from teams that survive:

1
Comments
1 min read
AWS Summit Seoul 2026: Korean Enterprises And Agentic AI

1
Comments
5 min read
Sentinel Diary #4: From Dashboard to Incident Response — The deterministic path to reliable SRE

Comments
5 min read
Are you using traffic mirroring in production? If not, try it out.

Comments
2 min read
Chaos Engineering for Teams That Aren't Netflix

Comments
3 min read
Production-Grade Observability: Building a Complete LGTM Stack with SLOs, DORA Metrics, and Intelligent Alerting

2
Comments
10 min read
Your AI Agent Doesn't Have a Feature Problem. It Has an On-Call Rotation Problem.

1
Comments
5 min read