DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Part 2: Hands-on tc Framework: Building a Full-Stack Async API with Pages

Comments
7 min read
Log Management at Scale: How We Cut Costs 70% Without Losing Signal

Comments
2 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%

Comments 1
2 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use

Comments
2 min read
How We Handle SSL Certificate Expiration Alerts at Scale

Comments
6 min read
This is what separates teams that scale from teams that survive:

1
Comments
1 min read
# Sentinel Diary #4: From Dashboard to Incident Response — The deterministic path to reliable SRE

Comments
5 min read
Are you using traffic mirroring in production? If not, try it out.

Comments
2 min read
Chaos Engineering for Teams That Aren't Netflix

Comments
3 min read
Your AI Agent Doesn't Have a Feature Problem. It Has an On-Call Rotation Problem.

1
Comments
5 min read
When should you use canary deployments?

1
Comments
5 min read
SFMC API Rate Limits: The Cascading Failure Pattern

Comments
6 min read
Backpressure in document pipelines is an architecture problem first

Comments
2 min read
Designing Alerts That Matters using Amazon CloudWatch

Comments
4 min read
Lab: next lab sre

Comments
6 min read