DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

What 99.9% Uptime Actually Means: 8.7 Hours of Downtime Per Year

4 min read

Silent Failures: The Bug That Won't Page You

3 min read

If You Were a Server: How to Detect Issues and Keep Things Running Smoothly

10 min read

Infrastructure dilemma

2 min read

Developer autonomy and the work that repeats after ship

3 min read

When should you use canary deployments?

5 min read

Incident Debugging in Production Systems (Part 2)

3 min read

SRE vs DevOps: the sequencing mistake that burns most startups.

3 min read

The DevOps Journey: 12 Lessons for More Stable Systems (and Fewer Late-Night On-Call Shifts for You)

5 min read

Canonical Log Lines: Stripe's Brilliant Technique for Production Observability

4 min read

Semantic Kernel for Enterprise AI: Architecting Production-Grade LLM Integration in .NET

15 min read

Our Production System Went Down at 2:13AM — Here’s Exactly What Happened

1 min read

When Monitoring Saves the Day: How We Optimized Our Production Database Without Increasing Costs

3 min read

Why SRE Principles Are the Missing Layer in MCP Security

5 min read

Claude on AWS Bedrock was throttling requests and the billing dashboard showed zero issues

1 min read