DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Measuring What Matters: User-Centric Availability Monitoring

4 min read
Reliability Is a Reputation System: How Technical Teams Earn (or Lose) Trust in Public

5 min read
Chapter 3 — RML-2 (Dialog World): Rollback as a Conversation

6 min read
🚀 Python for SRE/DevOps: Building SDKs + Jenkins Automations

3 min read
Proof-Driven Engineering: Turning “We Think” Into “We Can Show”

5 min read
Respecting Boundaries: Precise Rate Limiting in Go

3 min read
Stop Writing Alert Rules By Hand

3 min read
AI Alert Assistant: How n8n + LLM Replace Routine Diagnostics

7 min read
Why "Just Restart It" Stopped Working

4 min read
I Got Lost in Canary Wharf for 30 Minutes, But I Found the Future of SRE

4 min read
When Your System Is Up But Users Still Don’t Trust It

5 min read
Trust Is a Technical Feature: How Engineers Can Communicate Reliability Without Hype

5 min read
Trust Is an Engineered Outcome: How Tech Teams Can Communicate Through Failure Without Losing Their Future

5 min read
Zero-Downtime Schema Changes in SQL Server: The Reality Behind “Just Run the Migration”

6 min read
Noisy Alerts Burn Out On-Call: Designing Alerts Around SLOs (Fewer but Better)

3 min read