DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Human Operators in Distributed Financial Systems: When People Become Part of the Architecture

4 min read
80% of GitHub Repos Still Use Static AWS Credentials in 2026

4 min read
How to Fix a Kubernetes CrashLoopBackOff in Production

2 min read
Incident response / On-call: timeouts - operational runbook (hands-on playbook)

3 min read
From MVP to Production: Scaling a Speech AI Service

3 min read
I Don't Want AI to Replace DevOps. I Want It to Read the Docs I'm Too Tired to Read

9 min read
MCP Security in Action: Decision-Lineage Observability

4 min read
Something I wish someone had told me five years earlier:

2 min read
The Hidden Costs of Real-Time: Latency vs Accuracy Trade-offs

2 min read
AI Observability: the problem nobody is solving well in 2026

5 min read
A hard-earned rule from incident retrospectives:

2 min read
Exponential Back-off with Jitter: Retries

3 min read
I recorded a demo of OperatorMesh — paste logs, get root cause in seconds

1 min read
End of week. Here's the thing I kept coming back to:

1 min read
Kubernetes Observability: What to Monitor and Why

2 min read