DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

DevOps vs SRE vs Platform Engineering: What’s the Difference?

2 min read
From cronjobs to controllers: Building a production-grade Kubernetes Backup & Restore Operator

4 min read
Datadog vs OneUptime vs OptyxStack – Understanding the Differences in Observability and Operations

2 min read
Top 10 SRE Tools Dominating 2026: The Ultimate Toolkit for Reliability Engineers 🚀

3 min read
Top 7 AI Tools Every DevOps and SRE Engineer Needs in 2026 🚀

3 min read
The Limitations of Text Embeddings in RAG Applications: A Deep Engineering Dive

19 min read
Infra Proverbs

1 min read
Spegel, Pixie, and Why :latest Is Evil

4 min read
Project: One App — Three Probes — Real Failures

3 min read
Ring 0 Deployment Safety Protocol (Post-CrowdStrike)

2 min read
How a Kubernetes Autoscaling Incident Took Down Our API — and How I Now Debug It in Minutes

2 min read
Kubernetes In-Place Pod Resize

3 min read
Datadog: Observability Lessons from 50+ AWS Apps

7 min read
Lessons in Testing, Performance, and Legacy Systems from /dev/mtl 2025

7 min read
Turning block/goose into an AI SRE Agent

3 min read