DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

RAG vs MCP is the wrong debate — here's the right framing for production AI systems

Comments
4 min read
The Context Window Is RAM — Why Your Agent's SLIs Are Telling You It's Full

3
Comments
5 min read
“But it worked on my machine.”

Comments
1 min read
How I Created a DDoS Protection Engine

Comments
11 min read
AI agents don’t need more autonomy. They need route, boundary, and receipt.

3
Comments
3 min read
I built a reference site for the recurring hard parts of software work

Comments
2 min read
AI Ops Agents Are a New Class of Attack Surface

Comments
7 min read
Service Level Objectives for Complex Microservices

Comments
3 min read
The Prometheus label that blew our monitoring bill out 6x

2
Comments
4 min read
Building a Unified Operational Timeline for Multi-Tenant OpenStack Environments

3
Comments
3 min read
Flip the Axis: A Layer-Based Approach to Multi-Service Migrations

Comments
8 min read
Building a Culture of Reliability: Beyond the SRE Handbook

Comments
3 min read
AI SRE: The Complete Guide for Engineering Teams in 2026

1
Comments
10 min read
Deployment Frequency: How We Went From Weekly to 20x/Day

1
Comments
3 min read
Risk Management for Developers: A 2026 Practitioner Guide

Comments
15 min read