DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Service Level Objectives for Complex Microservices

3 min read
Flip the Axis: A Layer-Based Approach to Multi-Service Migrations

8 min read
Building a Culture of Reliability: Beyond the SRE Handbook

3 min read
AI SRE: The Complete Guide for Engineering Teams in 2026

10 min read
Debugging Kubernetes OOMKilled: A Step-by-Step Guide

3 min read
Deployment Frequency: How We Went From Weekly to 20x/Day

3 min read
Risk Management for Developers: A 2026 Practitioner Guide

15 min read
How we cut alert noise 80% with semantic correlation (and a little LLM RCA)

4 min read
The logs said everything was fine.

1 min read
A2A + MCP in Production: The SRE Reliability Framework Nobody Has Written Yet

8 min read
The Actual Cost of Self-Hosting Your LLM (Nobody Does This Math First)

4 min read
Incident Communication: The Status Page That Builds Trust

3 min read
OCI Run Command Advanced Guide: Remote Execution, Object Storage Scripts, and Production Troubleshooting

4 min read
Load Testing in Production: How We Do It Safely

3 min read
DORA metrics are a CFO tool, not a dev tool

2 min read