DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

How blue/green deployments saved us from out of hours changes and downtime

2 min read
When Software Lies Before It Fails

5 min read
Alert Fatigue Is Real — Here's What It's Actually Costing Your Team

5 min read
Chapter 9 — RML-3 Case Files: Aligning Your Incident Response Worldview

6 min read
Automatically Committing Image Tags with Argo CD Image Updater

2 min read
Your Monitoring Stack Has a Blind Spot. Here's the 2-Second Window Where Servers Die

7 min read
What is Agentic Incident Management? The End of 3 AM War Rooms

4 min read
You've Shipped Agents. Now You Have to Run Them.

2 comments
7 min read
Chapter 4: GitOps with Terraform + ArgoCD — Self-Hosting LLMs as a Platform Product

28 min read
The 5 Error Patterns Engineers Misclassify During Production Incidents

4 min read
PostgreSQL High Availability: Patroni, Replication and Failover Patterns

12 min read
Factories Without Belts #2 - It Began as a Trickle

7 min read
Factories Without Belts

11 min read
The Technology You Never See Is Often What Breaks First

5 min read
Topology-Aware AI Agents for Observability: Automating SLO Breach Root Cause Analysis

5 min read