DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Reducing Toil: The Google SRE Book Applied to Startups

4 min read
How to Write API Integration Tests (That Actually Catch Bugs)

2 min read
Failover Sounds Good… Until It Doesn’t Work

2 min read
[Guide] Stop 502 errors with queues ⚡

1 min read
Incident Severity Levels: SEV-1 to SEV-5 Calibration

4 min read
AI-Augmented SRE: Where It Earns Its Keep, And Where It Doesn't

5 min read
How to Write an Incident Postmortem That Actually Prevents Future Outages

5 min read
Rubrik vs Cohesity: The Enterprise Decision Framework

6 min read
Memory Leak Detection in Long-Running Services

3 min read
Your Agent Just Handled That SEV2. Now What?

2 min read
When Retries Turn Hostile — How Control Logic Kills Production Systems

4 min read
agent-sre on PyPI: what SRE for AI agents actually means

2 min read
Agent Sprawl is Your Next Production Incident: An SRE Response to Datadog's State of AI Engineering 2026

5 min read
Multi-Region Failover: Lessons from Running It Hot

3 min read