DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

I built a free incident triage tool — paste logs, get root cause in seconds
1 min read
PagerDuty Alternative for Root Cause Analysis: Why SRE Teams Are Adding AI Investigation
6 min read
Runbook Automation: From 45-Minute Fixes to 90-Second Recoveries
2 min read
Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud
8 min read
Design DEGRADE (Defer) and Your Agent Becomes “Operations”
7 min read
The Next Frontier of SRE: Agentic Operations and Immutable Trust
3 min read
Failover Sounds Good… Until It Doesn’t Work
2 min read
How an AI Agent Spent $12,000 While "Successfully" Fixing a Single Bug
4 min read
I’m looking for a small number of maintainers for NornicDB
2 min read
Using Graphify to turn Incident Data into a Knowledge Graph
3 min read
Claude Code for the Outer Loop: An AI SRE Playbook to Reduce On-Call Toil
18 min read
Don’t “Execute” the LLM: Typed Actions + Verifiers for Safe Business Agents
8 min read
Are AI Observability Tools Actually Helping?
1 min read
Something every senior engineer learns the expensive way:
1 min read
A hard-earned rule from incident retrospectives:
1 min read