DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Build an AI Code Review Agent in GitHub Actions (That Actually Reduces Incidents)

4 min read
Blameless Postmortems That Actually Change Your System

7 min read
Debugging Kubernetes Nodes in NotReady State

4 min read
Kubernetes 1.36 apiserver /readyz now waits for watch cache

5 min read
Kubernetes Upgrade Checklist: The Runbook I Wish I Had

5 min read
OpenClaw for SRE: Self-Hosted AI Agents That Actually Respond to Incidents

6 min read
Chapter 3: Terraform + Helm — A Better Abstraction

10 min read
SaaS Uptime Monitoring Explained: How Late Outage Detection Hurts Growth and Trust

3 min read
Measuring What Matters: User-Centric Availability Monitoring

4 min read
Reliability Is a Reputation System: How Technical Teams Earn (or Lose) Trust in Public

5 min read
Chapter 3 — RML-2 (Dialog World): Rollback as a Conversation

6 min read
Proof-Driven Engineering: Turning “We Think” Into “We Can Show”

5 min read
Why Platform Engineering Is the Next Big Shift (and How Ops Teams Win)

3 min read
Trust Is a Technical Feature: How Engineers Can Communicate Reliability Without Hype

5 min read
Trust Is an Engineered Outcome: How Tech Teams Can Communicate Through Failure Without Losing Their Future

5 min read