DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

A hard-earned rule from incident retrospectives:

2 min read
Exponential Back-off with Jitter: Retries

3 min read
I recorded a demo of OperatorMesh — paste logs, get root cause in seconds

1 min read
End of week. Here's the thing I kept coming back to:

1 min read
Kubernetes Observability: What to Monitor and Why

2 min read
Kubernetes Observability: What to Monitor and Why

2 min read
Kubernetes Observability: What to Monitor and Why

2 min read
Kubernetes Observability: What to Monitor and Why

2 min read
On-Call Wellness: Protecting Your Engineers from Burnout

2 min read
A free AI incident triage tool — paste logs, get root cause in seconds

1 min read
On-Call Wellness: Protecting Your Engineers from Burnout

2 min read
Multi-Cloud Incident Management: Challenges and Solutions

5 min read
Post-Mortem Best Practices That Actually Drive Change

2 min read
When Your AI Agent Has an Incident, Your Runbook Isn't Ready

9 min read
Post-Mortem Best Practices That Actually Drive Change

2 min read