DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

SFMC API Rate Limits: The Cascading Failure Pattern

6 min read
Status pages, trust, and the limits of a green dashboard

1
3 min read
Backpressure in document pipelines is an architecture problem first

2 min read
Designing Alerts That Matter using Amazon CloudWatch

4 min read
Lab: next lab sre

6 min read
Why Your Kubernetes Pod Keeps Getting Killed — And It's Not an OOMKill

1
10 min read
How to Choose a European Dedicated Server: Tier III vs Tier II Data Centers Explained

4 min read
Building Production-Grade Observability: OpenTelemetry + Grafana Stack

7 min read
Building a Status Page From Scratch vs Using a Service: A Cost Analysis

4 min read
What Changes and What Stays the Same for SRE with AWS Frontier Agents

2
12 min read
Cron Jobs That Fix Themselves

1
Comments 1
3 min read
How to Fix a Kubernetes CrashLoopBackOff in Production

2 min read
How I Built an On-Call Agent That Never Forgets a Past Incident

5 min read
Building a Zero-Downtime Web Cluster on a Dell Latitude

1 min read
The monitoring gaps that page you at 3am are the ones you didn't know existed

3 min read