DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Your AI Agent Is Available, Fast, and Making Terrible Decisions

6 min read
Hosted control plane: when it simplifies operations and when it adds complexity

11 min read
Terraform Provisioners: The Most Misunderstood Feature in IaC

3 min read
The Most Expensive Kubernetes Mistake: Memory Limits

3 min read
How much torment can my little homelab take? Part 1.

10 min read
Chaos by Design: Production Maintenance Drills on Kubernetes

5 min read
Chapter 6 — Sagas & Compensating Transactions: Building “Retryable Conversations”

7 min read
Trust Is an Engineering Output: How Teams Earn Credibility When Systems Break

5 min read
OpenTelemetry vs. Telegraf - Choosing the Right Monitoring Tool

11 min read
OpenTelemetry vs Grafana - Key Differences Explained

13 min read
Health Check Monitoring With OpenTelemetry | Complete Code Tutorial

13 min read
Why Technical Systems Rarely Fail “Suddenly” — and How to Notice the Warnings Early

5 min read
Beyond Backups: Architecture That Doesn't Blink

8 min read
OpenTelemetry vs ELK - Choosing the Right Observability Stack

15 min read
When Your Monitoring System Stops Monitoring

2 min read