DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Crash Dumps in Linux Kernel & Application Deep Dive

3 min read
Microservices and the Myth of Fault Isolation

3 min read
The Hidden Cost of AI in SRE: Why Automation Hasn’t Fixed Burnout

2 min read
The Merge Queue Scaling Problem Every Growing Team Hits

1 min read
Breaking Things on Purpose: What I Learned from Netflix’s Chaos Monkey

2 min read
🔐Raise your hand if you use SSH every day without actually knowing what it does. Yeah, me too😁 you’re definitely not alone.

4 min read
OOMKilled Pods: A guide to troubleshooting.

5 min read
logbloglogbloglogblog

4 min read
Why You're Spending Too Much Money on Datadog Metrics

2 min read
Gonzo - The Go based TUI for log analysis

1 min read
Why Self-Hosting made me a better engineer

4 min read
Linux Fundamentals for DevOps & SRE: The Only Guide You'll Ever Need

15 min read
Kubernetes Storage: Trading a Ferrari for a Reliable Minivan.

3 min read
Take Control of your Logs: Top 10 ways using the OpenTelemetry Collector

2 min read
Importance of Graceful Shutdown in Kubernetes

7 min read