DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Why Self-Hosting made me a better engineer

1
Comments
4 min read
Linux Fundamentals for DevOps & SRE: The Only Guide You'll Ever Need

10
Comments
15 min read
Kubernetes Storage: Trading a Ferrari for a Reliable Minivan.

1
Comments 2
3 min read
Netlify Site + HCP Terraform Remote State

Comments
3 min read
Take Control of your Logs: Top 10 ways using the OpenTelemetry Collector

Comments
2 min read
Importance of Graceful Shutdown in Kubernetes

3
Comments
7 min read
Root Cause Analysis (RCA): Understanding the Root Cause of Incidents

8
Comments
2 min read
🚀 Mini Monitoring App in Go with Prometheus, Grafana & CI/CD

Comments 1
3 min read
The 67-Second OpenTelemetry Problem

Comments
4 min read
The Resilience Playbook: 23 Strategies for Bulletproof Applications 🚀

Comments
4 min read
DSA Won’t Save You in Production

Comments
2 min read
Automating DNS with ExternalDNS on EKS and Istio: Lessons From Real-World Gotchas

Comments
4 min read
🔮 A New Way to Make Programming Accessible: Dive into the Magical World of Grand Père Kernel

1
Comments
2 min read
The Human-in-the-Loop Factor: Partnering With Amazon Q During a Production Incident

2
Comments
11 min read
Unlocking Site Reliability Engineering Tools for DevOps Incident Management

Comments
4 min read