DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Why Self-Hosting made me a better engineer
4 min read
Root Cause Analysis (RCA): understanding the root cause of incidents
2 min read
The 67-Second OpenTelemetry Problem
4 min read
Liveness vs Readiness in Kubernetes: The Truth for Frontend Apps
2 min read
Gonzo - The Go based TUI for log analysis
1 min read
Why SRE is not for entry-levels
2 min read
AI-Driven DevOps: How AIOps is Transforming Observability, Incident Response, and Automation
3 min read
Observability: Beyond Monitoring in Modern Systems
3 min read
🔮 A new way to make programming accessible: dive into the magical world of Grand Père Kernel
2 min read
🚀 Mini Monitoring App in Go with Prometheus, Grafana & CI/CD
3 min read
Netlify Site + HCP Terraform Remote State
3 min read
WTF is Site Reliability Engineering?
3 min read
It's Always DNS: And Other Lies We Tell Ourselves at 2 AM
5 min read
Amazon Cognito Observability Best Practices with Datadog
5 min read
The Resilience Playbook: 23 Strategies for Bulletproof Applications 🚀
4 min read
DSA Won’t Save You in Production
2 min read
Automating DNS with ExternalDNS on EKS and Istio: Lessons From Real-World Gotchas
4 min read
10 Essential Tips for Setting Up Monitoring for Your SaaS
5 min read
Kubernetes Node Management - Drain, Cordon and Uncordon
2 min read
The Human-in-the-Loop Factor: Partnering With Amazon Q During a Production Incident
11 min read
Unlocking Site Reliability Engineering Tools for DevOps Incident Management
4 min read
Build Node.js app in Replit & use s3 as static web hosting serving with CDN
2 min read
ComunicaOps: Laying the Foundations for Building Platforms
2 min read
OpenTofu CI/CD Guide: How to Automate Infrastructure Changes with Confidence
3 min read
Blue/Green and Canary on Kubernetes with Argo Rollouts [Lab Session]
11 min read