DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

ComunicaOps: Laying the Foundations for Building Platforms

3
Comments
2 min read
OpenTofu CI/CD Guide: How to Automate Infrastructure Changes with Confidence

1
Comments
3 min read
Blue/Green and Canary on Kubernetes with Argo Rollouts [Lab Session]

15
Comments
11 min read
Why Platform Engineering? A Tale from a Busy Kitchen

Comments
1 min read
Unboxing Terraform Internals – Part 1: The Big Picture

Comments
5 min read
Orchestrating end-to-end service deployment using TypeScript workflows

4
Comments
2 min read
Build C Projects Like a Pro: A Guide to Idiomatic Makefiles

1
Comments 2
7 min read
I Built an AI-Powered CLI to Help Debug Production Incidents | Meet Incident Helper

1
Comments
3 min read
Amazon API Gateway Observability Best Practices with Datadog

1
Comments
4 min read
Chaos Engineering in Production: Building Resilient Systems with Chaos Mesh

Comments
1 min read
HashiCorp Nomad vs. Kubernetes: Understanding the Workload Orchestrator with Practical Examples

Comments
1 min read
When APIs Fail: A Developer's Journey with Retries, Back Off, and Jitter

2
Comments
11 min read
Why Use a Status Page Aggregator?

Comments
5 min read
Cost-Tracking and Model-Spend Monitoring with LiteLLM

1
Comments
2 min read
Unleashing Resilience: 15+ Essential Chaos Engineering Tools for Robust Systems

Comments
6 min read
AI-Powered Kubernetes Debugging with Python and Ollama

Comments
6 min read
Understanding `kube-system` in Kubernetes: A City Analogy You’ll Never Forget

5
Comments
2 min read
Top 15 Must-Have CI/CD Tools for DevOps & SRE Success

Comments
6 min read
Why Was My Localhost SSH Taking 3 Seconds? A Deep Dive.

Comments
4 min read
🚀 The Ultimate DevOps Emoji Glossary

1
Comments
2 min read
Mastering `map()` and `tolist()` in Terraform 🧰

Comments
2 min read
How to Write Effective Incident Post-Mortems: A Complete Guide

6
Comments
6 min read
🧹 One Bash Script vs. the Entire Hype Stack

Comments
1 min read
Error Budget Is All You Need - Part 1

Comments
9 min read
Error Budget Is All You Need - Part 2

Comments
9 min read