DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The DevOps Engineer's Guide to AWS Cost Explorer

Comments
1 min read
Take Control of your Logs: Top 10 ways using the OpenTelemetry Collector

Comments
2 min read
Importance of Graceful Shutdown in Kubernetes

3
Comments
7 min read
Amazon Cognito Observability Best Practices with Datadog

1
Comments
5 min read
Root Cause Analysis (RCA): Understanding the Root Cause of Incidents

8
Comments
2 min read
🚀 Mini Monitoring App in Go with Prometheus, Grafana & CI/CD

Comments 1
3 min read
The 67-Second OpenTelemetry Problem

Comments
4 min read
The Resilience Playbook: 23 Strategies for Bulletproof Applications 🚀

Comments
4 min read
DSA Won’t Save You in Production

Comments
2 min read
Automating DNS with ExternalDNS on EKS and Istio: Lessons From Real-World Gotchas

Comments
4 min read
🔮 A New Way to Make Programming Accessible: Dive into the Magical World of Grand Père Kernel

1
Comments
2 min read
The Human-in-the-Loop Factor: Partnering With Amazon Q During a Production Incident

2
Comments
11 min read
Unlocking Site Reliability Engineering Tools for DevOps Incident Management

Comments
4 min read
Build Node.js app in Replit & use s3 as static web hosting serving with CDN

Comments
2 min read
WTF is Site Reliability Engineering?

1
Comments
3 min read
ComunicaOps: Laying the Foundations for Building Platforms

3
Comments
2 min read
Blue/Green and Canary on Kubernetes with Argo Rollouts [Lab Session]

15
Comments
11 min read
Why Platform Engineering? A Tale from a Busy Kitchen

Comments
1 min read
Unboxing Terraform Internals – Part 1: The Big Picture

Comments
5 min read
Orchestrating end-to-end service deployment using TypeScript workflows

4
Comments
2 min read
Build C Projects Like a Pro: A Guide to Idiomatic Makefiles

1
Comments 2
7 min read
Amazon API Gateway Observability Best Practices with Datadog

1
Comments
4 min read
Chaos Engineering in Production: Building Resilient Systems with Chaos Mesh

Comments
1 min read
HashiCorp Nomad vs. Kubernetes: Understanding the Workload Orchestrator with Practical Examples

Comments
1 min read
When APIs Fail: A Developer's Journey with Retries, Back Off, and Jitter

2
Comments
11 min read