DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Rightsizing Kubernetes Requests with the In-Place Vertical Pod Autoscaler

3 min read
Vendor Tools & Reliability — Lessons from the 2025 Cloud Outages

3 min read
USRE: Unifying DevOps, SRE, Security & Compliance for the Next Generation of SaaS

7 min read
How to Cut AWS Costs and Maintain Reliability Without a FinOps Team

3 min read
The Hidden Failure Pattern Behind the AWS, Azure and Cloudflare Outages of 2025

3 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

4 min read
Map a Kubernetes cluster with one command

1 min read
After the Google SRE Interview: Deconstructing the 'Hire' vs. 'No Hire' Debrief

3 min read
The Hidden Cost of Adding Just One More Feature

5 min read
A Complete Production-Ready Checklist for Smooth, Safe Deployments

1 min read
StatusGator Alternative in 2025: Why IT Managers Pick IsDown

14 min read
Celery + SQS: Stop Broken Workers from Monopolizing Your Queue with Circuit Breakers

2 min read
From Signals to Reliability: SLOs, Runbooks and Post-Mortems

13 min read
The DynamoDB DNS Race Condition That Broke The Internet (And Why Your Self-Healing Systems Might Be Suicide-Bots)

2 min read
🏗️ Building the Platform That Empowers Reliability by Design

3 min read
Modern CTO Podcast: The AI SRE Hype and How to Get it Right

1 min read
How to reduce on-call friction using AI Voice Agent

4 min read
Tools for Working with Hundreds of Servers

1 min read
The Samurai Server: Why "Heroic" Systems Always Die

4 min read
The Complete 2026 and beyond Google SRE Interview Preparation Guide — Frameworks, Scenarios, and Roadmap

4 min read
Inside the AWS US-East-1 Outage: Why DNS Failure Triggered a Global Cloud Crisis

5 min read
Your Wiki is Useless Under Pressure: 9 Actionable Steps to Drastically Lower MTTR

4 min read
AWS Lambda Reload

2 min read
SREday SF 2025: Human Centered SRE In An AI World

7 min read
The Role Confusion: SRE vs Cloud vs Platform Engineer (And Why "DevOps Engineer" Misses the Point)

5 min read