DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Microservices Reliability Playbook, Part 5 - Write patterns
4 min read
Microservices Reliability Playbook, Part 1 - Introduction to Risk
6 min read
Building a Kubernetes Cluster on Bare Metal: Insights, Challenges, and a Complete Setup Guide
1 min read
Step by Step: Configuring WSL for DevOps and SRE on Windows
48 min read
Introduction to DevOps and SRE
4 min read
The Open-Source On-Call Integration
5 min read
Causal Reasoning: The Missing Piece to Service Reliability
6 min read
Metrics at a Glance for Production Clusters
9 min read
How We Used Causely to Solve a Crashing Bug in Our Own App—Fast

3 min read
🧩 Go Build Tags in 2025: Clean Builds, Zero Ifs
Comments 3
1 min read
Why Platform Engineering? Do You Really Need It?
4 min read
💡 Build Along with Me: A Beginner’s Guide to Creating a Student API Using Flask
7 min read
Oracle Linux - AppStream
3 min read
A Comprehensive Guide to Managing Large Scale Infrastructure with GitOps
9 min read
AWS AppConfig
2 min read
How to Configure Grafana to Send Alerts to Slack and Telegram
4 min read
Kubernetes DaemonSets vs Deployments: Key Differences and Use Cases
5 min read
Hosted Prometheus vs. Self-Managed: A Neutral Guide to Costs, Control, and Trade-offs
3 min read
DevOps Made Simple: A Beginner’s Guide to Self-Healing Systems in DevOps
2 min read
Replace Opsgenie with this open-source alert router
2 min read
Chaos Mesh: What Is It and What Does It Do?
2 min read
Bandwidth and Throughput: A Clear Comparison You Need to Know
Comments 3
2 min read
Hack the Planet as a Service
3 min read
Insider Realities of Site Reliability Engineering: Lessons from a DevRel Perspective
3 min read
The Beginner’s Guide to Observability: From Basics to Better Quality of Life
5 min read