DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

How to Build Your SRE Team

12
Comments
7 min read
Here are the Important Differences Between SLI, SLO, and SLA

3
Comments
5 min read
If you’re not using SSH certificates you’re doing SSH wrong | Episode 2: Certificates improve usability, operability, & security

111
Comments 4
6 min read
If you’re not using SSH certificates you’re doing SSH wrong | Episode 1: Keys versus Certificates

37
Comments
5 min read
If you’re not using SSH certificates you’re doing SSH wrong | Episode 3: An ideal SSH flow

31
Comments 2
5 min read
What is a Kubernetes Operator and why it matters for SRE

16
Comments 1
5 min read
Here are the Metrics you Need to Understand Operational Health

5
Comments
7 min read
Choosing the Right SRE Tools

12
Comments
6 min read
Managing infra code ⚙️🛠🧰

19
Comments 5
1 min read
Using this one simple trick you can cut your GCP compute costs by as much as 80%!

4
Comments
2 min read
I’m a certified Associate Cloud Engineer!

40
Comments 5
4 min read
Why SREs Should be Responsible for Development Environments

40
Comments 13
5 min read
The Importance of Reliability Engineering

5
Comments
5 min read
Improving Postmortems from Chores to Masterclass with Paul Osman

2
Comments
17 min read
Quick, Pretty and Easy Maintenance Page using Cloudflare Workers & Terraform

28
Comments
3 min read
Introduction to LitmusChaos

24
Comments
11 min read
DevOps and SRE Concepts

6
Comments
5 min read
Complete Docker Tutorial - FREE Video Training

15
Comments 1
3 min read
Resilience in Action SRE Podcast #4

6
Comments
1 min read
Getting started with building bigger, faster and scalable systems (Part 1)

11
Comments
4 min read
How to Choose Monitoring Tools for DevOps and SRE

8
Comments
5 min read
Monitoring Production Methodologically (Talk with transcript)

3
Comments
19 min read
Monitoring Production Methodologically (Talk with the transcript)

6
Comments
20 min read
Explain IaC like I'm Five

7
Comments
2 min read
5 Tips for Getting Alert Fatigue Under Control

25
Comments 1
9 min read