DEV Community

Site Reliability Engineering

The Engineer's Guide to Preparing for Black Friday 2020

Reactions 2 Comments
8 min read
Choosing SLOs that users need, not the ones you want to provide

Reactions 6 Comments
6 min read
Blameless Book Club: Implementing Service Level Objectives, Part 1

Reactions 6 Comments
7 min read
Debugging incidents in Google's Distributed Systems

Reactions 1 Comments
2 min read
Resilience Engineering and Life

Reactions 3 Comments
4 min read
Testing ML incident detection using a cloud native microservices app

Reactions 11 Comments
10 min read
Operational Readiness Review Template

Reactions 6 Comments
7 min read
What is GitOps?

Reactions 2 Comments
3 min read
Google Down worldwide | Why is Google Down? Let's break it down

Reactions 15 Comments
4 min read
SREview Issue #7 November 2020

Reactions 4 Comments
2 min read
Making Instrumentation Extensible

Reactions 5 Comments
7 min read
SREview Issue #8 December 2020

Reactions 4 Comments
2 min read
Challenges with Implementing SLOs

Reactions 3 Comments
11 min read
How to SRE without an SRE on your team

Reactions 3 Comments
10 min read
Honeycomb SLO Now Generally Available: Success, Defined.

Reactions 6 Comments
7 min read
Working Toward Service Level Objectives (SLOs), Part 1

Reactions 6 Comments
5 min read
Top Open Source projects for SREs and DevOps

Reactions 5 Comments
7 min read
The Operational Excellence Collection

Reactions 4 Comments
1 min read
Here are the Top Predictions for SRE in 2021

Reactions 4 Comments
6 min read
Yury Niño Roa Shares her Insights on Chaos Engineering and SRE

Reactions 2 Comments
7 min read
How an SRE became an Application Security Engineer (and you can too)

Reactions 5 Comments
8 min read
My Top 5 Books for DevOps/SRE

Reactions 3 Comments
4 min read
How small changes to your SLOs can be SMART for your business - A narrative case study

Reactions 5 Comments
11 min read
LitmusChaos: A Reflection On The Past Six Months

Reactions 21 Comments
15 min read
Creating Chaos and a Giveaway ⚒ 🎁

Reactions 20 Comments 6
2 min read
3 Ways SRE Can Boost your Business Value

Reactions 3 Comments
6 min read
Essence of Terraform

Reactions 33 Comments 1
3 min read
Building on observability

Reactions 4 Comments
2 min read
SREview Issue #6 October 2020

Reactions 4 Comments
2 min read
The Future of Ops Careers — Honeycomb

Reactions 6 Comments
8 min read
The Resilient Architecture Collection

Reactions 14 Comments
2 min read
DevOps 2021: Paving your way into SRE

Reactions 13 Comments
6 min read
Intro to o11ycast: A Human Perspective on the Role of Observability

Reactions 2 Comments
6 min read
Error Budgeting & Site Reliability Engineering

Reactions 6 Comments
5 min read
From Sysadmin to SRE

Reactions 8 Comments
7 min read
Engineers, Stop Hoarding your Metrics

Reactions 2 Comments
5 min read
Building Reliability Through Culture with Veteran Google SRE, Steve McGhee

Reactions 4 Comments
6 min read
Disposable Kubernetes clusters

Reactions 14 Comments
5 min read
5 Best Practices for Nailing Incident Retrospectives

Reactions 10 Comments
6 min read
SRE + Honeycomb: Observability for Service Reliability

Reactions 8 Comments
11 min read
Changes are a good thing

Reactions 2 Comments
4 min read
Let's stop fooling ourselves. What we call CI/CD is actually only CI.

Reactions 157 Comments 32
5 min read
Learn How to Apply SRE Outside of Engineering with Dave Rensin

Reactions 2 Comments
42 min read
Can Security Teams Benefit from SRE? You bet!

Reactions 3 Comments
6 min read
Are you Great at Incident Response?

Reactions 2 Comments
5 min read
Availability, Maintainability, Reliability: What's the Difference?

Reactions 4 Comments
4 min read
SRE for Business Continuity in the Face of Uncertainty

Reactions 2 Comments
6 min read
5 On-Call Practices to Help you Sleep through the Night

Reactions 2 Comments
5 min read
Getting SRE Buy-in from a Manager or Lead for Incident Response

Reactions 2 Comments
5 min read
Getting Buy-in from a VP or Director for Automated Metrics and Continuous Learning

Reactions 2 Comments
5 min read
Creativity in the Ops

Reactions 3 Comments 1
3 min read
How to Improve the Reliability of a System

Reactions 2 Comments
6 min read
Chaos Middleware: where Spring Boot meets Chaos Engineering

Reactions 6 Comments
2 min read
How to Construct a Reliability Model for your Organization

Reactions 10 Comments
6 min read
GCP DevOps Certification - Pomodoro Eight

Reactions 2 Comments
2 min read
Introduction to Thanos!

Reactions 70 Comments 1
5 min read
5 Biggest Downtimes of Q2 2020

Reactions 2 Comments
10 min read
GCP DevOps Certification - Pomodoro Six

Reactions 3 Comments
3 min read
GCP DevOps Certification - Pomodoro Five

Reactions 2 Comments
2 min read
The Ultimate, Free Incident Retrospective Template

Reactions 5 Comments
6 min read