DEV Community

Site Reliability Engineering

Posts

Working Toward Service Level Objectives (SLOs), Part 1

6
Comments
5 min read
Top Open Source projects for SREs and DevOps

2
Comments
7 min read
Here are the Top Predictions for SRE in 2021

4
Comments
6 min read
Yury Niño Roa Shares her Insights on Chaos Engineering and SRE

2
Comments
7 min read
My Top 5 Books for DevOps/SRE

4
Comments
4 min read
How small changes to your SLOs can be SMART for your business - A narrative case study

5
Comments
11 min read
LitmusChaos: A Reflection On The Past Six Months

21
Comments
15 min read
Creating Chaos and a Giveaway ⚒ 🎁

19
Comments 6
2 min read
3 Ways SRE Can Boost your Business Value

3
Comments
6 min read
AWS Project: Deploying a Static Website to AWS

5
Comments
1 min read
Essence of Terraform

33
Comments 1
3 min read
Building on observability

4
Comments
2 min read
Choosing SLOs that users need, not the ones you want to provide

6
Comments
6 min read
SREview Issue #6 October 2020

4
Comments
2 min read
The Future of Ops Careers — Honeycomb

6
Comments
8 min read
The Resilient Architecture Collection

14
Comments
2 min read
Operational Readiness Review Template

6
Comments
7 min read
The Operational Excellence Collection

4
Comments
1 min read
DevOps 2021: Paving your way into SRE

19
Comments
6 min read
Intro to o11ycast: A Human Perspective on the Role of Observability

2
Comments
6 min read
Error Budgeting & Site Reliability Engineering

6
Comments
5 min read
From Sysadmin to SRE

8
Comments
7 min read
Engineers, Stop Hoarding your Metrics

2
Comments
5 min read
Building Reliability Through Culture with Veteran Google SRE, Steve McGhee

4
Comments
6 min read
Disposable Kubernetes clusters

15
Comments
5 min read
SRE + Honeycomb: Observability for Service Reliability

12
Comments
11 min read
How an SRE became an Application Security Engineer (and you can too)

5
Comments
8 min read
Let's stop fooling ourselves. What we call CI/CD is actually only CI.

161
Comments 32
5 min read
Learn How to Apply SRE Outside of Engineering with Dave Rensin

2
Comments
42 min read
Can Security Teams Benefit from SRE? You bet!

3
Comments
6 min read
Are you Great at Incident Response?

2
Comments
5 min read
Availability, Maintainability, Reliability: What's the Difference?

4
Comments
4 min read
SRE for Business Continuity in the Face of Uncertainty

2
Comments
6 min read
5 On-Call Practices to Help you Sleep through the Night

2
Comments
5 min read
Getting SRE Buy-in from a Manager or Lead for Incident Response

2
Comments
5 min read
Getting Buy-in from a VP or Director for Automated Metrics and Continuous Learning

2
Comments
5 min read
Creativity in the Ops

3
Comments 1
3 min read
How to Improve the Reliability of a System

2
Comments
6 min read
Chaos Middleware: where Spring Boot meets Chaos Engineering

7
Comments
2 min read
How to Construct a Reliability Model for your Organization

10
Comments
6 min read
GCP DevOps Certification - Pomodoro Eight

2
Comments
2 min read
Introduction to Thanos!

72
Comments 1
5 min read
GCP DevOps Certification - Pomodoro Six

3
Comments
3 min read
The Ultimate, Free Incident Retrospective Template

6
Comments
6 min read
GCP DevOps Certification - Pomodoro Five

2
Comments
2 min read
GCP DevOps Certification - Pomodoro Four

4
Comments
2 min read
5 Best Practices for Nailing Incident Retrospectives

11
Comments
6 min read
GCP DevOps Certification - Pomodoro Three

6
Comments
2 min read
GCP DevOps Certification - Pomodoro Two

5
Comments 3
1 min read
The Road to Reliability: How to Deploy API-Breaking Changes

2
Comments
4 min read
GCP DevOps Certification - Pomodoro One

18
Comments
3 min read
Changes are a good thing

2
Comments
4 min read
How to Become a Master at Incident Command

5
Comments
12 min read
Here's your Complete Definition of Software Reliability

5
Comments
5 min read
5 Surefire Ways to Improve Your Product Reliability with Logging and Automation

3
Comments
6 min read
SREview Issue #5 September 2020

1
Comments
2 min read
SRE Leaders Panel: Testing in Production

5
Comments
26 min read
SRE Leaders Panel: Embracing Resilience During Crises

2
Comments
36 min read
This is How to Use ITIL, DevOps, and SRE Best Practices

5
Comments 1
6 min read
Determining Error Budgets and Policies that Work for Your Team

2
Comments
5 min read