DEV Community

Site Reliability Engineering

Posts

Getting Started as an SRE? Here are 3 Things You Need to Know.

5
Comments
5 min read
How They SRE

7
Comments 1
1 min read
The Key Differences between SLI, SLO, and SLA in SRE

15
Comments
9 min read
How to Backup your Applications Data to S3 with Walrus

6
Comments
2 min read
What is the right AWS Kubernetes distribution for you?

4
Comments
5 min read
Resilience Engineering – Don't Be Afraid to Show Your Vulnerable Side!

4
Comments
4 min read
The True Cost of Building your Own Incident Management System (IMS)

2
Comments
5 min read
Communication Tool Down? Here are 3 Ways to Handle it

3
Comments
5 min read
GCP DevOps Certification - Pomodoro Ten

4
Comments
3 min read
Azure Front Door: An Overview

6
Comments
3 min read
Managing health checks at scale

6
Comments
5 min read
"I'm Just Doing my Job," An SRE Myth

3
Comments
5 min read
Running the AWS CLI Across Multiple Accounts the Easy Way

6
Comments
3 min read
Quick Survey: IT on-call experience in an "Always-On" world

5
Comments 2
1 min read
Top Observability tools for DevOps Engineers and SREs

17
Comments
7 min read
What is a microservice catalog?

2
Comments 1
5 min read
Kubernetes gone bust. Now what?

6
Comments
4 min read
From SysAdmin to SRE: How to evolve your skillset

2
Comments
6 min read
How Kyverno helps with policy management

2
Comments
3 min read
Argo CD

6
Comments
2 min read
The Engineer's Guide to Preparing for Black Friday 2020

2
Comments
8 min read
Blameless Book Club: Implementing Service Level Objectives, Part 1

6
Comments
7 min read
Debugging incidents in Google's Distributed Systems

1
Comments
2 min read
Resilience Engineering and Life

4
Comments
4 min read
Testing ML incident detection using a cloud native microservices app

11
Comments
10 min read
Google Down worldwide | Why is Google Down? Let's break it down

15
Comments
4 min read
What is GitOps?

2
Comments
3 min read
SREview Issue #7 November 2020

4
Comments
2 min read
Making Instrumentation Extensible

5
Comments 1
7 min read
SREview Issue #8 December 2020

4
Comments
2 min read
Challenges with Implementing SLOs

3
Comments
11 min read
How to SRE without an SRE on your team

3
Comments
10 min read
Localizer: An adventure in creating a reverse tunnel/tunnel manager for Kubernetes

5
Comments
8 min read
Honeycomb SLO Now Generally Available: Success, Defined.

6
Comments
7 min read
Working Toward Service Level Objectives (SLOs), Part 1

6
Comments
5 min read
Top Open Source projects for SREs and DevOps

2
Comments
7 min read
Here are the Top Predictions for SRE in 2021

4
Comments
6 min read
Yury Niño Roa Shares her Insights on Chaos Engineering and SRE

2
Comments
7 min read
My Top 5 Books for DevOps/SRE

4
Comments
4 min read
How small changes to your SLOs can be SMART for your business - A narrative case study

5
Comments
11 min read
LitmusChaos: A Reflection On The Past Six Months

21
Comments
15 min read
Creating Chaos and a Giveaway ⚒ 🎁

19
Comments 6
2 min read
3 Ways SRE Can Boost your Business Value

3
Comments
6 min read
AWS Project: Deploying a Static Website to AWS

5
Comments
1 min read
Essence of Terraform

33
Comments 1
3 min read
Building on observability

4
Comments
2 min read
Choosing SLOs that users need, not the ones you want to provide

6
Comments
6 min read
SREview Issue #6 October 2020

4
Comments
2 min read
The Future of Ops Careers — Honeycomb

6
Comments
8 min read
The Resilient Architecture Collection

14
Comments
2 min read
Operational Readiness Review Template

6
Comments
7 min read
The Operational Excellence Collection

4
Comments
1 min read
DevOps 2021: Paving your way into SRE

19
Comments
6 min read
Intro to o11ycast: A Human Perspective on the Role of Observability

2
Comments
6 min read
Error Budgeting & Site Reliability Engineering

6
Comments
5 min read
From Sysadmin to SRE

8
Comments
7 min read
Engineers, Stop Hoarding your Metrics

2
Comments
5 min read
Building Reliability Through Culture with Veteran Google SRE, Steve McGhee

4
Comments
6 min read
Disposable Kubernetes clusters

15
Comments
5 min read
SRE + Honeycomb: Observability for Service Reliability

12
Comments
11 min read