DEV Community

Site Reliability Engineering

Posts

How do you wrap your head around observability?

49
Comments 13
1 min read
Introduce Chaos Platform 2.0 for Azure

7
Comments
2 min read
What Is Nix and Why You Should Use It

9
Comments
7 min read
Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia

2
Comments
14 min read
Reliability as an Inseparable Part of Software Engineering

3
Comments
5 min read
Getting Started as an SRE? Here are 3 Things You Need to Know.

5
Comments
5 min read
How They SRE

8
Comments 1
1 min read
The Key Differences between SLI, SLO, and SLA in SRE

15
Comments
9 min read
How to Backup your Applications Data to S3 with Walrus

6
Comments
2 min read
What is the right AWS Kubernetes distribution for you?

4
Comments
5 min read
Resilience Engineering – Don't Be Afraid to Show Your Vulnerable Side!

4
Comments
4 min read
The True Cost of Building your Own Incident Management System (IMS)

2
Comments
5 min read
Communication Tool Down? Here are 3 Ways to Handle it

3
Comments
5 min read
GCP DevOps Certification - Pomodoro Ten

4
Comments
3 min read
Azure Front Door: An Overview

6
Comments
3 min read
Managing health checks at scale

6
Comments
5 min read
"I'm Just Doing my Job," An SRE Myth

3
Comments
5 min read
Running the AWS CLI across multiple accounts the easy way

6
Comments
3 min read
Top Observability tools for DevOps Engineers and SREs

17
Comments
7 min read
What is a microservice catalog?

2
Comments 1
5 min read
Kubernetes gone bust. Now what?

6
Comments
4 min read
From SysAdmin to SRE: How to evolve your skillset

2
Comments
6 min read
How Kyverno helps with policy management

2
Comments
3 min read
Argo CD

6
Comments
2 min read
The Engineer's Guide to Preparing for Black Friday 2020

2
Comments
8 min read
Blameless Book Club: Implementing Service Level Objectives, Part 1

6
Comments
7 min read
Debugging incidents in Google's Distributed Systems

1
Comments
2 min read
Google Down worldwide | Why is Google Down? Let's break it down

15
Comments
4 min read
SREview Issue #7 November 2020

4
Comments
2 min read
Making Instrumentation Extensible

5
Comments 1
7 min read
SREview Issue #8 December 2020

4
Comments
2 min read
Challenges with Implementing SLOs

3
Comments
11 min read
How to SRE without an SRE on your team

3
Comments
10 min read
Localizer: An adventure in creating a reverse tunnel/tunnel manager for Kubernetes

5
Comments
8 min read
Honeycomb SLO Now Generally Available: Success, Defined.

6
Comments
7 min read
Working Toward Service Level Objectives (SLOs), Part 1

6
Comments
5 min read
Top Open Source projects for SREs and DevOps

2
Comments
7 min read
Here are the Top Predictions for SRE in 2021

4
Comments
6 min read
My Top 5 Books for DevOps/SRE

4
Comments
4 min read
How small changes to your SLOs can be SMART for your business - A narrative case study

5
Comments
11 min read
LitmusChaos: A Reflection On The Past Six Months

21
Comments
15 min read
Creating Chaos and a Giveaway ⚒ 🎁

19
Comments 6
2 min read
3 Ways SRE Can Boost your Business Value

3
Comments
6 min read
Essence of Terraform

33
Comments 1
3 min read
AWS Project: Deploying a Static Website to AWS

6
Comments
1 min read
Building on observability

4
Comments
2 min read
Choosing SLOs that users need, not the ones you want to provide

6
Comments
6 min read
SREview Issue #6 October 2020

4
Comments
2 min read
The Future of Ops Careers — Honeycomb

6
Comments
8 min read
The Resilient Architecture Collection

14
Comments
2 min read
Operational Readiness Review Template

6
Comments
7 min read
The Operational Excellence Collection

4
Comments
1 min read
DevOps 2021: Paving your way into SRE

19
Comments
6 min read
Intro to o11ycast: A Human Perspective on the Role of Observability

2
Comments
6 min read
Error Budgeting & Site Reliability Engineering

6
Comments
5 min read
From Sysadmin to SRE

8
Comments
7 min read
Engineers, Stop Hoarding your Metrics

2
Comments
5 min read
Building Reliability Through Culture with Veteran Google SRE, Steve McGhee

4
Comments
6 min read
Disposable Kubernetes clusters

15
Comments
5 min read
SRE + Honeycomb: Observability for Service Reliability

12
Comments
11 min read