DEV Community

Site Reliability Engineering

Posts

SRE Newsletter Issue #30

2
Comments
1 min read
6 Easy steps for sharing AWS Encrypted RDS snapshot between two accounts.

8
Comments
3 min read
Kubernetes Monitoring: Kube-State-Metrics

4
Comments
2 min read
MYSQL Operator: A MYSQL ❤ affair with Kubernetes

Comments
5 min read
Serverless Stonks checker app for Wall Street Bets: week 3 activity report

3
Comments
4 min read
Introducing Teaming in LitmusChaos to ease your Chaos Engineering experience

17
Comments
4 min read
GCP DevOps Certification - Pomodoro Twelve

3
Comments 2
2 min read
What AWS Lambda metrics should you definitely be monitoring?

5
Comments
7 min read
GCP DevOps Certification - Pomodoro Eleven

4
Comments
2 min read
7 Ways SRE Is Changing IT Ops And How To Prepare For Those Changes

5
Comments
6 min read
Practical Nix Flakes

24
Comments
15 min read
Error Budget

2
Comments
2 min read
Sample CI/CD pipeline using AWS CodePipeline

8
Comments
3 min read
Reliability Engineering: Two Mistakes High

3
Comments 1
4 min read
Site Reliability Engineering (SRE) Best Practices

30
Comments 1
8 min read
Load testing. In production.

5
Comments
19 min read
SREview Issue #12 April 2021

3
Comments
4 min read
How to Analyze Contributing Factors Blamelessly

2
Comments
5 min read
Talking a little bit about Ansible's loops

6
Comments
4 min read
Litmus 2.0 - Simplifying Chaos Engineering for Enterprises

19
Comments
3 min read
Migrating Applications from VMs to K8s

9
Comments
3 min read
How to continue a Jenkins build's execution when a stage fails

6
Comments
4 min read
A different approach working with Ansible variables

5
Comments
2 min read
Having On-call Nightmares? Runbooks can Help you Wake Up.

7
Comments
5 min read
How to track your product's SLO/ErrorBudget: A simple tool to keep track of things!

7
Comments
3 min read
Episode 3: To Boldly Debug

3
Comments
1 min read
So you Want an SRE Tool. Do you Build, Buy, or Open Source?

3
Comments
6 min read
Kubernetes Health Checks - 2 Ways to Improve Stability in Your Production Applications

9
Comments
10 min read
Understanding the ABCs of CD

3
Comments
3 min read
Infracost diff - "git diff" but for cloud costs

7
Comments
2 min read
How to: Pingdom super powered status page

2
Comments
3 min read
Performance Engineering - The Reliability Edition

3
Comments
5 min read
It's all Chaos! And it Makes for Resilience at Scale

4
Comments
4 min read
How to Build an SRE Team with a Growth Mindset

4
Comments
6 min read
How We Built and Use Runbook Documentation at Blameless

15
Comments 2
5 min read
SigNoz : Open-source alternative to DataDog

24
Comments 2
3 min read
Lessons from Slack, GCP and Snowflake outages

4
Comments
3 min read
SRE2AUX: How Flight Controllers were the first SREs

3
Comments
20 min read
Overview of Incident Lifecycle in SRE

1
Comments
11 min read
Deep Dive into Docker Internals - Union Filesystem

30
Comments
10 min read
My DevOps learning path

3
Comments
5 min read
How do you wrap your head around observability?

49
Comments 13
1 min read
Introduce Chaos Platform 2.0 for Azure

7
Comments
2 min read
What Is Nix and Why You Should Use It

9
Comments
7 min read
Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia

2
Comments
14 min read
Reliability as an Inseparable Part of Software Engineering

3
Comments
5 min read
Getting Started as an SRE? Here are 3 Things You Need to Know.

5
Comments
5 min read
How They SRE

7
Comments 1
1 min read
The Key Differences between SLI, SLO, and SLA in SRE

15
Comments
9 min read
How to Backup your Applications Data to S3 with Walrus

6
Comments
2 min read
What is the right AWS Kubernetes distribution for you?

4
Comments
5 min read
Resilience Engineering – Don't Be Afraid to Show Your Vulnerable Side!

4
Comments
4 min read
The True Cost of Building your Own Incident Management System (IMS)

2
Comments
5 min read
Communication Tool Down? Here are 3 Ways to Handle it

3
Comments
5 min read
GCP DevOps Certification - Pomodoro Ten

4
Comments
3 min read
Azure Front Door: An Overview

6
Comments
3 min read
Managing health checks at scale

6
Comments
5 min read
"I'm Just Doing my Job," An SRE Myth

3
Comments
5 min read
Running the AWS CLI across multiple accounts the easy way

6
Comments
3 min read
Quick Survey: IT on-call experience in an "Always-On" world

5
Comments 2
1 min read