DEV Community

Site Reliability Engineering

Posts

Building a Multi-Tenant gRPC Development Platform with Ambassador and AWS EKS

6
Comments
9 min read
Kafka Chaos Engineering With Litmus

33
Comments
10 min read
Blameless' SRE Journey

8
Comments
8 min read
LitmusChaos in CNCF Sandbox

12
Comments
3 min read
Twitter's Reliability Journey

5
Comments
6 min read
SRE Leaders Panel: Work as Done vs. Work as Imagined

3
Comments
26 min read
Top Practices for Runbook Automation

16
Comments 1
6 min read
Incident Postmortem Template

10
Comments
6 min read
SRE: A Human Approach to Systems

8
Comments
7 min read
Leverage JIRA with Squadcast throughout the incident lifecycle

1
Comments
3 min read
Chaos Workflows with Argo and LitmusChaos

31
Comments 1
8 min read
3 Common API Integration Mistakes and How to Avoid Them

4
Comments
4 min read
Best Practices for Effective Incident Management

7
Comments
9 min read
Introducción a IAM - Día #1 de caminando con un SRE

4
Comments
6 min read
The Chaos Engineering Collection

19
Comments
2 min read
Creating your own Chaos Monkey with AWS Systems Manager Automation

17
Comments
13 min read
Chaos Engineering for cloud-native systems

30
Comments
4 min read
Caminando con un SRE

4
Comments
2 min read
Slashing Buildkite deployment time by 75%

10
Comments
5 min read
Towards More Effective Incident Postmortems

2
Comments
10 min read
Site Reliability Engineering: Afrontando el riesgo y los desastres

17
Comments
12 min read
Prometheus blackbox_exporter; Unconventional Way

6
Comments
2 min read
Chaos Engineering — How to safely inject failure?

4
Comments
6 min read
Feelings during incident response

23
Comments
3 min read
A Reading List & Repo List 📚 for Learning DevOps, SRE, and Automation(w/Python)

14
Comments 1
2 min read
Falando sobre SRE - Parte 01 - Uma breve introdução

8
Comments
7 min read
Chaos Engineering — What and who is a chaos engineer?

16
Comments 2
4 min read
Why You Need A Microservice Catalog

5
Comments
9 min read
Have there been more reliability incidents lately?

16
Comments 14
1 min read
6 Responsibilities of a Devops Engineer

7
Comments
2 min read
Retrying groups of tightly coupled tasks in Ansible

13
Comments 2
3 min read
Cleaning up Zookeeper Logs and Snapshots

8
Comments
1 min read
How does deployment work at your organization?

71
Comments 73
1 min read
Visualize Google Cloud Billing data in Grafana with BigQuery

3
Comments 2
2 min read
go apps + jaeger tracing

9
Comments 2
1 min read
April Fools and the Broken Promises of One-off Hacks

129
Comments 8
4 min read
DevOps Engineer vs. SRE?

10
Comments 6
1 min read
Ask DEV: LightWeight APM for Kubernetes using OpenTelemetry?

5
Comments
2 min read
Dreams and Nightmares of Ops

34
Comments 2
10 min read
Have you considered Site Reliability Engineering as a path?

66
Comments 12
1 min read
Towards Operational Excellence — Part 3

7
Comments
11 min read
Towards Operational Excellence — Part 2

7
Comments
11 min read
SRE in layman’s terms (4 core concepts)

6
Comments
4 min read
⁉ Why I started developing 💡 my new software project by building a 🚀 Continuous Deployment 🔃 pipeline

7
Comments 1
7 min read
List of DevOps/SRE Conferences in 2020

6
Comments 1
1 min read
Deploy an Angular App Using Google Cloud Run

11
Comments 4
4 min read
How does your team handle critical production errors?

9
Comments 5
1 min read
Folks, what are some conferences in DevOps/SRE space that you look forward to?

7
Comments 1
1 min read
7 Site Reliability lessons from Google and Amazon

53
Comments
6 min read
My quest for identity in Software Engineering

7
Comments
15 min read
Towards Operational Excellence — Part 1

20
Comments
10 min read
Molly Struve had a long winding journey to SRE... and other things I learned recording her DevJourney

5
Comments
3 min read
Beyond Blameless

10
Comments
6 min read
DevOps vs. Site Reliability Engineering (SRE)

53
Comments
31 min read
SLOs with Stackdriver Service Monitoring

7
Comments
8 min read
The Future of Monitoring is Autonomous

10
Comments
6 min read
Resources to learn about DevOps cultural concepts and some tools

7
Comments
1 min read
The Night Before Code Freeze

53
Comments 1
4 min read
How To Get AWS Lambda Logs Into CloudWatch

8
Comments
6 min read
Rapid Docker on AWS: How to monitor the application?

10
Comments
4 min read