DEV Community

Site Reliability Engineering

Posts

Questions To Ask Yourself Before Accepting A Software Engineering Role That Involves On Call Duties

23
Comments
3 min read
What Is a Site Reliability Engineer? Should You Become One?

14
Comments 1
10 min read
Three things from today - 8/28

9
Comments 2
3 min read
SLI, SLO, and SLA

11
Comments
2 min read
Managing CNAMEs with Azure Resource Manager Templates

25
Comments
3 min read
Using the Azure Portal to Check Configured Privileges

8
Comments
1 min read
I'm a DevOps engineer at Playstation; what would you like to know?

9
Comments 3
2 min read
Surviving On-Call: Tips from a Hosted Graphite SRE

8
Comments
8 min read
How to troubleshoot potential DOS attacks

17
Comments
5 min read
Making On-Call Not Suck

128
Comments 17
7 min read
Switching From Resque to Sidekiq

79
Comments 7
7 min read
Minimal Monitoring for Production Services

15
Comments
4 min read
For the Love of Bleep! Building a Scalable Monitoring System

140
Comments 12
6 min read
Three quick tips when setting up a new node with Chef Infra!

7
Comments
2 min read
Testing Infrastructure at ✨ Corp, a DevOps Story

20
Comments 2
6 min read
Building Rootless Applications and Services

7
Comments 1
6 min read
What It Means To Be A Site Reliability Engineer

312
Comments 13
5 min read
Building Solid Foundations for Operable Applications, Tools and Services

6
Comments
2 min read
Tracking one metric opened a whole new world for me

19
Comments
9 min read
SWEs are ruining SRE

18
Comments 1
5 min read
What I love about SRE

34
Comments 1
4 min read
Have you ever heard a more beautiful phrase than this?

150
Comments 27
1 min read
Progressive Service Architecture At Auth0

7
Comments
1 min read
Running Production Systems: Level 1, Software Firefighting

29
Comments
7 min read
I Went to the "Latest DevOps Case Studies" Study Session

9
Comments
4 min read
SRE Vs DevOps. What are the factors that overlap?

36
Comments 13
1 min read
Technical Debt and Embracing Risk: How to find the MVP?

20
Comments
5 min read
6 Devops interview questions

30
Comments 4
4 min read
10 open-source Kubernetes tools for highly effective SRE and Ops Teams

29
Comments
6 min read
How to Monitor the SRE Golden Signals

19
Comments
7 min read
Look Upstream to Solve your Team's Reliability Issues

2
Comments
10 min read
How to Improve On-Call with Better Practices and Tools

2
Comments
5 min read
Leaders, Here's how to Encourage Full Service Ownership

3
Comments
5 min read
How SLOs Help Your Team with Service Ownership

2
Comments
5 min read
Augment a PagerDuty Incident with Root Cause

4
Comments
7 min read
SREview Issue #3

3
Comments
2 min read
Nobody likes to wait in a Queue

4
Comments
2 min read
Using Automation and SLOs to Create Margin in your Systems

4
Comments
4 min read
Bringing Operational Excellence to Dev with Github's Lauren Rubin

4
Comments
33 min read
SLO Adoption at Twitter

2
Comments
7 min read
How SLIs Help You Understand Users' Needs

4
Comments
5 min read
SRE, DevOps Authors

9
Comments
1 min read
Promoting Continuous Learning with SRE

3
Comments
4 min read
Teamwork and Culture in the Era of Remote Work

6
Comments
4 min read
Managing Burnout During COVID-19

4
Comments
8 min read
You've Nailed Incident detection, what about Incident Resolution?

5
Comments
6 min read
SREview Issue #2 June 2020

2
Comments
2 min read
Reduce Engineering Problems with a Resiliency Mindset

3
Comments
8 min read
How DevOps and SRE Fit Together

9
Comments
5 min read
Hints For Engineers During Outages

2
Comments
1 min read
How SLOs Help Evernote's SRE Team Manage Tech Debt

6
Comments
6 min read
How to master at SRE recruiting?

3
Comments
1 min read
+Con Online 2020

3
Comments
1 min read
What are you monitoring

5
Comments
2 min read
Single Sign-On SSH: User Story

3
Comments
2 min read
Disaster recovery of single node Kubernetes control plane

3
Comments
2 min read
High available Kubernetes cluster with single control plane node

6
Comments
4 min read
Load balancing algorithms

9
Comments
1 min read
Which Kubernetes Container Probe Should I Use?

6
Comments
4 min read
Cloud Native Computing Minsk Digest #7

7
Comments
3 min read