DEV Community

Site Reliability Engineering

Posts

DevOps vs. Site Reliability Engineering (SRE)

52
Comments
31 min read
SLOs with Stackdriver Service Monitoring

7
Comments
8 min read
The Future of Monitoring is Autonomous

10
Comments
6 min read
Resources to learn about DevOps cultural concepts and some tools

7
Comments
1 min read
The Night Before Code Freeze

53
Comments 1
4 min read
How To Get AWS Lambda Logs Into CloudWatch

8
Comments
6 min read
Rapid Docker on AWS: How to monitor the application?

10
Comments
4 min read
DevOps vs. SRE? 4 Important Differences

19
Comments
8 min read
Becoming a Site Reliability Engineer (SRE)

19
Comments
14 min read
Devops Week News - Issue #158

4
Comments
1 min read
Best practices for Kubernetes security; scaling write-heavy productions; & SRE

22
Comments
2 min read
Introduction to open source observability tools on Kubernetes

7
Comments
1 min read
How ITIL4 and SRE align with DevOps

14
Comments
4 min read
Questions To Ask Yourself Before Accepting A Software Engineering Role That Involves On Call Duties

23
Comments
3 min read
What Is a Site Reliability Engineer? Should You Become One?

12
Comments
10 min read
Three things from today - 8/28

9
Comments 2
3 min read
SLI, SLO, and SLA

11
Comments
2 min read
Managing CNAMEs with Azure Resource Manager Templates

25
Comments
3 min read
Using the Azure Portal to Check Configured Privileges

8
Comments
1 min read
I'm a DevOps engineer at Playstation; what would you like to know?

9
Comments 3
2 min read
Surviving On-Call: Tips from a Hosted Graphite SRE

8
Comments
8 min read
How to troubleshoot potential DOS attacks

17
Comments
5 min read
Making On-Call Not Suck

127
Comments 17
7 min read
Switching From Resque to Sidekiq

77
Comments 7
7 min read
Minimal Monitoring for Production Services

15
Comments
4 min read
Three quick tips when setting up a new node with Chef Infra!

7
Comments
2 min read
For the Love of Bleep! Building a Scalable Monitoring System

140
Comments 12
6 min read
Testing Infrastructure at ✨ Corp, a DevOps Story

20
Comments 2
6 min read
Building Rootless Applications and Services

7
Comments 1
6 min read
What It Means To Be A Site Reliability Engineer

312
Comments 13
5 min read
Building Solid Foundations for Operable Applications, Tools and Services

6
Comments
2 min read
Tracking one metric opened a whole new world for me

19
Comments
9 min read
SWEs are ruining SRE

18
Comments 1
5 min read
What I love about SRE

34
Comments 1
4 min read
Have you ever heard a more beautiful phrase than this?

150
Comments 27
1 min read
Progressive Service Architecture At Auth0

7
Comments
1 min read
Running Production Systems: Level 1, Software Firefighting

29
Comments
7 min read
I went to the "Latest DevOps Case Studies" study session

9
Comments
4 min read
SRE Vs DevOps. What are the factors that overlap?

36
Comments 13
1 min read
Technical Debt and Embracing Risk: How to find the MVP?

20
Comments
5 min read
6 Devops interview questions

30
Comments 4
4 min read
10 open-source Kubernetes tools for highly effective SRE and Ops Teams

29
Comments
6 min read
How to Monitor the SRE Golden Signals

18
Comments
7 min read
Look Upstream to Solve your Team's Reliability Issues

2
Comments
10 min read
How to Improve On-Call with Better Practices and Tools

2
Comments
5 min read
Leaders, Here's how to Encourage Full Service Ownership

3
Comments
5 min read
How SLOs Help Your Team with Service Ownership

2
Comments
5 min read
Augment a PagerDuty Incident with Root Cause

4
Comments
7 min read
SREview Issue #3

3
Comments
2 min read
Nobody likes to wait in a Queue

4
Comments
2 min read
Using Automation and SLOs to Create Margin in your Systems

4
Comments
4 min read
Bringing Operational Excellence to Dev with Github's Lauren Rubin

4
Comments
33 min read
SLO Adoption at Twitter

2
Comments
7 min read
How SLIs Help You Understand Users' Needs

4
Comments
5 min read
SRE, DevOps Authors

9
Comments
1 min read
Promoting Continuous Learning with SRE

3
Comments
4 min read
Teamwork and Culture in the Era of Remote Work

6
Comments
4 min read
Managing Burnout During COVID-19

4
Comments
8 min read
Top Practices for Runbook Automation

14
Comments 1
6 min read
You've Nailed Incident detection, what about Incident Resolution?

5
Comments
6 min read