DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Best Practices for Choosing a Status Page Provider
5 min read
How to Define Engineering Standards (with Backstage)
10 min read
Introducing Botkube Fuse: The Platform Engineer’s Copilot
4 min read
Accelerating Business Growth with a Platform Engineering Team
5 min read
The Pulse Of Technology: Why IT Monitoring Is Non-Negotiable In 2024
13 min read
How to improve DORA metrics as a release engineer
10 min read
The Critical Role of Application and Infrastructure Monitoring
1 min read
SRE and the Enterprise: Building a Culture of Reliability at Scale
4 min read
How To Reduce The Alert Noise For Optimal On-Call Performance
10 min read
The Cornerstones of SRE: SLI, SLO and SLA
4 min read
The “R” in MTTR: Repair or Recover? What’s the difference?
5 min read
Datadog: how to filter metrics on tag "team"
3 min read
Do You Need All Those Support Levels After All?
7 min read
AWS Observability Maturity Model - V2
5 min read
Understanding the 0.6-Second Detection Time for Full Outages
3 min read
Context is all you need.
1 min read
Enhance Your System Reliability with These Top Log Monitoring Tools
2 min read
DevOps
1 min read
When Alerts Don’t Mean Downtime - Preventing SRE Fatigue
2 min read
CrowdStrike Incident: 5 Key Lessons for DevOps & IT Teams
5 min read
Implementing SLOs in Microservices: A Comprehensive Guide to Reliability and Performance
9 min read
Cold Storage: A Deep Dive into the Frozen Vaults of Data
11 min read
DevOps vs. SRE: Understanding the Differences and Benefits
2 min read
Configuring Terraform to Work Correctly with LocalStack
3 min read
Implementing SLO Error Budget Monitoring with AWS Services Only
5 min read