DEV Community

Site Reliability Engineering

Posts

DNS Incidents Like Cloudflare’s Could Turn your Status Page Useless, Here is How to Prevent It

1
Comments
3 min read
Why Every Company Needs DevOps

8
Comments 2
7 min read
Rename and Shame

9
Comments
2 min read
Creating multiple objects using locals and loops

7
Comments
4 min read
How to integrate Datadog Agent in ECS Fargate

18
Comments 5
3 min read
How to setup Prometheus and Grafana

5
Comments 1
1 min read
How to empower your team to own incident response

3
Comments
5 min read
For those who have trouble setting up Datadog RUM

10
Comments
2 min read
Site Reliability Engineering (SRE) Best Practices

16
Comments
9 min read
End-to-End Monitoring with Grafana Cloud with Minimal Effort

44
Comments
12 min read
Don't count your incidents, make your incidents count

6
Comments
4 min read
Build custom API integrations with incident.io

7
Comments
6 min read
Storing sensitive data in Terraform code using KMS

11
Comments
3 min read
Create your own Platform-As-A-Service(PaaS) Based on Kubernetes

4
Comments 1
2 min read
Software performance testing - How to do it ? [3]

3
Comments
2 min read
How to design incident severity levels?

5
Comments
4 min read
Suffering Developer Attrition? Remember: Replication Rarely Replaces Recoverability

7
Comments
5 min read
Software performance testing - Why it's important? [2]

6
Comments 1
2 min read
Do I need an incident debrief?

5
Comments
6 min read
Multi-Region S3 Strategies

9
Comments
8 min read
Software performance testing - What is it? [1]

5
Comments
3 min read
SRE 101 and How to Adopt the Practice in Your Organization

13
Comments 1
8 min read
What's a fair compensation for being on call?

6
Comments
7 min read
One week SRE transition crash course

6
Comments
4 min read
Startup guide to incident management

4
Comments
7 min read
Avoid configuration drift in your Terraform state when using aws_security_group

17
Comments 1
4 min read
Why are we organizing a tech conference called SRE NEXT 2022?

7
Comments
4 min read
Building a service map using eBPF

9
Comments
4 min read
Mining metrics from unstructured logs

7
Comments
4 min read
Splunk - 10K rows limit

2
Comments 1
1 min read
AWS: Launch an EC2 Instance from the Web Console

8
Comments 1
6 min read
How to move your .ssh generated keys to a new laptop.

23
Comments 1
2 min read
Splunk - Dashboard request optimization

6
Comments
1 min read
How important is Observability for SRE?

2
Comments
6 min read
SRE and Tasks of an SRE explained ✅

96
Comments 2
13 min read
Understanding the Business as a Devops Engineer

12
Comments 1
4 min read
#90DaysOfDevOps - Day 4

2
Comments
4 min read
What is DevOps? REALLY understand it

273
Comments 4
12 min read
Engineer On-Call: The Dos and Don'ts

3
Comments 1
3 min read
How-to setup a HA/DR database in AWS? [6 - Create from snapshot]

6
Comments
3 min read
How-to setup a HA/DR database in AWS? [9 - Generate a random value]

6
Comments
3 min read
How-to setup a HA/DR database in AWS? [8 - Multiple instances in multiple regions]

6
Comments
2 min read
How-to setup a HA/DR database in AWS? [7 - Dynamic Terraform backend definition]

6
Comments
2 min read
How-to setup a HA/DR database in AWS? [3 - Simple database]

6
Comments
3 min read
How-to setup a HA/DR database in AWS? [4 - HA Database]

6
Comments
5 min read
How-to setup a HA/DR database in AWS? [5 - DR database]

6
Comments
3 min read
How-to setup a HA/DR database in AWS? [1]

4
Comments
3 min read
How-to setup a HA/DR database in AWS? [2 - Definitions]

2
Comments
4 min read
The Universal Language: Reliability for Non-Engineering Teams

4
Comments
7 min read
Choosing a database instance class in AWS with the maximum simultaneous connexions

2
Comments
2 min read
Building an SRE Team with Specialization

4
Comments
7 min read
What happens when Amazon accidentally sends all of their support traffic your way?

28
Comments 3
3 min read
How Disaster Ready Are Your Backup Systems, Really?

2
Comments
6 min read
DevOps - Deployment strategies

7
Comments
6 min read
#90DaysOfDevOps - Day 3

2
Comments
5 min read
#90DaysOfDevOps - Day 1

30
Comments 4
4 min read
Fylamynt and Squadcast Team Up To Handle Cloud Incident Response, Management, and Remediation

5
Comments
4 min read
How to create a custom role for RBAC

6
Comments
4 min read
Circumvent STDIN when installing packages with apt

4
Comments
2 min read
Some DevOps Terms definitions

8
Comments 1
4 min read