DEV Community

Site Reliability Engineering

Posts

Reliability in Legacy Software

Comments
3 min read
Why Do Teams Need SLOs, SLIs, and an Error Budget?

Comments
4 min read
Smart Chaos: LLMs, No More Human Modeling

4
Comments
6 min read
Installing Kubernetes from Scratch

Comments
11 min read
Netdata vs Prometheus: Performance Analysis

Comments
12 min read
Exploring the World of LLMs for SRE Powered by PartyRock (Claude, Jurassic-2, Titan, Command, Llama 2 & Stable Diffusion XL)

6
Comments
7 min read
#DevOps for noobs - Reverse Proxy

186
Comments 12
3 min read
Observability for DevOps and SRE - free certificate course on Feb 8th

Comments
1 min read
How to quickly realize proactive patrolling for dead-end network connectivity in large-scale clusters

Comments
6 min read
How to Quickly Implement Proactive, Blind-Spot-Free Network Connectivity Inspection in Large-Scale Clusters (Chinese edition)

Comments
2 min read
Discovering the Magic of Service Mesh: Navigating the Microservices Maze 🌐🕸️🕵️‍♂️

8
Comments
3 min read
Karpenter vs. Cluster Autoscaler in EKS: A Comparative Guide

1
Comments
4 min read
#DevOps for noobs - Requests vs. limits in Kubernetes

94
Comments 15
2 min read
How OpenTelemetry Organizes Distributed Tracing

Comments
3 min read
Why HAProxy Is My Favorite Balancer/Proxy

1
Comments
2 min read
Understanding Goal-Based Software Engineering: A Path to Successful Software Development

1
Comments
4 min read
SRE Is About Building Software That Solves the Operational Problems of Other Software

16
Comments 1
2 min read
Why AWS is poised to lead the Gartner Magic Quadrant for APM and Observability in 2024

6
Comments
11 min read
How to Go Beyond Basic Monitoring

10
Comments
2 min read
Maximizing Speed, Costs, UX - AWS ElastiCache Serverless

3
Comments 2
6 min read
Books That Helped Me Become a Tech Lead

335
Comments 32
10 min read
Decoding the Tech Maze: Demystifying SRE and DevOps for Everyone

Comments
2 min read
Step-by-Step Guide to Setting Up Point-in-Time Recovery in PostgreSQL 16 with Scripts

Comments
1 min read
Mastering Docker: Defining Health Checks in Docker Compose

14
Comments 1
6 min read
Take back control of your tags with Tailwarden - Part 1

2
Comments
7 min read
AWS Observability: Building a Comprehensive Solution for Distributed Systems

7
Comments 2
12 min read
Combining 2FA and Public Key Authentication for Better Linux SSH Security

1
Comments
6 min read
AWS re:Invent 2023 - Empowering SREs with Game-Changing Solutions

12
Comments 2
3 min read
Applying SRE Principles to CI/CD

2
Comments
8 min read
Lazy Loading vs Write-Through: A Guide to Performance Optimization

3
Comments
8 min read
What is an Incident?

2
Comments
2 min read
AWS In-Memory Databases: Complete Guide to Accelerated Data Processing

8
Comments
6 min read
Unraveling the World of On-call: Challenges and Strategies for an Efficient Operation

Comments
3 min read
Mastering Reliability in High-Velocity Software Development

Comments
9 min read
Alert Fatigue, and How to Fix it

3
Comments
4 min read
Platform Engineering 101: Supercharging Dev, Sec, and Ops Harmony with Automation

Comments
7 min read
Code to Cloud: DevOps with AWS

2
Comments
5 min read
Navigating On-Call Compensation in the Tech Industry In 2023

Comments
9 min read
Using Projectsveltos to Manage Kubernetes Add-ons on Civo Cloud Clusters

1
Comments
4 min read
6 Outstanding Status Page Examples to Inspire You in 2023

3
Comments 1
5 min read
MTTx Metrics-Based Incident Response Optimization

2
Comments 1
7 min read
Choosing the Right AWS EC2 Instance: Avoiding Common Pitfalls

13
Comments 2
7 min read
Reliability concepts: Availability, Resiliency, Robustness, Fault-Tolerance, and Reliability

2
Comments
1 min read
Amazon Grafana demo with EKS

9
Comments 4
6 min read
The Ins and Outs of Status Pages

1
Comments
6 min read
Grafana on AWS Marketplace

9
Comments
4 min read
Runbook vs. Playbook: Meaning, Differences, and Uses

Comments
6 min read
Chaos Engineering with AWS Fault Injection Simulator

2
Comments
5 min read
What Is the Role of an Incident Commander?

Comments
7 min read
Taints and Tolerations in Kubernetes: A Pocket Guide

4
Comments
3 min read
How To Create an Incident Communication Plan

Comments
7 min read
How to create an SLO for Cloud Run programmatically

1
Comments 1
3 min read
Unpacking the Power of AWS ECS: A Comparative Look at ECS on EC2 vs. ECS on Fargate

2
Comments
3 min read
Did You Know About AWS Always-Free Services

9
Comments 2
3 min read
Site Reliability Engineering (SRE) Consulting Services

Comments
2 min read
Visual Studio Code Extensions for an SRE

9
Comments
2 min read
Cloud9 starter guide with Spring Boot

7
Comments 3
3 min read
Checking a User's Permissions in Kubernetes

6
Comments
2 min read
Development vs Staging vs Production: What's the Difference?

6
Comments
6 min read