DEV Community

Site Reliability Engineering

Posts

Kubernetes Debugging: Handling Multiple kubectl port-forward from Tray

2
Comments
6 min read
Observability Maturity Model for AWS

6
Comments
3 min read
Reliability in Legacy Software

1
Comments
3 min read
Why Do Teams Need SLOs, SLIs, and Error Budgets?

6
Comments
4 min read
Smart Chaos: LLMs, No More Human Modeling

5
Comments
6 min read
Installing Kubernetes from Scratch

Comments
11 min read
Exploring the World of LLMs for SRE Powered by PartyRock (Claude, Jurassic-2, Titan, Command, Llama 2 & Stable Diffusion XL)

7
Comments
7 min read
#DevOps for Noobs - Reverse Proxy

199
Comments 12
3 min read
Netdata vs Prometheus: Performance Analysis

2
Comments
12 min read
How to quickly realize proactive patrolling for dead-end network connectivity in large-scale clusters

Comments
6 min read
How to Quickly Implement Proactive Network Connectivity Inspection with No Blind Spots in Large-Scale Clusters

Comments
2 min read
Discovering the Magic of Service Mesh: Navigating the Microservices Maze 🌐🕸️🕵️‍♂️

9
Comments
3 min read
Karpenter vs. Cluster Autoscaler in EKS: A Comparative Guide

10
Comments
4 min read
#DevOps for Noobs - Requests vs. Limits in Kubernetes

96
Comments 15
2 min read
Observability for DevOps and SRE - free certificate course on Feb 8th

1
Comments
1 min read
Best Programming Languages for DevOps in 2024

Comments
6 min read
Avoid Unnecessary Costs: Diagnosing and Optimizing Performance in Your Application

Comments
1 min read
How OpenTelemetry Organizes Distributed Tracing

Comments
3 min read
Why HAProxy Is My Favorite Balancer/Proxy

1
Comments
2 min read
Understanding Goal-Based Software Engineering: A Path to Successful Software Development

1
Comments
4 min read
SRE Is About Building Software That Solves Other Software's Operational Problems

17
Comments 1
2 min read
Why AWS is poised to lead the Gartner Magic Quadrant for APM and Observability in 2024

20
Comments
11 min read
Do devs spend significant time on application rolling updates and rollbacks?

Comments
1 min read
How to Go Beyond Basic Monitoring

10
Comments
2 min read
What Is HAProxy's Stream Processing Offload Engine For?

1
Comments 1
1 min read
HAProxy FAQ

Comments
1 min read
Maximizing Speed, Costs, UX - AWS ElastiCache Serverless

3
Comments 2
6 min read
Decoding the Tech Maze: Demystifying SRE and DevOps for Everyone

Comments
2 min read
Step-by-Step Guide to Setting Up Point-in-Time Recovery in PostgreSQL 16 with Scripts

2
Comments
1 min read
Books That Helped Me Become a Tech Lead

363
Comments 32
10 min read
Mastering Docker: Defining Health Checks in Docker Compose

25
Comments 1
6 min read
AWS Observability: Building a Comprehensive Solution for Distributed Systems

9
Comments 2
12 min read
Take back control of your tags with Tailwarden - Part 1

2
Comments
7 min read
Combining 2FA and Public Key Authentication for a better Linux SSH security

10
Comments
6 min read
AWS re:Invent 2023 - Empowering SREs with Game-Changing Solutions

9
Comments 2
3 min read
Applying SRE Principles to CI/CD

2
Comments
8 min read
Lazy Loading vs Write-Through: A Guide to Performance Optimization

5
Comments 1
8 min read
What is an Incident?

2
Comments
2 min read
AWS In-Memory Databases: Complete Guide to Accelerated Data Processing

6
Comments
6 min read
Unraveling the World of On-Call: Challenges and Strategies for Efficient Operations

2
Comments
3 min read
Mastering Reliability in High-Velocity Software Development

Comments
9 min read
Alert Fatigue, and How to Fix it

5
Comments
4 min read
Platform Engineering 101: Supercharging Dev, Sec, and Ops Harmony with Automation

Comments
7 min read
Code to Cloud: DevOps with AWS

2
Comments
5 min read
Navigating On-Call Compensation in the Tech Industry In 2023

Comments
9 min read
Using Projectsveltos to Manage Kubernetes Add-ons on Civo Cloud Clusters

1
Comments
4 min read
6 Outstanding Status Page Examples to Inspire You in 2023

2
Comments 1
5 min read
MTTx Metrics-Based Incident Response Optimization

2
Comments 1
7 min read
Choosing the Right AWS EC2 Instance: Avoiding Common Pitfalls

10
Comments 2
7 min read
Reliability concepts: Availability, Resiliency, Robustness, Fault-Tolerance, and Reliability

5
Comments
1 min read
Amazon Grafana demo with EKS

8
Comments 4
6 min read
The Ins and Outs of Status Pages

1
Comments
6 min read
Grafana on AWS Marketplace

6
Comments
4 min read
Runbook vs. Playbook: Meaning, Differences, and Uses

Comments
6 min read
Chaos Engineering with AWS Fault Injection Simulator

2
Comments
5 min read
What Is the Role of an Incident Commander?

Comments
7 min read
Taints and Tolerations in Kubernetes: A Pocket Guide

4
Comments
3 min read
How To Create an Incident Communication Plan

Comments
7 min read
How to create an SLO for Cloud Run programmatically

1
Comments 1
3 min read
Observability Acronyms: SLI, SLO, SLE, MTTA, MTTR, MTBF, and MTTF

2
Comments
3 min read