DEV Community

Site Reliability Engineering

Posts

Docker Log Observability: Analyzing Container Logs in HashiCorp Nomad with Vector, Loki, and Grafana
8 reactions · 8 min read
How to send Alerts and Notifications with Telegram
7 reactions · 3 min read
Kubectl Port-forward Flow Explained
3 min read
2024 Site Reliability Engineering: Key Trends and Focus Areas for SREs
7 min read
Inside the Kubernetes Control Plane
21 reactions · 2 comments · 5 min read
Expand your root EBS Volume attached to your Windows EC2
2 min read
ARM vs x86 in Docker
3 reactions · 6 min read
Effortless Database Scaling: Migrate from RDS to Aurora Serverless V2
2 min read
Why Should Devops/SRE learn Golang?
4 min read
Reciprocity, Companion Planting & DevSecOps
1 reaction · 3 min read
Why do teams need SLOs, SLIs, and an Error Budget?
4 min read
Kubernetes Debugging: Handling Multiple kubectl port-forward from Tray
2 reactions · 6 min read
Observability Maturity Model for AWS
6 reactions · 3 min read
Reliability in Legacy Software
1 reaction · 3 min read
Why do teams need SLOs, SLIs, and an Error Budget?
6 reactions · 4 min read
Smart Chaos: LLMs, No More Human Modeling
5 reactions · 6 min read
Installing Kubernetes from Scratch
11 min read
Exploring the World of LLMs for SRE Powered by PartyRock (Claude, Jurassic-2, Titan, Command, Llama 2 & Stable Diffusion XL)
7 reactions · 7 min read
#DevOps for noobs - Reverse Proxy
201 reactions · 12 comments · 3 min read
Netdata vs Prometheus: Performance Analysis
2 reactions · 12 min read
How to quickly realize proactive patrolling for dead-end network connectivity in large-scale clusters
6 min read
How to quickly perform blind-spot-free proactive network connectivity inspection in large-scale clusters
2 min read
Discovering the Magic of Service Mesh: Navigating the Microservices Maze 🌐🕸️🕵️‍♂️
9 reactions · 3 min read
Karpenter vs. Cluster Autoscaler in EKS: A Comparative Guide
12 reactions · 4 min read
#DevOps for noobs - Requests vs. limits in Kubernetes
96 reactions · 15 comments · 2 min read
Observability for DevOps and SRE - free certificate course on Feb 8th
1 reaction · 1 min read
Best Programming Languages for DevOps in 2024
6 min read
Avoid Unnecessary Costs: Diagnosing and Optimizing the Performance of Your Application
1 min read
How OpenTelemetry Organizes Distributed Tracing
3 min read
Why HAProxy is my favorite balancer/proxy
1 reaction · 2 min read
Understanding Goal-Based Software Engineering: A Path to Successful Software Development
1 reaction · 4 min read
SRE is about building software that solves the operational problems of other software
17 reactions · 1 comment · 2 min read
Why AWS is poised to lead the Gartner Magic Quadrant for APM and Observability in 2024
22 reactions · 11 min read
Do devs spend significant time on application rolling updates and rollbacks?
1 min read
How to go beyond basic monitoring
10 reactions · 2 min read
What is HAProxy's Stream Processing Offload Engine for?
1 reaction · 1 comment · 1 min read
HAProxy FAQ
1 min read
Maximizing Speed, Costs, UX - AWS ElastiCache Serverless
4 reactions · 2 comments · 6 min read
Decoding the Tech Maze: Demystifying SRE and DevOps for Everyone
2 min read
Step-by-Step Guide to Setting Up Point-in-Time Recovery in PostgreSQL 16 with Scripts
3 reactions · 1 min read
Books That Helped Me Become a Tech Lead
364 reactions · 32 comments · 10 min read
Mastering Docker: Defining Health Checks in Docker Compose
25 reactions · 1 comment · 6 min read
AWS Observability: Building a Comprehensive Solution for Distributed Systems
9 reactions · 2 comments · 12 min read
Take back control of your tags with Tailwarden - Part 1
2 reactions · 7 min read
Combining 2FA and Public Key Authentication for a better Linux SSH security
11 reactions · 6 min read
AWS re:Invent 2023 - Empowering SREs with Game-Changing Solutions
9 reactions · 2 comments · 3 min read
Applying SRE Principles to CI/CD
2 reactions · 8 min read
What is an Incident?
2 reactions · 2 min read
AWS In-Memory Databases: Complete Guide to Accelerated Data Processing
6 reactions · 6 min read
Unraveling the World of On-call: Challenges and Strategies for Efficient Operations
2 reactions · 3 min read
Lazy Loading vs Write-Through: A Guide to Performance Optimization
5 reactions · 1 comment · 8 min read
Mastering Reliability in High-Velocity Software Development
9 min read
Alert Fatigue, and How to Fix it
5 reactions · 4 min read
Platform Engineering 101: Supercharging Dev, Sec, and Ops Harmony with Automation
7 min read
Code to Cloud: DevOps with AWS
2 reactions · 5 min read
Navigating On-Call Compensation in the Tech Industry In 2023
9 min read
Using Projectsveltos to Manage Kubernetes Add-ons on Civo Cloud Clusters
1 reaction · 4 min read
6 Outstanding Status Page Examples to Inspire You in 2023
2 reactions · 1 comment · 5 min read
MTTx Metrics-Based Incident Response Optimization
2 reactions · 1 comment · 7 min read
Choosing the Right AWS EC2 Instance: Avoiding Common Pitfalls
9 reactions · 2 comments · 7 min read