DEV Community

Site Reliability Engineering

Posts

Kubectl Port-forward Flow Explained

Comments
3 min read
Expand your root EBS Volume attached to your Windows EC2

Comments
2 min read
Effortless Database Scaling: Migrate from RDS to Aurora Serverless V2

Comments
2 min read
Roles and Responsibilities Matrix

Comments
5 min read
Roles and Responsibilities Matrix

1
Comments
6 min read
On The Importance of End-to-End Monitoring for IoT

2
Comments
2 min read
Why Should DevOps/SRE Learn Golang?

Comments
4 min read
Docker Log Observability: Analyzing Container Logs in HashiCorp Nomad with Vector, Loki, and Grafana

6
Comments
8 min read
Kubernetes Debugging: Handling Multiple kubectl port-forward from Tray

2
Comments
6 min read
How to send Alerts and Notifications with Telegram

Comments
3 min read
2024 Site Reliability Engineering: Key Trends and Focus Areas for SREs

Comments
7 min read
Inside the Kubernetes Control Plane

14
Comments 2
5 min read
Reciprocity, Companion Planting & DevSecOps

1
Comments
3 min read
Observability Maturity Model for AWS

4
Comments
3 min read
ARM vs x86 in Docker

2
Comments
6 min read
Smart Chaos: LLMs, No More Human Modeling

4
Comments
6 min read
Why Do Teams Need SLOs, SLIs, and Error Budgets?

Comments
4 min read
Installing Kubernetes from Scratch

Comments
11 min read
Reliability in Legacy Software

1
Comments
3 min read
Why Do Teams Need SLOs, SLIs, and Error Budgets?

5
Comments
4 min read
Netdata vs Prometheus: Performance Analysis

Comments
12 min read
Exploring the World of LLMs for SRE Powered by PartyRock (Claude, Jurassic-2, Titan, Command, Llama 2 & Stable Diffusion XL)

6
Comments
7 min read
#DevOps for Noobs - Reverse Proxy

192
Comments 12
3 min read
How to quickly realize proactive patrolling for dead-end network connectivity in large-scale clusters

Comments
6 min read
How to Quickly Implement Proactive Network Connectivity Inspection with No Blind Spots in Large-Scale Clusters

Comments
2 min read
Discovering the Magic of Service Mesh: Navigating the Microservices Maze 🌐🕸️🕵️‍♂️

8
Comments
3 min read
Karpenter vs. Cluster Autoscaler in EKS: A Comparative Guide

1
Comments
4 min read
#DevOps for Noobs - Requests vs. Limits in Kubernetes

94
Comments 15
2 min read
Observability for DevOps and SRE - free certificate course on Feb 8th

1
Comments
1 min read
How OpenTelemetry Organizes Distributed Tracing

Comments
3 min read
Why HAProxy Is My Favorite Balancer/Proxy

1
Comments
2 min read
Understanding Goal-Based Software Engineering: A Path to Successful Software Development

1
Comments
4 min read
SRE Is About Building Software That Solves the Operational Problems of Other Software

16
Comments 1
2 min read
Why AWS is poised to lead the Gartner Magic Quadrant for APM and Observability in 2024

6
Comments
11 min read
How to Go Beyond Basic Monitoring

10
Comments
2 min read
Maximizing Speed, Costs, UX - AWS ElastiCache Serverless

3
Comments 2
6 min read
Decoding the Tech Maze: Demystifying SRE and DevOps for Everyone

Comments
2 min read
Step-by-Step Guide to Setting Up Point-in-Time Recovery in PostgreSQL 16 with Scripts

Comments
1 min read
Books That Helped Me Become a Tech Lead

336
Comments 32
10 min read
Mastering Docker: Defining Health Checks in Docker Compose

14
Comments 1
6 min read
AWS Observability: Building a Comprehensive Solution for Distributed Systems

7
Comments 2
12 min read
Take back control of your tags with Tailwarden - Part 1

2
Comments
7 min read
Combining 2FA and Public Key Authentication for a better Linux SSH security

1
Comments
6 min read
AWS re:Invent 2023 - Empowering SREs with Game-Changing Solutions

12
Comments 2
3 min read
Applying SRE Principles to CI/CD

2
Comments
8 min read
Lazy Loading vs Write-Through: A Guide to Performance Optimization

3
Comments
8 min read
What is an Incident?

2
Comments
2 min read
AWS In-Memory Databases: Complete Guide to Accelerated Data Processing

8
Comments
6 min read
Unraveling the World of On-Call: Challenges and Strategies for Efficient Operations

Comments
3 min read
Mastering Reliability in High-Velocity Software Development

Comments
9 min read
Alert Fatigue, and How to Fix it

3
Comments
4 min read
Platform Engineering 101: Supercharging Dev, Sec, and Ops Harmony with Automation

Comments
7 min read
Code to Cloud: DevOps with AWS

2
Comments
5 min read
Navigating On-Call Compensation in the Tech Industry In 2023

Comments
9 min read
Using Projectsveltos to Manage Kubernetes Add-ons on Civo Cloud Clusters

1
Comments
4 min read
6 Outstanding Status Page Examples to Inspire You in 2023

3
Comments 1
5 min read
Choosing the Right AWS EC2 Instance: Avoiding Common Pitfalls

14
Comments 2
7 min read
MTTx Metrics-Based Incident Response Optimization

2
Comments 1
7 min read
Reliability concepts: Availability, Resiliency, Robustness, Fault-Tolerance, and Reliability

2
Comments
1 min read
Amazon Grafana demo with EKS

9
Comments 4
6 min read