DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Platform Developer Portal

3 min read
The AI Incident Report Template I Actually Use for Wrong Answers and Tool Failures

3 min read
From Stack Trace to Root Cause - Archexa's New Diagnose Command

7 min read
12 DevOps Tools You Should Be Using in 2026 (SREs Included)

5 min read
Noisy alerts exhaust on-call: designing SLO-based alerts (fewer but higher quality) (a practical playbook)

3 min read
Semantic Kernel for Enterprise AI: Architecting Production-Grade LLM Integration in .NET

15 min read
Downsizing Without Downtime: An SRE's Guide to Safe Cost Optimization

13 min read
FinOps for SREs: Cutting Costs Without Breaking Things

3 min read
How I Found $12K/Year in AWS Waste Across 4 Accounts — Without Touching Production

12 min read
The Pre-Flight Checklist: 9 Things to Analyze Before Cutting Any AWS Cost

14 min read
AI-Powered Code Generation and Testing in .NET:

15 min read
The 2026 "Google SRE" Interview: Why Senior Software Engineers Fail the NALSD Round

2 min read
How to Audit Your Monitoring Stack (Before the Next Incident Does It for You)

5 min read
Noisy alerts exhaust on-call: designing SLO-based alerts (fewer but higher quality)

3 min read
🚀 Python for SRE/DevOps: Building SDKs + Jenkins Automations

3 min read