"Is the system up?" is the wrong question. The right question is: "Is the system reliable enough for our users?"
In SRE, we use Error Budgets to turn a subjective conversation about "stability" into a data-driven decision.
The Math of Reliability
If your Service Level Objective (SLO) is 99.9%, your error budget is 0.1%.
Total downtime allowed: ~43 minutes per month (0.1% of a 30-day month).
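To make the arithmetic concrete, here is a minimal Python sketch that converts an SLO into allowed downtime. The function name `error_budget_minutes` and the 30-day default window are assumptions for illustration; your own SLO window may differ.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an SLO over a window.

    Assumes a time-based SLO and a fixed window length (30 days by default).
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    window_minutes = window_days * 24 * 60       # 30 days -> 43,200 minutes
    budget_fraction = (100 - slo_percent) / 100  # 99.9% SLO -> 0.001
    return window_minutes * budget_fraction


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(99.9), 1))
```

The same function shows how quickly the budget shrinks as you add nines: a 99.99% SLO over the same window leaves only about 4.3 minutes.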
How to use the budget:
If budget remains: The team can take risks, ship experimental features, and move fast.
If the budget is exhausted: Feature work stops. All hands move to reliability, bug fixes, and technical debt.
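The policy above can be sketched as a simple check. This is only an illustration: the name `release_policy`, the returned labels, and the choice to treat any remaining budget as permission to ship are assumptions, and real policies often add intermediate thresholds.

```python
def release_policy(budget_minutes: float, downtime_minutes: float) -> str:
    """Decide what the team should focus on, given budget and downtime so far.

    Hypothetical two-state policy: any budget left means feature work
    continues; a spent (or overspent) budget means a reliability freeze.
    """
    remaining = budget_minutes - downtime_minutes
    if remaining > 0:
        return "ship features"
    return "freeze: reliability work only"


if __name__ == "__main__":
    # 43.2 minutes of budget, 10 minutes already burned: keep shipping.
    print(release_policy(43.2, 10.0))
    # Budget fully consumed: feature work stops.
    print(release_policy(43.2, 43.2))
```

Encoding the rule this way keeps the decision mechanical: the argument is settled by the numbers agreed on in advance, not by whoever feels more strongly that week.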
Key Takeaway:
Error budgets aren't just a metric; they are a policy tool that bridges the gap between the "move fast" mentality of developers and the "don't break it" mentality of operations.