1. ⚡ Availability
Definition:
The ability of a system to remain accessible and operational at all times.
Key Points:
Measured as a percentage uptime (e.g., 99.99% availability = ~52 minutes/year downtime).
Requires redundancy: multiple instances, failovers.
Common techniques: Load balancers, health checks, replicas.
2. 🛡️ Reliability
Definition:
The system's ability to function correctly and consistently over time.
Key Points:
Reliability ≠ Availability. A system can be available but return incorrect results.
Achieved through: fault detection, retries, data replication, and monitoring.
Measured with metrics like MTBF (Mean Time Between Failures).
3. 📈 Scalability
Definition:
A system’s ability to handle increased load without performance loss.
Key Points:
Vertical Scaling: Add more power to a single machine.
Horizontal Scaling: Add more machines (preferred for web-scale).
Involves sharding, caching, stateless services, and distributed queues.
4. 🔧 Maintainability
Definition:
How easily a system can be understood, updated, and fixed.
Key Points:
High maintainability = faster iterations and fewer bugs.
Achieved with: clean code, modular architecture, automated tests, observability.
Reduces system downtime and tech debt over time.
5. 🧯 Fault Tolerance
Definition:
The system’s ability to keep running even when some components fail.
Key Points:
Examples: retry logic, failover systems, circuit breakers.
Closely tied with availability and reliability.
Design principle: “Design for failure” — assume things will go wrong.
Top comments (0)