System Design Pillars

#systemdesign

1. ⚡ Availability
Definition:
The proportion of time a system is accessible and operational when users need it.

Key Points:

Measured as a percentage of uptime (e.g., 99.99% availability allows about 52.6 minutes of downtime per year).

Requires redundancy: multiple instances with automatic failover between them.

Common techniques: Load balancers, health checks, replicas.
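The downtime figure above comes from simple arithmetic, sketched below. The function names are illustrative, the year is assumed to have 365 days, and the redundancy formula assumes replicas fail independently, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of N replicas when any single one can serve traffic.

    Assumes failures are independent: the system is down only when
    every replica is down at the same time.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1 - (1 - replica_availability) ** replicas


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
    print(f"two 99% replicas -> {combined_availability(0.99, 2):.4%}")
```

Under the independence assumption, two replicas at 99% each behave like a single 99.99% system, which is why redundancy is the usual route to higher availability.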

2. 🛡️ Reliability
Definition:
The system's ability to function correctly and consistently over time.

Key Points:

Reliability ≠ Availability. A system can be available but return incorrect results.

Achieved through: fault detection, retries, data replication, and monitoring.

Measured with metrics like MTBF (Mean Time Between Failures).
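As a rough illustration of the MTBF metric, the sketch below divides total operating time by the number of failures. The second function brings in MTTR (Mean Time To Repair), which the text above does not mention; it is included as an assumption to show how reliability numbers commonly feed into an availability estimate.

```python
def mtbf_hours(total_uptime_hours: float, failure_count: int) -> float:
    """Mean Time Between Failures: operating time divided by failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_uptime_hours / failure_count


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Estimate availability as MTBF / (MTBF + MTTR).

    MTTR is an added assumption here, not a metric from the article.
    Both arguments must use the same time unit.
    """
    if mtbf <= 0 or mttr < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    mtbf = mtbf_hours(total_uptime_hours=1000, failure_count=4)
    print(f"MTBF: {mtbf} hours")
    print(f"Availability: {steady_state_availability(mtbf, mttr=2.5):.4f}")
```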

3. 📈 Scalability
Definition:
A system’s ability to handle increased load without unacceptable performance degradation.

Key Points:

Vertical Scaling: Add more power (CPU, RAM, storage) to a single machine.

Horizontal Scaling: Add more machines (preferred for web-scale).

Involves sharding, caching, stateless services, and distributed queues.
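One way to picture sharding is a function that maps each key to one of N shards. The sketch below uses a hash followed by a modulo; the name `shard_for` and the key format are hypothetical, and this is only one of several possible sharding schemes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in the range [0, shard_count).

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings and so would route the
    same key differently on different servers.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user_id in ("user-1", "user-2", "user-3"):
        print(user_id, "-> shard", shard_for(user_id, shard_count=4))
```

A known trade-off of plain modulo hashing is that changing the shard count remaps most keys; consistent hashing is a common alternative when shards are added or removed often.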

4. 🔧 Maintainability
Definition:
How easily a system can be understood, updated, and fixed.

Key Points:

High maintainability = faster iterations and fewer bugs.

Achieved with: clean code, modular architecture, automated tests, observability.

Reduces system downtime and tech debt over time.

5. 🧯 Fault Tolerance
Definition:
The system’s ability to keep running even when some components fail.

Key Points:

Examples: retry logic, failover systems, circuit breakers.

Closely tied to availability and reliability.

Design principle: “Design for failure” — assume things will go wrong.
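To make the circuit breaker idea concrete, here is a minimal single-threaded sketch. The class name, the threshold and timeout defaults, and the injectable `clock` parameter are all assumptions chosen for illustration; production libraries add thread safety, metrics, and finer-grained half-open handling.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the timer can be faked
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # time the circuit opened, or None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of hitting a dependency known to be down.
                raise CircuitOpenError("circuit is open; call skipped")
            # Timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller wraps each request to a flaky dependency, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and treats `CircuitOpenError` as a signal to fall back to a cached or default response.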

DEV Community