ZeeshanAli-0704

Interview-Ready: Vertical vs. Horizontal Scaling

🚀 Scaling Deep‑Dive: Vertical vs. Horizontal (Interview‑Ready Guide)

A practical, interview‑focused deep dive into scaling strategies with trade‑offs, architecture patterns, and scenario playbooks. Includes crisp "interview lines" you can quote, plus detailed explanations for your blog.


📑 Table of Contents

  • Introduction
  • Comparison / Trade-offs
  • Deep Dive / Architecture
  • Failure Handling / Availability
  • Scenario Playbooks
  • Quick Interview Lines (Cheat Sheet)
  • Glossary
  • Further Reading Pointers

Introduction

Scaling is about increasing a system’s capacity to handle load—more users, more data, more requests—without breaking reliability or blowing up cost. You have two primary levers:

  • Vertical scaling (scale‑up): Make a single machine bigger/faster.
  • Horizontal scaling (scale‑out): Add more machines and distribute work.

Most real systems do both over time: scale up early for simplicity, then scale out for sustained growth and resilience.


Comparison / Trade-offs

Which is easier to implement and why?

Vertical is easier. You upgrade CPU, memory, disk, or instance size. No code changes, minimal architecture churn.

But: you inherit a single point of failure (SPOF) and a hard ceiling (the biggest box you can buy/afford). Costs rise non‑linearly at the high end.

Interview line: “Scale up is the fastest band-aid; scale out is the durable fix.”

Limitations of vertical scaling

  • Hardware ceiling: Limited by max CPU sockets, RAM capacity, memory bandwidth, and storage IOPS.
  • SPOF: One box, one failure domain. If it dies, you’re down.
  • Diminishing returns: Amdahl’s Law means the serial portion of a workload caps the speedup you get from adding cores (see the sketch after this list).
  • Upgrade blast radius: Upgrades can cause downtime or risky live resizes.
  • Cost curve: High‑end hardware is disproportionately expensive.
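
To make the diminishing-returns point concrete, here is a small sketch of Amdahl's Law, which bounds the speedup from adding cores by the fraction of work that is parallelizable (the 90% figure below is an illustrative assumption, not a measurement):

```python
# Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n)
# p = parallelizable fraction of the workload, n = number of cores.
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Assume 90% of the request path parallelizes (illustrative, not measured).
for cores in (2, 8, 32, 128):
    print(f"{cores:>3} cores -> {speedup(0.9, cores):.1f}x")
# 1.8x, 4.7x, 7.8x, 9.3x -- the curve flattens; the ceiling is 1/(1-p) = 10x.
```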

Why large-scale systems prefer horizontal scaling

  • Elasticity: Add/remove nodes to match demand.
  • Fault isolation: Fail one node, others keep serving.
  • Throughput & locality: Parallelize work, push compute closer to data/users.
  • Economics: Commodity hardware + pay‑as‑you‑go beats monolithic super‑servers.
  • Multi‑AZ/region: Survive data center failures and improve latency.

Deep Dive / Architecture

Scaling a relational database: vertical vs horizontal

Vertical (scale‑up):

  • Bigger instance (more vCPU/RAM), faster NVMe, larger buffer pool.
  • Tune: connection pooling, query plans, proper indexes, memory settings.
  • Pros: Simple, transactional semantics preserved, minimal app changes.
  • Cons: Ceiling, SPOF, expensive.

Horizontal (scale‑out):

  • Read replicas: Offload reads; app does read/write splitting.
  • Partitioning/Sharding: Split data across shards by a key (e.g., user_id, tenant_id, order_id range); see the routing sketch after this list.

    • Hash sharding: Even distribution; harder range queries.
    • Range sharding: Good for time series/ranges; watch for hot shards.
    • Directory/lookup sharding: Indirect mapping; flexible but adds a hop.
  • Multi‑primary (write scaling): Requires conflict resolution or partitioned ownership.

  • Federation/CQRS: Write to OLTP store; project to read models/materialized views for scale.
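
As referenced in the sharding bullet above, here is a minimal sketch of how a shard key can drive routing, assuming simple hash sharding over a fixed shard map (the connection strings and `user_id` key are illustrative assumptions, not a specific library API):

```python
import hashlib

# Illustrative shard map: logical shard -> connection string (assumed names).
SHARDS = {
    0: "postgres://orders-shard-0",
    1: "postgres://orders-shard-1",
    2: "postgres://orders-shard-2",
    3: "postgres://orders-shard-3",
}

def shard_for(user_id: str) -> str:
    """Hash sharding: stable hash of the shard key, modulo shard count.

    Even distribution, but range queries must fan out across shards,
    and changing the shard count forces a re-shard (hence: plan it early).
    """
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("user-42"))  # every call for user-42 lands on the same shard
```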

Migration sketch:

  1. Add replicas → route safe reads.
  2. Introduce a shard key at the write path.
  3. Backfill + dual-write.
  4. Cut traffic over shard-by-shard.
  5. Decommission the monolith.
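
Step 3 is the risky part; here is a hedged sketch of the dual-write wrapper (the `legacy_db` / `shard_db` objects and `write_order` helper are hypothetical placeholders for your data-access layer):

```python
def write_order(order: dict, legacy_db, shard_db) -> None:
    """Dual-write during migration: the legacy store stays the source of truth.

    The shard write is best-effort here; a reconciliation/backfill job
    repairs any rows that diverge before reads are cut over.
    """
    legacy_db.insert("orders", order)          # source of truth until cutover
    try:
        shard_db.insert("orders", order)       # new home, keyed by the shard key
    except Exception as exc:                   # don't fail the request on the new path
        log_for_reconciliation(order["order_id"], exc)

def log_for_reconciliation(order_id: str, exc: Exception) -> None:
    # Hypothetical hook: enqueue the ID so the backfill job re-copies this row.
    print(f"dual-write miss for {order_id}: {exc}")
```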

Interview line: “Start with replicas, then shard by a stable, high‑cardinality key; plan re‑sharding from day one.”

How horizontal scaling impacts consistency (CAP & PACELC)

  • CAP: In the presence of partitions (P), you trade Consistency (C) vs Availability (A). Horizontal systems must tolerate P → many choose availability with eventual consistency (AP). Others choose CP (strong consistency) and accept unavailability during partitions.
  • PACELC: If a Partition (P) happens → trade Availability (A) vs Consistency (C); Else (no partition) → trade Latency (L) vs Consistency (C). Even without failures, cross-node coordination adds latency for strong consistency.
  • Read models:

    • Strong: Linearizable reads/writes (single leader, quorum).
    • Eventual: Replicas converge over time (read‑after‑write may be stale).
    • Tunable: Quorums (R+W>N) for “strong‑enough”.
  • Client patterns: read‑your‑writes, monotonic reads, session consistency.
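
A quick sketch of the R + W > N overlap rule from the quorum bullet above (pure arithmetic, no particular database assumed):

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum intersects every write quorum,
    so a read touches at least one replica holding the latest acknowledged write."""
    return r + w > n

# N=3 replicas: W=2, R=2 overlaps (strong-enough reads);
# W=1, R=1 does not (fast, but reads can miss the latest write).
print(quorum_overlaps(3, r=2, w=2))  # True
print(quorum_overlaps(3, r=1, w=1))  # False
```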

Interview line: “Horizontal scale introduces replica lag; we often relax to eventual consistency or use quorums to balance latency and correctness.”

Infrastructure you need for horizontal scaling

  • Load balancer (L4/L7): Health checks, weighted routing, sticky sessions when necessary.
  • Distributed caching: Redis/Memcached (clustered), cache key discipline, eviction/TTL, stampede protection.
  • Distributed storage:

    • Relational with replicas/shards;
    • NoSQL (Cassandra/Dynamo) for partitioned, high-throughput workloads;
    • Object storage (S3/GCS) for blobs;
    • Distributed FS for shared files where needed.
  • Service discovery & config: DNS, Consul, Eureka, Kubernetes DNS; config/coordination via etcd/ZooKeeper.

  • Orchestration & autoscaling: Kubernetes, ASGs; HPA/VPA and Cluster Autoscaler.

  • Observability: Metrics (Prometheus), tracing (OpenTelemetry), logs (ELK/Loki). SLOs, alerts, dashboards.

  • Networking: VPC design, pod/service CIDRs, egress/ingress, rate limiting, WAF, API gateway.

  • Reliability controls: Circuit breakers, retries with jitter, timeouts, bulkheads, backpressure (see the retry sketch after this list).

  • Delivery & infra as code: CI/CD, blue‑green/canary, Terraform/Pulumi, secrets management.
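
As a reliability-control example, a minimal retry-with-full-jitter helper (the operation being retried and the retry limits are assumptions; a production version would also respect timeouts and circuit-breaker state):

```python
import random
import time

def call_with_retries(operation, attempts: int = 4, base_delay: float = 0.1):
    """Exponential backoff with full jitter: sleep a random amount in
    [0, base * 2^attempt] so retrying clients don't synchronize into a herd."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the error
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Usage (illustrative): call_with_retries(lambda: http_client.get("/inventory"))
```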

What role does caching play in scaling?

  • Primary lever for read scale & latency: Serve hot data from RAM at micro‑ to millisecond speed.
  • Where to cache:

    • Client/browser; CDN/edge (static & edge compute); app tier (Redis); DB/materialized views.
  • Patterns: cache‑aside (lazy), write‑through, write‑back, refresh‑ahead.

  • Keys & invalidation: Namespacing, versioning, TTLs, event‑driven invalidation; protect against stampedes (request coalescing, mutex, jittered TTLs).

  • Consistency: Decide tolerance for staleness; support read‑after‑write where necessary (e.g., bypass or short TTL on user‑profile updates).
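
A minimal cache-aside sketch with a rebuild lock to suppress stampedes and a jittered TTL, assuming a Redis client via redis-py (the key names and `load_from_db` loader are illustrative stand-ins):

```python
import json
import random
import time

import redis

r = redis.Redis()  # assumes a local Redis; adjust host/port for your setup

def get_profile(user_id: str, ttl: int = 300) -> dict:
    """Cache-aside: try the cache, else take a short lock, load from the DB,
    and populate with a jittered TTL so keys don't all expire at once."""
    key = f"profile:v1:{user_id}"              # namespaced + versioned key
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    if r.set(lock_key, "1", nx=True, ex=5):    # only one caller rebuilds the entry
        try:
            profile = load_from_db(user_id)    # hypothetical DB loader
            r.set(key, json.dumps(profile), ex=ttl + random.randint(0, 30))
            return profile
        finally:
            r.delete(lock_key)

    # Someone else holds the rebuild lock: wait briefly, re-check, then fall back.
    time.sleep(0.05)
    cached = r.get(key)
    return json.loads(cached) if cached else load_from_db(user_id)

def load_from_db(user_id: str) -> dict:
    return {"user_id": user_id}                # stand-in for the real query
```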

Interview line: “Cache first; it’s the cheapest scale. Then scale out storage/compute.”


Failure Handling / Availability

SPOF vs fault tolerance

  • Vertical scale concentrates risk: single machine, single AZ, single NIC/volume.
  • Horizontal scale spreads risk: N replicas, multi‑AZ/region, rolling upgrades.
  • Design for failure: graceful degradation, brownouts, traffic shedding, overload protection.

Read-heavy vs write-heavy: choosing a strategy

Read‑heavy: replicas + caches + denormalized read models.
Write‑heavy: sharding/partition ownership, log‑structured stores, batching/queues, idempotent writes, hot‑key mitigation (random suffixing, consistent hashing, time‑bucketed IDs).
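
One way to picture hot-key mitigation by random suffixing: spread a single hot partition key across a few sub-keys on write, then fan in on read (the key format and suffix count are illustrative assumptions):

```python
import random

SUFFIXES = 8  # number of sub-keys a hot key is spread across (tuning assumption)

def write_key(hot_key: str) -> str:
    """Writes pick a random suffix, so one celebrity/product key
    spreads its write load over SUFFIXES partitions."""
    return f"{hot_key}#{random.randrange(SUFFIXES)}"

def read_keys(hot_key: str) -> list[str]:
    """Reads must fan out across all suffixes and merge the results."""
    return [f"{hot_key}#{i}" for i in range(SUFFIXES)]

print(write_key("product:12345"))   # e.g. product:12345#3
print(read_keys("product:12345"))   # all 8 sub-keys to aggregate
```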

Interview line: “Reads scale with replicas; writes scale with ownership (shards).”


Scenario Playbooks

Slow web app: vertical or horizontal?

  1. Measure first: CPU, memory, GC, heap, disk IOPS, DB QPS/latency, queue depth, p95/p99, error rates, RPS.
  2. If host‑bound: scale up (bigger instance, faster disk) + fix hot paths.
  3. If throughput‑bound: scale out (LB + more app pods/instances).
  4. Always: add caching, index queries, async offload, connection pooling.
  5. Guardrails: timeouts, retries with jitter, circuit breakers to avoid cascades.

Interview line: “Diagnose before dollars; cache before clusters.”

Stateless vs stateful: how to scale horizontally

Stateless services

  • Store no user/session state in memory beyond a request.
  • Scale by simply adding nodes; LB can send any request to any node.
  • Externalize state to Redis/DB; use idempotent handlers.

Stateful services

  • Options: sticky sessions, external session store, or partitioned ownership (e.g., user‑ID‑based ownership).
  • For consensus‑bound state (leaders, locks), use Raft/Zab systems (etcd/ZooKeeper) and expect lower write throughput.
  • For collaborative/near‑real‑time (docs/chat), consider CRDTs for AP trade‑offs.

When AWS/RDS hits vertical limits

  • Exploit replicas for reads; use a read‑write proxy to split traffic.
  • Shard by tenant, region, or entity (user/order). Precompute routing (lookup table/consistent hashing).
  • Denormalize hot paths to reduce cross‑shard joins; use CQRS + events to project read models.
  • Add cache layers (Redis) and materialized views for expensive reads.
  • Archive cold data and compress; narrow indexes; tune autovacuum.
  • Zero‑downtime plan: dual writes + backfill → switch reads → cut writes shard‑by‑shard.
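
A sketch of the read/write split from the list above, done at the application layer with one primary and a pool of replica connections (the connection handles are illustrative; a dedicated proxy layer can also do this outside the app):

```python
import random

class Router:
    """Application-level read/write splitting: writes go to the primary,
    reads go to a randomly chosen replica (accepting some replica lag)."""

    def __init__(self, primary, replicas):
        self.primary = primary      # connection/handle to the writer
        self.replicas = replicas    # connections to read replicas

    def connection_for(self, sql: str):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas:
            return random.choice(self.replicas)
        return self.primary         # writes, DDL, and transactions stay on the primary

# Usage with illustrative string handles in place of real connections:
router = Router("primary-conn", ["replica-1", "replica-2"])
print(router.connection_for("SELECT * FROM orders WHERE id = 1"))  # a replica
print(router.connection_for("UPDATE orders SET status = 'paid'"))  # primary
```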

Interview line: “RDS scale‑up buys time; sharding is the destination. Plan re‑shardability on day one.”

Scaling an e‑commerce checkout service

  • Requirements: high availability, idempotency, exactly‑once‑ish payment capture, inventory integrity.
  • Topology:

    • Multiple checkout service instances behind LB.
    • External session/cart store (Redis/DynamoDB).
    • Order DB partitioned by customer/region/order_id.
    • Payment worker consuming from a queue (SQS/Kafka) with retry & dead‑letter.
    • Inventory service with reservation (hold → confirm → release on timeout).
    • Idempotency keys on APIs; dedupe table with TTL (see the sketch after this list).
  • Consistency: prefer sagas over 2PC. Design compensating actions (refund, restock).

  • Resilience: circuit breakers to PSPs, fallback to alternate gateways, poison‑pill handling.

  • Observability: trace an order end‑to‑end (distributed tracing). Auditable logs.
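
A minimal sketch of the idempotency-key check referenced above, assuming a Redis-backed dedupe table with TTL (the key prefix, TTL, and `charge_psp` helper are illustrative; the same pattern works with a unique-constrained DB table):

```python
import redis

r = redis.Redis()  # assumed dedupe store; a relational table with a unique key also works

def capture_payment(idempotency_key: str, order_id: str, amount_cents: int) -> str:
    """The first request with a given key performs the charge; retries return
    the recorded result instead of charging twice."""
    dedupe_key = f"idem:checkout:{idempotency_key}"
    # SET NX atomically claims the key; EX gives the dedupe row a TTL.
    if r.set(dedupe_key, "in-progress", nx=True, ex=24 * 3600):
        result = charge_psp(order_id, amount_cents)   # hypothetical PSP call
        r.set(dedupe_key, result, ex=24 * 3600)
        return result
    previous = r.get(dedupe_key)                      # retry: return what we recorded
    return previous.decode() if previous else "in-progress"

def charge_psp(order_id: str, amount_cents: int) -> str:
    return f"charged:{order_id}:{amount_cents}"       # stand-in for the gateway call
```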

Interview line: “Scale nodes, externalize cart state, partition orders, queue payments, and enforce idempotency throughout.”

How Kubernetes supports horizontal scaling

  • HPA (Horizontal Pod Autoscaler): scales pods based on CPU/memory/custom/external metrics (v2). Set min/max replicas and stabilization windows.
  • Cluster Autoscaler: adds/removes nodes to fit pending pods.
  • Service & Ingress: built‑in L4/L7 load distribution. Readiness/liveness probes for safe rollout.
  • StatefulSets + PVCs: stable identities, ordered updates for stateful workloads.
  • Best practices: pod anti‑affinity (spread across AZs), PDBs, resource requests/limits, autoscaling cooldowns, and slow start to avoid thundering herds.
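
The HPA's core scaling rule is simple enough to sketch; this mirrors the desired-replicas formula from the Kubernetes docs (the CPU utilization numbers below are illustrative):

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    """HPA v2 rule of thumb: scale replicas in proportion to how far the
    observed metric is from its target, rounding up."""
    return math.ceil(current * (current_metric / target_metric))

# 4 pods averaging 80% CPU against a 50% target -> scale out to 7 pods.
print(desired_replicas(4, current_metric=80, target_metric=50))  # 7
```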

Interview line: “HPA scales pods, Cluster Autoscaler scales nodes, and readiness gates keep traffic on healthy replicas.”


Quick Interview Lines (Cheat Sheet)

  • “Scale up for simplicity; scale out for longevity.”
  • “Reads scale with replicas; writes scale with ownership (shards).”
  • “Diagnose before dollars; cache before clusters.”
  • “Eventual consistency buys availability; quorums buy confidence.”
  • “Design re‑sharding on day one.”

Glossary

  • SPOF: Single Point of Failure.
  • Shard key: Field used to route data to a shard.
  • Quorum (R/W/N): Read/Write counts over N replicas ensuring overlap.
  • Idempotency: Same request multiple times → one effect.
  • Saga: Orchestration of local transactions with compensations.

Further Reading Pointers

  • Indexing & query tuning checklists (RDBMS).
  • Cache design patterns (cache‑aside, write‑through, write‑back).
  • Sharding strategies (hash/range/directory, re‑sharding).
  • CAP & PACELC, consistency models.
  • Kubernetes autoscaling (HPA v2), PDBs, anti‑affinity.

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

systemdesignwithzeeshanali

Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli
