DEV Community

Daniel Lee

Key components to know for system design interview

This article is intended for software engineers with prior experience in development.

How to Approach System Design Interviews?

Think like a tech lead explaining to junior engineers how to implement your design.

What interviewers want to see:

  • base-level understanding of system design fundamentals
  • back-and-forth about problem constraints and parameters of your service
  • well-reasoned, qualified decisions based on engineering trade-offs
  • the unique direction your experience and decisions take the discussion
  • holistic view of a system and its users

1) API

REST

  • APIs are modelled on the resources in the system: each resource has a URL, and HTTP verbs (GET, POST, PATCH, PUT, DELETE) act on it
  • Good: easy to version, predictable structure
  • Bad: over-fetching, since a response often includes data the client does not need
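
To make the resource-plus-verb idea concrete, here is a minimal sketch that dispatches on the HTTP verb and a versioned resource URL. It uses an in-memory dictionary rather than a real web framework; the `/v1/orders` path, the `handle` function, and the `ORDERS` store are all hypothetical names chosen for illustration.

```python
# Hypothetical in-memory "orders" resource; no real HTTP server is involved.
ORDERS = {}
_next_id = 1

def handle(method, path, body=None):
    """Dispatch on (HTTP verb, resource URL) and return (status, payload)."""
    global _next_id
    parts = path.strip("/").split("/")          # e.g. ["v1", "orders", "1"]
    if parts[:2] != ["v1", "orders"]:
        return 404, None
    order_id = int(parts[2]) if len(parts) == 3 else None

    if method == "POST" and order_id is None:   # create a new order
        order_id, _next_id = _next_id, _next_id + 1
        ORDERS[order_id] = dict(body, id=order_id)
        return 201, ORDERS[order_id]
    if order_id not in ORDERS:
        return 404, None
    if method == "GET":                         # read
        return 200, ORDERS[order_id]
    if method == "PATCH":                       # partial update
        ORDERS[order_id].update(body)
        return 200, ORDERS[order_id]
    if method == "DELETE":                      # remove
        del ORDERS[order_id]
        return 204, None
    return 405, None
```

Note that `GET` returns the whole order even if the caller only wanted one field, which is the over-fetching drawback in miniature.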

RPC

  • Code calls a function that executes on another, remote machine as if it were local
  • APIs are thought of as actions/commands (e.g. /postAnOrder(OrderDetails order))
  • Good: no special syntax to learn, space-efficient
  • Bad: best kept to internal communication because of timing issues (it becomes hard to tell apart multiple concurrent exchanges between machines)

GraphQL

  • Data is modelled as a graph: vertices (entities) and edges (relationships)
  • Good: ideal for customer-facing apps; you get exactly what you ask for; no extra backend routes are needed to fetch and modify information
  • Bad: harder to generate documentation for than REST; not well suited to aggregate data
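
The "you get what you ask for" point can be illustrated without any GraphQL library. The sketch below is not GraphQL syntax or a real resolver; `select` is a hypothetical helper that walks a nested selection and returns only the requested fields.

```python
def select(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to True (a leaf) or to a nested selection.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result

user = {
    "id": 7, "name": "Ada", "email": "ada@example.com",
    "orders": [{"id": 1, "total": 30, "notes": "gift"}],
}
# Ask for the name and each order's total, and nothing else.
picked = select(user, {"name": True, "orders": {"total": True}})
```

Here `picked` contains the user's name and the order totals only; the email, ids, and notes never leave the server.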

2) Databases (SQL vs NoSQL)

SQL

  • data is organized into tables of rows and columns
  • strong ACID guarantees (emphasis: strong consistency)
  • supports powerful queries
  • bad: writes can be slow because B-tree indexes split and merge pages/blocks.
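
As a small illustration of the atomicity part of ACID, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table and the `transfer` function are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.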

NoSQL

  • nested key-value store
  • handles many concurrent writes easily
  • emphasis: eventual consistency
  • bad: reads might be stale for a couple of seconds while replicas converge (writes themselves are fast because many NoSQL stores use a log-structured merge-tree)

Other types

  • document stores (JSON)
  • columnar stores (good for queries that aggregate the same column across many rows)
  • graph databases

3) Scaling (horizontal vs vertical)

Database scaling

  • use replicas first, then shard the data into separate databases. Hash-based sharding applies a hash function to each entry's key so that entries are spread evenly and can be located again.
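
A minimal sketch of hash-based sharding, assuming four shards represented by plain dictionaries. The `shard_for`, `put`, and `get` names are hypothetical; a stable hash from `hashlib` is used because Python's built-in `hash()` for strings changes between runs.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = [dict() for _ in range(4)]   # stand-ins for four separate databases

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    # The same hash locates the entry again, so no lookup table is needed.
    return shards[shard_for(key, len(shards))].get(key)
```

One caveat of the plain modulo approach: changing the number of shards remaps most keys, which is the problem consistent hashing (section 6) is meant to reduce.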

Compute Scaling

  • divide the processing into pieces and put each piece on a queue as a job, so that multiple computers can work on them in parallel.

  • both approaches may introduce some latency between calls/requests.

  • replicas improve the reliability of a system by avoiding a single point of failure.
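
The job-queue idea can be sketched on one machine, with threads standing in for the separate computers. `run_jobs` is a hypothetical helper, not a library function.

```python
import queue
import threading

def run_jobs(jobs, worker_fn, num_workers=4):
    """Process `jobs` with a pool of workers pulling from a shared queue."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()
    for job_id, payload in enumerate(jobs):
        q.put((job_id, payload))

    def worker():
        while True:
            try:
                job_id, payload = q.get_nowait()
            except queue.Empty:
                return                      # no work left
            out = worker_fn(payload)
            with lock:
                results[job_id] = out

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(jobs))]

# Split "sum the numbers 0..999" into ten chunks of 100 and combine the parts.
chunks = [range(i, i + 100) for i in range(0, 1000, 100)]
total = sum(run_jobs(chunks, sum))
```

In a real deployment the queue would be a separate service and the workers separate machines, which is where the extra latency mentioned above comes from.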


4) CAP Theorem

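  • In the real world, a distributed system cannot guarantee all three at once: when the network partitions, it has to choose between consistency and availability
  • one of the key fundamentals of distributed system design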

Consistency

  • every node in the network has access to the same data

Availability

  • even if one or more nodes are down, any client making a data request receives a response

Partition Tolerance (necessary for modern systems)

  • if the network or communication between nodes fails, the system continues to work

5) Web Authentication and Basic Security

  • It's all about the trade-offs between total safety and total convenience
  • Authentication (JWT, session tokens/cookies) verifies identity, whereas authorization decides which actions that identity may perform.
  • For instance, user passwords can be protected by salting and hashing them before storage.
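
Salting and hashing can be sketched with the standard library alone. The choice of PBKDF2 here is an assumption for illustration (the article does not name an algorithm), as are the `hash_password` and `verify_password` names and the iteration count.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware

def hash_password(password):
    """Return (salt, derived_key) for storage; the password itself is never stored."""
    salt = secrets.token_bytes(16)   # unique random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)
```

Because each user gets a different salt, two users with the same password end up with different stored keys, which defeats precomputed lookup tables.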

6) Load Balancers

  • Distributes traffic across machines, and lets servers be added or removed (for example, when one fails).
  • 3 common techniques: round-robin, least connections/response time, consistent hashing.

Round-Robin

  • sends requests to the servers in turn, one by one
  • can overload a server, because it ignores each server's current load
  • ideal when servers are stable and loads are random
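
A minimal round-robin sketch, assuming a fixed list of server names; `RoundRobinBalancer` is a hypothetical class for illustration.

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        """Return the next server in rotation, ignoring its current load."""
        return next(self._cycle)

rr = RoundRobinBalancer(["s1", "s2", "s3"])
order = [rr.pick() for _ in range(5)]
```

The rotation never looks at how busy a server is, which is why one slow request after another can pile up on the same machine.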

Least Connections/Response Time

  • ideal when servers have similar compute power and requests have varying connection times
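
A least-connections balancer can be sketched by tracking open connections per server. The class and method names below are hypothetical, and a real balancer would also need to handle health checks and concurrency.

```python
class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}   # open connections per server

    def acquire(self):
        """Pick the server with the fewest open connections and count the new one."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection closes."""
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["s1", "s2"])
a = lb.acquire()   # tie: the first server listed wins
b = lb.acquire()   # the other server, since the first now has one connection
lb.release(b)      # that connection finishes early
c = lb.acquire()   # goes back to the idle server
```

Unlike round-robin, the third request avoids the server that is still busy with a long-lived connection.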

Consistent Hashing

  • place N virtual nodes for each server on a hash ring, so that load is distributed as evenly as possible and only part of the ring is affected when a server is added or removed.
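
The hash ring with virtual nodes can be sketched as a sorted list of positions searched with `bisect`. `HashRing`, the `server#i` naming of virtual nodes, and the default of 100 virtual nodes are assumptions for this example.

```python
import bisect
import hashlib

def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # position -> server
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.vnodes):
            p = _hash(f"{server}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a server is added, the only keys that change owner are the ones that now fall just before one of its virtual nodes; every other key stays where it was.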

7) Caching

  • Reduces the latency of expensive computations, network calls, database queries, and asset fetches.
  • Popular caching patterns: cache-aside, and write-through/write-back.

Cache-aside

  • read from the cache first; if the entry is not found, fetch it from the database, then cache it.
  • cached data can become stale if the database is written to frequently. A time-to-live (TTL) bounds how stale it can get.
  • checking the cache first adds latency whenever there is a miss.
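
The cache-aside read path with a TTL can be sketched as follows. `CacheAside` and its `load_from_db` callback are hypothetical; the clock is injectable only so the expiry behaviour is easy to test.

```python
import time

class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical database read function
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                        # cache hit
        value = self._load(key)                    # miss or expired: go to the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the database so the next read reloads the value."""
        self._entries.pop(key, None)
```

Until the TTL expires or `invalidate` is called, reads keep returning the cached value even if the database has changed, which is the staleness trade-off described above.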

Write-through and write-back

  • The application writes data to the cache, and the cache updates the database either asynchronously (write-back) or synchronously (write-through)

Write-back

  • the write goes into a queue and is written to the database later.

Write-through

  • the opposite of write-back. Because the workflow is synchronous, it can slow down the whole write path.
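
The difference between the two write patterns can be sketched with a dictionary standing in for the database. Both class names are hypothetical, and in practice the write-back queue would be drained by a background worker rather than an explicit `flush` call.

```python
from collections import deque

class WriteThroughCache:
    def __init__(self, db):
        self.cache, self.db = {}, db

    def write(self, key, value):
        self.cache[key] = value
        self.db[key] = value          # synchronous: the caller waits for the database

class WriteBackCache:
    def __init__(self, db):
        self.cache, self.db = {}, db
        self.pending = deque()        # queued writes not yet in the database

    def write(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))   # returns without touching the database

    def flush(self):
        """Drain the queue into the database, oldest write first."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Write-back returns faster, but anything still in `pending` is lost if the cache crashes before it is flushed.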

  • cache eviction policy: Least Recently Used (LRU)
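
LRU can be sketched with `collections.OrderedDict`, which keeps keys in insertion order and can move a key to the end cheaply. `LRUCache` is a hypothetical class for illustration.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
```

For caching function results specifically, the standard library's `functools.lru_cache` decorator applies the same policy.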


8) Message Queues (Pub/Sub)

  • beneficial when a traffic spike could otherwise bring a server or a database down, because the queue buffers the requests.
  • a queue can deliver a request to multiple servers/systems, instead of the client sending the same request to each of them.
  • queues decouple the client from the server: the client no longer needs to know the server's address.
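
The fan-out and decoupling points can be sketched with an in-memory broker. `Broker` and its methods are hypothetical and leave out persistence, acknowledgements, and networking.

```python
from collections import defaultdict, deque

class Broker:
    def __init__(self):
        self._queues = defaultdict(dict)   # topic -> {subscriber name: deque}

    def subscribe(self, topic, name):
        self._queues[topic][name] = deque()

    def publish(self, topic, message):
        """Fan out one message to every subscriber's queue for this topic."""
        for q in self._queues[topic].values():
            q.append(message)

    def poll(self, topic, name):
        """Return the next message for a subscriber, or None if its queue is empty."""
        q = self._queues[topic][name]
        return q.popleft() if q else None
```

The publisher only knows the topic name, and each subscriber drains its own queue at its own pace, so a burst of messages waits in the queue instead of overwhelming a slow consumer.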

Common properties (based on implementations)

  • guaranteed delivery
  • no duplicate messages are delivered
  • ensure that the order of messages is maintained

9) Indexing

  • reduces how many blocks must be fetched from disk into main memory to find a record
  • can be multi-levelled
  • B-tree (self-balancing; keys are kept in sorted order across pages)
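
The benefit of an index can be sketched with a sorted list and binary search. This is a flat stand-in, not a B-tree (a real B-tree groups sorted keys into pages and adds levels on top); the `rows` table and `find_by_id` helper are hypothetical.

```python
import bisect

rows = [  # unsorted "table" stored in insertion order
    {"id": 42, "name": "Ada"},
    {"id": 7, "name": "Linus"},
    {"id": 19, "name": "Grace"},
]

# Index: sorted (key, row position) pairs, built once and searched many times.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [k for k, _ in index]

def find_by_id(target):
    """Binary-search the index, then fetch the single matching row."""
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return rows[index[i][1]]
    return None
```

Without the index, finding a row means scanning the whole table; with it, the lookup touches a logarithmic number of keys and then one row.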

10) Failover (active-passive or leader-follower)

  • replicas are used to avoid a single point of failure. They also help a system serve global users across geographical locations/regions, and increase throughput.

leaders

  • the machine that handles write requests to the data store

followers

  • replicas of the leader that handle read requests

synchronous replication

  • the leader waits for the followers to acknowledge a write before confirming it. This slows writes down, but guarantees the data has reached the followers.

asynchronous replication

  • the opposite of synchronous replication: the leader confirms a write without waiting for the followers.
  • less time-consuming, but with no guarantee that the write reached the followers.
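
The two replication modes can be contrasted in a small in-process sketch. `Leader` and `Follower` are hypothetical classes; real systems replicate over a network, where an acknowledgement can be delayed or lost.

```python
class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True                     # acknowledgement back to the leader

class Leader:
    def __init__(self, followers):
        self.data, self.followers, self.backlog = {}, followers, []

    def write_sync(self, key, value):
        """Confirm only after every follower has acknowledged the write."""
        self.data[key] = value
        return all(f.apply(key, value) for f in self.followers)

    def write_async(self, key, value):
        """Confirm immediately; followers catch up later."""
        self.data[key] = value
        self.backlog.append((key, value))
        return True

    def replicate_backlog(self):
        for key, value in self.backlog:
            for f in self.followers:
                f.apply(key, value)
        self.backlog.clear()
```

Between `write_async` returning and `replicate_backlog` running, a read from a follower would miss the new value, and a leader crash in that window would lose it.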

  • most common types of replication systems: single-leader and multi-leader (multiple machines can handle writes, but each needs to catch up with the writes made on the others to stay consistent)

  • to resolve concurrent write conflicts:

    • keep the update with the largest client timestamp (last write wins)
    • sticky routing: writes from the same client go to the same leader
    • keep all the conflicting updates and return all of them, so the reader can merge them
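
Two of these strategies can be sketched as small functions over a list of concurrent writes. The `(timestamp, client_id, value)` tuple layout and both function names are assumptions made for illustration.

```python
def last_write_wins(versions):
    """Pick one winner from concurrent writes given as (timestamp, client_id, value).

    Ties on timestamp are broken by client_id so every replica picks the same winner.
    """
    return max(versions, key=lambda v: (v[0], v[1]))

def keep_siblings(versions):
    """Alternative: keep every conflicting value and let the reader merge them."""
    return sorted(v[2] for v in versions)

writes = [(1700000005, "client-b", "blue"), (1700000009, "client-a", "red")]
winner = last_write_wins(writes)
siblings = keep_siblings(writes)
```

Last write wins is simple but silently discards the losing update and depends on client clocks being reasonably in sync; keeping siblings loses nothing but pushes the merge onto the reader.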
