limjy

Posted on May 29, 2021 • Edited on May 31, 2021

System Design

System Design study guide / interview prep for me

Content here is not original. It is a summary of the following sources:

freeCodeCamp - System Design Interview Question (really comprehensive, 20/10 would recommend giving it a read)

IGotAnOffer - Database: system design interview concepts
freeCodeCamp - System Design interview concepts
medium - 25 java software design questions

HTTP Methods

GET
- retrieve resources
POST
- submit new data
PUT
- update / replace existing data
DELETE
- removes data
PATCH
- partially modify a resource. the request contains only the changes, not the whole resource
OPTIONS
- request the communication methods available for a resource
HEAD
- request resource metadata without retrieving the full resource representation
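
The difference between PUT (replace) and PATCH (partial change) is easiest to see on a small example. The sketch below is not a real HTTP server: `store`, `put` and `patch` are hypothetical names for an in-memory stand-in, used only to illustrate the semantics described above.

```python
# Hypothetical in-memory store keyed by resource id; not a real HTTP server.
store = {1: {"name": "Ada", "email": "ada@example.com"}}


def put(resource_id, body):
    """PUT: replace the whole resource with the request body."""
    store[resource_id] = dict(body)
    return store[resource_id]


def patch(resource_id, changes):
    """PATCH: apply only the supplied fields and keep the rest."""
    store[resource_id].update(changes)
    return store[resource_id]


patch(1, {"email": "ada@new.example"})  # name is kept, email is changed
after_patch = dict(store[1])

put(1, {"name": "Grace"})               # whole resource replaced, email is gone
after_put = dict(store[1])
```

After the PATCH the resource still has both fields. After the PUT it contains only what the request body carried.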

reference: SO & stackprint

Standard Error Codes

400 Bad Request
- client-side input fails validation
401 Unauthorized
- user is not authenticated
403 Forbidden
- user is authenticated but not privileged to access the resource
404 Not Found
- resource not found
500 Internal Server Error
- generic server error. shouldn't be thrown explicitly.
502 Bad Gateway
- invalid response from upstream server
503 Service Unavailable
- something unexpected happened on the server side (server overload, some parts fail)

reference: SO
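
As a rough illustration of how the client-error codes relate to each other, here is a minimal sketch that picks a status for a request. `pick_status` and its boolean parameters are hypothetical, and the order of the checks (authentication, then authorization, then validation, then lookup) is an assumption; real APIs may order them differently.

```python
from http import HTTPStatus


def pick_status(authenticated, authorized, valid_input, found):
    """Return the first status whose condition applies (checks are ordered)."""
    if not authenticated:
        return HTTPStatus.UNAUTHORIZED   # 401: we don't know who the user is
    if not authorized:
        return HTTPStatus.FORBIDDEN      # 403: known user, not allowed
    if not valid_input:
        return HTTPStatus.BAD_REQUEST    # 400: input failed validation
    if not found:
        return HTTPStatus.NOT_FOUND      # 404: nothing at this address
    return HTTPStatus.OK


status = pick_status(authenticated=True, authorized=False,
                     valid_input=True, found=True)
print(status.value, status.phrase)
```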

Storage: Memory VS Disk

Disk is persistent. (when power is off, data will "persist").
RAM is transient. When power is lost, memory is wiped.

If data is useful only during a server session, keep it in memory.

Keeping it in RAM is much faster and less expensive than writing to a persistent database.

Latency VS Throughput

Latency: duration for action to complete / produce result
Throughput: maximum capacity of system / work done in unit time.

Latency:
- load site fast & smoothly
- fast lookups
- avoid pinging distant servers.
Throughput
- can improve by reducing bottleneck
- system will only be as fast as its slowest bottleneck
- increase by improving slowest bottleneck
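
To make the bottleneck idea concrete, here is a small sketch with made-up numbers. The stage names, latencies and capacities are all assumptions chosen for illustration: latency adds up across stages, while throughput is capped by the slowest stage.

```python
# Hypothetical pipeline: (name, latency in seconds per request,
# requests the stage can process per second). The numbers are made up.
stages = [
    ("load balancer", 0.001, 50_000),
    ("app server", 0.020, 2_000),
    ("database", 0.050, 500),
]

# Latency adds up: a request has to pass through every stage.
total_latency = sum(latency for _, latency, _ in stages)

# Throughput is capped by the slowest stage (the bottleneck).
bottleneck = min(stages, key=lambda stage: stage[2])
max_throughput = bottleneck[2]

print(f"latency per request: {total_latency * 1000:.0f} ms")
print(f"max throughput: {max_throughput} req/s, limited by {bottleneck[0]}")
```

Speeding up the load balancer or app server here would not raise throughput at all; only improving the database stage would.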

Hardware definition from Comparitech:

Latency: time taken for a packet to be transferred across the network
Throughput: quantity of data being sent & received per unit time

Scaling: Vertical VS Horizontal

Vertical Scaling: adding compute (CPU), memory (RAM) and storage (disk, SSD) resources to a single computer.
Horizontal Scaling: adding more computers to a cluster

Vertical
- fairly straightforward
- much lower overall memory capacity
Horizontal
- higher overall compute and storage capacity
- sized dynamically without downtime
- the most popular DB model (relational) is hard to scale horizontally.

Availability: Intro

Availability is like the resiliency of a system. A fault-tolerant system makes an available system.

Uptime
Availability is measured using uptime: the percentage of time the system's primary function is available in a given window of time.

Service Level Agreement / Assurance
A set of guaranteed service level metrics.
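
An uptime percentage, such as one guaranteed in an SLA, translates directly into a downtime budget. The helper below is a minimal sketch; `allowed_downtime_minutes` is a hypothetical name and the one-year window is an assumption.

```python
def allowed_downtime_minutes(uptime_percent, window_days=365):
    """Minutes the system may be down in the window and still meet the target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime -> {minutes:.1f} min of downtime per year")
```

For example, 99.9% uptime over a year leaves about 525.6 minutes, a little under 9 hours, of allowed downtime.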

How to achieve High Availability

Reduce or eliminate single points of failure
This is done by designing redundancy into the system.

E.g. two or more servers providing the same service.
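
Why redundancy helps can be shown with a quick calculation. This sketch assumes that replicas fail independently of each other, which is often not true in practice (shared power, network or bugs), so treat the result as an upper bound. `combined_availability` is a hypothetical helper.

```python
def combined_availability(single_availability, replicas):
    """Availability of redundant servers, assuming independent failures.

    The service is down only when every replica is down at the same time.
    """
    return 1 - (1 - single_availability) ** replicas


for n in (1, 2, 3):
    print(f"{n} server(s) at 99% each -> {combined_availability(0.99, n):.6f}")
```

Under that assumption, two servers that are each available 99% of the time give a combined availability of about 99.99%.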

Why Caching

Caching helps reduce latency of system.

app can perform faster
- faster to retrieve data from memory than disk

Works best when

storing static / infrequently changing data

source of change is single operations rather than user-generated operations

If data consistency & freshness are critical, caching may not be optimal unless the cache is refreshed frequently (and refreshing the cache frequently may not be ideal either).

Caching: Use Case

when a certain piece of data is used often

backend does computationally intensive work (a cache hit reduces the lookup to O(1))

server makes multiple network requests & API calls
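
A minimal sketch of caching an expensive call in memory, using Python's built-in `functools.lru_cache`. `expensive_report` is a hypothetical stand-in for heavy computation or a remote API call, and the `calls` counter exists only to show how often the slow path runs.

```python
import time
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def expensive_report(user_id):
    """Stand-in for heavy computation or a remote API call (hypothetical)."""
    global calls
    calls += 1
    time.sleep(0.01)  # simulate slow work
    return f"report for user {user_id}"


expensive_report(42)  # miss: runs the slow function
expensive_report(42)  # hit: answered from memory, slow path is skipped
print(calls, expensive_report.cache_info())
```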

Example

CDN (content delivery network)
- caches content (images / videos/ webpages) in proxy server located closer to end user than origin server
- can deliver content more quickly.

Caching Strategy

deals with the "write" operations of cache.

keep cache & DB in sync

should it be done synchronously or asynchronously?

data eviction - Which old records to "kick out" for new data?
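
One common answer to the eviction question is least recently used (LRU): when the cache is full, drop the entry that has gone unused the longest. The class below is a minimal sketch built on `collections.OrderedDict`; `LRUCache` and its method names are illustrative, not taken from any of the sources.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # cache is full, so "b" is evicted
```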

Caching Strategies

LIFO
FIFO
LRU
LFU

Proxy: basics

proxy: basically a middleman between client & origin server.

There are 2 kinds of proxies

forward
reverse

Forward

client computer makes requests to sites on the internet
proxy server intercepts the requests
proxy communicates with the web server on behalf of the client

Reverse

client sends a request to the origin server of a website
request is intercepted at the network edge by the reverse proxy server
reverse proxy sends the request to & receives the response from the origin server

Forward VS Reverse Proxy

How proxies work

Forward: sits in front of the client, so no origin server communicates directly with that client. The server has no knowledge of the client.
Reverse: sits in front of the origin server, so no client communicates directly with the origin server. The client has no knowledge of the server.

Forward proxy
- avoid company browsing restrictions
  - clients connect to the proxy rather than directly to the sites they are visiting
- block access to certain content
  - e.g. a school network configured to connect to the web via a proxy that blocks certain content
- privacy
  - IP address of the client is harder to trace, only the IP address of the proxy server is visible
Reverse proxy
- load balancing
  - distribute incoming traffic among different origin servers
- protection from attacks
  - web site / service never needs to reveal the IP address of the origin server
  - harder to mount a targeted attack against the origin server (e.g. DDoS). Attackers can only target the reverse proxy, which will have tighter security & more resources.
- Global Server Load Balancing
  - website distributed on several servers around the globe
  - reverse proxy sends clients to the server geographically closest to them
- caching
  - reverse proxy can cache content -> faster performance
- SSL encryption
  - reverse proxy can decrypt all incoming requests & encrypt all outgoing responses
  - frees up valuable resources on the origin server

Reference: cloudflare

Why Load balancing

Maintain availability & throughput by distributing incoming request load across multiple servers.

When a server gets lots of requests it can:

slow down (throughput reduces, latency increases)
fail (no availability)

Load balancing helps by distributing incoming traffic among origin servers.

Load Balancing Strategies

Probability server selection

Round Robin
- loop through servers in fixed sequence
Weighted Round Robin
- assign different weights / probabilities to each server.
- traffic split up according to weights
Load-based server selection
- monitor current capacity / performance of servers
- send request to server with highest throughput / lowest latency.
IP Hashing based selection
- hash IP address of incoming request
- use hash value to allocate server
Path / Service based selection
- route requests based on path / service provided.
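
Three of the strategies above can be sketched in a few lines. The server names and weights are made up, and `pick_by_ip` is a hypothetical helper. The weighted variant here simply repeats a server in the rotation according to its weight, which is one simple way to realize weighted round robin rather than the only one.

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]  # hypothetical server names

# Round robin: loop through the servers in a fixed sequence.
round_robin = itertools.cycle(servers)

# Weighted round robin: a server with weight 2 appears twice per cycle.
weights = {"app-1": 2, "app-2": 1, "app-3": 1}  # made-up weights
weighted = itertools.cycle(
    [name for name, weight in weights.items() for _ in range(weight)]
)


def pick_by_ip(client_ip):
    """IP hashing: the same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


first_four = [next(round_robin) for _ in range(4)]
one_weighted_cycle = [next(weighted) for _ in range(4)]
print(first_four)
print(one_weighted_cycle)
print(pick_by_ip("203.0.113.7"))
```

Note that with plain modulo hashing, adding or removing a server remaps most clients to a different server.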

CAP Theorem

any distributed DB can only satisfy two of three properties

Consistency: every node responds with the most recent version of the data

Availability: any node can send a response

Partition tolerance: system continues working even if communication between any of the nodes is broken

A DB is usually a CP or AP database. Since the stability of the network cannot be guaranteed, P is non-negotiable.

Relational Database

A database using a relational data model organizes data in tables, with rows of data entries and columns of predetermined data types.

Use when

many-to-many relationships between entries
data needs to follow a predetermined schema
consistent transactions are important
relationships between data always need to be accurate

ACID properties

Atomicity
- transaction is atomic / the smallest unit of work.
- all instructions in a transaction will execute, or none at all
Consistency
- if the DB is initially consistent, it should remain consistent after every transaction
- e.g. a write operation (transfer money from A to B) failed & the transaction was not rolled back. DB is inconsistent because the total amount of money held by A & B (A+B) is not equal before & after the transaction
Isolation
- when multiple transactions run concurrently, they should not be affected by each other
- result should be the same as if the transactions ran sequentially.
Durability
- changes committed to the DB should remain even in case of software & system failure.
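
The money-transfer example can be tried with Python's built-in `sqlite3` module. This is a minimal sketch: the `accounts` table, the `transfer` helper and the balances are made up, and the credit is deliberately written before the debit so that a failing debit leaves a partial write for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money in one transaction: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def total():
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


ok = transfer("A", "B", 30)       # succeeds
failed = transfer("A", "B", 500)  # debit violates the CHECK, credit is rolled back
print(ok, failed, total())
```

The second transfer fails on the debit, and the rollback also undoes the credit that had already been applied, so A+B is the same before and after.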

Reference educative & tutorialspoint

Non-relational Database

also known as NoSQL database.
at its core, the DB holds data in a hash-table-like structure.
Use cases: caching, environment variables, configuration files / session state
Use environment: in memory & persistent storage

Since their structure is like a hashtable, there is minimal overhead

extremely fast
simple & easy to use

NoSQL BASE properties

Basically available: system guarantees availability
Soft state: state of the system may change over time, even without input
Eventual consistency: system will become consistent over a (usually short) period of time, as long as no new inputs are received

Types of NoSQL DB

Graph database
- many-to-many relationships
- fast at following graph edges
Document Store
- isolated documents, retrieve by a key
- documents with different schemas that are easy to update
- easy to scale
Key-value store
- like a very large hashtable
- opaque values (DB has no notion of what is stored in value, only provides read, overwrite and delete operations)
- simple operations (no schemas, joins or indices)
- minimal overhead
- easy to scale
- suitable for caching implementations
Column-family DB
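
To illustrate the "very large hashtable with opaque values" idea behind a key-value store, here is a minimal in-memory sketch. `KeyValueStore` is a hypothetical class, not the API of any real product. It offers only read, overwrite and delete, and never looks inside the value.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with opaque byte values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value       # write or overwrite

    def get(self, key):
        return self._data.get(key)    # None if the key is missing

    def delete(self, key):
        self._data.pop(key, None)     # deleting a missing key is a no-op


sessions = KeyValueStore()
sessions.put("session:abc", b'{"user": 42}')  # the store does not parse this
value = sessions.get("session:abc")
sessions.delete("session:abc")
```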

NoSQL VS Relational

NoSQL
- dynamic schema
  - devs can use "unstructured data" and build an application without defining a schema
- scaling
  - scales horizontally over cheap commodity servers
- simple operations
  - data retrieval is simple
- cheap hardware
  - app deployed to commodity hardware like public clouds
SQL
- workload volume is consistent
- ACID guarantees required
- data is predictable & highly structured
- data best expressed relationally
- write safety required
- app deployed to large high-end hardware

https://www.mongodb.com/scale/nosql-vs-relational-databases
https://www.mongodb.com/nosql-explained/when-to-use-nosql
https://docs.microsoft.com/en-us/dotnet/architecture/cloud-native/relational-vs-nosql-data#considerations-for-relational-vs-nosql-systems

Leader Election: Basics

When a system scales horizontally, some tasks require precise coordination between nodes. In that case a leader node directs the follower nodes.

Leader Election algo

how a cluster of nodes without a leader communicate with each other & choose one to be the leader.

algo is executed when the cluster starts or when the leader node goes down.
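
A heavily simplified sketch of the idea: the live node with the highest id becomes leader, and the election is re-run when the leader goes down. This rule is borrowed from the bully algorithm and is swapped in for illustration; the sources do not prescribe it. `Node` and `elect_leader` are hypothetical, and the sketch ignores messaging, timeouts and network partitions, which is where real implementations get hard.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    alive: bool = True


def elect_leader(nodes):
    """Pick the live node with the highest id, or None if no node is alive."""
    live = [node for node in nodes if node.alive]
    return max(live, key=lambda node: node.node_id, default=None)


cluster = [Node(1), Node(2), Node(3)]
leader = elect_leader(cluster)      # node 3 when the cluster starts
leader.alive = False                # the leader goes down
new_leader = elect_leader(cluster)  # election runs again, node 2 takes over
```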

Use Case

Any node can be leader, no single point of failure required to coordinate system
System doing complex work that needs good coordination
- e.g. computing how a protein folds. the cluster needs a leader node to assign each node a different part to work on, then add the results together
System executes many distributed writes to data & requires strong consistency
- no matter which node handles request user will always have most up-to-date version of data.
- leader creates consistency by being source of truth on what the most recent state of system is.

Drawbacks

split brain
- bad implementation -> 2 nodes controlling the system
single point of failure / bottleneck
if the leader starts making bad decisions, the entire system will follow

Reference: iGotAnOffer

Message Queue (MQ) Benefits

Resilience
- app-specific faults won't impact the whole system
- if one component fails, all others can continue interacting with the queue, processing / producing messages.
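
The resilience point can be sketched with Python's in-process `queue.Queue` standing in for a real message broker. `flaky_consumer` and `healthy_consumer` are hypothetical, and handing the message back before crashing is a simplification of what brokers do with unacknowledged messages.

```python
import queue

messages = queue.Queue()

# The producer keeps publishing without knowing who consumes.
for order_id in range(5):
    messages.put(order_id)


def flaky_consumer(q):
    """Hypothetical consumer that crashes on its first message."""
    item = q.get()
    q.put(item)  # hand the message back before failing (simplified requeue)
    raise RuntimeError("consumer crashed")


def healthy_consumer(q):
    """Drain whatever is on the queue."""
    handled = []
    while not q.empty():
        handled.append(q.get())
    return handled


try:
    flaky_consumer(messages)
except RuntimeError:
    pass  # one component failed; the queue and its messages are still there

handled = healthy_consumer(messages)
print(sorted(handled))
```

The crashed consumer does not take the queue down with it, and the other consumer still processes every message.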

Reference: IBM
