
Oussama Belhadi

Software Engineering/Architecture in a Nutshell: A Four-Year Journey

Whether you're a software engineer, system architect, or a student, system design is a crucial skill for building robust, scalable, and reliable applications. It's the art of creating a blueprint for a software system that meets specific requirements. This guide will walk you through the fundamental principles, common architectures, and essential concepts you'll need to master to design and build systems that can stand the test of time.

Here are the concepts I'll cover in this blog:

  1. Diagram Types in System Design
  2. Production App Architecture
  3. The Pillars of System Design
  4. The Heart of System Design
  5. CAP Theorem (Brewer’s Theorem)
  6. Building Resilience into a System
  7. Measuring Speed
  8. Network Basics
  9. API Design & Best Practices
  10. Scaling & Performance Strategies

This document is your one-stop resource for understanding everything from the core pillars of a well-designed system to practical strategies like caching, load balancing, and network protocols. Let's dive in.

I'll start with the part I find most interesting: diagrams.
Diagrams help you see a zoomed-out picture of a system and visualize the foggy parts when everything is so interconnected.

Diagram Types in System Design


When building any system, from a simple app to a large-scale enterprise platform, it's essential to have a clear blueprint. This blueprint is created using various types of diagrams that help visualize the architecture at different levels of detail. Diagrams range from a broad overview to a granular, component-specific view.

High-Level Diagrams

These diagrams provide a zoomed-out view of the system, focusing on its main components and how they interact with each other and the outside world.

  • Context Diagram: This is a very high-level diagram that shows the system as a single black box. Its purpose is to show how the system interacts with external entities (users, other systems, etc.). It helps define the scope of the system.
  • Container Diagram: A step down in detail from a context diagram, this view shows the major containers or applications within the system (e.g., a web application, a database, a mobile app). It focuses on how these applications work together to deliver the system's functionality.

Low-Level Diagrams

These diagrams provide a detailed, zoomed-in view of the system's internal workings, showing how specific parts are structured and how they behave.

  • Component Diagram: This diagram shows the smaller, modular parts within a container and their dependencies.
  • Class Diagram: This type of diagram illustrates the static structure of a system by showing the different classes, their attributes, methods, and relationships. It's used to define the system's blueprint.

  • Sequence Diagram: A sequence diagram shows the dynamic behavior of a system by illustrating the order of events and the interactions between objects or components over time. It's great for visualizing a specific use case or flow.

  • ERD (Entity-Relationship Diagram): This diagram is a blueprint of the database. It shows the data entities (tables), their attributes (columns), and the relationships between them. It is crucial for understanding how data is stored and connected.

The C4 Model: A Framework for Diagrams

The C4 model is a popular framework for visualizing software architecture that organizes these different diagrams into a logical hierarchy. It stands for Context, Containers, Components, and Code. Each "C" represents a specific level of abstraction, or "zoom level," designed for a different audience.

  • Context (Level 1): The highest level, matching the Context Diagram.
  • Containers (Level 2): Zooms in to show the major applications and services, matching the Container Diagram.
  • Components (Level 3): Breaks down a single container into its internal parts, matching the Component Diagram.
  • Code (Level 4): The lowest level, showing the actual code structure (e.g., classes and their relationships), which aligns with the Class and Sequence Diagrams.

Production App Architecture example


I'll skip the parts most people already know, namely CI/CD with GitHub Actions or Jenkins,

and move on to implementing a load balancer like Nginx to handle user requests, ensuring reliable storage, and using external logging and monitoring services like Sentry or PM2. These send alerts to an alerting service that surfaces them to connected users and pushes immediate messages to the developers through a communication channel like Slack, so action can be taken immediately to fix the issue.
The first thing developers will look at is the logs, so that's where the search for the root of the problem begins:

The golden rule is to never debug in a production environment; instead, create a staging/testing environment so that users aren't affected by the debugging process.

The Pillars of System Design

  • Scalability : how the system handles growth.
  • Maintainability : ensuring future developers can understand and build on top of the system.
  • Efficiency : making the best use of existing resources.
  • Reliability : planning for failure and handling it smoothly, instead of only designing for the best-case scenario (maintaining composure when things go wrong).

The Heart of System Design

  • Moving Data : ensuring that data can flow seamlessly from one part of the system to another, whether it's user requests reaching the servers or transfers between databases.
  • Storing Data : choosing between relational and non-relational databases and having a deep understanding of the following concepts :
    • access patterns
    • indexing strategies
    • backup solutions

Ensure that the data is not just stored securely but readily available when needed.

  • Transforming Data : taking raw data and turning it into meaningful information, for example aggregating log files for analysis or converting user input into a structured format like JSON.

CAP Theorem or Brewer’s Theorem

The CAP theorem is a set of principles that guide us in making informed trade-offs between three key properties of a distributed system :

  • Consistency : ensures that all nodes of the system have the same data at the same time.

  • Availability : the system should be operational and available to serve requests 24/7, regardless of what happens behind the scenes.

The SLO (Service Level Objective) is the internal target for the service, what's expected of it; the SLA (Service Level Agreement) is the agreement with users about expected uptime, what we commit to provide. For example, if we commit to 99.9% availability and drop below that, we might have to provide refunds or other compensation.
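To make those availability numbers concrete, here's a quick Python sketch of how an uptime target translates into allowed downtime per year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime in hours per year for a given availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

print(round(downtime_per_year(99.9), 2))    # "three nines": ~8.76 hours/year
print(round(downtime_per_year(99.99), 2))   # "four nines": ~0.88 hours/year
```

Every extra "nine" of availability cuts the allowed downtime by a factor of ten, which is why each one is dramatically more expensive to deliver.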

  • Partition Tolerance : the system's ability to continue functioning even when a network partition occurs, meaning that if communication between nodes is disrupted, the system still works.

According to the CAP theorem, a distributed system can only guarantee two of these properties at the same time. If we prioritize Consistency and Partition Tolerance, we may have to compromise on Availability, and vice versa.

Building Resilience into a System

Building resilience into a system means expecting the unexpected: building in redundancy so that there's always a backup ready to take over in case of failure.

Measuring Speed: Throughput & Latency

There are two key measures here: throughput and latency.

Throughput : how much data a system can handle in a given time frame.

  • Server throughput - measured in RPS (requests per second): how many requests our system can take per second; the higher the better.
  • DB throughput - measured in QPS (queries per second): how many queries per second the system can handle.
  • Data throughput - measured in bits/s: the amount of data transferred over the network or processed by the system in a given period of time.

Latency : the amount of time it takes to process a single request.

Optimizing for one can often lead to sacrifices in the other.
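Here's a small Python sketch of the difference, simulating a request handler and measuring both metrics (the 1 ms of work per request is an arbitrary stand-in):

```python
import time

def handle_request():
    time.sleep(0.001)  # simulate 1 ms of work per request

n = 50
start = time.perf_counter()
for _ in range(n):
    handle_request()
elapsed = time.perf_counter() - start

latency_ms = elapsed / n * 1000  # average time to serve one request
throughput_rps = n / elapsed     # requests handled per second
print(f"latency ~ {latency_ms:.2f} ms, throughput ~ {throughput_rps:.0f} RPS")
```

Note how the two are linked for a single sequential worker: throughput is roughly 1/latency. Real servers raise throughput by handling requests concurrently, which is exactly where the two metrics start to trade off.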

Designing a system poorly can lead to all sorts of issues, from performance bottlenecks to security vulnerabilities. If you think refactoring code is hard, redesigning a system is ten times harder (a monumental task), so getting the design right lays the foundation to support future features and user growth.

Network Basics

We are basically discussing how computers communicate with each other.

IP Layer :


Every device on the internet is identified by an IP address. Today most traffic still uses IPv4, which is based on 32 bits and offers about 4.3 billion unique addresses, but with the growing number of devices in the world, migration to IPv6 is necessary. IPv6 uses 128 bits, increasing the address space to roughly 3.4 × 10^38 (340 undecillion) unique addresses.
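Python's standard `ipaddress` module makes the size difference easy to verify:

```python
import ipaddress

# Total addresses in each address space.
ipv4_total = ipaddress.IPv4Network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.IPv6Network("::/0").num_addresses       # 2**128

print(f"IPv4: {ipv4_total:,} addresses")   # ~4.3 billion
print(f"IPv6: {ipv6_total:.3e} addresses") # ~3.4e38
```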

When devices communicate with each other, they send packets of data. Each packet contains an IP header with metadata, like the sender's and receiver's IP addresses, ensuring the data reaches the right destination. This process is governed by the Internet Protocol (IP), a set of rules that defines how data is sent and received.

Application Layer :

This is where data related to the application protocol lives. The data in this part of the packet is formatted according to a specific application protocol, like HTTP for web browsing, so it can be interpreted correctly by the receiving device.

Transport Layer :

This is where TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) come into play.

  • TCP ensures reliable communication, like a delivery guy who makes sure the package reaches the right destination AND that nothing is missing. Each data packet also includes a TCP header with important information like the ports and control flags needed for managing the connection and data flow.

TCP is known for its reliability and its famous 3-way handshake (SYN, SYN-ACK, ACK), which establishes a connection between two devices in three steps.

  • UDP is faster but less reliable than TCP: it doesn't establish a connection before sending data, nor does it guarantee delivery or ordering of packets. That makes it preferable for time-sensitive communication like video calls or live streaming, where speed is crucial and some data loss is acceptable.
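A minimal sketch of UDP's connectionless style using Python's standard `socket` module, sending a datagram over the loopback interface (a TCP socket, by contrast, would need a connect/accept handshake before any data could flow):

```python
import socket

# UDP is connectionless: no handshake, just fire a datagram at a port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-001", ("127.0.0.1", port))

data, addr = receiver.recvfrom(1024)
print(data)  # b'frame-001'

sender.close()
receiver.close()
```

Nothing here confirms delivery; if the datagram were lost, the sender would never know, which is exactly the trade-off that makes UDP fast.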

To tie all these concepts together we have DNS.

  • DNS (Domain Name System) is basically the internet's phone book, translating human-friendly domain names into IP addresses. When you enter a domain in the browser, the browser sends a DNS query to look up the corresponding IP address, allowing it to establish the connection and retrieve the web page.

The functioning of DNS is overseen by

  • ICANN (Internet Corporation for Assigned Names and Numbers), which coordinates the global IP address space and domain name system. Domain name registrars like Namecheap or GoDaddy are accredited by ICANN to sell domain names to the public.

DNS uses different types of records, such as:

  • A Records : map a domain to its corresponding IPv4 address, ensuring that requests reach the correct destination.
  • AAAA Records : do the same thing but map domains to IPv6 addresses.
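A quick illustration of name resolution with Python's `socket` module. `localhost` resolves locally, so this snippet runs without network access; a public domain like `example.com` would trigger a real DNS query for its A/AAAA records:

```python
import socket

# Resolve a hostname to its IP addresses (A -> IPv4, AAAA -> IPv6).
infos = socket.getaddrinfo("localhost", None)
addresses = sorted({info[4][0] for info in infos})
print(addresses)  # typically includes '127.0.0.1' and/or '::1'
```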

Networking Basic Infrastructure :

Public IP Addresses : Unique across the internet

Private IP Addresses : Unique only within the network

An IP address can be static, permanently assigned to a device, or dynamic, changing over time (commonly used in residential networks, cafes, and public spaces).

Devices within a Local Area Network can communicate with each other directly, and usually a firewall is used to monitor and filter the incoming and outgoing packets of that local network.

Device services are identified by ports, which, combined with an IP address, form a unique identifier for a network service.

some ports are reserved for specific services like :

  • HTTP - 80
  • HTTPS - 443
  • MySQL - 3306
  • SSH - 22

HTTP is a request-response protocol with no memory: it's stateless, so the server doesn't have to store any request context. Each POST/GET request contains all the necessary information, like the request URL, the request method, headers, and so on.

Each status code represents a specific state of the request.

Each request has a method: GET, POST, PUT, DELETE (mapping to CRUD).
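Python's standard `http` module ships the full table of status codes, which is handy for checking what each one means:

```python
from http import HTTPStatus

# Each status code maps to a well-defined request state.
for code in (200, 201, 301, 404, 500):
    print(code, HTTPStatus(code).phrase)
# 200 OK / 201 Created / 301 Moved Permanently / 404 Not Found / 500 Internal Server Error
```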

Web Protocols

  • HTTP is a one-way connection; if we want a real-time, two-way connection (live updates) we use WebSockets.

  • WebSockets provide a two-way communication channel over a single long-lived connection, allowing servers to push real-time updates to clients, as in chat applications, without the overhead of repeated HTTP request-response cycles.

Email Related Protocols :

  • SMTP : the standard protocol for sending emails.
  • IMAP (Internet Message Access Protocol) : used to read emails directly on the server.
  • POP3 : used for downloading emails from a server to a local client.

File Transfer Protocols :

  • FTP : transferring files over the internet.
  • SSH (Secure Shell) : command-line login and file transfer; used to operate network services securely over an unsecured network, e.g., logging into a remote machine to execute commands or transfer files.

Real-time Communication Protocols :

WebRTC : enables browser-to-browser applications for voice calling, video chat, and file sharing without internal or external plugins; essential for live streaming and video conferencing.

MQTT (Message Queuing Telemetry Transport) : a lightweight messaging protocol, ideal for devices with limited processing power (IoT devices).

RPC (Remote Procedure Call) : a protocol that allows a program on one computer to execute code on a server or another machine. It lets a function be invoked as if it were a local call while the code actually runs on a remote machine.

In web services, HTTP requests can carry RPC calls, executing a specific piece of functionality on behalf of a client.

API Design

In API design we are concerned with defining :

  • the inputs (what the user enters), like the product details a seller provides for a new product.
  • the outputs: the information returned when someone queries the API for a product.

The focus is mainly on how CRUD operations are exposed to the user interface.

There are different types of APIs, like GraphQL, REST, and gRPC, each with its own set of principles, protocols, and standards. The most widely used is REST, which stands for REpresentational State Transfer.

gRPC stands for Google Remote Procedure Call. The image compares the pros and cons of each type.

The way an API is structured can vary widely from use case to use case, but there's an ideal format to follow. Here's an example of fetching the orders for a specific user in an e-commerce application:

Common queries following best practices can also include limits and pagination, like so:
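As a sketch (the endpoint path and parameter names here are illustrative, not a fixed standard), such a paginated query URL could be built like this:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
user_id = 42
params = urlencode({"limit": 20, "offset": 40})
url = f"/v1/users/{user_id}/orders?{params}"
print(url)  # /v1/users/42/orders?limit=20&offset=40
```

Here `limit` caps the page size and `offset` skips the rows already seen, so the client pages through results instead of fetching everything at once.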

A GET request must be idempotent, meaning repeating it produces the same result and causes no side effects; a GET request must never cause changes, that's what POST requests are for.

When modifying endpoints, it's important to maintain backward compatibility through versioning. That means ensuring changes don't break existing clients; a common practice is to introduce new versions, so the version 1 API can still serve old clients while the version 2 API serves new ones.

Another best practice is to integrate a rate limiter to help prevent DoS/DDoS attacks: it caps the number of requests a user can make within a certain time frame. It's also common practice to set CORS (Cross-Origin Resource Sharing) rules to only accept requests from specific domains, preventing unwanted cross-site interactions.
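As a rough illustration of the idea, here's a minimal fixed-window rate limiter in Python; production systems typically use more refined variants like sliding windows or token buckets, often backed by Redis:

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds, per client."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # client -> [window_start_time, request_count]
        self.counts = defaultdict(lambda: [float("-inf"), 0])

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        start, count = self.counts[client_id]
        if now - start >= self.window:
            self.counts[client_id] = [now, 1]  # new window: reset the counter
            return True
        if count < self.limit:
            self.counts[client_id][1] += 1
            return True
        return False  # over the limit: reject (e.g. respond with HTTP 429)

limiter = FixedWindowRateLimiter(limit=3, window=60)
results = [limiter.allow("1.2.3.4") for _ in range(5)]
print(results)  # [True, True, True, False, False]
```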

Now imagine a company hosting a website on a server in a Google Cloud data center in Finland. The response time for users across Europe might be around 100 ms, but for users in the US, Africa, or Mexico the response time could stretch to 4000 ms or more.

Fortunately, there are strategies to minimize this request latency for users who are far away:

Caching and CDNs

Caching is a technique used to improve the performance and efficiency of a system. It involves storing a copy of data in temporary storage so that future requests for that data can be served faster. There are four common places where a cache can be stored:

  1. Browser Caching;

The browser stores website resources (usually the bundled HTML, CSS, and JS files) on the user's local computer, typically in a dedicated cache directory managed by the browser. When the user revisits the site, the browser can load from the cache rather than fetching everything from the server again. A Cache-Control header tells the browser how long the content should be cached; you can inspect it in the developer tools.

A cache hit means the data was looked up in the cache and found; a cache miss means it was looked up and not found.

The higher the cache hit ratio, the better the cache is performing.
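The hit ratio is simply hits divided by total lookups; the numbers below are made up for illustration:

```python
# Out of 1000 lookups, 920 were served from the cache.
hits, misses = 920, 80
hit_ratio = hits / (hits + misses)
print(f"cache hit ratio: {hit_ratio:.0%}")  # cache hit ratio: 92%
```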

  2. Server Caching;

This involves storing frequently used data on the server side, reducing expensive operations like querying the database. The cached data is usually stored on the application server or a separate cache server, either in memory (like Redis) or on disk.

There are several ways to deal with the caching on the server side :

  • Write-Around Cache; writes go directly to the database, bypassing the cache. On a read, the server checks the cache first; on a miss it queries the database and stores the result in the cache for next time.

  • Write-Through Cache; data is written to the cache and the database simultaneously, keeping the two consistent, though it can be slower than write-around.

  • Write-Back Cache; data is written to the cache first and persisted to permanent storage later. It's fast, but the cache can fill up quickly this way, which is why caches need eviction policies.

Eviction Policies : these decide which entries to discard when the cache is full, e.g., LRU (Least Recently Used), LFU (Least Frequently Used), or FIFO (First In, First Out).
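As an illustration, here's a minimal LRU cache built on `OrderedDict` (Python's own `functools.lru_cache` implements the same policy for function results):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the Least Recently Used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" becomes least recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # None, "b" was evicted
print(cache.get("a"))  # 1
```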

  3. Database Caching;

This is another crucial concept: storing query results to improve the performance of database-driven applications. It's often done within the database system itself or via an external caching layer like Redis or Memcached. When a query is made, we check the cache first to see if its result has already been stored; if so, we return the cached data, otherwise we query the database and store the result in the cache for future use. This is especially beneficial for read-heavy applications, and the same eviction policies used for server-side caching apply here too.

  4. CDNs;

CDNs are basically networks of servers distributed geographically. They are usually used to serve static content like HTML, CSS, and JavaScript files, images, and other assets.

They cache content from the origin server and deliver it to end users from the nearest CDN server. There are two types of CDNs: pull-based and push-based.


With a pull-based CDN, the edge server first checks whether it already has the requested assets; if not, it pulls them from the origin server.

With a push-based CDN, we upload the assets to the origin server and they are pushed out to the CDN servers, which requires more management.

Proxy Servers

They act as intermediaries between the client requesting resources and the server providing them, and proxy servers can serve many other purposes as well.

There are different types of proxies; the most commonly used are the forward proxy and the reverse proxy:

A forward proxy acts as a middleman for a client to access the internet. It protects the client's identity by making requests on their behalf, hiding their IP address and providing a layer of security and privacy. Forward proxies are often used in corporate or school networks for content filtering and enforcing usage policies.

A reverse proxy acts as a middleman for a server (or group of servers) to accept requests from the internet. It protects the servers by receiving all incoming traffic and then forwarding it to the appropriate internal server, which hides the server's identity. Reverse proxies are used for things like load balancing, SSL encryption, and protecting against DDoS attacks.

Load Balancers

A load balancer acts as a traffic controller, distributing incoming network traffic across multiple servers to prevent any single server from becoming overwhelmed. It ensures high availability, improves application performance, and allows for seamless scaling.

Types of Load Balancers

There are three main types of load balancers:

  • Hardware Load Balancers: These are physical appliances with dedicated hardware and software, offering high performance and security for large-scale, on-premise deployments.
  • Software Load Balancers: These are applications that run on standard servers, offering more flexibility and lower cost. They can be installed on-premise or in the cloud.
  • Cloud Load Balancers: These are fully managed services provided by cloud providers like AWS or Google Cloud. They are scalable, easy to configure, and integrate seamlessly with other cloud services.

Used Algorithms

Load balancing algorithms determine how traffic is distributed. They can be broadly categorized into two types:

Static Algorithms

These algorithms don't consider the current state of the servers (e.g., their load or health).

  • Round Robin: Distributes requests sequentially to each server in a rotating fashion. Simple and effective for servers with equal capacity.
  • Weighted Round Robin: Assigns a "weight" to each server based on its capacity, sending more requests to more powerful servers.
  • IP Hash: Uses a hash of the client's IP address to ensure requests from the same client are always sent to the same server, which is useful for maintaining session persistence.
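The static algorithms above are simple to sketch; here round robin and weighted round robin are modeled with `itertools.cycle` (the server addresses and weights are made up for illustration):

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: hand each incoming request to the next server in rotation.
rr = itertools.cycle(servers)
assignments = [next(rr) for _ in range(6)]
print(assignments)  # each server receives every third request

# Weighted Round Robin: repeat servers in the cycle according to capacity.
weights = {"10.0.0.1": 3, "10.0.0.2": 1}  # first server is 3x more powerful
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
print([next(wrr) for _ in range(8)])      # "10.0.0.1" appears 3x as often
```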

Dynamic Algorithms

These algorithms actively monitor server health and load to make more intelligent routing decisions.

  • Least Connections: Directs new requests to the server with the fewest active connections. This is good for environments where connection times vary.
  • Least Response Time: Sends requests to the server with the lowest response time, ensuring the fastest service.
  • Resource-Based: Uses an agent on each server to report real-time metrics (like CPU or memory usage) and directs traffic to the server with the most available resources.

Databases (Sharding, Replication, ACID, Vertical & Horizontal Scaling)

Databases

Databases are organized collections of data, typically stored electronically. To handle increasing amounts of data and traffic, different strategies are used to manage and scale them.

Sharding

Sharding is a technique that breaks up a large database into smaller, more manageable parts called shards. Each shard is a separate database that contains a subset of the total data. The main goal of sharding is to improve performance by distributing the load across multiple servers. Instead of one server handling all the queries, multiple servers handle a portion of the workload, reducing the strain on any single machine. This is a form of horizontal scaling.
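A common way to route a record to a shard is to hash its key. Here's a minimal sketch (the key format and shard count are illustrative); note that naive modulo hashing reshuffles most keys whenever the shard count changes, which is why real systems often use consistent hashing instead:

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(key: str) -> int:
    """Hash the key so the same key always routes to the same shard."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user_id in ("user:1001", "user:1002", "user:1003"):
    print(user_id, "-> shard", shard_for(user_id))
```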

Replication

Replication is the process of creating and maintaining multiple copies of a database. It's used for two primary reasons:

  • High Availability: If one server fails, another copy (or replica) can take over, ensuring the database remains accessible.
  • Load Balancing: Read traffic can be distributed across multiple replicas, reducing the load on the primary server.

There are different types of replication, but the most common model is Primary-Replica (also known as Master-Slave), where one database is the primary (writes are allowed), and others are replicas (reads are allowed).

ACID Properties

ACID is a set of properties that guarantee a database transaction is processed reliably, and relational databases are typically ACID compliant. These properties are fundamental to ensuring data integrity.

  • Atomicity: A transaction is treated as a single, indivisible unit. It either completes entirely or doesn't happen at all. There are no partial transactions.
  • Consistency: A transaction must bring the database from one valid state to another, ensuring all data integrity rules are maintained.
  • Isolation: Concurrent transactions are isolated from each other. The result of a transaction is the same as if it were the only transaction running.
  • Durability: Once a transaction is committed, it will remain so even in the event of a system failure (like a power outage).
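Atomicity is easy to see in practice with SQLite, which ships with Python. In this sketch a simulated crash mid-transfer causes the whole transaction to roll back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount, fail=False):
    # `with conn` opens a transaction: committed on success, rolled back on exception.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail:
            raise RuntimeError("simulated crash mid-transfer")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

try:
    transfer(conn, "alice", "bob", 50, fail=True)  # atomicity: all or nothing
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}  (the debit was rolled back)
```

Because the crash happened after the debit but before the credit, neither change survives: the transaction is all-or-nothing.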

Scaling

Scaling is the ability to handle increased demand. There are two main ways to scale a database:

  • Vertical Scaling (Scaling Up): This involves increasing the resources of a single server, such as adding more CPU, RAM, or storage. It's simpler to implement but has a physical limit and can be more expensive. Think of it as upgrading a single computer with better parts.
  • Horizontal Scaling (Scaling Out): This involves adding more servers to the system. It's more complex to manage but provides greater flexibility and is often more cost-effective for large-scale systems. Sharding and replication are examples of horizontal scaling. Think of it as adding more computers to the network.

For the record, the images and diagrams used in this guide are from FreeCodeCamp.
