Almost everything you need to know about distributed systems

#devops #computerscience #systems #blockchain

What is a distributed system?

A distributed system is a collection of computers that appears to its users as a single coherent system.

Characteristics of a distributed system

Availability – i.e. resources on demand. Distributed systems are designed with multiple failover mechanisms that ensure traffic is always redirected to healthy instances that can serve the request. Users of the system are not concerned with which server handled their request or whether some instances are failing in the background.
Scalability – Distributed systems can handle large volumes of traffic by automatically adding instances when traffic increases and removing them when it decreases. One challenge with this approach is ensuring that each request is routed to an instance that can serve it. This is the job of load balancers, which distribute traffic across the running instances in a network. Load balancers can be either hardware- or software-based. There are two types of scaling: horizontal scaling and vertical scaling. Horizontal scaling means adding more computing nodes, while vertical scaling means adding more computing resources (CPU, RAM) to an existing node.

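Horizontal scalability is usually more desirable. A single machine can only be upgraded so far, and relying on one large node creates a single point of failure.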
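To make the load-balancing and failover ideas above concrete, here is a minimal Python sketch of a software load balancer. It assumes round-robin selection, which is a common strategy but not the only one. The class RoundRobinBalancer and its methods are hypothetical names for illustration. Real load balancers usually probe instance health on their own instead of being told which instances are down.

```python
from dataclasses import dataclass, field


@dataclass
class RoundRobinBalancer:
    """Hypothetical software load balancer: rotates through instances and
    skips any that are currently marked unhealthy (simple failover)."""
    instances: list[str]
    unhealthy: set[str] = field(default_factory=set)
    _next: int = 0

    def mark_down(self, instance: str) -> None:
        self.unhealthy.add(instance)

    def mark_up(self, instance: str) -> None:
        self.unhealthy.discard(instance)

    def add_instance(self, instance: str) -> None:
        """Horizontal scaling: add one more node to the pool."""
        self.instances.append(instance)

    def pick(self) -> str:
        # Try each instance at most once, starting where the last pick stopped.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next % len(self.instances)]
            self._next += 1
            if candidate not in self.unhealthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["a", "b", "c"])
print([lb.pick() for _ in range(3)])  # traffic spread across a, b, c
lb.mark_down("b")                     # b fails; its share goes to a and c
print([lb.pick() for _ in range(3)])
```

Calling add_instance is horizontal scaling in miniature: the balancer just gets one more node to rotate through, and callers of pick() never need to know.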

History of distributed systems

Advances in computing technology and high-speed networks made the area much more attractive. Traditional mainframe servers wasted a lot of resources because applications were tightly coupled to the kernel and to the dependencies installed on the machine. More often than not, this meant one server configured for databases, another for serving application content and another for storing backups. Changing the architecture often took months of planning and consultation. This was expensive, because it meant hiring consultants, shipping hardware and scheduling downtime for maintenance, and an application could not be deployed until the whole process was complete. Because applications depended so heavily on the underlying operating system and hardware, an application built for one platform would not necessarily run on another.
Then came virtualization, which loosened the coupling between software and the physical machine. This was achieved through a hypervisor, a software layer that sits between the physical hardware and the operating systems. It lets several virtual machines, each with its own operating system and applications, run on the same physical server and share its resources. However, as applications grew in size and more dependencies were added, managing them became difficult. Updating one application’s dependencies would often break other applications.

Enter distributed systems…

As applications became more complex, engineers were suddenly required to build three-tier web applications and microservices used by millions of users in geographically diverse regions. Traditional on-premises architecture could not serve this class of applications. Users located far from the server had poor experiences because each round trip took longer. In addition, this type of architecture was not designed to handle spikes in network traffic. If requests exceeded the hardware capacity, the server simply crashed.

Goals of a distributed system

1) Transparency – Distributed systems hide the fact that processes and resources are physically distributed across multiple computers.
2) Resource sharing – Distributed systems are designed to make it easy for users to access and share resources.
3) Concurrency – A resource may be shared by several competing users at the same time.
4) Scalability – A distributed system should remain effective when there is a significant increase in the number of users.
5) Openness – A distributed system should offer its services according to standard rules that describe their syntax and semantics, such as well-defined interfaces and protocols. This lets components from different vendors interoperate and lets the system be extended.

Advantages of a distributed system over a centralized system

1) Economics – A collection of microprocessors offers a better price/performance ratio than traditional mainframes.
2) Reliability – If one machine crashes, the system as a whole can survive. Techniques such as batching and disk scheduling can be used to build systems whose efficiency improves as load increases. As applications grow more popular, crashes become inevitable if the underlying architecture is not designed to perform reliably under high load.
3) Speed – A distributed system may have more computing power than a mainframe.
4) Scalability – Computing power can be added in small increments, so the system can be expanded modularly.
5) Data sharing – Multiple users of a distributed system can access a common database.

Communication in distributed systems

There are two types of communication protocols:
1) Connection-oriented protocols – Before exchanging data, the sender and the receiver must first establish a connection and negotiate the protocol they will use. When they are done, they must terminate the connection. TCP is the classic example.
2) Connectionless protocols – No setup is required in advance. The sender simply transmits the first message when it is ready. UDP is the classic example.
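The difference between the two protocol types can be sketched with Python's standard socket module over the loopback interface. This is an illustration only. The helper names tcp_exchange and udp_send are hypothetical, and port 0 asks the operating system for any free port. In the TCP case, a connection is set up, used and torn down. In the UDP case, the sender just fires off a datagram with no setup at all.

```python
import socket
import threading

HOST = "127.0.0.1"  # loopback only; binding to port 0 lets the OS pick a free port


def tcp_exchange(message: bytes) -> bytes:
    """Connection-oriented (TCP): establish a connection, exchange data, close it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((HOST, 0))
        server.listen(1)
        port = server.getsockname()[1]

        def echo_once() -> None:
            conn, _ = server.accept()  # returns once the connection is established
            with conn:                 # leaving the block closes (terminates) the connection
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=echo_once)
        worker.start()
        # create_connection performs the connection setup before any data is sent.
        with socket.create_connection((HOST, port), timeout=2) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


def udp_send(message: bytes) -> bytes:
    """Connectionless (UDP): no setup; the sender just transmits a datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((HOST, 0))
        receiver.settimeout(2)  # UDP does not guarantee delivery, so don't wait forever
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
    return data
```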

Types of communication in distributed systems

1) Persistent – A message that has been submitted for transmission is stored by the communication middleware for as long as it takes to deliver it to the receiver.
2) Transient – A message is stored by the communication system only as long as the sending and receiving applications are executing.
3) Asynchronous – A sender continues immediately after it has submitted its message for transmission.
4) Synchronous – The sender is blocked until its request is known to be accepted.
5) Discrete – The parties communicate by messages, each of which forms a complete unit of information.
6) Streaming – Involves sending messages one after another where the messages are related to each other by the order they are sent.
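7) Multicast communication – A message is sent to a group of receivers. There are two common approaches. In application-level multicasting, nodes form an overlay network that relays the message. In gossip-based dissemination, each node holding the message forwards it to randomly chosen peers, round after round, until (nearly) every node has seen it.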
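Gossip-based dissemination can be illustrated with a toy, single-process simulation of push gossip in Python. It is a sketch only. Nodes are plain integers, and the function name push_gossip and its parameters (fanout, max_rounds, seed) are hypothetical. In each round, every node that already has the message forwards it to fanout randomly chosen peers. The max_rounds parameter caps how many rounds are run.

```python
import random


def push_gossip(num_nodes: int, fanout: int = 1, max_rounds: int = 50, seed: int = 0):
    """Toy push-gossip simulation (hypothetical helper).

    Node 0 starts with the message. In each round, every informed node sends it
    to `fanout` random peers. Returns (rounds_used, set_of_informed_nodes).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes and rounds < max_rounds:
        newly_informed = set()
        for node in informed:
            peers = [p for p in range(num_nodes) if p != node]
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed  # nodes reached this round start forwarding next round
        rounds += 1
    return rounds, informed


rounds, informed = push_gossip(64, max_rounds=100, seed=42)
print(rounds, len(informed))
```

With a fanout of 1, the informed set roughly doubles each round at first. This is why push gossip typically reaches all n nodes in a number of rounds that grows with log n rather than with n.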

Types of distributed systems

1) Distributed computing systems
I. Cluster computing systems
Supercomputers built from off-the-shelf hardware connected by a high-speed network.
Usually, a single program is run in parallel on multiple machines.
II. Grid computing systems
Composed of different types of computers (heterogeneous hardware, operating systems and networks), often owned by different organizations.
2) Distributed information systems
Made to distribute information across several servers.
Often used with transaction systems, e.g. banks and travel agencies.
The most common communication methods include:
I. Remote procedure calls
Performing a read operation with RPC:
i. Client procedure calls the client stub
ii. The client stub builds the message and calls the client operating system.
iii. The client operating system sends the message to the server operating system
iv. The server operating system gives the message to the server stub
v. The server stub unpacks the parameters and calls the server process
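vi. The server process does the work and returns the result to the server stub.
vii. The reply travels back in reverse. The server stub packs the result into a message, the operating systems carry it to the client, and the client stub unpacks it and returns it to the client procedure.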
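II. Remote method invocations (RMI) – The object-oriented counterpart of RPC, in which a client invokes methods on objects that live on another machine.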
3) Distributed pervasive systems
These are distributed systems involving mobile and embedded computer devices.
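The RPC read steps listed above can be sketched with Python's standard-library XML-RPC modules. These modules are one concrete RPC mechanism among many, and this is not necessarily what the author had in mind. Here ServerProxy plays the role of the client stub and SimpleXMLRPCServer plays the server stub. The procedure read_record, its sample data and the helper rpc_read are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def read_record(key: str) -> str:
    """Hypothetical server procedure: the 'server process' that does the work."""
    records = {"alice": "balance=100"}  # illustrative data only
    return records.get(key, "")


def rpc_read(key: str) -> str:
    # Server side: the XML-RPC server acts as the server stub. It unpacks the
    # request message and calls the registered procedure with the parameters.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(read_record, "read_record")
    port = server.server_address[1]
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # Client side: ServerProxy acts as the client stub. Calling a method on it
        # packs the arguments into a message that is sent over the network, then
        # unpacks the reply and returns it like a local call.
        with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            return proxy.read_record(key)
    finally:
        server.shutdown()
        server.server_close()


print(rpc_read("alice"))
```

From the caller's point of view, rpc_read looks like an ordinary function call. That transparency is the main point of RPC.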

Architecture of distributed systems

Distributed systems architecture can further be broken down into:
1) Software architecture – Describes how software components are organized.
Various software architecture patterns include:
I. Layered architecture
II. Object-based architecture
III. Data-centered architecture
IV. Event-driven architecture – Communication happens through the propagation of events
V. Serverless architecture
VI. Microservice architecture
2) System architecture – Involves the installation and placement of software components on real machines.
Various types of system architectures are:
I. Decentralized architecture – Peer-to-peer systems
II. Centralized architecture – Client-server systems
III. Hybrid architecture
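The event-driven pattern in the list above can be sketched as a minimal in-process publish/subscribe bus in Python. This is a single-process illustration under stated assumptions. In a real distributed system, events would travel through a message broker or over the network. The class EventBus and the event name order_placed are hypothetical. Publishers only emit named events, and they never call subscribers directly.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """Hypothetical in-process event bus: handlers subscribe to event names,
    and publishing an event invokes every handler registered for that name."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: Any) -> int:
        """Deliver payload to every subscriber of `event`; return how many got it."""
        handlers = self._handlers.get(event, [])  # .get avoids creating empty entries
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
bus.subscribe("order_placed", lambda order: print("billing saw", order))
bus.subscribe("order_placed", lambda order: print("shipping saw", order))
bus.publish("order_placed", "order-1")
```

Adding a new consumer only requires another subscribe call. The publisher does not change, which is the loose coupling that event-driven architectures aim for.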

DEV Community