Distributed vs. Decentralized Systems
A key distinction between distributed and decentralized systems lies in their conceptual origin, which can be understood through two primary perspectives: the integrative view and the expansive view.
Core Concepts: Integrative vs. Expansive Views
- Integrative View: This perspective arises from the need to connect and integrate pre-existing, often autonomous, computer systems. The goal is to make separate systems work together while respecting their administrative boundaries.
- Expansive View: This perspective emerges when an existing, centrally-managed system needs to be extended or scaled out by adding more computers. The goal is to enhance the system's capabilities, such as performance or fault tolerance, while presenting a unified, single-system image to the user.
Decentralized Systems: The Integrative View
Decentralized systems are typically formed when the processes and resources of a networked system are necessarily split across multiple, often administratively separate, computers. They are born from the desire to connect existing systems that must remain independent.
Example: Federated Learning
In traditional machine learning (ML), massive datasets are brought to a central High-Performance Computing (HPC) cluster for model training. However, when data must remain within an organization's boundaries due to privacy or legal constraints, the training must be brought to the data.
Federated learning enables this by running multiple, parallel training sessions on separate, localized datasets. Each session produces a "local model." These local models are then aggregated (e.g., through model weight averaging) to build a more generalized "global model." This approach contrasts with centralized techniques where all datasets are merged into a single location for one large training session.
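To make the aggregation step concrete, here is a minimal sketch of weighted model averaging, the core of the FedAvg rule, assuming each local model is simply a list of NumPy weight arrays. Real federated learning frameworks add communication rounds, secure aggregation, and much more; this shows only the arithmetic:

```python
import numpy as np

def federated_average(local_models, num_samples):
    """Aggregate local models into a global model by averaging their
    weights, weighted by how much data each participant trained on.

    local_models: list of models, each a list of NumPy weight arrays
    num_samples:  number of training samples behind each local model
    """
    total = sum(num_samples)
    global_model = []
    for layer_idx in range(len(local_models[0])):
        # Weighted sum of the corresponding layer across all participants.
        layer = sum(
            (n / total) * model[layer_idx]
            for model, n in zip(local_models, num_samples)
        )
        global_model.append(layer)
    return global_model

# Two hypothetical participants with a single-layer "model" each:
model_a = [np.array([1.0, 2.0])]
model_b = [np.array([3.0, 4.0])]
print(federated_average([model_a, model_b], num_samples=[100, 300]))
# -> [array([2.5, 3.5])]: the data-rich participant pulls the average its way
```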
Distributed Systems: The Expansive View
A distributed system is a networked computer system in which processes and resources are sufficiently split across multiple computers: where a decentralized system's resources are necessarily spread across machines (there is no choice), a distributed system's are spread as far as needed to achieve scalability and reliability, while appearing to users as a single, coherent system. These systems are typically associated with the expansive view.
Example: Google Mail
Consider an e-mail service like Google Mail. Some users read and send mail through the Web interface, but many configure a mail client with an incoming and an outgoing server, which for Google Mail are imap.gmail.com and smtp.gmail.com, respectively. This gives the impression of interacting with just two machines.
In reality, with close to 2 billion users as of 2022 exchanging an estimated 300+ billion e-mails per year (some 10,000 per second), it is inconceivable that two computers handle the load. Behind the scenes, the service is implemented across countless computers in data centers worldwide, jointly forming a distributed system designed to:
- Ensure Scalability: Handle the immense load of millions of concurrent users.
- Provide Fault Tolerance: Minimize the risk of losing mail due to hardware or software failures.
- Maintain Transparency: Hide the underlying complexity from the end-user, who only sees a simple, unified service. The system expands or shrinks based on user demand and dependability requirements.
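This transparency is easy to observe from the outside: a single logical name such as imap.gmail.com resolves to multiple physical addresses, each of which in turn fronts many more machines. A quick sketch using only Python's standard library (the exact output varies by resolver, time, and location):

```python
import socket

# One logical name, many physical endpoints; the addresses returned here
# are themselves only the visible edge of a much larger system.
addresses = {
    info[4][0]
    for info in socket.getaddrinfo("imap.gmail.com", 993, proto=socket.IPPROTO_TCP)
}
print(addresses)  # e.g. several distinct IPs behind a single hostname
```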
Key Challenges in Distributed & Decentralized Systems
Understanding the distinction between these systems is crucial because both face a set of complex challenges, often rooted in unexpected dependencies, that are not present in single-machine systems.
- Partial Failures: Unlike a centralized system that either works or fails completely, these systems can experience partial failures, where one component fails while others continue to run. This makes error detection and recovery considerably more complex (a client-side sketch follows this list).
- High Dynamism: Nodes (participating computers) can join and leave the network frequently and unpredictably. This dynamism requires sophisticated, automated management and maintenance protocols.
- Security Vulnerabilities: Because these systems are networked, used by many applications, and often span multiple administrative domains, they are inherently vulnerable to a wide range of security attacks.
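As a taste of what partial failures mean in practice, consider the sketch below: a replica that has crashed is indistinguishable from one that is merely slow, so all a client can do is bound its waiting with timeouts and fail over. The replica hostnames and timeout values here are hypothetical:

```python
import socket

# Hypothetical replica addresses; in a real deployment these would come
# from service discovery or configuration.
REPLICAS = [("replica-1.example.org", 8080), ("replica-2.example.org", 8080)]

def fetch_with_failover(request: bytes, timeout: float = 2.0) -> bytes:
    """Try each replica in turn; a silent replica cannot be told apart
    from a slow one, so waiting is bounded by a timeout."""
    last_error = None
    for host, port in REPLICAS:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.settimeout(timeout)
                conn.sendall(request)
                return conn.recv(4096)  # partial failure: this may never arrive
        except OSError as exc:          # refused, reset, or timed out
            last_error = exc            # one replica failing does not fail the system
    raise ConnectionError(f"all replicas unavailable: {last_error}")
```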
Perspectives for Studying Distributed Systems
To fully grasp their complexity, we study distributed systems from several different perspectives:
- Architectural View: Focuses on the common organizational styles and patterns to understand how components interact and what dependencies exist between them.
- Process View: Examines the different forms of processes that form the software backbone, including threads, virtualization, clients, and servers.
- Communication View: Concerns the mechanisms and protocols that systems provide for exchanging data between processes.
- Coordination View: Describes the fundamental coordination tasks (e.g., consensus, leader election) that happen "under the hood" to allow applications to execute correctly.
- Naming View: Explores how processes, resources, and other entities are named and located. Effective naming schemes are crucial for accessing any entity in the system.
- Consistency & Replication View: To achieve high performance and dependability, resources are often replicated. This view analyzes the challenges of keeping all copies of a resource consistent, especially after updates.
- Fault Tolerance View: Dives into the means for masking failures and enabling recovery. This is one of the toughest aspects, as it involves numerous trade-offs, and completely masking all failures is provably impossible.
- Security View: There is no such thing as a nonsecured distributed system. This view focuses on how to ensure authorized access to resources and how to protect the system's integrity and confidentiality.
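As a small taste of the consistency and replication view, here is a sketch of the quorum rule used by many replicated stores: with N replicas, a write must reach W of them and a read must consult R of them, and if R + W > N every read quorum overlaps the most recent write quorum. This toy version, with in-memory dicts standing in for replicas, only demonstrates the overlap argument:

```python
# Quorum rule: with N replicas, writes reach W and reads consult R.
# If R + W > N, the read set and the latest write set must intersect.
N, W, R = 5, 3, 3
assert R + W > N, "quorums must overlap or reads may miss updates"

replicas = [{"value": None, "version": 0} for _ in range(N)]

def write(value: str, version: int) -> None:
    # Sketch: update only a write quorum; the remaining replicas lag behind.
    for replica in replicas[:W]:
        replica.update(value=value, version=version)

def read() -> str:
    # Consult a read quorum and trust the highest version seen.
    contacted = replicas[-R:]          # overlaps replicas[:W] since R + W > N
    freshest = max(contacted, key=lambda r: r["version"])
    return freshest["value"]

write("hello", version=1)
print(read())  # -> "hello", even though 2 of the 5 replicas are stale
```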
Core Goals of Distributed Systems
Building a distributed system is complex and should only be undertaken when necessary. The primary motivations are typically centered around two goals:
- Resource Sharing: A fundamental goal is to make it easy for users and applications to access and share remote resources, which can be anything from hardware (printers, disks) to software and data.
- Distribution Transparency: A key objective is to hide the complexity of the system's distribution from users and applications. The system should appear as a single, unified computing environment, masking the fact that its processes and resources are physically separated, possibly by great distances. Transparency is typically achieved through a middleware layer, illustrated in the sketch below.
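To illustrate middleware-provided transparency, here is a minimal sketch using Python's standard-library XML-RPC modules. The client invokes proxy.add(2, 3) as if it were a local call, while the middleware handles naming, marshalling, and transport. XML-RPC merely stands in for whatever RPC framework a production system would use, and localhost:8000 is a placeholder:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# --- Server side: expose an ordinary function over the network. ---
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- Client side: the remote call looks exactly like a local one. ---
proxy = xmlrpc.client.ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))  # -> 5, though the work happened elsewhere
```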