They say that no man (or woman, or person, really) is an island; these days, we could really just add “computer” to the list. We are surrounded by machines, computers, and databases that are talking to one another. In fact, most of the applications and services that we interact with (and build!) every day are actually a whole bunch of computing elements that are talking to one another, even if we might not know it.
The study of these systems and how they work is part of the world of distributed computing, which centers around the study of distributed systems. In some ways, distributed systems are a continuation of, or extension from, the realm of computer science: working with them often involves problem-solving, dividing things into discrete tasks, and figuring out how to handle, store, and process data.
But in other ways, distributed systems are nothing like computer science at all. Distributed computing almost requires us to throw our assumptions of how machines work out the window. When dealing with a single computer, we might even find things to be kind of simple. But what about when there are multiple computers involved?
Well, that’s a different story entirely.
In order to understand what exactly constitutes a distributed system, we must first understand what is not a distributed system. To be clear, there is certainly some dispute when it comes to defining what exactly is “the opposite” of a distributed system. In theory, we could define the opposite of a distributed system in different ways, because the definition of a distributed system somewhat depends on what the components of the system really are. But more on that in a bit.
We can think of a non-distributed system as a “single” system. A single system is one that does not communicate with others and functions entirely on its own.
A single process on our computer is a single system, which operates on its own. If a process does not communicate with other processes, then it inherently isn’t part of a larger system. We could also think of our machine on its own, disconnected from the internet, as a “single” system — although there has been research which claims otherwise (like this 2009 paper).
The word distribute means to disperse, scatter, or spread something across a space. If we consider the definition of this word and how a single system works, then it becomes pretty evident that a single system on its own is not a distributed system. There is only one machine working on its own, so obviously this single machine can’t be scattered about!
So in that case, what actually is a distributed system? Well, if we think about how machines in the real world really interact, we begin to realize that most computers actually exist within a distributed system. Computers are rarely used in the context of just themselves; we’re almost always using them to interact with some sort of application or service.
If you’ve ever played a multiplayer game online, booked a flight, tweeted a cat gif, streamed a Netflix show, or bought a onesie on Amazon — you relied on a distributed system to do it.
Indeed, you probably interact with the largest distributed system on a daily basis: the Internet! But distributed systems aren’t all large-scale. In fact, the largeness of them isn’t even what makes them distributed.
A distributed system is nothing more than multiple entities that talk to one another in some way, while also performing their own operations. Such a system could be something as simple as a smart sensor or a wireless plug in your house that captures and sends data through a wifi network, or even just a wireless keyboard or mouse that can connect to your laptop.
As long as all the processes in a system are both autonomous (capable of performing their own operations) and able to communicate with other processes in the system, we can classify the system as being distributed.
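To make those two properties a little more concrete, here is a minimal sketch in Python. The `Node` class, its method names, and the sensor/laptop pair are all made up for illustration; a real system would send messages over a network rather than append to an in-memory queue.

```python
from collections import deque


class Node:
    """A hypothetical node: it does its own work and can exchange messages."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()  # messages received from other nodes
        self.readings = []    # state that only this node touches

    def record(self, value):
        # Autonomy: a local operation that needs no other node.
        self.readings.append(value)

    def send(self, other, message):
        # Communication: hand a message to another node's inbox.
        # (Assumption: delivery is instant and reliable, unlike a real network.)
        other.inbox.append((self.name, message))

    def receive(self):
        # Take the oldest pending message, or None if the inbox is empty.
        return self.inbox.popleft() if self.inbox else None


sensor = Node("sensor")
laptop = Node("laptop")

sensor.record(21.5)                       # the sensor works on its own
sensor.send(laptop, sensor.readings[-1])  # then talks to the laptop
received = laptop.receive()
```

Each node can do useful work without the other (`record`), and the two can also talk (`send` and `receive`). Under our definition, that pairing is what would make even this tiny system distributed.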
Now that we’re a bit more familiar with what a distributed system is, let’s take a closer look at its main characters: namely, the “entities” within the system!
You may have noticed that I’ve referred to the components of a distributed system as a “computer”, “process”, and even just as a “machine”. The exact term that we use to describe the pieces of a distributed system truly depends upon what the system itself looks like, and what kind of system it is. If the system is a bunch of distributed servers, then maybe the components could be referred to as “servers”; if the system involves processes talking to one another, then perhaps the entities are just “processes”.
To help combat the discrepancy in terminology here, we actually can use a different, more general term altogether. We can refer to the individual entities in a distributed system as the nodes of the system.
If the term “node” feels familiar (and reminds you of graph theory), then your instinct is correct — there is indeed a connection here! And if we think about a distributed system being a network of computing elements (which is exactly what they are), then we can visualize that network as a graph made up of interconnected nodes.
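As a rough sketch of that idea, we could write a small network down as an adjacency list in Python. The node names and links here are hypothetical, and `can_reach` is just an illustrative helper that checks whether a message could travel from one node to another by following links.

```python
# Hypothetical nodes and the links between them, as an adjacency list.
network = {
    "sensor": ["router"],
    "laptop": ["router"],
    "router": ["sensor", "laptop", "server"],
    "server": ["router"],
}


def can_reach(graph, start, goal):
    """Return True if a message could travel from start to goal."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        if current == goal:
            return True
        # Follow every link we have not already explored.
        for neighbor in graph[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return False


sensor_reaches_server = can_reach(network, "sensor", "server")
```

In this made-up network the sensor has no direct link to the server, but it can still reach it by going through the router.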
Since we know that a distributed system can function on a large or small scale, we also can deduce that the actual nodes themselves can vary in nature. A node could be a hardware device (like a sensor), or it could be a software process (a client or a server). The nodes themselves also need not be in the same place (hence the “distribution” of the system) and could very well be physically separated by great distances.
Even though the nodes in a distributed system correspond so similarly to the nodes in a graph, there are some aspects to the nodes in a distributed system that make things a little tricky. There are some assumptions we make when dealing with a single system that prove to be incorrect when it comes to a distributed system. And when it comes to distributed computing, almost all of the obstacles that will come in our path have to do with one thing: communication between nodes.
Because the nodes in a distributed system are by definition autonomous, they are capable of running their own operations. The operations that take place within a node (that is, those run by the node itself) need not rely on external information. In other words, a node can do its own work without anyone’s help, with no need to communicate with the other nodes in the distributed system, and it can do that work quickly.
Operations within a node are fast; however, the same cannot be said for communication between nodes.
As we now know, the nodes of a system could be located in different places, and they rely on the distributed system and its network to do all their chatting back and forth. That makes communication an entirely different story. While performing tasks within a node might be fast, communicating between two nodes is not guaranteed to be so. In fact, communication between two nodes is often pretty slow (not to mention unreliable!), which happens to be one of the biggest problems in distributed computing.
Not only do operations within a single node occur quickly; they also happen in order. This might seem like an obvious fact at first because, of course, events happen in order, right? Well, when it comes to distributed systems, the answer is…not always.
Even though the operations within a node occur in order, the moment that multiple nodes have to work together in a distributed system, things can get a little messier. Once we move from a single system/a single node to a distributed system/multiple nodes, then it’s possible for the operations across a group of nodes to appear in an incorrect order.
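One way to get a feel for this is a tiny simulation. The sketch below is not how a real network behaves; the function name `send_over_network`, the 20% drop rate, and the 200 ms maximum delay are all assumptions picked purely for illustration.

```python
import random


def send_over_network(message, rng, drop_rate=0.2, max_delay_ms=200):
    """Simulate one hop: the message may be lost, and arrival time varies."""
    if rng.random() < drop_rate:
        return None  # the message was lost in transit
    delay_ms = rng.uniform(1, max_delay_ms)  # made-up latency range
    return (message, delay_ms)


rng = random.Random(42)  # fixed seed so repeated runs behave the same way
results = [send_over_network(f"msg-{i}", rng) for i in range(10)]

delivered = [r for r in results if r is not None]
lost = len(results) - len(delivered)
```

A local operation, like appending to a list, either happens or it doesn’t, and it happens right away. Here, by contrast, the sender cannot know in advance whether a given message will arrive at all, or how long it will take if it does.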
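Here is a small, hypothetical illustration of how that can happen. Two messages are sent one after the other, but each takes a different amount of time to cross the network; the timings are invented to show the effect.

```python
# Two messages sent by different nodes to the same receiver.
# All times are made-up values, in milliseconds.
sent = [
    {"seq": 1, "sent_at_ms": 0, "delay_ms": 80, "op": "set x = 1"},
    {"seq": 2, "sent_at_ms": 5, "delay_ms": 10, "op": "set x = 2"},
]


def arrival_time(msg):
    # A message arrives when it was sent plus however long the trip took.
    return msg["sent_at_ms"] + msg["delay_ms"]


# The receiver processes messages in the order they arrive,
# not in the order they were sent.
arrived = sorted(sent, key=arrival_time)
arrival_order = [m["seq"] for m in arrived]  # [2, 1]
final_op = arrived[-1]["op"]                 # "set x = 1"
```

The second message arrives at 15 ms and the first at 80 ms, so a receiver that simply applies operations as they come in would end up with `x = 1`, even though `x = 2` was sent later.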
Part of the reason that operations within a node are so orderly is that each node within a system operates according to its own clock.
If we think about the different things that could be a node within a distributed system (a sensor, process, server or database), this fact becomes apparent. But again, we might be able to guess how this could potentially be problematic in a distributed system: what if the clocks in two separate nodes in a system don’t quite match up exactly? This is another difficult problem (which we’ll discuss later on in this series!) when it comes to distributed computing.
All of the things we know and love when it comes to dealing with individual nodes start to seem unfamiliar and far less lovable once we throw many nodes into the mix. But that’s the fun of learning something new, like distributed systems! We’re going to have to change our perspective and the way that we think about systems, how they work together as a whole, and the different pieces that allow them to do their job effectively.
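To see why mismatched clocks matter, consider this minimal sketch. The `NodeClock` class and the 50 ms offset are assumptions for illustration; real clocks also drift over time rather than staying off by a fixed amount.

```python
class NodeClock:
    """A hypothetical node clock that is offset from true time."""

    def __init__(self, offset_ms):
        self.offset_ms = offset_ms

    def now(self, true_time_ms):
        # What this node *believes* the time is.
        return true_time_ms + self.offset_ms


clock_a = NodeClock(offset_ms=0)
clock_b = NodeClock(offset_ms=-50)  # node B's clock runs 50 ms behind

# An event happens on node A at true time 1000,
# and a later event happens on node B at true time 1020.
stamp_a = clock_a.now(1000)  # 1000
stamp_b = clock_b.now(1020)  # 970

# Comparing timestamps makes B's event look like it came first.
b_looks_earlier = stamp_b < stamp_a  # True
```

Each node’s timestamps are perfectly consistent with its own clock, which is why ordering within a node is easy. The trouble only appears when we compare timestamps taken by two different clocks.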
Seems like a great way to start a brand new series, if you ask me. 😊
There is so much to learn about distributed systems, and many places to start! Here are some introductory resources to help cement your understanding of what makes a distributed system and the entities that comprise it.
- A brief introduction to distributed systems, Maarten van Steen & Andrew S. Tanenbaum
- From layman to superman: distributed systems an introduction, Median Rawashdeh
- Introduction to Distributed Systems (DS), Professor Frank Eliassen