Even though we might be new to distributed systems, by now we can see that, by definition, they involve many moving parts. Those moving parts make things infinitely more complex, because now, instead of a one-woman show, we have a whole cast of characters (read: nodes) to deal with!
In a previous post, we began investigating the ways in which the simplest things, which we might take for granted within a single system, become far more complex in a distributed system. We know that nodes can operate autonomously. And in some ways, that’s the easier part of understanding a distributed system.
So what about the ways that the nodes work together? If we think deeply about it, autonomous nodes that do not work together defeat the purpose of distribution! The nodes in a distributed system will inevitably have to communicate with one another, which we already know is where things will get tricky.
But exactly why do we want nodes to cooperate and work together? When effectively implemented, what affordance(s) does a distributed system provide us with? Well, it’s time to find out.
Because distributed systems are still actively being studied and researched, even a cursory dive into the topic will yield a variety of different results. For example, you might find that even the definition of a distributed system varies depending on where you look. One of the most famous definitions of distributed systems comes from Leslie Lamport, a Turing Award-winning computer scientist.
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.
— Leslie Lamport
What I find most interesting about this definition is the idea that, when using distributed systems, things can be happening without your having any knowledge of them whatsoever. In Lamport’s definition, this means that something can go wrong (part of the system can fail), but you (as the user) may not even know that the failed part exists.
But what I think is really at the heart of this idea is the fact that a distributed system hides things from you.
We know that distributed systems, by definition, involve multiple machines/processes/resources (called nodes), which may be located in different places altogether. However, when working with a distributed system, everything appears to be one cohesive unit. In other words, even though we, the users, might be interacting with several nodes at once, it doesn’t appear that way to us at all. Instead, it likely feels like we are dealing with a single system, or a single machine/process.
For example, when we order something off of a web-based ecommerce platform, we may actually be fetching data from one node, sending data to another node, and awaiting a response from yet another node, which, in turn, could very well be relying on three other nodes for something or other! To us, the end users, everything might feel seamless, as though we are simply interacting with a single website or platform. But in reality, many nodes are working together to make these processes appear as a single unit.
This is a common concept in distributed systems, and is known as distribution transparency. A distributed system that is made up of autonomous nodes but presents itself as a single system is said to be transparent.
The transparency of a system is a major factor when it comes to designing a distributed system. Ultimately, transparency shapes the ways in which the nodes of a system will work together. In other words, the way in which a distributed system handles transparency shows us how, exactly, the system hides its “distribution” from the end user, appearing instead as one cohesive system (that is not distributed).
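To make the ecommerce example concrete, here is a minimal Python sketch of a single entry point that quietly fans out to several nodes. The node functions, item IDs, and prices are all hypothetical, and each “node” is a plain local function standing in for what would really be a network call to a separate machine.

```python
from dataclasses import dataclass

# Three hypothetical back-end nodes. In a real system each would live on its
# own machine and be reached over the network, not called as a local function.
def inventory_node(item_id: str) -> bool:
    stock = {"book-1": 3, "mug-7": 0}
    return stock.get(item_id, 0) > 0

def pricing_node(item_id: str) -> int:
    prices_cents = {"book-1": 1999, "mug-7": 899}
    return prices_cents[item_id]

def payment_node(amount_cents: int) -> str:
    return f"charged {amount_cents} cents"

@dataclass
class OrderResult:
    ok: bool
    message: str

def place_order(item_id: str) -> OrderResult:
    """The single entry point the shopper sees; the fan-out below stays hidden."""
    if not inventory_node(item_id):
        return OrderResult(False, "out of stock")
    price = pricing_node(item_id)
    return OrderResult(True, payment_node(price))
```

A shopper calling `place_order("book-1")` gets back one result, not three separate conversations with three separate nodes; that single interface is what makes the system feel like one unit.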
But what does it mean to hide the “distribution” of a distributed system?
Well, if we think of a distributed system as a bunch of various nodes that are dispersed and scattered about — but are working together — then we can deduce that there could be various things to “hide” here. For example, transparency could hide the fact that a system could be processing data across various nodes. It could also hide the fact that data could live and be stored across many nodes, which are also working together.
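Hiding where data is stored can be sketched as a key-value store that spreads its keys across several nodes but exposes only `put` and `get`. This is a minimal sketch under simplifying assumptions: the `ShardedStore` class is hypothetical, each “node” is an in-memory dict, and keys are assigned to nodes by hashing modulo the node count.

```python
import hashlib

class ShardedStore:
    """One logical key-value store whose data is spread over several nodes."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for a separate storage node (hypothetical).
        self.nodes: list[dict[str, str]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict[str, str]:
        # Python's built-in hash() is salted per process, so SHA-256 is used
        # to get the same key-to-node placement on every run.
        digest = hashlib.sha256(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> str:
        return self._node_for(key)[key]
```

Callers never choose a node; the store does. A plain modulo scheme like this reshuffles most keys whenever the node count changes, which is one reason real systems often use consistent hashing instead.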
No matter the exact way(s) in which a system actually achieves this, the ultimate goal of distribution transparency is to create the illusion that the system is not a bunch of disparate parts (various nodes) but rather one whole (a single system). It is important to note that this illusion (the transparency of the system) is created not only for the end users of a system, but also for programmers using the system, as well as for any applications that may occasionally reach out to the system to use it.
Now that we understand the goal behind designing a transparent system, let’s dive into what that actually looks like in practice.
When designing or observing a distributed system, there are different ways to approach the problem of transparency. Mostly, it has to do with which behavior of a system we’re aiming to “hide”, so to speak. Depending on the way that a system is architected, some “pieces” of the system may be easier to disguise as a “whole” than others. This will become more evident as we look into the different forms of transparency in a distributed system. So, let’s get started!
In many systems, the way that a resource (which could be an object, some data, a process, or even a machine) is accessed may be very different from what the resource really looks like and how it is represented. For example, the nodes in our distributed system could be running different operating systems, which might mean that the same resource is represented slightly differently on one node than on another.
However, while the true representation of a resource may be very different, if we wanted our end users to feel like they were working within a single system, we’d need to hide this from them. Access transparency is what allows us to hide any differences in the data representation of our resource and how it is accessed.
A great example of access transparency in action is something that you might use every day: an API! An API is a reliable way to access resource(s) using the same, repeatable operations, which can be replicated on any machine while still yielding the same result. Even if the data changes shape or form, those details are hidden from the end user (or in the case of an API, they should also be hidden from the programmer or any external application using the API as a service). Access transparency abstracts away those details so that a resource can dependably be retrieved, even if it might be represented differently at certain times, or in certain places.
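Here is a minimal Python sketch of access transparency, assuming two hypothetical nodes that store the same resource (a page-view counter) as raw bytes in different byte orders, much as machines with different hardware architectures might. The `read_counter` function plays the role of the API: it hides the representation and always returns a plain integer. The node names and the counter value are made up for illustration.

```python
import struct

# Hypothetical storage: each node keeps the same counter as 4 raw bytes,
# but the two nodes disagree on byte order.
NODE_STORAGE: dict[str, tuple[str, bytes]] = {
    "node-big-endian": ("big", struct.pack(">I", 1042)),
    "node-little-endian": ("little", struct.pack("<I", 1042)),
}

# struct format strings for an unsigned 32-bit integer in each byte order.
FORMATS = {"big": ">I", "little": "<I"}

def read_counter(node: str) -> int:
    """Uniform access: callers get an int no matter how the node stores it."""
    byte_order, raw = NODE_STORAGE[node]
    (value,) = struct.unpack(FORMATS[byte_order], raw)
    return value
```

Both `read_counter("node-big-endian")` and `read_counter("node-little-endian")` return 1042 even though the stored bytes differ. If a node stored the counter in a format missing from `FORMATS`, the lookup would fail with a `KeyError`, and the illusion of a single system would break.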
The more that we think about it, the more we may realize that access transparency is hiding things all around us! For example, the fact that you are able to read this article on some device is a result of access transparency, too! The web relies on access transparency so that a web page can be accessed the same way (through a URL) no matter whether you are browsing on a laptop or a mobile device, on an old operating system or a new one. Yet another illusion!
However, we may also be able to get a sense of how access transparency can be tricky to deal with, too. What happens if our system represents a resource in a way that an outdated/older operating system can’t handle (or doesn’t handle correctly)? Well, maybe our way of accessing our resource breaks! In that case, an end user would be able to see that they aren’t actually dealing with a single system, which ruins the “illusion” of transparency that we are striving for.
Just as a resource might be represented differently on different nodes in a distributed system, it follows that different resources could very well be located in different places! Location transparency is what hides where a resource is located. This form of transparency allows objects within a system to be consistently accessible without their location being a factor (much less their location being known).
The web serves as yet another great example of this transparency: a website lives at a predictable, obvious URL. However, where that website physically lives on the internet is abstracted away, because as far as the end user is concerned, they can find the resource (the web page) at a logically named location. URL stands for “uniform resource locator,” and it is a particularly wonderful example of how the name by which a resource is located reveals nothing about where the resource physically lives (on a specific server, for example).
And the web isn’t the only example of this. A system could have an object stored in a database, or it could be generated inline, or it could actually live in a static file; in any case, a location transparent system hides this fact from the end user, and simply ensures that the user can always access the object, regardless of where it lives (and without revealing where it lives, either!).
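Here is a minimal Python sketch of that idea, assuming a hypothetical name table (`LOCATIONS`) that maps a logical name to whichever backend actually holds the resource: an in-memory SQLite database, a static file in a temporary directory, or a function that generates the content inline. Callers only ever use `fetch`, so the location never leaks out.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable

# Backend 1: a database (an in-memory SQLite database stands in for a real one).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO pages VALUES (?, ?)", ("about", "About us"))

# Backend 2: a static file on disk, in a throwaway temporary directory.
static_dir = Path(tempfile.mkdtemp())
(static_dir / "terms.txt").write_text("Terms of service")

def from_database(name: str) -> str:
    row = db.execute("SELECT body FROM pages WHERE name = ?", (name,)).fetchone()
    return row[0]

def from_static_file(name: str) -> str:
    return (static_dir / f"{name}.txt").read_text()

# Backend 3: content generated inline, stored nowhere at all.
def generated_inline(name: str) -> str:
    return f"Generated page for {name}"

# Hypothetical name table: logical name -> wherever the resource actually lives.
LOCATIONS: dict[str, Callable[[str], str]] = {
    "about": from_database,
    "terms": from_static_file,
    "today": generated_inline,
}

def fetch(name: str) -> str:
    """The only function callers use; it never reveals where `name` lives."""
    return LOCATIONS[name](name)
```

Moving `"terms"` into the database later would mean changing one entry in `LOCATIONS`; every caller of `fetch` stays exactly the same.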
Now that we know that one resource could live in a different location from another resource, we may also be able to conclude that resources can move around — that is to say, a resource could live in one place, and at some point, need to move to another node. Relocation transparency allows for resources to move from one node to another while they are being used or accessed by something (or someone) else.
Relocation transparency requires two important pieces that need to work in tandem. First, the resource that is being moved needs to be able to be moved at all. Maybe that means copying it over to another location, or perhaps it means duplicating the resource, moving the copy, and then removing the original from its previous location. Second, the system needs to hide the fact that the resource is moving, because someone (an end user, a programmer, or even just another node in the distributed system!) is likely accessing it while the move is happening.
Because of these two facets that both need to be accounted for (somehow!), relocation transparency can sometimes be complex and tricky to implement in a distributed system. Sometimes, it may not even be worth accounting for if moving resources is not something that happens often. In those situations, instead of creating a relocation-transparent system, it may be easier to take a resource offline in order to move it.
However, if you have many users accessing your system, or if resources need to move around often, not accounting for this form of transparency means that the distributed system no longer behaves as a single, cohesive unit. It is therefore not entirely transparent to the end user, which may defeat the very purpose of distributing the system in the first place.
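Both pieces can be sketched in Python, assuming a hypothetical `Directory` service that every reader goes through. The move copies the resource to the target node, repoints the directory, and then deletes the old copy, all while holding a lock, so a concurrent reader sees either the old location or the new one, never a missing resource. Everything here runs in a single process, with `threading.Lock` standing in for the coordination that real nodes would need over a network.

```python
import threading

class Node:
    """A hypothetical storage node that keeps resources in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}

class Directory:
    """Hypothetical lookup service: callers ask for a resource by name only."""

    def __init__(self) -> None:
        self._where: dict[str, Node] = {}
        self._lock = threading.Lock()

    def put(self, key: str, value: str, node: Node) -> None:
        with self._lock:
            node.data[key] = value
            self._where[key] = node

    def get(self, key: str) -> str:
        # Lookup and read happen under the same lock as move(), so a reader
        # never observes a half-finished relocation.
        with self._lock:
            return self._where[key].data[key]

    def move(self, key: str, target: Node) -> None:
        with self._lock:
            source = self._where[key]
            if source is target:
                return
            target.data[key] = source.data[key]  # 1. copy to the new node
            self._where[key] = target            # 2. repoint future reads
            del source.data[key]                 # 3. remove the old copy

    def location_of(self, key: str) -> str:
        with self._lock:
            return self._where[key].name
```

A reader thread calling `get("report")` in a loop keeps receiving the same value while another thread moves the resource back and forth between nodes; only `location_of` reveals that anything changed. The price of this simple design is that reads wait while a move is in progress.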
By now, we may be able to pick up on a theme here: some forms of transparency are harder than others, but each form does lend itself towards “hiding” the intricacies of our system from our end users, and does seem to require some thinking-through!
In the second installment of this post, we’ll take a look at some more (complicated) forms of distribution transparency, and the different tradeoffs and obstacles that they present. Stay tuned for part two!
There is a lot to learn about distribution transparency and its many tradeoffs! Here are some of my favorite resources to help you get started in your learning journey.
- Distributed System Principles, Professor Wolfgang Emmerich
- Transparencies, Professor Jon Crowcroft
- Distributed Software Development, Lecture 1, Professor Francis Marchese
- Goals of Distributed Systems, Professor Lu Ruan
- Distribution Transparency, Professor Juhani Toivonen