Introduction
While building distributed systems, transparency is a very important factor. The engineer has to consider access transparency, concurrency transparency, location transparency, replication transparency, etc. Replication transparency answers the question, 'Will my data resources always be consistent?'
What is Replication transparency?
With distributed systems, we can access different copies of our resources, which helps with redundancy, backup, speed, etc. Having replicas of a particular resource also raises the issue of consistency: how do we ensure that all the replicas of a particular resource are consistent at all times? Two-phase commits can help here: if for any reason all the replicas of a particular resource don't get updated, perhaps due to timeouts or propagation errors, the instances are rolled back to their previous state. This means that the update is lost and has to be done again.
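To make the idea concrete, here is a minimal sketch of a two-phase commit over replicas. The prepare(), commit(), and rollback() methods are hypothetical (they're illustrative stand-ins, not from any real library) and are assumed to return promises.

// A minimal two-phase commit sketch over a list of replicas.
// prepare/commit/rollback are hypothetical replica methods.
async function twoPhaseCommit(replicas, update) {
  // Phase 1: ask every replica to vote on the update
  const votes = await Promise.all(
    replicas.map(r => r.prepare(update).catch(() => false))
  );
  if (votes.every(Boolean)) {
    // Phase 2a: every replica voted yes, so commit everywhere
    await Promise.all(replicas.map(r => r.commit(update)));
    return true;
  }
  // Phase 2b: a timeout or a "no" vote, so roll everyone back
  await Promise.all(replicas.map(r => r.rollback(update)));
  return false; // the update is lost and has to be done again
}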
Three models help us with handling replicas:
- Primary-Backup / Master-Backup Model
- Peer to Peer Model
- Master-Slave Model
The Primary-Backup model exposes only one instance to all external processes. This instance is the master instance, and it has read and write permissions. All other instances or replicas have only read permissions. So with this model, we are sure that only one instance can be updated, after which the change is propagated. The drawback of this model is that it isn't scalable, because only one instance is exposed, and if that instance crashes before propagation happens, we will still encounter inconsistencies.
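As a rough sketch of the write path (using hypothetical in-memory stores, not a real database), only the primary accepts writes, and each write is propagated to the read-only replicas:

// Primary-backup sketch: writes go to the primary only,
// then get propagated to the read-only replicas.
const primary = { data: {} };
const replicas = [{ data: {} }, { data: {} }];

function write(key, value) {
  primary.data[key] = value; // only the primary is writable
  replicas.forEach(r => { r.data[key] = value; }); // propagate the change
}

function read(replica, key) {
  return replica.data[key]; // replicas serve reads only
}

write("x", 42);
console.log(read(replicas[0], "x")); // 42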
The Peer-to-Peer model gives all the instances read and write permissions. With this model, we will observe performance issues, especially when we need to propagate very large chunks of data, and maintaining global consistency will also be difficult. It is best suited for applications that require low data replication, for example user-specific applications.
The Master-Slave model has one instance as the Master node, with read and write permissions. The other instances (slaves) have read permissions, but are "hot spares" in the sense that as soon as they notice that the Master node is down, a slave becomes the Master. It is best used for systems where read operations outnumber writes, e.g. databases. This is because to write or update an item on a database, it reads first (read-modify-write).
Which Slave is selected to be the Master?
This is where the election algorithm comes in. It is used to elect a slave (to become the master) after the master node fails.
We have the following:
- Bully Election Algorithm
- Ring Election Algorithm
- Leader Preelection Algorithm
The Bully election algorithm takes the node with the highest ID as the next master. Once a node realizes that the master node has failed, the election process starts. If the node with the highest ID is the last to join the conversation, the election will take longer than when that node joins first.
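A minimal sketch of the idea, assuming nodes are plain objects with numeric IDs and an alive flag (an in-memory stand-in, not real networked nodes):

// Bully election sketch: the starter challenges every alive node
// with a higher id; the highest alive id wins the election.
const nodes = [
  { id: 1, alive: true },
  { id: 2, alive: true },
  { id: 3, alive: false }, // the failed master
  { id: 4, alive: true }
];

function bullyElect(starterId) {
  const higher = nodes.filter(n => n.alive && n.id > starterId);
  if (higher.length === 0) return starterId; // nobody outranks the starter
  return Math.max(...higher.map(n => n.id)); // a higher node takes over
}

console.log(`New master: ${bullyElect(1)}`); // New master: 4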
The Ring election algorithm implements the Bully election algorithm, but the nodes are arranged in a logical ring. This means each node sends messages only to its neighboring nodes, rather than to every node.
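A minimal sketch, reusing the same hypothetical node list from above: the election message travels around the ring collecting the IDs of alive nodes, and when it returns to the starter, the highest collected ID wins:

// Ring election sketch: pass a message around the logical ring,
// collecting alive ids; the highest collected id becomes the master.
function ringElect(ring, starterIndex) {
  const collected = [];
  let i = starterIndex;
  do {
    if (ring[i].alive) collected.push(ring[i].id); // each node adds its id
    i = (i + 1) % ring.length; // hand the message to the next neighbor
  } while (i !== starterIndex);
  return Math.max(...collected);
}

console.log(`New master: ${ringElect(nodes, 0)}`); // New master: 4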
The Leader Pre-election algorithm chooses the "backup" master node while the master node is still running. It still implements an election, but the election happens while the master node is still running. This eliminates the failover overhead of the other methods, but it's also a waste of resources, because the backup nodes can fail before the master, and then the elections will keep happening.
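A sketch of the pre-election idea, again with the hypothetical node list: the standby is chosen while the master is alive, so failover is just a promotion:

// Pre-election sketch: pick the standby ahead of time,
// so a failover does not need to run an election.
function preElect(ring, masterId) {
  const candidates = ring.filter(n => n.alive && n.id !== masterId);
  return Math.max(...candidates.map(n => n.id)); // highest-id standby
}

let master = 4;
const standby = preElect(nodes, master); // elected while the master is alive
// When the master fails, promotion is immediate:
master = standby;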
Simulating the Election Algorithm
We will be simulating the Bully election algorithm, using four Docker containers (1 master and 3 slaves) running Node.js to represent our nodes, plus a message broker (RabbitMQ). I initially tried using actual VMs. Welp, good luck with that.
To achieve this simulation, we'll have to:
- Create a Docker network, which will host all the containers and the RabbitMQ server.
- Spin up the RabbitMQ server, and bind its ports so RabbitMQ is reachable from our localhost.
- Spin up four Docker containers from our Dockerfile.
- Use the Pub/Sub pattern with the fanout method, so that every node sends and receives messages from every other node.
Create a Docker network
# The name of this network is election-algorithm_default
$ docker network create election-algorithm_default
# confirm it exists and copy the network id
$ docker network ls
The RabbitMQ Server
The server will use the management Alpine image, which exposes ports 5672 and 15672. If any processes are running on these ports, you will need to kill them.
# Run the rabbitmq image in detached mode
$ docker run -it -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.6-management-alpine
# Confirm it's running and copy the container id
$ docker container ls
Now, we can add the rabbitmq server to our network, so it can communicate with the other nodes.
# connect the rabbitmq server to the network
$ docker network connect <NETWORK_ID> <CONTAINER_ID_OF_THE_RABBITMQ_SERVER>
# Confirm the connection
$ docker inspect election-algorithm_default
# You should see a "Containers" key listing the rabbitmq server.
Create Dockerfile
In our present directory, we'll need a server.js file and some dependencies.
$ npm init && npm i --save amqplib node-cron && touch server.js Dockerfile
Then our Dockerfile
FROM alpine:latest
WORKDIR /usr/src/app
# Install Node.js and npm
RUN apk add --update nodejs npm
# Copy the package manifest first, so npm install has something to read
COPY package*.json ./
RUN npm install
# Copy the application code
COPY . .
CMD ["node", "server.js"]
Now, we'll need to get the IP address of the RabbitMQ server, because that is what we'll connect our containers to. This will enable all the containers to see all the messages from neighboring containers or nodes.
$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <CONTAINER_ID>
# OR
$ docker inspect <CONTAINER_ID> | grep "IPAddress"
We should be able to see our IP address from any of those results.
Server.js
In this file, every node sends a heartbeat to the RabbitMQ server, via a cron job that runs every 10 seconds. Every node can see all the responses and then sorts the information according to the container IDs. The container with the highest ID is automatically the master, and if that node fails, the next node takes over! We'll store the messages in a set so that there will only be unique IDs.
The server.js file should look like this
// Require libraries
const amqp = require("amqplib/callback_api");
const cron = require("node-cron");
const os = require("os");

// Connect to the IP address of the RabbitMQ container
const url = `amqp://guest:guest@${IP_ADDRESS_OF_THE_RABBITMQ_SERVER}`;

// The transmitter
const sendContainerIdToOthers = () => {
  /**
   * Method for sending the container id to the other nodes
   * @param {null}
   * @returns {null}
   */
  // This returns the container id
  console.log(`My id is ${os.hostname()}`);
  // Connect to the server
  amqp.connect(url, (error0, connection) => {
    if (error0) throw error0;
    // Create a channel
    connection.createChannel((error1, channel) => {
      if (error1) throw error1;
      // Create an exchange
      const exchange = "logs";
      // Send a message indicating your ID
      const msg = `My id is ${os.hostname()}`;
      // Use the fanout mechanism
      channel.assertExchange(exchange, "fanout", { durable: false });
      // Publish the message
      channel.publish(exchange, "", Buffer.from(msg));
    });
  });
};

// The receiver
amqp.connect(url, (error0, connection) => {
  if (error0) throw error0;
  connection.createChannel((error1, channel) => {
    if (error1) throw error1;
    const exchange = "logs";
    channel.assertExchange(exchange, "fanout", { durable: false });
    channel.assertQueue("", { exclusive: true }, (error2, q) => {
      if (error2) throw error2;
      console.log(`Waiting for messages in ${q.queue}`);
      channel.bindQueue(q.queue, exchange, "");
      // Since we want the IDs to be unique, we'll use a set
      let resultSet = new Set();
      // Clear the set every 15 seconds
      setInterval(() => {
        resultSet = new Set();
      }, 15000);
      channel.consume(
        q.queue,
        msg => {
          if (msg.content) {
            console.log(`received: ${msg.content.toString()}`);
            // Split the response to get the ID
            const id = msg.content
              .toString()
              .split("is")[1]
              .trim();
            // Add the ID to the set
            resultSet.add(id);
            console.log("Container ids", resultSet);
            // Find the master node: the highest ID wins,
            // so take the last element of the ascending sort
            const sorted = Array.from(resultSet).sort();
            console.log(`Our Master Node is ${sorted[sorted.length - 1]}`);
          }
        },
        {
          noAck: true
        }
      );
    });
  });
});

// Run every 10 seconds (node-cron's six-field syntax includes a seconds field)
cron.schedule("*/10 * * * * *", () => sendContainerIdToOthers());
Results
Now we can spin up four servers from the Dockerfile and connect them to the network
# build the image
$ docker build --tag=server1 .
# Run this command for three other servers, server2, server3, and server4.
# Run the image and connect the container to the network election-algorithm_default
$ docker run -it -d --network <NETWORK_ID> server1
# Run this command for three other servers, server2, server3, and server4.
# Confirm they are running
$ docker container ls | grep server1
After 10 seconds, we can check the logs of any of our nodes
$ docker logs --follow <CONTAINER_ID>
Then, we'll see all the nodes join in, and see how the master node changes when a node with a higher ID comes in.
If we kill a node (for example with docker kill <CONTAINER_ID>), we'll see that the next node in line, according to ID, becomes the Master.
Conclusion
I just got started with Docker and distributed systems, and I hope this informs you a little. The repo for this is here.