In the previous post, we showed how we used MongoDB replication to solve several problems we were facing.
Replication became part of a bigger migration that brought stability, fault tolerance, and performance to our systems. In this post, we will dive into the practical preparation of that migration.
I noticed a lack of tutorials on setting up Mongo replication with Docker containers and wanted to fill that gap, along with some tests showing how a Mongo cluster behaves in specific scenarios.
To improve our production database and solve the identified limitations, our clearest objectives at this point were:
- Upgrading Mongo v3.4 and v3.6 instances to v4.2 (all community edition);
- Evolving our Mongo data backup strategy from mongorestore on a mirror server to Mongo replication (an active, working backup server);
- Merging Mongo Docker containers into a single container and Mongo Docker volumes into a single volume.
When our applications were developed, there was no need to pass the Mongo connection URI through a variable, as Mongo was usually deployed as a microservice in the same stack as the application containers. With the centralization of Mongo databases, we changed the application code to read the URI from a variable, which we can now update in our CI/CD software whenever needed.
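As a minimal sketch of that change, assuming an environment variable named MONGO_URI (the variable name and the fallback value are illustrative, not from our actual code):

```shell
# Read the Mongo connection URI from the environment, falling back to a
# local default; the application then connects using this value.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017/myapp}"
echo "Connecting to: $MONGO_URI"
```

The CI/CD software only has to set `MONGO_URI` per environment; nothing in the application image changes.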
openssl rand -base64 756 > <path-to-keyfile>
chmod 400 <path-to-keyfile>
The keyfile is passed through the --keyFile argument of the mongod command, as shown in the next step.
User authentication and role management is out of the scope of this post, but if you are going to use it, configure it before proceeding beyond this step.
mongod --keyFile /keyfile --replSet rs-myapp
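As a sketch of how this looks in a Docker deployment (the service name, image tag, paths, and replica set name mirror our setup but are placeholders; adapt them to yours):

```yaml
# Hypothetical docker-compose service for one replica set member.
# Note: the keyfile must be readable only by the container's mongodb
# user, so you may need to adjust its ownership on the host.
services:
  mongo-node1:
    image: mongo:4.2
    command: mongod --keyFile /keyfile --replSet rs-myapp --bind_ip_all
    ports:
      - "27017:27017"
    volumes:
      - ./keyfile:/keyfile:ro
      - mongo-data:/data/db

volumes:
  mongo-data:
```

The same service definition is repeated on each server, so every member starts with the shared keyfile and the same replica set name.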
Typically, in this step, you simply choose a server network port on which to serve MongoDB. Mongo's default port is 27017, but since we had 4 apps in our production environment, we defined 4 host ports. Choose one network port per Mongo Docker container and stick with it:
- 27001 for app 1
- 27002 for app 2
- 27003 for app 3
- 27004 for app 4
In step 10, after replication is working, we'll use and expose only one port.
Preferably, set up 3 servers in different datacenters, or different regions if possible, to get inter-regional availability. Aside from latency changes, your system will survive datacenter blackouts and disasters.
Why 3? It is the minimum number for a worthwhile Mongo cluster:
- 1 node: can't have high availability by itself;
- 2 nodes: no automatic failover — when one of them fails, the other one can't elect itself as primary alone;
- 3 nodes: the minimum worthwhile number; when one of them fails, the other two can elect the next primary node;
- 4 nodes: has the same benefits as 3 nodes plus one extra copy of data (pricier);
- 5 nodes: can withstand 2 simultaneous node failures (even pricier).
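The arithmetic behind these numbers is simply majority voting; it can be sketched in a couple of lines of shell (the `majority` function is only for illustration):

```shell
# Electing a primary requires a strict majority of voting members,
# so an N-node replica set tolerates N - majority(N) failures.
majority() { echo $(( $1 / 2 + 1 )); }

echo "3 nodes: majority of $(majority 3), tolerates $(( 3 - $(majority 3) )) failure(s)"
echo "5 nodes: majority of $(majority 5), tolerates $(( 5 - $(majority 5) )) failure(s)"
```

This is also why 4 nodes buy no extra failover headroom over 3: the majority rises to 3, so the set still tolerates only one failure.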
There are Mongo clusters with arbiters, but that is out of the scope of this post.
Adjust your priorities to your cluster size, hardware, location, or other useful criteria.
In our case, we went for:
appserver: 10 // temporary primary
node1: 3      // designated primary
node2: 2      // designated first secondary to be promoted
node3: 1      // designated second secondary to be promoted
We set the node which currently held the data to priority: 10, since it had to be the primary during the sync phase, while the rest of the cluster was not ready. This allowed us to keep serving database queries while data was being replicated.
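Putting those priorities together, the replica set configuration passed to rs.initiate() in the mongo shell might look like this (hostnames are placeholders for our actual servers):

```javascript
// Sketch of the initial replica set configuration; run once, in the
// mongo shell of the member that currently holds the data.
rs.initiate({
  _id: "rs-myapp",
  members: [
    { _id: 0, host: "appserver:27017", priority: 10 }, // holds the data; temporary primary
    { _id: 1, host: "node1:27017", priority: 3 },      // designated primary
    { _id: 2, host: "node2:27017", priority: 2 },      // designated first secondary
    { _id: 3, host: "node3:27017", priority: 1 }       // designated second secondary
  ]
});
```

Once the sync finishes and the appserver member is removed, node1's priority of 3 makes it the preferred primary.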
(*N being the number of Mongo cluster nodes).
Use an orchestrator to deploy 4 Mongo services in the environment, each scaled to 3 containers.
- 4 is the number of different Mongo instances;
- 3 is the number of Mongo cluster nodes.
In our case, this meant having 12 containers in the environment temporarily.
Remember to deploy them as replica set members, as shown in step 3.
This is the moment when we start watching database users and collection data getting synced. You can enter the mongo shell of a Mongo container (preferably the primary) to check the replication progress. These two commands will show you the status, priority, and other useful info:
rs.status()
rs.conf()
When all members reach the secondary state, you can start testing. Stop the primary node to witness secondary promotion. This process is almost instantaneous.
You can stop the primary member by issuing the following command:
docker stop <mongo_docker_container_name_or_id>
When you bring it back online, the cluster will give the primary role back to the member with the highest priority. This takes a few seconds, as it is not critical.
docker start <mongo_docker_container_name_or_id>
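To watch the election happen during these stop/start tests, a quick sketch (container and member names are placeholders, and this needs a live cluster, so adapt it before running):

```shell
# Poll a surviving member once per second and print which host it
# currently sees as primary (placeholder names; requires a running cluster).
watch -n 1 "docker exec mongo-node2 mongo --quiet --eval 'rs.isMaster().primary'"
```

The printed hostname should flip to the new primary within seconds of stopping the old one.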
If everything is working at this point, you can stop the Mongo instance we previously set to priority: 10 (using the stop command from the prior step) and remove that member from the replica set, passing its hostname as a parameter.
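In the mongo shell on the new primary, the removal is a one-liner (the hostname is a placeholder for the old app + database server):

```javascript
// Run on the current primary after the old member has been stopped.
rs.remove("appserver:27017");
```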
Repeat this step for every Mongo container you had in step 4.
As mentioned in the previous post, one handy feature of MongoDB replication is being able to point mongodump at a secondary member, keeping the backup load off the primary.
Previously, the combined application + database server performed mongodump of its own data. As we moved the data to the cluster, we also moved the automated backup tools to a secondary member, to take advantage of said feature.
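As a sketch of the relocated backup job (the hostname, paths, and archive naming are placeholders; --gzip with --archive is just one convenient way to produce a single compressed file):

```shell
# Dump from a secondary member so the primary keeps serving queries
# undisturbed (placeholder host and paths; needs a running cluster).
mongodump --host node3:27017 \
  --readPreference=secondary \
  --gzip --archive=/backups/myapp-$(date +%F).gz
```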
If you only had 1 Mongo Docker container at the start, skip to step 12.
Although simplicity suggested doing this before step 1, we decided to act cautiously and keep apps and databases working as closely as possible to how they were before, until we mastered Mongo replication in our environment.
At this stage, we chose to import data from all Mongo databases to a single Mongo database — the one which contained the most data. When working with MongoDB, remember this line from the official docs:
In MongoDB, databases hold collections of documents.
That means we can take advantage of mongodump --db <dbname> and mongorestore --db <dbname> to merge Mongo data into the same instance (this applies to non-Docker deployments as well).
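A sketch of that merge, reusing the per-app ports from step 2 (the database name, ports, and dump path are illustrative):

```shell
# Dump app2's database from its old instance, then restore it into the
# main instance; repeat per database being merged.
mongodump    --host localhost --port 27002 --db app2 --out /tmp/mongo-merge
mongorestore --host localhost --port 27001 --db app2 /tmp/mongo-merge/app2
```

Since databases are independent namespaces within an instance, the restored databases live side by side without touching each other's collections.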
Once you have merged your databases into the same instance, you will shut down the other instances, right? Then you only need to monitor the application and perform backups of that one instance. Don't forget to monitor the new cluster's hardware: even with automatic fault tolerance, it is not wise to leave our systems short-handed. As a hint, there is a dedicated built-in role for that.
Sharing this story about our database migration will hopefully help the community, especially those not yet taking full advantage of MongoDB, to start seeing MongoDB in a more mature and reliable way.
Even though this is not a regular MongoDB replication "how-to" tutorial, this story shows important details about MongoDB's internal features, our effort not to leave any detail behind, and, again, the benefits of such technology. That's what I believe technology is for: helping humans with their needs.