In this story I'll write about the solution I implemented to authenticate and manage more than six hundred instances of a distributed Node application (the bot) against a back-end. At the time I was working on my start-up, which offered an automation tool for Instagram, and I needed to constantly exchange data between the bot instances and the associated user profiles.
The Software as a Service allowed clients to sign up, configure their working profile and control when to start or stop it. When started, a bot would take those configuration parameters and begin working on them. For each of my clients I needed one bot processing their data.
The main challenge I faced with such a system was connecting a bot to a profile. Who's working on what? I also wanted to know how many bots were available at any given time, how many of them were currently working and whether all the servers had the correct number of bots running. This way I could tell if there was a failure somewhere in the system.
It was clear that I needed a way to authenticate those bots, a method to update their state and check whether they were available to work, a two-way binding to associate a bot with a user profile, and a cron job to check that everything was okay.
Writing an additional piece of software implementing a client-server model seemed like the best thing to do. A single server instance would take responsibility for managing the bots and the data they sent, and its connection to the master database would handle the data processing. As you might imagine, after managing a front-end written in React, a back-end written in Laravel, all the DevOps on AWS and a bot written in NodeJS, I didn't want to add an extra piece to this stack. I also didn't have enough money to hire an additional developer, so I had to do it myself.
This is why I ended up implementing something very similar to a client-server model... using REST APIs!
The first endpoint, /bot/auth, had the responsibility of authenticating the bots. After a successful authentication, a token was generated and returned in the response; this gave the system something to remember the new bot instance by. A single string parameter was required: the hostname of the server hosting the bot.
The second endpoint, /bot/heartbeat, had the responsibility of giving the bot instructions on what to do next. By checking both the bot's own state and the system's state, the back-end could work out whether there were waiting profiles ready to be assigned, whether the current profile had been stopped for whatever reason, or whether the bot could keep running. This made it the most critical endpoint: each bot called it every thirty seconds.
Another reason for this endpoint to exist is that after each request I saved an extra field acknowledging that the bot had just made a request. This way I could check whether a bot was still alive.
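The decision logic of a heartbeat call could be sketched like this. All names and the 90-second TTL are assumptions for illustration; Redis's SETEX is emulated with an expiry timestamp in a plain Map.

```javascript
// token -> expiry timestamp (ms). Stand-in for SETEX heartbeat:<token>.
const heartbeats = new Map();
const HEARTBEAT_TTL_MS = 90 * 1000; // e.g. three missed 30s calls = dead

function recordHeartbeat(token, now = Date.now()) {
  heartbeats.set(token, now + HEARTBEAT_TTL_MS);
}

// Decide what the bot should do next from its state and the queue of
// profiles waiting to be assigned.
function nextInstruction(bot, waitingProfiles) {
  if (bot.state === 'available' && waitingProfiles.length > 0) {
    return { action: 'start', profileId: waitingProfiles[0] };
  }
  if (bot.state === 'working' && bot.profileStopped) {
    return { action: 'stop' };
  }
  return { action: 'continue' };
}
```

Each heartbeat would first call `recordHeartbeat` (refreshing the expiring key) and then answer with the instruction, so liveness tracking comes for free with the polling.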
Finally, /bot/push_data was used to receive data from the bot. There's nothing special to say about it: the required token parameter was all the back-end needed to understand which bot was sending the data and which profile it was bound to.
What if a bot crashed for whatever reason? Every instance ran inside a Docker image with supervisor installed and configured to restart the instance in case of unexpected exits. But, you know, when you've got paying clients it's always better to double (or triple) check your work.
This is why I added a back-end health check. The responsibility for it was delegated to a cron job whose task was to check the existence of each bot's heartbeat key. If it had expired, the bot was considered dead and deleted from the list of authenticated bots.
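One pass of that cron job could be sketched as follows. Again the names are illustrative and Redis is emulated in memory: `isAlive` stands in for an EXISTS on the heartbeat key, and the delete stands in for an HDEL on the server's hash.

```javascript
// Stand-in for: EXISTS heartbeat:<token> (keys vanish when the TTL expires).
function isAlive(heartbeats, token, now = Date.now()) {
  const expiry = heartbeats.get(token);
  return expiry !== undefined && expiry > now;
}

// One health-check pass over a single server's hash of bots.
function reapDeadBots(serverHash, heartbeats, now = Date.now()) {
  const removed = [];
  for (const token of serverHash.keys()) {
    if (!isAlive(heartbeats, token, now)) {
      serverHash.delete(token); // stand-in for: HDEL bots:<hostname> <token>
      removed.push(token);
    }
  }
  return removed;
}
```

A supervisor restart would make the crashed bot re-authenticate and get a fresh token, so reaping the stale entry keeps the bot list honest.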
Storing bot data using Redis
Bots are volatile and the system had to scale without limits. I didn't need any permanent storage solution; RAM was all I needed: high speed and volatility.
I chose Redis because of its hash data structure and its key-value store. The hashes came in handy because, given a single hash, you can fetch all the values inside it, which is very useful for getting the state of a single server. The key-value store, on the other hand, was used to track the last heartbeat through expiring keys.
As for the bot's data, it was stored as a JSON string: serialized from its class implementation to a plain object, and converted back as needed.
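The round trip between the class and the JSON string stored in Redis could look like this sketch. The class name and its fields are assumptions for illustration; only the stringify/parse round trip is what the article describes.

```javascript
class Bot {
  constructor({ token, hostname, state = 'available', profileId = null }) {
    this.token = token;
    this.hostname = hostname;
    this.state = state;
    this.profileId = profileId;
  }

  // Serialized form, stored in the server's hash with HSET.
  toRedis() {
    return JSON.stringify(this);
  }

  // Rebuild the instance from the string returned by HGET.
  static fromRedis(raw) {
    return new Bot(JSON.parse(raw));
  }
}
```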
The commands I used are the following:
- HSET to save a bot instance, used after successful authentication in /bot/auth
- HGET to get a bot instance, mostly used by /bot/heartbeat and /bot/push_data endpoints
- HVALS to get all the bots in a server, used by cronjob and status page
- HDEL used by the cronjob to delete a bot when dead
- SETEX to reset the heartbeat
- EXISTS to check if the heartbeat still exists
Here's a simple drawing of the structure. Each request interacts with some part of it, and each one is uniquely coloured.
If you wish to see the actual implementation... you're in luck! I've taken part of my PHP application and rewritten it in NodeJS. Everything is hosted in a GitHub repository. Feel free to have a look and play with it. Suggestions are welcome if you spot something wrong!
Repository url: MrMavin/article-bots-to-backend
What are you going to get when playing with it?
If you've made it this far, thank you! Please take a moment to give me some feedback. I'd like to know what you think about this solution and whether you enjoyed reading this article :)