Thomas.G

Posted on Mar 24, 2020 • Edited on Apr 16, 2020

SlimIO Architecture #1 - Monolithic to Modular

#node #javascript #monitoring #architecture

Hey!

This is the first article in a long series about the architecture choices we made while building our monitoring tool, SlimIO. I wrote an introduction article a few weeks ago if you don't know what SlimIO is yet.

For us, SlimIO is what we call a pure modular monitoring agent (and we like to call the others monolithic agents).

Monolithic Agent

A monolithic agent is always built as a single piece of software with no built-in way of extending it (sometimes extensibility is added through a third-party dependency). From a technical point of view, this leads to many issues:

There is no way to upgrade an agent without degrading or breaking the service.
No extension is possible, and customization is limited to editing configuration keys that are exclusively managed by the team that builds the product (or the community behind it).
It becomes outdated quickly (the technical debt cannot be removed easily without a massive impact).
It significantly reduces the agent's possible scope and use.
You are forced to deploy every feature, even though you only configure the ones you want.

From a maintainability point of view, the monolithic side has one massive project to maintain, while the modular side is split into several small projects (a very noticeable difference, but not necessarily a positive or negative one).

We cannot deny that a monolithic architecture can meet specific needs with optimal performance (so it is not black and white).

Examples

Prometheus
Nagios
Centreon
Zabbix
etc...

Modular Agent

A modular agent revolves around an architecture that is not by any means specific to the monitoring world but works like a charm for it. Every feature is added through a new addon (which is itself an isolated container).

The core is the entity responsible for communication as well as for loading the addons. It is the main point of failure, so we put work into making that component as fault-tolerant as possible.
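To make the role of the core more concrete, here is a minimal sketch. It is not the SlimIO implementation: the `Core` class, the `load` and `send` methods, and the shape of an addon (`name`, `start`, `callbacks`) are all hypothetical names chosen for illustration. It only shows the two responsibilities described above, loading addons and relaying communication, with a faulty addon contained at load time.

```javascript
// Hypothetical sketch: none of these names come from the SlimIO codebase.
class Core {
  constructor() {
    this.addons = new Map(); // addon name -> addon object
  }

  // Loading: start the addon and register it; a failing start() is contained.
  load(addon) {
    try {
      addon.start();
      this.addons.set(addon.name, addon);
      return true;
    } catch (error) {
      // The faulty addon is skipped; the core and the other addons keep running.
      return false;
    }
  }

  // Communication: every message goes through the core, never addon to addon.
  send(target, callbackName, ...args) {
    const addon = this.addons.get(target);
    if (addon === undefined) {
      throw new Error(`Unknown addon "${target}"`);
    }
    const callback = addon.callbacks[callbackName];
    if (typeof callback !== "function") {
      throw new Error(`Unknown callback "${target}.${callbackName}"`);
    }
    return callback(...args);
  }
}

// Two illustrative addons: one healthy, one that crashes while starting.
const cpu = {
  name: "cpu",
  start() {},
  callbacks: { getUsage: () => 0.42 }
};
const broken = {
  name: "broken",
  start() { throw new Error("boom"); },
  callbacks: {}
};

const core = new Core();
const cpuLoaded = core.load(cpu);
const brokenLoaded = core.load(broken);
const usage = core.send("cpu", "getUsage");
```

In this sketch the broken addon never reaches the registry, so the rest of the agent is unaffected by its failure.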

The benefits of such an architecture are:

Usage and scope are no longer limited (pick your poison).
Addons can be updated separately (in SlimIO, addons are upgradable with no service degradation 😎).
A clearly defined communication model from the beginning (in SlimIO we have decided to go with a one-to-one relationship).

What do I mean when I say that usage and scope are no longer limited?

Only deploy what matters.
Because modularity is in the architecture's DNA, there is no limit on how much you can customize and extend the product with new addons.
There are no rules (no code contract) on what kind of work an addon should do (monitoring, maintenance, running tasks... it does not matter to us).
DevSecOps friendly.

👀 In our team we have defined a list of best practices on how an addon should be created (we follow the UNIX philosophy).

This lets us use our agent in multiple situations and scenarios (a concentrator, a DMZ proxy, etc.). Our competitors almost systematically reinvent the wheel to answer these needs 🙊.

In SlimIO, a concentrator is just a SlimIO Agent with a defined set of addons to pull/push data from remote agents (to put it simply).
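As a rough illustration of that idea, the role of an agent can be reduced to the list of addons it loads. The profile names and addon names below are hypothetical, picked only to show the shape of such a setup; they are not the actual SlimIO addons.

```javascript
// Hypothetical addon names, for illustration only.
const profiles = new Map([
  ["standard", ["events", "cpu", "alerting"]],
  ["concentrator", ["events", "socket", "remote-pull", "remote-push"]]
]);

// In this sketch, an agent's role is nothing more than the addons it loads.
function describeAgent(profileName) {
  const addons = profiles.get(profileName);
  if (addons === undefined) {
    throw new Error(`Unknown profile "${profileName}"`);
  }
  // Copy the list so callers cannot mutate the shared profile definition.
  return { profile: profileName, addons: [...addons] };
}

const concentrator = describeAgent("concentrator");
const standard = describeAgent("standard");
```

The same binary serves both roles; only the addon list changes.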

Keeping our agent for these scenarios allows us to:

Simplify installation and administration for integrators (no need to open ten different getting-started guides to achieve the monitoring you want).
Get complete self-monitoring (no need to set up an additional agent).

The catch

Having such modularity comes with additional technical constraints:

Need for a clearly defined ACL mechanism between addons.
Addons must be perfectly isolated from each other (for security and fault-tolerance reasons).
Need to synchronize the addons with each other in a purely asynchronous execution context.
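To give an idea of what an ACL between addons can look like, here is a small deny-by-default sketch. The table layout, the addon names and the `isAllowed` / `guardedSend` helpers are assumptions made for this example and do not describe the real SlimIO mechanism.

```javascript
// Hypothetical ACL table: caller addon -> set of "target.callback" it may invoke.
const acl = new Map([
  ["alerting", new Set(["events.publish", "cpu.getUsage"])],
  ["cpu", new Set(["events.publish"])]
]);

// Deny by default: unknown callers and unlisted callbacks are rejected.
function isAllowed(caller, target, callbackName) {
  const rules = acl.get(caller);
  return rules !== undefined && rules.has(`${target}.${callbackName}`);
}

// Wrap a call so that the handler only runs when the ACL permits it.
function guardedSend(caller, target, callbackName, handler) {
  if (!isAllowed(caller, target, callbackName)) {
    throw new Error(`${caller} is not allowed to call ${target}.${callbackName}`);
  }
  return handler();
}

const allowed = isAllowed("alerting", "cpu", "getUsage");
const denied = isAllowed("cpu", "alerting", "raise");
const unknownCaller = isAllowed("ghost", "cpu", "getUsage");
```

Checking permissions in one central place (the core, in our case) means an addon never has to trust another addon directly.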

On top of that comes everything required to upgrade an addon with no service degradation (this is what we call "Shadow Run/Upgrade").
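The article does not detail how a Shadow Run/Upgrade works internally, so the following is only one possible reading of the idea, under my own assumptions: the new version is started next to the old one, and the swap happens only once the new version has started successfully. `AddonSlot`, `upgrade` and `makeAddon` are hypothetical names.

```javascript
// Hypothetical sketch of a "shadow" upgrade; names and flow are assumptions.
class AddonSlot {
  constructor(addon) {
    addon.start();
    this.active = addon;
  }

  // Callers always go through the slot, so they never hold a stale version.
  call(callbackName, ...args) {
    return this.active.callbacks[callbackName](...args);
  }

  upgrade(next) {
    try {
      // Shadow run: the new version starts while the old one keeps serving.
      next.start();
    } catch (error) {
      // The candidate failed to start: keep the current version untouched.
      return false;
    }
    const previous = this.active;
    this.active = next; // the swap is a single reference assignment
    previous.stop();
    return true;
  }
}

// Small factory for illustrative addons that report their own version.
function makeAddon(version, { failOnStart = false } = {}) {
  return {
    version,
    running: false,
    start() {
      if (failOnStart) throw new Error(`v${version} failed to start`);
      this.running = true;
    },
    stop() { this.running = false; },
    callbacks: { version: () => version }
  };
}

const v1 = makeAddon(1);
const slot = new AddonSlot(v1);
const badUpgrade = slot.upgrade(makeAddon(2, { failOnStart: true }));
const afterBad = slot.call("version"); // still served by version 1
const goodUpgrade = slot.upgrade(makeAddon(3));
const afterGood = slot.call("version"); // now served by version 3
```

A failed upgrade leaves the running version in place, which is the property that matters for "no service degradation" in this simplified model.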

This is why it took almost two years to build our agent's foundation.

(me when they ask for a metric after two years 😂)

Conclusion

To conclude, I would say that no matter the choice, there is always a price to pay. In our case, we have to manage a lot of quite difficult abstractions and technical constraints.

Modularity also comes with its own performance price, even if it is not yet clear to us how much it will cost (our goal is to be more efficient than our competitors in the way we deal with memory leaks and performance regressions in the long term).

We made these choices because we believe that they are relevant to answer the different issues we encountered with our current and past customers.

We have planned several articles which will complete this one in the coming weeks.

Thanks for reading!

Best Regards,
Thomas
