Microservices are ubiquitous. These days, not doing microservices is something like not writing unit tests or not washing your hands before dinner: if you're not doing it, you feel ashamed. A critical look at microservices sounds almost politically incorrect. But I'll try...
First of all, I'll mostly be looking at different types of architectures from a pragmatic (read: real-life) design point of view.
From a design point of view, the introduction of microservices gives only one advantage (by imposing a limitation, by the way): it requires engineers to think much more carefully about the boundaries of each service and how it interacts with the rest of the system.
For a monolithic application this is also important, but there it is always possible to take a shortcut or hack something internally without much hassle. There is no way to perform such a trick with microservices. Needless to say, better-thought-out systems work better.
The rise of microservices coincided with the rise of reactive approaches and the maturation of other technologies (HTTP server implementations and microframeworks, for example). Along with shifting the scaling of individual services to external infrastructure, all these things together enabled more performant systems.
All of the above gave the impression that microservices architectures are better, faster, and more scalable. They, in fact, are. But from the design point of view, there are only two points which actually account for this gain:
- A better-thought-out system
- The ability to scale an individual service rather than the whole application
In order to achieve these two goals, we had to pay a high price. Let's set aside the deployment and maintenance nightmare; after all, it now feeds a lot of DevOps engineers and companies which sell software and services to manage this hell.
From a pragmatic design perspective, during the transition we lost:
- Design flexibility. There is no free lunch anymore: you can't easily refactor the system and shift functionality from one service to another, and you can't easily change a service API.
- Simplicity of local deployment. To debug some issue you need to start a bunch of dependencies. In my practice, most apps can't be started locally anymore (or can be, but it is so complex that nobody bothers), and apps are debugged with debug prints (say "hello" to the 80s).
- Per-application handling of "environment issues". Network and disk failures, configuration management, monitoring, etc. now need to be handled for each service individually. Yes, there are apps, frameworks, and patterns to handle all of these. But now we have to keep all these issues in mind all the time, spending precious brain resources on things which are not directly related to the business logic we're implementing.
- Predictability of failure patterns. They are now much more complex and harder to predict and prepare for, with a whole lot of "semi-working" states of the system.
All of the above might look like an appeal to return to monoliths. It is not. Monoliths have their own set of issues, and I see no point in repeating them here; every article about microservices doesn't forget to list them.
So, are there alternatives to microservices and monoliths? Well, I think there is at least one.
First of all, let's try to summarize what an alternative architecture should achieve:
- Be service-based, where the interface of each service is clearly defined. In this way we force engineers to be more careful about design.
- Be service-friendly: isolate services from environment issues as much as possible.
- Enable per-service scalability.
Ideally, it should also have the following properties:
- Keep external dependencies to a minimum.
- Be simple to deploy and maintain; in particular, local deployment should not be an issue.
The first thing which comes to mind is the traditional application server as it was initially envisioned by the guys at Sun: apps should just plug into it and use all available services. The idea didn't get the expected acceptance, I think mostly because it was oriented toward technologies and approaches which were modern at the time the idea was introduced. Nevertheless, it does provide a service-friendly environment of sorts, although one that is too regulated and too limited to a specific set of APIs.
But there is a somewhat different approach, described below.
At a high level, the architecture is a cluster consisting of identical nodes. The cluster is built on top of a data/computing grid (for example, Apache Ignite, Infinispan, Hazelcast, etc.). Unlike the traditional approach, the grid is not something external to the application; instead, each grid node is at the same time an application node.
Every node consists of a service-friendly shell and user services. The components on the right are part of the shell, while those on the left are user services:
(Well, HTTP can be part of the shell as well; this actually does not matter much.)
Every node has four working modes: single, dormant, slave, and master.
Single mode is used for development/debugging or in very small deployments.
Dormant, slave, and master are the modes used in a clustered environment. A node starts in dormant mode and tries to connect to the cluster. While a node is in dormant mode, all user services are stopped, so there is no risk of doing something wrong. A node is also switched back into dormant mode if for some reason the cluster can't be formed, for example if there is no majority of nodes in the cluster (either because not enough nodes have connected, or the cluster is experiencing network issues and the node belongs to a disconnected minority). Once the node is connected to the cluster and a majority of nodes are available, it switches into either slave or master mode, depending on the results of the master election. There is no difference between master and slave nodes from the point of view of user services. The difference is visible only to the Cluster Manager (see below), which is enabled only on the master node.
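The mode transitions above can be sketched as a small decision function. This is a hypothetical illustration, assuming the node re-evaluates its mode on every cluster view change; the names `NodeMode` and `next_mode` are mine, not from any real grid API:

```python
from enum import Enum

class NodeMode(Enum):
    SINGLE = "single"
    DORMANT = "dormant"
    SLAVE = "slave"
    MASTER = "master"

def next_mode(connected: bool, has_majority: bool, elected_master: bool) -> NodeMode:
    """Decide the working mode from the current cluster view.

    A node stays dormant until it is connected to the cluster AND sees a
    majority of nodes; only then does the election result matter.
    """
    if not connected or not has_majority:
        return NodeMode.DORMANT
    return NodeMode.MASTER if elected_master else NodeMode.SLAVE
```

Note that a node which loses the majority falls back to dormant regardless of its previous role, including a former master.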
The Service Manager is responsible for starting/stopping individual services according to the active configuration (which is stored in the data grid). The Service Manager listens to cluster events, and once a node is disconnected from the cluster or connected only to a minority of nodes, all services are immediately stopped, preserving system consistency.
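The "stop everything when in a minority" rule might look like this. A minimal sketch under my own assumptions: `grid` is a stand-in for the data grid client, each started service exposes a `stop()` method, and the event handler receives the current membership counts:

```python
class ServiceManager:
    """Stops user services when the node loses its majority view.

    `services` maps a service name to a running instance with `stop()`.
    """

    def __init__(self, grid):
        self.grid = grid
        self.services = {}

    def on_cluster_event(self, connected_nodes: int, cluster_size: int) -> None:
        # A node in a minority partition can no longer trust its view of
        # the shared state, so it stops everything to preserve consistency.
        if connected_nodes < cluster_size // 2 + 1:
            for name, service in list(self.services.items()):
                service.stop()
                del self.services[name]
```

The point is that this logic lives in the shell, once, rather than being reimplemented inside every service.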
The Cluster Manager is responsible for making decisions about which services should be running at each node and in how many instances. The Cluster Manager itself is activated only on the master node, so there is always only one source of truth about the services configuration. The decision about the number and location of services can be made using different approaches: static configuration, performance monitoring, heuristics, etc. It is also possible to let the Cluster Manager trigger starting/stopping nodes by interacting with an external service (Amazon ECS/EC2, Kubernetes, etc.). Note that unlike external monitoring, the Cluster Manager has access to all the details, so it can make a much more informed decision.
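The simplest of the placement approaches, static configuration, can be sketched in a few lines. Everything here is illustrative (the function name and round-robin strategy are my own choice, not a prescription):

```python
def plan_placement(desired: dict, nodes: list) -> dict:
    """Spread the desired number of instances of each service over the
    nodes round-robin.

    `desired` maps service name -> instance count; the result maps
    node -> list of services the Cluster Manager should start there.
    """
    plan = {node: [] for node in nodes}
    i = 0
    for service, count in sorted(desired.items()):
        for _ in range(count):
            plan[nodes[i % len(nodes)]].append(service)
            i += 1
    return plan
```

A monitoring-driven Cluster Manager would replace the static `desired` map with counts derived from load metrics, but the output, an assignment of service instances to nodes, stays the same.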
Well, this is just the data grid code which is part of the node.
This is an optional component which is necessary if data needs to be persisted to disk (or some other storage). Technically, it is the part of the data grid configuration which enables storing local data to storage.
First of all, transparent and (almost) instant access to all data in the system. Data is not just stored but also replicated, so the entire system is durable and reliable. Replication and consistency can be flexibly tuned to precisely fit the requirements. The entire system can survive the loss of some number of nodes (up to N/2-1, where N is the maximal cluster size) due to various issues. The loss of nodes may somewhat affect performance, but it does not affect system availability.
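The "up to N/2-1" figure is just majority-quorum arithmetic (exact for even N; for odd N the cluster actually tolerates (N-1)/2 losses). A minimal sketch:

```python
def majority(n: int) -> int:
    """Smallest number of nodes forming a majority of an n-node cluster."""
    return n // 2 + 1

def max_tolerated_losses(n: int) -> int:
    """How many nodes can fail while a majority can still be formed."""
    return n - majority(n)
```

For example, a 6-node cluster keeps working after losing 2 nodes (6/2 - 1), because the remaining 4 still form a majority.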
A service-friendly environment. Services are isolated from environment issues and can behave as if there were no problems with connectivity or anything like that. The shell takes care of retrying and redirecting calls to other nodes if necessary. All this significantly simplifies writing services, and developers can focus on business logic rather than on issue handling. Overall, services become very thin and lightweight, so I've called them "nanoservices".
The whole system is highly scalable. Unlike microservices, it has two dimensions along which to scale: adding nodes and starting more service instances. Starting a service instance is much faster than starting a new node, so the time necessary to react to a load change is significantly smaller.
The system is either working or not; there are no intermediate states. Failure patterns are limited in number and predictable.
Minimal dependencies. A DB, messaging, queues, distributed computing, etc. are already built in.
Simple deployment and configuration. No need for external "orchestration" services.
It's quite easy to extend the shell with more functionality, for example, letting each node also be a Kafka node.
Data and processing are collocated: it is possible to design services and configure the data grid so that all processing is performed at the node which holds all (or most of) the necessary data locally. This approach can significantly reduce network traffic and distributes processing in a natural way. By properly configuring the data-to-node assignment, it is possible to collect related data at the same nodes, further improving performance.
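The data-to-node assignment trick is usually done with an affinity key: related records are hashed on a shared field rather than on their own id, so they land in the same partition. A minimal sketch, assuming a fixed partition count and using CRC32 in place of the grid's real partition map (the record shapes and the `customer_id` field are illustrative):

```python
import zlib

def partition(key: str, partitions: int) -> int:
    """Deterministically map a key to a partition.

    Real grids use a partition map plus consistent hashing; CRC32 here
    just makes the sketch reproducible.
    """
    return zlib.crc32(key.encode()) % partitions

def affinity_key(order: dict) -> str:
    """Collocate each order with its customer by hashing on customer_id
    instead of the order's own id."""
    return order["customer_id"]

# The order and its customer land in the same partition, so a service on
# the owning node can join them without any network traffic.
customer = {"customer_id": "c42", "name": "ACME"}
order = {"order_id": "o1", "customer_id": "c42", "total": 99}
collocated = (partition(affinity_key(order), 16)
              == partition(customer["customer_id"], 16))
```

Apache Ignite, Infinispan, and Hazelcast all expose this idea under names like affinity collocation or data affinity.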
Such an architecture is a natural fit for reactive asynchronous processing.
We have almost the same freedom of refactoring as we do with a monolith.
There are no real systems built with the architecture described above (at least none known to me). Nevertheless, a few years ago I designed and implemented a system which contains most of the elements described above, and it still works just fine (to the best of my knowledge, since I'm no longer working for that company).