The problem we’ll talk about is quite common: microservices can’t be fully independent, and very often one service needs data from another to run its business logic or to return that data to the client. In the classical microservices architecture each service has its own database, so there has to be a way to get this information across service boundaries.
First, I’ll mention two solutions that are commonly used, although their main purpose is different.
1. UI composition
UI composition is great mainly because it’s very simple for backend developers: they have to do literally nothing to achieve the goal. It may work when data from several services needs to be combined and the logic behind it can be performed on the client side. But, obviously, that’s not always possible, and it’s hard to imagine an application that could rely on this way of fetching data alone. Still, it covers some use cases and is worth mentioning.
2. Aggregate service
Another way is to have a separate service that calls all the source services, fetches data from each of them, combines the results into one data structure, and returns it to the client. This is what the API Gateway service is usually responsible for. The pattern is very common in microservices architecture and works in most cases where data is fetched to serve GET requests. But it is very limited when business logic has to be applied, and such logic usually doesn’t belong in the Gateway: all domain-related logic should live in the dedicated microservice.
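To make the idea concrete, here is a minimal sketch of such an aggregating endpoint in Go. The internal URLs, paths, and payload shapes are invented for illustration; a real gateway would also add timeouts, retries, and authentication.

```go
// aggregate.go - a toy API Gateway endpoint that fans out to two
// hypothetical services and merges their responses into one payload.
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// fetchJSON fetches one upstream response as a raw JSON fragment.
func fetchJSON(url string) (json.RawMessage, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	http.HandleFunc("/api/profile-view", func(w http.ResponseWriter, r *http.Request) {
		// Call both source services; URLs are illustrative.
		user, errU := fetchJSON("http://users.internal/api/users/42")
		orders, errO := fetchJSON("http://orders.internal/api/orders?userId=42")
		if errU != nil || errO != nil {
			http.Error(w, "upstream failure", http.StatusBadGateway)
			return
		}
		// Combine the two payloads into a single response for the client.
		json.NewEncoder(w).Encode(map[string]json.RawMessage{
			"user":   user,
			"orders": orders,
		})
	})
	http.ListenAndServe(":8080", nil)
}
```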
So, these two patterns are well-known and widely used, but they are not a good fit for the problem described above. That’s why other approaches exist.
Direct call to the other service
This is the most obvious way to solve the task: if you need some data from a particular service, just ask for it! Most services have an API, and this API can be used to fetch the data.
At first glance it looks simple: the first service sends a request to the second. It may be an HTTP request, a gRPC call, or something else. The second service fetches the data from its database, possibly processes it somehow, and returns it in the specified format. However, it only looks simple, and sometimes it doesn’t work as planned.
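As an illustration, here is roughly what such a call might look like from the Service1 side, assuming a hypothetical `/api/v1/users` endpoint on Service2 and an invented `User` shape:

```go
// A sketch of Service1 calling Service2 over HTTP.
package service1

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type User struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// fetchUser asks Service2 for a user; the timeout bounds how long
// Service1 is willing to block on its dependency.
func fetchUser(id string) (*User, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://service2.internal/api/v1/users/" + id)
	if err != nil {
		return nil, fmt.Errorf("service2 unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("service2 returned %d", resp.StatusCode)
	}
	var u User
	if err := json.NewDecoder(resp.Body).Decode(&u); err != nil {
		return nil, err
	}
	return &u, nil
}
```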
The main thing you should watch out for is coupling. Now the two services are coupled not only by data; there is also temporal coupling[1]. What’s that? Temporal coupling occurs when one service has to wait for the response from another and can’t continue processing without it. It means that when Service2 is not available, Service1 can’t handle its requests either, even though it’s alive and healthy. And that means that some Service3 that calls Service1 can’t process its requests as well. Such situations are called cascading failures. They disrupt the work of the whole system, so, obviously, we should avoid them. To do that, developers and architects have invented a lot of useful patterns, such as the Circuit Breaker[2], but they solve only part of the problem. Temporal coupling remains one of the most serious problems with synchronous communication, and it’s almost impossible to get rid of it altogether.
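To show the shape of the pattern, here is a toy circuit breaker in Go. It is a deliberately simplified sketch: the thresholds are hardcoded and there is no half-open state, which a production implementation (or a ready-made library) would have.

```go
// Package breaker: a toy circuit breaker. After five consecutive failures
// it "opens" and fails fast for thirty seconds instead of letting callers
// pile up behind a dead dependency.
package breaker

import (
	"errors"
	"sync"
	"time"
)

var ErrOpen = errors.New("circuit open: failing fast")

type Breaker struct {
	mu        sync.Mutex
	failures  int
	openUntil time.Time
}

// Call runs fn unless the circuit is open, tracking consecutive failures.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= 5 { // threshold is illustrative
			b.openUntil = time.Now().Add(30 * time.Second)
			b.failures = 0
		}
		return err
	}
	b.failures = 0
	return nil
}
```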
Another thing to consider is microservice API versioning. If you are not familiar with it, welcome to a world full of dangers and surprises. There are a lot of recommendations on how to do versioning properly: Semantic Versioning[3], URL versioning, using headers, etc. But it may still become a pain in the neck at any time, regardless of which side you are on: API creator or API consumer. As an API creator, you don’t want to break all the clients with the next microservice deployment, so you have to maintain not only the latest version of the API but also the previous one (or the previous ten?). While the service is being actively developed and changed, maintaining backward compatibility can be tricky. As an API consumer, you need to be confident that nothing breaks after the API version changes. How do you automate that? This brings us to the next question.
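For instance, URL versioning boils down to serving both versions side by side, so old clients keep working while new ones migrate. A sketch with invented paths and payloads (the wildcard routing syntax needs Go 1.22+):

```go
// Two API versions served side by side: v1 keeps the original payload
// shape, v2 introduces a breaking change only opted-in clients see.
package main

import (
	"encoding/json"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// v1: the original flat shape, kept alive for existing clients.
	mux.HandleFunc("GET /api/v1/users/{id}", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{
			"id": r.PathValue("id"), "name": "Alice Smith",
		})
	})
	// v2: the name field becomes structured - a breaking change.
	mux.HandleFunc("GET /api/v2/users/{id}", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]any{
			"id":   r.PathValue("id"),
			"name": map[string]string{"first": "Alice", "last": "Smith"},
		})
	})
	http.ListenAndServe(":8080", mux)
}
```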
Automated testing is something we can’t live without when developing large software products. But what is the point of having services that are well-tested in isolation but can’t communicate properly? And testing service communication is not that simple: it requires proper infrastructure. One of the possible solutions is Contract Testing[4].
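The idea behind a consumer-driven contract test can be shown with a hand-rolled sketch (a real project would use a dedicated tool such as Pact): the provider’s test suite verifies exactly the fields the consumer promised to rely on. The router and endpoint here are hypothetical stand-ins.

```go
package contract_test

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// buildRouter stands in for the provider's production router; in a real
// suite it would be imported from the service code, not redefined here.
func buildRouter() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/users/42", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{"id": "42", "name": "Alice"})
	})
	return mux
}

func TestUserEndpointHonoursConsumerContract(t *testing.T) {
	srv := httptest.NewServer(buildRouter())
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL + "/api/v1/users/42")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	// The contract: the consumer reads exactly these two fields.
	var got struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&got); err != nil {
		t.Fatalf("response is no longer valid JSON: %v", err)
	}
	if got.ID == "" || got.Name == "" {
		t.Fatalf("contract broken: missing id or name in %+v", got)
	}
}
```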
With all that in mind, we can list the advantages and disadvantages of this approach.
Pros:
- received data is always up-to-date
- data-owning microservice can apply additional logic before sending the response
- data encapsulation is not violated
Cons:
- temporal coupling
- additional infrastructure work - circuit breakers, service discovery
- need to deal with API versioning
- may be hard to test
Embedded library
From the diagram of direct service calls, one could conclude that there is one redundant network call. Why can’t a service receive the information directly from the place where it’s stored? The main obstacle is one of the backbone principles of microservices: don’t share a database across services. You will be on the safer side if you use the Database per Service pattern[5] instead. According to this pattern, each service should have a dedicated database with exclusive access rights. But what’s wrong with a shared database? One problem is data encapsulation. When there is a single service responsible for working with the database, it encapsulates the internal data structure. But when there are several accessing services, each of them has to know this structure and how to query the database. The solution is to create another layer of encapsulation. It may be a package, library, or SDK included in the service codebase.
This package (or module, or whatever) may be developed and maintained by the team responsible for the data, but included in the other services as a dependency and deployed with them. The module may contain read-only database access and some minimal logic for fetching and composing the data.
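A sketch of what such a library might expose, with an invented table and Postgres-style placeholders; the point is that consumers see only the `Reader` API, never the schema:

```go
// Package userdata is a hypothetical read-only client library maintained
// by the team that owns the users database.
package userdata

import (
	"context"
	"database/sql"
)

type User struct {
	ID   string
	Name string
}

// Reader hides the owning team's schema behind a small read-only API.
type Reader struct {
	db *sql.DB
}

func NewReader(db *sql.DB) *Reader { return &Reader{db: db} }

// GetUser is the only way consumers touch the users table. If the owning
// team renames a column, they update this query and release a new
// library version; consumers never see the change.
func (r *Reader) GetUser(ctx context.Context, id string) (*User, error) {
	var u User
	err := r.db.QueryRowContext(ctx,
		"SELECT id, name FROM users WHERE id = $1", id,
	).Scan(&u.ID, &u.Name)
	if err != nil {
		return nil, err
	}
	return &u, nil
}
```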
Unfortunately, behind all this simplicity there are a lot of hidden problems, and the number of possible pitfalls may be even greater than in the previous approach. While you no longer have to worry about microservice API versions, you’ve got a new problem to deal with: the need to maintain library versions. In some cases this may be simpler for the library developers, because not every service release will require a library update. But sometimes maintaining several versions of the library can be the same mess as maintaining several microservice API versions.
Another thing to keep an eye on is the way this dependency is managed. It has to be included in the service via a package manager or another tool. This becomes a problem when two services are implemented in different programming languages: the library developers will have to implement the client in another language or use some kind of cross-language libraries.
There is one more issue related to encapsulation. Even though other services don’t use the data schema internals directly, the developers of the owning service still can’t relax: when changing the database schema, they have to make sure the changes won’t break previous versions of the library.
There may be a lot of other non-obvious and unexpected pitfalls, especially when the product becomes large enough. One such thing, for example, is the number of database connections. If each instance of the client library keeps at least one connection open and each service embeds several such libraries, the number of connections may grow too large and even hit the database’s limit.
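One mitigation is to cap the pool inside the library itself, so every embedding service inherits the same conservative limits. A sketch with illustrative numbers:

```go
package userdata

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // driver choice is illustrative
)

// Open creates the library's connection pool with hard caps, so that
// (number of services) x (instances) x (connections) stays well below
// the database's global connection limit.
func Open(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(5)                   // hard cap per service instance
	db.SetMaxIdleConns(2)                   // keep the idle footprint small
	db.SetConnMaxLifetime(30 * time.Minute) // recycle stale connections
	return db, nil
}
```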
And here is a brief summary.
Pros:
- received data is always up-to-date
- no additional infrastructure or circuit breakers required
- client library may contain additional data-fetching logic
- usually easier to start with
Cons:
- possible schema-related issues
- need to deal with library versioning
- dependency management tools required
- security: every copy of the client library holds the database connection credentials
Local data projection
You may also come across less common cases of fetching data that doesn’t belong to your service. It may be some kind of search, or you may need to join this data with tables owned by your service. Although there may be many ways to manipulate the data, the owning service usually provides only one way to fetch it and only one format. Your service may not be able to work with this format, the fetching options, or even the structure of the data. All these problems can be solved by building a local data projection.
Data is saved in the Service2 database and then transferred to the Service1 database, where the copy is stored in the same or a transformed format. Service1 can then use this data, sending requests only to its local database, and if Service2 or its database becomes unavailable, Service1 is not affected in any way.
The first self-evident peril here is propagation delay: there is no guarantee that the data hasn’t already changed by the moment of the request. Another question is how to copy changes from one database to another. This can be done with database tools (e.g. replication), message brokers, or Change Data Capture instruments (you can read more about this in my previous article[6]). Either way, it will require additional implementation and maintenance work.
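Whatever the transport, the receiving side usually boils down to an idempotent upsert into the local projection table. A sketch with an invented event shape and Postgres upsert syntax:

```go
// Package projection keeps Service1's local copy of Service2's users.
package projection

import (
	"context"
	"database/sql"
	"encoding/json"
)

// UserChanged is a hypothetical change event published by Service2.
type UserChanged struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// Apply upserts one change event into the local projection table.
// Events must be applied in order per key for the copy to converge;
// the upsert makes redelivered events harmless.
func Apply(ctx context.Context, db *sql.DB, payload []byte) error {
	var e UserChanged
	if err := json.Unmarshal(payload, &e); err != nil {
		return err
	}
	_, err := db.ExecContext(ctx, `
		INSERT INTO users_projection (id, name)
		VALUES ($1, $2)
		ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name`,
		e.ID, e.Name)
	return err
}
```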
Here is the list of pros and cons of this approach.
Pros:
- more flexibility with data format and data structure
- no coupling with other services, the service depends only on data in its local database
- ability to work with data from the local database, e.g. making joins, map-reduce, etc.
- easier testing
Cons:
- propagation delay, need to handle eventual consistency
- additional work to implement and maintain reliable changes propagation process
- data duplication, additional disk space is used
Conclusion
So we have three options to choose from. Depending on the specifics of the project and the task at hand, any of them may be the best or the worst choice. As for the third option (local projection), I wouldn’t normally use it; it comes in handy in those rare situations when you need to change the data structure or make a join, which is why it’s worth mentioning.
Choosing between an embedded library and direct calls, I would consider the size of the project and of the team working on it. When the team is not big and the whole project lives in a monorepo, an embedded library will probably be the easier option to start with. But when the project contains thousands of microservices implemented with different technologies, the drawbacks will probably outweigh the advantages, and direct calls would be the better choice.
Links
1. https://en.wikipedia.org/wiki/Coupling_(computer_programming)
2. https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
3. https://semver.org/
4. https://microservices.io/patterns/testing/service-integration-contract-test.html
5. https://microservices.io/patterns/data/database-per-service.html
6. https://medium.com/@oleg0potapov/events-patterns-message-relay-with-change-data-capture-1da9b584758e