It amazes me how often I run into distributed transactions in the microservices projects I work on.
In my opinion, distributed transactions are one of the biggest concerns we should take into consideration when designing a microservices architecture, yet many of the projects I have worked on have this big flaw.
What are Distributed Transactions?
As I use the term here, distributed transactions occur when multiple transactions are started from a single point but run independently of each other, with no effect on each other's state. This means that if one of the transactions rolls back, it cannot make the others roll back as well.
For example, say we have a microservice (call it the 'A' service) that calls the 'B' service and the 'C' service (I'm talking about synchronous REST calls, of course). These calls start transactions in the corresponding services, but there is no coordination between these transactions. So imagine that the first transaction, in the 'B' service, succeeds, and then the second one, in the 'C' service, fails and rolls back. The transaction in the 'B' service has still succeeded, and the failure of the other transaction has no effect on its state.
This can leave the state and the data inconsistent across the affected services.
Let me illustrate this with a more detailed example. Imagine we have a webshop. The 'A' service is called from the UI, where the user places an order and enters his or her payment information. The 'A' service then calls the 'B' service to take the payment, and this transaction commits. Next, the 'A' service calls the 'C' service to save the order, but at this point the transaction in the 'C' service fails and rolls back, meaning our order is not saved in the database. The payment has already been made, however, and we can't roll back the transaction in the 'B' service.
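The webshop scenario can be reproduced in a few lines. This is only a minimal in-memory sketch: the class names `PaymentService` and `OrderService`, the `place_order` function, and the `fail` flag are hypothetical stand-ins for the 'B', 'C', and 'A' services, and plain method calls stand in for the synchronous REST calls.

```python
class PaymentService:
    """Stands in for the 'B' service; it owns its own data store."""

    def __init__(self):
        self.payments = []

    def pay(self, order_id, amount):
        # The local transaction is committed as soon as this call returns.
        self.payments.append((order_id, amount))


class OrderService:
    """Stands in for the 'C' service; `fail` simulates a rolled-back transaction."""

    def __init__(self, fail=False):
        self.orders = []
        self.fail = fail

    def save(self, order_id):
        if self.fail:
            raise RuntimeError("order transaction rolled back")
        self.orders.append(order_id)


def place_order(payment_service, order_service, order_id, amount):
    """The 'A' service: two independent calls with no coordination between them."""
    payment_service.pay(order_id, amount)
    order_service.save(order_id)


payments = PaymentService()
orders = OrderService(fail=True)
try:
    place_order(payments, orders, "order-1", 100)
except RuntimeError:
    pass  # 'A' sees the failure, but it has no way to undo the payment in 'B'

print(payments.payments)  # [('order-1', 100)]
print(orders.orders)      # []
```

The payment exists while the order does not, which is exactly the inconsistent state described above.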
Why would anybody have Distributed Transactions in their architecture?
- The reason I see most often is that whoever designs the microservices architecture does not have the necessary experience, and is more likely coming from the monolithic world. This is not an absolute rule. I am speaking from my own experience, and also because I did the same thing when I had the opportunity to design my first microservices project. So this can happen to anyone, and there is no shame in that. We learn from our experiences, and I also had to learn this the hard way.
- The other reason can be that, at first, implementing synchronous REST calls seems like a lighter investment than a more complex architectural solution such as an event-driven architecture. So we start to design our microservices this way, and as long as the project is small this can work, and we can avoid the distributed transaction cases. But when the project starts to grow in size and becomes more complex, it gets harder to maintain and harder to avoid distributed transactions. At that point we also have to take into account that we need to join data across microservices. Just remember my earlier example: we want to join the user's order and payment information across microservices.
- And probably the main technical reason behind it all is that we want to update data across multiple services. We really can't avoid the need to join data across microservices, but when we want to update that data in the corresponding services, it gets complicated. With an event-driven solution the implementation would be complex and could carry overhead. But with synchronous REST calls, our system is prone to distributed transactions and their pitfalls.
How can Distributed Transactions spiral down in downstream service calls?
Another typical place where distributed transactions appear is downstream service calls.
Imagine the same situation, with the 'A' service calling the 'B' and 'C' services, but add one or more further calls from the 'B' and/or 'C' services to other services.
This has the same issue I mentioned before, but it appears further down the path of service calls, and it can cause unpredictable inconsistencies when one of the transactions fails.
As you can see, it's dangerous not just because one transaction can fail while you can't roll back the other(s), but because this can happen anywhere in your downstream service calls.
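To make the downstream case concrete, here is a small sketch under stated assumptions: the `Service` class is hypothetical, the extra 'D' and 'E' services are invented for illustration, and each service is assumed to commit its own work only after all of its downstream calls have returned successfully.

```python
class Service:
    """A service that calls its downstream services synchronously."""

    def __init__(self, name, downstream=(), fail=False):
        self.name = name
        self.downstream = list(downstream)
        self.fail = fail
        self.committed = False

    def handle(self):
        if self.fail:
            raise RuntimeError(f"{self.name} rolled back")
        for service in self.downstream:
            # An exception raised here aborts this service's own work too.
            service.handle()
        self.committed = True  # the local transaction commits


# 'A' calls 'B' and 'C'; 'B' in turn calls 'D' and 'E' (hypothetical services).
d = Service("D")
e = Service("E", fail=True)
b = Service("B", downstream=[d, e])
c = Service("C")
a = Service("A", downstream=[b, c])

try:
    a.handle()
except RuntimeError:
    pass  # the failure in 'E' propagates all the way back up to 'A'

state = {s.name: s.committed for s in (a, b, c, d, e)}
print(state)
# {'A': False, 'B': False, 'C': False, 'D': True, 'E': False}
```

Only 'D' has committed, two levels below the entry point, and nothing in the call chain is able to undo it.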
Other drawbacks of downstream service calls
Apart from distributed transactions, of course, downstream service calls have other drawbacks.
Services are tightly coupled
We connect services directly through REST calls. This means that if one of our services is down, our whole transaction fails, and of course this can lead to distributed transactions.
Hard to track errors
Because our transactions span multiple services, it's hard to find where an error occurred and which services it affected.
Tracking errors is a possible overhead
To identify errors easily, we have to use some kind of tracking solution. We have to know where our transaction started and where it failed. Of course there are multiple solutions on the market, but implementing and managing them can be an overhead.
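One common way to answer "where did it start and where did it fail" is to create an ID at the entry point and pass it along with every call. The sketch below is only an illustration of that idea: the `record` helper, the `log` list standing in for a central log store, and the three service functions are all hypothetical, and in a real system the ID would typically travel in a request header.

```python
import uuid

log = []  # stands in for a central log store


def record(correlation_id, service, event):
    log.append({"correlation_id": correlation_id, "service": service, "event": event})


def service_b(correlation_id):
    record(correlation_id, "B", "started")
    record(correlation_id, "B", "committed")


def service_c(correlation_id):
    record(correlation_id, "C", "started")
    record(correlation_id, "C", "failed")
    raise RuntimeError("C failed")


def service_a():
    correlation_id = str(uuid.uuid4())  # created once, where the request enters
    record(correlation_id, "A", "started")
    try:
        service_b(correlation_id)
        service_c(correlation_id)
    except RuntimeError:
        record(correlation_id, "A", "failed")
    return correlation_id


request_id = service_a()
# Filtering on the ID reconstructs the path of one request across services.
trace = [(e["service"], e["event"]) for e in log if e["correlation_id"] == request_id]
print(trace)
```

The trace shows that 'B' committed before 'C' failed, which is the information needed to repair the data by hand. Running and maintaining this kind of tracking across every service is the overhead mentioned above.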
What can we do to avoid Distributed Transactions?
Don't do Distributed Transactions!
It takes strict design choices when you design your system architecture, but it's entirely possible to design a microservices architecture with synchronous REST calls that doesn't have distributed transactions. For this, domain boundaries and data joins across microservices have to be taken into consideration, so that you can shape your services in a way that you don't have to update data across microservices and don't fall into the dangers of distributed transactions.
Of course, for this we really have to know from the start of the project which services we need to design, and because of this we can't really be agile with our architecture. Any new feature that affects the architecture can mess up the whole design, and we can run into distributed transactions again.
For these reasons, consciously avoiding distributed transactions can work only if we have no significant changes that affect our architecture, our project is not large, and our domain and data joins across microservices don't grow too complex.
Use different architectural solutions
Fortunately, we have different architectural solutions for implementing a microservices architecture. For example, an event-driven architecture could solve the distributed transactions problem. I don't want to list all the patterns here, but here is a link if you want to dig deeper into the topic.
With an event-driven architecture we are not using synchronous REST calls, so the root cause of distributed transactions is nonexistent here.
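As a rough illustration of the event-driven idea, here is a minimal in-memory sketch. It is not a full pattern catalogue, just one possible option: services react to events and correct their own state when a later step fails, an approach often described as a choreography-based saga. The `EventBus` class, the event names, and the rule that payments above 1000 fail are all assumptions made up for this example, and a real system would use a message broker instead of an in-process queue.

```python
from collections import defaultdict, deque


class EventBus:
    """A tiny in-process stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))  # delivered later, not inline

    def run(self):
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
orders = {}    # the order service's own store
payments = {}  # the payment service's own store


def place_order(order_id, amount):
    orders[order_id] = "PENDING"
    bus.publish("OrderPlaced", {"order_id": order_id, "amount": amount})


def on_order_placed(event):
    if event["amount"] > 1000:  # hypothetical rule that makes a payment fail
        bus.publish("PaymentFailed", {"order_id": event["order_id"]})
    else:
        payments[event["order_id"]] = event["amount"]
        bus.publish("PaymentCompleted", {"order_id": event["order_id"]})


def on_payment_completed(event):
    orders[event["order_id"]] = "CONFIRMED"


def on_payment_failed(event):
    orders[event["order_id"]] = "CANCELLED"  # the order service fixes its own state


bus.subscribe("OrderPlaced", on_order_placed)
bus.subscribe("PaymentCompleted", on_payment_completed)
bus.subscribe("PaymentFailed", on_payment_failed)

place_order("order-1", 100)
place_order("order-2", 5000)
bus.run()

print(orders)    # {'order-1': 'CONFIRMED', 'order-2': 'CANCELLED'}
print(payments)  # {'order-1': 100}
```

No service calls another one directly. Each service only updates its own data in response to events, so a failed payment ends with a cancelled order rather than a half-finished one.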
Use mixed architectural solutions
We can avoid distributed transactions without fully eliminating synchronous REST calls. It is possible to replace the problematic synchronous calls with a solution from another architectural pattern, for example an event-driven one, while keeping the calls that do not cause this kind of problem. Of course, this hybrid approach has downsides as well: we have to keep our architecture consistent while using different architectural solutions.