DEV Community

The Saga is Antipattern

Sergiy Yevtushenko on June 20, 2023

The Saga pattern is often positioned as a better way to handle distributed transactions. I see no point in discussing Saga's disadvantages because ...


Khaled Hosseini • Jul 2 '23

Great article.

Kirill Birger • Jul 3 '23

I do not feel as though you've made your point. You made a claim, then backed it up by making more claims, and then went on to talk about different design patterns.

Sagas are not only for microservices. You can have the need to perform distributed transactions even in a monolith. Also, not all transactions terminate within your organization's code.

Imagine a ticketing system where my application collects payment by sending HTTP to one third party, and then books a ticket by sending to another third party.

If the second request fails, then I should unwind and roll back. Sure, it's trivial in this case, but I think that is the argument for a saga pattern.
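Kirill's ticketing example is the textbook orchestrated saga: run each step in order, and if a later step fails, run the compensations of the completed steps in reverse order. Below is a minimal Python sketch. It uses hypothetical stub functions `charge_payment`, `refund_payment`, `book_ticket` and `cancel_ticket` in place of the two third-party HTTP calls. It also assumes every failure surfaces as an explicit rejection (`StepFailed`) rather than a timeout:

```python
from dataclasses import dataclass, field
from typing import Callable

class StepFailed(Exception):
    """Raised by a (hypothetical) remote call when the third party explicitly rejects it."""

@dataclass
class Saga:
    # Each step is (name, action, compensation).
    steps: list[tuple[str, Callable[[], None], Callable[[], None]]] = field(default_factory=list)

    def run(self) -> list[str]:
        log: list[str] = []
        completed: list[tuple[str, Callable[[], None]]] = []
        for name, action, compensate in self.steps:
            try:
                action()
            except StepFailed:
                log.append(f"{name}: failed")
                # Undo the already completed steps, most recent first.
                for done_name, undo in reversed(completed):
                    undo()
                    log.append(f"{done_name}: compensated")
                return log
            log.append(f"{name}: ok")
            completed.append((name, compensate))
        return log

# Hypothetical stand-ins for the two third-party HTTP calls.
def charge_payment() -> None: pass
def refund_payment() -> None: pass
def book_ticket() -> None: raise StepFailed("no seats left")
def cancel_ticket() -> None: pass

saga = Saga([("payment", charge_payment, refund_payment),
             ("booking", book_ticket, cancel_ticket)])
print(saga.run())  # ['payment: ok', 'booking: failed', 'payment: compensated']
```

The orchestrator only ever reacts to definite answers. A call that hangs or times out is neither an `ok` nor a `StepFailed`, so this sketch has no correct branch for it.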

Sergiy Yevtushenko • Jul 4 '23 • Edited

Thanks for your comment. Let's analyze it step by step:

You made a claim, then backed it up by making more claims, and then went on to talk about different design patterns.

I guess you're mixing up my article with typical microservices propaganda, which is based on making claims without bothering with evidence or reasoning. What you call "claims" are basic facts about distributed systems in general and microservices in particular. Since my goal is not click-baiting but educating, besides the "claims" I also provide information about approaches suitable for cases where microservices are not applicable.

Sagas are not only for microservices. You can have the need to perform distributed transactions even in a monolith. Also, not all transactions terminate within your organization's code.

The problem is not with Saga, which definitely has its own areas of application (for example, state management in UI apps). The problem is attempting to perform distributed transactions in systems that are inherently incapable of performing them, like microservices. And yes, you may have such a need, but having a need does not mean having the ability.

Imagine a ticketing system where my application collects payment by sending HTTP to one third party, and then books a ticket by sending to another third party. If the second request fails, then I should unwind and roll back. Sure, it's trivial in this case, but I think that is the argument for a saga pattern.

Thanks for this example. I don't know why you consider it trivial, but even in this case you can easily observe the consequences of a lack of consensus:

Imagine that your payment collection call times out, and the same happens with the compensation operation. Your system and the external payment processing system may now have different opinions about the status of the payment.
An even funnier case happens if the payment is processed successfully, but then the ticket booking times out, and exactly the same happens with the compensation operation for the payment. Now each system may have its own understanding of what happened and of the state of each part of the user request (payment and tickets).
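One way to make the timeout scenarios above concrete is to enumerate which remote states remain consistent with what the caller observed. Below is a minimal sketch. It assumes a three-valued call result (`OK`, `REJECTED`, `TIMEOUT`) and assumes the payment processor never applies a refund for a charge it did not make. These names are hypothetical and do not come from any real processor's API:

```python
from enum import Enum
from itertools import product

class Outcome(Enum):
    OK = "ok"
    REJECTED = "rejected"
    TIMEOUT = "timeout"  # the request may or may not have been applied remotely

def possibly_applied(outcome: Outcome) -> set[bool]:
    """Values of 'did the remote side apply it?' consistent with what the caller saw."""
    if outcome is Outcome.OK:
        return {True}
    if outcome is Outcome.REJECTED:
        return {False}
    return {True, False}  # a timeout tells the caller nothing

def possible_payment_states(charge: Outcome, refund: Outcome) -> set[str]:
    """Remote payment states consistent with the results of a charge and its compensating refund."""
    states = set()
    for charged, refunded in product(possibly_applied(charge), possibly_applied(refund)):
        if refunded and not charged:
            continue  # assumption: the processor cannot refund a charge it never made
        states.add("money taken" if charged and not refunded else "no money taken")
    return states

print(sorted(possible_payment_states(Outcome.TIMEOUT, Outcome.TIMEOUT)))
# ['money taken', 'no money taken']
print(sorted(possible_payment_states(Outcome.OK, Outcome.OK)))
# ['no money taken']
```

When both the charge and the refund time out, the caller is left with two possible worlds. Only further communication with the processor can tell them apart.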

Of course, in both cases mentioned above, you may try to recover and restore a consistent view of the data across all systems. This requires additional steps outside the Saga pattern, and recovery may still fail, causing even more mess. Each additional step exponentially increases the number of possible inconsistencies.

Kirill Birger • Jul 5 '23

I think the points you make are valid, but I also think they apply equally to any system, microservices or not. It seems like there is a single point of failure in most, if not all, implementations of Saga. In principle, any IO can fail.

In practice, you just need your Saga IO to be more highly available than your other IO.

Since the Saga controller should generally be responsible for orchestration, I don't think you should be able to encounter a scenario where multiple parts of the system have different opinions on the state. Wouldn't you simply end up with a stalled or incomplete transaction?

I'm not sure I know a better solution for these scenarios.

Sergiy Yevtushenko • Jul 5 '23

Distributed systems that have consensus don't suffer from such issues, by design. For example, clusters can perform transactions (including ACID ones) without any problems.

Yes, in practice it is possible to achieve what I call "parametric consensus". It means that everything works properly (even in the face of some number of failures) as long as every component of the system behaves according to the expectations of the other components. The main issue in this case is the lack of any confidence that all components are working properly, unless you have end-to-end testing that covers all (or most) possible failures. In practice, I haven't seen such setups. I guess the main reason is the high cost of maintaining such a test setup.

The better (I'd say "best") solution is to use proper design, which does not suffer from microservices issues. A large part of my article is dedicated to possible solutions. From my experience, most organizations can safely stick with a modulith. Implemented with a decent technology stack, it can handle far more load than those organizations are ever likely to need. But if there is a real need to scale up (especially dynamically), then EDA or a clustered approach will be better. EDA is better covered in available sources, but requires a somewhat specific internal design and mindset. The clustered design might provide more familiar internals, and the app can be designed to be completely self-contained (zero infrastructure dependencies). Unfortunately, there is very little information about it, although I implemented the first app designed this way more than a decade ago.

Kirill Birger • Jul 6 '23

You're not describing failures of microservices, or sagas. You're discussing failures of straw man implementations of code.

The better (I'd say "best") solution is to use proper design, which does not suffer from microservices issues.

Your design has nothing to do with saga. Let me say it again: you are claiming that saga is an antipattern, and then you wrote an incredibly verbose pitch for your other blog post. I'm not making any claims about click bait or not, but that is what is coming across here.

The proposals you make have merit. I am simply pointing out that none of this has any meaningful connection to sagas. Hence my comment about your point not being made. It's simply a non sequitur.

Moreover, what you are referring to as cluster based nanoservices is actually how kubernetes and istio work.

You can't just ignore the fact that you will never have full control over every system on the internet. Many transactions do not terminate in your code, but call to third parties. It does not MATTER if you have "nano" services, micro services, mega services, or anything else. You make arguments about microservices with poor boundaries. That's not a trait of microservices, that's a trait of bad software writing.

Yes, if none of the software behaves in reasonable ways, saga won't work. What's your point? If an asteroid hits your router, will you get double charged?

Microservices and saga have disadvantages, but not the ones you're claiming, except for the comments about dependency management, and running locally, which seem to also be an issue in your proposal

Sergiy Yevtushenko • Jul 6 '23

You're not describing failures of microservices, or sagas.

I do. Perhaps you just don't want to accept that.

You're discussing failures of straw man implementations of code.

Even worse: I'm discussing a fundamental flaw in microservices that Saga can't solve. Moreover, the "straw man implementation" is the Saga itself; any recovery logic on top of it is no longer Saga.

Your design has nothing to do with saga.

We were discussing an example provided by you.

You are claiming that saga is an antipattern, and then you wrote an incredibly verbose pitch for your other blog post.

Yes, it is, when applied to distributed transactions in microservices. And my article has existed for so long (the first version was published around 2015) that pitching it makes no sense. Actually, today I'd rewrite it from scratch, but I'm keeping it as is for historical reasons.

The proposals you make have merit. I am simply pointing out that none of this has any meaningful connection to sagas.

It has, as long as sagas are used for distributed transactions.

Moreover, what you are referring to as cluster based nanoservices is actually how kubernetes and istio work.

As well as any other cluster: Redis, Apache Ignite, Hazelcast, Infinispan, Cassandra, ZooKeeper, etc. But you missed the key point of the proposed architecture: the application is part of the cluster. So, by putting your microservices inside Kubernetes, you don't get a system with the same properties and abilities as clustered nanoservices. There is another thing you missed: I first implemented a (somewhat simplified) version of this architecture in 2012, when Kubernetes and Istio didn't even exist.

Many transactions do not terminate in your code, but call to third parties.

So what?

Yes, if none of the software behaves in reasonable ways, saga won't work. What's your point?

It's worth reading what I wrote once again. It's not about "reasonable ways", it's about the expectations of other parties. Some software may continue working reasonably and according to its specs and docs, but no longer satisfy some assumptions. Then the whole system built with these assumptions in mind will stop working or, worse, start silently corrupting or losing data.

Microservices and saga have disadvantages, but not the ones you're claiming,

Are you referring to my other articles? Because in this article, I'm not claiming but pointing out that microservices have no consensus. This is not a claim, but a fact.

except for the comments about dependency management, and running locally, which seem to also be an issue in your proposal

It largely depends on the particular implementation. For example, with Apache Ignite, I had an implementation that works starting from a single node, which is perfectly fine for local deployment and development purposes.

chris damour • Jul 9 '23

Many transactions do not terminate in your code, but call to third parties.

So what?

so the cluster approach will not work, given their service runs on their nodes and by definition can't run in your cluster.

fwiw your comments come off as arrogant. Start by assuming you are wrong and reread the comments; they'll make more sense.

Sergiy Yevtushenko • Jul 9 '23

so the cluster approach will not work, given their service runs on their nodes and by definition can't run in your cluster

It's not about cluster or EDA, given that we are discussing the microservices model. I was just trying, for the second time, to get from my opponent an explanation of why/how this use case makes Saga not an antipattern. Necessity can't make a bad thing good. Using this use case as an argument sounds like declaring the burning of fossil fuels harmless to the environment just because we live too far from the nearest supermarket and use a gasoline car to get there.

The real solution for this use case is for such external services to provide an API suitable for 2PC, but this requires changes at the far end, which we don't control. On the other hand, a wider understanding of the problem may create demand, and vendors may start adjusting their APIs.
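For contrast, the kind of API asked for here lets each service vote before anything becomes final. Below is a minimal in-memory sketch of two-phase commit. It assumes hypothetical `prepare`/`commit`/`abort` operations on each external service. It deliberately omits the durable logs, timeouts and coordinator recovery a real implementation needs, and real 2PC can block if the coordinator fails between the two phases:

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """A hypothetical external service that exposes 2PC-style operations."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed, or -> aborted

    def prepare(self) -> bool:
        # Phase 1: reserve resources and vote; "yes" is a promise to commit if asked.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all votes are yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

payment = Participant("payment")
booking = Participant("booking", can_commit=False)  # e.g. no seats left
print(two_phase_commit([payment, booking]), payment.state, booking.state)
# False aborted aborted
```

No participant makes its change final until every participant has voted yes, so the abort path needs no compensating actions.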

fwiw your comments come off as arrogant.

Sorry, that's my usual reaction to rude and ignorant comments from some wearers of architect hats.

Start by assuming you are wrong and reread the comments; they'll make more sense

Thanks for the suggestion. That's actually what I do every time.

chris damour • Jul 10 '23 • Edited

from my opponent

it's not a battle, man, chill out. your article isn't that strong; these comments' critiques are offering you a chance to make it stronger.

The real solution for this use case is for such external services to provide an API suitable for 2PC, but this requires changes at the far end, which we don't control.

we have different definitions of "real"; perhaps you mean the ideal solution. real to me means the real world, and in the real world i have to play by "their" rules/implementation. and saga works well enough.

a wider understanding of the problem may create demand, and vendors may start adjusting their APIs

i can agree with this. i'm a principal engineer at a fairly large (15k employee) biz and have felt it my duty to change our RFPs to ask for EDA and 2PC capabilities, hoping that it moves the needle ever so slightly. if customers don't ask, and just band-aid every time with existing "rest" service offerings from 3rd parties (99% of the time it's just JSON-RPC and not RESTful at all), then we'll never get out of this downward spiral.

overall, based on your comments, i think the problem with this article is the poor title/intro. "X is antipattern" means don't do it. it seems more like what you're trying to say is "stop allowing your circumstances to force you into X; think of it as an antipattern and demand better solutions".

David Alexis • Jul 16 '23

I'm not sure you understand what the saga pattern is and what it solves. It has nothing to do with consensus among nodes, hence the invalidity of your arguments. It has to do with coordinating the states of a long-running business process, where the transition between states can take milliseconds or days. "Transactions" in the sense you describe in your argument are irrelevant in this context.
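This view of a saga, as coordination of a long-running business process, is often implemented as an explicit state machine. Its current state is persisted between events, which may arrive milliseconds or days apart. Below is a minimal sketch using hypothetical states and event names for the ticket-booking process discussed earlier in the thread:

```python
from enum import Enum

class BookingState(Enum):
    STARTED = "started"
    PAYMENT_TAKEN = "payment_taken"
    REFUNDING = "refunding"
    COMPLETED = "completed"
    CANCELLED = "cancelled"

# Allowed (state, event) -> next state transitions of a hypothetical booking process.
TRANSITIONS = {
    (BookingState.STARTED, "payment_ok"): BookingState.PAYMENT_TAKEN,
    (BookingState.STARTED, "payment_rejected"): BookingState.CANCELLED,
    (BookingState.PAYMENT_TAKEN, "booking_ok"): BookingState.COMPLETED,
    (BookingState.PAYMENT_TAKEN, "booking_rejected"): BookingState.REFUNDING,
    (BookingState.REFUNDING, "refund_ok"): BookingState.CANCELLED,
}

def apply_event(state: BookingState, event: str) -> BookingState:
    """Return the next state, rejecting events that are not valid in the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None

# In a real system the current state would be stored durably after every event.
state = BookingState.STARTED
for event in ["payment_ok", "booking_rejected", "refund_ok"]:
    state = apply_event(state, event)
print(state)  # BookingState.CANCELLED
```

The state machine only records what the coordinator has been told. Whether the remote systems agree with that record is a separate question.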

Sergiy Yevtushenko • Jul 16 '23

That's correct: strictly speaking, the Saga pattern has nothing to do with consensus. But it coordinates the involved nodes, and coordination in a distributed system requires consensus. You may also find it interesting to take a look at the other thread, where you can find an example.

Khosro Pakmanesh • Jul 4 '23

I read the whole article, but I didn't get your point. So what is the alternative to using Saga? It would have been much nicer if you had made your point progressively, using some examples. At the very least, you should have mentioned some references for further reading.

Sergiy Yevtushenko • Jul 5 '23

The point is explicitly stated at the beginning:

If you need distributed transactions across a few microservices, most likely you incorrectly defined and separated domains.

So, there can't be any alternatives. Instead, the application should be designed using other approaches, which have no problems handling transactions. Possible approaches are listed and discussed in the article.