🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Integration patterns for multi-tenant systems (SAS312)
In this video, AWS Solutions Architects Alex and Dirk explore integration patterns for multi-tenant SaaS systems using Wild Rydes, a fictional ride-sharing service, as an example. They demonstrate how to handle synchronous vs. asynchronous request-response patterns, implement JWT tokens for tenant identity and quota management, and address the "noisy neighbor" problem through queue strategies including cell sharding, shuffle sharding, and Amazon SQS Fair Queues. The session covers external system integration with payment service providers using proxy patterns and dead letter queues, the scatter-gather pattern for RFQ scenarios, and DynamoDB single-table design considerations for multi-tenant data storage while managing hot partitions and tenant isolation trade-offs.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Integration Architecture as a Journey for Multi-Tenant SaaS Systems
Good morning everybody. Thanks for joining our session about integration patterns for multi-tenant systems. I'm Alex. I'm a Senior Solutions Architect at AWS from Germany, working with European software companies together with my colleague here. Hello everyone, I'm Dirk. I'm also from Germany. I work with software companies too. I work with them on their multi-dimensional transformation. On the technical side, my focus is on everything around integration architecture, hence also this talk today. On the business side, I'm very much interested in innovation culture and how tech communities can improve a fragmented tech landscape.
So today it's yet another talk about integration patterns, but this time with a focus on multi-tenant scenarios and SaaS solutions. However, integration architecture as such we believe is relevant for everybody, and it really doesn't matter where on your integration journey you currently are. If you operate a monolithic application or if there is one component that gave you a headache and you started carving it out already, or if you are running and building a divide and conquer architectural style at home, maybe with microservices, you will always have to integrate with other systems. It sometimes depends also if you have more access to the systems that you need to integrate with or less access. The components here in the orange circles can represent your own systems where you certainly have more access and can choose different integration scenarios, while the components in the blue circle might represent third-party systems that you need to integrate with.
Now with every journey, and we believe integration architecture is a journey, you want to be well prepared. If you, for instance, want to go on this bumpy road, you want to be well prepared with a high clearance 4x4 vehicle. In the same sense, you also want to be prepared for your integration architecture journey. And this brings us to two insights. The first is I only do these speaking activities because I want to brag with my nice vacation photos. And the second insight is that not only in modern cloud applications, but particularly also in modern cloud applications, integration can't be an afterthought because it is an integral part of your application architecture.
Now there's many ways to actually design an integration scenario, and the first thing you need to make up your mind is what is the concept or the approach that you want to use. This already is quite ambiguous, and you really need to understand the implications of each of those approaches. But then once you have decided for a particular integration scenario which approach to follow, there's a ton of follow-up questions and design decisions that you need to take, and each of those can easily fill a one-hour tech talk on its own.
So the decisive part here is that you should understand the implications of each of those design decisions. And when you look at a SaaS application or a multi-tenant scenario, you have the additional challenge that you also need to care about tenant isolation. So there's a whole lot of ambiguity involved here, and this is why it is really, really important for software architects to be aware of the patterns and options at hand and also the implications that it has. So I also like to call this statement the beware of the faith healer principle because there is really no silver bullet in software architecture. Every decision you take basically sucks in a certain way, and as a software architect, you should know which one of the options that are on the table sucks less than all the other options.
Wild Rydes for Business: A Multi-Tenant Architecture Example
So with that we actually wanted to make it a little bit more tangible and fun today and illustrate everything we're going to talk about with an example customer, an example technology startup actually. So Alex, you like unicorns I believe. Do you want to introduce our example customer? Absolutely, there is a company I would like to introduce. We use this kind of fictional company across several workshops and formats at AWS. It's called Wild Rydes, and I mean it's a ride-sharing service, so nothing super special except for one thing. Instead of cars, they have unicorns.
So if you book rides with them, it will be wild, right? This has been hugely popular because under the hood, though it's quite a crazy business model, there is very solid technology. With all the success they have, they've now decided to move further in their journey and go into business. So now they have Wild Rydes for Business, and this is basically a multi-tenant extension of the current use case.
Assume that you would have per-tenant apps with their own branding, but you also have special features for each tenant. You may have a tiered pricing model and typical aspects that you would see in a SaaS application. Now the existing infrastructure and architecture needs to be extended to support this multi-tenant business. This is basically our sample scenario that we will look into when we discuss what it means to have a distributed architecture but also a multi-tenant architecture, and those complexity dimensions basically being combined.
So let's look at different aspects of that, the first being the interconnection of services within such an architecture. When we look at Wild Rydes as an offering, it is coherent from the outside perspective. It might look a bit monolithic, but it's not. First, it's important to understand that Wild Rydes does not live in isolation. It lives in a context with tenants and their systems, which are the applications like a mobile app that we are deploying and providing to them.
But the tenants as businesses also have their own, let's say, finance systems or a federated identity management service that needs to be integrated with the Wild Rydes core system. But Wild Rydes doesn't want to do everything themselves. As a smart, scalable startup, they have decided there are things they don't want to do themselves. Obviously, nobody starts building a payment service provider, but even email is something that you don't need to build yourself, right? So this is a commodity. You put it somewhere else and focus on your core business.
So this is basically the external integrations that this company has. But if you look into the Wild Rydes architecture, you will see that it's also distributed. So it would have, let's say, something like microservices, whatever you would understand that to be. These services talk to each other, but we need to make it in a way that is scalable and resilient. So how is the best way to build this?
Synchronous vs. Asynchronous Request-Response Patterns in API Design
One prominent example of this interconnectivity is that SaaS applications obviously expose APIs. This is probably nothing new for you. It is the way SaaS is consumed. So you have tenants that consume APIs from your offering. Let's illustrate this with a use case. The use case is that an end user that belongs to a tenant books a ride with the app.
We have our Wild Rydes application, and it exposes the ride bookings API. On the client side, we have one of our tenant's users, and they apparently want to submit the ride details. Now the natural approach that most of us would probably try first is to use the synchronous request-response pattern here. Because most of us are used to this, this is the way how we were using software, programming languages all of our lives, that we send synchronous requests to functions or procedures. And this is also the way how most of us consume websites.
So this would look like this. The client sends out a POST request to Wild Rydes. It takes a while to process. And this while might be also too much for a good user experience, because only after all the processing is done under the hood, the client would receive the response with the details that the booking has been successfully, hopefully successfully, processed. So why might this not be the best user experience? Because we have a lot of stuff to do under the hood. And certainly, as Alex showed before, there's an external payment service provider involved. We also want to capture the money upfront for the ride. That might take a while, so it's not the best user experience.
And why is that so? Because we have that tight coupling at runtime between the client on one hand and the back end on the other hand. And this is apparently an inherent characteristic of the synchronous request-response model. By the way, this is also called a conversation pattern because it spawns a conversation between two parties. But what would be a better way to improve the user experience here?
Well, we can progress and look at asynchronous request-response. And in this case, the downstream request looks the same.
Luckily, HTTP supports this asynchronous behavior by responding with 202 Accepted now and a representation of the task that we have created under the hood on the backend side. The actual processing with Wild Rydes will still take the same amount of time, but the user doesn't have to watch that spinning hourglass all the time and can continue with whatever they want to do next. Now, in the best case, you are also a nice citizen and provide your customers with a representation of that status and a link for the customer to retrieve a status update.
So while the customer is waiting for the processing, they can have a look and see how far we already are. In this case, they can use the link that has been provided with the response and send a GET request downstream to retrieve a status update. If they do it very quickly, there will still be the task and status representation, hopefully with a link to retrieve another update. Maybe two minutes later or so, when also the very slow external payment service provider has done its job, we will retrieve a representation of the actual successful ride booking, and that's the convenience functionality that I would recommend to offer your customers.
However, you obviously don't want to force them to use this and also provide a push notification once you are done with that. So that's the asynchronous request-response pattern that improves the user experience by reducing the runtime coupling between components. Now, looking at APIs, Alex, what is there that we need to look at in terms of multi-tenant architecture?
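The asynchronous request-response flow described above can be sketched in a few lines. This is a minimal, framework-agnostic sketch with an in-memory task store; the endpoint paths and field names are illustrative assumptions, not the actual Wild Rydes API:

```python
import uuid

# In-memory task store standing in for a database. All names here are
# illustrative and not taken from the session's implementation.
TASKS = {}

def create_booking(ride_details):
    """Handle POST /rides: accept the request, defer the real work."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "PROCESSING", "result": None}
    # Respond with 202 Accepted plus a representation of the task,
    # including a link the client can poll for status updates.
    return 202, {"taskId": task_id, "status": "PROCESSING",
                 "statusUrl": f"/rides/tasks/{task_id}"}

def get_task_status(task_id):
    """Handle GET /rides/tasks/{id}: report progress or the final result."""
    task = TASKS[task_id]
    if task["status"] == "PROCESSING":
        return 200, {"taskId": task_id, "status": "PROCESSING",
                     "statusUrl": f"/rides/tasks/{task_id}"}
    return 200, {"taskId": task_id, "status": "COMPLETED",
                 "booking": task["result"]}

def complete_booking(task_id, booking):
    """Called by the background worker once payment etc. has finished."""
    TASKS[task_id] = {"status": "COMPLETED", "result": booking}
```

The key point is that `create_booking` returns immediately; the slow payment interaction happens out of band, and the client polls the status link or waits for a push notification.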
SaaS Identity and Back Pressure: Using JWT Tokens for Multi-Tenant Communication
That's a good question because, again, many people think APIs would be something like REST, which is more synchronous, but in this kind of distributed architecture, it needs to be scalable across several patterns. We need to introduce things that deal with the complexity, as I mentioned, of SaaS and of distributed architecture. So we are going to introduce two more patterns that we can use, and when I say patterns, it's actually more concepts that you would have to translate into pattern implementations.
The first one deals with multi-tenancy, quite obviously, and that's SaaS identity. The other one is back pressure, which means that if a part of your infrastructure is at capacity, you would not accept new input like messages, but instead fail gracefully and let the producer of the message know that you are out of capacity. So how does it look in practice? Let's say again we have this multi-tenant system and we need to transport the information about the identity of that tenant across the interactions with tenant systems like mobile apps or finance systems.
One way of many is the usage of JWT tokens. I think most of you have heard of JWT tokens, and maybe use them in your own architectures, but let's recap quickly what that is. The idea of a JWT token is, well, in this case you see it, it's a piece of JSON that you would use to transport information across the bits of your architecture. The important thing here is that when you submit it, it comes with a signature of that message, and that signature guarantees the integrity of the message.
So you can, for instance, send a JWT to a client. They can read it, but they can also ideally verify the signature and know that the message is true. So the veracity of the message is guaranteed here, and this is a very common pattern even outside of SaaS, but you can use it in a SaaS context in combination with distributed architecture to transport information about the tenant. So if you have a mobile app, then you can inject information like the SaaS tenant ID or maybe their name, but even things like the tier of your tenant, and this will allow consuming systems to work with that information without going to the source of truth to retrieve it again, because again, you can validate the veracity of that.
You can even do more than just convey this simple meta information. You can have technical information here as well. So when we talk about back pressure, meaning that we decouple the producing system in terms of capacity and quota from our consuming system, we could convey that information to the producing system already so that it doesn't have to run into a failure but can itself manage the situation before running into it. So we could say that, for example, a customer in the basic tier would have a certain quota. They have an amount of requests they can do per minute, per month, or whatever, and you can then say a premium customer or a higher tier customer would have a higher quota.
Obviously, you can also say that here's an enterprise customer who does not have any limitations at all, and they would be able to submit as many messages as they want.
So this is the way to extend the functionality of JWT tokens in order to convey the information to your producing systems without having them run into failures, and this will be very useful as we go further through this talk, so keep that in mind as a very useful mechanism to transport this.
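To make the signature-and-claims idea concrete, here is a stdlib-only sketch of issuing and verifying a signed token carrying tenant information. A real system would use a JWT library (such as PyJWT) and typically asymmetric keys rather than a shared secret; the claim names (`tenantId`, `tenantTier`, `quotaPerMinute`) are illustrative assumptions, not taken from the session's slides:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # in practice an asymmetric key pair (e.g. RS256)

def _b64(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(claims: dict) -> str:
    """Build a JWT-style token: header.payload.signature (HS256-like)."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    sig = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(),
                        hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    """Verify the signature and return the claims, or raise ValueError."""
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(),
                             hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch: token was tampered with")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Tenant identity plus back-pressure metadata travel together in one token.
token = issue_token({"tenantId": "t-42", "tenantTier": "premium",
                     "quotaPerMinute": 1000})
```

Any downstream service holding the verification key can read the tier and quota without a round trip to the identity source of truth, which is exactly the decoupling described above.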
Message Buses and Queues: Decoupling Services and the Noisy Neighbor Problem
So let's have a closer look at what happens inside the Wild Rydes application. Because right now we have been looking at the external systems, but internally, as Dirk already said, there are things happening. So we expect that the app sends us a request, we return the HTTP 202 as established, and we said that we will not process everything immediately. So what happens instead, if it's not a full synchronous cycle?
The way Wild Rydes chose to implement this is through a message bus as the first point of decoupling the booking service that takes in the booking from the downstream systems. And the message bus allows us to spread out the booking information into other services that need to work with that. So for example, here we have a planning service that does resource allocation. We have a payment service that will be working with an external payment service provider. And we have some loyalty services that would allow us to give customers some goodies if they use more and more of our service.
So the downstream service is external again, and we would assume that it expects a synchronous HTTP request. We cannot do much more here. We will come back to that later, but within our own architecture we can use decoupling mechanisms like queues in order to decouple services from their downstream consumers. And message queues are nothing new, but they are still a very powerful mechanism, and especially in multi-tenant architectures we can apply specific patterns to use those queues in a way that takes load off producers and consumers.
We can even use queues as we evolve our architecture. So let's say this loyalty service turned out to be a bit sluggish and slow, and really we don't need to process loyalty information in real time. So this is not very relevant to the booking. So we say after a time we introduce a message queue and refactor the customer loyalty service to consume those messages afterwards. So this is an evolution of your architecture that would be more and more decoupled along these queuing patterns.
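The loyalty refactoring described above can be sketched as follows. Here a plain `deque` stands in for a managed queue such as Amazon SQS, and all the service and field names are illustrative:

```python
from collections import deque

# Stand-in for a managed queue such as Amazon SQS.
loyalty_queue = deque()

def allocate_unicorn(booking):
    """Time-critical resource allocation (illustrative placeholder)."""
    booking["unicorn"] = "assigned"

def handle_booking(booking):
    """Booking path: do the time-critical work, defer loyalty processing."""
    allocate_unicorn(booking)       # must happen before we confirm
    loyalty_queue.append(booking)   # loyalty points are not real-time
    return {"status": "CONFIRMED", "rideId": booking["rideId"]}

def loyalty_worker():
    """Runs on its own schedule; drains the queue when capacity allows."""
    processed = []
    while loyalty_queue:
        processed.append(loyalty_queue.popleft()["rideId"])
    return processed
```

The booking path no longer waits on the sluggish loyalty service; the worker catches up asynchronously, which is the evolutionary decoupling the talk describes.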
Now the queues being our main way to create resiliency and scale is something worth looking deeper into. And so if we look into what the queues do here, we see that basically we have a producing system with multiple tenants. We have a consumer, and we will not look closer into that, so it's just a blue box at this point, but we have now a queue that contains the messages of several producers. And that is a very good mechanism, so in a stable, happy scenario there will be no problem. Messages will be consumed and processed as we go.
But it may happen that one of those tenants starts sending a large batch of messages. That can happen under two circumstances. One would be that specifically that tenant, or the users from that tenant, create a lot of ride bookings, like let's say some tech company has a large event and invites everybody to use their ride services. Or it could be that the consumer has a tenant-specific problem like a misconfiguration or some contractual thing not working right, and that could lead to a scenario where that particular tenant kind of clogs the queue for you.
So that would lead to that tenant using up the resources and clogging the queue, causing a starvation of the consumption of messages from other tenants, and this is what in a multi-tenant system we call a noisy neighbor. So in this scenario, tenant one would be a noisy neighbor that impedes the business functionality for other tenants. And this is the question, can we do something about it, Dirk? Do you have an idea? I have several ideas. Nice.
Load Shedding, Back Pressure, and Queue Strategies: From Multi-Tenant to Single-Tenant Queues
Let's start with a general consideration. A general consideration that you can always apply in this situation and in other situations is load shedding and back pressure. We've already heard about back pressure on API level, but you can also use it further down the line, for instance, directly on your producers. But let's go one step back and have another look at the characteristics of queues.
So the nice thing, why you would want to use queues everywhere in the end, I like them as you can imagine, is that they solve a lot of problems for you. First of all, they can buffer messages.
In that respect, they further reduce the temporal coupling between upstream and downstream systems. When you use APIs, you cannot send a request downstream when the downstream system is not actively listening because it might be struggling. However, with queues, the queue in between will buffer the messages, and with that comes a peak load flattening functionality that is really super handy, so you can protect your consumers downstream. Also, because every message that goes from a queue to a consumer goes to exactly one consumer, you can easily scale out on the consumer side if the number of messages in your queue becomes very high.
However, you probably only want to scale out so much because you have a cost cap on the number of compute resources that you want to add on the consumer side. So if you constantly receive more messages than you can actually consume, your queue will start to fill up. This is actually not a problem or a disadvantage of queues. You would have that problem anyway, but queues make this actually visible, so you can then do something about it. So that's a very positive characteristic here.
Now, I mentioned load shedding and back pressure. Load shedding with queues sounds quite brutal. You can just throw away messages that have reached a certain age. Let's imagine in our ride booking scenario you might have an SLA that says, okay, dear customer, we guarantee your booking is processed within five minutes. If a message related to that has been in the queue for more than ten minutes already, you can probably throw it away because the customer is not interested anymore. However, you probably don't want to apply this blindly.
An important thing to mention here is that this is rather a business decision and not a technical decision, so the person who owns this use case, this feature, needs to decide when we can consider a message stale and should probably throw it away. Now, back pressure, on the other hand, is a signal that you can send to your producers to slow down. In the ideal case, you send it all the way up to the end user client and keep the end user client from sending more requests, but the business requirements might not allow it.
Alex has shared already how you can apply back pressure on the API level. If you want to apply it also on the producer level, you need to have access to the producers and you have to add some code that reacts to signals on it. Now, on the other hand, what are you doing with this information as a producer while still more requests are coming in and you get a signal from downstream that you should slow down? That alone is a daylong, weeklong business discussion probably, and it can fill several talks. Parts of it are being covered in the other patterns that we're now looking at, and you can combine load shedding and back pressure with all the other patterns that we are now looking at.
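A consumer-side load-shedding check along these lines might look like the sketch below. The ten-minute cutoff mirrors the SLA example above, and `sent_at` stands in for a message timestamp such as the `SentTimestamp` attribute that SQS can return with a received message:

```python
import time

# SLA-driven cutoff: a booking message older than ten minutes is stale.
# As noted in the talk, this threshold is a business decision, not a
# technical one.
MAX_AGE_SECONDS = 600

def shed_or_process(message, now=None):
    """Consumer-side load shedding: discard messages past their useful age.

    `message["sent_at"]` is a plain epoch timestamp for illustration; with
    SQS you would read it from the message's SentTimestamp attribute.
    """
    now = time.time() if now is None else now
    age = now - message["sent_at"]
    if age > MAX_AGE_SECONDS:
        # Meter and log shed messages so the business sees what is dropped.
        return "SHED"
    return "PROCESSED"
```

Combined with back pressure further upstream, this keeps a backed-up queue from forcing the consumer to churn through work nobody is waiting for anymore.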
Alex, I have decided I will make it very simple for myself and just create a single tenant queue for each of my tenants. With that, I only have to add some engineering when a new tenant is onboarding. I need to set up the dedicated infrastructure for it, not only the queue but probably also producers and consumers. And then while I'm running my application, I only need to multiply everything I'm doing with the number of tenants. Does that sound great?
I'm not sure. When I think about it a little bit more, maybe it's not the best idea, also from a cost perspective. You want to have insight into the queue. You want to make sure that messages are being consumed from the queue. If you have tenants that are not super busy but rather idle, you still have to look into the queue to pull off any messages. So you constantly have to operate the compute, and that's quite wasteful. And this waste is multiplied by the number of tenants that you have, with the scale that you have, so maybe not the best idea.
Advanced Queue Patterns: Cell Sharding, Shuffle Sharding, and Tier-Based Approaches
Maybe we shouldn't go into extremes. We have a broad spectrum, and we looked so far at the one end where we had one multi-tenant queue for all of the tenants, and at the other end of the spectrum where we had multiple single-tenant queues, one for each of our tenants. Alex, is there something in between? Maybe? Let's see. It's a good question, because again, as you said at the beginning, in the end it's a trade-off between several patterns that we can apply, and we can show some that we've seen in reality, which are a bit more elaborate but would maybe fit in the middle of this spectrum. So let's look at some of them.
One would be a pattern or an approach called cell sharding. The idea here is that rather than having a multi-tenant queue for everybody or a single-tenant queue for each individual tenant, you would share queues between several tenants. The idea here is basically that you reduce the overhead of the infrastructure deployment and at the same time still have the reduction of the blast radius. So for instance, if two tenants would share a queue and Tenant 1 would become a noisy neighbor, then the only affected other tenant would be Tenant 2. Still bad enough for them, but if you think about an operational scenario where you need to investigate and address this and reach out to customers, then you would have much less headache if you just had to talk to two tenants than everybody. So this is a valid approach that we've seen in practice.
Again, it's a trade-off between the simplicity of a shared queue versus the complexity overhead here and the reduction of blast radius, but we can take this even one step further. Another pattern that we've seen in practice is called shuffle sharding. It looks a bit similar to the one we've seen before. We again have tenants distributing into queues, but here it's a bit the other way around. So rather than sharing one queue between two or three or five tenants or however you see fitting, each tenant would write into multiple queues. In this case, you would see that one tenant writes into two queues.
What's important to understand here is that you need to monitor queues because the idea here is that you would monitor the size of the queue. You can do this with CloudWatch, for instance, and if one queue fills up, the tenant would be writing into the other queue that they have at their disposal. You see here a mapping of, I think, eight queues to eight tenants, but of course you could have a higher number of tenants writing into fewer queues. That really depends on the load patterns that you have in the architecture.
Now what happens if one tenant becomes noisy? Let's say Tenant 1 starts producing a lot of messages and fills up both of the queues they're allocated to. The good thing in this scenario is that Tenant 2 would still have another queue they can write into and would not be affected by that noisy neighbor as such. The same is true for Tenant 8, who, as you see, would also be writing into a shared queue with Tenant 1, but again they have another one they can write into, so they also would not be affected unless they themselves or other tenants in the shared queues became noisy neighbors. But the likelihood of that happening is quite small. I think we calculated that for a set of 100 tenants, it's in the area of a 0.02% likelihood of happening.
So this is a very resilient pattern. It comes with an overhead of having to monitor the queues, of course, of setting them up and maintaining them. As you onboard new tenants, you will have to review if the setup is still enough, so there is some maintenance and engineering overhead involved, but it can help you to not run into the overhead of having all the consumers, all the queues for each tenant.
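One way to sketch the shuffle sharding assignment is to derive each tenant's queue subset deterministically from its ID, then route each message to the least-filled of that tenant's queues. The queue counts and the least-depth routing policy are illustrative assumptions, not the exact scheme from the talk:

```python
import hashlib
import random

NUM_QUEUES = 8          # matches the eight-queue example in the talk
QUEUES_PER_TENANT = 2   # each tenant writes into two queues

def shard_queues(tenant_id: str) -> list:
    """Deterministically assign each tenant a small subset of queues.

    Seeding a PRNG from a hash of the tenant id keeps the mapping stable
    across restarts without storing it anywhere.
    """
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8],
                          "big")
    rng = random.Random(seed)
    return sorted(rng.sample(range(NUM_QUEUES), QUEUES_PER_TENANT))

def pick_queue(tenant_id: str, depths: list) -> int:
    """Route a message to the least-filled of the tenant's queues.

    `depths` would come from monitoring, e.g. CloudWatch metrics on
    approximate queue depth.
    """
    return min(shard_queues(tenant_id), key=lambda q: depths[q])
```

Because a noisy neighbor can fill at most its own two queues, another tenant is only starved if both of its queues happen to overlap with noisy ones, which is the small-probability overlap the talk refers to.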
Now, the previous two patterns, and actually also the multi-tenant and single-tenant queues, assume that all tenants are treated equally, which is nice, but in business we may have different pricing models, we may have tiers. In that scenario, we could combine this in a different way. So rather than having these cell sharding and shuffle sharding approaches, we would simply say, for premium customers we have a dedicated queue and they are not affected by noisy neighbors, whereas for basic tier customers you have that shared infrastructure, assuming also that they don't have that many messages to produce, and everybody could be happy. So this is also a valid approach. Again, it has some engineering overhead, but it would scale with the customers that also pay for those services.
Dynamic Overflow Queues and Amazon SQS Fair Queues: Automated Noisy Neighbor Solutions
So this is quite a static approach, but it helps you build this out. It would be cool if we could do this a bit more dynamically, right? Indeed, let me look at it. Yes, please. All right. So we can actually build on this approach and look at a dynamic version of this. Imagine that during runtime, the producers gain insight into the fact that, in this case, Tenant 2 became a noisy neighbor, through monitoring, for instance. Then the producers can autonomously decide to create a producer-controlled overflow queue. Obviously, the consumers also need to know about it. That means there needs to be a signal from the producer side down to the consumer side to indicate, hey, there is now this additional queue for this current noisy neighbor situation. Please also consume from there.
This opens up a lot of questions again on the consumer side. When we receive the signal that we should also now consume from that overflow queue, what does that mean? Should I do it in a round-robin way with the main queue? Should I prefer one over the other? Who decides that? So there are several decisions that we need to take.
Another thing that I personally dislike a little bit with this approach is that we now have again the direct connectivity between producers and consumers, while we actually wanted to get rid of it in the first place by the introduction of queues. And then another question on the consumer side actually is, would we reuse the current consumer fleet or would we actually create some overflow consumers? Well, from a SaaS pricing and packaging perspective, you can make up your mind also in such a situation. Would I charge that particular tenant an extra cost for that because we have some effort?
You can maybe do the same the other way around, with a consumer-controlled overflow queue, because there are situations where the noisy neighbor situation isn't detected from the producer side by a lot of additional requests coming in, but from the consumer side. If you run into a situation where the actual message processing takes more time than expected or you receive an increased error rate from those consumer processing parts, the consumers can do a similar thing and also trigger that an overflow queue is created for this noisy neighbor. And now the consumers need to send a signal to the producers, hey, please use this queue now for all subsequent messages for tenant two.
Well then again, how do you balance on the consumer side with the two queues that you now have? And maybe there's a third noisy neighbor and you have an overflow queue for that one too. There are tough decisions required here, so when would I use my consumer capacity for the main queue and when from the overflow queue so that I make sure I don't leave anyone back in starvation? Also, there's quite some engineering required here to dynamically create all that infrastructure and also make sure when the noisy neighbor situation is scaling down again that we don't throw away a queue while there are still messages in it. It can be quite hard to build this all and to decide on all those questions how to use my capacity.
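The producer-controlled variant could be sketched roughly like this. The backlog threshold, the queue naming, and the omitted consumer signaling are all assumptions; with SQS, the per-tenant backlog would typically be derived from CloudWatch metrics or message attributes rather than passed in directly:

```python
# Messages from one tenant in the backlog before we split them out.
# Purely illustrative; the real threshold is an operational decision.
OVERFLOW_THRESHOLD = 1000

def route_with_overflow(tenant_id, per_tenant_backlog, overflow_queues):
    """Producer-side sketch of dynamic overflow queues.

    Once a tenant's backlog crosses the threshold, its messages go to a
    dedicated overflow queue. Creating the real queue and signaling the
    consumers to drain it are elided here.
    """
    if tenant_id in overflow_queues:
        # Tenant is already overflowing; keep using its dedicated queue.
        return overflow_queues[tenant_id]
    if per_tenant_backlog.get(tenant_id, 0) > OVERFLOW_THRESHOLD:
        # Provision the overflow queue and notify consumers (not shown).
        overflow_queues[tenant_id] = "overflow-" + tenant_id
        return overflow_queues[tenant_id]
    return "main"
```

Note the open questions from the talk remain: when to scale the overflow queue back down, and how consumers should split capacity between the main and overflow queues.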
Alex, what do you think? Wouldn't it be nice if there was a cloud service that could just implement this for us and leave it in a way that it still looks like one queue? Yes, it would be awesome. Not so much engineering effort. Isn't that what the cloud is all about? Yes, that's right. So luckily this summer, Amazon SQS Fair Queues has been launched, which is a feature that does exactly this for you.
And from the standpoint of your producers and your consumers, you still work only on one queue. Under the hood, there are the overflow queues that do all the required engineering to solve this noisy neighbor problem for you. Looking at SQS, I mean, if you employ queues in your architecture, you shift a lot of operational responsibility to that messaging system, right? So you want to make sure that this is one that is highly scalable, available, and reliable. And in terms of scalability, I can mention a fun fact from Prime Day 2025. At peak, Amazon SQS handled 166 million messages per second. Per second, exactly. So it is for sure scalable.
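To build an intuition for what such a fairness mechanism does, here is a toy sketch in Python, not the actual SQS implementation: messages are grouped per tenant and dequeued round-robin, so a burst from one noisy tenant cannot starve the others. The class and method names are illustrative.

```python
from collections import defaultdict, deque

class FairQueue:
    """Toy model of per-tenant fairness: messages are grouped by tenant and
    dequeued round-robin, so one noisy tenant cannot starve the others."""

    def __init__(self):
        self.groups = defaultdict(deque)  # tenant_id -> pending messages
        self.order = deque()              # round-robin order of tenants

    def send(self, tenant_id, body):
        if tenant_id not in self.order:
            self.order.append(tenant_id)
        self.groups[tenant_id].append(body)

    def receive(self):
        while self.order:
            tenant = self.order[0]
            if self.groups[tenant]:
                self.order.rotate(-1)  # move this tenant to the back
                return tenant, self.groups[tenant].popleft()
            self.order.popleft()       # drop a drained tenant
        return None
```

Even if tenant two floods the queue, a message from tenant one is delivered on the very next receive, which is the behavior you want from the consumers' point of view.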
All right, now we can continue with the next chapter. We solved the noisy neighbor problem on queues. Yeah, so keep that in mind: combine the complexity of multi-tenant systems with the complexity of distributed architectures. And again, queues are not the only mechanism you can use. We have seen message buses, and you probably know about streaming systems. They help with decoupling and rely on similar mechanisms, so they could fill a dedicated talk as well, but this is a very powerful tool. And we can actually employ it when we talk to external systems too.
External System Integration: Proxying Payment Service Providers with Queues and JWT Tokens
So we have seen before in our example architecture that we call several external services, and if you have built these types of applications, you probably know that dealing with payment service providers can be especially challenging. So in the next scenario, let's isolate some building blocks from the architecture and look a bit closer at how the flow works here and how we can enhance that with the tooling that we have established.
Assume that we have three contexts. One is the tenant context, where we have a user app calling our booking flow, and of course at the end of this whole round trip we need to inform the finance system that a transaction has been made. The whole thing goes through our booking flow, and we'll see in a second what we can do here, but the system we need to interact with is the payment processor of our PSP. As we have said, the PSP expects a synchronous call; we cannot do much about that. It might be that they are able to send notifications, so maybe they also do this 200/202 thing, but assume that we cannot have them consume from a queue. We can, however, use queues ourselves within our own context. We just need to proxy the access to them.
So in this flow we would have a request queue for the outbound messages, but because the payment service provider doesn't know anything about SQS or our particular cloud services, we would have a little proxy service that is just there to poll from our queue and make the synchronous call. This may seem a bit redundant, but the obvious benefit is that we don't have to implement this within the booking flow, and outages or congestion there would not affect the booking flow. Instead, imagine a little Lambda that just polls from the queue, makes the request, and ideally would not have to wait for the whole processing time. So my wish scenario is that the PSP could just notify me as soon as processing is done. But if the payment processor only answers synchronously, then you might use something like a container instead and accept the wait time of the few seconds that the PSP needs to respond. And then, of course, instead of going through the API Gateway as in this example, the response would be put in by the calling service, by your PSP proxy. So this is a pattern where you introduce a queue on the way out to a different provider, basically creating a proxy to their system even though you still have to make a synchronous call.
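As a rough sketch of such a proxy worker, here is a minimal Python version; `receive_batch`, `call_psp`, and `publish_response` are stand-ins for the real queue consumer, the synchronous PSP client, and the response channel, none of which come from the talk.

```python
import json

def psp_proxy_worker(receive_batch, call_psp, publish_response):
    """Toy PSP proxy: pull payment requests from a queue, make the
    unavoidable synchronous call to the provider, and put the result on
    a response channel. All three callables are illustrative stand-ins."""
    for raw in receive_batch():
        request = json.loads(raw)
        status = call_psp(request)  # blocks for the few seconds the PSP needs
        publish_response(json.dumps({
            "tenant_id": request["tenant_id"],
            "status": status,
        }))
```

The point of the design is visible in the signature: the booking flow never appears here, so a slow or failing PSP only delays this worker, not the booking path.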
So this example assumes that these flows always work. There may be delays, and there may be back pressure coming from the other system, but let's now assume that there is an outage of the PSP, maybe even a tenant-specific one. In that scenario we would not be able to submit messages to the payment processor. We can address this with the dead letter queue pattern: you have another queue where messages that could not be processed land, and, for example if it's about payments, a reconciliation service once daily processes the failed messages and tries to capture the payments.
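In SQS terms, a dead letter queue is attached to the main queue through a redrive policy: after `maxReceiveCount` failed receives, SQS moves the message to the DLQ. This sketch only builds the attribute payload, with a hypothetical ARN, and shows in a comment where a `set_queue_attributes` call would apply it.

```python
import json

# Hypothetical queue ARN for the example.
dlq_arn = "arn:aws:sqs:eu-central-1:123456789012:psp-requests-dlq"

redrive_policy = {
    "deadLetterTargetArn": dlq_arn,
    "maxReceiveCount": "5",  # receive attempts before a message counts as failed
}

attributes = {"RedrivePolicy": json.dumps(redrive_policy)}

# With boto3 this would be applied roughly as:
#   sqs = boto3.client("sqs")
#   sqs.set_queue_attributes(QueueUrl=main_queue_url, Attributes=attributes)
```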
In a multi-tenant system you could combine this with logic that says not every tenant gets to benefit from that. Let's say a user from a base-tier customer tries to make a booking, but the payment doesn't go through for whatever reason; they cannot pay for the ride. You might say: I'm failing this, it's right now not possible. That is unfortunate for the user experience, but it's the base tier, so they don't get this extra treatment. A premium customer, however, might have a bit of store credit, so their failed payment lands in the reconciliation service and you say: okay, for now you don't have to pay, I'll try to capture it later, because you are our esteemed premium customer. So you can have tenant-based logic for handling these kinds of reconciliations.
Now I'm assuming we have captured the payment, and the last thing we need to do is get back to the tenant's finance system. How can we do that? We have fulfilled the whole flow and, as you know, we originally established the authentication with the customer, but in order to find the callback to the finance system, we would have to make another database call to get to the source of truth. That is fine, we could do that. But actually we can go back to our little JWT token and inject the callback to the finance system there as well. What we effectively do here is, rather than conveying that information to another system, we convey it to ourselves, so we have a kind of closed loop, and the JWT token lets us carry this information along without having to retrieve it again from the database. We can then use it to access the webhook and make that call.
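To illustrate carrying the callback inside the token, here is a minimal stdlib-only HS256 signer; the claim names (`tenant_id`, `finance_callback`) are invented for the example, and in production you would use a vetted JWT library rather than rolling your own.

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Minimal HS256 JWT signer to show how the finance-system callback
    can travel with the request. Demo only; use a real JWT library."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

# Hypothetical claims: tenant identity plus the callback we hand to ourselves.
token = sign_jwt(
    {"tenant_id": "t-42", "finance_callback": "https://example.com/webhook"},
    secret=b"demo-secret",
)
```

Note that the payload here is only signed, not encrypted; anyone holding the token can base64-decode and read it, which is exactly the caveat discussed next.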
There's just one little problem here. These JWT tokens are signed, and we can be sure of the integrity of that token. But if that callback, which is effectively an API call, contains any kinds of tokens, then this is unencrypted, right?
So we need to be very careful if we put confidential information there. Of course, everything goes over HTTPS and whatnot, but there may be side channel attacks. There may be security flaws in the system of the user. There may be other ways to intercept that, so we need to be careful what we do here.
So it's important to emphasize: if you employ this JWT mechanism to transfer information across your architecture, either within your own architecture or with external partners, you need to not only sign the message but ideally also encrypt it. The good news is that there's a whole standard family around JWT. Most people know JWT, but I'd like to show here that, as you see in the third line, there's also an encryption standard: RFC 7516 allows you to have encrypted JWT tokens. If you employ this mechanism to convey confidential information and need confidentiality on top of integrity, then you need to encrypt as well. The good thing is that there are several libraries that do this for you, so with a few lines of code you can build it into your architecture.
Scatter-Gather Pattern: Request for Quote Implementation with Correlation IDs
Those were several use cases where we talked about queues and about JWT tokens to convey information, but it might be that now we need to externalize the state. How would that work? Let's see. To introduce that, let's have a look at another complex integration pattern. It's also a conversation pattern, called the scatter-gather pattern, one of my favorites, and it addresses the question: how can I distribute a task to several parties and afterwards collect the results to find the best response, or maybe aggregate those responses?
What would that look like? We have the requester, and since the same request should go to multiple downstream systems, the recommendation is to use a publish-subscribe messaging channel here, for instance Amazon SNS topics, thinking again about shifting operational responsibility to a managed service, and Amazon SNS is capable of that. Now this message, or rather those messages, reach all our responders. They can work on their individual responses and send them back to a response queue, where later an aggregator in the context of the requester does the aggregation, and a final processor does the final processing. So again, this is about finding the best of all responses, or creating an accumulation of all the responses that are coming in.
There are, however, two things our pattern needs: the responders have to know where to send their response, and the aggregator and processor afterwards have to be able to assign those responses to a previous request. This is the correlation ID and return address pattern, meta information that we inject into the messages. Here, Message A contains the address of the response queue and a correlation ID that the responders are asked to copy into their responses.
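The correlation ID and return address idea can be sketched as a small in-process simulation; the SNS fan-out is replaced by plain function calls and the response queue by `queue.Queue`, so this models the conversation, not an AWS implementation, and the unicorn responders are invented for the example.

```python
import uuid
from queue import Queue

def scatter(responders, payload):
    """Scatter a request to all responders; each reply must carry the
    correlation id and go to the return address (the response queue)."""
    response_queue = Queue()
    correlation_id = str(uuid.uuid4())
    message = {
        "correlation_id": correlation_id,
        "reply_to": response_queue,  # return address pattern
        "body": payload,
    }
    for respond in responders:       # fan-out (an SNS topic in the real system)
        respond(message)
    return correlation_id, response_queue

def gather(correlation_id, response_queue, expected):
    """Collect replies that match the original request's correlation id."""
    replies = []
    for _ in range(expected):
        reply = response_queue.get(timeout=1)
        if reply["correlation_id"] == correlation_id:
            replies.append(reply["offer"])
    return replies

def unicorn(name, perk):
    def respond(message):
        message["reply_to"].put({
            "correlation_id": message["correlation_id"],
            "offer": f"{name}: {perk}",
        })
    return respond

cid, rq = scatter(
    [unicorn("Sparkle", "free drinks"), unicorn("Rainbow", "scenic route")],
    {"ride": "Las Vegas strip"},
)
offers = gather(cid, rq, expected=2)
```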
What is a practical use case for this? Coming back to Wild Rydes, let's assume Wild Rydes customers are quite special. They don't always just want the next unicorn from around the corner. Sometimes they want to ask for a quote and ask the unicorns: hey, what is your special offer? For instance, we're in Las Vegas this week; maybe there are unicorns that offer free drinks during the ride, and maybe a customer is interested in that. What would it look like to implement the scatter-gather pattern for this concrete use case?
So again, we have the Wild Rydes customers. They use their app and the request reaches the ride booking service. What happens here? We have an intake processor, and now the external state comes into play: we persist the incoming request, then share it on the RFQ share topic, and only then return to the end-user client that we have accepted the request. Under the hood, what happens next? We have the unicorn management service, with unicorn management resources in there that represent each unicorn in a certain vicinity, and they receive a push notification: hey, there's an RFQ running, are you interested in participating?
They can make up their minds about their individual response, send it back to the RFQ response queue, and the response aggregator stores it into the RFQ store. And then again we have the usual scenario for end users on how to track the current status. They can refresh the current task status, get an update again with a link to refresh another time, and eventually they will see that the result is ready with status done and can retrieve it from the system. And again, this is a convenience functionality that you want to add, but of course you want to send a push notification too.
Externalizing State with DynamoDB: Single Table Design and Multi-Tenant Data Isolation
Now we externalize the state in this RFQ store. So what do we need to take into account here, Alex? So first, I'm learning a lot about unicorns today. It's really amazing what they can do; very technical animals, apparently. The important thing from an architectural perspective is that we need to store this somewhere in a highly scalable manner. For that, as you may have guessed from the icon you see in the middle here, we are using a key-value database, in this case DynamoDB. And we will have a look at how we can do this with a single-table design, which is the preferred pattern for working with DynamoDB, but also in a multi-tenant way.
So first let's look at an approach where each tenant has their own table, so it's not really multi-tenant at this point, but it lets us simply recap how you would do single-table design with DynamoDB. If you haven't worked with DynamoDB yet or designed these kinds of single-table approaches, there are a few concepts you need to understand. Unlike in a relational database, where we would normalize the data and have a table per entity, you put all the data objects into one table and identify them by a combination of a partition key and a sort key. These are the identifiers for your entries, and then of course you have the attributes, the key-value pairs, in each entry.
So when we talk about unicorns submitting their offers, we could have the RFQ itself as one item, identified by the ID of the request for quote, and then one item for each offer a unicorn submits. Technically speaking, in this design we put the ID of the RFQ into the partition key and the type of the entry into the sort key. If it is an offer, we separate it with a hash symbol and append the ID of the unicorn submitting that offer, and this gives us the query patterns we need for the scatter-gather approach to work.
There are typically two queries we need: get a single RFQ in order to represent it in a user context, and, as offers come in, get all offers that belong to one RFQ. You see here that for a single RFQ we query by the exact partition key and sort key, and that we can use the begins-with syntax to get all offers for one RFQ. So this works quite nicely. But a table per tenant is similar to the per-tenant queue we saw earlier: it doesn't scale very well when we need to onboard customers and create new tables every time.
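The two access patterns can be mimicked in a few lines of plain Python; the `RFQ#`/`OFFER#` key prefixes follow the hash-separated scheme described above, and the in-memory list stands in for the DynamoDB table.

```python
def rfq_key(rfq_id):
    # The RFQ itself: partition key is the RFQ ID, sort key marks the entry type.
    return {"pk": f"RFQ#{rfq_id}", "sk": "METADATA"}

def offer_key(rfq_id, unicorn_id):
    # Offers share the RFQ's partition; the sort key appends the unicorn ID
    # after a hash separator, which is what enables the begins-with query.
    return {"pk": f"RFQ#{rfq_id}", "sk": f"OFFER#{unicorn_id}"}

# In-memory stand-in for the single table.
table = [
    {**rfq_key("123"), "status": "open"},
    {**offer_key("123", "sparkle"), "price": 30},
    {**offer_key("123", "rainbow"), "price": 25},
]

def get_rfq(items, rfq_id):
    """Mimics: GetItem(pk = 'RFQ#<id>', sk = 'METADATA')."""
    key = rfq_key(rfq_id)
    return next(i for i in items if i["pk"] == key["pk"] and i["sk"] == key["sk"])

def query_offers(items, rfq_id):
    """Mimics: Query(pk = 'RFQ#<id>' AND begins_with(sk, 'OFFER#'))."""
    pk = f"RFQ#{rfq_id}"
    return [i for i in items if i["pk"] == pk and i["sk"].startswith("OFFER#")]
```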
So ideally we can apply this single-table design in a multi-tenant way, and actually that's not super difficult, because we don't need to change much. We just add the tenant ID to the partition key, and this works quite nicely, so, problem solved forever? There's one problem with it. It would technically work; you see here that the access patterns are not very different from the queries we had before. But one thing that is still important in a multi-tenant environment is tenant isolation, for security or other reasons. What we lose as a trade-off with this approach is data isolation.
If a tenant went rogue and maybe breached the security of the querying system, it could access data from other tenants. So we could, and again this is an architectural decision we have to make, build this in a slightly different way: we put the tenant ID as the sole partition key and move all other information, the RFQ ID, the unicorns and their offers, into the sort key. This works almost as well. And you could now use Identity and Access Management, as you see on the right-hand side of the table, to get a row-level isolation of these entries.
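The row-level isolation mentioned here typically relies on the IAM condition key `dynamodb:LeadingKeys`, which restricts access to items whose partition key matches the caller's tenant identity; the resource ARN and the principal-tag placeholder below are illustrative, not taken from the talk.

```python
import json

# Fine-grained access sketch: allow queries only on items whose partition key
# equals the tenant_id tag on the calling principal. ARN and tag name invented.
tenant_isolation_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:Query", "dynamodb:GetItem"],
        "Resource": "arn:aws:dynamodb:eu-central-1:123456789012:table/rfq-store",
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${aws:PrincipalTag/tenant_id}"]
            }
        },
    }],
}

policy_json = json.dumps(tenant_isolation_policy)
```

With a policy like this attached to a per-tenant role, even a compromised query path cannot read another tenant's partition, which is exactly the isolation traded away by the previous design.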
Each user gets to see their data based on the assumption that the request would be made with this identity. However, there's another problem here, especially when you use different types of key-value stores, and this is called hot partitions. A hot partition is when, as you see here, we have the partition key. If a lot of entries go into one partition key, it creates issues because in the end it's stored physically somewhere, and there are certain limitations.
First, there is a maximum partition size, which is 10 gigabytes if I'm not mistaken. And it used to be that if you sent a lot of queries to one partition key, you would exhaust that partition's request capacity. The good thing is, if you have ever heard of hot partitions: they are not that much of an issue in DynamoDB anymore, because DynamoDB has a mechanism called split for heat, which automatically repartitions based on the frequency and volume of incoming requests.
There may still be edge cases if you scale very rapidly with new entries, where the scaling of the splitting mechanism takes a while to do the partitioning. So depending on your multi-tenancy use case, you may run into aspects of hot partitions, and then you may need to go to the design we've seen in the previous slide, which would create much smaller partitions. This again leads us to the realization that no architecture is without trade-offs.
Conclusion: Weighing Trade-Offs in Multi-Tenant Distributed Architecture
We showed you how to introduce patterns that are exchangeable at several parts of the architecture, but in the end, each of these introductions needs to be a conscious decision. You need to weigh the trade-offs, the pros and cons, and ideally create an architecture decision record so afterwards you know what has been done there, and then you will choose the least painful option. We hope that we were able to give you some options when you design those rather complex systems in the multi-tenant distributed way.
We would like to thank you, but before we finish, we would like to ask you: if you liked this talk, please rate it in the app. That's very important for us. We have also collected several more resources for this session, so if you would like to learn more about how to build SaaS on AWS, with AWS services or maybe with open-source tools, there are a lot of talks from this re:Invent and from previous ones. Thank you very much. Thanks a lot.
; This article is entirely auto-generated using Amazon Bedrock.