If you have spent a decade building large-scale backend systems, you know that integrating modern, slow-running workloads—like LLM prompts or complex AI tasks—into legacy synchronous architectures is a massive headache.
Standard HTTP REST calls are inherently brittle for this. If an AI model takes 45 seconds to generate a response, your traditional API gateway or HTTP client will likely time out at the 30-second mark. The connection drops, the user gets a 504 Gateway Timeout, and the backend CPU cycles are completely wasted.
The textbook architectural answer is to introduce a message broker to act as a shock absorber. But what if your client-facing frontend requires a synchronous, Request-Reply experience?
You have to build a "Sync-over-Async" bridge. And if you are using Azure Service Bus, doing this at a massive scale exposes a critical bottleneck.
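The heart of any Sync-over-Async bridge is a correlation map: the HTTP handler parks a `CompletableFuture` under a correlation ID before sending the request, and the broker's reply listener completes it when the answer arrives. A minimal sketch of that idea, using only the JDK (class and method names here are illustrative, not from any library):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Minimal sync-over-async bridge: pending HTTP requests parked as futures,
// completed later by the message-broker reply listener.
public class ReplyCorrelator {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the HTTP handler: register a future before sending the request.
    public CompletableFuture<String> awaitReply(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        // Time out abandoned requests so the map cannot leak entries forever.
        future.orTimeout(60, TimeUnit.SECONDS)
              .whenComplete((result, error) -> pending.remove(correlationId));
        return future;
    }

    // Called by the broker listener when a reply message arrives.
    public void completeReply(String correlationId, String payload) {
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future != null) {
            future.complete(payload);
        }
    }

    public static void main(String[] args) {
        ReplyCorrelator correlator = new ReplyCorrelator();
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = correlator.awaitReply(id);
        correlator.completeReply(id, "done"); // simulates the listener firing
        System.out.println(reply.join());     // prints "done"
    }
}
```

The hard part is not this map; it is making sure the reply actually reaches the one pod holding the future, which is exactly where Service Bus forces a design decision.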
The Problem with Service Bus Sessions
When implementing a Request-Reply pattern on Azure Service Bus, the default recommendation is to use Sessions. You send a message with a specific SessionId, and your consumer locks onto that session to receive the reply.
This approach works beautifully in small systems, but it fails spectacularly at scale for two reasons:
- The "Sticky" Bottleneck: Sessions create exclusive locks. If one session has 1,000 messages and another has 10, a consumer gets stuck on the heavy session while other pods sit idle.
- Hard Limits: On the Standard tier, you are limited to 1,500 concurrent sessions. If you are scaling to hundreds or thousands of Spring Boot replicas during a massive traffic spike, you will hit a wall.
If you try to bypass sessions by having thousands of replicas listen to a single shared reply queue, you create a "competing consumer" disaster, wasting CPU cycles and thrashing the broker.
The Enterprise Solution: The Filtered Topic Pattern
To build a highly scalable, session-less Request-Reply architecture, we need to shift from Queues to Topics with SQL Filters. This is the core engine of an AI-Native Gateway concept designed to modernize legacy software systems without rewriting the clients.
Here is how the architecture flows:
- The Request: The Spring Boot application generates a unique `InstanceId` on startup. It sends the request to a standard queue, attaching a custom property: `ReplyToInstance = 'Instance-123'`.
- The Dynamic Subscription: When the pod boots up, it dynamically provisions a lightweight Subscription to a global `reply-topic`.
- The Magic (SQL Filter): We apply a `SqlRuleFilter` to that subscription: `ReplyToInstance = 'Instance-123'`.
By leveraging the broker's data plane to evaluate the SQL filter, Azure Service Bus does the heavy lifting. Pod #123 only receives messages destined for Pod #123. There is zero thrashing, no session limits, and you get pure horizontal elasticity.
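The broker-side behavior is easiest to see in a pure-JDK simulation. This sketch reduces the SQL filter to the single property it checks (the property name `ReplyToInstance` comes from the flow above; the real evaluation happens inside Azure Service Bus, not in your code):

```java
import java.util.List;
import java.util.Map;

// Simulates what the broker's SQL filter does server-side: each pod's
// subscription only passes messages whose ReplyToInstance property
// equals that pod's own instance ID.
public class FilterDemo {
    // The rule expression each pod installs on its own subscription.
    static String sqlFilterFor(String instanceId) {
        return "ReplyToInstance = '" + instanceId + "'";
    }

    // Broker-side evaluation, reduced to its essence for this one property.
    static boolean matches(Map<String, Object> applicationProperties, String instanceId) {
        return instanceId.equals(applicationProperties.get("ReplyToInstance"));
    }

    public static void main(String[] args) {
        System.out.println(sqlFilterFor("Instance-123"));

        List<Map<String, Object>> replies = List.of(
                Map.of("ReplyToInstance", "Instance-123"),
                Map.of("ReplyToInstance", "Instance-456"));

        // Pod #123 sees exactly one of the two replies; the other one
        // is never delivered to its subscription at all.
        long delivered = replies.stream()
                .filter(props -> matches(props, "Instance-123"))
                .count();
        System.out.println(delivered);
    }
}
```

Because the non-matching message never enters the pod's subscription, the pod spends zero CPU discarding other pods' replies.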
Achieving True Horizontal Scaling with an HTTP Load Balancer
This architecture is not just powerful for one gateway instance; it is designed for massive scale. You can have 50 or 100 Gateway pods sitting behind a load balancer to handle peak traffic.
To do this, you place a standard HTTP Load Balancer (like Azure Application Gateway or Nginx) in front of your Sentinel Gateway instances.
The Load Balancer's role is crucial:
- Even Traffic Distribution: Configure the load balancer with a "Round-Robin" or "Least Connections" algorithm. This ensures incoming HTTP requests are sprayed evenly across all available Gateway pods (e.g., `Gateway-A`, `Gateway-B`, etc.).
- Preventing "Sticky" Bottlenecks: This is critical. You must disable HTTP Session Affinity (Sticky Sessions) on the Load Balancer. Every single request should be routed independently. Because each Gateway instance operates on a strict 1:1 ratio, generating a unique `CorrelationID` and waiting for exactly one reply, instances don't need to share state. An even distribution of HTTP traffic naturally leads to an even distribution of Service Bus messages and replies.
This creates a stateless, highly resilient design. If one Gateway instance crashes, the load balancer simply sends the next request to another instance, and the overall system keeps humming.
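The even-distribution claim is simple to verify in miniature. This round-robin sketch (pod names are placeholders) shows that with no session affinity, traffic lands on every pod in exactly equal shares:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin routing with no session affinity: every request is routed
// independently, so load stays even across all gateway pods.
public class RoundRobinDemo {
    private final List<String> pods;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDemo(List<String> pods) {
        this.pods = pods;
    }

    String route() {
        // floorMod keeps the index valid even if the counter overflows.
        return pods.get(Math.floorMod(next.getAndIncrement(), pods.size()));
    }

    public static void main(String[] args) {
        RoundRobinDemo lb = new RoundRobinDemo(List.of("Gateway-A", "Gateway-B", "Gateway-C"));
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < 9_000; i++) {
            hits.merge(lb.route(), 1, Integer::sum);
        }
        // Each pod receives exactly 3,000 of the 9,000 requests.
        System.out.println(hits);
    }
}
```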
Introducing the Sentinel Service Bus Starter
Wiring up the Azure Administration Client to dynamically provision and clean up these filtered subscriptions—while managing reactive CompletableFuture mappings—is a lot of boilerplate.
To solve this, I built the Sentinel Service Bus Starter, a plug-and-play Spring Boot library that abstracts this entire pattern into a single dependency.
How it works:
Just drop the dependency into your `build.gradle`, provide your connection string in `application.yml`, and inject the `SentinelTemplate`:
```java
@RestController
@RequestMapping("/api/v1/gateway")
public class GatewayController {

    private final SentinelTemplate sentinelTemplate;

    public GatewayController(SentinelTemplate sentinelTemplate) {
        this.sentinelTemplate = sentinelTemplate;
    }

    @PostMapping("/process")
    public CompletableFuture<ResponseEntity<String>> processRequest(@RequestBody String payload) {
        // Sends to the ASB Queue, waits on the dynamic Topic Subscription
        return sentinelTemplate.sendAndReceive(payload)
                .thenApply(ResponseEntity::ok)
                .exceptionally(ex -> ResponseEntity.internalServerError().build());
    }
}
```
Because it leverages Java 21's Virtual Threads (Project Loom) under the hood, Tomcat HTTP threads are never blocked while waiting for the Service Bus round-trip, allowing incredible throughput even when waiting 60 seconds for an AI workload to finish.
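The cheapness of blocking waits on virtual threads is easy to demonstrate. In this sketch (requires Java 21; the 200 ms sleep stands in for a Service Bus round-trip), a thousand concurrent "waits" finish in roughly the time of one, because each waiter is a virtual thread rather than a scarce platform thread:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// 1,000 tasks each block for 200 ms on a virtual thread; the whole batch
// completes in well under 1,000 * 200 ms because the waits run concurrently
// without pinning platform threads.
public class VirtualThreadDemo {
    public static void main(String[] args) {
        Instant start = Instant.now();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> replies = IntStream.range(0, 1_000)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> {
                        try {
                            Thread.sleep(200); // stands in for the broker round-trip
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        return "reply-" + i;
                    }, executor))
                    .toList();
            CompletableFuture.allOf(replies.toArray(CompletableFuture[]::new)).join();
        }
        long elapsedMs = Duration.between(start, Instant.now()).toMillis();
        System.out.println("1000 blocking waits completed in ~" + elapsedMs + " ms");
    }
}
```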
Bridging the Legacy Gap
We don't always have the luxury of migrating our entire ecosystem to Event-Driven Architecture overnight. Sometimes, you just need a bulletproof, highly scalable Gateway to protect your modern backends from synchronous legacy clients.
I’d love to hear how other teams are tackling the Sync-over-Async problem in the comments!