When you build a multi-tenant SaaS product, integrating with third-party providers is inevitable.
But what happens when their latency becomes your bottleneck?
Recently, we encountered a massive performance issue in our notification engine that was stalling campaigns for hours.
Here is how we solved it and increased our campaign dispatch speed by 10x.
The Context: Journey-Based WhatsApp Campaigns
At Casa Retail AI, we have a journey flow where retailers can target specific groups of customers.
For example:
- Customers with a Lifetime Value (LTV) greater than ₹10,000.
- Customers who haven’t made a purchase in the past year.
Once these cohorts are identified, the system sends them a targeted WhatsApp message to re-engage them.
Because we operate a multi-tenant platform, every tenant (retailer) has their own distinct WhatsApp service provider integrated with Casa.
The workflow seemed straightforward: identify the audience, construct the payloads, and push them to the provider.
The Problem: The API Latency Bottleneck
The issue wasn’t our database.
It wasn't our background job queue either.
The bottleneck was the WhatsApp message providers.
While sending messages, we discovered that some third-party providers took up to 2 seconds to process a single request.
If a tenant launched a campaign targeting a large cohort, processing these messages sequentially was agonizingly slow.
A single campaign could take approximately 5 hours to complete.
The Real Issue: The "Innocent Bystander" Problem
A 5-hour campaign doesn't just affect the tenant running it.
Because we use a shared background job system, those slow jobs were hogging the workers.
Other tenants' jobs were sitting in the queue, waiting for execution.
It wasn’t the other tenants' fault.
It wasn't even the active tenant's fault—they just had a slow provider.
We initially tried to mitigate this by routing payloads to a delayed queue, chosen based on the average provider response time for that particular campaign.
While this prevented the main queue from grinding to a halt, it was merely treating the symptom, not the root cause.
We needed a way to dispatch these requests much faster.
The Solution: The Concurrent Dispatcher
If a single API call takes 2 seconds, calling it 1,000 times in sequence takes 2,000 seconds.
But if we can run them in parallel, we only pay the latency cost of the slowest concurrent batch.
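To make the arithmetic concrete, here is a small back-of-the-envelope estimate in Scala. It is an idealized model, not a measurement: it assumes every call takes exactly the same time and that a fixed number of calls run at once. The object and method names are purely illustrative, and the concurrency of 30 is the dispatcher's default described below.

```scala
// Illustrative estimate only: names are hypothetical and the model is idealized.
object DispatchEstimate {

  // Sequential dispatch: every call waits for the previous one to finish.
  def sequentialSeconds(calls: Int, latencySeconds: Double): Double =
    calls * latencySeconds

  // Concurrent dispatch: calls run in batches of `concurrency`,
  // and each batch costs one round of provider latency.
  def concurrentSeconds(calls: Int, latencySeconds: Double, concurrency: Int): Double = {
    val batches = math.ceil(calls.toDouble / concurrency).toInt
    batches * latencySeconds
  }
}

// 1,000 calls at 2 seconds each:
//   DispatchEstimate.sequentialSeconds(1000, 2.0)      == 2000.0
//   DispatchEstimate.concurrentSeconds(1000, 2.0, 30)  == 68.0  (34 batches of up to 30 calls)
```

Treat the concurrent figure as a best case. Real providers do not respond in a uniform 2 seconds, so the gain you actually observe will usually be smaller than this model suggests.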
To achieve this, we built a Concurrent Dispatcher in Scala to process these external API calls simultaneously.
The idea was simple:
Instead of sending one message at a time, group the tasks and fire them concurrently using a dedicated thread pool, awaiting their completion before proceeding.
How It Works:
- Dedicated Thread Pool: We use a FixedThreadPool with a configurable concurrency limit (defaulting to 30). This isolates these slow, I/O-bound API tasks from the rest of the application's execution context.
- Future Wrapping: Each task is wrapped in a Scala Future, so it starts executing on the dedicated thread pool as soon as a thread is free.
- Future.sequence: This Scala standard library method transforms a Seq[Future[T]] into a Future[Seq[T]], combining the individual futures into a single one.
- Bounded Awaiting: We call Await.result with a configurable timeout (defaulting to 5 minutes) to ensure a hung third-party API doesn't hold our workers hostage forever.
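The four pieces above can be sketched as follows. This is a minimal illustration rather than our production code: the class name ConcurrentDispatcher, the method dispatchAll, and the `() => T` task shape are assumptions made for the example. Only the defaults (30 threads, 5 minutes) come from the description above.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

// Illustrative sketch only: class, method, and parameter names are hypothetical.
final class ConcurrentDispatcher(
    concurrency: Int = 30,              // default concurrency limit from the article
    timeout: FiniteDuration = 5.minutes // default timeout from the article
) {
  // Dedicated fixed-size pool, so slow provider calls never run on the
  // application's default execution context.
  private val pool: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(concurrency))

  /** Submits every task to the dedicated pool, then blocks until all of them
    * complete or the timeout elapses.
    */
  def dispatchAll[T](tasks: Seq[() => T]): Seq[T] = {
    implicit val ec: ExecutionContext = pool
    // Every Future is submitted right away; at most `concurrency` run at once
    // and the rest wait in the pool's queue.
    val futures: Seq[Future[T]] = tasks.map(task => Future(task()))
    // Seq[Future[T]] => Future[Seq[T]]; results keep the order of `tasks`.
    val combined: Future[Seq[T]] = Future.sequence(futures)
    // Throws java.util.concurrent.TimeoutException if the batch exceeds `timeout`.
    Await.result(combined, timeout)
  }

  // Stop accepting new work; call once the dispatcher is no longer needed.
  def shutdown(): Unit = pool.shutdown()
}
```

Two behaviors are worth knowing before adapting this. Future.sequence fails as a whole if any single task throws, so if you need per-message outcomes, have each task return a Try or Either instead of throwing. Also, a timeout from Await.result only stops the waiting: tasks already running on the pool are not cancelled.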
The Business Impact
The results were immediate and drastic.
By dispatching requests concurrently, a campaign that previously took about 5 hours now finished in roughly a tenth of the time.
That's a massive 10x improvement in campaign speed.
Because campaigns finished faster:
- Workers were freed up sooner.
- Other tenants no longer experienced phantom delays in the job queue.
- We no longer had to aggressively route tasks to a delayed queue just to survive high traffic.
What I Learned
When you integrate with external systems, you are bound by their worst-case latency.
If you process external I/O sequentially in a background worker, you are turning a fast, internal asynchronous system into a slow, synchronous one.
Sometimes, the best solution is simply refusing to wait in line.
Once we recognized that our bottleneck was external I/O wait time, introducing a concurrent execution model let us reclaim our system's throughput.