Search the average backend test suite for sleep or wait_for and you'll find a depressing collection of magic numbers: 2, 5, sometimes 30.
Those numbers usually have a story behind them.
Someone wrote an asynchronous test that occasionally failed because a background job hadn't completed yet. To make it pass, they added a sleep(2).
A few months later, the infrastructure changed. The job occasionally took three seconds.
The test became flaky again.
Someone increased the sleep to five seconds.
Later, another environment was slower, so the sleep became thirty seconds.
Eventually, every asynchronous test in the suite was waiting far longer than necessary, pipelines became slower, and intermittent failures were dismissed as "just another flaky test."
If you've experienced this cycle, you're not alone.
Testing asynchronous systems is fundamentally different from testing synchronous APIs. The goal isn't to wait a fixed amount of time; it's to detect when the expected outcome has actually happened.
Over the past few years, I've found that a few simple patterns eliminate most of the unnecessary sleeps while making asynchronous tests faster and far more reliable.
Here's the approach.
Why Async APIs Are Hard to Test
Unlike a synchronous REST endpoint, asynchronous workflows return before the real work has finished.
A typical request looks like this:
Client
│
POST /orders
│
▼
API
│
Stores message
│
Returns 202 Accepted
│
───────────────
Background Worker
│
Processes message
│
Updates database
│
Publishes event
│
Clears cache
Your API responds immediately.
The actual business logic happens later.
The test now has two responsibilities:
- Verify the request was accepted.
- Verify the background processing completed correctly.
That's where many teams reach for sleep().
Why sleep() Is Almost Always the Wrong Tool
Consider this example:
await sleep(5000);
const order = await db.orders.findOne({
  id: orderId
});
expect(order.status).toBe("Completed");
This works…
until it doesn't.
Problems include:
- The job finishes in 200 ms, but the test still waits five seconds.
- The job takes six seconds during peak load, and the test fails.
- CI machines are slower than local development.
- Multiple background jobs compete for resources.
The fixed delay becomes either:
- Too short (flaky tests), or
- Too long (slow pipelines).
Neither outcome is desirable.
Why Polling Beats Sleep
Instead of waiting for a fixed duration, wait for the condition you're expecting.
Conceptually:
Is the order completed?
No.
Wait briefly.
Check again.
Still no.
Wait briefly.
Check again.
Yes.
Continue immediately.
The test finishes as soon as the condition becomes true.
Not one second later.
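The loop above can be sketched in a few lines. This is only an illustration: `waitUntil` and its option names are hypothetical rather than taken from any library, and it returns false on timeout instead of throwing, to keep the control flow easy to follow.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: poll `predicate` until it returns true or the
// deadline passes. Resolves to true on success, false on timeout.
async function waitUntil(predicate, { timeoutMs = 5000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await predicate()) return true;                   // "Yes. Continue immediately."
    if (Date.now() + intervalMs > deadline) return false; // no time left for another check
    await sleep(intervalMs);                              // "Wait briefly. Check again."
  }
}
```

A test would call it with whatever check matches the expected outcome, for example `await waitUntil(() => orderCompleted(orderId))`, and fail if the result is false.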
Polling Without Hammering the Database
One concern is excessive database traffic.
Fortunately, polling doesn't require checking every millisecond.
A practical strategy looks like:
- Poll every 250–500 ms.
- Stop immediately once the condition succeeds.
- Respect an overall timeout.
Example:
await eventually(async () => {
  const order = await repository.find(orderId);
  expect(order.status).toBe("Completed");
});
Most jobs complete after only a few polling iterations.
The database load remains minimal.
The eventually() Helper — 20 Lines That Eliminate Most Sleeps
One of the most useful utilities we've adopted is a simple helper called eventually().
It repeatedly executes an assertion until either:
- It succeeds, or
- The timeout expires.
A simplified implementation looks like:
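async function eventually(assertion, timeout = 10000, interval = 250) {
  const start = Date.now();
  let lastError; // most recent assertion failure, reported if we time out
  while (Date.now() - start < timeout) {
    try {
      await assertion();
      return; // condition met: stop polling immediately
    } catch (error) {
      lastError = error;
      await new Promise(r => setTimeout(r, interval)); // wait, then retry
    }
  }
  const reason = lastError ? ` Last failure: ${lastError.message}` : "";
  throw new Error(`Condition not satisfied before timeout.${reason}`);
}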
Despite being only a few lines of code, this helper replaces dozens of arbitrary sleeps across a typical test suite.
Why It Works So Well
Instead of writing:
await sleep(10000);
you simply write:
await eventually(async () => {
  expect(await orderCompleted(orderId)).toBe(true);
});
If the job completes after 800 milliseconds, the test finishes on the next poll, at most one interval (250 ms by default) later.
If it needs four seconds, the helper patiently waits.
No guessing required.
Asserting on Side Effects: The Row, the Event, the Cache
One common mistake is checking only the API response.
Imagine:
POST /orders
returns:
202 Accepted
Many tests stop there.
That tells you only that the message entered the queue.
It says nothing about whether processing succeeded.
Instead, verify the side effects.
1. Database Changes
Example:
await eventually(async () => {
  const order = await db.orders.find(orderId);
  expect(order.status).toBe("Completed");
});
2. Published Events
Many async systems emit events.
Verify:
- Event exists.
- Payload is correct.
- Event type matches expectations.
For example:
OrderCompleted
should appear exactly once.
3. Cache Updates
Suppose order summaries are cached.
After processing completes, verify:
- Cache exists.
- Cached values are correct.
- Stale entries disappeared.
Ignoring cache validation often hides production bugs.
4. Notifications
If background jobs send any of the following:
- Emails
- SMS
- Push notifications
assert against the message queue or a mock notification service rather than relying solely on database assertions.
Choosing a Timeout Strategy
The next question becomes:
"How long should eventually() wait?"
Many teams guess.
I recommend using production metrics instead.
Measure the 99th Percentile
Suppose monitoring shows:
Average: 400 ms
95th percentile: 900 ms
99th percentile: 1.8 seconds
Choose a timeout around 3 × P99.
In this example, 3 × 1.8 s = 5.4 s, so a timeout of 5–6 seconds is reasonable.
This provides enough tolerance for occasional variance without masking genuine performance regressions.
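If your monitoring exposes raw job durations rather than ready-made percentiles, the calculation is easy to script. The sketch below uses the nearest-rank percentile method and made-up sample data chosen to match the figures above; `percentile` and `suggestTimeoutMs` are hypothetical names.

```javascript
// Nearest-rank percentile over a list of durations in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("percentile() needs at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Timeout = multiplier × P99, following the rule of thumb above.
function suggestTimeoutMs(samples, multiplier = 3) {
  return multiplier * percentile(samples, 99);
}

// 100 made-up job durations: mostly fast, with a slow tail.
const durationsMs = [
  ...Array(95).fill(400),
  ...Array(3).fill(900),
  1800,
  4000,
];

console.log(percentile(durationsMs, 99));   // 1800
console.log(suggestTimeoutMs(durationsMs)); // 5400
```

Note that the single 4000 ms outlier still fits under the suggested timeout here, which is the tolerance the multiplier is meant to buy.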
Why Not Infinite Retries?
Infinite retries create dangerous tests.
A failed job should fail the pipeline—not wait forever.
A timeout communicates:
"This condition never became true."
That's valuable debugging information.
Testing Job Retries
Many queues automatically retry failed jobs.
Those retries deserve explicit tests.
Suppose processing fails because an external API is temporarily unavailable.
Expected behavior:
Attempt 1
↓
Failure
↓
Retry
↓
Attempt 2
↓
Success
Your test should verify:
- Retry count.
- Retry delay.
- Final success.
- No duplicate side effects.
Retries often introduce subtle bugs such as duplicate database writes or duplicate notifications.
Testing Dead Letter Queues (DLQs)
Some failures should never succeed.
For example:
- Invalid payload
- Corrupt message
- Missing required fields
After repeated retries, the message should move into the Dead Letter Queue.
Test expectations include:
- Retry limit reached.
- DLQ contains message.
- Original queue is empty.
- Error logged.
Ignoring DLQ behavior leaves one of the most important resilience mechanisms completely untested.
Common Mistakes in Async API Testing
Over time, I've seen the same patterns repeatedly.
Using Fixed Sleeps
Creates slow and unreliable pipelines.
Verifying Only HTTP Responses
A 202 Accepted response does not guarantee successful processing.
Ignoring Side Effects
Database updates, cache invalidation, and events deserve verification.
Skipping Retry Logic
Retries often behave differently from first attempts.
Never Testing Failure Paths
DLQs and permanent failures are part of the application—not edge cases.
A Practical Async Testing Checklist
For every asynchronous workflow, ask:
- Did the API return the expected acknowledgement?
- Did the background job finish?
- Was the database updated?
- Were downstream events published?
- Was the cache refreshed?
- Were retries handled correctly?
- Were permanent failures routed to the DLQ?
- Did everything complete within acceptable time?
Answering these questions provides much stronger confidence than simply waiting five seconds and hoping the job finished.
Final Thoughts
Asynchronous systems introduce complexity that synchronous APIs simply don't have.
The temptation to sprinkle sleep() calls throughout the test suite is understandable, but those arbitrary delays almost always lead to slower pipelines, flaky builds, and difficult debugging sessions.
A better approach is to wait for outcomes rather than time.
Polling with a lightweight eventually() helper allows tests to complete as soon as work finishes, while side-effect assertions ensure background jobs actually performed the expected business operations.
Combined with sensible timeout strategies based on production metrics and explicit tests for retries and Dead Letter Queues, this creates a much more reliable approach to async API testing.
Instead of asking, "Has enough time passed?", your tests begin asking the more important question:
"Has the expected outcome happened?"
That's the question your users ultimately care about—and your automation should too.
If you're implementing asynchronous APIs with queues, events, or background workers, you'll find additional examples in the async/queue testing pattern we documented, including end-to-end workflows, queue validation techniques, and testing strategies for distributed microservices.