Smeet Gohel
Testing Async Jobs and Queues End-to-End (Without sleep())

Search the average backend test suite for sleep or wait_for and you'll find a depressing collection of arbitrary numbers: 2, 5, sometimes 30.

Those numbers usually have a story behind them.

Someone wrote an asynchronous test that occasionally failed because a background job hadn't completed yet. To make it pass, they added a sleep(2).

A few months later, the infrastructure changed. The job occasionally took three seconds.

The test became flaky again.

Someone increased the timeout to five seconds.

Later, another environment was slower, so the timeout became thirty seconds.

Eventually, every asynchronous test in the suite was waiting far longer than necessary, pipelines became slower, and intermittent failures were dismissed as "just another flaky test."

If you've experienced this cycle, you're not alone.

Testing asynchronous systems is fundamentally different from testing synchronous APIs. The goal isn't to wait a fixed amount of time; it's to detect when the expected outcome has actually happened.

Over the past few years, I've found that a few simple patterns eliminate most of the unnecessary sleeps while making asynchronous tests faster and far more reliable.

Here's the approach.


Why Async APIs Are Hard to Test

Unlike a synchronous REST endpoint, asynchronous workflows return before the real work has finished.

A typical request looks like this:

Client
   │
POST /orders
   │
   ▼
API
   │
Stores message
   │
Returns 202 Accepted
   │
───────────────
Background Worker
   │
Processes message
   │
Updates database
   │
Publishes event
   │
Clears cache

Your API responds immediately.

The actual business logic happens later.
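
To make the timing concrete, here is a minimal in-memory sketch of the flow above. The names createOrderService, submitOrder, and findOrder are hypothetical, and a setTimeout stands in for the queue and background worker; a real system would use an actual broker and database.

```javascript
// Hypothetical in-memory stand-in for the API and worker in the diagram above.
// A setTimeout plays the role of the queue plus background worker.
function createOrderService(processingDelayMs = 50) {
    const orders = new Map();

    return {
        // Mimics POST /orders: record the order, schedule the work, return at once.
        submitOrder(id) {
            orders.set(id, { id, status: "Pending" });
            setTimeout(() => {
                orders.get(id).status = "Completed";
            }, processingDelayMs);
            return { statusCode: 202 };
        },

        // Mimics a database lookup by order id.
        findOrder(id) {
            return orders.get(id);
        },
    };
}

const service = createOrderService();
const response = service.submitOrder("order-1");

console.log(response.statusCode);                 // 202
console.log(service.findOrder("order-1").status); // "Pending": the work has not run yet
```

The order is still "Pending" at the moment the 202 comes back, which is exactly the gap an end-to-end test has to bridge.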

The test now has two responsibilities:

  1. Verify the request was accepted.
  2. Verify the background processing completed correctly.

That's where many teams reach for sleep().


Why sleep() Is Almost Always the Wrong Tool

Consider this example:

await sleep(5000);

const order = await db.orders.findOne({
    id: orderId
});

expect(order.status).toBe("Completed");

This works…

until it doesn't.

Problems include:

  • The job finishes in 200 ms, but the test still waits five seconds.
  • The job takes six seconds during peak load, and the test fails.
  • CI machines are slower than local development.
  • Multiple background jobs compete for resources.

The fixed delay becomes either:

  • Too short (flaky tests), or
  • Too long (slow pipelines).

Neither outcome is desirable.


Why Polling Beats Sleep

Instead of waiting for a fixed duration, wait for the condition you're expecting.

Conceptually:

Is the order completed?

No.

Wait briefly.

Check again.

Still no.

Wait briefly.

Check again.

Yes.

Continue immediately.

The test finishes within one polling interval of the condition becoming true.

Not seconds later.


Polling Without Hammering the Database

One concern is excessive database traffic.

Fortunately, polling doesn't require checking every millisecond.

A practical strategy looks like:

  • Poll every 250–500 ms.
  • Stop immediately once the condition succeeds.
  • Respect an overall timeout.

Example:

await eventually(async () => {
    const order = await repository.find(orderId);

    expect(order.status).toBe("Completed");
});

Most jobs complete after only a few polling iterations.

The database load remains minimal.
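
As a rough back-of-the-envelope check, the number of queries a polled wait issues can be estimated from the interval alone. The sketch below assumes each check is instantaneous and timers fire exactly on schedule, which real runs only approximate; pollsUntilSuccess and maxPolls are illustrative names, not part of any library.

```javascript
// Idealized model: checks happen at t = 0, interval, 2 * interval, ...
// and each check takes no time. Real timers drift, so treat these as estimates.

// Number of checks issued up to and including the first successful one,
// for a job that completes `completionMs` after the wait starts.
function pollsUntilSuccess(completionMs, intervalMs) {
    return Math.ceil(completionMs / intervalMs) + 1;
}

// Upper bound on checks if the condition never becomes true.
function maxPolls(timeoutMs, intervalMs) {
    return Math.ceil(timeoutMs / intervalMs);
}

console.log(pollsUntilSuccess(800, 250)); // 5 (checks at 0, 250, 500, 750, 1000 ms)
console.log(maxPolls(10000, 250));        // 40
```

Five cheap lookups for a typical job, and at most forty for one that never finishes, is a small price compared with a fixed multi-second sleep in every test.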


The eventually() Helper — 20 Lines That Eliminate Most Sleeps

One of the most useful utilities we've adopted is a simple helper called eventually().

It repeatedly executes an assertion until either:

  • It succeeds, or
  • The timeout expires.

A simplified implementation looks like:

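async function eventually(assertion, timeout = 10000, interval = 250) {
    const start = Date.now();
    let lastError;

    // Retry the assertion until it passes or the overall timeout elapses.
    while (Date.now() - start < timeout) {
        try {
            await assertion();
            return;
        } catch (error) {
            // Remember why the latest attempt failed, then wait before retrying.
            lastError = error;
            await new Promise(r => setTimeout(r, interval));
        }
    }

    // Surface the last assertion failure so a timeout is debuggable.
    throw new Error(
        `Condition not satisfied within ${timeout} ms: ${lastError?.message}`
    );
}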

Despite being only a few lines of code, this helper replaces dozens of arbitrary sleeps across a typical test suite.


Why It Works So Well

Instead of writing:

await sleep(10000);

you simply write:

await eventually(async () => {

    expect(await orderCompleted(orderId))
        .toBe(true);

});

If the job completes after 800 milliseconds, the test finishes at the next poll, about one second in.

If it needs four seconds, the helper patiently waits.

No guessing required.


Asserting on Side Effects: The Row, the Event, the Cache

One common mistake is checking only the API response.

Imagine:

POST /orders

returns:

202 Accepted

Many tests stop there.

At best, that tells you the request was accepted and queued.

It says nothing about whether processing succeeded.

Instead, verify the side effects.


1. Database Changes

Example:

await eventually(async () => {

    const order = await db.orders.findOne({ id: orderId });

    expect(order.status).toBe("Completed");

});

2. Published Events

Many async systems emit events.

Verify:

  • Event exists.
  • Payload is correct.
  • Event type matches expectations.

For example:

OrderCompleted

should appear exactly once.


3. Cache Updates

Suppose order summaries are cached.

After processing completes, verify:

  • The cache entry exists.
  • Cached values are correct.
  • Stale entries are gone.

Ignoring cache validation often hides production bugs.


4. Notifications

Background jobs often send:

  • Emails
  • SMS
  • Push notifications

For these, test the message queue or a mock notification service rather than relying solely on database assertions.
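
A mock notification service can be as small as an object that records what it was asked to send. The sketch below assumes the job receives its notifier as a dependency; createFakeNotifier, notifyOrderCompleted, and the customerEmail field are hypothetical names chosen for the example.

```javascript
// Hypothetical fake that records outgoing notifications instead of sending them.
function createFakeNotifier() {
    const sent = [];
    return {
        async send(channel, recipient, body) {
            sent.push({ channel, recipient, body });
        },
        sentTo(recipient) {
            return sent.filter(message => message.recipient === recipient);
        },
    };
}

// Hypothetical job step that notifies the customer once the order completes.
async function notifyOrderCompleted(notifier, order) {
    await notifier.send("email", order.customerEmail, `Order ${order.id} is complete.`);
}

const notifier = createFakeNotifier();
notifyOrderCompleted(notifier, { id: "order-1", customerEmail: "pat@example.com" })
    .then(() => console.log(notifier.sentTo("pat@example.com").length)); // 1
```

Because the fake keeps every message, the same object also catches duplicate sends, which a database assertion alone would miss.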


Choosing a Timeout Strategy

The next question becomes:

"How long should eventually() wait?"

Many teams guess.

I recommend using production metrics instead.


Measure the 99th Percentile

Suppose monitoring shows:

Average:

400 ms

95th percentile:

900 ms

99th percentile:

1.8 seconds

Choose a timeout around:

3 × P99

In this example:

5–6 seconds

This provides enough tolerance for occasional variance without masking genuine performance regressions.
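
The arithmetic is simple enough to keep next to the tests, so the timeout stays tied to a measured number rather than a guess. In this sketch timeoutFromP99 is a hypothetical helper, and rounding up to a whole second is an extra assumption added for readability.

```javascript
// Derives a polling timeout from a measured P99 latency.
// The multiplier of 3 follows the rule of thumb above; rounding up to a whole
// second is an assumption made here, not a requirement.
function timeoutFromP99(p99Ms, multiplier = 3) {
    const rawMs = p99Ms * multiplier;
    return Math.ceil(rawMs / 1000) * 1000;
}

console.log(timeoutFromP99(1800)); // 6000 (3 × 1800 = 5400, rounded up)
```

When the P99 in monitoring shifts, updating one input is easier to review than a scattering of hand-edited constants.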


Why Not Infinite Retries?

Infinite retries create dangerous tests.

A failed job should fail the pipeline—not wait forever.

A timeout communicates:

"This condition never became true."

That's valuable debugging information.


Testing Job Retries

Many queues automatically retry failed jobs.

Those retries deserve explicit tests.

Suppose processing fails because an external API is temporarily unavailable.

Expected behavior:

Attempt 1

↓

Failure

↓

Retry

↓

Attempt 2

↓

Success

Your test should verify:

  • Retry count.
  • Retry delay.
  • Final success.
  • No duplicate side effects.

Retries often introduce subtle bugs such as duplicate database writes or duplicate notifications.
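
Those four checks can be exercised against a small retry loop without waiting on real delays. The sketch below is an assumption-heavy stand-in for a queue's retry policy: processWithRetry and runRetryScenario are hypothetical names, the sleep function is injected so the requested delays can be recorded, and a Map keyed by order id plays the role of an idempotent database write.

```javascript
// Hypothetical retry loop. `sleep` is injected so a test can record the
// requested delays without actually waiting.
async function processWithRetry(handler, { maxAttempts = 3, delayMs = 1000, sleep } = {}) {
    const wait = sleep ?? (ms => new Promise(resolve => setTimeout(resolve, ms)));
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            await handler(attempt);
            return { attempts: attempt };
        } catch (error) {
            lastError = error;
            // Only wait if another attempt will follow.
            if (attempt < maxAttempts) await wait(delayMs);
        }
    }
    throw lastError;
}

// Simulated scenario: the write succeeds, then the external API fails once.
async function runRetryScenario() {
    const rows = new Map(); // stands in for database rows keyed by order id
    const delays = [];      // delays the worker requested between attempts

    const handler = async attempt => {
        // Idempotent upsert: a retry overwrites the row instead of duplicating it.
        rows.set("order-1", { status: "Completed" });
        if (attempt === 1) throw new Error("external API unavailable");
    };

    const result = await processWithRetry(handler, {
        maxAttempts: 3,
        delayMs: 500,
        sleep: async ms => { delays.push(ms); },
    });

    return { attempts: result.attempts, delays, rowCount: rows.size };
}

runRetryScenario().then(outcome => console.log(outcome));
// { attempts: 2, delays: [ 500 ], rowCount: 1 }
```

Placing the failure after the write is deliberate: it is the case where a non-idempotent handler would leave two rows behind.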


Testing Dead Letter Queues (DLQs)

Some failures should never succeed.

For example:

  • Invalid payload
  • Corrupt message
  • Missing required fields

After repeated retries, the message should move into the Dead Letter Queue.

Test expectations include:

  • Retry limit reached.
  • DLQ contains message.
  • Original queue is empty.
  • Error logged.

Ignoring DLQ behavior leaves one of the most important resilience mechanisms completely untested.
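
The same expectations can be written against an in-memory queue before wiring up a real broker. Everything in this sketch is hypothetical: drainQueue mimics a worker with a retry limit, plain arrays stand in for the main queue, the DLQ, and the error log, and handleOrder rejects payloads that lack an orderId.

```javascript
// Hypothetical in-memory worker: retries each message up to `maxAttempts`,
// then routes it to the dead letter queue and logs the failure.
async function drainQueue(queue, handler, { maxAttempts = 3, deadLetters, errorLog }) {
    while (queue.length > 0) {
        const message = queue.shift();
        let attempts = 0;
        let failure = null;

        while (attempts < maxAttempts) {
            attempts++;
            try {
                await handler(message);
                failure = null;
                break;
            } catch (error) {
                failure = error;
            }
        }

        if (failure) {
            deadLetters.push({ message, attempts, reason: failure.message });
            errorLog.push(`message ${message.id} dead-lettered: ${failure.message}`);
        }
    }
}

// Handler that rejects payloads missing a required field.
async function handleOrder(message) {
    if (!message.orderId) throw new Error("missing orderId");
}

const queue = [{ id: "m1", orderId: "order-1" }, { id: "m2" }];
const deadLetters = [];
const errorLog = [];

drainQueue(queue, handleOrder, { maxAttempts: 3, deadLetters, errorLog }).then(() => {
    console.log(queue.length);           // 0
    console.log(deadLetters.length);     // 1
    console.log(deadLetters[0].attempts); // 3
    console.log(errorLog.length);        // 1
});
```

Against a real broker the shape of the test stays the same, with the array checks replaced by reads from the actual DLQ and log sink.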


Common Mistakes in Async API Testing

Over time, I've seen the same mistakes come up repeatedly.

Using Fixed Sleeps

Creates slow and unreliable pipelines.


Verifying Only HTTP Responses

A 202 Accepted response does not guarantee successful processing.


Ignoring Side Effects

Database updates, cache invalidation, and events deserve verification.


Skipping Retry Logic

Retries often behave differently from first attempts.


Never Testing Failure Paths

DLQs and permanent failures are part of the application—not edge cases.


A Practical Async Testing Checklist

For every asynchronous workflow, ask:

  • Did the API return the expected acknowledgement?
  • Did the background job finish?
  • Was the database updated?
  • Were downstream events published?
  • Was the cache refreshed?
  • Were retries handled correctly?
  • Were permanent failures routed to the DLQ?
  • Did everything complete within acceptable time?

Answering these questions provides much stronger confidence than simply waiting five seconds and hoping the job finished.


Final Thoughts

Asynchronous systems introduce complexity that synchronous APIs simply don't have.

The temptation to sprinkle sleep() calls throughout the test suite is understandable, but those arbitrary delays almost always lead to slower pipelines, flaky builds, and difficult debugging sessions.

A better approach is to wait for outcomes rather than time.

Polling with a lightweight eventually() helper allows tests to complete as soon as work finishes, while side-effect assertions ensure background jobs actually performed the expected business operations.

Combined with sensible timeout strategies based on production metrics and explicit tests for retries and Dead Letter Queues, this creates a much more reliable approach to async API testing.

Instead of asking, "Has enough time passed?", your tests begin asking the more important question:

"Has the expected outcome happened?"

That's the question your users ultimately care about—and your automation should too.

If you're implementing asynchronous APIs with queues, events, or background workers, you'll find additional examples in the async/queue testing pattern we documented, including end-to-end workflows, queue validation techniques, and testing strategies for distributed microservices.

