Dmitrii
Async Aggregation: Reducing IO Amplification Under Burst Load

In high-load systems, performance degradation often comes from IO amplification rather than CPU saturation.

Consider a simple method:

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    await using var db = new AppDbContext();

    return await db.Users
        .Where(u => u.Id == userId)
        .Select(u => new UserInfo(u.Id, u.Name, u.Email))
        .FirstOrDefaultAsync();
}

Under low traffic, this is perfectly fine.

Under burst traffic, however, dozens or hundreds of concurrent requests may call this method simultaneously.

If 200 requests arrive within a short window, you don’t get one heavy query.

You get 200 small queries.

Every request is individually correct. But the database load multiplies.


The Idea: Aggregate, Then Distribute

Instead of executing each request immediately, we can:

  1. Collect incoming userIds for a short window (e.g., 500ms).
  2. Combine them into a single batch.
  3. Execute one database query.
  4. Return the same result dictionary to all callers.
  5. Each caller reads its own entry.

The public API does not change:

await GetUserInfoAsync(42);

Callers are unaware of batching.


Conceptual Pseudo-Code

The core idea looks roughly like this:

async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    TaskCompletionSource<UserInfo?> tcs;

    lock (_buffer)
    {
        // Concurrent callers asking for the same id share one completion
        // source; a plain Add would throw on the duplicate key.
        if (!_buffer.TryGetValue(userId, out tcs!))
        {
            tcs = new(TaskCreationOptions.RunContinuationsAsynchronously);
            _buffer.Add(userId, tcs);

            if (!_batchScheduled)
            {
                _batchScheduled = true;
                ScheduleBatchExecution();
            }
        }
    }

    return await tcs.Task;
}

async Task ExecuteBatchAsync()
{
    Dictionary<int, TaskCompletionSource<UserInfo?>> batch;

    lock (_buffer)
    {
        batch = _buffer;
        _buffer = new();
        _batchScheduled = false;
    }

    var ids = batch.Keys;
    var result = await QueryUsersAsync(ids);

    foreach (var (id, tcs) in batch)
    {
        // Ids missing from the result resolve to null instead of faulting.
        result.TryGetValue(id, out var user);
        tcs.TrySetResult(user);
    }
}

This is simplified. A correct implementation must handle:

  • Cancellation
  • Exceptions
  • Memory pressure
  • High key cardinality
  • Concurrency edge cases

It is not trivial code.
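Cancellation, for example, must cancel only that caller's task while the shared batch keeps running for everyone else. Here is one way the pseudo-code above could support it (a sketch; the `CancellationToken` parameter and registration are my addition, not part of the original):

```csharp
async Task<UserInfo?> GetUserInfoAsync(int userId, CancellationToken ct)
{
    var tcs = new TaskCompletionSource<UserInfo?>(
        TaskCreationOptions.RunContinuationsAsynchronously);

    lock (_buffer)
    {
        _buffer.Add(userId, tcs);

        if (!_batchScheduled)
        {
            _batchScheduled = true;
            ScheduleBatchExecution();
        }
    }

    // Cancels this caller's task only; the batch still completes for the
    // others. The batch side must then use TrySetResult, since this TCS
    // may already be cancelled when the query finishes.
    using var registration = ct.Register(() => tcs.TrySetCanceled(ct));

    return await tcs.Task;
}
```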


A Demo Implementation (FlowSync)

To avoid rewriting this coordination logic repeatedly, I extracted it into a small demo library:

https://github.com/0x1000000/FlowSync

Using FlowSync, the same aggregation becomes:

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    var shared = await GetUsersBatchTask
        .CoalesceInGroupUsing(GetUsersAggStrategy, userId, groupKey: "users");

    return shared.TryGetValue(userId, out var user) ? user : null;
}

Aggregation Strategy

readonly ... GetUsersAggStrategy =
    new AggCoalescingSyncStrategy<Dictionary<int, UserInfo>, int, HashSet<int>>(
        seedFactory: (_, _) => new HashSet<int>(),
        aggregator: (acc, userId) =>
        {
            acc.Add(userId);
            return acc;
        },
        bufferTime: TimeSpan.FromMilliseconds(500)
    );

Batch Task

readonly FlowSyncAggTask<Dictionary<int, UserInfo>, HashSet<int>> GetUsersBatchTask =
    FlowSyncAggTask.Create(async (ids, ct) =>
    {
        await using var db = new AppDbContext();

        var users = await db.Users
            .Where(u => ids.Contains(u.Id))
            .Select(u => new UserInfo(u.Id, u.Name, u.Email))
            .ToListAsync(ct);

        return users.ToDictionary(u => u.Id, u => u);
    });

All callers inside the same 500ms window receive the same dictionary instance and extract their own entry.

One batched query per window instead of N independent queries.

The library hides the coordination complexity, but the architectural idea is independent of it.
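To make that concrete, here is a minimal self-contained version of the pattern in plain .NET. This is a sketch, not FlowSync's implementation: it omits cancellation, memory-pressure limits, and cardinality caps.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal request batcher: buffers keys for a short window, runs one
// query for the whole batch, and fans the result out to every caller.
sealed class Batcher<TKey, TValue> where TKey : notnull
{
    private readonly Func<IReadOnlyCollection<TKey>, Task<Dictionary<TKey, TValue>>> _query;
    private readonly TimeSpan _window;
    private readonly object _gate = new();
    private Dictionary<TKey, TaskCompletionSource<TValue?>> _buffer = new();
    private bool _scheduled;

    public Batcher(
        Func<IReadOnlyCollection<TKey>, Task<Dictionary<TKey, TValue>>> query,
        TimeSpan window)
    {
        _query = query;
        _window = window;
    }

    public Task<TValue?> GetAsync(TKey key)
    {
        TaskCompletionSource<TValue?> tcs;

        lock (_gate)
        {
            // Duplicate keys share one completion source.
            if (!_buffer.TryGetValue(key, out tcs!))
            {
                tcs = new(TaskCreationOptions.RunContinuationsAsynchronously);
                _buffer.Add(key, tcs);

                if (!_scheduled)
                {
                    _scheduled = true;
                    _ = RunWindowAsync(); // errors reach callers via the TCSes
                }
            }
        }

        return tcs.Task;
    }

    private async Task RunWindowAsync()
    {
        await Task.Delay(_window);

        Dictionary<TKey, TaskCompletionSource<TValue?>> batch;
        lock (_gate)
        {
            batch = _buffer;
            _buffer = new();
            _scheduled = false;
        }

        try
        {
            var result = await _query(batch.Keys);
            foreach (var (key, tcs) in batch)
            {
                result.TryGetValue(key, out var value);
                tcs.TrySetResult(value); // missing keys resolve to null
            }
        }
        catch (Exception ex)
        {
            // A failed batch query must fail every waiting caller,
            // otherwise they hang forever.
            foreach (var tcs in batch.Values)
                tcs.TrySetException(ex);
        }
    }
}
```

Fanning out concurrent callers then produces a single underlying query (the fake in-memory query stands in for the database):

```csharp
var queries = 0;
var batcher = new Batcher<int, string>(
    ids =>
    {
        Interlocked.Increment(ref queries);
        return Task.FromResult(ids.ToDictionary(id => id, id => $"user-{id}"));
    },
    window: TimeSpan.FromMilliseconds(100));

// 200 concurrent callers, one query.
var results = await Task.WhenAll(Enumerable.Range(1, 200).Select(batcher.GetAsync));
```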


Why This Improves System Stability

Without aggregation:

  • N concurrent callers → N database queries

With aggregation:

  • N concurrent callers in window → 1 query

This reduces:

  • Connection pool pressure
  • Network round-trips
  • Lock contention
  • Risk of cascading retries
  • Third-party rate-limit hits

Even if the batch query is slightly heavier, total system load drops significantly.


Bus vs Car (But With a Twist)

When traffic is light, individual cars are optimal.

Each driver leaves immediately and reaches the destination with minimal delay.

When traffic becomes dense, however, cars start interfering with each other. Congestion appears. Travel time increases for everyone.

A bus changes the dynamic.

It may introduce a short waiting time before departure, but it carries many passengers at once. Fewer vehicles enter the road, congestion drops, and overall throughput increases.

Aggregation works the same way.

Instead of executing every request immediately, the system briefly collects them, executes one combined operation, and distributes the result.

Individual latency may increase slightly due to buffering.

System-wide stability and throughput improve under burst load.

And here is the twist:

You don’t always need the bus.

If traffic is light, per-request execution is perfectly fine.

If internal metrics show low concurrency and no pressure on IO resources, there is no reason to batch.

But when bursts appear and contention grows, switching to aggregation can prevent congestion from cascading into latency spikes and retries.
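One simple way to implement that switch is a rough in-flight counter. This is an illustrative sketch: `_inFlight`, the threshold, and both helper methods are assumptions, not FlowSync API.

```csharp
private int _inFlight;
private const int BatchThreshold = 16; // tune from real concurrency metrics

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    var concurrent = Interlocked.Increment(ref _inFlight);
    try
    {
        // Empty road: let everyone drive. No buffering delay.
        if (concurrent < BatchThreshold)
            return await QuerySingleUserAsync(userId);

        // Traffic is building up: send the bus.
        return await GetUserInfoBatchedAsync(userId);
    }
    finally
    {
        Interlocked.Decrement(ref _inFlight);
    }
}
```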

Coalescing is not a rule.
It is a strategy.

When the road is empty, let everyone drive.

When traffic builds up, send a bus.


Important Trade-Off

Aggregation introduces:

  • A short buffering delay
  • Additional coordination logic
  • Slightly increased latency for early callers

It optimizes for system-wide stability under burst load, not minimal single-request latency.

That is a policy decision.


Closing

Async/await prevents thread blocking.

It does not prevent IO amplification.

When multiple callers request related data at the same time, parallelism may not be the optimal strategy.

Sometimes the correct optimization is not "run faster".

It is "run together".


Further reading:

Broader discussion of async coordination strategies:

https://medium.com/itnext/5-common-async-coalescing-patterns-db7b1cac1507

GitHub:

https://github.com/0x1000000/FlowSync
