Dmitrii
Async Aggregation: Reducing IO Amplification Under Burst Load

In high-load systems, performance degradation often comes from IO amplification rather than CPU saturation.

Consider a simple method:

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    await using var db = new AppDbContext();

    return await db.Users
        .Where(u => u.Id == userId)
        .Select(u => new UserInfo(u.Id, u.Name, u.Email))
        .FirstOrDefaultAsync();
}

Under low traffic, this is perfectly fine.

Under burst traffic, however, dozens or hundreds of concurrent requests may call this method simultaneously.

If 200 requests arrive within a short window, you don’t get one heavy query.

You get 200 small queries.

Every request is individually correct. But the database load multiplies.


The Idea: Aggregate, Then Distribute

Instead of executing each request immediately, we can:

  1. Collect incoming userIds for a short window (e.g., 500ms).
  2. Combine them into a single batch.
  3. Execute one database query.
  4. Return the same result dictionary to all callers.
  5. Each caller reads its own entry.

The public API does not change:

await GetUserInfoAsync(42);

Callers are unaware of batching.


Conceptual Pseudo-Code

The core idea looks roughly like this:

async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    TaskCompletionSource<UserInfo?> tcs;

    lock (_buffer)
    {
        // Concurrent callers asking for the same id share one completion
        // source; a plain Add would throw on the duplicate key.
        if (!_buffer.TryGetValue(userId, out tcs!))
        {
            tcs = new(TaskCreationOptions.RunContinuationsAsynchronously);
            _buffer.Add(userId, tcs);

            if (!_batchScheduled)
            {
                _batchScheduled = true;
                ScheduleBatchExecution();
            }
        }
    }

    return await tcs.Task;
}

async Task ExecuteBatchAsync()
{
    Dictionary<int, TaskCompletionSource<UserInfo?>> batch;

    lock (_buffer)
    {
        batch = _buffer;
        _buffer = new();
        _batchScheduled = false;
    }

    var ids = batch.Keys;
    var result = await QueryUsersAsync(ids);

    foreach (var (id, tcs) in batch)
    {
        // Ids missing from the result resolve to null instead of faulting.
        result.TryGetValue(id, out var user);
        tcs.TrySetResult(user);
    }
}

This is simplified. A correct implementation must handle:

  • Cancellation
  • Exceptions
  • Memory pressure
  • High key cardinality
  • Concurrency edge cases

It is not trivial code.
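Cancellation, for example, must cancel only that caller's task while the shared batch keeps running for everyone else. Here is one way the pseudo-code above could support it (a sketch; the `CancellationToken` parameter and registration are my addition, not part of the original):

```csharp
async Task<UserInfo?> GetUserInfoAsync(int userId, CancellationToken ct)
{
    var tcs = new TaskCompletionSource<UserInfo?>(
        TaskCreationOptions.RunContinuationsAsynchronously);

    lock (_buffer)
    {
        _buffer.Add(userId, tcs);

        if (!_batchScheduled)
        {
            _batchScheduled = true;
            ScheduleBatchExecution();
        }
    }

    // Cancels this caller's task only; the batch still completes for the
    // others. The batch side must then use TrySetResult, since this TCS
    // may already be cancelled when the query finishes.
    using var registration = ct.Register(() => tcs.TrySetCanceled(ct));

    return await tcs.Task;
}
```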


A Demo Implementation (FlowSync)

To avoid rewriting this coordination logic repeatedly, I extracted it into a small demo library:

https://github.com/0x1000000/FlowSync

Using FlowSync, the same aggregation becomes:

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    var shared = await GetUsersBatchTask
        .CoalesceInGroupUsing(GetUsersAggStrategy, userId, groupKey: "users");

    return shared.TryGetValue(userId, out var user) ? user : null;
}

Aggregation Strategy

readonly ... GetUsersAggStrategy =
    new AggCoalescingSyncStrategy<Dictionary<int, UserInfo>, int, HashSet<int>>(
        seedFactory: (_, _) => new HashSet<int>(),
        aggregator: (acc, userId) =>
        {
            acc.Add(userId);
            return acc;
        },
        bufferTime: TimeSpan.FromMilliseconds(500)
    );

Batch Task

readonly FlowSyncAggTask<Dictionary<int, UserInfo>, HashSet<int>> GetUsersBatchTask =
    FlowSyncAggTask.Create(async (ids, ct) =>
    {
        await using var db = new AppDbContext();

        var users = await db.Users
            .Where(u => ids.Contains(u.Id))
            .Select(u => new UserInfo(u.Id, u.Name, u.Email))
            .ToListAsync(ct);

        return users.ToDictionary(u => u.Id, u => u);
    });

All callers inside the same 500ms window receive the same dictionary instance and extract their own entry.

One batched query per window instead of N independent queries.

The library hides the coordination complexity, but the architectural idea is independent of it.
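To make that concrete, here is a minimal self-contained version of the pattern in plain .NET. This is a sketch, not FlowSync's implementation: it omits cancellation, memory-pressure limits, and cardinality caps.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal request batcher: buffers keys for a short window, runs one
// query for the whole batch, and fans the result out to every caller.
sealed class Batcher<TKey, TValue> where TKey : notnull
{
    private readonly Func<IReadOnlyCollection<TKey>, Task<Dictionary<TKey, TValue>>> _query;
    private readonly TimeSpan _window;
    private readonly object _gate = new();
    private Dictionary<TKey, TaskCompletionSource<TValue?>> _buffer = new();
    private bool _scheduled;

    public Batcher(
        Func<IReadOnlyCollection<TKey>, Task<Dictionary<TKey, TValue>>> query,
        TimeSpan window)
    {
        _query = query;
        _window = window;
    }

    public Task<TValue?> GetAsync(TKey key)
    {
        TaskCompletionSource<TValue?> tcs;

        lock (_gate)
        {
            // Duplicate keys share one completion source.
            if (!_buffer.TryGetValue(key, out tcs!))
            {
                tcs = new(TaskCreationOptions.RunContinuationsAsynchronously);
                _buffer.Add(key, tcs);

                if (!_scheduled)
                {
                    _scheduled = true;
                    _ = RunWindowAsync(); // errors reach callers via the TCSes
                }
            }
        }

        return tcs.Task;
    }

    private async Task RunWindowAsync()
    {
        await Task.Delay(_window);

        Dictionary<TKey, TaskCompletionSource<TValue?>> batch;
        lock (_gate)
        {
            batch = _buffer;
            _buffer = new();
            _scheduled = false;
        }

        try
        {
            var result = await _query(batch.Keys);
            foreach (var (key, tcs) in batch)
            {
                result.TryGetValue(key, out var value);
                tcs.TrySetResult(value); // missing keys resolve to null
            }
        }
        catch (Exception ex)
        {
            // A failed batch query must fail every waiting caller,
            // otherwise they hang forever.
            foreach (var tcs in batch.Values)
                tcs.TrySetException(ex);
        }
    }
}
```

Fanning out concurrent callers then produces a single underlying query (the fake in-memory query stands in for the database):

```csharp
var queries = 0;
var batcher = new Batcher<int, string>(
    ids =>
    {
        Interlocked.Increment(ref queries);
        return Task.FromResult(ids.ToDictionary(id => id, id => $"user-{id}"));
    },
    window: TimeSpan.FromMilliseconds(100));

// 200 concurrent callers, one query.
var results = await Task.WhenAll(Enumerable.Range(1, 200).Select(batcher.GetAsync));
```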


Why This Improves System Stability

Without aggregation:

  • N concurrent callers → N database queries

With aggregation:

  • N concurrent callers in window → 1 query

This reduces:

  • Connection pool pressure
  • Network round-trips
  • Lock contention
  • Risk of cascading retries
  • Third-party rate-limit hits

Even if the batch query is slightly heavier, total system load drops significantly.


Bus vs Car (But With a Twist)

When traffic is light, individual cars are optimal.

Each driver leaves immediately and reaches the destination with minimal delay.

When traffic becomes dense, however, cars start interfering with each other. Congestion appears. Travel time increases for everyone.

A bus changes the dynamic.

It may introduce a short waiting time before departure, but it carries many passengers at once. Fewer vehicles enter the road, congestion drops, and overall throughput increases.

Aggregation works the same way.

Instead of executing every request immediately, the system briefly collects them, executes one combined operation, and distributes the result.

Individual latency may increase slightly due to buffering.

System-wide stability and throughput improve under burst load.

And here is the twist:

You don’t always need the bus.

If traffic is light, per-request execution is perfectly fine.

If internal metrics show low concurrency and no pressure on IO resources, there is no reason to batch.

But when bursts appear and contention grows, switching to aggregation can prevent congestion from cascading into latency spikes and retries.
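One simple way to implement that switch is a rough in-flight counter. This is an illustrative sketch: `_inFlight`, the threshold, and both helper methods are assumptions, not FlowSync API.

```csharp
private int _inFlight;
private const int BatchThreshold = 16; // tune from real concurrency metrics

public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    var concurrent = Interlocked.Increment(ref _inFlight);
    try
    {
        // Empty road: let everyone drive. No buffering delay.
        if (concurrent < BatchThreshold)
            return await QuerySingleUserAsync(userId);

        // Traffic is building up: send the bus.
        return await GetUserInfoBatchedAsync(userId);
    }
    finally
    {
        Interlocked.Decrement(ref _inFlight);
    }
}
```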

Coalescing is not a rule.
It is a strategy.

When the road is empty, let everyone drive.

When traffic builds up, send a bus.


Important Trade-Off

Aggregation introduces:

  • A short buffering delay
  • Additional coordination logic
  • Slightly increased latency for early callers

It optimizes for system-wide stability under burst load, not minimal single-request latency.

That is a policy decision.


Closing

Async/await prevents thread blocking.

It does not prevent IO amplification.

When multiple callers request related data at the same time, parallelism may not be the optimal strategy.

Sometimes the correct optimization is not "run faster".

It is "run together".


Further reading:

Broader discussion of async coordination strategies:

https://medium.com/itnext/5-common-async-coalescing-patterns-db7b1cac1507

GitHub:

https://github.com/0x1000000/FlowSync
