In high-load systems, performance degradation often comes from IO amplification rather than CPU.
Consider a simple method:
public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    await using var db = new AppDbContext();
    return await db.Users
        .Where(u => u.Id == userId)
        .Select(u => new UserInfo(u.Id, u.Name, u.Email))
        .FirstOrDefaultAsync();
}
Under low traffic, this is perfectly fine.
Under burst traffic, however, dozens or hundreds of concurrent requests may call this method simultaneously.
If 200 requests arrive within a short window, you don’t get one heavy query.
You get 200 small queries.
Everything is correct. But database load multiplies.
The Idea: Aggregate, Then Distribute
Instead of executing each request immediately, we can:
- Collect incoming userIds for a short window (e.g., 500ms).
- Combine them into a single batch.
- Execute one database query.
- Return the same result dictionary to all callers.
- Each caller reads its own entry.
The public API does not change:
await GetUserInfoAsync(42);
Callers are unaware of batching.
Conceptual Pseudo-Code
The core idea looks roughly like this:
async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    // RunContinuationsAsynchronously keeps caller continuations from
    // executing synchronously inside the batch completion loop.
    var tcs = new TaskCompletionSource<UserInfo?>(
        TaskCreationOptions.RunContinuationsAsynchronously);
    lock (_gate) // a dedicated lock object: _buffer is reassigned below, so it cannot be the lock target
    {
        _buffer.Add(userId, tcs);
        if (!_batchScheduled)
        {
            _batchScheduled = true;
            ScheduleBatchExecution();
        }
    }
    return await tcs.Task;
}

async Task ExecuteBatchAsync()
{
    Dictionary<int, TaskCompletionSource<UserInfo?>> batch;
    lock (_gate)
    {
        // Swap the buffer out so that new callers start a fresh batch.
        batch = _buffer;
        _buffer = new();
        _batchScheduled = false;
    }
    var ids = batch.Keys;
    var result = await QueryUsersAsync(ids);
    foreach (var (id, tcs) in batch)
    {
        // Missing ids resolve to null rather than faulting the caller.
        result.TryGetValue(id, out var user);
        tcs.SetResult(user);
    }
}
This is simplified. A correct implementation must handle:
- Cancellation
- Exceptions (a failed query must fail every waiter, or callers hang forever)
- Duplicate keys (two concurrent callers requesting the same id)
- Memory pressure
- High key cardinality
- Concurrency edge cases
It is not trivial code.
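For reference, the buffering logic above can be completed into a self-contained sketch. The field names (`_gate`, `_buffer`), the list-per-key handling of duplicates, and the `Task.Delay`-based scheduling are illustrative choices for this sketch, not the API of any particular library:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed record UserInfo(int Id, string Name, string Email);

public sealed class UserInfoBatcher
{
    // Dedicated lock object: _buffer itself is swapped out on every batch,
    // so it must not be used as the lock target.
    private readonly object _gate = new();
    private Dictionary<int, List<TaskCompletionSource<UserInfo?>>> _buffer = new();
    private bool _batchScheduled;

    private readonly TimeSpan _window = TimeSpan.FromMilliseconds(500);
    private readonly Func<IReadOnlyCollection<int>, Task<Dictionary<int, UserInfo>>> _query;

    public UserInfoBatcher(Func<IReadOnlyCollection<int>, Task<Dictionary<int, UserInfo>>> query)
        => _query = query;

    public Task<UserInfo?> GetUserInfoAsync(int userId)
    {
        var tcs = new TaskCompletionSource<UserInfo?>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        lock (_gate)
        {
            // A list of waiters per key tolerates duplicate requests for the same id.
            if (!_buffer.TryGetValue(userId, out var waiters))
                _buffer[userId] = waiters = new List<TaskCompletionSource<UserInfo?>>();
            waiters.Add(tcs);

            if (!_batchScheduled)
            {
                _batchScheduled = true;
                _ = ExecuteBatchAfterDelayAsync(); // fire-and-forget; exceptions are routed to waiters
            }
        }
        return tcs.Task;
    }

    private async Task ExecuteBatchAfterDelayAsync()
    {
        await Task.Delay(_window);

        Dictionary<int, List<TaskCompletionSource<UserInfo?>>> batch;
        lock (_gate)
        {
            batch = _buffer;
            _buffer = new();
            _batchScheduled = false;
        }

        try
        {
            var result = await _query(batch.Keys);
            foreach (var (id, waiters) in batch)
            {
                result.TryGetValue(id, out var user); // missing ids resolve to null
                foreach (var tcs in waiters) tcs.TrySetResult(user);
            }
        }
        catch (Exception ex)
        {
            // A failure must fail every waiter, or callers hang forever.
            foreach (var waiters in batch.Values)
                foreach (var tcs in waiters) tcs.TrySetException(ex);
        }
    }
}
```

This sketch still omits cancellation and memory-pressure handling, which is exactly why the coordination logic is worth extracting into a library.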
A Demo Implementation (FlowSync)
To avoid rewriting this coordination logic repeatedly, I extracted it into a small demo library:
https://github.com/0x1000000/FlowSync
Using FlowSync, the same aggregation becomes:
public async Task<UserInfo?> GetUserInfoAsync(int userId)
{
    var shared = await GetUsersBatchTask
        .CoalesceInGroupUsing(GetUsersAggStrategy, userId, groupKey: "users");
    return shared.TryGetValue(userId, out var user) ? user : null;
}
Aggregation Strategy
readonly ... GetUsersAggStrategy =
    new AggCoalescingSyncStrategy<Dictionary<int, UserInfo>, int, HashSet<int>>(
        seedFactory: (_, _) => new HashSet<int>(),
        aggregator: (acc, userId) =>
        {
            acc.Add(userId);
            return acc;
        },
        bufferTime: TimeSpan.FromMilliseconds(500)
    );
Batch Task
readonly FlowSyncAggTask<Dictionary<int, UserInfo>, HashSet<int>> GetUsersBatchTask =
    FlowSyncAggTask.Create(async (ids, ct) =>
    {
        await using var db = new AppDbContext();
        var users = await db.Users
            .Where(u => ids.Contains(u.Id))
            .Select(u => new UserInfo(u.Id, u.Name, u.Email))
            .ToListAsync(ct);
        return users.ToDictionary(u => u.Id, u => u);
    });
All callers inside the same 500ms window receive the same dictionary instance and extract their own entry.
One batched query per window instead of N independent queries.
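To make the effect concrete, here is a minimal caller-side illustration (the 200-call figure mirrors the burst scenario above; `GetUserInfoAsync` is the method shown earlier):

```csharp
// 200 concurrent callers arriving within the same 500ms window:
var tasks = Enumerable.Range(1, 200)
    .Select(i => GetUserInfoAsync(i % 50)) // many duplicate ids
    .ToArray();

var users = await Task.WhenAll(tasks);

// The database sees one query covering up to 50 distinct ids,
// not 200 separate round-trips. Duplicate ids collapse inside
// the HashSet used by the aggregation strategy.
```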
The library hides the coordination complexity, but the architectural idea is independent of it.
Why This Improves System Stability
Without aggregation:
- N concurrent callers → N database queries
With aggregation:
- N concurrent callers in window → 1 query
This reduces:
- Connection pool pressure
- Network round-trips
- Lock contention
- Risk of cascading retries
- Third-party rate-limit hits
Even if the batch query is slightly heavier, total system load drops significantly.
Bus vs Car (But With a Twist)
When traffic is light, individual cars are optimal.
Each driver leaves immediately and reaches the destination with minimal delay.
When traffic becomes dense, however, cars start interfering with each other. Congestion appears. Travel time increases for everyone.
A bus changes the dynamic.
It may introduce a short waiting time before departure, but it carries many passengers at once. Fewer vehicles enter the road, congestion drops, and overall throughput increases.
Aggregation works the same way.
Instead of executing every request immediately, the system briefly collects them, executes one combined operation, and distributes the result.
Individual latency may increase slightly due to buffering.
System-wide stability and throughput improve under burst load.
And here is the twist:
You don’t always need the bus.
If traffic is light, per-request execution is perfectly fine.
If internal metrics show low concurrency and no pressure on IO resources, there is no reason to batch.
But when bursts appear and contention grows, switching to aggregation can prevent congestion from cascading into latency spikes and retries.
Coalescing is not a rule.
It is a strategy.
When the road is empty, let everyone drive.
When traffic builds up, send a bus.
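One way to realize this switch is to track in-flight requests and only engage the batching path under pressure. This is purely a sketch; the counter, the threshold, and the two helper methods (`QuerySingleAsync`, `GetUserInfoBatchedAsync`) are hypothetical names, not part of FlowSync:

```csharp
private int _inFlight;
private const int BatchThreshold = 16; // illustrative tuning knob

public async Task<UserInfo?> GetUserInfoSmartAsync(int userId)
{
    var concurrent = Interlocked.Increment(ref _inFlight);
    try
    {
        // Light traffic: let the "car" go immediately.
        if (concurrent < BatchThreshold)
            return await QuerySingleAsync(userId);

        // Burst: board the "bus" and share one query.
        return await GetUserInfoBatchedAsync(userId);
    }
    finally
    {
        Interlocked.Decrement(ref _inFlight);
    }
}
```

The threshold would normally come from measurement (connection-pool saturation, query latency percentiles) rather than a hard-coded constant.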
Important Trade-Off
Aggregation introduces:
- A short buffering delay
- Additional coordination logic
- Slightly increased latency for early callers
It optimizes for system-wide stability under burst load, not minimal single-request latency.
That is a policy decision.
Closing
Async/await prevents thread blocking.
It does not prevent IO amplification.
When multiple callers request related data at the same time, parallelism may not be the optimal strategy.
Sometimes the correct optimization is not "run faster".
It is "run together".
Further reading:
Broader discussion of async coordination strategies:
https://medium.com/itnext/5-common-async-coalescing-patterns-db7b1cac1507