Martin Oehlert

Posted on Jun 26

Fan-Out/Fan-In and the Async HTTP Pattern

#azure #azurefunctions #dotnet #serverless

You have 500 order line items to process and no ordering dependency between them, so the question is why the workflow takes as long as 500 activities run back to back when nothing forces them to. The chaining pattern from Part 1 awaits each activity before scheduling the next, which is exactly right when step two needs step one's output and exactly wasteful when the items are independent. The fix is fan-out/fan-in: schedule all the activities at once, then aggregate the results, and the only real trick is doing it in a way that survives the orchestrator replaying. There is also a one-character version of this code that compiles, runs, and silently throws the parallelism away, which is the bug this article spends the most time on.

When sequential processing isn't enough

Chaining is the pattern you reach for when the steps form a line: validate the order, then create it, then send the confirmation, each one feeding the next. The await between steps is load-bearing there, because CreateOrderActivity genuinely cannot start until ValidateOrderActivity has returned. That dependency is what makes the sequence correct.

Now change the shape of the work. A batch order arrives with 500 line items, and each one needs the same per-item processing: check stock, price it, reserve inventory. No line item depends on any other. If you write that as a chain, awaiting each item's activity before scheduling the next, the total wall-clock time is the sum of all 500 activities. At a rough sequential ceiling of about 5 activities per second on a single instance, that batch takes roughly a minute and a half, and every second of it is one item waiting on the item before it for no reason.

The independence is the whole point. When the items have no ordering relationship, the latency you actually care about is not the sum of the activities but the slowest single one, because there is nothing stopping them from running at the same time. Fan out across the available workers and the same batch finishes in the time of its longest item, plus a little aggregation overhead. The documented fan-out throughput target is around 100 activities per second per instance, an order of magnitude over the sequential figure, and that gap is entirely the difference between running the work in series and running it in parallel.

Fan-out/fan-in is the orchestration pattern for that situation: fan out by scheduling many activities at once, fan in by waiting for all of them and collecting the results. The replay engine from Part 1 is what makes it safe, and the next section shows the exact shape.

Fan-out/fan-in with Task.WhenAll

The shape has two halves. Fan-out means projecting your inputs into activity calls and collecting the tasks without awaiting any of them. Fan-in means a single await Task.WhenAll(tasks) that completes once every activity has finished, handing you the results.

Every code sample below is from the companion sample (isolated worker, .NET 10). Here is the batch processor as an orchestrator, with an aggregation step at the end.

using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public record OrderItem(string Sku, int Quantity);
public record OrderResult(string Sku, bool Reserved, decimal LineTotal);
public record BatchSummary(int Processed, int Reserved, decimal Total);

public static class ProcessBatchOrchestrator
{
    [Function(nameof(ProcessBatchOrchestrator))]
    public static async Task<BatchSummary> Run(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        var items = context.GetInput<OrderItem[]>()!;

        // Fan-out: schedule every item at once, collect the tasks unawaited.
        Task<OrderResult>[] tasks =
            [.. items.Select(item => context.CallActivityAsync<OrderResult>(
                nameof(ProcessItemActivity), item))];

        // Fan-in: one await blocks until all of them complete.
        OrderResult[] results = await Task.WhenAll(tasks);

        // Optional: hand the collected results to a final aggregation activity.
        return await context.CallActivityAsync<BatchSummary>(
            nameof(SummarizeBatchActivity), results);
    }
}

The fan-out is the collection expression. items.Select(...) projects each OrderItem into a CallActivityAsync<OrderResult> call, and because nothing awaits those calls, each one schedules an activity and returns its Task<OrderResult> immediately. The [.. ...] spread materializes them into a Task<OrderResult>[]. After that line, all 500 activities are scheduled; none has been waited on.

await Task.WhenAll(tasks) is the fan-in. It returns a single task that completes only when every task in the array has completed, and its result is an OrderResult[]. The element order matches the order of the tasks array, not the order the activities happened to finish in, so results[0] is always the result for items[0] regardless of which line item processed fastest. That positional guarantee is standard Task.WhenAll<TResult> behavior, and it is why you can return the array directly without sorting or correlating anything back to its input.

The activity itself is an ordinary function doing the per-item work.

public static class ProcessItemActivity
{
    [Function(nameof(ProcessItemActivity))]
    public static OrderResult Run([ActivityTrigger] OrderItem item)
    {
        // Real work and I/O belong here: check stock, price, reserve inventory.
        bool reserved = item.Quantity > 0;
        decimal lineTotal = item.Quantity * 9.99m;
        return new OrderResult(item.Sku, reserved, lineTotal);
    }
}

Task.WhenAll is not one of the nondeterministic APIs banned inside an orchestrator. It is replay-safe, and the reason traces straight back to Part 1's replay engine. The durable calls it aggregates are the replay-checkpointed operations: the runtime records each scheduled CallActivityAsync in the History table the moment the orchestrator yields, and records each result as it arrives. On a replay the orchestrator reaches the same fan-out line, the framework sees those activities already scheduled (and any already completed), and reconstructs the same task array from history rather than re-running the work. The whole parallel batch therefore survives a host recycle the same way a chain does: the activities run across multiple workers concurrently, and the end-to-end execution is resilient to the orchestrator being unloaded and replayed.

One behavior to know before you ship this. When several activities fail, await Task.WhenAll(tasks) throws only the first exception, even though more than one task faulted. If you need to see every failure (to log all of them, or decide based on how many failed) inspect the Exception property on the task that Task.WhenAll returns, which holds the full AggregateException with one inner exception per faulted activity.

The loop mistake (and the fix)

The previous section showed the correct shape. This section is about the version that looks just as correct, compiles cleanly, runs without error, and quietly runs everything in series anyway. It is the most common fan-out bug, and it is worth being able to spot in a code review on sight, because nothing else will flag it for you.

Here is the broken orchestrator. A reviewer skimming it sees a loop over the items, an activity call per item, and a list of results. It reads like a fan-out.

// BROKEN: awaiting inside the loop runs the activities one after another.
[Function(nameof(ProcessBatchOrchestrator))]
public static async Task<OrderResult[]> Run(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var items = context.GetInput<OrderItem[]>()!;
    var results = new List<OrderResult>();
    foreach (var item in items)
        results.Add(await context.CallActivityAsync<OrderResult>(
            nameof(ProcessItemActivity), item));   // awaits before scheduling the next
    return [.. results];
}

The await is in the wrong place. Each iteration calls CallActivityAsync, then await suspends the orchestrator until that one activity returns, and only after the result is added to the list does the loop come back around to schedule the next item. So the activities are scheduled one at a time, each waiting on the one before it. This is the chaining pattern wearing a loop, and it has the chaining pattern's latency: the sum of all 500 activities, back at roughly 5 per second. There is no compiler warning, no runtime exception, no log line that looks wrong. The only symptom is that a batch that should take a couple of seconds takes a minute and a half, and you usually only notice once the batch sizes grow in production.

The fix is to move the await out of the loop. Schedule everything first, collect the tasks, then fan in with a single Task.WhenAll.

// FIXED: collect every task first, then fan in with one Task.WhenAll.
[Function(nameof(ProcessBatchOrchestrator))]
public static async Task<OrderResult[]> Run(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var items = context.GetInput<OrderItem[]>()!;
    Task<OrderResult>[] tasks =
        [.. items.Select(item => context.CallActivityAsync<OrderResult>(
            nameof(ProcessItemActivity), item))];   // scheduled, not awaited
    return await Task.WhenAll(tasks);                // all run in parallel
}

The difference is whether await sits on each individual call or once on the whole array. In the broken version every CallActivityAsync is awaited the instant it is made, which serializes the schedule-and-wait. In the fixed version the Select schedules all of them without awaiting any, and the single await Task.WhenAll(tasks) is the only suspension point, so the activities are in flight together and the worker fans them out across the pool. Same activity, same input, same result array (Task.WhenAll preserves the order of the tasks array, so you can return it directly), and at batch scale the difference is roughly 100 activities per second instead of 5.

The review heuristic: if you see await on a durable call inside a foreach or for loop, stop and ask whether those iterations actually depend on each other. If iteration N needs iteration N-1's output, awaiting in the loop is correct, that is a chain. If they are independent, the await belongs on a Task.WhenAll after the loop, and leaving it inside is the silent serialization trap.

The async HTTP pattern

A 500-item batch that fans out to activities can run for a minute and a half, and no HTTP client should be holding a socket open that long. Browsers, load balancers, and API gateways all time out well before that, so a client function that started the orchestration and blocked on its result would fail the caller before the batch even finished. The async HTTP pattern solves this by separating starting the work from collecting its result: the HTTP trigger kicks off the orchestration and returns immediately with a set of URLs the caller can poll.

Here is the client function that starts ProcessBatchOrchestrator and hands the caller back a status endpoint.

using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.DurableTask.Client;

public static class StartBatchClient
{
    [Function(nameof(StartBatchClient))]
    public static async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "batches")]
            HttpRequestData req,
        [DurableClient] DurableTaskClient client)
    {
        OrderItem[] items = await req.ReadFromJsonAsync<OrderItem[]>() ?? [];

        string instanceId = await client.ScheduleNewOrchestrationInstanceAsync(
            nameof(ProcessBatchOrchestrator), items);

        // 202 Accepted + Location + the management URLs, without blocking on the result.
        return await client.CreateCheckStatusResponseAsync(req, instanceId);
    }
}

ScheduleNewOrchestrationInstanceAsync enqueues the orchestration and returns its instance id without waiting for it to run. CreateCheckStatusResponseAsync then builds the response: HTTP 202 Accepted, a Location header pointing at the status-query endpoint, and a Retry-After header (10 seconds by default) telling the caller how long to wait before polling again. Prefer the awaited CreateCheckStatusResponseAsync over the synchronous CreateCheckStatusResponse: under the ASP.NET Core integration the synchronous form can throw InvalidOperationException ("Synchronous operations are disallowed"), which is why the async overload exists.

The JSON body carries the management URLs for the instance:

{
  "id": "7f3c1e9a4b8d4f0e9c2a6b5d8e1f0a3c",
  "statusQueryGetUri": "https://.../runtime/webhooks/durabletask/instances/7f3c...?...",
  "sendEventPostUri": "https://.../instances/7f3c.../raiseEvent/{eventName}?...",
  "terminatePostUri": "https://.../instances/7f3c.../terminate?reason={text}&...",
  "suspendPostUri": "https://.../instances/7f3c.../suspend?reason={text}&...",
  "resumePostUri": "https://.../instances/7f3c.../resume?reason={text}&...",
  "purgeHistoryDeleteUri": "https://.../instances/7f3c...?..."
}

statusQueryGetUri is the one the caller polls (it is the same URL as the Location header); the others handle raising an external event, terminating, suspending, resuming, and purging the instance's history. A preview rewindPostUri shows up too on supported plans.

From the caller's side it is a poll loop against statusQueryGetUri. While the instance is still running the status endpoint returns 202, with its own Location header pointing back at itself; once the instance reaches a terminal state it returns 200 with the full status body, and the body's output field carries the orchestration's return value. The runtimeStatus field tells you which state you landed in: Running, Pending, Failed, Canceled, Terminated, Completed, or Suspended.

A curl-style poll honoring Retry-After is the whole client protocol:

using HttpClient http = new();
HttpResponseMessage start = await http.PostAsJsonAsync(
    "https://.../api/batches", items);

// 202 came back; Location is the status-query URL.
Uri statusUri = start.Headers.Location!;

while (true)
{
    HttpResponseMessage status = await http.GetAsync(statusUri);
    if (status.StatusCode == HttpStatusCode.OK)
    {
        BatchSummary? summary = await status.Content
            .ReadFromJsonAsync<StatusResponse>() is { Output: var o } ? o : null;
        // runtimeStatus == "Completed"; the result is in the body's output field.
        break;
    }

    // 202: still running. Wait the Retry-After the server asked for.
    TimeSpan wait = status.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(10);
    await Task.Delay(wait);
}

If your caller can tolerate a short synchronous wait (a small batch that usually finishes in a second or two), WaitForCompletionOrCreateCheckStatusResponseAsync collapses the round trip: it waits for the instance to complete and returns its output with a 200, and if the wait elapses first it falls back to the same 202 + management-URL payload. The isolated-worker signature is worth a careful read, because it differs from the in-process one: it takes a retryInterval for the internal poll cadence and a CancellationToken that bounds the overall wait, but there is no timeout parameter. You cap the wait by cancelling the token, not by passing a TimeSpan.

Memory and concurrency limits

Two things bite at batch scale that never show up on a three-item demo: where all those activity results go, and how parallel the fan-out actually runs.

Start with memory. Every activity output is serialized into the orchestration's history in the <TaskHubName>History table, and Part 1's replay engine loads that full history into memory each time the orchestrator replays. With a 500-wide fan-out, the fan-in array is 500 serialized results sitting in history and getting rehydrated on every replay. If each result is small that is fine; if each activity returns a multi-megabyte payload, history balloons. Past 45 KB serialized, a result spills over to a <taskhub>-largemessages blob container automatically (the underlying Azure Queue message hard cap is 64 KB, and the 45 KB threshold leaves headroom for the compressed form). That spillover is correctness-preserving, but it is not free: it costs CPU and IO for the compress-and-round-trip, and the rehydrated payloads still bloat replay memory.

The fix is to return a reference, not the bytes. Have the activity write the heavy payload to blob storage and return a small id the fan-in can carry cheaply.

// BROKEN: the multi-MB image is serialized into history on fan-in.
[Function(nameof(RenderPageActivity))]
public static RenderedPage Run([ActivityTrigger] int page)
{
    byte[] image = Renderer.ToPng(page);     // multi-MB payload
    return new RenderedPage(page, image);    // every byte lands in history
}

// FIXED: write the payload to blob storage, return a small reference.
[Function(nameof(RenderPageActivity))]
public static async Task<PageRef> Run([ActivityTrigger] int page)
{
    byte[] image = Renderer.ToPng(page);
    string blobName = await Blobs.UploadAsync($"pages/{page}.png", image);
    return new PageRef(page, blobName);      // history carries a reference, not the bytes
}

Whatever needs the bytes later reads them back from blob storage by name. The history stays small, replay stays fast, and you never go near the spillover threshold.

Now concurrency. Fan-out is not unbounded parallelism. Scheduling 500 activities at once does not mean 500 run at once; the host caps how many activities execute concurrently per instance through maxConcurrentActivityFunctions, which defaults to 10 on the Consumption plan (10x the processor count on Dedicated and Premium). Orchestrators have their own ceiling, maxConcurrentOrchestratorFunctions, defaulting to 5. Both live under extensions.durableTask in host.json:

{
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 10,
      "maxConcurrentOrchestratorFunctions": 5
    }
  }
}

Both limits are per-instance, so scale-out multiplies them: ten workers at the default give you up to 100 activities in flight. The surplus past the concurrency limit does not fail, it queues, and the runtime drains it as slots free up. So a 500-wide fan-out on a single instance runs about 10 at a time with the rest waiting their turn, and the batch is only as parallel as your concurrency setting times your worker count allows. Size the fan-out against that ceiling rather than assuming the width you scheduled is the width that runs.

Closing

The whole pattern turns on where one await sits. Move it off the individual CallActivityAsync calls and onto a single Task.WhenAll, and the same orchestrator that ran 500 items in series now runs them in parallel, replay-safe, across every worker the platform gives you. The async HTTP pattern lets a caller start that batch and walk away with a status URL instead of a held-open socket, and the memory and concurrency caps are the guardrails that keep a wide fan-out from quietly bloating history or pretending to be more parallel than it is.

Part 3 picks up the other half of orchestration: workflows that pause and wait on something outside the function, like a human approval or an external event, without burning compute while they wait.

Do you cap your fan-out width with a host.json concurrency limit, or let the platform scale it and size the batch to fit?