DEV Community

Nick

Part 12: Performance Optimization - High-Throughput Concurrency

In the world of workflow automation, efficiency is not just a nice-to-have; it is a fundamental requirement. A single workflow engine instance might need to manage hundreds of concurrent executions, each waiting on I/O, external APIs, or database operations. Today, we explore how Vyshyvanka achieves high-throughput performance using .NET's powerful concurrency model.

The Async-First Architecture

Vyshyvanka was built with an async-first philosophy. We recognized early on that workflow engines are inherently I/O-bound. You are constantly waiting: waiting for a database row to lock, waiting for an API to respond, or waiting for a disk write to confirm.

If every one of those waits blocked a thread, we would quickly exhaust the thread pool and stall the system under load. By using async/await everywhere, we ensure that while a workflow waits for I/O, the underlying thread is returned to the .NET thread pool to handle other work. This allows a relatively small number of threads to manage hundreds of concurrent workflow executions.
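As a rough illustration of that claim (a standalone sketch, not code from the Vyshyvanka repository), the snippet below starts 500 simulated workflows that each await a delay, and records which threads they ran on. SimulateWorkflowAsync and the 100 ms delay are hypothetical stand-ins for real I/O.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a workflow that spends most of its time waiting on I/O.
static async Task<int> SimulateWorkflowAsync(int id, ConcurrentDictionary<int, byte> seen)
{
    seen.TryAdd(Environment.CurrentManagedThreadId, 0);   // thread before the wait
    await Task.Delay(100);                                // no thread is held during the wait
    seen.TryAdd(Environment.CurrentManagedThreadId, 0);   // thread that resumed the work
    return id;
}

var threadsSeen = new ConcurrentDictionary<int, byte>();

// Start all 500 "workflows" at once and wait for every one of them.
int[] completed = await Task.WhenAll(
    Enumerable.Range(0, 500).Select(i => SimulateWorkflowAsync(i, threadsSeen)));

Console.WriteLine(
    $"{completed.Length} workflows completed on {threadsSeen.Count} distinct threads");
```

The exact thread count varies by machine, but it should come out far below 500. That gap is the headroom the async-first design relies on.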

Concurrency Control with SemaphoreSlim

The engine uses SemaphoreSlim to limit the degree of parallelism at each execution level. This prevents a single large workflow from saturating all available resources:

private async Task<NodeExecutionResult[]> ExecuteLevelWithThrottlingAsync(
    Workflow workflow, List<string> level, IExecutionContext context,
    ConcurrentBag<NodeExecutionResult> nodeResults, int maxParallelism,
    CancellationToken cancellationToken)
{
    using var semaphore = new SemaphoreSlim(maxParallelism, maxParallelism);

    var tasks = level.Select(async nodeId =>
    {
        await semaphore.WaitAsync(cancellationToken);
        try
        {
            return await ExecuteNodeInWorkflowAsync(
                workflow, nodeId, context, nodeResults, cancellationToken);
        }
        finally
        {
            semaphore.Release();
        }
    }).ToList();

    return await Task.WhenAll(tasks);
}

The default parallelism is Environment.ProcessorCount * 2, configurable per workflow via workflow.Settings.MaxDegreeOfParallelism. This gives you fine-grained control: a lightweight data-sync workflow might run with a parallelism of 1, while a fan-out notification workflow might use 20.
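To see the throttle in action outside the engine, here is a minimal sketch built on the same SemaphoreSlim pattern. ResolveMaxParallelism and MeasurePeakConcurrencyAsync are hypothetical helpers written for this example, and the fallback for non-positive values is an assumption rather than documented engine behavior.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: use the configured value when it is positive,
// otherwise fall back to the default described above (an assumption).
static int ResolveMaxParallelism(int? configured) =>
    configured.HasValue && configured.Value > 0
        ? configured.Value
        : Environment.ProcessorCount * 2;

// Runs `count` simulated nodes behind a SemaphoreSlim gate and returns the
// highest number of nodes that were inside the gate at the same time.
static async Task<int> MeasurePeakConcurrencyAsync(int count, int maxParallelism)
{
    using var semaphore = new SemaphoreSlim(maxParallelism, maxParallelism);
    var gate = new object();
    int running = 0, peak = 0;

    var tasks = Enumerable.Range(0, count).Select(async _ =>
    {
        await semaphore.WaitAsync();
        try
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            await Task.Delay(20);   // simulated node work
        }
        finally
        {
            lock (gate) { running--; }
            semaphore.Release();
        }
    }).ToList();

    await Task.WhenAll(tasks);
    return peak;
}

int limit = ResolveMaxParallelism(3);
int peak = await MeasurePeakConcurrencyAsync(count: 20, maxParallelism: limit);
Console.WriteLine($"limit={limit}, peak={peak}");
```

All 20 nodes are started immediately, but the measured peak never exceeds the limit. The remaining nodes wait asynchronously on the semaphore instead of holding threads.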

Thread Safety Through Design

Managing concurrency requires careful handling of shared state. Here is our approach:

  1. Thread-Safe Collections: The engine tracks active executions and node results using ConcurrentDictionary and ConcurrentBag:
private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _activeExecutions = new();

// Node results are collected concurrently from parallel branches
var nodeResults = new ConcurrentBag<NodeExecutionResult>();
  2. Scoped Execution Context: Each execution gets its own IExecutionContext instance. The context stores node outputs and execution variables. While the context is mutated during execution (nodes write their outputs into it), each execution is isolated; one workflow's context never touches another's.

  3. Linked Cancellation: Each execution creates a linked CancellationTokenSource so that cancellation can come from either the caller or the engine's CancelExecutionAsync method:

using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
_activeExecutions[context.ExecutionId] = cts;
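The two cancellation paths can be sketched in isolation as follows. This is a simplified illustration, not the engine's code: CancelExecution is a hypothetical synchronous stand-in for CancelExecutionAsync, and the explicit TryRemove cleanup is an assumption about how entries would be unregistered.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var activeExecutions = new ConcurrentDictionary<Guid, CancellationTokenSource>();

// Hypothetical engine-side cancel: look up the execution and cancel its linked source.
bool CancelExecution(Guid executionId)
{
    if (!activeExecutions.TryGetValue(executionId, out var source)) return false;
    source.Cancel();
    return true;
}

// Path 1: the engine cancels by execution id.
using var callerA = new CancellationTokenSource();
using var linkedA = CancellationTokenSource.CreateLinkedTokenSource(callerA.Token);
var idA = Guid.NewGuid();
activeExecutions[idA] = linkedA;
bool found = CancelExecution(idA);
activeExecutions.TryRemove(idA, out _);   // unregister once the execution is finished

// Path 2: the caller cancels its own token and the linked source follows.
using var callerB = new CancellationTokenSource();
using var linkedB = CancellationTokenSource.CreateLinkedTokenSource(callerB.Token);
callerB.Cancel();

Console.WriteLine($"engine cancel -> linked: {linkedA.IsCancellationRequested}, caller: {callerA.IsCancellationRequested}");
Console.WriteLine($"caller cancel -> linked: {linkedB.IsCancellationRequested}");
```

Cancellation flows one way. Cancelling the caller's token cancels the linked source, but cancelling the linked source leaves the caller's token untouched, so an engine-side cancel cannot leak back into the caller's wider operation.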

Minimizing Thread Pool Starvation

A common trap in .NET is 'sync-over-async': calling .Result or .Wait() on an asynchronous method. This blocks the calling thread until the task completes, preventing it from doing other work, and is one of the most common causes of thread pool starvation. We enforce a strict rule in our codebase: No blocking calls. Every call from the engine core through the plugin layer must be properly awaited.

This rule extends to plugin nodes. Plugin execution goes through IPluginHost.ExecuteNodeInIsolationAsync with a configurable timeout. If a plugin blocks or hangs, the timeout fires and the node is marked as failed without affecting other executions.

Memory Efficiency

We apply several techniques to reduce allocations in the hot path:

  • Cached empty objects: A single JsonDocument.Parse("{}").RootElement.Clone() is reused across all nodes that need an empty input, avoiding repeated allocations.
  • StringBuilder reuse: Expression evaluation uses StringBuilder with pre-estimated capacity for string interpolation.
  • Source-generated regex: Expression pattern matching uses [GeneratedRegex], so the matcher is generated at build time instead of being constructed at run time, and calls such as IsMatch can run without allocating.
// Cached once, shared across all executions
private static readonly JsonElement EmptyObjectElement =
    JsonDocument.Parse("{}").RootElement.Clone();

Plugin Timeout Protection

External plugin nodes get an additional layer of protection. The engine wraps plugin execution with a timeout to prevent runaway code from blocking the pipeline:

private static readonly TimeSpan DefaultPluginTimeout = TimeSpan.FromSeconds(30);

private async Task<NodeOutput> ExecuteNodeInstanceAsync(
    INode nodeInstance, NodeInput input, IExecutionContext context, TimeSpan timeout)
{
    if (_pluginHost is not null && _pluginHost.IsPluginNode(nodeInstance.Type))
        return await _pluginHost.ExecuteNodeInIsolationAsync(nodeInstance, input, context, timeout);

    return await nodeInstance.ExecuteAsync(input, context);
}
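The article does not show how ExecuteNodeInIsolationAsync enforces its timeout, so the following is only one possible approach, sketched with Task.WaitAsync (available since .NET 6). RunWithTimeoutAsync and the string results are hypothetical and stand in for the real node output types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: run `work`, but stop waiting for it after `timeout`.
// Task.WaitAsync throws TimeoutException when the time limit elapses first.
static async Task<string> RunWithTimeoutAsync(
    Func<CancellationToken, Task<string>> work, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource();
    try
    {
        return await work(cts.Token).WaitAsync(timeout);
    }
    catch (TimeoutException)
    {
        cts.Cancel();                 // ask cooperative work to stop
        return "failed: timeout";     // stand-in for marking the node as failed
    }
}

// A well-behaved node finishes long before the limit.
string fast = await RunWithTimeoutAsync(
    async ct => { await Task.Delay(10, ct); return "ok"; },
    TimeSpan.FromSeconds(5));

// A hung node: this task never completes, so the timeout fires.
string hung = await RunWithTimeoutAsync(
    ct => new TaskCompletionSource<string>().Task,
    TimeSpan.FromMilliseconds(100));

Console.WriteLine($"{fast} / {hung}");
```

One caveat with this style of timeout: it only stops the caller from waiting. Work that ignores the cancellation token keeps running in the background, which is one reason isolating plugin execution matters.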

Leveraging the .NET Thread Pool

The .NET Thread Pool is incredibly good at its job, but it needs the engine to be a 'good citizen'. By being truly asynchronous, we allow the .NET runtime to:

  • Dynamically adjust the number of threads to match the workload
  • Optimize task scheduling to keep CPU caches warm
  • Avoid context-switching overhead that comes with excessive multi-threading

The Results

By combining non-blocking I/O with granular concurrency control, Vyshyvanka can scale horizontally. Because our execution state is persisted separately from the workflow definition, you can spin up multiple engine instances connected to the same database. Your throughput is limited by your infrastructure, not by a single process.

Performance is a journey of constant refinement. By respecting the asynchronous nature of .NET, isolating execution contexts, and avoiding thread-blocking patterns, we have built an engine that is as fast as it is flexible.

Next up is Part 13: Deployment Strategies - Orchestrating Vyshyvanka with .NET Aspire. Stay tuned!


Check out the project source code here: https://github.com/homolibere/Vyshyvanka
