
Building WeaveLLM: Why .NET Deserves a Better LangChain

Tags: dotnet, ai, csharp, llm
Cover image: architecture diagram of WeaveLLM pipeline


Introduction

Here's a thing I keep running into: .NET developers building serious AI features, and the ecosystem basically telling them to just use Python. LangChain, LlamaIndex, DSPy — every major orchestration framework is Python-first. .NET is an afterthought, if it shows up at all.

But C# developers aren't waiting around. They're shipping customer support bots, RAG pipelines, code-review agents — right now, in production. They're just doing it by calling OpenAI's REST API by hand, copy-pasting retry logic into every service class, and hoping nothing throws in an async chain at 2 AM.

LangChain does have a .NET port. I've used it. It's incomplete, the types don't map cleanly to .NET idioms, and it leans on exceptions as its main error-handling strategy — which is genuinely painful to compose in async code. The deeper issue is that LangChain was designed around Python's dynamic type system. Porting it to C# without rethinking the API from scratch gives you a framework that fights the language the entire time.

So I built WeaveLLM instead. Started as a hobby project, turned into something I actually want to use at work. It's a .NET 8 AI orchestration library designed specifically for C# — railway-oriented results, IAsyncEnumerable<T> streaming, an ASP.NET-style middleware pipeline, and fully generic chains that catch type mismatches at compile time rather than in a 3 AM Slack alert. Here are the four decisions that shaped it.


Design Decision 1: ChainResult<T> Over Exceptions

Let me describe a problem you've almost certainly hit. You call an LLM, it fails — rate limited, timed out, bad input, provider down — and now you need to handle that failure somewhere. In an exception-based framework, that means try/catch at every composition point. Stack a few chains together and you've got nested try/catch blocks all the way down, each one trying to figure out which exception type maps to which recovery strategy.

It's not that exceptions are wrong. It's that they're invisible. A method signature like Task<string> tells you nothing about whether it can fail or how.

WeaveLLM uses railway-oriented programming. Every chain execution returns ChainResult<T> — a result type where errors are values you work with, not surprises that unwind your stack.

// Errors are data, never thrown
public sealed class ChainResult<T>
{
    public bool IsSuccess { get; }
    public bool IsFailure => !IsSuccess;
    public T? Value { get; }
    public ChainError? Error { get; }
    public TokenUsage? TokenUsage { get; }
    public TimeSpan Duration { get; }
    public IReadOnlyDictionary<string, object> Metadata { get; }

    public static ChainResult<T> Success(T value, TokenUsage? usage = null) { ... }
    public static ChainResult<T> Failure(ChainError error) { ... }
    public static ChainResult<T> Failure(string message, string? code = null) { ... }

    // Projects success value to new type; failure passes through unchanged
    public ChainResult<TNext> Map<TNext>(Func<T, TNext> transform) { ... }

    // Destructure into (isSuccess, value, error)
    public void Deconstruct(out bool isSuccess, out T? value, out ChainError? error) { ... }
}

Errors come with structure too, not just a message string:

public record ChainError(string Message, string Code, Exception? InnerException = null)
{
    public static ChainError Timeout(string msg) => new(msg, "Timeout");
    public static ChainError RateLimited(string msg) => new(msg, "RateLimited");
    public static ChainError InvalidInput(string msg) => new(msg, "InvalidInput");
    public static ChainError ProviderError(string msg, Exception? ex = null)
        => new(msg, "ProviderError", ex);
}

Here's what that looks like at the call site, compared to the try/catch version:

// The old way — try/catch at every layer
try
{
    var response = await chain.ExecuteAsync(input);
    return response.Text;
}
catch (RateLimitException) { /* retry */ }
catch (TimeoutException)   { /* fallback */ }
catch (Exception ex)       { /* log */ throw; }

// WeaveLLM — errors are values you can switch on
var result = await chain.ExecuteAsync(input, context);

var (isSuccess, value, error) = result;
if (!isSuccess)
{
    return error!.Code switch
    {
        "RateLimited" => await fallbackChain.ExecuteAsync(input, context),
        "Timeout"     => ChainResult<string>.Failure(error),
        _             => ChainResult<string>.Failure(error)
    };
}

return result.Map(v => v.ToUpperInvariant()); // transforms success, ignores failure

ChainResult<T> also bundles TokenUsage and a Metadata dictionary that middleware layers write into without breaking the type contract. If you've ever used F# or Rust, this is the same railway pattern — errors short-circuit, successes flow through, and Map() lets you transform values without having to unwrap and re-wrap manually.
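
For example, a minimal sketch of chaining Map() and the deconstructor over a result (summarizeChain, documentText, and context here are hypothetical placeholders, not part of the library):

// Hypothetical summarizeChain : IChain<string, string>, used for illustration only
var result = await summarizeChain.ExecuteAsync(documentText, context);

// Map only touches the success track; a failure flows through both calls untouched
var wordCount = result
    .Map(summary => summary.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    .Map(words => words.Length);

var (ok, count, error) = wordCount;
Console.WriteLine(ok
    ? $"Summary length: {count} words (cost ${result.TokenUsage?.EstimatedCostUsd:F4})"
    : $"Failed: {error!.Code} ({error.Message})");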

Approach       | Error handling              | Composability            | Observability built-in
Raw exceptions | try/catch at every level    | Hard to chain            | None
LangChain.NET  | Mix of exceptions and nulls | Inconsistent             | None
WeaveLLM       | Values (ChainResult<T>)     | Monadic Map/Deconstruct  | TokenUsage + Metadata

Design Decision 2: IAsyncEnumerable<T> for Streaming

Streaming isn't a polish feature — it's the thing users actually notice. A response that starts rendering in 200ms feels fast, even if the total generation takes 8 seconds. A response that hangs for 8 seconds and then dumps a wall of text feels broken, even if the numbers are the same.

Python has async generators for this, and they work great there. In C#, the equivalent is IAsyncEnumerable<T>, which has been in the platform since .NET Core 3.0 and plugs directly into await foreach, LINQ, and ASP.NET Core's response pipeline. WeaveLLM makes it part of the core contract, not an optional add-on.

Every chain has two execution paths: a request/response ExecuteAsync and a token-by-token StreamAsync:

public interface IChain<TInput, TOutput>
{
    string Name { get; }

    Task<ChainResult<TOutput>> ExecuteAsync(
        TInput input,
        ChainContext context,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<TOutput> StreamAsync(
        TInput input,
        ChainContext context,
        CancellationToken cancellationToken = default);
}

The provider layer exposes the same interface. Here's what consuming it looks like:

public interface IStreamingChatModel : IChatModel
{
    IAsyncEnumerable<string> StreamChatAsync(
        IReadOnlyList<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);
}

// Each token prints as it arrives
await foreach (var token in model.StreamChatAsync(messages, cancellationToken: ct))
{
    Console.Write(token);
}

And in ASP.NET Core, wiring up a streaming endpoint with Server-Sent Events is about 12 lines:

app.MapGet("/stream", async (HttpContext http, IChain<string, string> chain) =>
{
    http.Response.ContentType = "text/event-stream";
    var context = new ChainContext();

    await foreach (var token in chain.StreamAsync("Tell me a story", context))
    {
        await http.Response.WriteAsync($"data: {token}\n\n");
        await http.Response.Body.FlushAsync();
    }
});

Compare that to callback-based streaming, which is what a lot of .NET AI libraries still use:

// Callback-based — no backpressure, no real cancellation support,
// can't await async work per-token
await chain.StreamAsync(input, onToken: token =>
{
    Console.Write(token);
});

With IAsyncEnumerable<T> you get CancellationToken integration automatically (the consumer cancels mid-stream and the producer stops), full LINQ support via System.Linq.Async, and no adapter layer between your chain and the framework. It's just the language doing what it already does well.
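
That composability is easy to show. A small sketch, reusing the model and messages from the snippet above together with the standard System.Linq.Async operators:

// Requires the System.Linq.Async package for Where/Take over IAsyncEnumerable<T>
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

// Drop whitespace-only tokens and cap the stream at 100 tokens;
// when the CancellationTokenSource fires, the producer stops mid-stream
var firstTokens = model.StreamChatAsync(messages, cancellationToken: cts.Token)
    .Where(t => !string.IsNullOrWhiteSpace(t))
    .Take(100);

await foreach (var token in firstTokens)
{
    Console.Write(token);
}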

Streaming model  | Backpressure | CancellationToken | LINQ composable | ASP.NET SSE
Callbacks        | Manual       | Awkward           | No              | Custom plumbing
Events           | No           | No                | No              | Custom plumbing
IAsyncEnumerable | Built-in     | Built-in          | Yes             | Native

Design Decision 3: The Middleware Pipeline

If you've built anything in ASP.NET Core, you already know how middleware works: components that wrap a request, can inspect or modify it, can short-circuit or pass it through, and compose in a predictable order. Every .NET developer has this mental model. It made sense to borrow it directly for LLM chains.

The interface is deliberately close to ASP.NET's RequestDelegate shape:

public delegate Task<ChainResult<TOutput>> ChainDelegate<TInput, TOutput>(
    TInput input,
    ChainContext context,
    CancellationToken cancellationToken);

public interface IChainMiddleware<TInput, TOutput>
{
    Task<ChainResult<TOutput>> InvokeAsync(
        TInput input,
        ChainContext context,
        ChainDelegate<TInput, TOutput> next,
        CancellationToken cancellationToken = default);
}

next is everything downstream. Call it to continue. Skip it to short-circuit — a cache hit, a circuit breaker tripping. Call it and then do something with the result — tracing, cost tracking, PII scrubbing. WeaveLLM ships six middleware implementations out of the box:

services.AddWeaveLLM()
    .AddOpenAI(o => o.ApiKey = config["OpenAI:ApiKey"])
    .AddReActAgent(maxSteps: 5);

var chain = myLlmChain
    .WithMiddleware(new RetryMiddleware(maxRetries: 3, backoffSeconds: 1.5))
    .WithMiddleware(new CacheMiddleware(ttl: TimeSpan.FromMinutes(10)))
    .WithMiddleware(new RateLimitingMiddleware(requestsPerMinute: 60))
    .WithMiddleware(new TracingMiddleware(activitySource))
    .WithMiddleware(new CostMiddleware(pricing))
    .WithMiddleware(new PiiScrubbingMiddleware());

Writing your own is implementing one method. Here's a logging middleware, complete:

public sealed class LoggingMiddleware<TInput, TOutput>
    : IChainMiddleware<TInput, TOutput>
{
    private readonly ILogger _logger;
    public LoggingMiddleware(ILogger logger) => _logger = logger;

    public async Task<ChainResult<TOutput>> InvokeAsync(
        TInput input,
        ChainContext context,
        ChainDelegate<TInput, TOutput> next,
        CancellationToken cancellationToken = default)
    {
        _logger.LogInformation("Chain {Name} starting", context.ChainName);
        var result = await next(input, context, cancellationToken);
        _logger.LogInformation("Chain {Name} finished: {Status} in {Duration}ms",
            context.ChainName,
            result.IsSuccess ? "OK" : result.Error!.Code,
            result.Duration.TotalMilliseconds);
        return result;
    }
}

LangChain's version of this is "callbacks" — a grab-bag of optional hooks (on_llm_start, on_llm_end, on_chain_error) registered globally and invoked by the framework as events fire. They can't short-circuit. They can't replace the result. They don't compose with each other in any meaningful way. It's a notification system dressed up as a pipeline.

WeaveLLM's middleware is an actual pipeline. Each component decides whether to call next, what to do with the result, and whether to replace or propagate. That's the difference between observability you can build on and hooks you can only listen to.
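
To make the contrast concrete, here's roughly what a short-circuiting cache looks like in this model. This is a sketch of the pattern, not the shipped CacheMiddleware (which adds TTL expiry on top of the same idea):

using System.Collections.Concurrent;

// Sketch only: a cache that short-circuits the pipeline on a hit
public sealed class NaiveCacheMiddleware<TInput, TOutput> : IChainMiddleware<TInput, TOutput>
    where TInput : notnull
{
    private readonly ConcurrentDictionary<TInput, ChainResult<TOutput>> _cache = new();

    public async Task<ChainResult<TOutput>> InvokeAsync(
        TInput input,
        ChainContext context,
        ChainDelegate<TInput, TOutput> next,
        CancellationToken cancellationToken = default)
    {
        // Cache hit: next is never called, so the LLM is never invoked
        if (_cache.TryGetValue(input, out var cached))
            return cached;

        var result = await next(input, context, cancellationToken);

        // Only successful results are worth replaying
        if (result.IsSuccess)
            _cache[input] = result;

        return result;
    }
}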


Design Decision 4: Generic Chains with Compile-Time Type Safety

LangChain chains are stringly typed by default. Inputs and outputs are usually Dictionary<string, string> and the framework resolves what goes where at runtime. In Python, that's fine — you've got a REPL, you find the bug fast. In C#, you find it in production when a downstream chain reaches for a key the upstream chain forgot to set.

It's an unforced error. The type system is right there.

IChain<TInput, TOutput> makes the contract explicit:

public interface IChain<TInput, TOutput>
{
    string Name { get; }
    Task<ChainResult<TOutput>> ExecuteAsync(TInput input, ChainContext context,
        CancellationToken cancellationToken = default);
    IAsyncEnumerable<TOutput> StreamAsync(TInput input, ChainContext context,
        CancellationToken cancellationToken = default);
}

// Connectable variant for fluent Pipe() composition
public interface IConnectableChain<TInput, TOutput> : IChain<TInput, TOutput>
{
    IConnectableChain<TInput, TNext> Pipe<TNext>(IChain<TOutput, TNext> next);
    IConnectableChain<TInput, TOutput> WithMiddleware(IChainMiddleware<TInput, TOutput> middleware);
}

Pipe<TNext>() enforces that the output type of the left chain matches the input type of the right one — at compile time. Not at test time, not in staging. At dotnet build.

record UserQuery(string Text);
record SearchResults(IReadOnlyList<string> Chunks);
record FinalAnswer(string Text, decimal ConfidenceScore);

// This compiles — types line up
IConnectableChain<UserQuery, FinalAnswer> ragPipeline =
    retrievalChain         // IChain<UserQuery, SearchResults>
        .Pipe(rerankerChain)   // IChain<SearchResults, SearchResults>
        .Pipe(generatorChain); // IChain<SearchResults, FinalAnswer>

// This doesn't compile — SearchResults != UserQuery, caught immediately
// retrievalChain.Pipe(generatorChain); // CS ERROR

ComposedChain also handles failure propagation: if the first chain returns a failure, the second never runs and the error forwards unchanged. No null checks, no manual short-circuiting.
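
The core of that propagation is small enough to sketch. This isn't the actual ComposedChain source, just the shape of what Pipe() has to produce:

// Illustrative sketch of a two-chain composition (streaming path omitted for brevity)
public sealed class ComposedChainSketch<TIn, TMid, TOut> : IChain<TIn, TOut>
{
    private readonly IChain<TIn, TMid> _first;
    private readonly IChain<TMid, TOut> _second;

    public ComposedChainSketch(IChain<TIn, TMid> first, IChain<TMid, TOut> second)
        => (_first, _second) = (first, second);

    public string Name => $"{_first.Name} -> {_second.Name}";

    public async Task<ChainResult<TOut>> ExecuteAsync(
        TIn input, ChainContext context, CancellationToken cancellationToken = default)
    {
        var first = await _first.ExecuteAsync(input, context, cancellationToken);

        // Failure short-circuits: the second chain never runs and the
        // original ChainError is forwarded at the new output type
        if (first.IsFailure)
            return ChainResult<TOut>.Failure(first.Error!);

        return await _second.ExecuteAsync(first.Value!, context, cancellationToken);
    }

    public IAsyncEnumerable<TOut> StreamAsync(
        TIn input, ChainContext context, CancellationToken cancellationToken = default)
        => throw new NotImplementedException("Streaming composition omitted from this sketch");
}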

The practical payoff shows up when you're actually using the results:

var result = await ragPipeline.ExecuteAsync(
    new UserQuery("What is the WeaveLLM license?"),
    new ChainContext { SessionId = "user-123" });

if (result.IsSuccess)
{
    Console.WriteLine($"Answer: {result.Value!.Text}");
    Console.WriteLine($"Confidence: {result.Value.ConfidenceScore:P0}");
    Console.WriteLine($"Cost: ${result.TokenUsage?.EstimatedCostUsd:F4}");
}

Full IntelliSense, no casting, no as checks. If you rename a property on FinalAnswer, the compiler tells you everywhere it breaks.

Type safety               | LangChain (Python) | LangChain.NET | WeaveLLM
Input/output types        | Dynamic dict       | Dynamic dict  | Generic IChain<TIn, TOut>
Mismatched pipes caught   | Runtime            | Runtime       | Compile time
IDE completion on results | None               | Partial       | Full IntelliSense
Refactor safety           | None               | None          | Compiler-enforced

What Ships in v0.1.0-alpha

WeaveLLM v0.1.0-alpha is on NuGet across five packages:

dotnet add package WeaveLLM.Core
dotnet add package WeaveLLM.Providers
dotnet add package WeaveLLM.Memory
dotnet add package WeaveLLM.Observability
dotnet add package WeaveLLM.Extensions.DependencyInjection

Providers — four, all production-tested:

  • OpenAI (gpt-4o, embeddings, streaming)
  • Anthropic (claude-sonnet, streaming)
  • Ollama (local inference, embeddings — no API key needed)
  • HuggingFace (Inference API, embeddings)

All share IChatModel. Swap provider with one line.
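
For example, moving a service from hosted OpenAI to local Ollama should be a registration change only. AddOllama and its BaseUrl option below are assumed names for illustration, mirroring the AddOpenAI call shown earlier:

// Before: hosted OpenAI (as registered earlier in the article)
services.AddWeaveLLM()
    .AddOpenAI(o => o.ApiKey = config["OpenAI:ApiKey"]);

// After: local Ollama. AddOllama / BaseUrl are assumed naming for illustration;
// every chain, agent, and middleware downstream still sees the same IChatModel.
services.AddWeaveLLM()
    .AddOllama(o => o.BaseUrl = "http://localhost:11434");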

Agents — three patterns:

  • ReActAgent — Thought → Action → Observation loop until Final Answer
  • PlanAndExecuteAgent — separate planning and execution phases for complex tasks
  • AgentGraph<TState> — state machine for multi-agent workflows with typed shared state and conditional branching

Memory and RAG — full pipeline:

  • IMemoryStore with in-memory, Qdrant, and Postgres (pgvector) backends
  • DefaultRagPipeline with recursive text splitting, hybrid BM25 + vector search, and Reciprocal Rank Fusion (see the fusion sketch after this list)
  • Document loaders for plain text, Markdown, and directories
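
Reciprocal Rank Fusion itself is the standard scoring rule: each document's fused score is the sum of 1 / (k + rank) across the result lists. A self-contained sketch of that fusion step, independent of WeaveLLM's internals:

// Fuse two ranked lists (e.g. BM25 results and vector-search results) with
// Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank), k = 60 by convention
static IReadOnlyList<string> FuseWithRrf(
    IReadOnlyList<string> bm25Ranked,
    IReadOnlyList<string> vectorRanked,
    int k = 60)
{
    var scores = new Dictionary<string, double>();

    void Accumulate(IReadOnlyList<string> ranked)
    {
        for (var rank = 1; rank <= ranked.Count; rank++)
            scores[ranked[rank - 1]] = scores.GetValueOrDefault(ranked[rank - 1]) + 1.0 / (k + rank);
    }

    Accumulate(bm25Ranked);
    Accumulate(vectorRanked);

    // Highest fused score first
    return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
}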

Middleware — six built-in:
Retry, caching, rate-limiting, OpenTelemetry tracing, per-request cost estimation, and PII scrubbing.

Observability — baked in, not bolted on:
OpenTelemetry tracing and metrics via the WeaveLLM ActivitySource and Meter. Per-request token usage and estimated USD cost tracked across all providers.
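
Subscribing to it from an existing OpenTelemetry setup is the usual registration. A minimal ASP.NET Core sketch, assuming the ActivitySource and Meter are both named "WeaveLLM":

// Program.cs - standard OpenTelemetry .NET registration; the "WeaveLLM" source
// and meter names are assumed to match the library's ActivitySource and Meter
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("WeaveLLM")
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("WeaveLLM")
        .AddOtlpExporter());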


Where It's Going

v0.2.0-alpha (next 6–8 weeks):

  • Azure OpenAI provider
  • Multi-modal input (image + text messages)
  • Streaming agents — IAsyncEnumerable<AgentStep> for real-time reasoning step visibility
  • Redis memory backend
  • Structured output support

v1.0.0 — Q4 2026:

  • Frozen public API with full semver guarantee
  • Docs site with guides, API reference, and runnable samples
  • Benchmark suite against LangChain Python — qualitative claims become numbers

Honest Alpha Warning

This is a hobby project that got serious. The core abstractions are stable and I won't break them before v1.0. All four providers are tested and working. Some edge cases are still being hardened and breaking changes are possible in non-core areas before v1.0.

I built this because I was frustrated, not because I had a product roadmap. If you've hit the same frustrations with .NET LLM tooling, that's the target audience — and the best time to shape what v1.0 looks like is right now.

If you want to contribute, good-first-issue labels are a good starting point: adding provider adapters, writing integration tests against real APIs, extending the middleware library. Adding a new provider is just implementing IChatModel — the middleware, streaming, and type-safety machinery comes for free.


License

MIT. Use it, fork it, ship it in your own projects. No credit needed.

Copyright (c) 2026 WeaveLLM

The railway result, the async stream, the ASP.NET-style middleware, the generic chain — none of these are Python idioms with a C# skin on top. They're what this kind of library looks like when it starts from C# instead of ending there.


Source code, samples, and NuGet links: github.com/harshil-inspire2/WeaveLLM

Keywords: .NET AI, LangChain alternative, AI orchestration, C# LLM, dotnet AI framework, IAsyncEnumerable streaming, railway-oriented programming, ASP.NET middleware pattern
