rinat kozin

Posted on Jun 10

Enterprise-grade AI integration: embedding LLMs into the business processes of large companies — redb.Route.Llm 3.1.1

#dotnet #csharp #llm #opensource

Series: redb ecosystem (LLM, part 2)

A few days ago I announced redb.Route.Llm — the 24th transport in redb.Route, where calling an LLM is .To("llm://claude") and an agent tool is just a route with .AsLlmTool("shell") on it. The whole pitch was: stop bolting an "AI framework" onto an integration framework that already has retry, throttle, breaker, audit, observability. Plug the LLM in as one more endpoint, get those primitives for free.

I closed that announcement with a deliberately uncomfortable section called the "honest skip-list" — features I hadn't shipped yet: streaming end-to-end, tool cache, RAG knowledge store, async batch + callback consumer, eval-run store, sliding-window memory, sandboxed tool execution. I'd rather under-promise in writing than over-promise in screenshots.

Most of that list is shipped. But that's not what this post is about. This post is about the thing I didn't quite see at announcement time: shipping the skip-list turned the whole thing into something different from a chat library. It turns out that once you put an LLM into an ESB and start adding the boring enterprise plumbing, the result isn't "a chat framework with extras". It's a runway from a 6-line demo to a multi-tenant, audited, budgeted, human-in-the-loop agent platform — with no rewrite in the middle.

That's the story I want to tell. Three angles: a deep dive on the technically tricky bits (streaming, persistence, RAG), ten enterprise patterns that collapse into one DSL line each, and a project-to-platform timeline I've watched live more than once.

Coming in cold? Earlier in this series:

redb.Route 3.1.0 — LLM as just another connector (announcement)

redb.Route — Apache Camel for .NET

Enterprise Integration Patterns in .NET, part 1

A quick word on EIP. Throughout this post I lean on Enterprise Integration Patterns (Multicast, Aggregator, Scatter-Gather, Wire-Tap, Choice, Aggregate-by-window) without unpacking each of them in detail. That's deliberate: the "EIP in .NET via redb.Route" series with per-pattern deep-dives (diagrams, code, side-by-sides) hasn't shipped yet; I'm writing those in parallel. So I'm getting a little ahead of myself here.

If you've worked with Apache Camel or WSO2 Micro Integrator, you'll spot the patterns on the fly — the names and semantics are identical. I've personally shipped and maintained integration projects on WSO2 MI for years (and on its ESB ancestor), but architecturally redb.Route sits closer to Camel — a compiled DSL over a typed Exchange, not an XML config. Side-by-side comparisons "Camel ↔ redb.Route" and "WSO2 MI ↔ redb.Route" are their own posts, also in flight. If anything EIP-flavoured in the code below trips you up, drop a question in the comments — I'll answer on the spot, and the answer goes straight into those upcoming pieces.

Why this isn't "yet another agent framework"

The agent-framework shelf is crowded. LangChain, LangChain4j, Semantic Kernel, AutoGen, LlamaIndex — each one solves the same single problem: how does my model call functions and remember a conversation. Each one leaves you to solve the other eighteen problems yourself: retry, idempotency, audit, multi-tenant, budget, observability, governance, approval, batch, scheduling, replay, dead-letter, throttling, circuit-breaking, correlation, timeouts, cost attribution, metric exporters.

redb.Route.Llm's entire reason for existing is to have those eighteen problems already solved at the framework layer. Not "we have an integration library, plus an LLM library, glue them yourself", but the same DSL, same runtime, same governance hooks. If you already have a redb.Route policy interceptor that writes every message to Kafka, it automatically writes every tool call from your agent to Kafka too, because they're the same Exchange.

This isn't theoretical elegance. It cuts through the single biggest production headache with agents: what happens the day your model decides to drop a production table? In the LangChain-plus-hooks shape, the answer is "wire up our approval callbacks and don't forget". In the LLM-as-transport shape, the answer is "intercept the Exchange before .To("exec://...") with the same .Process(...) you use for any other potentially-destructive step in any other route". Not a new pattern for AI. The same pattern as for FTP, Kafka, JMS.

Same people, same tools, same review gates. That's the actual win.

What got shipped from the skip-list

I said I would, and I'm saying I did.

| Skip-list item | Status in 3.1.1 |
| --- | --- |
| Streaming end-to-end (HTTP SSE + WS per-frame) | Shipped. `IAsyncEnumerable<string>` lives in `Out.Body`, the HTTP consumer flips into SSE, the WS consumer dispatches one frame per message. |
| ToolCacheStore on REDB | Shipped. `ToolCacheProps`, opt-in via `ToolCachingPolicy.Memoize`. |
| KnowledgeStore: RAG chunks in REDB | Shipped (partial). `KnowledgeChunkProps` with metadata + tenant + ACL; embeddings land in a follow-up release. |
| BatchStore + LlmCallbackProcessor | Shipped. Anthropic Message Batches and OpenAI Batch, async webhook consumer, idempotent dispatch, retry-aware. |
| EvalRunStore | Shipped. Eval runs are first-class objects with trace ids and prompt-version pinning. |
| PromptTemplateStore | Shipped. Versioned prompt registry, referenced from DSL with `#name`. |
| Sliding-window memory | Not shipped. Different bet placed instead: tree-branching conversations via REDB-tree (more on this below). |
| Sandboxed tools (per-call container) | Not shipped in container form. Shipped `redb.Route.Exec` with allowlist + working-dir + timeout + byte-cap, the practical enterprise minimum. |

Bonus, not in the skip-list:

Per-exchange ?redb=<name> hint — pick a named REDB instance per Exchange, so one route serves N tenants without N route registrations or N IServiceScopes.
DSL/tool package split — redb.Route.Llm.Abstractions (contracts only) and redb.Route.Llm.Tools (six utility tools), so 22 connectors can register .AsLlmTool() without bumping a minor on every change.
Six utility tools out of the box: HttpFetch, JsonPath, XPath, RegexExtract, MathEval, TavilyWebSearch — each as both a DSL extension and a standalone tool route.
Bug fixes that only show up in real traffic: orphan tool_use (model asks for a tool, provider 5xxs mid-turn, leaves the conversation in an invalid state) and OEM codepage in Process.StandardOutput (Windows cmd /c returns cp866 bytes, breaks the UTF-8 contract every tool implicitly assumes).

Eleven REDB schemas now sit behind all of this: ConversationProps, MessageProps, ApprovalProps, CostBudgetProps, ToolCacheProps, ToolAuditProps, KnowledgeChunkProps, PromptTemplateProps, EvalRunProps, LlmBatchProps, ToolIdempotencyProps. That's not a database for your chat history. That's the operational layer of an agent platform, opt-in via one line: AddRedbLlmStorage().

Ten enterprise patterns — each one a single DSL line

Here's where it gets useful. None of these are pseudocode. They all run today.

1. Hard budget per conversation as a circuit breaker

In a side project, a "token budget" is a log line. In production, it's a safety primitive — your model must not be allowed to burn $10k because of a bad prompt. CostBudgetProps isn't an observability artifact; it's a preventive rule. Every provider call, the agent engine sums spend by conversation id and adds the worst-case for the pending request (via max_tokens). If the total would breach the cap, LlmBudgetExceededException fires before the network call, not after.

From("kafka://support-tickets")
    .To(Llm.Factory("claude")
        .Conversation(e => "ticket-" + e.In.Headers["TicketId"])
        .CostBudget(usd: 0.50)            // hard ceiling per conversation id
        .CostBudgetExceeded(BudgetPolicy.FailFast)
        .AsUri())
    .To("kafka://support-replies");

The same primitive scales to "budget per tenant", "budget per prompt template", "budget per model" — they're all just different keys in CostBudgetProps. Same shape, different GROUP BY.
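The pre-flight arithmetic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: `BudgetGuard`, `WorstCaseUsd`, `CanProceed`, and the per-million-token prices used in the usage note are hypothetical.

```csharp
using System;

// Hypothetical sketch of a pre-flight budget check; not redb.Route.Llm's implementation.
public static class BudgetGuard
{
    // Worst-case cost of the pending request: every input token is billed,
    // and the model is assumed to use its full max_tokens allowance.
    public static decimal WorstCaseUsd(
        int inputTokens, int maxTokens,
        decimal usdPerMillionIn, decimal usdPerMillionOut)
        => inputTokens * usdPerMillionIn / 1_000_000m
         + maxTokens * usdPerMillionOut / 1_000_000m;

    // True when spend so far plus the worst case stays within the cap.
    // The caller would fail fast (before any network call) when this is false.
    public static bool CanProceed(decimal spentUsd, decimal worstCaseUsd, decimal capUsd)
        => spentUsd + worstCaseUsd <= capUsd;
}
```

With assumed prices of $3 and $15 per million input and output tokens, a request with 2,000 input tokens and `max_tokens` of 1,000 has a worst case of $0.021. Against a $0.50 cap it passes when $0.47 is already spent and is rejected at $0.48.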

2. Approval gates with a human in the loop

The most under-appreciated production pattern in agent-land. If your model can refund payments or delete records, a human stands between the tool call and the actual side effect. Not a webhook duct-taped on later — a native runtime concept.

From("direct:tool-payments-refund")
    .AsLlmTool("issue_refund")
        .Description("Refund a payment by id and reason.")
        .Input(/* JSON Schema */)
        .SideEffect(ToolSideEffect.Mutating)        // governance hook reads this
        .Cost(ToolCostClass.Expensive)
    .Then()
    .Process<ApprovalGate>()                        // suspend exchange,
                                                     // write ApprovalProps,
                                                     // notify Slack,
                                                     // wait for HTTP callback
    .To("kafka://payments.refund.commands");

ApprovalProps records the exchange id, conversation id, input arguments, the human approver, the wait deadline, and the outcome. The Slack bot, the email handler, the web form — all of those plug in as ordinary HTTP routes in the same redb.Route. When the human clicks Approve, the webhook wakes the suspended Exchange and feeds the result back to the agent engine. On timeout, the agent receives a tool_result with status:"timeout" and decides what to do next.

The point: this is not AI logic. It's the EIP Aggregator + Correlation Identifier + Wire-Tap + Reply-To pattern, which already applies to a refund engine without an LLM. They're both Exchanges in the same runtime, so the pattern is reused.

3. Idempotent tool retries via `ToolIdempotencyProps`

Webhook consumers retry. Network timeouts retry. Anthropic's batch sometimes delivers a payload twice. If your tool call "issue refund $50" runs twice, that's a bad day at standup.

ToolIdempotencyProps keeps idempotency-key → tool-result with a TTL. In the DSL:

From("direct:tool-issue-invoice")
    .AsLlmTool("issue_invoice")
        .Caching(ToolCachingPolicy.Idempotent)
    .Then()
    .Process(BuildIdempotencyKey)                    // sha256(args + customer-id + day)
    .To("...");

When the agent engine sees Caching = Idempotent, it consults ToolIdempotencyProps before the route runs. Hit → returns the saved tool_result, route never executes. Miss → route runs, result is recorded. Framework-level, not "remember to wrap your handler".
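To make the hit/miss behaviour concrete, here is a small in-memory sketch. It is an assumption-laden stand-in, not `ToolIdempotencyProps` itself: `IdempotencyCache`, `BuildKey`, and `GetOrRun` are hypothetical names, and a real store would need an atomic check-and-set rather than this simple read-then-write.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory stand-in for an idempotency store; illustration only.
public sealed class IdempotencyCache
{
    private readonly ConcurrentDictionary<string, (string Result, DateTimeOffset Expires)> _entries = new();
    private readonly TimeSpan _ttl;

    public IdempotencyCache(TimeSpan ttl) => _ttl = ttl;

    // sha256(args + customer-id + day), hex-encoded. The "|" separator keeps
    // ("ab", "c") and ("a", "bc") from producing the same key.
    public static string BuildKey(string argsJson, string customerId, DateOnly day)
    {
        var dayText = day.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        var raw = $"{argsJson}|{customerId}|{dayText}";
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(raw)));
    }

    // Hit: return the saved result without running the tool.
    // Miss or expired entry: run the tool and record its result with a fresh TTL.
    public string GetOrRun(string key, Func<string> runTool, DateTimeOffset now)
    {
        if (_entries.TryGetValue(key, out var entry) && entry.Expires > now)
            return entry.Result;

        var result = runTool();
        _entries[key] = (result, now + _ttl);
        return result;
    }
}
```

Passing `now` in explicitly keeps the TTL logic testable without a real clock. Including the day in the key means the same arguments on a later day are treated as a new operation, which matches the `sha256(args + customer-id + day)` comment in the route above.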

4. Multi-tenant via `?redb=<name>`

One worker, fifty customers. Each customer is their own REDB instance — their conversations, their cost budgets, their approvals, their knowledge base. In 3.1.0 this would have meant fifty IServiceScopes or fifty route registrations. In 3.1.1 it's a per-exchange hint:

From("http:0.0.0.0:5088/api/llm/ask?inOut=true")
    .Process(e =>
        e.In.Headers[LlmHeaders.Redb] = e.In.Headers["X-Tenant"]?.ToString())
    .To("llm://claude?conversationFromHeader=true");

?redb=acme or a header — the engine pulls the named REDB instance from the registry, on the current Exchange, no factory, no route swap, no restart. Conversations, audit, approvals all land in the right tenant. Each tenant gets its own billing surface and governance lane, with no awareness of the others.

This was the single feature most worth the late nights. It's the difference between "we have a multi-tenant agent platform" and "we have a multi-tenant deployment story, here's a wiki page".

5. Audit trail without a separate integration

ToolAuditProps is a REDB object: tenant id, conversation id, exchange id, tool name, input args (or hash, if PII), output (or hash), duration, status. Every tool invocation is recorded automatically — because tools are routes, and redb.Route already has post-processors.

The query "show me every Claude tool call in tenant acme over the last 7 days, side-effect=mutating, cost=expensive, ordered by time" isn't a separate analytics pipeline. It's a value_string index plus one SQL query, the same indexed-business-id pattern the rest of the REDB ecosystem uses.

6. Async batch + callback consumer

Anthropic Message Batches and OpenAI Batch are up to 50% cheaper, with a price tag of up to 24 hours of latency. For offline workloads — classifying a million tickets, extracting fields from a million PDFs — that's the right knob to turn.

LlmBatchProps stores the batch id, statuses, and the link back to the originating Exchange collection. LlmCallbackProcessor is just an HTTP route that the provider calls when the batch finishes. The route reads the batch id, fetches the results, dispatches each one back to its originating Exchange via correlation id — and those Exchanges resume their journey as if the synchronous call had just finished, twenty-four hours later.

On top of this: idempotency (same ToolIdempotencyProps), retry (the standard redb.Route circuit breaker), backpressure (Kafka or queue downstream). No batch dispatcher to write. No webhook handler to write. No "what if the callback arrives twice" to test separately.

7. Versioned prompt registry: `#`-refs

The single most depressing prod bug pattern: "the model is answering differently". You go check git blame, somebody touched the system prompt three weeks ago, the tests passed because the eval set is small. The prompt is code, and it deserves a registry with versions.

// where prompts get registered:
promptRegistry.Register("triage-system", version: "v3", body: """
    You are a support triage agent. Classify into [billing, tech, sales, abuse]...
    """);

// in the route:
.To(Llm.Factory("claude").SystemPromptRef("#triage-system@v3").AsUri())

PromptTemplateProps is a REDB object: name, version, body, metadata (author, date, experiment id). When the engine resolves #name, it pins exactly that version into MessageProps for the call. Six months later you can say with certainty "this conversation ran on triage-system v3", not "probably v3, that's what we were doing then".

The juicy part: EvalRunProps records eval runs bound to a prompt version. "v4 gives +12% accuracy on the golden set" stops being a spreadsheet and becomes a query.

8. Tree-branching conversations for A/B and counterfactuals

ConversationProps is stored as a REDB tree via the native parent_id. That means a conversation isn't a flat list of messages — it's a tree. So you can:

branch from any message and run an alternative continuation with a different model or temperature;
keep a user branch and an experiment branch in parallel;
compute metrics across pairs of branches ("with-tools vs without-tools on the exact same context").

Sliding-window memory truncates the past. Tree-branching memory writes alternative pasts and compares them. For production agents, the second one is an order of magnitude more useful, because improving prompts in production is exactly that loop: take a real conversation, replay with a new prompt, score it, ship the winner.

And nothing here was bolted on for LLMs. Tree via parent_id has been a first-class REDB feature from day one — for product hierarchies, org charts, file systems. The LLM stack just wrote its own entity on the same primitive.

9. A jury of cheap models with a senior model as arbiter — Scatter-Gather + Aggregator

One of the most underrated production tricks in agent-land: don't trust a single model. Send the same task to several cheap models in parallel (Haiku, GPT-4o-mini, Mistral-Small, Gemini-Flash, Llama-3.1-70b on Groq), gather their answers, and hand the original prompt plus all the candidates to a senior model (Sonnet, Opus, GPT-4o) acting as arbiter, which picks the best, synthesises a new one, or says "nobody nailed it, ask the human". The literature calls this mixture-of-agents or ensemble voting, and on hard tasks it routinely buys you +10 to +20% accuracy at a lower bill than running everything through Opus.

In a vanilla LangChain shape, this becomes a hundred-line orchestrator with try/catch, per-model timeouts, retry knobs, and a hand-rolled aggregator. In an ESB shape, it's the bog-standard EIP Scatter-Gather + Aggregator, with twenty-five years of hardening in integration buses. The LLM is just one more endpoint type the Scatter-Gather fans out to:

From("kafka://contract-clauses-to-classify")
    .RouteId("contract-jury")
    .Multicast()                                          // Scatter
        .Parallel()
        .StopOnException(false)                           // one provider down? carry on
        .Timeout(TimeSpan.FromSeconds(30))                // per branch
        .To(Llm.Factory("haiku")
            .SystemPromptRef("#contract-classify@v3")
            .Temperature(0.0).MaxTokens(200).AsUri())
        .To(Llm.Factory("gpt-4o-mini")
            .SystemPromptRef("#contract-classify@v3")
            .Temperature(0.0).MaxTokens(200).AsUri())
        .To(Llm.Factory("groq-llama-70b")
            .SystemPromptRef("#contract-classify@v3")
            .Temperature(0.0).MaxTokens(200).AsUri())
        .To(Llm.Factory("mistral-small")
            .SystemPromptRef("#contract-classify@v3")
            .Temperature(0.0).MaxTokens(200).AsUri())
    .End()                                                // Gather: results land in Exchange.Properties["multicast.results"]
    .Process<JuryAggregator>()                            // splice candidates into one arbiter prompt
    .To(Llm.Factory("sonnet")                             // Arbiter
        .SystemPromptRef("#jury-arbiter@v2")
        .Temperature(0.1).MaxTokens(500)
        .CostBudget(usd: 0.05)
        .AsUri())
    .To("kafka://contract-clauses-classified");

JuryAggregator is a tiny processor: takes the four answers and the original prompt, builds one arbiter message of the shape "here is the task; here are candidates A/B/C/D; return the final classification or say unclear". The arbiter answers with structured JSON. Headers llm.tokens.in/out are recorded per branch plus once for the arbiter — cost is transparent at the route level.

Why ESB-shape makes this so cheap to express:

Parallelism for free. Multicast EIP already knows how to fan out across N branches, await all-or-N-of-M, handle timeouts and partial failures. No Task.WhenAll with hand-written failure handling.
Idempotency for free. A retry on a failed branch hits the same ToolIdempotencyProps (or provider-side prompt-hash cache) and dedups. No double billing.
Budget for free. .CostBudget(usd: 0.05) on the arbiter is its own circuit-breaker; per-branch .CostBudget(usd: 0.01) caps the cheap models. A combined ceiling is a different CostBudgetProps key. Breach it and the whole jury fails fast — failover to a rule-based classifier (which sits as another branch on the route).
Audit for free. Each of the five LLM calls is captured in ToolAuditProps, all stitched to one exchange id. A month later you can answer "model X agrees with the arbiter 73% of the time — let's drop it and save".
Eval for free. EvalRunProps records the run with four branches plus the arbiter, so you can replay against a golden set to find the optimal jury composition (swap GPT-4o-mini for Gemini-Flash, see if accuracy holds).

In production this pattern wins on two axes. Accuracy: on tasks like "is this contract clause a risk", where a single model wobbles, four cheap models plus a Sonnet arbiter routinely outperform a lone Opus, at 2.5× lower cost. Robustness: when Anthropic is having a bad afternoon, the Anthropic branch fails, the other three return, the arbiter receives three candidates instead of four and ships a verdict. Graceful degradation, not a hard outage.

One subtle production gotcha: never tell the arbiter who wrote which answer. If the arbiter's prompt says "candidate from Claude Haiku, candidate from GPT-4o-mini", the arbiter develops favourites (so do humans). So JuryAggregator anonymises the candidates as A/B/C/D, shuffles their order (Latin-squared on exchange-id for reproducibility), and only after the arbiter responds do we map back to "A was the Haiku answer". Clean signal for the post-hoc analysis "which model agrees with the arbiter most often".
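The anonymise-and-shuffle step can be sketched as follows. Note the substitution: this uses a seeded Fisher-Yates shuffle keyed on the exchange id, which gives reproducibility but is not the Latin-square scheme mentioned above. `JuryAnonymiser` and its signature are hypothetical; the real `JuryAggregator` may look quite different.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of candidate anonymisation; not the actual JuryAggregator.
public static class JuryAnonymiser
{
    // Returns the candidates in a shuffled order with labels A, B, C, ...
    // The same exchange id always yields the same order.
    public static List<(char Label, string Model, string Answer)> Anonymise(
        string exchangeId, IReadOnlyList<(string Model, string Answer)> candidates)
    {
        // Stable seed: string.GetHashCode() is randomised per process in modern .NET,
        // so the seed is derived from a SHA-256 digest of the exchange id instead.
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(exchangeId));
        var rng = new Random(BinaryPrimitives.ReadInt32LittleEndian(digest));

        var shuffled = candidates.ToArray();
        for (int i = shuffled.Length - 1; i > 0; i--)   // Fisher-Yates shuffle
        {
            int j = rng.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        // Only Label and Answer go into the arbiter prompt; Model stays behind
        // for the post-hoc mapping from label back to the model that wrote it.
        return shuffled
            .Select((c, index) => ((char)('A' + index), c.Model, c.Answer))
            .ToList();
    }
}
```

The design choice here is to keep the label-to-model mapping in the returned list rather than in the prompt, so the arbiter never sees a model name and the analysis step can still join on the label afterwards.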

10. Sub-agents — an agent as another agent's tool

A direct consequence of the "tool = route" architecture: if a route can contain .To("llm://..."), then a tool can itself be an agent. The parent agent doesn't "know" there's another LLM hiding behind that tool. To it, research_topic is just another tool, like web_search or math_eval. Inside the tool, though, lives a fully-fledged second-tier agent with its own model, prompt, toolset, budget, iteration cap, retry policies, RAG sources.

In code:

// Sub-agent: a research specialist with its own toolset
From("direct:research-subagent")
    .AsLlmTool("research_topic")
        .Description("Deep research on a topic. Takes {topic, depth}. " +
                     "Returns a structured summary with sources.")
        .Input("""{"type":"object","properties":{
                    "topic":{"type":"string"},
                    "depth":{"type":"string","enum":["short","deep"]}},
                  "required":["topic"]}""")
        .SideEffect(ToolSideEffect.ReadOnly)
        .Cost(ToolCostClass.Expensive)              // parent sees the call is pricey
    .Then()
    .Knowledge("research-corpus", k: 12)            // sub-agent has its own RAG corpus
    .To(Llm.Factory("sonnet")                       // mid-tier model — research specialist
        .SystemPromptRef("#research-specialist@v3")
        .Tools("tavily_web_search", "http_fetch", "regex_extract")
        .MaxIterations(8)
        .CostBudget(usd: 0.30)                      // sub-agent has its own budget
        .Temperature(0.1).AsUri())
    .Process<ExtractResearchSummary>();

// Parent agent uses the sub-agent as just another tool
From("kafka://complex-business-questions")
    .To(Llm.Factory("opus")                         // planner — top-tier model
        .SystemPromptRef("#senior-analyst@v1")
        .Tools("research_topic",                    // ← our sub-agent
               "sql_query",                         // ← plain data tool
               "math_eval",                         // ← computation
               "draft_report")                      // ← another sub-agent (report drafter)
        .MaxIterations(15)
        .CostBudget(usd: 2.00)                      // top-level budget
        .AsUri())
    .To("kafka://business-answers");

What actually happens under the hood: when Opus decides to call research_topic, the agent engine builds the JSON input and routes it through RouteToolBridge to direct:research-subagent. That route runs as a fresh Exchange, inheriting transaction scope, principal, headers, and DI scope from the parent. Inside it, Sonnet runs its own tool-use loop (web search → fetch → extract), potentially with eight iterations and three tools. It returns a structured summary in Out.Body, which the engine repackages as a tool_result and hands back to Opus. Opus sees the result, keeps planning, calls more tools as needed.

In other words: the architecture is recursive without a single line of dedicated code. A sub-agent inside a sub-agent works the same way. At depth three, audit, budget, idempotency, prompt versioning, RAG, governance — all of it still works, because none of those are tied to "agent level"; they're tied to the Exchange, and the Exchange is the same primitive at any depth.

This unlocks three derivative patterns that are awkward to assemble in the flat "one agent, many tools" shape:

(a) Hierarchical agents — planner and workers. A top-tier model (Opus, GPT-4o) plays planner: decomposes the task and dispatches chunks to specialists. Each specialist is a sub-agent with a narrow prompt and a narrow toolset. The planner may have no direct data access at all — only through sub-agents. That gives you hard separation of authority: the planner can't accidentally call delete_records because that tool isn't in its set; only data-cleanup-subagent has it, and the planner has to hand off explicitly.

(b) Specialist sub-agents with their own memory. A sub-agent can have its own conversationId (a different branch of the REDB tree), its own Knowledge(...) sources, its own prompt registry. A legal-review-subagent lives inside the corporate legal corpus, sees only that, answers strictly in the legal register. The senior agent never gets direct access to those documents. ACL is enforced at the route level, not at the "we hope the model doesn't quote it" level.

(c) Cost-shaped escalation. A cheap agent (Haiku) takes the first swing. If it returns unclear or confidence is below threshold, it calls escalate_to_senior itself as a tool — and behind that tool sits a route to Opus with full context. Most queries land on Haiku for pennies; only the hard ones reach Opus. On high-volume workloads the economics change by an order of magnitude.

How this relates to the static jury in pattern #9: the jury is a statically-wired sub-agent pattern. The route knows up front there are N candidates and one arbiter; the DAG is hard-coded in the DSL. Sub-agents-as-tools is the dynamic version: the parent agent decides for itself whom to call and how often. Jury wins on predictable classification/forecasting pipelines. Sub-agents win on open-ended research where the number of steps and the toolset can't be known in advance.

The obvious failure mode is cycles. If sub-agent A can call B and B can call A, you've got an unbounded recursion in theory. In practice three guards keep it bounded: MaxIterations at every level (no sub-agent loops forever), CostBudgetProps (each nested call burns parent budget), and an optional depth limit in headers (LlmHeaders.SubAgentDepth increments per level; the route rejects calls past a configured limit). Real tasks rarely benefit beyond depth = 3-4.
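The depth guard is the simplest of the three to picture. Below is a minimal sketch that assumes headers behave like a string-keyed dictionary; the header name `llm.subagent.depth` and the `SubAgentDepthGuard` class are hypothetical stand-ins for whatever `LlmHeaders.SubAgentDepth` resolves to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a sub-agent depth guard; header name is an assumption.
public static class SubAgentDepthGuard
{
    public const string DepthHeader = "llm.subagent.depth";

    // Increments the depth counter for the nested call and returns the new depth.
    // Throws before the nested route runs when the configured limit is exceeded.
    public static int Enter(IDictionary<string, object> headers, int maxDepth)
    {
        int depth = headers.TryGetValue(DepthHeader, out var raw) && raw is int d ? d : 0;
        depth++;

        if (depth > maxDepth)
            throw new InvalidOperationException(
                $"Sub-agent depth {depth} exceeds the configured limit of {maxDepth}.");

        headers[DepthHeader] = depth;
        return depth;
    }
}
```

Because the counter travels in the headers that a nested Exchange inherits, a cycle such as A calling B calling A trips the limit regardless of which agents are involved.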

And one more piece of architectural elegance: a sub-agent is just a route, which means it has a stock URI. Different parent agents can share one research-subagent, which has its own ?redb=acme for tenant isolation, its own rate limiter (?throttle=5/sec), its own circuit breaker. The sub-agent behaves like an internal LLM-tier microservice that you reuse across routes without duplicating prompts or proliferating clients.

Streaming: what actually changes when a token leaves the provider

In 3.1.0, providers streamed but the client never saw it — we accumulated tokens into a string and returned the whole thing. In 3.1.1, the chain closes:

the provider yields IAsyncEnumerable<string> — frames as they arrive;
the agent engine sticks that into Out.Body as IAsyncEnumerable<string>, without materialising;
the HTTP consumer sees IAsyncEnumerable<string> and switches to SSE — one data: ...\n\n per frame;
the WS consumer sends one WebSocket message per frame;
non-streaming consumers (Kafka, RabbitMQ, ActiveMQ) materialise the way they used to.

The architectural payoff: streaming stops being a separate mode and becomes a payload type. Same Exchange can fan out to SSE and Kafka simultaneously (multicast EIP) — Kafka waits for materialisation, SSE sees frames live. No "streaming endpoint" vs "regular endpoint" duplication.

Underneath, this is the well-known pipe pattern of "iterator instead of collection". The only LLM-specific bit is making sure async iteration runs inside the end-to-end Exchange tracking so traces and metrics see the whole journey, not "received an Exchange and lost the rest".
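For readers who want to see the "payload type" idea in code, here is a minimal sketch of turning an `IAsyncEnumerable<string>` into Server-Sent Events. `SseWriter` is a hypothetical helper written against a plain `TextWriter`; how the real HTTP consumer writes to the response is not shown in this post.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of an SSE writer over an async stream of frames.
public static class SseWriter
{
    // Writes each frame as one SSE event as it arrives, without materialising the stream.
    public static async Task WriteAsync(
        IAsyncEnumerable<string> frames, TextWriter output, CancellationToken ct = default)
    {
        await foreach (var frame in frames.WithCancellation(ct))
        {
            // A frame that contains newlines needs one "data:" line per line of text.
            foreach (var line in frame.Split('\n'))
                await output.WriteAsync($"data: {line.TrimEnd('\r')}\n");

            await output.WriteAsync("\n");   // blank line terminates the event
            await output.FlushAsync();       // push the frame to the client immediately
        }
    }
}
```

A non-streaming consumer would do the opposite with the same payload: iterate the enumerable to the end, concatenate the frames, and send one message.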

RAG: chunks as first-class REDB objects

In 3.1.1, KnowledgeChunkProps is a REDB object with:

the text;
the source (source-uri, tenant-id, doc-id, chunk-index);
metadata (language, date, tags, ACL — who's allowed to see it);
a placeholder for embeddings (vector store ships in a follow-up; the MVP is keyword search via value_string indexes plus FTS).

What this means: a RAG source is a route, not a separate vector service. From("file://docs?include=*.md").To("knowledge://acme") indexes documents. From("kafka://support-tickets").Knowledge("acme", k: 5) injects the top-5 chunks into the system prompt before .To("llm://claude"). ACL and tenant filtering happen as value_*-indexed SQL before chunks reach the prompt.

When the vector store lands, it sits next to keyword search behind the same IKnowledgeStore interface, no route changes. That's the architectural goal: tomorrow's features don't break today's routes.
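The ordering that matters in the keyword MVP is "filter by tenant and ACL first, rank second". Here is an in-memory sketch of that ordering. `KnowledgeChunk`, `KeywordRetriever`, and the role-based ACL shape are assumptions for illustration; in the real store the filter is indexed SQL, not LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical chunk shape: text plus tenant and a set of roles allowed to read it.
public sealed record KnowledgeChunk(string TenantId, string Text, IReadOnlySet<string> AllowedRoles);

public static class KeywordRetriever
{
    // Returns up to k chunks for the tenant and role, ranked by how many query terms they contain.
    public static IReadOnlyList<KnowledgeChunk> TopK(
        IEnumerable<KnowledgeChunk> chunks, string tenantId, string role, string query, int k)
    {
        var terms = query.Split(' ', StringSplitOptions.RemoveEmptyEntries);

        return chunks
            // Tenant and ACL filtering happens before any scoring, so a chunk the
            // caller may not see can never reach the prompt, whatever its score.
            .Where(c => c.TenantId == tenantId && c.AllowedRoles.Contains(role))
            .Select(c => (Chunk: c, Score: terms.Count(t =>
                c.Text.Contains(t, StringComparison.OrdinalIgnoreCase))))
            .Where(x => x.Score > 0)
            .OrderByDescending(x => x.Score)
            .Take(k)
            .Select(x => x.Chunk)
            .ToList();
    }
}
```

Swapping keyword scoring for vector similarity later would change only the `Select` that computes `Score`; the filter step in front of it stays the same.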

Three real-world enterprise scenarios — reports, forecasts, alerts

The patterns above are atomic bricks. Now three end-to-end scenarios where those bricks combine into routes you'd actually run on Tuesday morning. No AI hype, no "cognitive automation", no pretending to "transform the industry". Just the dull stuff a human currently does by hand every day, taken over by an agent that lives in the same bus as the data.

Case 1 — A daily financial snapshot for the CFO, in their inbox by 7:00 AM

Every business morning a finance analyst pulls together yesterday's revenue, expenses by category, plan-vs-actual variances, top-5 largest transactions, account balances, FX rates. Then they write a paragraph or two — "revenue +3.2% to plan, expenses +1.7%, the variance is X". An hour and a half, gone. Worth automating? Obvious yes. Worth standing up a separate AI service for it? Hard no.

The whole route:

// Every weekday at 07:00 local
From("cron://daily-cfo-report?schedule=0 0 7 ? * MON-FRI")
    .RouteId("daily-cfo-report")
    .Process<LoadYesterdayMetrics>()                       // pulls from ERP/bank API/data warehouse
                                                            // → e.In.Body = {revenue, expenses, accounts, fx, top5}
    .Process<LoadQuarterContext>()                         // same ETL: plan, prior quarter, MTD/QTD
    .ConvertBody<FinancialDailySnapshot>()                 // strongly-typed payload
    .Multicast().Parallel()
        .To(Llm.Factory("haiku")                           // branch A: short-form for the CFO
            .SystemPromptRef("#cfo-daily-summary@v7")
            .Temperature(0.1).MaxTokens(800)
            .CostBudget(usd: 0.02).AsUri())
        .To(Llm.Factory("haiku")                           // branch B: bulletised for the board chat
            .SystemPromptRef("#cfo-daily-bullets@v7")
            .Temperature(0.1).MaxTokens(400)
            .CostBudget(usd: 0.02).AsUri())
    .End()
    .Process<RenderHtmlReport>()                           // mustache template: numbers as a table,
                                                            // model summaries as the lede
    .To("smtp://mail.acme.com?to=cfo@acme.com,board@acme.com" +
        "&subject=Daily%20FY%20snapshot%20${date:yyyy-MM-dd}")
    .To("teams://board-channel?card=adaptive")             // same HTML body → adaptive card in Teams
    .Wiretap("kafka://reports.cfo-daily.archive");         // copy to archive for audit and training

Seven steps, and those seven steps cover the whole job:

Schedule — cron lives in the URI; no separate scheduler service. redb.Route already does this.
Data pickup — LoadYesterdayMetrics is just a processor that hits your existing integrations through the same redb.Route (Kafka, REST, JDBC). Whatever retries and circuit breakers you already wired around those integrations apply.
Two-tone output — multicast to two branches of cheap Haiku with different prompt templates. Pennies, hard-budgeted.
Versioned prompt — #cfo-daily-summary@v7. Six months from now the CFO complains "the report has gotten worse" — open EvalRunProps, see v7 regressed against the golden set, roll back to v6 with no redeploy.
Delivery — two transports (SMTP + Teams), both stock redb.Route connectors. Tomorrow they want it in Slack too: .To("slack://..."), one line.
Audit — Wiretap copies the whole exchange to a Kafka archive. Twelve months later a regulator asks "what did we send on 2026-04-15?" — pulled from the archive with metadata that says "model X, prompt version Y, source data Z" (because MessageProps records both input and output).
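Budget — .CostBudget(usd: 0.02) per branch. 250 trading days × 2 branches × $0.02 = $10/year on token spend, and that is a ceiling, because the budget is a cap rather than an estimate. One analyst hour a day at a fully-loaded $50 comes to $12,500 over the same 250 days, so the route pays for itself in the first week.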

What pushes this from "cute demo" to "actual enterprise asset": when the CFO asks about "that strange number in last Tuesday's report", the auditor pulls one trace id and in 30 seconds has: the snapshot at input, the prompt version that ran, the exact output the model produced, the file that hit SMTP, who opened it in Teams. Not "let me get back to you tomorrow". Right now — because it's all REDB objects pinned to one exchange id.

Case 2 — A weekly cash-flow forecast with a jury and an Opus arbiter

Cash-flow forecasting is the textbook case where one model is worse than zero models: a confident-sounding model error leads to a decision that costs more than the analyst's salary for the month. Pattern #9 (jury + arbiter) earns its keep here.

From("cron://weekly-cashflow-forecast?schedule=0 0 9 ? * MON")
    .RouteId("weekly-cashflow-forecast")
    .Process<BuildCashflowFeatures>()                      // bank balances, AR/AP, scheduled payments,
                                                            // seasonality, FX exposure
    .Knowledge("acme-finance", k: 8)                       // RAG: forecasting playbook, prior reports,
                                                            // methodology, known seasonality notes
    .Multicast().Parallel().Timeout(TimeSpan.FromMinutes(2))
        .To(Llm.Factory("sonnet")                          // four different models — cheap insurance
            .SystemPromptRef("#cashflow-forecast@v4")      // against correlated errors
            .Temperature(0.2).MaxTokens(2000).AsUri())
        .To(Llm.Factory("gpt-4o")
            .SystemPromptRef("#cashflow-forecast@v4")
            .Temperature(0.2).MaxTokens(2000).AsUri())
        .To(Llm.Factory("gemini-pro")
            .SystemPromptRef("#cashflow-forecast@v4")
            .Temperature(0.2).MaxTokens(2000).AsUri())
        .To(Llm.Factory("mistral-large")
            .SystemPromptRef("#cashflow-forecast@v4")
            .Temperature(0.2).MaxTokens(2000).AsUri())
    .End()
    .Process<JuryAggregator>()                             // anonymise A/B/C/D + shuffle
    .To(Llm.Factory("opus")                                // arbiter — strongest model in the rotation
        .SystemPromptRef("#cashflow-arbiter@v2")
        .Temperature(0.1).MaxTokens(3000)
        .CostBudget(usd: 0.50).AsUri())
    .Process<ExtractStructuredForecast>()                  // parse JSON: 30 days × {low, mid, high}
    .Choice()
        .When(e => e.In.Body<Forecast>().ConfidenceLow < 0.6)
            .Process<ApprovalGate>()                       // low confidence → wait for CFO sign-off
            .To("smtp://...?to=cfo@acme.com&priority=high")
        .Otherwise()
            .Process<RenderForecastReport>()
            .To("smtp://...?to=treasury@acme.com")
    .End()
    .Wiretap("kafka://reports.cashflow.archive");

This route runs about $2-3 per execution, once a week, so 52 runs come to roughly $100-$160 a year. The analyst would spend a day producing the same artefact. The savings aren't really the point. The point is robustness: four independent forecasts from four different vendors (Anthropic, OpenAI, Google, Mistral) give you consensus as a confidence signal. Four models agree → the arbiter just codifies. Four models diverge → the arbiter writes "model A sees risk X, the rest don't, recommend human review", and the Choice branch routes the report to an approval gate without anyone wiring anything custom.
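
To make "consensus as a confidence signal" concrete, here is a minimal sketch of one way to score how far a jury diverges. It is not the JuryAggregator from the route (that processor isn't shown here). The JuryConsensus class, the relative-spread measure and the 10% tolerance are all assumptions picked for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of redb.Route.Llm: turns N point forecasts
// for the same day into a single disagreement score.
public static class JuryConsensus
{
    // Relative spread = (max - min) / |median|. Zero means perfect agreement.
    public static double RelativeSpread(IReadOnlyList<double> forecasts)
    {
        if (forecasts.Count < 2)
            throw new ArgumentException("Need at least two forecasts.", nameof(forecasts));

        var sorted = forecasts.OrderBy(x => x).ToArray();
        int mid = sorted.Length / 2;
        double median = sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;

        if (median == 0.0)
            throw new ArgumentException("Median is zero; relative spread is undefined.", nameof(forecasts));

        return (sorted[^1] - sorted[0]) / Math.Abs(median);
    }

    // True when the jury diverges by more than the tolerated fraction (assumed 10%).
    public static bool NeedsHumanReview(IReadOnlyList<double> forecasts, double tolerance = 0.10)
        => RelativeSpread(forecasts) > tolerance;
}
```

With mid-point forecasts of 98, 100, 101 and 102 the spread is about 4% and nothing is flagged. Swap 102 for 140 and the spread jumps past 40%, which is the case where the approval gate branch should fire.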

A year in, the treasury team queries EvalRunProps: "Gemini-Pro disagrees with consensus 18% of the time, and in 70% of those cases Gemini was right". Excellent — it gets upweighted in the jury, not dropped. That's not a guess; it's a SQL query.
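
The same question can be asked in LINQ over exported rows. This is a sketch under stated assumptions: the JuryVote record is a hypothetical row shape (the real EvalRunProps schema isn't shown in this post), and JuryStats is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: one jury member's vote on one forecast, scored after the fact.
public record JuryVote(string Model, bool AgreedWithConsensus, bool WasRight);

public static class JuryStats
{
    // Returns how often a model dissented, and how often it was right when it did.
    public static (double DisagreementRate, double RightWhenDisagreeing) For(
        IEnumerable<JuryVote> votes, string model)
    {
        var mine = votes.Where(v => v.Model == model).ToList();
        if (mine.Count == 0)
            throw new ArgumentException($"No votes recorded for '{model}'.", nameof(model));

        var dissent = mine.Where(v => !v.AgreedWithConsensus).ToList();
        double rate = (double)dissent.Count / mine.Count;
        double right = dissent.Count == 0
            ? 0.0
            : (double)dissent.Count(v => v.WasRight) / dissent.Count;
        return (rate, right);
    }
}
```

A model with a high dissent rate and a high right-when-dissenting rate is the one worth upweighting. A high dissent rate with a low hit rate is the one to drop.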

Case 3 — A predictive plant alert: XGBoost does the math, the LLM does the sentence

In manufacturing — the real kind, with conveyor belts and SCADA tags, not the metaphorical kind — the bread-and-butter task is "from the metrics, predict that this line will throw a fault in 4-6 hours, and ping the on-shift engineer before it does". The classical play here is a feature store and a gradient-boosted model. That play is still correct, and the LLM is not there to replace it. What the LLM does well is the last-mile translation from "anomaly score + top features + line context" into a sentence a human reads at 3 AM and understands.

// Tap the SCADA bus (MQTT/Kafka), 5-minute aggregation window
From("kafka://scada.metrics?groupId=plant-anomaly-watcher")
    .RouteId("plant-anomaly-watcher")
    .Aggregate(by: e => e.In.Headers["LineId"],
               window: TimeSpan.FromMinutes(5))            // window per production line
    .Process<RunAnomalyModel>()                            // classical ML — XGBoost, nothing fancy;
                                                            // emits {score, top-features, line-context}
    .Choice()
        .When(e => e.In.Body<AnomalyReport>().Score > 0.8)
            .Knowledge("plant-runbooks", k: 3)             // RAG: this line's runbooks,
                                                            // history of similar incidents
            .To(Llm.Factory("haiku")
                .SystemPromptRef("#plant-incident-explainer@v9")
                .Temperature(0.0).MaxTokens(600)
                .Tools("metrics-history", "fetch-shift-log")  // agent fetches more if it needs to
                .MaxIterations(4)
                .CostBudget(usd: 0.10).AsUri())
            .Process<EnrichWithOnDutyEngineer>()           // who's on shift on this line, right now
            .Multicast()
                .To("teams://engineering-shift-{LineId}?card=adaptive")
                .To("sms://twilio?to={engineer.phone}")
                .To("kafka://incidents.predicted.archive")
            .End()
        .Otherwise()
            .To("kafka://scada.metrics.normal")            // archive the boring days too
    .End();

What's actually happening, and why this works:

The ML model isn't getting kicked out. Anomaly scoring is still XGBoost, which has been good at exactly this for a decade. The LLM stands after it.
The LLM explains; it doesn't predict. The system prompt is "anomaly on Line 7, top-3 features look like X, runbook says Y, prior incidents Z. In one page: what likely happened, what the engineer should do in the next hour, and the three parameters to check first". This is synthesis, which is where LLMs are strong.
The agent's tools are the plant's data. metrics-history is a direct:metrics-history route that queries Influx/Prometheus. fetch-shift-log reads the shift log from REDB. The agent decides whether to invoke them and how many times; MaxIterations(4) caps the loop at four rounds.
Multi-channel delivery. Teams adaptive card with an "I've got it" button (which, by the way, fires an ApprovalGate-style webhook), SMS to the on-shift number via Twilio (already a redb.Route transport), copy to a Kafka archive.
Aggregate by LineId, window 5min is the EIP Aggregator — bucket metrics by group. Nothing redb.Route.Llm-specific; this is plain integration code that's been working in your bus for years.
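
For readers who haven't met the Aggregator pattern, the bucketing behind "by LineId, window 5min" can be sketched in plain C#. This is not how redb.Route implements .Aggregate(...). The MetricSample record and the TumblingWindow class are hypothetical, and the sketch assumes fixed (tumbling) windows aligned to the clock.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample shape; the post does not show the real SCADA payload.
public record MetricSample(string LineId, DateTime TimestampUtc, double Value);

public static class TumblingWindow
{
    // Buckets samples by (LineId, window start), where the window start is the
    // timestamp rounded down to a multiple of the window size.
    public static Dictionary<(string LineId, DateTime WindowStart), List<MetricSample>> Bucket(
        IEnumerable<MetricSample> samples, TimeSpan window)
    {
        if (window <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(window));

        return samples
            .GroupBy(s => (s.LineId, WindowStart: Floor(s.TimestampUtc, window)))
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Rounds a timestamp down to the start of its window.
    private static DateTime Floor(DateTime t, TimeSpan window)
        => new DateTime(t.Ticks - (t.Ticks % window.Ticks), t.Kind);
}
```

Each bucket is what the anomaly model would receive as one unit of work: every sample from one line inside one five-minute slot.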

The business outcome: time from "anomaly appears" to "the engineer on the line knows what to look for" drops from ~40 minutes (operator notices, calls, describes, engineer arrives, engineer starts hunting) to ~4-5 minutes. On expensive lines, an hour of unplanned downtime is tens of thousands of dollars; this route pays for itself the first time a near-miss is caught.

Notice the LLM's role is narrow in all three cases. Not "AI runs the factory". Not "AI runs the books". The LLM does one specific job in each route — summarise, synthesise, pick from candidates, explain. Around it sits all the boring enterprise plumbing: schedules, ETL, RAG, multi-channel delivery, audit, governance, budgets. That plumbing is redb.Route. The LLM is the last brick that previously demanded its own service.

Storytelling: how a chat demo becomes a platform

Here's the timeline I've now watched several times in different teams.

Day 1. Somebody shows the team that Claude can answer a support ticket. LlmHttpRoutes.cs — six lines, From("http://...").To("llm://haiku"), demo works. This is a project, not a platform.

From("http:0.0.0.0:5088/api/llm/ask?inOut=true")
    .ConvertBody<string>()
    .To(Llm.Factory("haiku").MaxIterations(1).AsUri());

Day 7. Coworkers say "it doesn't remember what I just told it". A Process(...) step adds an X-Chat-Id header, ConversationFromHeader() is flipped on. That's it — AddRedbLlmStorage() is already in Program.cs, MessageProps are already being written.

.Process(e => e.In.Headers[LlmHeaders.ConversationId] =
    e.In.Headers["X-Chat-Id"]?.ToString() ?? "default")

Day 14. "Can it run a command on the server?" A tool-shell route appears — a separate route with .AsLlmTool("shell"), backed by redb.Route.Exec, allowlist [cmd, pwsh], working dir in temp, timeout 5 sec, 8 KB stdout cap. Safety lives in the DSL, not in advice in a system prompt.

From("direct:tool-shell")
    .AsLlmTool("shell").SideEffect(ToolSideEffect.ReadOnly).Cost(ToolCostClass.Cheap)
    .Then()
    .To(ExecDsl.Run().AllowedCommands("pwsh", "cmd").TimeoutMs(5000).MaxStdoutBytes(8192));
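
To show what those three guards amount to underneath, here is a rough stand-in built on System.Diagnostics.Process. It is not the redb.Route.Exec implementation, which isn't shown in this post. GuardedShell is a hypothetical name, and the cap is counted in characters rather than bytes to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical stand-in for the allowlist, timeout and stdout cap named above.
public sealed class GuardedShell
{
    private readonly HashSet<string> _allowed;
    private readonly int _timeoutMs;
    private readonly int _maxStdoutChars;

    public GuardedShell(IEnumerable<string> allowed, int timeoutMs, int maxStdoutChars)
    {
        _allowed = new HashSet<string>(allowed, StringComparer.OrdinalIgnoreCase);
        _timeoutMs = timeoutMs;
        _maxStdoutChars = maxStdoutChars;
    }

    // Exact-name match only, so a path such as "/tmp/evil/pwsh" is rejected.
    public bool IsAllowed(string command) => _allowed.Contains(command);

    // Truncates output to the cap. A production version would stop reading
    // at the cap instead of buffering everything first.
    public string Cap(string stdout)
        => stdout.Length <= _maxStdoutChars ? stdout : stdout.Substring(0, _maxStdoutChars);

    public string Run(string command, string arguments)
    {
        if (!IsAllowed(command))
            throw new UnauthorizedAccessException($"Command '{command}' is not on the allowlist.");

        var psi = new ProcessStartInfo(command, arguments)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            WorkingDirectory = Path.GetTempPath(),
        };
        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("Process did not start.");

        // Read asynchronously so a chatty process cannot block on a full pipe.
        var stdoutTask = process.StandardOutput.ReadToEndAsync();
        if (!process.WaitForExit(_timeoutMs))
        {
            process.Kill(entireProcessTree: true);
            throw new TimeoutException($"'{command}' exceeded {_timeoutMs} ms.");
        }
        return Cap(stdoutTask.GetAwaiter().GetResult());
    }
}
```

The point of the sketch is that all three limits are enforced by code the model never sees, so a persuasive prompt can't talk its way past them.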

Day 21. "Finance says the token bill is 4x plan." Add .CostBudget(usd: 0.50) per conversation. No new services, no new dashboards — CostBudgetProps is an existing REDB object, the tsak.web dashboard already knows how to render it.
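
If "budget as a circuit breaker" sounds abstract, the mechanics fit in a few lines. This is an illustrative sketch, not the code behind .CostBudget(...) or CostBudgetProps. ConversationBudget and BudgetExceededException are hypothetical names, and the sketch keeps totals in memory where the real thing persists them.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical exception raised when a conversation has used up its cap.
public sealed class BudgetExceededException : Exception
{
    public BudgetExceededException(string id, decimal spent, decimal limit)
        : base($"Conversation '{id}' spent {spent} USD of a {limit} USD budget.") { }
}

public sealed class ConversationBudget
{
    private readonly decimal _limitUsd;
    private readonly ConcurrentDictionary<string, decimal> _spent = new();

    public ConversationBudget(decimal limitUsd) => _limitUsd = limitUsd;

    // Call before each model request: the breaker is open once the cap is reached.
    public void EnsureOpen(string conversationId)
    {
        _spent.TryGetValue(conversationId, out var spent);
        if (spent >= _limitUsd)
            throw new BudgetExceededException(conversationId, spent, _limitUsd);
    }

    // Call after each model response with the metered cost; returns the running total.
    public decimal Record(string conversationId, decimal costUsd)
        => _spent.AddOrUpdate(conversationId, costUsd, (_, total) => total + costUsd);
}
```

Checking before the call and recording after it means one request can overshoot the cap slightly, but the next one is refused. That trade-off is an assumption of this sketch, not a statement about how redb.Route.Llm meters spend.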

Day 30. "We need every agent action to be auditable." Already done: ToolAuditProps started writing the day tools became routes. The auditor runs one SQL query: every call this quarter, filtered by tenant and side effect. No "let's set up an integration".

Day 45. "Legal wants human-in-loop on the refund tool." Add .Process<ApprovalGate>() before exec. ApprovalProps writes, the Slack bot is just another HTTP route. Done.

Day 60. "We're onboarding a second customer — separate database, separate billing." Add Process(e => e.In.Headers[LlmHeaders.Redb] = e.In.Headers["X-Tenant"]). One route, two tenants, zero changes to logic.

Day 90. "Run a million tickets through the classifier overnight." The route is already there. Change one knob: .Mode(LlmMode.Batch) — a synchronous call becomes an Anthropic Batch, LlmBatchProps records the id, LlmCallbackProcessor waits for the webhook, results land in the same Kafka the live route uses during the day. -50% on the bill.

Day 120. "Which prompt versions regressed?" EvalRunProps already stores runs. SQL across PromptTemplateRef, accuracy diff, v4 is bad, roll back to v3. No redeploy.
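
The "accuracy diff" here is a small group-by. A minimal sketch, assuming eval runs can be exported as rows shaped like the hypothetical EvalRun record below (the real EvalRunProps schema isn't shown in this post) and that a 2-point drop is the regression threshold:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: one golden-set case evaluated against one prompt version.
public record EvalRun(string PromptRef, int Version, bool Passed);

public static class PromptRegression
{
    // Accuracy per version of one prompt, ordered by version number.
    public static List<(int Version, double Accuracy)> AccuracyByVersion(
        IEnumerable<EvalRun> runs, string promptRef)
        => runs.Where(r => r.PromptRef == promptRef)
               .GroupBy(r => r.Version)
               .OrderBy(g => g.Key)
               .Select(g => (Version: g.Key, Accuracy: g.Average(r => r.Passed ? 1.0 : 0.0)))
               .ToList();

    // Versions whose accuracy fell by more than `tolerance` versus the previous version.
    public static List<int> Regressed(
        IEnumerable<EvalRun> runs, string promptRef, double tolerance = 0.02)
    {
        var byVersion = AccuracyByVersion(runs, promptRef);
        return byVersion.Zip(byVersion.Skip(1), (prev, next) => (prev, next))
                        .Where(p => p.prev.Accuracy - p.next.Accuracy > tolerance)
                        .Select(p => p.next.Version)
                        .ToList();
    }
}
```

If v3 passes 9 of 10 golden cases and v4 passes 7 of 10, Regressed returns v4, which is the version to roll back from.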

There's no "and then we rewrote it on a Kubernetes operator" beat in this story. There's no "and then we added an AI platform" beat. The six lines that worked as a chat demo on Day 1 are literally the same file that on Day 120 is running a multi-tenant, jury-arbitrated, batch-classifying, audit-grade platform. No migration, no rewrite, no architecture v2 — just lines accreting into the route as the requirements showed up. The chat demo doesn't grow into a platform — it was standing on a platform from day one, just opting into more of its features over time.

This is the pitch I couldn't quite articulate at announcement time. Now I can. The deliverable here isn't an agent framework. It's a runway. You walk along the runway and you're already platform-shaped.

Demo routes — see it live

The patterns above aren't concept art. The repo redbase-app/redb-route ships two demo files:

LlmDemoRoutes.cs — three shapes of the LLM call: inline step (.Llm("demo-stub")), endpoint (.To("llm://demo-stub")), tool (.AsLlmTool("echo_tool")). All on the stub provider, no API keys.
LlmHttpRoutes.cs — two HTTP endpoints (/api/llm/ask no tools, /api/llm/shell with the shell tool through redb.Route.Exec), both passing X-Chat-Id for conversation memory, both running real Claude Haiku.

dotnet run from redb.Route.Demo, port 5088 opens, and curl works:

curl -d "what time is it on this host?" -H "X-Chat-Id: test1" \
     http://localhost:5088/api/llm/shell

Repo: github.com/redbase-app/redb-route (Apache 2.0).

What's not done, and why I keep saying so

Sliding-window memory isn't done. A real vector store isn't done. There's no AI-graph editor in tsak.web — runtime conversation inspection shows up as plain REDB objects in the existing UI, no special pane. There's no built-in eval-service integration — EvalRunStore records the runs, but "click compare against prod" is manual.

That's fine. The skip-list is a technique, not an apology. An open-source project that lies in its README about scope is one you don't come back to. I'd rather say "not yet" and ship it next minor than say "shipped" and field issues for two months.

Zooming out: why the ESB shape matters

The phrase "AI-native architecture" is fashionable. What it usually means is "we built everything around the LLM" — and what usually backs that up is a new dev stack running parallel to the old one. That's not an architecture decision. That's a policy duplication problem.

If you have retry in two stacks, audit in two stacks, governance in two stacks, tenant isolation in two stacks — those two implementations drift. A year later, the AI stack adds bounded-context auditing, the integration stack doesn't. Legal asks for one report — nobody can produce it. The duplication wasn't visible on day one; it's load-bearing on day three hundred.

The ESB shape is a deliberate choice of one control point for I/O of all kinds. The LLM is a kind of I/O — asynchronous, with tools, with context, but I/O. Putting it inside the ESB isn't a philosophical pose; it's engineering economy: one governance policy covers everything.

This is not new. Garland and Ripley wrote it up in the SOA literature of the 2000s. What's new is that in 2026 the argument finally applies to LLMs, because LLMs grew up enough to expose standard interfaces: tool use, streaming, batch APIs, embeddings. Before that, "LLM in the ESB" meant "wrap a REST call and pray". Now it means "use existing EIP patterns with minor adaptations".

That economy is the entire pitch of redb.Route.Llm. Not "we have the best agent engine." Microsoft and LangChain have better agent engines. We have a perfectly average agent engine in the right place in the architecture. I'll take that trade.

Roadmap

3.1.2:

sliding-window memory as a built-in policy;
a vector-store interface behind IKnowledgeStore;
pgvector and Qdrant as the first back-ends;
an EvalCompare DSL for side-by-side prompt-version runs.

3.2:

a ConversationProps tree UI in tsak.web;
streaming-aware aggregator EIP (buffer partial frames to semantic boundaries);
distributed batch — multiple workers behind one LlmCallbackProcessor.

Later:

multi-modal (image input/output as a payload type);
voice agents as another transport (voice://...);
routing by cost / latency / accuracy SLA per message.

Links

Part 1 (3.1.0 announcement): redb.Route 3.1.0 — LLM as just another connector
redb.Route — Apache Camel for .NET
EIP series part 1 — channels and Exchange
redb.Route 3.0.1 patch notes
GitHub: github.com/redbase-app/redb-route (Apache 2.0)
Demo routes: redb.Route/demos/redb.Route.Demo/Routes/LlmDemoRoutes.cs, LlmHttpRoutes.cs
Habr companion (Russian, not a translation): link added after publication

TL;DR. Three weeks ago I shipped 3.1.0 with an honest skip-list. Three weeks later, most of it is shipped: streaming end-to-end, tool cache, RAG knowledge store, async batch + callback consumer, eval-run store, versioned prompt registry, multi-tenant ?redb=<name>, idempotent tool retries, human-in-loop approval gates, full audit. On top, two patterns that ship the architecture's real punchline: a jury of cheap models with a senior model as arbiter, dropping straight onto stock Scatter-Gather + Aggregator EIP — no custom orchestrator; and sub-agents as tools — recursive agent composition where one agent's tool is itself an agent with its own model, prompt, budget, RAG corpus, all working out of the box because RouteToolBridge → direct: → llm:// is a fixed point. Plus three honest enterprise scenarios with code: a daily CFO snapshot to inbox, a weekly cash-flow forecast with jury arbitration, and a predictive plant alert that hands XGBoost the prediction and the LLM the sentence. The point isn't the feature checklist. The point is that enterprise-grade properties land on the day you actually need them, with no rewrite, because the LLM lives inside the ESB along with every other I/O. Those six lines you typed for the Day-1 chat demo are literally the same file that's running the Day-90 multi-tenant audit-grade platform. There's no "now we rewrite" moment in this story.
