<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kirandeepjassal-crypto</title>
    <description>The latest articles on DEV Community by kirandeepjassal-crypto (@kirandeepjassalcrypto).</description>
    <link>https://dev.to/kirandeepjassalcrypto</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948965%2F92a8ebec-5c78-46dc-b19b-babd45c794b0.png</url>
      <title>DEV Community: kirandeepjassal-crypto</title>
      <link>https://dev.to/kirandeepjassalcrypto</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kirandeepjassalcrypto"/>
    <language>en</language>
    <item>
      <title>MCP Deep Dive, Part 1: Why Model Context Protocol Kills Integration Glue Code for Good</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Sat, 04 Jul 2026 11:18:13 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/mcp-deep-dive-part-1-why-model-context-protocol-kills-integration-glue-code-for-good-3jcp</link>
      <guid>https://dev.to/kirandeepjassalcrypto/mcp-deep-dive-part-1-why-model-context-protocol-kills-integration-glue-code-for-good-3jcp</guid>
      <description>&lt;p&gt;Your AI roadmap does not die from a bad model. It dies from integration glue code — the hand-written adapter that wires agent number four to backend number nine, times every agent and every backend you will ever build. &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is the thing that stops that multiplication.&lt;/p&gt;

&lt;p&gt;This is Part 1 of a 15-part deep dive. Every part uses the same running example: &lt;strong&gt;Mattrx&lt;/strong&gt;, our multi-tenant marketing-analytics SaaS (.NET 9 / Azure), and every metric here is from that real system.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Before (bespoke glue)&lt;/th&gt;
&lt;th&gt;After (MCP)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Integration model&lt;/td&gt;
&lt;td&gt;N agents × M backends&lt;/td&gt;
&lt;td&gt;N agents + M servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mattrx integrations&lt;/td&gt;
&lt;td&gt;14 point-to-point clients&lt;/td&gt;
&lt;td&gt;3 MCP servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding a capability&lt;/td&gt;
&lt;td&gt;New adapter on both sides&lt;/td&gt;
&lt;td&gt;Declare one MCP tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool discovery&lt;/td&gt;
&lt;td&gt;Hardcoded per agent&lt;/td&gt;
&lt;td&gt;Discovered at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth &amp;amp; audit&lt;/td&gt;
&lt;td&gt;Reinvented per integration&lt;/td&gt;
&lt;td&gt;One OAuth/Entra boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External AI access&lt;/td&gt;
&lt;td&gt;Unsafe / not possible&lt;/td&gt;
&lt;td&gt;Scoped, governed, audited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;14 bespoke integrations collapsed to 3 MCP servers.&lt;/li&gt;
&lt;li&gt;~9,000 lines of glue code deleted — roughly a 40% cut.&lt;/li&gt;
&lt;li&gt;New-capability onboarding dropped from ~3 days to ~2 hours.&lt;/li&gt;
&lt;li&gt;Agent tool-call error rate fell from 6% to 0.8%.&lt;/li&gt;
&lt;li&gt;~85,000 MCP tool calls/day, all governed by the same boundary.&lt;/li&gt;
&lt;li&gt;~40 tool-abuse / injection attempts per week blocked at the MCP boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The one mental shift:&lt;/strong&gt; stop building integrations and start publishing capabilities. An integration teaches one agent how to call one backend. A capability is a tool &lt;em&gt;any&lt;/em&gt; agent can discover and call from its schema alone. MCP makes capabilities additive instead of multiplicative.&lt;/p&gt;

&lt;h2&gt;
  
  
  The N×M problem
&lt;/h2&gt;

&lt;p&gt;With N agents and M backends, you write up to N×M integrations, and each one re-implements auth, retries, error mapping, and logging in its own slightly wrong way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE — N agents x M backends = up to N*M bespoke integrations

Insights ---+--&amp;gt; Campaigns API (custom client)
            +--&amp;gt; Events API   (custom client)
            +--&amp;gt; KPI API      (custom client)
            +--&amp;gt; Reporting API (custom client)

Help -------+--&amp;gt; Campaigns API (a DIFFERENT custom client)
            +--&amp;gt; KPI API       (a DIFFERENT custom client)

External AI ....&amp;gt; (no safe path at all)


AFTER — N agents + M servers = N+M, one protocol

Insights ---+
Help -------+           +--&amp;gt; mattrx-analytics (campaigns, events, kpis)
External AI +--- MCP ---+--&amp;gt; mattrx-reports  (create_report, status)
(approved) -+           +--&amp;gt; mattrx-admin    (flags, exports; locked)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1. The integration explosion
&lt;/h2&gt;

&lt;p&gt;Before, every agent embedded a bespoke client for every backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BEFORE: the agent is welded to four hand-written clients.&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InsightsAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CampaignsApiClient&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// bespoke HTTP client #1&lt;/span&gt;
    &lt;span class="n"&gt;EventsApiClient&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// bespoke HTTP client #2&lt;/span&gt;
    &lt;span class="n"&gt;KpiApiClient&lt;/span&gt; &lt;span class="n"&gt;kpis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;// bespoke HTTP client #3&lt;/span&gt;
    &lt;span class="n"&gt;ReportingApiClient&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// bespoke HTTP client #4&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Each client has its own auth, retry policy, and error model.&lt;/span&gt;
    &lt;span class="c1"&gt;// The next agent we build re-implements a slice of all four.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After, each capability is declared once as an MCP tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AFTER: a capability declared once on the mattrx-analytics MCP server.&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;McpServerToolType&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AnalyticsTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ICampaignQueries&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AiPrincipal&lt;/span&gt; &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;McpServerTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"get_campaign_kpis"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Return the KPI time-series for a campaign in the caller's tenant."&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;CampaignKpis&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetCampaignKpis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Campaign id within the caller's tenant"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;campaignId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ISO-8601 range, e.g. 2026-06-01/2026-06-28"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Tenant comes from the authenticated principal — never from the arguments.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetKpisAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;campaignId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent side collapses to one client that speaks MCP to every server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
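&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Assumes `mcp` is a client from the official ModelContextProtocol C# SDK: arguments go in&lt;/span&gt;
&lt;span class="c1"&gt;// as a dictionary, and the token is passed by name because it is not the third parameter.&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CallToolAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"get_campaign_kpis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"campaignId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"4821"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2026-06-01/2026-06-28"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;cancellationToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;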
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 14 integrations → 3 servers and ~9,000 lines of glue deleted. The migration diff was almost entirely deletions.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Capability discovery
&lt;/h2&gt;

&lt;p&gt;Before, the toolset was a constant compiled into the agent, so the list and reality drifted apart. After, the server advertises its tools and the client discovers them at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AFTER: the agent asks the server what it can do, every session.&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ListToolsAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cancellationToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// named: in the official C# SDK the token is not the first parameter&lt;/span&gt;
&lt;span class="c1"&gt;// each tool: name, description, JSON Schema for args — enough for an LLM to&lt;/span&gt;
&lt;span class="c1"&gt;// decide when and how to call it, with zero hardcoding.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Discovery is the quiet superpower: ship a new tool on the server, and every agent can use it next session. &lt;strong&gt;Onboarding a capability went from ~3 days to ~2 hours.&lt;/strong&gt;&lt;/p&gt;
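
&lt;p&gt;On the wire, discovery is a single &lt;code&gt;tools/list&lt;/code&gt; response. The sketch below shows roughly what the server would advertise for &lt;code&gt;get_campaign_kpis&lt;/code&gt;, following the result shape in the MCP specification. The exact schema the .NET SDK generates may differ in detail, and the &lt;code&gt;id&lt;/code&gt; value is arbitrary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_campaign_kpis",
        "description": "Return the KPI time-series for a campaign in the caller's tenant.",
        "inputSchema": {
          "type": "object",
          "properties": {
            "campaignId": { "type": "string", "description": "Campaign id within the caller's tenant" },
            "range": { "type": "string", "description": "ISO-8601 range, e.g. 2026-06-01/2026-06-28" }
          },
          "required": ["campaignId", "range"]
        }
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note what is absent: the tenant id and the &lt;code&gt;CancellationToken&lt;/code&gt; never appear in the schema, so a caller cannot supply either one.&lt;/p&gt;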

&lt;h2&gt;
  
  
  3. One auth and audit boundary
&lt;/h2&gt;

&lt;p&gt;Before, every bespoke client reinvented auth: one used a static key, another an OAuth scope, and a third trusted a tenant id passed as an argument (the bug we shipped). After, every tool call enters through one MCP boundary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AFTER: one boundary enforces auth, scope, tenant binding, and audit for ALL tools.&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GovernedToolFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AiPrincipal&lt;/span&gt; &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IAuthorizationService&lt;/span&gt; &lt;span class="n"&gt;authz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IAiAuditLog&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;InvokeAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;McpToolCall&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Func&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;authz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AuthorizeAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequiredScope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ToolResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Denied&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequiredScope&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                 &lt;span class="c1"&gt;// tenant already bound from the token&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RecordAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One OAuth 2.1 / Entra ID boundary replaced N bespoke auth flows. &lt;strong&gt;Tool-call error rate fell from 6% to 0.8%&lt;/strong&gt; — most of those errors were auth and contract mismatches that simply stopped existing.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. A safe door for external AI
&lt;/h2&gt;

&lt;p&gt;Before, a partner wanting their AI assistant to pull your KPIs meant "build them yet another client" — so the answer was "no." After, an approved external assistant authenticates via Entra ID, gets a token scoped to its tenant and to &lt;code&gt;campaigns:read&lt;/code&gt;, and calls the exact same tools our internal agents do — discovered, scoped, and audited identically. &lt;strong&gt;A capability that simply did not exist under bespoke integration.&lt;/strong&gt;&lt;/p&gt;
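
&lt;p&gt;The scope half of that check can be sketched in a few lines. This is a minimal sketch, not Mattrx's code: &lt;code&gt;ScopeCheck&lt;/code&gt; and &lt;code&gt;HasScope&lt;/code&gt; are hypothetical names, and it assumes scopes arrive as a space-delimited &lt;code&gt;scp&lt;/code&gt; claim, which is how Entra ID issues delegated scopes. App-only tokens carry &lt;code&gt;roles&lt;/code&gt; instead, and ASP.NET Core may remap claim names, so adjust the claim type to your setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Linq;
using System.Security.Claims;

public static class ScopeCheck
{
    // Assumption: scopes arrive as one or more space-delimited "scp" claims.
    // Returns true only when the token explicitly lists the required scope.
    public static bool HasScope(ClaimsPrincipal caller, string requiredScope) =&amp;gt;
        caller.FindAll("scp")
              .SelectMany(claim =&amp;gt; claim.Value.Split(' ', StringSplitOptions.RemoveEmptyEntries))
              .Contains(requiredScope, StringComparer.Ordinal);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A token that lists &lt;code&gt;campaigns:read&lt;/code&gt; passes &lt;code&gt;HasScope(caller, "campaigns:read")&lt;/code&gt; and fails for any scope it does not list. The comparison is ordinal and exact on purpose: no prefix or wildcard matching that an external caller could exploit.&lt;/p&gt;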

&lt;h2&gt;
  
  
  What an MCP call actually looks like
&lt;/h2&gt;

&lt;p&gt;The protocol is small. Three JSON-RPC 2.0 methods do almost all the work: &lt;code&gt;initialize&lt;/code&gt; (handshake and capability negotiation), &lt;code&gt;tools/list&lt;/code&gt; (discovery), and &lt;code&gt;tools/call&lt;/code&gt; (invocation), carried over the transport (Streamable HTTP + SSE in production, stdio in local dev). That small surface is the point: any client and any server can implement it, which is exactly what makes capabilities additive.&lt;/p&gt;
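
&lt;p&gt;As a concrete illustration, here is roughly what a &lt;code&gt;tools/call&lt;/code&gt; exchange for &lt;code&gt;get_campaign_kpis&lt;/code&gt; would look like, following the message shapes in the MCP specification. The &lt;code&gt;id&lt;/code&gt; is arbitrary and the result text is a placeholder, not real Mattrx output. First the request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "get_campaign_kpis",
    "arguments": { "campaignId": "4821", "range": "2026-06-01/2026-06-28" }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And the response, matched to the request by &lt;code&gt;id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      { "type": "text", "text": "&amp;lt;serialized CampaignKpis JSON&amp;gt;" }
    ],
    "isError": false
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A tool that fails reports it inside the result with &lt;code&gt;isError&lt;/code&gt; set to &lt;code&gt;true&lt;/code&gt;, which lets the model read the failure and react. Protocol-level problems such as an unknown tool come back as a JSON-RPC &lt;code&gt;error&lt;/code&gt; object instead.&lt;/p&gt;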

&lt;h2&gt;
  
  
  When NOT to adopt MCP
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One agent, one backend (1×1).&lt;/strong&gt; A direct method call is simpler and faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A stable internal toolset with no external consumers.&lt;/strong&gt; The additive win is theoretical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ultra-low-latency hot paths.&lt;/strong&gt; MCP adds a hop and JSON-RPC framing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth is still a mess.&lt;/strong&gt; MCP's value compounds with one identity provider, so consolidate identity first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You haven't shipped a v1.&lt;/strong&gt; Build the naive integration first; adopt MCP when N or M actually grows. We did it at integration fourteen, not two.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The model to carry forward
&lt;/h2&gt;

&lt;p&gt;Integrations scale as N×M. Protocols scale as N+M. Every bespoke client you write is a multiplication you'll pay for again with the next agent. Every capability you publish as an MCP tool is an addition every future agent gets for free.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Publish capabilities, not endpoints.&lt;/strong&gt; Design each tool as a contract an unfamiliar agent can call from its schema alone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put one identity boundary in front of every tool.&lt;/strong&gt; One OAuth/Entra door, scoped per tool, tenant bound in code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat tool schemas as your public API.&lt;/strong&gt; Version them, document them, break them carefully.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/mcp-deep-dive-part-1-why-mcp-matters" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. Adopting MCP and want a second pair of eyes on where to draw your server boundaries? Reach me at randhir.jassal[at]gmail.com.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>dotnet</category>
      <category>architecture</category>
    </item>
    <item>
      <title>AI Code Review That Engineers Actually Trust: The Pipeline We Run on Every Pull Request</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Fri, 03 Jul 2026 18:37:46 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/ai-code-review-that-engineers-actually-trust-the-pipeline-we-run-on-every-pull-request-21fp</link>
      <guid>https://dev.to/kirandeepjassalcrypto/ai-code-review-that-engineers-actually-trust-the-pipeline-we-run-on-every-pull-request-21fp</guid>
      <description>&lt;p&gt;Bolting an LLM onto your pull requests is a weekend project. Building AI code review that your engineers don't disable within two weeks is the actual problem. The failure mode isn't missing bugs — it's crying wolf. Post twenty nitpicks and three hallucinations on someone's PR and they'll mute the bot forever. This is the pipeline we built on &lt;strong&gt;Mattrx&lt;/strong&gt; to earn — and keep — that trust.&lt;/p&gt;

&lt;p&gt;Mattrx is our multi-tenant marketing-analytics SaaS: ~95k lines of C#, 11 engineers, and enough pull requests that senior-reviewer time was the bottleneck. We tried the naive thing first (pipe the changed file into a model, post the output) and watched the team stop reading it within &lt;strong&gt;nine days&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Human-only / naive AI (before)&lt;/th&gt;
&lt;th&gt;AI review pipeline (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coverage&lt;/td&gt;
&lt;td&gt;selective / whole-file dump&lt;/td&gt;
&lt;td&gt;every PR, diff-focused&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First-review latency&lt;/td&gt;
&lt;td&gt;~6 hours (wait for a human)&lt;/td&gt;
&lt;td&gt;~3 minutes (AI first pass)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;none / a naked file&lt;/td&gt;
&lt;td&gt;diff + call sites + conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviewers&lt;/td&gt;
&lt;td&gt;one mega-prompt&lt;/td&gt;
&lt;td&gt;specialized dimensions, in parallel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;False positives&lt;/td&gt;
&lt;td&gt;~35% (so it gets ignored)&lt;/td&gt;
&lt;td&gt;~6% (adversarially verified)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merge control&lt;/td&gt;
&lt;td&gt;human, or nothing&lt;/td&gt;
&lt;td&gt;severity gate; human always decides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;gateway: audit, cost, secret redaction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;~90 PRs/week across 11 engineers; the pipeline reviews 100%.&lt;/li&gt;
&lt;li&gt;First-pass review latency ~6 h → ~3 min.&lt;/li&gt;
&lt;li&gt;False-positive rate ~35% → ~6% — the single number that decides whether the bot lives or dies.&lt;/li&gt;
&lt;li&gt;Defects escaping to production down ~40%; senior-reviewer time down ~30%.&lt;/li&gt;
&lt;li&gt;~$0.05 per PR (cheap model for style, frontier only for correctness).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The one mental shift:&lt;/strong&gt; AI code review is not about finding issues — models find plenty. It's about &lt;strong&gt;not crying wolf&lt;/strong&gt;. The product is trust, and trust is a false-positive-rate problem. Verify before you comment; let the AI propose and the human dispose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The naive approach — and why it collapses
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BEFORE: dump the whole changed file into one prompt, post whatever comes back.&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChangedFiles&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ReadAllTextAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompleteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Review this code and list problems:\n&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PostCommentAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// a wall of unstructured, often-wrong text&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reviews the whole file, not the change. It has no project context, so it flags your conventions as bugs. It assigns no severity, so a missing null check and a stylistic preference arrive with equal weight. And it verifies nothing, so every hallucination goes straight to the developer. The result is a ~35% false-positive rate and a team that learns, correctly, to ignore the bot.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Context assembly — review the change, not the file
&lt;/h2&gt;

&lt;p&gt;Build a review context: the diff (only what changed), the call sites of the symbols the change touches, and the project conventions for those files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ReviewContext&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;BuildAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PullRequest&lt;/span&gt; &lt;span class="n"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetDiffAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BaseSha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HeadSha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// the change, nothing else&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ReviewContext&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Diff&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChangedFiles&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddCallSites&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;symbols&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FindReferencesAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TouchedSymbols&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// bugs hide at call sites&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddConventions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conventions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ForPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;                            &lt;span class="c1"&gt;// your rules&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// diff + call sites + conventions — never a naked file&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most false positives are the model not knowing the rules of your codebase. Feed it the conventions and the call sites and it stops flagging your patterns and starts catching the bug two callers away.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Multi-dimensional reviewers, not one mega-prompt
&lt;/h2&gt;

&lt;p&gt;Specialized reviewers — correctness, security, performance, tests — each with a narrow remit, run in parallel and return &lt;strong&gt;typed, structured findings&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;record&lt;/span&gt; &lt;span class="nc"&gt;ReviewFinding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Dimension&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// "correctness" | "security" | "performance" | "tests"&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Severity&lt;/span&gt; &lt;span class="n"&gt;Severity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// Blocker | High | Medium | Low | Nit&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// one sentence&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Rationale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// why it's a defect, grounded in the diff&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;SuggestedFix&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A "security reviewer" told to hunt injection and secret leakage outperforms a generalist told to "find problems," and its output is a typed record you can gate on — not a paragraph you have to parse.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Adversarial verification — the feature that earns trust
&lt;/h2&gt;

&lt;p&gt;Before any finding is posted, a separate model is prompted to &lt;strong&gt;refute&lt;/strong&gt; it. Default to "not real" when uncertain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;IsRealAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ReviewFinding&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReviewContext&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EvaluateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;EvalRequest&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Feature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"code-review-verify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
            &lt;span class="s"&gt;$"A reviewer claims: \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Summary&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\". Using the diff and the call sites, decide "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
            &lt;span class="s"&gt;"whether this is a REAL defect that would bite in production. Actively try to "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
            &lt;span class="s"&gt;"refute it. If it depends on facts not present in the context, treat it as NOT real."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Context&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ForFinding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsReal&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Confidence&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;0.90&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// post only if a skeptic couldn't refute it&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This asymmetry is the whole game. &lt;strong&gt;Precision matters far more than recall&lt;/strong&gt; for an AI reviewer, because the cost of a false positive is the tool itself getting muted. A skeptical second pass is the cheapest precision you'll ever buy — it's what took us to ~6% FP and kept the bot alive.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Severity gating — a human on the button
&lt;/h2&gt;

&lt;p&gt;The AI proposes; the human disposes. Only blocker/high findings request changes; everything else is a non-blocking comment, and a human can always override.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;MergeAdvice&lt;/span&gt; &lt;span class="nf"&gt;Gate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IReadOnlyList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ReviewFinding&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;blocking&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Severity&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;Severity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Blocker&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="n"&gt;Severity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;High&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;MergeAdvice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Comment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="c1"&gt;// post comments, do not block&lt;/span&gt;
        &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MergeAdvice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RequestChanges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// request changes; human may override&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AI that can unilaterally block merges will, the first time it's confidently wrong, get switched off — taking its real value with it. Advisory-by-default with human override is what makes it safe to leave on.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Governance — run it through the gateway
&lt;/h2&gt;

&lt;p&gt;Every review call goes through the same governed AI gateway: per-repo token budgets, model routing (cheap model for style, frontier for correctness), &lt;strong&gt;secret redaction&lt;/strong&gt; before code leaves the boundary, and an append-only audit. Code is one of your most sensitive assets — if your AI reviewer isn't redacting secrets, capping spend, and logging what it saw, you've traded a review bottleneck for a data-governance incident.&lt;/p&gt;
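
&lt;p&gt;As a rough illustration of the redaction step, the sketch below masks two kinds of secrets before a diff would be sent to a model. The &lt;code&gt;RedactSecrets&lt;/code&gt; helper and both patterns are assumptions made for this example, not the gateway's actual rules; a production setup would more likely lean on a maintained secret scanner.&lt;/p&gt;

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the two patterns below are illustrative, not an exhaustive rule set.
static string RedactSecrets(string source)
{
    // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    var redacted = Regex.Replace(source, @"\bAKIA[0-9A-Z]{16}\b", "[REDACTED_AWS_KEY]");

    // Quoted values assigned to a name that suggests a credential (case-insensitive).
    redacted = Regex.Replace(
        redacted,
        @"(?i)\b(password|secret|api[_-]?key|token)(\s*[:=]\s*)""[^""]+""",
        "$1$2\"[REDACTED]\"");

    return redacted;
}

var sample = "var apiKey = \"sk_live_123\"; // AKIAABCDEFGHIJKLMNOP";
Console.WriteLine(RedactSecrets(sample));
```

&lt;p&gt;Pattern-based redaction is best-effort: it can miss secrets with unfamiliar shapes, which is one reason the audit record of what the reviewer saw still matters.&lt;/p&gt;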

&lt;h2&gt;
  
  
  6. The feedback loop
&lt;/h2&gt;

&lt;p&gt;Developers thumbs-up/down every comment; dimensions with poor precision get stricter verification thresholds, and conventions that keep getting mis-flagged get added to the context. That loop is why precision stays high after launch instead of drifting.&lt;/p&gt;
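
&lt;p&gt;One way the "stricter thresholds for imprecise dimensions" rule could be expressed, as a minimal sketch. &lt;code&gt;NextThreshold&lt;/code&gt;, the 0.90 precision target, the 0.02 step, and the 0.99 cap are all hypothetical values chosen for illustration; only the idea of tightening the verification threshold comes from the text above.&lt;/p&gt;

```csharp
using System;

// Hypothetical policy: raise a dimension's verification threshold when developer
// feedback shows its precision has dropped below a target.
static double NextThreshold(double current, int thumbsUp, int thumbsDown)
{
    const double TargetPrecision = 0.90; // assumed target
    const double Step = 0.02;            // assumed tightening step
    const double Cap = 0.99;             // assumed ceiling, so some findings can still pass

    var total = thumbsUp + thumbsDown;
    if (total == 0) return current;      // no feedback yet: leave the threshold alone

    var precision = (double)thumbsUp / total;
    return precision >= TargetPrecision
        ? current                        // precise enough: keep the current bar
        : Math.Min(current + Step, Cap); // too noisy: demand more confidence
}

// A dimension with 70 thumbs-up and 30 thumbs-down (70% precision) gets a stricter bar.
Console.WriteLine(NextThreshold(0.90, thumbsUp: 70, thumbsDown: 30));
```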

&lt;h2&gt;
  
  
  The honest stuff: when NOT to build this
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small team / low PR volume.&lt;/strong&gt; If a human reviews everything within the hour, the overhead isn't worth it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You haven't measured false positives.&lt;/strong&gt; Ship a noisy bot and you train your team to ignore it permanently. Pilot first, measure the false-positive rate, and roll out only once it is under ~10%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You'd let the AI block merges alone.&lt;/strong&gt; Don't. AI proposes, humans dispose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proprietary/regulated code that can't leave your boundary.&lt;/strong&gt; Self-host or redact aggressively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You think it replaces reviewers.&lt;/strong&gt; It's an assistant — architecture and design stay human.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're using it for style.&lt;/strong&gt; A linter does style deterministically, instantly, and free. Aim the AI at logic and security.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The model to carry forward
&lt;/h2&gt;

&lt;p&gt;An AI reviewer's job is to &lt;strong&gt;delete the noise so humans review what matters&lt;/strong&gt;. The models can find issues all day; the engineering is in not crying wolf. Optimize for precision over recall, verify before you comment, and keep the human on the merge button. Get the false-positive rate low enough and the tool becomes something your team relies on; get it wrong and they'll mute it in nine days — we timed it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/ai-code-review-pipeline-enterprise-teams" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. Rolling out AI code review and fighting the false-positive problem? Reach me at randhir.jassal[at]gmail.com.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>dotnet</category>
      <category>devops</category>
    </item>
    <item>
      <title>How We Deliver 15 Million Webhooks a Day Without Losing a Single Event</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Wed, 01 Jul 2026 17:52:43 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/how-we-deliver-15-million-webhooks-a-day-without-losing-a-single-event-2a9c</link>
      <guid>https://dev.to/kirandeepjassalcrypto/how-we-deliver-15-million-webhooks-a-day-without-losing-a-single-event-2a9c</guid>
      <description>&lt;p&gt;A webhook looks like the easiest feature you'll ever build: something happens, you POST it to the customer's URL. Then you ship it, and reality arrives — the endpoint is down, or slow, or returns 500, or times out, or your own process restarts mid-send. Multiply that by 15 million events a day across thousands of endpoints you don't control, and "just POST it" becomes one of the hardest reliability problems in your system.&lt;/p&gt;

&lt;p&gt;This is the design we run on &lt;strong&gt;Mattrx&lt;/strong&gt;, our multi-tenant marketing-analytics SaaS, to deliver ~15 million webhook events per day to customer-configured endpoints. The first version was a synchronous POST inside the request handler. It lost events on every deploy and turned one slow customer into an outage for everyone. This post is everything we changed, and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Naive sync POST (before)&lt;/th&gt;
&lt;th&gt;Outbox + queue + workers (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Durability&lt;/td&gt;
&lt;td&gt;events lost on crash/deploy&lt;/td&gt;
&lt;td&gt;persisted before delivery, never lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API latency&lt;/td&gt;
&lt;td&gt;blocked on the customer's endpoint&lt;/td&gt;
&lt;td&gt;decoupled; API p95 unaffected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retries&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;exponential backoff + jitter, 8 attempts / ~24h&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Isolation&lt;/td&gt;
&lt;td&gt;one slow customer stalls everyone&lt;/td&gt;
&lt;td&gt;per-tenant partitioning + concurrency caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Giving up&lt;/td&gt;
&lt;td&gt;fails the API call&lt;/td&gt;
&lt;td&gt;dead-letter queue + circuit breaker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;ad hoc&lt;/td&gt;
&lt;td&gt;HMAC-SHA256 signed, HTTPS, timestamped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicates&lt;/td&gt;
&lt;td&gt;unhandled&lt;/td&gt;
&lt;td&gt;stable event id; customers de-dupe&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;~15M events/day ≈ 175/sec average; peaks reach roughly 9–10× that (~1,500–1,730/sec).&lt;/li&gt;
&lt;li&gt;Outbox pattern → zero events dropped after a committed change (we used to lose thousands per deploy).&lt;/li&gt;
&lt;li&gt;Decoupling kept API p95 at 120 ms.&lt;/li&gt;
&lt;li&gt;First-attempt delivery ~96%; ~99.98% eventual after retries.&lt;/li&gt;
&lt;li&gt;~0.02% permanently fail → dead-letter queue → per-customer status + alert.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The one mental shift:&lt;/strong&gt; you don't control the endpoints, so you cannot prevent failure — you can only make failure survivable. Persist before you deliver, retry with discipline, isolate the slow from the fast, and make giving up a first-class, observable outcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  The naive approach — and why it collapses
&lt;/h2&gt;

&lt;p&gt;The first version delivered the webhook inside the request that caused the event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BEFORE: fire the webhook synchronously, in the request path.&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;HttpPost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"campaigns/{id}/complete"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompleteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetEndpointAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"campaign.completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;HttpClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PostAsJsonAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"campaign.completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// blocks&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the process restarts between &lt;code&gt;CompleteAsync&lt;/code&gt; and the POST, the event is gone forever. The API thread waits on the customer's endpoint, so slow customers exhaust the thread pool. A customer's 500 becomes your 500. And a transient blip is a permanently missed event.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 1: the Outbox Pattern — persist before you deliver
&lt;/h2&gt;

&lt;p&gt;Write the event into an outbox table in the &lt;strong&gt;same transaction&lt;/strong&gt; as the domain change. If it commits, the event will be delivered — later, by a separate relay. If it rolls back, the event never existed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AFTER: the outbox row commits atomically with the state change.&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;CompleteCampaignAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;campaignId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;BeginTransactionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;MarkCompletedAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;campaignId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Outbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;InsertAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;OutboxEvent&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Guid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewGuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;          &lt;span class="c1"&gt;// stable event id == idempotency key&lt;/span&gt;
        &lt;span class="n"&gt;TenantId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"campaign.completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Payload&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;campaignId&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="n"&gt;Status&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OutboxStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pending&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;CreatedAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UtcNow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CommitAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;          &lt;span class="c1"&gt;// state change + event, all or nothing&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A background relay claims rows with &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt; so multiple relay instances never grab the same row, and publishes to the queue. &lt;strong&gt;Result: events dropped per deploy went from thousands to zero.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 2: queue + dispatcher + workers — parallelism without noisy neighbours
&lt;/h2&gt;

&lt;p&gt;Publish to a queue &lt;strong&gt;partitioned by &lt;code&gt;tenantId&lt;/code&gt;&lt;/strong&gt;, and run a pool of competing-consumer workers with &lt;strong&gt;per-tenant concurrency caps&lt;/strong&gt;. Partitioning gives per-tenant ordering; the caps give isolation — a broken tenant can waste at most its own slots.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AcquireAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxPerTenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// noisy-neighbour guard&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;DeliverAndRelease&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                                      &lt;span class="c1"&gt;// don't block the receive loop&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Peak concurrency sits at roughly 1,400 in-flight deliveries, in line with what Little's law predicts: 1,730 deliveries/s × ~0.8 s per delivery ≈ 1,384 in flight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 3: retries, exponential backoff + jitter, dead-letter queue
&lt;/h2&gt;

&lt;p&gt;On a retryable failure, re-queue with a delay that grows exponentially and is &lt;strong&gt;jittered&lt;/strong&gt; to avoid synchronized retry storms. After a fixed number of attempts, dead-letter it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Exponential backoff with equal jitter (50–100% of the capped delay). 8 attempts span ~24h.&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt; &lt;span class="nf"&gt;NextDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;baseSeconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseDelaySeconds&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;MaxDelaySeconds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// cap at 6h&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jittered&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;baseSeconds&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.5&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Shared&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NextDouble&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                &lt;span class="c1"&gt;// equal jitter: 50–100% of base&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jittered&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not every failure is retryable: a &lt;code&gt;410 Gone&lt;/code&gt; or &lt;code&gt;400 Bad Request&lt;/code&gt; means stop — dead-letter immediately. A &lt;code&gt;503&lt;/code&gt;, &lt;code&gt;429&lt;/code&gt;, timeout, or connection reset is transient — retry. &lt;strong&gt;Retries lift delivery from ~96% first-attempt to ~99.98% eventual; the remaining ~0.02% land in a visible, queryable dead-letter queue.&lt;/strong&gt;&lt;/p&gt;
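
&lt;p&gt;The retry-or-stop decision above can be sketched as a single classifier. Only the statuses named in the text (400, 410, 429, 503, timeouts, connection resets) come from the article; treating every other 5xx as transient and every other non-5xx status as permanent is an assumption for this example, and &lt;code&gt;IsRetryable&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```csharp
using System;

// statusCode is null when no HTTP response arrived (timeout or connection reset).
static bool IsRetryable(int? statusCode) => statusCode switch
{
    null => true,        // no response at all: transient by assumption
    408 or 429 => true,  // request timeout, rate limited
    >= 500 => true,      // server-side failure such as 503
    _ => false,          // 400, 410 and other client errors: dead-letter immediately
};

Console.WriteLine(IsRetryable(503));  // True
Console.WriteLine(IsRetryable(410));  // False
Console.WriteLine(IsRetryable(null)); // True
```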

&lt;h2&gt;
  
  
  Fixes 4–7: idempotency, HMAC signing, circuit breaker, observability
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency:&lt;/strong&gt; every event carries a stable id (&lt;code&gt;X-Mattrx-Event-Id&lt;/code&gt;) that never changes across retries. You can't make delivery exactly-once, but you make &lt;em&gt;processing&lt;/em&gt; effectively-once by shipping a stable id and telling customers to key on it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; HMAC-SHA256 over &lt;code&gt;timestamp + id + body&lt;/code&gt; with a per-tenant secret, HTTPS only. Sign and send the &lt;em&gt;raw&lt;/em&gt; serialized body so a proxy reformatting JSON can't break verification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Circuit breaker:&lt;/strong&gt; track consecutive failures per endpoint; after 20, auto-disable and email the owner. This reclaims retry capacity otherwise burned forever by endpoints that will never succeed (~40/day for us).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; every attempt writes a delivery record (event id, endpoint, attempt, status, latency, outcome) powering a per-customer dashboard. At 15M/day, aggregate "99.98% delivered" hides the one tenant at 40%.&lt;/li&gt;
&lt;/ul&gt;
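
&lt;p&gt;To make the signing bullet concrete, here is a minimal sketch of HMAC-SHA256 signing and receiver-side verification. The article only says the signature covers &lt;code&gt;timestamp + id + body&lt;/code&gt;; joining the parts with "." and hex-encoding the result are assumptions, as are the &lt;code&gt;Sign&lt;/code&gt; and &lt;code&gt;Verify&lt;/code&gt; names and the demo values.&lt;/p&gt;

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sender side: sign the exact bytes that will go on the wire.
static string Sign(byte[] tenantSecret, string timestamp, string eventId, string rawBody)
{
    // Assumed canonical form: "timestamp.eventId.rawBody".
    var payload = Encoding.UTF8.GetBytes($"{timestamp}.{eventId}.{rawBody}");
    return Convert.ToHexString(HMACSHA256.HashData(tenantSecret, payload));
}

// Receiver side: recompute over the raw body as received and compare in constant time.
static bool Verify(byte[] tenantSecret, string timestamp, string eventId, string rawBody, string signatureHex)
{
    var expected = Convert.FromHexString(Sign(tenantSecret, timestamp, eventId, rawBody));
    byte[] actual;
    try { actual = Convert.FromHexString(signatureHex); }
    catch (FormatException) { return false; } // malformed signature header
    return CryptographicOperations.FixedTimeEquals(expected, actual);
}

var secret = Encoding.UTF8.GetBytes("tenant-secret-for-demo-only");
var body = "{\"campaignId\":\"c_42\"}";
var signature = Sign(secret, "1751392363", "evt_123", body);

Console.WriteLine(Verify(secret, "1751392363", "evt_123", body, signature));       // True
Console.WriteLine(Verify(secret, "1751392363", "evt_123", body + " ", signature)); // False
```

&lt;p&gt;The second check fails because a single added space changes the signed bytes, which is why both sides need to work from the raw body rather than a re-serialized copy.&lt;/p&gt;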

&lt;h2&gt;
  
  
  The model to carry forward
&lt;/h2&gt;

&lt;p&gt;At-least-once &lt;strong&gt;plus&lt;/strong&gt; idempotency — never exactly-once. You cannot stop the endpoints from failing, so the whole design is about surviving their failure: persist before you deliver, isolate the slow from the fast, and make giving up a loud, observable, recoverable outcome instead of a silent drop.&lt;/p&gt;

&lt;p&gt;A webhook really is just a POST. Delivering fifteen million of them a day, to endpoints you don't control, without losing one — that's a distributed system, and it deserves to be designed like one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/webhook-delivery-system-15-million-events-per-day" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. If you're designing an event/webhook delivery system and want a second pair of eyes on the failure paths, reach me at &lt;a href="mailto:randhir.jassal@gmail.com"&gt;randhir.jassal@gmail.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>dotnet</category>
      <category>webhooks</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Stop Chunking Documents: The Open Knowledge Format (OKF) for Enterprise AI</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Tue, 30 Jun 2026 18:37:28 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/stop-chunking-documents-the-open-knowledge-format-okf-for-enterprise-ai-39i4</link>
      <guid>https://dev.to/kirandeepjassalcrypto/stop-chunking-documents-the-open-knowledge-format-okf-for-enterprise-ai-39i4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/open-knowledge-format-okf-enterprise-ai" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Everyone's first RAG pipeline is the same four boxes: documents, chunk, vector DB, LLM. It demos in an afternoon and then quietly betrays you in production — stale answers, no relationships, no governance, and a model guessing from fragments. The fix is not a bigger vector index. It is to stop storing &lt;em&gt;documents&lt;/em&gt; and start storing &lt;em&gt;knowledge&lt;/em&gt;. That is Open Knowledge Format (OKF).&lt;/p&gt;

&lt;p&gt;To be clear up front, because the title is deliberately provocative: &lt;strong&gt;OKF does not kill embeddings.&lt;/strong&gt; Vectors still do the recall. What OKF kills is &lt;em&gt;blind chunking&lt;/em&gt; — slicing opaque documents into context-free fragments and hoping cosine similarity reassembles meaning. On &lt;strong&gt;Mattrx&lt;/strong&gt;, a multi-tenant marketing-analytics SaaS (.NET 9 + Azure SQL + a Python FastAPI AI service), replacing blind chunking with OKF + a Context Engine took the assistant's hallucination rate from &lt;strong&gt;18% to 3%&lt;/strong&gt; and stale-answer rate from &lt;strong&gt;11% to 1.5%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Documents → chunk → vector DB (before)&lt;/th&gt;
&lt;th&gt;OKF + Context Engine (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unit of knowledge&lt;/td&gt;
&lt;td&gt;Opaque chunk of text&lt;/td&gt;
&lt;td&gt;Typed, governed knowledge unit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;None — chunks are islands&lt;/td&gt;
&lt;td&gt;Metadata + relationships + schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshness&lt;/td&gt;
&lt;td&gt;Snapshot, rots silently&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;valid_until&lt;/code&gt; + live API refs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rules&lt;/td&gt;
&lt;td&gt;Buried in prose, ignorable&lt;/td&gt;
&lt;td&gt;First-class data the engine enforces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;Top-k cosine&lt;/td&gt;
&lt;td&gt;Hybrid + vector + graph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-hop questions&lt;/td&gt;
&lt;td&gt;Unanswerable&lt;/td&gt;
&lt;td&gt;Answered via relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Results after the rebuild:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge base restructured into &lt;strong&gt;~11,000 OKF units&lt;/strong&gt; (Markdown + metadata + relationships + APIs + schemas + business rules).&lt;/li&gt;
&lt;li&gt;Hallucination &lt;strong&gt;18% -&amp;gt; 3%&lt;/strong&gt;; faithfulness &lt;strong&gt;0.96&lt;/strong&gt;; answer-relevance &lt;strong&gt;0.91&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Context tokens/call &lt;strong&gt;14k -&amp;gt; 3.5k&lt;/strong&gt; — structure lets the engine attach the &lt;em&gt;right&lt;/em&gt; thing, not everything.&lt;/li&gt;
&lt;li&gt;Outdated-answer rate &lt;strong&gt;11% -&amp;gt; 1.5%&lt;/strong&gt; (&lt;code&gt;valid_until&lt;/code&gt; + metadata freshness).&lt;/li&gt;
&lt;li&gt;Multi-hop questions &lt;strong&gt;unanswerable -&amp;gt; answered&lt;/strong&gt; via graph retrieval.&lt;/li&gt;
&lt;li&gt;Deprecated-plan recommendations &lt;strong&gt;recurring -&amp;gt; 0&lt;/strong&gt; (business rules enforced as data).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The one mental shift:&lt;/strong&gt; a chunk is a fragment of text with no identity, no owner, and no expiry. An OKF unit is a &lt;em&gt;governed, typed, related&lt;/em&gt; piece of knowledge your context engine can reason about. Stop indexing text. Start indexing knowledge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Part 1 — The OKF Knowledge Base
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. From documents to knowledge units.&lt;/strong&gt; A chunk has no identity — you can't say who owns it, when it expires, or what it relates to. The OKF atomic unit is a Markdown body plus structured frontmatter: &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;owner&lt;/code&gt;, &lt;code&gt;version&lt;/code&gt;, &lt;code&gt;valid_from&lt;/code&gt; / &lt;code&gt;valid_until&lt;/code&gt;, &lt;code&gt;visibility&lt;/code&gt;, plus &lt;code&gt;relationships&lt;/code&gt;, &lt;code&gt;apis&lt;/code&gt;, &lt;code&gt;schemas&lt;/code&gt;, and &lt;code&gt;business rules&lt;/code&gt;. The body is still embedded; the metadata is what the engine filters and ranks on. The moment a unit has &lt;code&gt;valid_until&lt;/code&gt;, the engine can refuse to ground an answer in expired knowledge — which dropped outdated answers &lt;strong&gt;11% -&amp;gt; 1.5%&lt;/strong&gt;.&lt;/p&gt;
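
&lt;p&gt;As a rough sketch of the idea (the full article has the real C# records), an OKF unit and the validity filter might look like this in Python. The field subset, the defaults, and the choice to treat &lt;code&gt;valid_until&lt;/code&gt; as inclusive are assumptions for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class OkfUnit:
    # A subset of the frontmatter fields named above.
    id: str
    type: str
    owner: str
    version: int
    body: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "no expiry" (assumption)
    visibility: str = "internal"
    relationships: dict = field(default_factory=dict)


def is_valid(unit, today):
    """True when today falls inside the unit's validity window (inclusive)."""
    if unit.valid_from > today:
        return False
    return unit.valid_until is None or unit.valid_until >= today


def only_valid(units, today):
    """Drop expired and not-yet-valid units before ranking."""
    return [u for u in units if is_valid(u, today)]
```

&lt;p&gt;The filter runs on metadata alone, so an expired unit is excluded before its embedding is ever compared.&lt;/p&gt;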

&lt;p&gt;&lt;strong&gt;2. Relationships and schemas: the knowledge graph.&lt;/strong&gt; Chunks are islands; vector similarity finds text that &lt;em&gt;sounds alike&lt;/em&gt;, not text that's &lt;em&gt;connected&lt;/em&gt;. OKF makes edges first-class (&lt;code&gt;relates_to&lt;/code&gt;, &lt;code&gt;supersedes&lt;/code&gt;, &lt;code&gt;governed_by&lt;/code&gt;, &lt;code&gt;depends_on&lt;/code&gt;) and defines schemas for structured units — together a knowledge graph over the corpus. Graph retrieval starts from semantic seeds and expands along those edges, which is how "if a Growth customer downgrades mid-cycle, how is it prorated?" (an answer spanning three documents) became answerable. &lt;strong&gt;Vectors find the door; the graph walks the building.&lt;/strong&gt;&lt;/p&gt;
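
&lt;p&gt;One way to picture "start from semantic seeds and expand along edges" is a bounded breadth-first walk. The sketch below assumes an in-memory adjacency map and a hop limit of two; the unit ids are made up, and the real engine's traversal may differ.&lt;/p&gt;

```python
from collections import deque


def expand(seeds, edges, max_hops=2):
    """Breadth-first expansion from seed unit ids.

    edges maps a unit id to a list of (edge_type, target_id) pairs.
    Returns a dict of unit id to hop distance from the nearest seed.
    """
    seen = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        hops = seen[current]
        if hops == max_hops:
            continue  # do not walk past the hop limit
        for _edge_type, target in edges.get(current, []):
            if target not in seen:
                seen[target] = hops + 1
                queue.append(target)
    return seen


# Hypothetical corpus: the proration answer spans three linked units.
EDGES = {
    "plan-growth": [("governed_by", "rule-proration")],
    "rule-proration": [("depends_on", "billing-cycle")],
    "billing-cycle": [("relates_to", "invoice-format")],
}
```

&lt;p&gt;With &lt;code&gt;expand(["plan-growth"], EDGES)&lt;/code&gt; the walk reaches the proration rule and the billing cycle but stops before &lt;code&gt;invoice-format&lt;/code&gt;, which keeps the expansion from dragging in the whole graph.&lt;/p&gt;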

&lt;p&gt;&lt;strong&gt;3. APIs and business rules: live data and governance as data.&lt;/strong&gt; A snapshot like "Our plans are Starter, Growth, Scale…" is wrong the day the catalog changes, and a rule written in a paragraph is a suggestion the model can ignore. OKF units reference &lt;strong&gt;APIs&lt;/strong&gt; (resolved at query time, through a governed tool layer) so answers reflect the &lt;em&gt;current&lt;/em&gt; catalog, and link to &lt;strong&gt;business-rule units&lt;/strong&gt; (&lt;code&gt;enforcement: hard&lt;/code&gt;) that the engine injects as constraints. "Recommended a deprecated plan" went from a recurring complaint to &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2 — The Context Engine
&lt;/h2&gt;

&lt;p&gt;OKF is how knowledge is &lt;em&gt;stored&lt;/em&gt;; the Context Engine is how it becomes a prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Retrieval: hybrid + vector + graph.&lt;/strong&gt; Top-k cosine is one signal — it misses exact-term matches (a campaign id, an error code) and related-but-not-similar units. The engine runs &lt;strong&gt;hybrid search&lt;/strong&gt; (BM25 + vector) for recall, &lt;strong&gt;vector retrieval&lt;/strong&gt; for nuance, and &lt;strong&gt;graph retrieval&lt;/strong&gt; for connected units — all tenant-scoped and filtered to currently-valid units (&lt;code&gt;onlyValid: true&lt;/code&gt;, metadata doing security and quality work for free). This combination is the core of the &lt;strong&gt;18% -&amp;gt; 3%&lt;/strong&gt; hallucination drop and &lt;strong&gt;0.96&lt;/strong&gt; faithfulness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Memory, tool calls, prompt assembly.&lt;/strong&gt; Memory supplies what's already known; tool calls fetch the live data OKF units reference; prompt assembly packs everything to a hard token budget &lt;strong&gt;in priority order — constraints first (never trimmed), then memory, then ranked knowledge, then live data&lt;/strong&gt;. Ordering is a design decision: a budget squeeze can never silently drop the rule that keeps an answer compliant. This is how context holds at &lt;strong&gt;3.5k tokens (down from 14k)&lt;/strong&gt; while accuracy &lt;em&gt;improves&lt;/em&gt;.&lt;/p&gt;
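
&lt;p&gt;The priority-ordered packing can be sketched as a greedy fill. This is an illustration under stated assumptions: &lt;code&gt;count_tokens&lt;/code&gt; is a crude word count standing in for a real tokenizer, and the choice to raise when constraints alone exceed the budget is ours, not necessarily the engine's.&lt;/p&gt;

```python
def count_tokens(text):
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def assemble(constraints, memory, knowledge, live_data, budget):
    """Pack prompt parts in priority order under a hard token budget.

    Constraints are always included; lower tiers are added greedily
    and any item that does not fit is skipped.
    """
    parts = list(constraints)
    used = sum(count_tokens(c) for c in constraints)
    if used > budget:
        raise ValueError("constraints alone exceed the token budget")
    for tier in (memory, knowledge, live_data):
        for item in tier:
            cost = count_tokens(item)
            if used + cost > budget:
                continue  # trimmed: lower priority than what is already packed
            parts.append(item)
            used += cost
    return parts, used
```

&lt;p&gt;Because constraints are packed first and never skipped, a tight budget squeezes out live data and low-ranked knowledge before it can touch a rule.&lt;/p&gt;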

&lt;h2&gt;
  
  
  When NOT to adopt OKF
&lt;/h2&gt;

&lt;p&gt;It's an investment in structure — pay it where structure exists and matters. Skip it for a small/static corpus, genuinely unstructured knowledge, when no one can own authoring (rotten metadata is worse than none), or before you've nailed plain hybrid RAG + a cross-encoder rerank (OKF is the &lt;em&gt;next&lt;/em&gt; layer, not the first). And don't big-bang it — convert high-value, high-churn domains first (we did billing, then product, then runbooks) and leave the long tail as plain chunks.&lt;/p&gt;

&lt;p&gt;And the honest framing of the title: &lt;strong&gt;OKF does not replace embeddings.&lt;/strong&gt; Vector retrieval is still inside the Context Engine doing recall. OKF replaces &lt;em&gt;blind chunking&lt;/em&gt; and adds the structure, governance, and graph embeddings alone cannot provide. If someone sells you "vectors are dead," walk away.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model to carry forward
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Documents are what you have; knowledge is what the model needs.&lt;/strong&gt; Give every unit an identity, an owner, and an expiry; model relationships explicitly (the graph answers the multi-hop questions vectors structurally can't); encode rules as data, not prose.&lt;/p&gt;




&lt;p&gt;👉 &lt;strong&gt;The full article — the complete OKF format, the C# records and graph/retrieval/assembly code, the end-to-end query walkthrough, the migration checklist, and the full "when not to do this" — is on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://prepstack.co.in/blog/open-knowledge-format-okf-enterprise-ai" rel="noopener noreferrer"&gt;Stop Chunking Documents: The Open Knowledge Format (OKF)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>AI-Native Architecture: The 9-Layer Blueprint Every Enterprise Will Adopt by 2027</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Mon, 29 Jun 2026 13:58:21 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/ai-native-architecture-the-9-layer-blueprint-every-enterprise-will-adopt-by-2027-mcf</link>
      <guid>https://dev.to/kirandeepjassalcrypto/ai-native-architecture-the-9-layer-blueprint-every-enterprise-will-adopt-by-2027-mcf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/ai-native-architecture-enterprise-2027" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every enterprise has now shipped "an AI feature." Almost none have shipped an AI-native &lt;em&gt;architecture&lt;/em&gt;. The difference decides whether your AI roadmap survives 2027 — or quietly gets ripped out after the third incident.&lt;/p&gt;

&lt;p&gt;We learned this the expensive way on &lt;strong&gt;Mattrx&lt;/strong&gt;, a multi-tenant marketing-analytics SaaS (Angular 19 + .NET 9 / ASP.NET Core + Azure SQL, with a Python FastAPI AI service; 110k MAU, ~3,200 req/sec peak). The first "Mattrx Help" was a single MVC action calling a frontier model inline. It demoed beautifully. In production it leaked context across tenants, hallucinated 18% of the time, cost $0.021/query, and one tenant's runaway retry loop billed the entire fleet for an afternoon. The fix was not a better prompt — it was an architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Bolt-on AI (before)&lt;/th&gt;
&lt;th&gt;AI-native (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model access&lt;/td&gt;
&lt;td&gt;SDK called inline in controllers&lt;/td&gt;
&lt;td&gt;Single governed AI gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tenant isolation&lt;/td&gt;
&lt;td&gt;"Please don't leak" in the prompt&lt;/td&gt;
&lt;td&gt;Filters pushed into the data layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context size&lt;/td&gt;
&lt;td&gt;~14,000 tokens, stuff everything&lt;/td&gt;
&lt;td&gt;~3,500 tokens, assembled + ranked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Stateless, cold every turn&lt;/td&gt;
&lt;td&gt;Short-term buffer + semantic long-term&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;Naive top-k cosine&lt;/td&gt;
&lt;td&gt;Hybrid recall + cross-encoder rerank&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;One mega-prompt&lt;/td&gt;
&lt;td&gt;Orchestrator + specialists + eval gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model choice&lt;/td&gt;
&lt;td&gt;Frontier model everywhere&lt;/td&gt;
&lt;td&gt;Routed by task complexity, with fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Actions&lt;/td&gt;
&lt;td&gt;Agent had raw DB access&lt;/td&gt;
&lt;td&gt;Typed, authorized tool contracts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Production results after the rebuild:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucination rate &lt;strong&gt;18% -&amp;gt; 3%&lt;/strong&gt; (hybrid retrieval + rerank).&lt;/li&gt;
&lt;li&gt;Faithfulness &lt;strong&gt;0.96&lt;/strong&gt;, answer-relevance &lt;strong&gt;0.91&lt;/strong&gt; (offline eval set).&lt;/li&gt;
&lt;li&gt;Context tokens/call &lt;strong&gt;14k -&amp;gt; 3.5k&lt;/strong&gt; — same answers, a quarter of the spend.&lt;/li&gt;
&lt;li&gt;Cost/query &lt;strong&gt;$0.021 -&amp;gt; $0.008&lt;/strong&gt; (mostly model routing).&lt;/li&gt;
&lt;li&gt;Agentic p95 latency &lt;strong&gt;4.2s -&amp;gt; 1.8s&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;~&lt;strong&gt;40&lt;/strong&gt; prompt-injection attempts blocked/week at gateway + identity.&lt;/li&gt;
&lt;li&gt;Eval gate threshold &lt;strong&gt;0.90&lt;/strong&gt; — answers below it never reach a user.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero&lt;/strong&gt; cross-tenant leaks in the six months since.&lt;/li&gt;
&lt;li&gt;Help now deflects ~&lt;strong&gt;520 support tickets/month&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The one mental shift:&lt;/strong&gt; stop treating the model as a feature you call, and start treating it as a &lt;em&gt;tier&lt;/em&gt; you operate — with its own gateway, identity, memory, and governance, exactly like your database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The nine layers (each added after a real failure)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1 → 2. The API gateway becomes an AI gateway.&lt;/strong&gt; The first version put the model call inside a controller — no per-tenant rate limit, no token accounting, no PII redaction, no audit. Now every model call (Help, Insights, internal classification) passes through one gateway that reserves a per-tenant token budget, redacts PII before anything leaves the boundary, routes to a model, and writes an audit row. Runaway loops became a 429 for one tenant instead of a fleet-wide bill — and ~40 injection attempts/week get blocked here.&lt;/p&gt;
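
&lt;p&gt;The per-tenant budget reservation is the piece that turns a runaway loop into a single tenant's 429. A minimal sketch, assuming an in-memory counter and a fixed window (the class name is hypothetical and the window reset is left out):&lt;/p&gt;

```python
class TenantTokenBudget:
    """Tracks tokens reserved per tenant within one budget window."""

    def __init__(self, limit_per_window):
        self.limit = limit_per_window
        self.used = {}

    def reserve(self, tenant_id, tokens):
        """Return 200 and record the reservation, or 429 if it would exceed the limit."""
        spent = self.used.get(tenant_id, 0)
        if spent + tokens > self.limit:
            return 429  # only this tenant is throttled
        self.used[tenant_id] = spent + tokens
        return 200
```

&lt;p&gt;Reserving before the model call, rather than counting afterwards, is what keeps one tenant's retries from spending anyone else's budget.&lt;/p&gt;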

&lt;p&gt;&lt;strong&gt;3. Identity stops meaning "who logged in."&lt;/strong&gt; "Tell the model not to leak" is not a security control. Identity became a first-class &lt;code&gt;AiPrincipal&lt;/code&gt; (tenant, user, scopes) that propagates into retrieval and every tool call, with the tenant filter &lt;strong&gt;pushed down into the vector store&lt;/strong&gt; — data-layer enforcement, not a polite request. Zero cross-tenant leaks since.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The context layer (the one most teams skip).&lt;/strong&gt; We used to concatenate everything and hope the window was big enough (~14k tokens). But bigger context isn't better — past a point recall drops and the model misses the middle. A context assembler now treats the prompt like a budget to pack: select, rank, compress to a hard token ceiling. Result: &lt;strong&gt;14k -&amp;gt; 3.5k tokens&lt;/strong&gt;, faithfulness &lt;strong&gt;0.96&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Memory.&lt;/strong&gt; Two tiers: a short-term conversation buffer (TTL-bounded) and a long-term &lt;em&gt;semantic&lt;/em&gt; memory that persists only salient turns (decisions, preferences, corrections). It stops re-interrogating returning users — a big part of deflecting ~520 tickets/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. The knowledge base — RAG done properly.&lt;/strong&gt; Top-k cosine returns chunks that &lt;em&gt;look&lt;/em&gt; relevant and are often subtly wrong — the worst failure mode because it sounds confident. We moved retrieval to Python: hybrid recall (BM25 + vector), then a &lt;strong&gt;cross-encoder rerank&lt;/strong&gt; (the step most RAG skips), with the rule that returning &lt;em&gt;nothing&lt;/em&gt; beats returning something wrong. This single change drove hallucination &lt;strong&gt;18% -&amp;gt; 3%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Agents.&lt;/strong&gt; One mega-prompt that planned, queried, forecasted, and wrote up in a single shot drifted as a whole when any step drifted. Now an orchestrator plans, dispatches to specialist agents (sql / forecast / summarize), and &lt;strong&gt;nothing reaches the user until a 0.90 eval gate scores it&lt;/strong&gt;. Agentic p95 &lt;strong&gt;4.2s -&amp;gt; 1.8s&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. The LLM layer is a router, not a model.&lt;/strong&gt; We used the frontier model for everything — including "is this about billing or analytics?" Instrumented, &lt;strong&gt;~70% of calls were trivial&lt;/strong&gt;. A router now picks the cheapest model that can do the job, with fallback. Cost/query &lt;strong&gt;$0.021 -&amp;gt; $0.008&lt;/strong&gt;.&lt;/p&gt;
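
&lt;p&gt;A router of this kind can be sketched as "cheapest capable model first, next one on failure". The model names, relative costs, and the integer complexity score below are all hypothetical; how Mattrx scores complexity is not described here.&lt;/p&gt;

```python
# Hypothetical catalog: names, relative costs, and capability tiers are made up.
MODELS = [
    {"name": "small", "cost": 1, "max_complexity": 1},
    {"name": "medium", "cost": 5, "max_complexity": 2},
    {"name": "frontier", "cost": 20, "max_complexity": 3},
]


def route(complexity, call, models=MODELS):
    """Try capable models from cheapest to most expensive.

    call(model_name) performs the request and raises RuntimeError on failure.
    Returns (model_name, result) from the first model that succeeds.
    """
    capable = [m for m in models if m["max_complexity"] >= complexity]
    last_error = None
    for model in sorted(capable, key=lambda m: m["cost"]):
        try:
            return model["name"], call(model["name"])
        except RuntimeError as exc:
            last_error = exc  # fall back to the next cheapest model
    raise RuntimeError("no model available") from last_error
```

&lt;p&gt;A trivial classification request goes to &lt;code&gt;small&lt;/code&gt; and only climbs the price ladder if that call fails.&lt;/p&gt;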

&lt;p&gt;&lt;strong&gt;9. Business APIs become governed tools.&lt;/strong&gt; The agent used to hold a database connection — functionally an unaudited admin user. Now every action is a typed, scope-checked tool: the model &lt;em&gt;proposes&lt;/em&gt;, the tool layer &lt;em&gt;decides&lt;/em&gt;, and the tenant is bound in code, never from model-supplied arguments. A model can hallucinate an action; it cannot hallucinate a scope check.&lt;/p&gt;
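
&lt;p&gt;Here is a small sketch of "the model proposes, the tool layer decides". &lt;code&gt;AiPrincipal&lt;/code&gt; mirrors the shape described in layer 3; the tool name, the scope string, and the dictionary standing in for the database are assumptions for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiPrincipal:
    tenant_id: str
    user_id: str
    scopes: frozenset


class ScopeDenied(Exception):
    """Raised when the caller lacks the scope a tool requires."""


def pause_campaign(principal, args, store):
    """Hypothetical tool: pause a campaign owned by the caller's tenant.

    args comes from the model and is untrusted; store maps
    (tenant_id, campaign_id) to a status string.
    """
    if "campaigns:write" not in principal.scopes:
        raise ScopeDenied("campaigns:write required")
    campaign_id = str(args["campaign_id"])
    # The tenant comes from the principal; any tenant_id in args is ignored.
    key = (principal.tenant_id, campaign_id)
    if key not in store:
        raise KeyError(campaign_id)
    store[key] = "paused"
    return {"campaign_id": campaign_id, "status": "paused"}
```

&lt;p&gt;Even if the model supplies another tenant's id in its arguments, the lookup key is built from the authenticated principal, so the call can only ever touch the caller's own rows.&lt;/p&gt;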

&lt;h2&gt;
  
  
  When NOT to build this
&lt;/h2&gt;

&lt;p&gt;It earns its complexity at scale and under real risk — and is overkill otherwise: a single AI feature with no multi-tenancy, low traffic, no regulated/cross-tenant data, no v1 shipped yet, or no labeled eval set (the gate is theater without one). Smaller teams should adopt the &lt;strong&gt;gateway and the rerank&lt;/strong&gt; first and stop there. We didn't build all nine at once either — each layer was a response to a specific failure, which is exactly how you should adopt it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model to carry forward
&lt;/h2&gt;

&lt;p&gt;Treat AI as a &lt;strong&gt;tier&lt;/strong&gt;, not a &lt;strong&gt;feature&lt;/strong&gt;. Your database has a gateway, identity, isolation, and an audit trail because direct access doesn't stay safe at scale. The model is no different. Three habits: &lt;strong&gt;measure before you add a layer&lt;/strong&gt; (no bad number, no layer); &lt;strong&gt;make governance structural, not textual&lt;/strong&gt; (if a guarantee lives in a prompt, it isn't one); &lt;strong&gt;default to the cheapest path that passes eval&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;👉 &lt;strong&gt;The full article — all nine layers with the complete C# (.NET 9) and Python code, the end-to-end request flow, the adoption checklist, and the full "when not to do this" — is on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://prepstack.co.in/blog/ai-native-architecture-enterprise-2027" rel="noopener noreferrer"&gt;AI-Native Architecture: The 9-Layer Blueprint&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>dotnet</category>
      <category>azure</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI, Part 6: AI &amp; Data Governance — The Foundation Everything Grows From</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Sat, 27 Jun 2026 08:57:14 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-6-ai-data-governance-the-foundation-everything-3gen</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-6-ai-data-governance-the-foundation-everything-3gen</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-6-ai-data-governance" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. This is &lt;strong&gt;Part 6&lt;/strong&gt; (the finale) of&lt;/em&gt; Context Engineering for Enterprise AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Parts 1–5 built capability for Mattrx, a multi-tenant marketing-analytics SaaS (110k MAU, ~9,000 tenants, ~3,200 req/sec peak, ASP.NET Core / .NET 9 on Azure SQL plus a Python FastAPI AI service): a budgeted context window, a memory layer, multi-agent orchestration, an enterprise design spine, and multi-tenant isolation. This part is the layer underneath all of them — the control plane that decides &lt;em&gt;what data is allowed to enter or leave the context at all&lt;/em&gt;, for &lt;em&gt;whom&lt;/em&gt;, and for &lt;em&gt;what purpose&lt;/em&gt;. Capability without governance is a breach waiting for a date.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Governance is a control plane, not a checklist. Five controls decide what may enter or leave the context, each enforced &lt;em&gt;before&lt;/em&gt; the model sees data, never after a breach:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;The question it answers&lt;/th&gt;
&lt;th&gt;Where it's enforced&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Classification&lt;/td&gt;
&lt;td&gt;&lt;em&gt;How sensitive is this data?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;A classify gate at &lt;strong&gt;ingestion&lt;/strong&gt;; class + purpose tags to the Data Catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entitlement-aware retrieval&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;May &lt;strong&gt;this user&lt;/strong&gt; (not just this tenant) see it?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Principal ACL/group filter stacked on the Part 5 tenant filter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consent &amp;amp; purpose&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Are we &lt;strong&gt;allowed&lt;/strong&gt; to use it for AI at all?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Purpose tags + a consent registry checked at use time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lineage &amp;amp; provenance&lt;/td&gt;
&lt;td&gt;&lt;em&gt;What exactly did the model see, and why allowed?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;An append-only record per generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Policy-as-code (PDP)&lt;/td&gt;
&lt;td&gt;&lt;em&gt;Who decides, and can we prove the rule?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;One deny-by-default Policy Decision Point every path calls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Mattrx production results (3-week build):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidential/Restricted documents reaching the shared embedding store ungoverned: &lt;strong&gt;~3,100 -&amp;gt; 0&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Intra-tenant cross-principal leaks (a support agent retrieving a finance doc &lt;em&gt;inside their own tenant&lt;/em&gt;), red-team: reproducible &lt;strong&gt;-&amp;gt; 0&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;"What did the model see about subject X, and why allowed?": &lt;strong&gt;0% -&amp;gt; 100%&lt;/strong&gt; of generations; DSAR fulfillment &lt;strong&gt;~2 days -&amp;gt; under 3 minutes&lt;/strong&gt; (one query).&lt;/li&gt;
&lt;li&gt;Customer data used for eval/fine-tuning without a consenting purpose: unbounded &lt;strong&gt;-&amp;gt; 0&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Governance enforcement: &lt;strong&gt;~14 scattered sites -&amp;gt; 1 PDP + N versioned policies&lt;/strong&gt;; a policy change ships without touching service code.&lt;/li&gt;
&lt;li&gt;Governance overhead on the hot path: &lt;strong&gt;+4 ms p95&lt;/strong&gt; (PDP decision ~3 ms cached, deny-by-default). Retrieval p95 &lt;strong&gt;31 ms -&amp;gt; 35 ms&lt;/strong&gt;, recall@5 held &lt;strong&gt;0.94&lt;/strong&gt;; cost/query &lt;strong&gt;$0.008 (unchanged)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The one mental shift
&lt;/h2&gt;

&lt;p&gt;Governance is a control plane that runs &lt;em&gt;before&lt;/em&gt; retrieval, not a report you generate &lt;em&gt;after&lt;/em&gt; an incident. Make every data access a question — &lt;code&gt;can(principal, data, purpose, context)?&lt;/code&gt; — answered deny-by-default by one engine, and recorded — so the answer to "could this leak?" is a query, not a prayer.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Classification at ingestion: you cannot govern what you never labeled
&lt;/h2&gt;

&lt;p&gt;Before: everything a tenant connected was embedded the moment it arrived — no sensitivity label, no purpose. A "knowledge base" sync could vectorize a spreadsheet of customer emails and a file of API tokens, now one cosine hop from any prompt.&lt;/p&gt;

&lt;p&gt;After: a &lt;strong&gt;classify gate&lt;/strong&gt; runs at ingestion. Every asset gets a sensitivity class (Public / Internal / Confidential / Restricted) and purpose tags, is registered in a Data Catalog, and &lt;em&gt;then&lt;/em&gt; routed — Public/Internal to the pool index, Confidential to a restricted index, and &lt;strong&gt;Restricted is catalogued and quarantined, never embedded at all&lt;/strong&gt;. Cheap deterministic detectors (regex/Presidio for secrets and PII) run first; an LLM classifier handles only the ambiguous remainder. Classifying at the front door means a missing filter downstream can't expose what was never indexed. The gate moved &lt;strong&gt;~3,100&lt;/strong&gt; sensitive docs out of the shared store (0.4% over-quarantine rate, human-reviewed).&lt;/p&gt;
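
&lt;p&gt;The "cheap detectors first, classifier for the remainder" order can be sketched as follows. The two regexes are deliberately simple stand-ins (the article mentions regex/Presidio), and mapping secrets to Restricted and emails to Confidential is an assumption for the example, as is the &lt;code&gt;fallback&lt;/code&gt; callable that plays the role of the LLM classifier.&lt;/p&gt;

```python
import re

# Simplified detectors; real ones would be broader and tested against false positives.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|secret)\b\s*[:=]\s*\S+")

# Where each sensitivity class is sent after cataloguing.
ROUTES = {
    "Public": "pool",
    "Internal": "pool",
    "Confidential": "restricted",
    "Restricted": "quarantine",  # catalogued but never embedded
}


def classify(text, fallback):
    """Deterministic checks first; fallback(text) handles the ambiguous rest."""
    if SECRET.search(text):
        return "Restricted"
    if EMAIL.search(text):
        return "Confidential"
    return fallback(text)


def route(text, fallback):
    """Return (sensitivity class, destination index) for one asset."""
    label = classify(text, fallback)
    return label, ROUTES[label]
```

&lt;p&gt;Anything the detectors catch never reaches the fallback classifier, which keeps the expensive call for documents that are actually ambiguous.&lt;/p&gt;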

&lt;h2&gt;
  
  
  2. Entitlement-aware retrieval: authorize the principal, not just the tenant
&lt;/h2&gt;

&lt;p&gt;Part 5 made retrieval &lt;em&gt;tenant&lt;/em&gt;-scoped. But inside a tenant, retrieval returned anything that tenant owned to anyone in it: a support agent could pull the revenue runbook. The fix stacks a &lt;strong&gt;principal&lt;/strong&gt; predicate (the user's groups/clearance) on top of the tenant predicate; both apply at once. Mattrx runs a &lt;strong&gt;hybrid&lt;/strong&gt;: a fast ACL filter in the index, then a live revalidation of the surviving top-k against the entitlements service, so a just-revoked permission can't leak through a stale index. Intra-tenant cross-role leaks went from reproducible to &lt;strong&gt;0&lt;/strong&gt;, recall@5 held at &lt;strong&gt;0.94&lt;/strong&gt;, and the added cost was +4 ms p95.&lt;/p&gt;
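
&lt;p&gt;A compact sketch of the two-stage check: an index-side filter on tenant and groups, then a live recheck of the survivors. The hit shape and the &lt;code&gt;still_allowed&lt;/code&gt; callable (standing in for the entitlements service) are assumptions for illustration.&lt;/p&gt;

```python
def retrieve(hits, tenant_id, user_groups, still_allowed, k=5):
    """Filter ranked hits by tenant and group, then revalidate the top k.

    hits: ranked list of dicts with "id", "tenant_id", "allowed_groups".
    user_groups: set of group names for the calling principal.
    still_allowed: callable(doc_id) returning True if access holds right now.
    """
    filtered = [
        h for h in hits
        if h["tenant_id"] == tenant_id
        and not user_groups.isdisjoint(h["allowed_groups"])
    ]
    top = filtered[:k]
    # Live recheck catches permissions revoked after the index was built.
    return [h for h in top if still_allowed(h["id"])]
```

&lt;p&gt;The index filter does the bulk of the work cheaply; the live recheck only touches at most &lt;code&gt;k&lt;/code&gt; documents, which is why it can stay on the hot path.&lt;/p&gt;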

&lt;h2&gt;
  
  
  3. Consent and purpose limitation: allowed to &lt;em&gt;have&lt;/em&gt; it isn't allowed to &lt;em&gt;use&lt;/em&gt; it
&lt;/h2&gt;

&lt;p&gt;Before, anything Mattrx stored was treated as fair game by every feature: retrieval, agent analysis, eval, fine-tuning. But "we hold this to run the customer's campaigns" is not consent to "use it to improve our product." Now every asset carries &lt;strong&gt;purpose tags&lt;/strong&gt;; a &lt;strong&gt;consent registry&lt;/strong&gt; records what each tenant agreed to; each AI use names its purpose (&lt;code&gt;serve&lt;/code&gt; / &lt;code&gt;eval&lt;/code&gt; / &lt;code&gt;train&lt;/code&gt;) and the PDP refuses data whose purposes don't include it. Opt-out tenants are &lt;em&gt;structurally&lt;/em&gt; excluded from eval/train while still fully served their own features. Customer data used without a consenting purpose went from unbounded to &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Lineage and provenance: prove exactly what the model saw
&lt;/h2&gt;

&lt;p&gt;A DSAR ("what does your AI know about me, and where did it come from?") used to be a multi-day dig through stateless logs. Now every generation writes one &lt;strong&gt;append-only lineage record&lt;/strong&gt;: output hash, the exact source assets (id, version, class) that entered the prompt, the principal, the declared purpose, and the PDP decisions that allowed each source. It stores &lt;strong&gt;references and versions, not raw content&lt;/strong&gt; (so the audit trail isn't a second copy of the sensitive data), the table grants no UPDATE/DELETE, and the write is off the hot path. Provenance coverage &lt;strong&gt;0% -&amp;gt; 100%&lt;/strong&gt;; DSAR &lt;strong&gt;~2 days -&amp;gt; under 3 minutes&lt;/strong&gt;.&lt;/p&gt;
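
&lt;p&gt;To make the record shape concrete, here is a minimal in-memory sketch. The class and field names are hypothetical, and a Python list stands in for the append-only table; the point is that the record holds a hash and references, never the content itself.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceRef:
    asset_id: str
    version: int
    sensitivity: str
    decision_id: str  # the PDP decision that allowed this source


@dataclass(frozen=True)
class LineageRecord:
    output_sha256: str
    principal_id: str
    purpose: str
    sources: Tuple[SourceRef, ...]


class LineageLog:
    """Append-only by construction: exposes append and read, nothing else."""

    def __init__(self):
        self._records = []

    def append(self, output_text, principal_id, purpose, sources):
        digest = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
        record = LineageRecord(digest, principal_id, purpose, tuple(sources))
        self._records.append(record)
        return record

    def records_for_asset(self, asset_id):
        """Every generation whose prompt included the given asset."""
        return [
            r for r in self._records
            if any(s.asset_id == asset_id for s in r.sources)
        ]
```

&lt;p&gt;A DSAR then reduces to a lookup such as &lt;code&gt;records_for_asset&lt;/code&gt; over the assets tied to the data subject, rather than a search through raw logs.&lt;/p&gt;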

&lt;h2&gt;
  
  
  5. Policy-as-code: one decision point everything asks
&lt;/h2&gt;

&lt;p&gt;Each rule above — and in Parts 2/4/5 — lived as its own &lt;code&gt;if&lt;/code&gt; statements in its own service: ~14 enforcement sites, four dialects of "allowed," nothing to read, test, or change centrally. Drift was inevitable. The fix: a single &lt;strong&gt;Policy Decision Point&lt;/strong&gt; answering &lt;code&gt;can(principal, data, purpose, context)?&lt;/code&gt; &lt;strong&gt;deny-by-default&lt;/strong&gt;, with rules as versioned, unit-tested policy-as-code (Mattrx runs OPA/Rego as a sidecar). Ingest, retrieval, output, and agent tools all ask the &lt;em&gt;same&lt;/em&gt; PDP; every decision is cached (~3 ms) and recorded into lineage. The PDP &lt;strong&gt;fails closed&lt;/strong&gt; — an outage blocks access rather than risking an open one, the correct failure mode for governance (and a real availability coupling you design for).&lt;/p&gt;
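
&lt;p&gt;Mattrx runs its policies in OPA/Rego; the Python below is only a stand-in to show the deny-by-default shape of &lt;code&gt;can(principal, data, purpose, context)&lt;/code&gt;. The dictionary layouts and the &lt;code&gt;consents&lt;/code&gt; key are assumptions for the example.&lt;/p&gt;

```python
def can(principal, data, purpose, context):
    """Allow only when every check passes; anything missing or malformed denies.

    principal: {"tenant_id": str, "groups": iterable of str}
    data: {"tenant_id": str, "allowed_groups": iterable, "purposes": iterable}
    context: {"consents": {tenant_id: set of consented purposes}}
    """
    try:
        same_tenant = principal["tenant_id"] == data["tenant_id"]
        entitled = not set(principal["groups"]).isdisjoint(data["allowed_groups"])
        purpose_tagged = purpose in data["purposes"]
        consented = purpose in context["consents"].get(data["tenant_id"], set())
    except (KeyError, TypeError):
        return False  # fail closed on incomplete input
    return same_tenant and entitled and purpose_tagged and consented
```

&lt;p&gt;There is no branch that returns True by omission: an unlabeled asset, an unknown tenant, or an undeclared purpose all land on the same denial.&lt;/p&gt;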

&lt;h2&gt;
  
  
  Honest stuff
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Governance can become theater&lt;/strong&gt; — a catalog nobody trusts and policies nobody tests manufacture false confidence. The PDP earns trust only because its policies are unit-tested in CI and its decisions are recorded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification is probabilistic&lt;/strong&gt; — make the failure bounded and reviewable (every asset has a catalog row), and put humans on the quarantine queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The PDP is an availability coupling&lt;/strong&gt; — deny-by-default makes it a tier-1 dependency (sidecar, health checks, cached decisions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't classify what you can delete&lt;/strong&gt; — minimization beats governance. The cheapest data to govern is the data you never kept.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The closing mental model
&lt;/h2&gt;

&lt;p&gt;Govern at the front door (classify and decide before data is embedded). Ask one engine, deny by default (every path asks the same PDP; if a new path doesn't ask, it's a hole). Record so you can prove it (append-only lineage turns "trust us" into "here's the query"). If you can't reconstruct what the model saw and why it was allowed, you don't have governance — you have hope.&lt;/p&gt;




&lt;p&gt;👉 &lt;strong&gt;The full article — with all the C# (.NET 9), Python, OPA/Rego, and SQL code, the control-plane diagram, the pre-ship checklist, and the full "honest stuff" — is on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-6-ai-data-governance" rel="noopener noreferrer"&gt;Context Engineering for Enterprise AI, Part 6&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>architecture</category>
      <category>security</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI, Part 5: Multi-Tenant Patterns That Don't Leak, Starve, or Overspend</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Thu, 25 Jun 2026 14:47:19 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-5-multi-tenant-patterns-that-dont-leak-starve-or-b63</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-5-multi-tenant-patterns-that-dont-leak-starve-or-b63</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-5-multi-tenant-patterns" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. This is &lt;strong&gt;Part 5&lt;/strong&gt; of Context Engineering for Enterprise AI.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Parts 1–4 built the context layer for Mattrx, a multi-tenant marketing-analytics SaaS (110k MAU, ~9,000 tenants, ~3,200 req/sec peak, ASP.NET Core / .NET 9 + a separate Python FastAPI AI service). Every one of those parts quietly leaned on one primitive — the tenant boundary. This part makes it the design surface instead of an afterthought.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy is not a &lt;code&gt;WHERE tenant_id = ?&lt;/code&gt; you sprinkle on queries. It is a single boundary that has to hold across every surface of the context pipeline — identity, retrieval, memory, prompt, cache, rate limit, model routing, cost, and residency — each with its own failure mode.&lt;/p&gt;

&lt;p&gt;Mattrx results after making tenant scope a first-class context primitive (2-week build):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-tenant leak incidents (docs + cache + prompt + logs, load + red-team): &lt;strong&gt;0&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Noisy-neighbor: other tenants' p95 during the whale's nightly batch: &lt;strong&gt;6.4s -&amp;gt; 1.9s&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Prompt-cache hit on the shared system preamble: &lt;strong&gt;0% -&amp;gt; 71%&lt;/strong&gt; (~50% cheaper prefill).&lt;/li&gt;
&lt;li&gt;Per-tenant cost attribution: &lt;strong&gt;0% (one blended bill) -&amp;gt; 100%&lt;/strong&gt; (per-call ledger).&lt;/li&gt;
&lt;li&gt;Runaway-tenant spend in one hour: &lt;strong&gt;~$140 -&amp;gt; $5&lt;/strong&gt; (budget cap trips; clean 429).&lt;/li&gt;
&lt;li&gt;New-tenant isolation onboarding: &lt;strong&gt;~2 days manual -&amp;gt; &amp;lt; 5 min automated&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Retrieval p95 on the pool index with a tenant filter at ~9,000 tenants: &lt;strong&gt;31 ms (held)&lt;/strong&gt;; whale on a dedicated index: &lt;strong&gt;22 ms&lt;/strong&gt;, and pool recall@5 recovered 0.88 -&amp;gt; &lt;strong&gt;0.94&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Cost per AI query: &lt;strong&gt;$0.008 (unchanged)&lt;/strong&gt; — now attributed, capped, and routed per plan.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The one mental shift
&lt;/h2&gt;

&lt;p&gt;Stop treating the tenant as a filter you remember to add. Treat tenant scope as part of the context itself — resolved once at the edge, carried as an unforgeable token through retrieval, prompt, cache, model, and ledger. If isolation depends on anyone &lt;em&gt;remembering&lt;/em&gt; to add a filter, it will leak the day someone forgets.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Tenant identity: resolve once, never trust the body
&lt;/h2&gt;

&lt;p&gt;The first multi-tenant version read &lt;code&gt;tenant_id&lt;/code&gt; from wherever was convenient — a query string, the JSON body — and passed it to the AI service. Any authenticated user could read another tenant by changing one field.&lt;/p&gt;

&lt;p&gt;The fix: resolve the tenant &lt;strong&gt;once&lt;/strong&gt;, at the edge, from the signed &lt;code&gt;workspace_id&lt;/code&gt; claim in the JWT. Everything downstream receives a &lt;code&gt;TenantScope&lt;/code&gt; it cannot forge or widen, and &lt;em&gt;every&lt;/em&gt; method signature gains a required &lt;code&gt;TenantScope&lt;/code&gt; — so the type system refuses to compile a query that forgot the tenant. Moving identity from the body to the token closed the entire "change one field, read another tenant" class: &lt;strong&gt;0&lt;/strong&gt; successful cross-tenant reads in red-team testing.&lt;/p&gt;
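&lt;p&gt;A rough Python sketch of the same idea, assuming the JWT signature has already been verified upstream and that &lt;code&gt;verified_claims&lt;/code&gt; is the resulting claims dictionary. Python has no compile-time check, so the runtime &lt;code&gt;isinstance&lt;/code&gt; guard stands in for what the C# type system does; &lt;code&gt;resolve_scope&lt;/code&gt; and &lt;code&gt;list_reports&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    # Frozen: downstream code cannot widen or swap the tenant.
    tenant_id: str


def resolve_scope(verified_claims):
    """Build the scope from already-verified JWT claims only, never from the body."""
    tenant_id = verified_claims.get("workspace_id")
    if not tenant_id:
        raise PermissionError("token carries no workspace_id claim")
    return TenantScope(tenant_id=tenant_id)


def list_reports(scope, store):
    """Every data access takes a TenantScope; there is no tenant-less variant."""
    if not isinstance(scope, TenantScope):
        raise TypeError("a TenantScope is required")
    return [row for row in store if row["tenant_id"] == scope.tenant_id]
```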

&lt;h2&gt;
  
  
  2. Knowledge isolation: pool by default, silo on a threshold
&lt;/h2&gt;

&lt;p&gt;Three models, and why the hybrid wins:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;th&gt;Isolation&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One index, every doc tagged &lt;code&gt;tenant_id&lt;/code&gt;, hard filter on read&lt;/td&gt;
&lt;td&gt;Logical (a missing filter = leak)&lt;/td&gt;
&lt;td&gt;The long tail of small tenants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Silo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One index per tenant&lt;/td&gt;
&lt;td&gt;Physical (nothing to forget)&lt;/td&gt;
&lt;td&gt;Whales, Enterprise, EU-residency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bridge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pool by default; promote to silo on a threshold&lt;/td&gt;
&lt;td&gt;Logical for most, physical for the few&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mattrx's choice&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pure silo is impossible at 9,000 tenants (Azure AI Search caps indexes per service). Pure pool can't give residency and lets one tenant's corpus degrade everyone's recall. Bridge keeps the cheap, instant pool for the 99% and spends physical isolation only where size, SLA, or regulation forces it. Crucially, the &lt;code&gt;tenant_id&lt;/code&gt; filter is applied &lt;strong&gt;by the router, never by the call site&lt;/strong&gt; — so no caller can forget it. Promoting the whale to its own index dropped &lt;em&gt;its&lt;/em&gt; p95 to 22 ms and recovered the long tail's recall@5 from 0.88 to &lt;strong&gt;0.94&lt;/strong&gt;.&lt;/p&gt;
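&lt;p&gt;To make "the router applies the filter" concrete, here is a small sketch. The index names and the &lt;code&gt;SILO_INDEXES&lt;/code&gt; table are invented for illustration, and the returned dictionary is a generic query description rather than a real Azure AI Search request:&lt;/p&gt;

```python
POOL_INDEX = "pool-index"                      # hypothetical index name
SILO_INDEXES = {"whale-co": "silo-whale-co"}   # hypothetical promoted tenants


def route_query(tenant_id, text):
    """Pick the index and attach the tenant filter in one place."""
    if tenant_id in SILO_INDEXES:
        # Physical isolation: the index holds only this tenant's documents.
        return {"index": SILO_INDEXES[tenant_id], "search": text, "filter": None}
    # Logical isolation: the router, not the caller, adds the filter.
    return {"index": POOL_INDEX, "search": text, "filter": {"tenant_id": tenant_id}}
```

&lt;p&gt;Callers pass only a tenant and a question, so there is no code path on which a pool query can be built without its filter.&lt;/p&gt;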

&lt;h2&gt;
  
  
  3. Prompt and policy variation: behavior as data, not branches
&lt;/h2&gt;

&lt;p&gt;Per-tenant behavior used to accrete as &lt;code&gt;if (tenantId == ...)&lt;/code&gt; branches, and large tenants got bespoke prompts rebuilt every call (a &lt;code&gt;DateTime.UtcNow&lt;/code&gt; in the system block alone meant a 0% cache hit). The fix: tenant behavior is &lt;strong&gt;data&lt;/strong&gt; (a &lt;code&gt;TenantConfig&lt;/code&gt; row), and the prompt is assembled as a &lt;strong&gt;byte-stable shared preamble first&lt;/strong&gt; (identical bytes for every tenant, so prompt caching reuses it) followed by a small tenant delta. Result: system-block cache hit &lt;strong&gt;0% -&amp;gt; 71%&lt;/strong&gt;, and every tenant inherits the &lt;em&gt;same&lt;/em&gt; injection/redaction rules — no per-tenant safety drift.&lt;/p&gt;
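&lt;p&gt;The assembly order can be sketched as follows. The preamble text and the config keys are placeholders, and &lt;code&gt;prefix_digest&lt;/code&gt; is a hypothetical helper for checking byte stability in a test, not part of any provider's caching API:&lt;/p&gt;

```python
import hashlib

# Identical bytes for every tenant: no timestamps, no tenant names.
SHARED_PREAMBLE = "You are an analytics assistant. Cite sources. Refuse when unsure.\n"


def build_system_prompt(tenant_config):
    """Shared preamble first, then a small tenant delta in a fixed key order."""
    delta_lines = [f"{key}: {tenant_config[key]}" for key in sorted(tenant_config)]
    return SHARED_PREAMBLE + "\n".join(delta_lines)


def prefix_digest(prompt):
    """Hash of the cacheable prefix, useful for asserting byte stability."""
    prefix = prompt[: len(SHARED_PREAMBLE)]
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()
```

&lt;p&gt;Two tenants with different configs should produce different prompts but the same &lt;code&gt;prefix_digest&lt;/code&gt;; a unit test on that equality would catch a stray timestamp before it reaches production.&lt;/p&gt;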

&lt;h2&gt;
  
  
  4. Cache, quota, and routing: stop one tenant starving (or impersonating) another
&lt;/h2&gt;

&lt;p&gt;Two shared resources had no tenant in them. The answer cache was keyed only by the question hash — so A's cached answer was served to B. And one global rate limiter let the whale's nightly batch consume the whole budget.&lt;/p&gt;

&lt;p&gt;Fixes: the cache key carries residency + tenant (&lt;code&gt;residency:tenant:hash&lt;/code&gt;), so answers can't cross the boundary; rate limiting is &lt;strong&gt;partitioned by tenant&lt;/strong&gt; with per-plan token buckets (&lt;code&gt;QueueLimit = 0&lt;/code&gt; → a clean 429 with &lt;code&gt;Retry-After&lt;/code&gt;, not an unbounded queue); model routing respects plan &lt;strong&gt;and&lt;/strong&gt; remaining budget (over budget downgrades the model, never 500s). Partitioned fairness held small tenants' p95 at &lt;strong&gt;1.9s&lt;/strong&gt; during the whale's batch, down from &lt;strong&gt;6.4s&lt;/strong&gt;.&lt;/p&gt;
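&lt;p&gt;A compact sketch of both fixes, assuming an in-process dictionary of buckets and a caller-supplied clock (&lt;code&gt;now&lt;/code&gt;, in seconds). The capacity and refill numbers are illustrative, not Mattrx's per-plan values, and the production limiter is ASP.NET Core's, not this class:&lt;/p&gt;

```python
import hashlib


def cache_key(residency, tenant_id, question):
    """Key layout residency:tenant:hash, so equal questions never share an entry."""
    digest = hashlib.sha256(question.encode("utf-8")).hexdigest()
    return f"{residency}:{tenant_id}:{digest}"


class TokenBucket:
    """One bucket per tenant; an empty bucket rejects instead of queueing."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def try_take(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller maps this to HTTP 429 with Retry-After


buckets = {}


def allow(tenant_id, now, capacity=2, refill_per_sec=1.0):
    """Look up (or create) the tenant's own bucket and try to take one token."""
    if tenant_id not in buckets:
        buckets[tenant_id] = TokenBucket(capacity, refill_per_sec)
    return buckets[tenant_id].try_take(now)
```

&lt;p&gt;Because each tenant drains only its own bucket, a tenant that exhausts its tokens is rejected while every other tenant's requests still pass.&lt;/p&gt;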

&lt;h2&gt;
  
  
  5. Cost attribution, residency, and the RLS backstop
&lt;/h2&gt;

&lt;p&gt;Every model call books cost to a per-tenant &lt;strong&gt;ledger&lt;/strong&gt;, which feeds dashboards, the budget cap from Section 4, and a usage-based billing export. Residency is enforced by region-pinned indexes &lt;strong&gt;and&lt;/strong&gt; Azure SQL &lt;strong&gt;Row-Level Security&lt;/strong&gt;, so even a buggy query that forgets the tenant filter returns &lt;em&gt;nothing&lt;/em&gt; instead of leaking. The ledger took attribution &lt;strong&gt;0% -&amp;gt; 100%&lt;/strong&gt; and capped a runaway tenant's hour at &lt;strong&gt;$5&lt;/strong&gt; instead of ~$140; RLS + region pinning kept EU data in-region with &lt;strong&gt;0&lt;/strong&gt; cross-region reads in audit.&lt;/p&gt;

&lt;p&gt;The subtle part: RLS can &lt;em&gt;mask&lt;/em&gt; a missing app filter by silently returning no rows. So Mattrx &lt;strong&gt;alerts&lt;/strong&gt; when RLS filters a row the app should have scoped — the backstop catching something is a signal, not a success.&lt;/p&gt;
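&lt;p&gt;The ledger and its budget cap can be sketched in a few lines. This assumes an hourly cap in USD and an in-memory store; &lt;code&gt;CostLedger&lt;/code&gt; and its method names are hypothetical, and a real ledger would be a durable table:&lt;/p&gt;

```python
from collections import defaultdict


class CostLedger:
    """Per-tenant spend, recorded per call and checked before the next call."""

    def __init__(self, hourly_cap_usd):
        self.hourly_cap_usd = hourly_cap_usd
        self.spend = defaultdict(float)  # (tenant_id, hour) -> USD

    def record(self, tenant_id, hour, cost_usd):
        """Book one model call's cost against the tenant and hour."""
        self.spend[(tenant_id, hour)] += cost_usd

    def over_budget(self, tenant_id, hour):
        """True once the tenant has reached the cap for that hour."""
        return self.spend[(tenant_id, hour)] >= self.hourly_cap_usd
```

&lt;p&gt;Checking &lt;code&gt;over_budget&lt;/code&gt; before each call is what turns attribution into a cap: the same rows that feed the dashboard also decide whether the next request is served.&lt;/p&gt;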

&lt;h2&gt;
  
  
  The closing mental model
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy is one boundary, enforced in many places, resolved once and never re-derived. Three habits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resolve at the edge, carry as a token.&lt;/strong&gt; Derive &lt;code&gt;TenantScope&lt;/code&gt; from auth once; make every layer take it as a required input; let the type system reject any tenant-less call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default to pool, promote on a threshold.&lt;/strong&gt; Keep the cheap shared path for the 99%; spend physical isolation only where size, SLA, or regulation demands it — and automate the promotion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce isolation twice.&lt;/strong&gt; App filter for speed, RLS for the day the filter is missing. If a single forgotten &lt;code&gt;WHERE&lt;/code&gt; can leak, you have not isolated anything — you've documented an intention.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;👉 &lt;strong&gt;The full article — with all the C# (.NET 9) and Python code, the end-to-end boundary diagram, the pool/silo/bridge comparison, the pre-ship checklist, and the "honest stuff" caveats — is on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-5-multi-tenant-patterns" rel="noopener noreferrer"&gt;Context Engineering for Enterprise AI, Part 5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI, Part 4: Enterprise AI Design — Governance, Cost &amp; Safety</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Tue, 23 Jun 2026 15:10:19 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-4-enterprise-ai-design-governance-cost-safety-1kkh</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-4-enterprise-ai-design-governance-cost-safety-1kkh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-4-enterprise-ai-design" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;. This is &lt;strong&gt;Part 4 of 6&lt;/strong&gt; of Context Engineering for Enterprise AI.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Parts 1–3 gave us a context pipeline, a memory layer, and a multi-agent architecture. All real, all measurable — and all still a demo until you wrap them in what this part covers: governance, security, evaluation, observability, cost control, and reliability. That is the enterprise design that lets you ship AI to 110k paying users without losing sleep, money, or a compliance audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;A context pipeline without governance is a liability, not a feature. The hard part of enterprise AI is not the model — it's the boundary around it.&lt;/p&gt;

&lt;p&gt;Production metrics after the full enterprise design is in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong-answer / hallucination rate: 18% (naive RAG) → &lt;strong&gt;3%&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Faithfulness (groundedness) eval score: &lt;strong&gt;0.96&lt;/strong&gt;; answer-relevance: &lt;strong&gt;0.91&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Eval gate threshold: any change dropping faithfulness below &lt;strong&gt;0.90&lt;/strong&gt; is blocked in CI.&lt;/li&gt;
&lt;li&gt;Prompt-injection attempts blocked at the boundary: ~&lt;strong&gt;40/week&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Cost per AI query: $0.021 → &lt;strong&gt;$0.008&lt;/strong&gt; (caching + model routing + context compression).&lt;/li&gt;
&lt;li&gt;Context tokens per request: ~14,000 → &lt;strong&gt;~3,500&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Agentic query p95: 4.2s → &lt;strong&gt;1.8s&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The C# app API p95 stays &lt;strong&gt;120 ms&lt;/strong&gt; — the AI work never bled into the product API.&lt;/li&gt;
&lt;li&gt;Every AI response carries a trace id + an immutable audit row (prompt hash, tokens, cost, citations).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The one mental shift
&lt;/h2&gt;

&lt;p&gt;Stop treating the model as the system. The model is one untrusted, non-deterministic dependency inside a system you do govern. Everything around it — eval gates, the security boundary, cost routing, tracing, audit — is the part you actually own, test, and are accountable for. Engineer that, and the model becomes swappable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation gates: stop shipping prompts on vibes
&lt;/h2&gt;

&lt;p&gt;A prompt change is a code change with a non-deterministic compiler. You'd never merge a refactor without tests; don't merge a system-prompt edit without an eval.&lt;/p&gt;

&lt;p&gt;A golden set of ~200 curated (question, ideal-answer, must-cite-source) tuples lives in version control. Every prompt or model change runs the offline harness in CI, scoring &lt;strong&gt;faithfulness&lt;/strong&gt; and &lt;strong&gt;answer-relevance&lt;/strong&gt;. A change that drops faithfulness below &lt;strong&gt;0.90&lt;/strong&gt; fails the build. We sit at &lt;strong&gt;0.96&lt;/strong&gt; faithfulness, &lt;strong&gt;0.91&lt;/strong&gt; relevance.&lt;/p&gt;
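&lt;p&gt;A minimal sketch of the gate itself, assuming the judge has already produced one faithfulness score per golden-set item and that the gate compares their mean to the floor (the averaging rule is an assumption; the text only gives the 0.90 threshold):&lt;/p&gt;

```python
FAITHFULNESS_FLOOR = 0.90  # threshold quoted in the text


def mean(values):
    return sum(values) / len(values)


def eval_gate(per_item_scores, floor=FAITHFULNESS_FLOOR):
    """Return (passed, average) for a list of per-question faithfulness scores."""
    if not per_item_scores:
        return False, 0.0  # an empty golden set must not pass silently
    average = mean(per_item_scores)
    return average >= floor, average
```

&lt;p&gt;In CI, a &lt;code&gt;False&lt;/code&gt; result would translate to a non-zero exit code so the build fails.&lt;/p&gt;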

&lt;p&gt;Offline catches regressions; online catches drift. We sample ~2% of live traffic and run the same groundedness judge asynchronously (never on the hot path), alerting if rolling faithfulness dips.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security: the boundary that says no
&lt;/h2&gt;

&lt;p&gt;Every AI request passes through &lt;code&gt;AiGovernanceMiddleware&lt;/code&gt; before it can reach the Python service. It enforces RBAC, stamps the authenticated &lt;code&gt;tenant_id&lt;/code&gt; (never trusting a client-supplied one), redacts PII, and runs an injection classifier. Only a sanitized, scoped request crosses the HTTP boundary.&lt;/p&gt;

&lt;p&gt;The injection classifier is cheap on the Python side — a small, fast model plus a deny-pattern check, kept off the expensive model entirely. PII redaction happens in C# at both ingress (before the model sees it) and egress (before we log or store the answer).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; the boundary blocks ~40 prompt-injection attempts per week, and zero cross-tenant retrievals have occurred since &lt;code&gt;tenant_id&lt;/code&gt; enforcement moved from "in the query" to "in the token."&lt;/p&gt;
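&lt;p&gt;As a rough illustration of the deny-pattern and redaction steps only (the small classifier model and RBAC are out of scope here), assuming two invented deny patterns and email as the only PII type. The real middleware is C#; this Python version just shows the order of operations:&lt;/p&gt;

```python
import re

# Hypothetical deny patterns; a real list would be longer and maintained.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def looks_like_injection(text):
    return any(pattern.search(text) for pattern in DENY_PATTERNS)


def redact(text):
    return EMAIL.sub("[EMAIL]", text)


def sanitize(text, authenticated_tenant_id):
    """Return a scoped, redacted request, or None when the request is blocked."""
    if looks_like_injection(text):
        return None
    # The tenant comes from the authenticated token, never from the client payload.
    return {"tenant_id": authenticated_tenant_id, "text": redact(text)}
```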

&lt;h2&gt;
  
  
  Cost and reliability: budgets, routing, and graceful failure
&lt;/h2&gt;

&lt;p&gt;At 3,200 req/sec, a 2-cent query versus a 0.8-cent query is a $30k/month argument. And the Python service &lt;em&gt;will&lt;/em&gt; go down; the only question is whether the user sees a 500 or a graceful degrade.&lt;/p&gt;

&lt;p&gt;The C# client wraps the call in a Polly resilience pipeline (timeout + circuit breaker + fallback), and a budget gate refuses queries from a tenant that has blown its monthly AI spend. The Python service routes cheap tasks to a small model. Combined with caching and context compression, cost per query dropped from $0.021 to &lt;strong&gt;$0.008&lt;/strong&gt;.&lt;/p&gt;
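&lt;p&gt;The breaker-plus-fallback part of that pipeline can be sketched without any library. This is not Polly's API; it is a hand-rolled Python stand-in with a caller-supplied clock (&lt;code&gt;now&lt;/code&gt;), and it leaves the timeout out for brevity:&lt;/p&gt;

```python
class CircuitBreaker:
    """Open after N consecutive failures; while open, skip the call and fall back."""

    def __init__(self, failure_threshold, cooldown_sec):
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback, now):
        if self.opened_at is not None:
            if now - self.opened_at >= self.cooldown_sec:
                self.opened_at = None  # half-open: allow one trial call
            else:
                return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result
```

&lt;p&gt;While the circuit is open the failing service is not called at all, so the user gets the degraded answer immediately instead of waiting on a dependency that is known to be down.&lt;/p&gt;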

&lt;h2&gt;
  
  
  Observability and audit: trace every prompt, token, and citation
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry spans flow from the C# request through the HTTP boundary into the Python service and back, carrying the same trace id. Every AI response writes an immutable audit row: prompt hash, model, token counts, cost, and the exact citations. Mean time to answer "what did the AI cite for this response" went from "we can't" to under 30 seconds.&lt;/p&gt;
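&lt;p&gt;One way to picture the audit row, assuming a JSON-lines file as the append-only store (the actual storage is not specified above) and hypothetical field names:&lt;/p&gt;

```python
import hashlib
import json


def audit_row(trace_id, prompt, model, tokens_in, tokens_out, cost_usd, citations):
    """Build one audit record; the prompt is stored as a hash, not as text."""
    return {
        "trace_id": trace_id,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost_usd": cost_usd,
        "citations": list(citations),
    }


def append(log_path, row):
    """Append as one JSON line; rows are never updated in place."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, sort_keys=True) + "\n")
```

&lt;p&gt;Keying the row by &lt;code&gt;trace_id&lt;/code&gt; is what lets a support question about one response be answered by a single lookup.&lt;/p&gt;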

&lt;h2&gt;
  
  
  The closing mental model
&lt;/h2&gt;

&lt;p&gt;The model is the cheapest, most replaceable part of an enterprise AI system. The eval gate, the security boundary, the cost router, and the audit trail are the product — and they're the parts you can actually be held accountable for.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No prompt or model change merges without passing the eval gate.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The boundary is the only door.&lt;/strong&gt; Every AI request goes through governance or it doesn't go at all.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you can't trace it and audit it, it didn't happen.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;👉 &lt;strong&gt;The full article — with all the C# (.NET 9) and Python code, the architecture diagram, the pre-ship checklist, and the "honest stuff" caveats — is on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-4-enterprise-ai-design" rel="noopener noreferrer"&gt;Context Engineering for Enterprise AI, Part 4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI, Part 3: Multi-Agent Architecture That Survives Production</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Sun, 21 Jun 2026 17:03:03 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-3-multi-agent-architecture-that-survives-production-2bjh</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-3-multi-agent-architecture-that-survives-production-2bjh</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-3-multi-agent-architecture" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most "AI agents" in production are one giant agent with every tool and a 10,000-token prompt. It loops, stalls, and ships confident nonsense. This is Part 3 of my Context Engineering series.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reframe
&lt;/h2&gt;

&lt;p&gt;A single agent with every tool isn't "one smart system" — it's a state machine with no states, no guards, and no exits. You don't fix that with a better prompt. You fix it with structure: &lt;strong&gt;the model decides; the graph governs.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;ASP.NET Core owns orchestration, budgets, and governance; a Python LangGraph service runs the agent graph:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor → specialized workers&lt;/strong&gt; — a cheap supervisor (gpt-4o-mini) routes to a retriever, analyst, writer, and critic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model routing&lt;/strong&gt; — only the analyst uses the expensive model; the rest run on mini&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typed hand-offs&lt;/strong&gt; — workers pass structured data (Pydantic + C# records), not prose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded loops + a critic gate&lt;/strong&gt; — a hard step budget, and it can't finish until a critic verifies the answer is grounded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallelism with failure isolation&lt;/strong&gt; — concurrent retrieval with per-fetch timeouts; one slow tool degrades to best-effort instead of killing the run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The C# boundary&lt;/strong&gt; owns the global wall-clock budget, cost cap, and prompt-injection screen&lt;/li&gt;
&lt;/ol&gt;
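&lt;p&gt;The "bounded loops + a critic gate" item above can be sketched independently of LangGraph. Here &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;critic&lt;/code&gt; are hypothetical callables standing in for a worker and the critic, and the step budget is whatever the caller passes in:&lt;/p&gt;

```python
def run_agent(step, critic, max_steps):
    """Run worker steps until the critic approves or the step budget is spent."""
    draft = None
    for used in range(1, max_steps + 1):
        draft = step(draft)
        if critic(draft):
            return {"status": "done", "answer": draft, "steps": used}
    # Budget exhausted without a grounded answer: stop instead of looping on.
    return {"status": "budget_exhausted", "answer": None, "steps": max_steps}
```

&lt;p&gt;There are only two exits: the critic says yes, or the budget runs out. Neither depends on the model deciding it is finished.&lt;/p&gt;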

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;4.2s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.8s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per query&lt;/td&gt;
&lt;td&gt;$0.021&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.008&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context tokens / agentic request&lt;/td&gt;
&lt;td&gt;~12,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~3,800&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runaway loops (&amp;gt;12 calls)&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Endpoint 500 rate&lt;/td&gt;
&lt;td&gt;1.4%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Three habits: no worker without a budget, no hand-off without a type, no done without a verdict.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read the full breakdown — with all the C# and Python code — on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-3-multi-agent-architecture" rel="noopener noreferrer"&gt;https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-3-multi-agent-architecture&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>dotnet</category>
      <category>python</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI, Part 2: The Memory Layer That Makes Agents Useful</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Sat, 20 Jun 2026 19:19:26 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-2-the-memory-layer-that-makes-agents-useful-1god</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-part-2-the-memory-layer-that-makes-agents-useful-1god</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-2-memory-layer" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Your AI agent forgets everything the moment a request ends. That's not a model limitation — it's a missing &lt;strong&gt;memory layer&lt;/strong&gt;. This is Part 2 of my Context Engineering series.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reframe
&lt;/h2&gt;

&lt;p&gt;The model is a stateless function. Memory is an &lt;strong&gt;enterprise database with embeddings bolted on&lt;/strong&gt; — owned, scoped, audited, and forgettable. Not a chat transcript you keep pasting back into the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;Built across ASP.NET Core (system-of-record + governance on Azure SQL) and a Python FastAPI service (embeddings + semantic recall on Azure AI Search):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tiered memory&lt;/strong&gt; — working → short-term (Redis + SQL) → long-term episodic + semantic (vector index)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A salience-gated write policy&lt;/strong&gt; — store what matters, not every turn (long-term writes cut ~85%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval blends similarity + recency&lt;/strong&gt;, packed into the Part 1 token budget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant + user as hard query filters&lt;/strong&gt; — never prompt instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right-to-be-forgotten&lt;/strong&gt; that fans out to SQL + vectors + cache in under 5 minutes (GDPR Art. 17)&lt;/li&gt;
&lt;/ol&gt;
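&lt;p&gt;For the "similarity + recency" blend in the list above, one plausible scoring function is a weighted sum with exponential decay. The 30-day half-life and the 0.3 recency weight are assumptions for the sketch, not Mattrx's tuned values:&lt;/p&gt;

```python
import math


def blended_score(similarity, age_days, half_life_days=30.0, recency_weight=0.3):
    """Weighted mix of similarity and an exponential recency decay, both in [0, 1]."""
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return (1.0 - recency_weight) * similarity + recency_weight * recency


def rank(memories):
    """memories: list of (text, similarity, age_days) tuples; best first."""
    return sorted(memories, key=lambda m: blended_score(m[1], m[2]), reverse=True)
```

&lt;p&gt;With these weights, a slightly less similar memory from yesterday outranks a slightly more similar one from a year ago, which is the behavior a recency blend is meant to produce.&lt;/p&gt;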

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context tokens at turn 30&lt;/td&gt;
&lt;td&gt;~3,500 (vs ~14,000 history-stuffing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per query&lt;/td&gt;
&lt;td&gt;$0.021 → $0.008&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-tenant leaks (red-team)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory retrieval p95&lt;/td&gt;
&lt;td&gt;74 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-session continuity&lt;/td&gt;
&lt;td&gt;resumes the prior thread, no re-explaining&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The model is stateless. Memory is infrastructure you own — scoped, audited, and forgettable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read the full breakdown — with all the C# and Python code — on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-2-memory-layer" rel="noopener noreferrer"&gt;https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-2-memory-layer&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>dotnet</category>
      <category>python</category>
    </item>
    <item>
      <title>Context Engineering for Enterprise AI: Cutting RAG Hallucination from 18% to 3% (C# + Python)</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Wed, 17 Jun 2026 19:18:42 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-cutting-rag-hallucination-from-18-to-3-c-python-2d3f</link>
      <guid>https://dev.to/kirandeepjassalcrypto/context-engineering-for-enterprise-ai-cutting-rag-hallucination-from-18-to-3-c-python-2d3f</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-1-context-management" rel="noopener noreferrer"&gt;PrepStack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We took an enterprise RAG assistant from an &lt;strong&gt;18% wrong-answer rate to 3%&lt;/strong&gt; — without changing the model. The lever wasn't the prompt. It was the &lt;strong&gt;context&lt;/strong&gt; we assembled and fed the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mental shift
&lt;/h2&gt;

&lt;p&gt;The model isn't your product; the context you assemble is. Prompt engineering tweaks the wording. Context engineering controls what data enters the window. Treat the context window like a CPU cache — a scarce, governed resource — not a junk drawer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Naive top-k RAG dumped 8 fuzzy chunks into a ~14,000-token prompt and hoped. We replaced it with a real pipeline, split across ASP.NET Core (orchestration) and a Python FastAPI service (retrieval + ranking):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rewrite&lt;/strong&gt; the vague user question into a self-contained query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval&lt;/strong&gt; — BM25 keyword + vector, not vector-only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-encoder re-rank&lt;/strong&gt; a wide candidate pool down to the best 6&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget&lt;/strong&gt; the window (~3,500 tokens, every byte allocated)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compress&lt;/strong&gt; chunks to only the sentences that matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ground + cite&lt;/strong&gt; every claim — or &lt;strong&gt;refuse&lt;/strong&gt; and route to a human&lt;/li&gt;
&lt;/ol&gt;
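&lt;p&gt;The budgeting step in the list above can be sketched as a greedy pack over already-ranked chunks. The four-characters-per-token estimate is a crude placeholder for a real tokenizer, and &lt;code&gt;pack&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def pack(ranked_chunks, budget_tokens):
    """Take chunks in rank order, skipping any that would overflow the budget."""
    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue
        packed.append(chunk)
        used += cost
    return packed, used
```

&lt;p&gt;Skipping an oversized chunk rather than stopping lets a smaller, lower-ranked chunk still use the remaining budget; stopping at the first overflow would be the stricter alternative.&lt;/p&gt;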

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hallucination rate&lt;/td&gt;
&lt;td&gt;18%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context tokens/request&lt;/td&gt;
&lt;td&gt;~14,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~3,500&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per query&lt;/td&gt;
&lt;td&gt;$0.021&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.008&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval recall@5&lt;/td&gt;
&lt;td&gt;0.71&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.94&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The context window is a budget you spend on relevance, not a bucket you fill with hope.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read the full breakdown — with all the C# and Python code — on PrepStack:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-1-context-management" rel="noopener noreferrer"&gt;https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-1-context-management&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>dotnet</category>
      <category>python</category>
    </item>
    <item>
      <title>We Replaced REST with Kafka and Cut API Failures 90% — .NET 9 Event-Driven Architecture</title>
      <dc:creator>kirandeepjassal-crypto</dc:creator>
      <pubDate>Sun, 14 Jun 2026 13:36:35 +0000</pubDate>
      <link>https://dev.to/kirandeepjassalcrypto/we-replaced-rest-with-kafka-and-cut-api-failures-90-net-9-event-driven-architecture-43fo</link>
      <guid>https://dev.to/kirandeepjassalcrypto/we-replaced-rest-with-kafka-and-cut-api-failures-90-net-9-event-driven-architecture-43fo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — We swapped Mattrx's inter-service REST calls for Kafka topics (multi-tenant marketing analytics SaaS, .NET 9 / ASP.NET Core, Azure SQL, ~3,200 req/sec peak). Over 8 weeks: end-to-end ingestion failures &lt;strong&gt;1.9% → 0.18% (−90%)&lt;/strong&gt;, ingestion p95 &lt;strong&gt;180 ms → 8 ms (−96%)&lt;/strong&gt;, events lost on downstream outage &lt;strong&gt;tens of thousands → 0&lt;/strong&gt;, cascading-failure incidents &lt;strong&gt;~3/month → 0&lt;/strong&gt;, time to add a new consumer &lt;strong&gt;~1 sprint → ~1 day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;🎥 &lt;strong&gt;3-minute video walkthrough:&lt;/strong&gt; &lt;a href="https://youtu.be/O21rbuQdM1Y" rel="noopener noreferrer"&gt;https://youtu.be/O21rbuQdM1Y&lt;/a&gt;&lt;br&gt;
👉 &lt;strong&gt;Full deep-dive (architecture, code, pre-adoption checklist, when NOT to reach for Kafka):&lt;/strong&gt; &lt;a href="https://prepstack.co.in/blog/replaced-rest-with-kafka-cut-failures-90-percent" rel="noopener noreferrer"&gt;https://prepstack.co.in/blog/replaced-rest-with-kafka-cut-failures-90-percent&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The one mental shift
&lt;/h2&gt;

&lt;p&gt;The reflex in a .NET shop is: service A needs something to happen in service B, so A calls B's REST endpoint and waits. That's correct for a &lt;strong&gt;query&lt;/strong&gt; ("give me this tenant's plan") — A genuinely needs the answer. It's wrong for an &lt;strong&gt;event&lt;/strong&gt; ("this campaign event occurred") — A doesn't need anything back, yet it's now blocked on B, C, and D being healthy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Distinguish commands/queries (need an answer → REST) from events (fire-and-forget → log).&lt;/strong&gt; A synchronous call makes the caller's uptime a function of the callee's uptime. An event published to a durable log decouples them in time — the producer succeeds the instant the event is durably written, consumers process whenever they can.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you stop treating "something happened" as a function call and start treating it as a fact appended to a log, cascading failures, retry storms, and burst overload mostly disappear — because nothing downstream is on the request's critical path anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before — synchronous REST chain (the chain that breaks)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customer site ──► Collector ──► Enrichment ──► Analytics ──► Persister
                  (waits)       (waits)       (waits)      (waits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BEFORE — the collector waits on the whole chain. Any failure loses the event.&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CampaignEvent&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;enriched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_enrichClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EnrichAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// can fail&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_analyticsClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RollupAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// can fail&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_persistClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SaveAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="c1"&gt;// can fail&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                                      &lt;span class="c1"&gt;// only if ALL three succeed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



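&lt;p&gt;Availability multiplies: four services at 99.9% each, chained in series, give 0.999⁴ ≈ 99.6% uptime, roughly four times the downtime of any single service. The failure mode also compounds: Analytics slows down at month-end → Collector times out → the customer's event is dropped → the customer retries → even more load on the struggling service → retry storm.&lt;/p&gt;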
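
&lt;p&gt;The arithmetic is easy to check. A minimal sketch, assuming each hop fails independently with the same 99.9% availability and a 30-day month (both simplifications chosen for illustration, not measurements from Mattrx):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Assumption: each hop fails independently and is up 99.9% of the time.
const double perService = 0.999;
const int hops = 4; // Collector, Enrichment, Analytics, Persister

// A request succeeds only if every hop in the chain is up.
double chain = Math.Pow(perService, hops);

// Convert availability into expected downtime over a 30-day month.
const double minutesPerMonth = 30 * 24 * 60;
Console.WriteLine($"Chain availability: {chain:P2}");
Console.WriteLine($"Single-service downtime: {(1 - perService) * minutesPerMonth:F0} min/month");
Console.WriteLine($"Chain downtime: {(1 - chain) * minutesPerMonth:F0} min/month");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under those assumptions the chain is down for roughly 173 minutes a month, against about 43 minutes for any single service.&lt;/p&gt;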

&lt;h2&gt;
  
  
  After — Kafka log decouples the pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customer site ──► Collector ──► [Kafka: events.raw]
                  returns 202        │
                  in ~8ms            ├──► [analytics group]   (independent)
                                     ├──► [persister group]   (independent)
                                     └──► [enrichment group]  (independent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AFTER — produce ONE event to Kafka and return. No downstream on the critical path.&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CampaignEvent&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;]&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;                  &lt;span class="c1"&gt;// per-tenant ordering&lt;/span&gt;
        &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SerializeToUtf8Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ProduceAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"events.raw"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// durable, ~ms&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Accepted&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                            &lt;span class="c1"&gt;// 202 — collector p95: 8 ms&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The collector succeeds as soon as the broker acknowledges the write to the log. The customer never waits on a consumer. If Analytics is down, its consumer group lags and catches up once it recovers, with no lost events and no customer impact.&lt;/p&gt;
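
&lt;p&gt;That guarantee depends on how the producer is configured. A hedged sketch, assuming the Confluent.Kafka client (which the &lt;code&gt;ProduceAsync&lt;/code&gt; call above appears to use); the broker address and timeout value are placeholders, not Mattrx's settings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using Confluent.Kafka;

var config = new ProducerConfig
{
    BootstrapServers = "localhost:9092", // placeholder: use your cluster's address
    Acks = Acks.All,                     // complete ProduceAsync only after all in-sync replicas have the write
    EnableIdempotence = true,            // broker discards duplicates caused by producer retries
    MessageTimeoutMs = 5000,             // illustrative: fail the produce rather than queue indefinitely
};

// Build once and reuse: the producer is thread-safe and meant to be long-lived.
using var producer = new ProducerBuilder&amp;lt;string, byte[]&amp;gt;(config).Build();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;Acks.All&lt;/code&gt;, a 202 is returned only once the write is replicated (given a replication factor and &lt;code&gt;min.insync.replicas&lt;/code&gt; above 1). If &lt;code&gt;ProduceAsync&lt;/code&gt; throws instead, returning a 5xx lets the client retry rather than silently dropping the event.&lt;/p&gt;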

&lt;h2&gt;
  
  
  The consumer — groups, manual commit, retry, DLQ
&lt;/h2&gt;

&lt;p&gt;Three things matter: a &lt;strong&gt;consumer group&lt;/strong&gt; (so partitions parallelize), &lt;strong&gt;manual offset commit&lt;/strong&gt; (commit only after successful processing — at-least-once delivery), and a &lt;strong&gt;retry + dead-letter&lt;/strong&gt; path (so a poison message can't block the partition forever).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;ExecuteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
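    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Yield&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// Consume() blocks; yield first so host startup isn't held up&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"events.raw"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;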
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsCancellationRequested&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Deserialize inside the try so a malformed payload goes to the DLQ instead of crashing the loop.&lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;CampaignEvent&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)!;&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rollup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ApplyAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                              &lt;span class="c1"&gt;// do the work&lt;/span&gt;
            &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                                       &lt;span class="c1"&gt;// ✅ commit AFTER success&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TransientException&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Attempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ProduceAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"events.retry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LogError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Poison → DLQ at offset {Offset}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Offset&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ProduceAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"events.dlq"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                                       &lt;span class="c1"&gt;// don't block the partition&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
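
&lt;p&gt;The &lt;code&gt;Attempts(result)&lt;/code&gt; helper is not shown in the snippet. One possible shape, sketched here as an assumption: keep a retry counter in a message header and increment it before producing to &lt;code&gt;events.retry&lt;/code&gt;. The header name &lt;code&gt;x-attempts&lt;/code&gt; and the &lt;code&gt;RetryHeaders&lt;/code&gt; class are hypothetical, and the Confluent.Kafka client is assumed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using Confluent.Kafka;

// Hypothetical helper: tracks retry attempts in a message header.
public static class RetryHeaders
{
    private const string AttemptsHeader = "x-attempts"; // assumed header name

    // Number of times this message has already been retried (0 if the header is absent or malformed).
    public static int Attempts(Message&amp;lt;string, byte[]&amp;gt; message)
    {
        if (message.Headers is null) return 0;
        if (!message.Headers.TryGetLastBytes(AttemptsHeader, out var raw)) return 0;
        return raw.Length == sizeof(int) ? BitConverter.ToInt32(raw, 0) : 0;
    }

    // Call before producing to the retry topic so the next delivery carries an incremented count.
    public static void IncrementAttempts(Message&amp;lt;string, byte[]&amp;gt; message)
    {
        var next = Attempts(message) + 1;
        message.Headers ??= new Headers();
        message.Headers.Remove(AttemptsHeader);   // drop the old counter, if any
        message.Headers.Add(AttemptsHeader, BitConverter.GetBytes(next));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the loop above this would be called as &lt;code&gt;RetryHeaders.Attempts(result.Message)&lt;/code&gt; in the &lt;code&gt;when&lt;/code&gt; filter, with &lt;code&gt;RetryHeaders.IncrementAttempts(result.Message)&lt;/code&gt; just before the produce to &lt;code&gt;events.retry&lt;/code&gt;. Without the increment, a message that keeps failing would circle through the retry topic indefinitely.&lt;/p&gt;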



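&lt;p&gt;Consumer lag is the one health metric to watch. If lag keeps climbing, the consumer can't keep up, so scale it out by adding instances to the group (up to the topic's partition count; instances beyond that sit idle). The producer is untouched.&lt;/p&gt;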

&lt;h2&gt;
  
  
  The outbox — atomic produce with a DB write
&lt;/h2&gt;

&lt;p&gt;There's one trap. If the same handler writes to Azure SQL &lt;em&gt;and&lt;/em&gt; produces to Kafka, those are two systems. A crash between them loses or duplicates the event. For anything tied to a committed DB change, use the &lt;strong&gt;outbox pattern&lt;/strong&gt;: write the event to an outbox table in the same SQL transaction as the business data; a background relay reads unsent rows and publishes them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Atomic: the campaign row AND the event commit in ONE SQL transaction.&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;BeginTransactionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Campaigns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Outbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;OutboxMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"events.raw"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SaveChangesAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CommitAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Background relay reads unsent outbox rows, produces to Kafka, marks them sent.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dual-write inconsistencies (the "DB says published, no event fired" bug class) → 0.&lt;/p&gt;
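
&lt;p&gt;The relay itself is not shown. A simplified sketch of one polling pass, with an in-memory list standing in for the outbox table and a delegate standing in for &lt;code&gt;producer.ProduceAsync&lt;/code&gt;; &lt;code&gt;OutboxRow&lt;/code&gt;, its &lt;code&gt;Id&lt;/code&gt; and &lt;code&gt;Sent&lt;/code&gt; members, and &lt;code&gt;OutboxRelay&lt;/code&gt; are all hypothetical names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical row shape; the snippet above only shows Topic and Payload.
public sealed record OutboxRow(long Id, string Topic, string Payload)
{
    public bool Sent { get; set; }
}

public sealed class OutboxRelay
{
    private readonly List&amp;lt;OutboxRow&amp;gt; _table;               // stand-in for the SQL outbox table
    private readonly Func&amp;lt;string, string, Task&amp;gt; _publish;  // stand-in for producer.ProduceAsync

    public OutboxRelay(List&amp;lt;OutboxRow&amp;gt; table, Func&amp;lt;string, string, Task&amp;gt; publish)
    {
        _table = table;
        _publish = publish;
    }

    // One polling pass: publish unsent rows oldest-first and mark each one sent
    // only after its publish has succeeded. Returns the number of rows relayed.
    public async Task&amp;lt;int&amp;gt; RelayOnceAsync(int batchSize = 100)
    {
        var batch = _table.Where(r =&amp;gt; !r.Sent).OrderBy(r =&amp;gt; r.Id).Take(batchSize).ToList();
        foreach (var row in batch)
        {
            await _publish(row.Topic, row.Payload); // if this throws, Sent stays false and the row is retried
            row.Sent = true;                        // a crash just before this line causes a duplicate publish
        }
        return batch.Count;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Marking a row sent only after the publish succeeds means a crash can cause a duplicate publish but never a lost one, which is why the relay pairs with the idempotent consumers described next.&lt;/p&gt;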

&lt;h2&gt;
  
  
  Idempotent consumers — because delivery is at-least-once
&lt;/h2&gt;

&lt;p&gt;Kafka gives you at-least-once delivery. A consumer can crash after processing but before committing and reprocess on restart. So consumers must be &lt;strong&gt;idempotent&lt;/strong&gt;. The simplest tool is a dedup key with a unique constraint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;ApplyAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CampaignEvent&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;firstTime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ProcessedEvents&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ExecuteInsertIfAbsentAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                 &lt;span class="c1"&gt;// custom helper, not built into EF Core; relies on a unique index on EventId&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;firstTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                     &lt;span class="c1"&gt;// duplicate → no-op, safe&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Aggregates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;IncrementAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CampaignId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



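&lt;p&gt;At-least-once delivery + idempotent processing = &lt;strong&gt;effectively-once&lt;/strong&gt;, and it's far simpler than chasing true exactly-once. One caveat: the dedup insert and the aggregate update must commit in the same database transaction. Otherwise a crash between the two marks the event as processed without ever applying it.&lt;/p&gt;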
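
&lt;p&gt;To see the dedup key at work without a database, here is a small in-memory sketch. The &lt;code&gt;HashSet&lt;/code&gt; plays the role of the unique index on &lt;code&gt;EventId&lt;/code&gt;; &lt;code&gt;IdempotentRollup&lt;/code&gt; and its members are illustrative names, not Mattrx code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;

var rollup = new IdempotentRollup();
var eventId = Guid.NewGuid();

Console.WriteLine(rollup.Apply(eventId, "campaign-1")); // True: first delivery is applied
Console.WriteLine(rollup.Apply(eventId, "campaign-1")); // False: redelivery is a no-op
Console.WriteLine(rollup.CountFor("campaign-1"));       // 1

// In-memory stand-in for the ProcessedEvents table and the aggregates it guards.
public sealed class IdempotentRollup
{
    private readonly HashSet&amp;lt;Guid&amp;gt; _processed = new();          // plays the role of the unique index
    private readonly Dictionary&amp;lt;string, int&amp;gt; _counts = new();

    // Returns true if the event was applied, false if it was a duplicate.
    public bool Apply(Guid eventId, string campaignId)
    {
        if (!_processed.Add(eventId)) return false; // Add returns false when the id is already present
        _counts[campaignId] = _counts.GetValueOrDefault(campaignId) + 1;
        return true;
    }

    public int CountFor(string campaignId) =&amp;gt; _counts.GetValueOrDefault(campaignId);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same event delivered twice moves the counter once. Against a real database, the equivalent is the dedup insert and the increment committed together.&lt;/p&gt;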

&lt;h2&gt;
  
  
  The two superpowers REST never gave us
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Replay.&lt;/strong&gt; A bug corrupted yesterday's analytics rollup? Stop that one consumer group (offsets can only be reset while the group has no active members), reset its offset, and reprocess. The other groups are untouched. As long as the raw events are still within the topic's retention window, they are in the log, and the log is the source of truth.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-consumer-groups &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BROKERS&lt;/span&gt; &lt;span class="nt"&gt;--group&lt;/span&gt; analytics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reset-offsets&lt;/span&gt; &lt;span class="nt"&gt;--to-datetime&lt;/span&gt; 2026-06-10T00:00:00.000 &lt;span class="nt"&gt;--topic&lt;/span&gt; events.raw &lt;span class="nt"&gt;--execute&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Adding a consumer for free.&lt;/strong&gt; When we shipped fraud scoring, we created a new consumer group on the same &lt;code&gt;events.raw&lt;/code&gt; topic. The producer didn't change. &lt;strong&gt;~1 day instead of ~1 sprint.&lt;/strong&gt;&lt;/p&gt;
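
&lt;p&gt;In client terms, a new consumer is mostly a new &lt;code&gt;GroupId&lt;/code&gt;. A hedged sketch, again assuming the Confluent.Kafka client; the group name &lt;code&gt;fraud-scoring&lt;/code&gt; and the broker address are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",        // placeholder: use your cluster's address
    GroupId = "fraud-scoring",                  // hypothetical name; a new id gets its own independent offsets
    EnableAutoCommit = false,                   // commit manually, only after processing succeeds
    AutoOffsetReset = AutoOffsetReset.Earliest, // a brand-new group starts from the oldest retained event
};

using var consumer = new ConsumerBuilder&amp;lt;string, byte[]&amp;gt;(config).Build();
consumer.Subscribe("events.raw");

// ... consume loop as in the consumer section ...

consumer.Close(); // leave the group cleanly so partitions are reassigned promptly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because offsets are tracked per group, the new group reads the full retained history of &lt;code&gt;events.raw&lt;/code&gt; without affecting the analytics or persister groups.&lt;/p&gt;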

&lt;h2&gt;
  
  
  Aggregate metrics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (REST)&lt;/th&gt;
&lt;th&gt;After (Kafka)&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ingestion failures (incident windows)&lt;/td&gt;
&lt;td&gt;1.9%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.18%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;−90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Events lost on downstream outage&lt;/td&gt;
&lt;td&gt;tens of thousands&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ingestion p95&lt;/td&gt;
&lt;td&gt;180 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;−96%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cascading-failure incidents / mo&lt;/td&gt;
&lt;td&gt;~3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry-storm load during incidents&lt;/td&gt;
&lt;td&gt;severe&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;none&lt;/strong&gt; (202 + walk away)&lt;/td&gt;
&lt;td&gt;eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to add a new consumer&lt;/td&gt;
&lt;td&gt;~1 sprint&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~1 day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;−80%+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replay a bad day of processing&lt;/td&gt;
&lt;td&gt;impossible&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;one command&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dual-write inconsistencies (outbox)&lt;/td&gt;
&lt;td&gt;recurring&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eliminated&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most of the win isn't throughput — it's &lt;strong&gt;failure isolation&lt;/strong&gt;. The producer succeeds against a log that's almost always up, and every downstream problem became "a consumer is lagging" instead of "the pipeline is down and we're losing data."&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to reach for Kafka (honest section)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't replace request/response with Kafka.&lt;/strong&gt; If the caller needs the answer, that's a query — REST/gRPC is correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kafka is operational weight.&lt;/strong&gt; Brokers, partitions, schema registry, lag monitoring. For modest volume, Azure Service Bus or a DB-backed queue gives most of the decoupling with far less to operate. We used &lt;strong&gt;managed Kafka&lt;/strong&gt; (Confluent Cloud) precisely because a 5-person team doesn't run brokers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Exactly-once" is a trap.&lt;/strong&gt; Plan for at-least-once + idempotent consumers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ordering is per-partition, not global.&lt;/strong&gt; Design your key around the ordering you actually need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You traded synchronous errors for asynchronous lag.&lt;/strong&gt; Watch lag, or you've hidden the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Azure, weigh Service Bus / Event Hubs first.&lt;/strong&gt; Kafka was a deliberate choice (replay + ecosystem), not a default.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The mental model in one line
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A synchronous call couples you to the callee being up; an event in a log couples you to the log being up.&lt;/strong&gt; For anything that's "this happened" rather than "tell me this," publishing to a durable log decouples producers from consumers in time — and most of resilience, burst absorption, and extensibility falls out of that one change.&lt;/p&gt;
&lt;/blockquote&gt;








&lt;p&gt;If this saved you from a "the pipeline went down at 3 AM" page, a ❤️ or 🦄 helps it reach more backend engineers.&lt;/p&gt;

&lt;p&gt;What's the worst cascading-failure incident you've inherited because someone Kafka-ed a query — or REST-ed an event?&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>kafka</category>
      <category>microservices</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
