<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Avinash Zala</title>
    <description>The latest articles on DEV Community by Avinash Zala (@avinash_zala_1c6f5e7c4af9).</description>
    <link>https://dev.to/avinash_zala_1c6f5e7c4af9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3996719%2F705250aa-a75e-4df5-9631-4edbbf936acd.jpeg</url>
      <title>DEV Community: Avinash Zala</title>
      <link>https://dev.to/avinash_zala_1c6f5e7c4af9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/avinash_zala_1c6f5e7c4af9"/>
    <language>en</language>
    <item>
      <title>Clean Architecture in .NET 8: A 2026 Starter Template with 4 Projects, EF Core, and JWT Auth</title>
      <dc:creator>Avinash Zala</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:47:52 +0000</pubDate>
      <link>https://dev.to/avinash_zala_1c6f5e7c4af9/clean-architecture-in-net-8-a-2026-starter-template-with-4-projects-ef-core-and-jwt-auth-3pik</link>
      <guid>https://dev.to/avinash_zala_1c6f5e7c4af9/clean-architecture-in-net-8-a-2026-starter-template-with-4-projects-ef-core-and-jwt-auth-3pik</guid>
      <description>&lt;p&gt;I joined a team where the controller was 800 lines long, the business rules were scattered between the controller and the &lt;code&gt;DbContext&lt;/code&gt;, and "to run the tests, spin up a SQL Server in Docker" was a sentence I heard every week. The fix was Clean Architecture. The argument I had with the team lead was about how to actually structure it. We argued for two weeks. Then I built this template so the next person wouldn't have to.&lt;/p&gt;

&lt;p&gt;This is the Clean Architecture .NET 8 starter template I wish someone had handed me on day one. Four projects, strict dependency direction, domain entities that own their own invariants, and an Application layer you can unit test with Moq — no database required. The whole repo is &lt;a href="https://github.com/ZalaAvinash/dotnet-clean-architecture-starter" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;, MIT-licensed, runs with &lt;code&gt;dotnet run&lt;/code&gt;, and ships with xUnit tests, JWT auth, Swagger, Docker, and CI.&lt;/p&gt;

&lt;p&gt;This post is the explanation of why each project exists, what goes in it, and what I learned the hard way about getting Clean Architecture right in .NET.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem Clean Architecture solves
&lt;/h2&gt;

&lt;p&gt;The naive way to build a .NET Web API is one project, one folder structure, and "everything talks to everything":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MyApp/
  Controllers/
    ProductsController.cs    ← HTTP stuff
    OrdersController.cs      ← HTTP stuff + business rules
  Services/
    ProductService.cs        ← business rules + DbContext.SaveChanges
  Data/
    AppDbContext.cs          ← EF Core, entities
  Models/
    Product.cs               ← POCO with public setters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works for the first 1,000 lines. By 5,000 lines, the controller is doing five things at once. By 10,000, "to test this, I need a database" is the answer to every test question, and your CI takes 20 minutes because every test run spins up SQL Server.&lt;/p&gt;

&lt;p&gt;Clean Architecture says: &lt;strong&gt;separate the business rules from the HTTP boundary, separate the database from the business rules, and enforce it with project references.&lt;/strong&gt; A controller is allowed to call a service. A service is allowed to call a repository. A repository is allowed to know about EF Core. &lt;strong&gt;Nothing is allowed to know about anything "above" it in the chain.&lt;/strong&gt; That's the rule.&lt;/p&gt;

&lt;p&gt;The .NET implementation is four projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                ┌─────────────────────────────────┐
                │           Api project           │  ← HTTP, DTOs, Swagger
                │   (Controllers, Program.cs)     │
                └────────────┬────────────────────┘
                             │ references
                ┌────────────▼────────────────────┐
                │      Infrastructure project     │  ← EF Core, JWT
                │ (DbContext, Repositories)       │
                └────────────┬────────────────────┘
                             │ references
                ┌────────────▼────────────────────┐
                │       Application project       │  ← Use cases, services
                │  (DTOs, Interfaces, Services)   │
                └────────────┬────────────────────┘
                             │ references
                ┌────────────▼────────────────────┐
                │        Domain project           │  ← Zero dependencies
                │  (Entities, Interfaces)         │
                └─────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The arrows only go down. The Domain project has zero NuGet dependencies outside the BCL. The Application project references Domain only. The Infrastructure project references both, but Domain and Application never reference Infrastructure. &lt;strong&gt;The compiler enforces the direction.&lt;/strong&gt; A circular reference is a build error, not a code review comment.&lt;/p&gt;
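
&lt;p&gt;As a rough sketch of how those references could look on disk, here are the &lt;code&gt;ProjectReference&lt;/code&gt; entries for each project file. The &lt;code&gt;MyApp.*&lt;/code&gt; project names and relative paths are placeholders rather than the names used in the repo, and the Api project's two references are an assumption about where dependency injection is wired up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;!-- MyApp.Domain.csproj (hypothetical name): no ProjectReference items --&amp;gt;

&amp;lt;!-- MyApp.Application.csproj: references Domain only --&amp;gt;
&amp;lt;ItemGroup&amp;gt;
  &amp;lt;ProjectReference Include="..\MyApp.Domain\MyApp.Domain.csproj" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;

&amp;lt;!-- MyApp.Infrastructure.csproj: references Application and Domain --&amp;gt;
&amp;lt;ItemGroup&amp;gt;
  &amp;lt;ProjectReference Include="..\MyApp.Application\MyApp.Application.csproj" /&amp;gt;
  &amp;lt;ProjectReference Include="..\MyApp.Domain\MyApp.Domain.csproj" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;

&amp;lt;!-- MyApp.Api.csproj: Application for the use cases, Infrastructure for DI registration (assumed) --&amp;gt;
&amp;lt;ItemGroup&amp;gt;
  &amp;lt;ProjectReference Include="..\MyApp.Application\MyApp.Application.csproj" /&amp;gt;
  &amp;lt;ProjectReference Include="..\MyApp.Infrastructure\MyApp.Infrastructure.csproj" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a layout like this, adding a reference from Domain or Application back to Infrastructure would create a cycle, and the build fails instead of waiting for a reviewer to notice.&lt;/p&gt;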

&lt;h2&gt;
  
  
  Part 1 — the Domain layer
&lt;/h2&gt;

&lt;p&gt;The Domain project contains the entities and the abstract interfaces that the Application layer depends on. No EF Core, no ASP.NET, no HTTP, no JSON. Just business rules.&lt;/p&gt;

&lt;p&gt;The interesting file is &lt;code&gt;Product.cs&lt;/code&gt;. It's not just a POCO with public setters. It's a class with a &lt;code&gt;DecreaseStock&lt;/code&gt; method that enforces its own invariants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseEntity&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Empty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Description&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Empty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;decimal&lt;/span&gt; &lt;span class="n"&gt;Price&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;StockQuantity&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Domain logic — entity enforces its own invariants&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="nf"&gt;IsInStock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;requestedQuantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;StockQuantity&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;requestedQuantity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;DecreaseStock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ArgumentException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Quantity must be positive."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="nf"&gt;IsInStock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;InvalidOperationException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Insufficient stock for product '&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'. "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
                                               &lt;span class="s"&gt;$"Requested &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, available &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;StockQuantity&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;StockQuantity&lt;/span&gt; &lt;span class="p"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;UpdatedAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UtcNow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial thing is that &lt;code&gt;DecreaseStock&lt;/code&gt; is the one sanctioned way to reduce &lt;code&gt;StockQuantity&lt;/code&gt; from outside the class. The setter in this template is &lt;code&gt;public&lt;/code&gt;, so this is a convention rather than something the compiler enforces. EF Core can also map properties with private setters, which is the stricter option if you want the rule checked at compile time. Either way, the domain logic is "you can't decrease stock by zero or a negative number, and you can't decrease stock below zero." That logic lives on the entity, in Domain. The service, the controller, the test: none of them have to remember to validate. They call &lt;code&gt;product.DecreaseStock(quantity)&lt;/code&gt; and the entity decides.&lt;/p&gt;

&lt;p&gt;This is the part of Clean Architecture that actually matters. The dependency-direction rule is hygiene. The "entities own their invariants" rule is the architecture. If you take one thing from this post, take that: &lt;strong&gt;business rules belong on the business objects, not on the services that manipulate them.&lt;/strong&gt;&lt;/p&gt;
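
&lt;p&gt;Here is a minimal sketch of a stricter variant of the same idea, not the template's actual code. Making the &lt;code&gt;StockQuantity&lt;/code&gt; setter private means &lt;code&gt;DecreaseStock&lt;/code&gt; is the only path the compiler allows. The class name &lt;code&gt;StrictProduct&lt;/code&gt;, its constructor, and the trimmed-down &lt;code&gt;BaseEntity&lt;/code&gt; are assumptions made for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Usage: the entity accepts a valid change and rejects an invalid one.
var product = new StrictProduct("Keyboard", stockQuantity: 3);
product.DecreaseStock(2);
Console.WriteLine(product.StockQuantity); // 1

try
{
    product.DecreaseStock(5); // more than is available
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// product.StockQuantity = 100; // would not compile: the setter is private

// Hypothetical stand-in for the template's BaseEntity.
public abstract class BaseEntity
{
    public int Id { get; set; }
    public DateTime? UpdatedAt { get; protected set; }
}

public class StrictProduct : BaseEntity
{
    public StrictProduct(string name, int stockQuantity)
    {
        if (stockQuantity &amp;lt; 0)
            throw new ArgumentOutOfRangeException(nameof(stockQuantity), "Stock cannot be negative.");

        Name = name;
        StockQuantity = stockQuantity;
    }

    public string Name { get; private set; }
    public int StockQuantity { get; private set; }

    public bool IsInStock(int requestedQuantity) =&amp;gt; StockQuantity &amp;gt;= requestedQuantity;

    public void DecreaseStock(int quantity)
    {
        if (quantity &amp;lt;= 0)
            throw new ArgumentException("Quantity must be positive.", nameof(quantity));
        if (!IsInStock(quantity))
            throw new InvalidOperationException(
                $"Insufficient stock for product '{Name}'. Requested {quantity}, available {StockQuantity}.");

        StockQuantity -= quantity;
        UpdatedAt = DateTime.UtcNow;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off is that object initializers such as &lt;code&gt;new Product { StockQuantity = 5 }&lt;/code&gt; stop working, so construction has to go through a constructor or factory method.&lt;/p&gt;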

&lt;h2&gt;
  
  
  Part 2 — the Application layer
&lt;/h2&gt;

&lt;p&gt;The Application project contains the use cases. A use case is a verb at the level of the user, not the database. "Create a product" is a use case. "Insert a row into the products table" is not.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ProductService.cs&lt;/code&gt; is the canonical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductService&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IProductService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IUnitOfWork&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ProductService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IUnitOfWork&lt;/span&gt; &lt;span class="n"&gt;unitOfWork&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_unitOfWork&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unitOfWork&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProductDto&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetByIdAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetByIdAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;MapToDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProductDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;CreateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreateProductDto&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;StockQuantity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StockQuantity&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SaveChangesAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;MapToDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice.&lt;/p&gt;

&lt;p&gt;First, the service takes &lt;code&gt;IUnitOfWork&lt;/code&gt;, not &lt;code&gt;AppDbContext&lt;/code&gt;. This is the testability win. &lt;code&gt;IUnitOfWork&lt;/code&gt; is an interface declared in the Domain project. The Infrastructure project provides the EF Core implementation. The Application project never knows EF Core exists. In a unit test, I write a fake &lt;code&gt;IUnitOfWork&lt;/code&gt; in about 10 lines (or mock it with Moq) and the test runs in milliseconds, with no database.&lt;/p&gt;

&lt;p&gt;Second, the service returns DTOs, not entities. DTOs are flat data shapes designed for the wire. Entities are rich domain objects with behaviour. If I returned &lt;code&gt;Product&lt;/code&gt; from the service, the controller would have access to &lt;code&gt;product.DecreaseStock&lt;/code&gt; and to the entity's setters, and it could change domain state outside any use case. The DTO is a contract: "this is what the service exposes to the outside world, and nothing more."&lt;/p&gt;
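
&lt;p&gt;Going back to the first point, here is a minimal sketch of what a hand-written fake can look like. The interface shapes are inferred from how the service calls them and trimmed to the members this example needs, so the real interfaces in the repo may differ. &lt;code&gt;Product&lt;/code&gt; and &lt;code&gt;ProductDto&lt;/code&gt; are reduced stand-ins, and the &lt;code&gt;Task&amp;lt;int&amp;gt;&lt;/code&gt; return type on &lt;code&gt;SaveChangesAsync&lt;/code&gt; is an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Usage: exercise the service with no database and no EF Core.
var fake = new FakeUnitOfWork();
await fake.Products.AddAsync(new Product { Id = 1, Name = "Keyboard" });

var service = new ProductService(fake);
var found = await service.GetByIdAsync(1);
var missing = await service.GetByIdAsync(42);

Console.WriteLine(found?.Name);      // Keyboard
Console.WriteLine(missing is null);  // True

// Reduced stand-ins for the Domain and Application types.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
}

public record ProductDto(int Id, string Name);

public interface IProductRepository
{
    Task&amp;lt;Product?&amp;gt; GetByIdAsync(int id, CancellationToken ct = default);
    Task AddAsync(Product product, CancellationToken ct = default);
}

public interface IUnitOfWork
{
    IProductRepository Products { get; }
    Task&amp;lt;int&amp;gt; SaveChangesAsync(CancellationToken ct = default);
}

// The fake: a dictionary instead of a database.
public class FakeProductRepository : IProductRepository
{
    private readonly Dictionary&amp;lt;int, Product&amp;gt; _items = new();

    public Task&amp;lt;Product?&amp;gt; GetByIdAsync(int id, CancellationToken ct = default)
    {
        _items.TryGetValue(id, out var product);
        return Task.FromResult&amp;lt;Product?&amp;gt;(product);
    }

    public Task AddAsync(Product product, CancellationToken ct = default)
    {
        _items[product.Id] = product;
        return Task.CompletedTask;
    }
}

public class FakeUnitOfWork : IUnitOfWork
{
    public IProductRepository Products { get; } = new FakeProductRepository();
    public int SaveCount { get; private set; }

    public Task&amp;lt;int&amp;gt; SaveChangesAsync(CancellationToken ct = default)
    {
        SaveCount++; // lets a test assert whether the service saved
        return Task.FromResult(0);
    }
}

// Trimmed copy of the service, depending only on the interface.
public class ProductService
{
    private readonly IUnitOfWork _unitOfWork;

    public ProductService(IUnitOfWork unitOfWork) =&amp;gt; _unitOfWork = unitOfWork;

    public async Task&amp;lt;ProductDto?&amp;gt; GetByIdAsync(int id, CancellationToken ct = default)
    {
        var product = await _unitOfWork.Products.GetByIdAsync(id, ct);
        return product is null ? null : new ProductDto(product.Id, product.Name);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A mocking library gets you the same result with less code per test. The hand-written fake is shown here because it makes the dependency on the interface, and nothing else, easy to see.&lt;/p&gt;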

&lt;p&gt;The "decrease stock" use case is the most interesting one because it touches three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OrderDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;CreateOrderAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreateOrderDto&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetByIdAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ProductId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;InvalidOperationException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Product &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ProductId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Domain logic — product enforces its own invariants&lt;/span&gt;
    &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;DecreaseStock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Quantity&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;CustomerId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Items&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OrderItem&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;ProductId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Quantity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Quantity&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_unitOfWork&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SaveChangesAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;MapToDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The service does three things: load the product, ask the product to decrease its own stock, save the order. The validation logic ("you can't order 5 of something that has 3 in stock") is in the entity. The service doesn't know about that rule. If the rule is violated, &lt;code&gt;DecreaseStock&lt;/code&gt; throws before the order is created, the exception propagates up, and nothing is saved.&lt;/p&gt;
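
&lt;p&gt;The ordering of those steps can be checked with a small self-contained sketch. To keep it short, it swaps the repositories for an in-memory &lt;code&gt;InMemoryUnitOfWork&lt;/code&gt; built on a dictionary and a list, and it reduces &lt;code&gt;Order&lt;/code&gt; to a record. Those types and the simplified &lt;code&gt;CreateOrderAsync&lt;/code&gt; signature are assumptions for illustration, not the template's code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Usage: order 5 units of a product that has 3 in stock.
var uow = new InMemoryUnitOfWork();
uow.Products[1] = new Product { Id = 1, Name = "Keyboard", StockQuantity = 3 };
var service = new OrderService(uow);

try
{
    await service.CreateOrderAsync(productId: 1, quantity: 5);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // the entity's message reaches the caller
}

Console.WriteLine(uow.Orders.Count);               // 0
Console.WriteLine(uow.SaveCount);                  // 0
Console.WriteLine(uow.Products[1].StockQuantity);  // 3

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
    public int StockQuantity { get; set; }

    public void DecreaseStock(int quantity)
    {
        if (quantity &amp;lt;= 0)
            throw new ArgumentException("Quantity must be positive.", nameof(quantity));
        if (StockQuantity &amp;lt; quantity)
            throw new InvalidOperationException(
                $"Insufficient stock for product '{Name}'. Requested {quantity}, available {StockQuantity}.");

        StockQuantity -= quantity;
    }
}

public record Order(int ProductId, int Quantity);

// Hypothetical, simplified unit of work: collections instead of repositories.
public class InMemoryUnitOfWork
{
    public Dictionary&amp;lt;int, Product&amp;gt; Products { get; } = new();
    public List&amp;lt;Order&amp;gt; Orders { get; } = new();
    public int SaveCount { get; private set; }

    public Task SaveChangesAsync()
    {
        SaveCount++;
        return Task.CompletedTask;
    }
}

public class OrderService
{
    private readonly InMemoryUnitOfWork _unitOfWork;

    public OrderService(InMemoryUnitOfWork unitOfWork) =&amp;gt; _unitOfWork = unitOfWork;

    public async Task&amp;lt;Order&amp;gt; CreateOrderAsync(int productId, int quantity)
    {
        if (!_unitOfWork.Products.TryGetValue(productId, out var product))
            throw new InvalidOperationException($"Product {productId} not found.");

        // Throws here on insufficient stock, so the lines below never run.
        product.DecreaseStock(quantity);

        var order = new Order(product.Id, quantity);
        _unitOfWork.Orders.Add(order);
        await _unitOfWork.SaveChangesAsync();
        return order;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the entity is asked first, a rejected order leaves the stock untouched and never reaches the save call.&lt;/p&gt;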

&lt;h2&gt;
  
  
  Part 3 — the Infrastructure layer
&lt;/h2&gt;

&lt;p&gt;The Infrastructure project is where the rest of the world lives. EF Core, the SQL Server or SQLite provider, the JWT signing service, anything that talks to a network or a file system. Domain and Application never know this project exists.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AppDbContext.cs&lt;/code&gt; is the EF Core surface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AppDbContext&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DbContext&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;AppDbContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DbContextOptions&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AppDbContext&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DbSet&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DbSet&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DbSet&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;DbSet&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OrderItem&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;OrderItems&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OrderItem&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;OnModelCreating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ModelBuilder&lt;/span&gt; &lt;span class="n"&gt;modelBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;modelBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ApplyConfigurationsFromAssembly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AppDbContext&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;Assembly&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repositories are the bridge between the service layer and EF Core:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IProductRepository&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;AppDbContext&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ProductRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AppDbContext&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetByIdAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FindAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IReadOnlyList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetAllAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AsNoTracking&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;ToListAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IReadOnlyList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetInStockAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AsNoTracking&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StockQuantity&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToListAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;AddAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note &lt;code&gt;IProductRepository&lt;/code&gt; is declared in the &lt;strong&gt;Domain&lt;/strong&gt; project, not Infrastructure. The Application layer's &lt;code&gt;IUnitOfWork&lt;/code&gt; references &lt;code&gt;IProductRepository&lt;/code&gt; and gets the EF Core implementation injected at startup. The Application layer doesn't know it's EF Core. Swap &lt;code&gt;Repository&amp;lt;Product&amp;gt;&lt;/code&gt; for a Dapper implementation, or an in-memory fake for tests — the service code doesn't change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 — the API layer
&lt;/h2&gt;

&lt;p&gt;The API project is the HTTP boundary. Controllers are thin. They take a request, hand it to a service, return the result. No business logic. No &lt;code&gt;DbContext&lt;/code&gt;. No SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ApiController&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"api/[controller]"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductsController&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ControllerBase&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IProductService&lt;/span&gt; &lt;span class="n"&gt;_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ProductsController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IProductService&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;_service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;HttpGet&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IReadOnlyList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProductDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetAllAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;HttpGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{id:int}"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProductDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetByIdAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;NotFound&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;HttpPost&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;Authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Roles&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Admin"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProductDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreateProductDto&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;CreatedAtAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GetById&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The controller is 30 lines. There's nothing to test here that isn't already tested in the service. In a real codebase I'd skip controller unit tests entirely and rely on integration tests with &lt;code&gt;WebApplicationFactory&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The auth flow is a stripped-down demo. &lt;code&gt;POST /api/auth/login&lt;/code&gt; with any email and any password returns a JWT. In production, replace this with ASP.NET Identity or an external IdP. The JWT plumbing, the &lt;code&gt;[Authorize]&lt;/code&gt; attribute, and the Swagger "Authorize" button are all real and work — only the credential check is fake.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5 — what I got wrong
&lt;/h2&gt;

&lt;p&gt;Five things, in order of how much they cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Calling an anemic domain model "Clean Architecture"
&lt;/h3&gt;

&lt;p&gt;When I first wrote &lt;code&gt;Product&lt;/code&gt; as a POCO with public setters and put all the logic in &lt;code&gt;ProductService&lt;/code&gt;, I called it Clean Architecture. It was not. It was an anemic domain model with extra project references.&lt;/p&gt;

&lt;p&gt;The fix was hard. I had to move &lt;code&gt;DecreaseStock&lt;/code&gt;, &lt;code&gt;IsInStock&lt;/code&gt;, and the audit-timestamp logic from the service into the entity. The service stopped doing math. The tests changed — instead of testing the service, I tested the entity. The result is that the controller, the service, the test, and the future "bulk update from an admin tool" all share the same invariants. The rule is on the data, not on the caller.&lt;/p&gt;

&lt;p&gt;The lesson: "Clean Architecture" without rich entities is just folders. The architecture is the placement of behaviour, not the placement of files.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Repositories with too many methods
&lt;/h3&gt;

&lt;p&gt;I started with &lt;code&gt;IProductRepository.GetByIdAsync&lt;/code&gt;, &lt;code&gt;GetByIdWithOrdersAsync&lt;/code&gt;, &lt;code&gt;GetActiveProductsAsync&lt;/code&gt;, &lt;code&gt;GetProductsByCategoryAsync&lt;/code&gt;, &lt;code&gt;GetProductsByPriceRangeAsync&lt;/code&gt;, etc. Each method was a one-off query. The repository became a dumping ground for every query in the system.&lt;/p&gt;

&lt;p&gt;The fix is a &lt;strong&gt;generic &lt;code&gt;IRepository&amp;lt;T&amp;gt;&lt;/code&gt; plus an &lt;code&gt;IUnitOfWork&lt;/code&gt;, with one repository per aggregate root&lt;/strong&gt;. The generic repo handles &lt;code&gt;GetById&lt;/code&gt;, &lt;code&gt;Add&lt;/code&gt;, &lt;code&gt;Update&lt;/code&gt;, and &lt;code&gt;Remove&lt;/code&gt;. Domain-specific queries (&lt;code&gt;GetInStockAsync&lt;/code&gt;, &lt;code&gt;GetProductsByCategoryAsync&lt;/code&gt;) live on aggregate-specific repos that extend the generic interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;IProductRepository&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IRepository&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IReadOnlyList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetInStockAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a query is one-off and used in only one place, it can be a LINQ chain in the service. It doesn't need a repository method. Repositories are for queries that are reused, that have business meaning, or that the test needs to mock. Everything else is service code.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.3 Returning entities from the service layer
&lt;/h3&gt;

&lt;p&gt;The first version of &lt;code&gt;ProductService.GetByIdAsync&lt;/code&gt; returned &lt;code&gt;Product&lt;/code&gt;. The controller passed it directly to &lt;code&gt;Ok(product)&lt;/code&gt;. ASP.NET serialized all the entity fields, including navigation properties. A &lt;code&gt;Product&lt;/code&gt; with an &lt;code&gt;Orders&lt;/code&gt; collection returned a JSON blob with 50 nested orders and 200 line items. The response was 2 MB for a single product.&lt;/p&gt;

&lt;p&gt;The fix is DTOs. Always DTOs. The service is the boundary between "rich domain object" and "flat data shape for the wire." The service's job is to translate between them. &lt;code&gt;MapToDto&lt;/code&gt; is the explicit acknowledgement that the wire format is a contract, not an accident.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Putting validation in the service
&lt;/h3&gt;

&lt;p&gt;I started with &lt;code&gt;if (dto.Price &amp;lt; 0) throw new ValidationException(...)&lt;/code&gt; at the top of every service method. It was repetitive, easy to forget, and mixed validation with business logic.&lt;/p&gt;

&lt;p&gt;The fix is FluentValidation. Each DTO has a validator that derives from &lt;code&gt;AbstractValidator&amp;lt;T&amp;gt;&lt;/code&gt; and lives in the Application project. The validators are auto-registered via &lt;code&gt;DependencyInjection.cs&lt;/code&gt;. The service trusts that if it received a DTO, the DTO is valid. The controller, or an action filter, runs the validators before the service is called.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreateProductDtoValidator&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AbstractValidator&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;CreateProductDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;CreateProductDtoValidator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;RuleFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;NotEmpty&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;MaximumLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;RuleFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GreaterThan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;RuleFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StockQuantity&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GreaterThanOrEqualTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the "parse, don't validate" principle. The DTO arrives pre-validated. The service code reads as if the data is good. The failure modes are at the boundary, not scattered through the use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.5 The "test pyramid in reverse"
&lt;/h3&gt;

&lt;p&gt;I wrote integration tests first, because "they test the real thing." Each test spun up a &lt;code&gt;WebApplicationFactory&lt;/code&gt;, hit the API with &lt;code&gt;HttpClient&lt;/code&gt;, and asserted on the response. The first 5 tests took 8 seconds. The full suite took 90 seconds. CI took 4 minutes. Adding a test meant a 90-second wait.&lt;/p&gt;

&lt;p&gt;The fix is the test pyramid: lots of fast unit tests at the bottom, a few integration tests in the middle, and a handful of end-to-end tests at the top. Domain tests are pure functions, no I/O — 6 tests in 50 ms. Service tests use Moq for &lt;code&gt;IUnitOfWork&lt;/code&gt; — 5 tests in 200 ms. Controller tests use &lt;code&gt;WebApplicationFactory&lt;/code&gt; — 0 tests in my repo, because the controllers are 30-line pass-throughs that don't need their own tests.&lt;/p&gt;

&lt;p&gt;The current test count is 11. The current test time is 1.2 seconds. That's the right ratio.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo and how to run it
&lt;/h2&gt;

&lt;p&gt;The full source is at &lt;a href="https://github.com/ZalaAvinash/dotnet-clean-architecture-starter" rel="noopener noreferrer"&gt;github.com/ZalaAvinash/dotnet-clean-architecture-starter&lt;/a&gt;. MIT license. 11 tests passing on every push.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ZalaAvinash/dotnet-clean-architecture-starter.git
&lt;span class="nb"&gt;cd &lt;/span&gt;dotnet-clean-architecture-starter
dotnet run &lt;span class="nt"&gt;--project&lt;/span&gt; src/CleanArchitecture.Api
&lt;span class="c"&gt;# Swagger UI: http://localhost:5xxx/swagger&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;span class="c"&gt;# API at http://localhost:8080/swagger&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;span class="c"&gt;# 11 passed, 0 failed, ~1.2s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Clean Architecture in .NET is not a folder structure. It is a constraint on where code is allowed to live, enforced by the project system where possible and by convention elsewhere. A controller cannot call &lt;code&gt;DbContext&lt;/code&gt; because the project reference doesn't allow it. A service does not return an entity to the controller because the convention is "services return DTOs." A test can run in 5 ms because the unit test never touches the database, and the project structure makes that the natural way to write the test.&lt;/p&gt;

&lt;p&gt;If your team is having the "where does the business logic go" argument, this template is the answer. Fork it, rename &lt;code&gt;Product&lt;/code&gt; to &lt;code&gt;Invoice&lt;/code&gt; or &lt;code&gt;Patient&lt;/code&gt; or &lt;code&gt;Trade&lt;/code&gt;, and ship the thing. The architecture is the same. The folder names are the same. The patterns are the same. The only thing that changes is the business.&lt;/p&gt;

&lt;p&gt;If you fork it and add a feature — CQRS, MediatR, FluentValidation on the entity, multi-tenancy — send me a PR. I'd especially like to see a real auth flow with ASP.NET Identity or Auth0. The demo &lt;code&gt;AuthController&lt;/code&gt; works for the template, but it shouldn't ship to production as-is.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Built with:&lt;/strong&gt; .NET 8 · ASP.NET Core · EF Core 8 · SQLite / SQL Server · xUnit · FluentAssertions · Moq · FluentValidation · Serilog · Swagger · JWT · Docker&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ZalaAvinash/dotnet-clean-architecture-starter" rel="noopener noreferrer"&gt;ZalaAvinash/dotnet-clean-architecture-starter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Avinash Zala is a senior .NET engineer in Surat, India, with 7+ years building enterprise web apps, APIs, and ERP systems. He writes about what he learns on the job. &lt;a href="https://github.com/ZalaAvinash" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/avinash-zala/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>softwareengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built RAG From Scratch in Python to Understand It. Here's What I Learned.</title>
      <dc:creator>Avinash Zala</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:45:42 +0000</pubDate>
      <link>https://dev.to/avinash_zala_1c6f5e7c4af9/i-built-rag-from-scratch-in-python-to-understand-it-heres-what-i-learned-33kf</link>
      <guid>https://dev.to/avinash_zala_1c6f5e7c4af9/i-built-rag-from-scratch-in-python-to-understand-it-heres-what-i-learned-33kf</guid>
      <description>&lt;p&gt;I had used LangChain's RAG chain in production for six months. I could not have told you, off the top of my head, what &lt;code&gt;chunk_overlap&lt;/code&gt; did, or why cosine similarity is the right distance metric, or how &lt;code&gt;nomic-embed-text&lt;/code&gt; actually turns a sentence into a vector. The high-level library abstracted all of it away.&lt;/p&gt;

&lt;p&gt;So one weekend I deleted the LangChain dependency and wrote a RAG pipeline from scratch in ~500 lines of plain Python. No framework, no magic. &lt;code&gt;pypdf&lt;/code&gt; for text extraction. A 60-line chunker. ChromaDB for the vector store. Ollama for embeddings and the LLM. The whole thing is &lt;a href="https://github.com/ZalaAvinash/rag-from-scratch-python" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt; — every module is under 200 lines, every test is deterministic, and you can read the whole thing in one sitting.&lt;/p&gt;

&lt;p&gt;This is the build log. Not a tutorial — the build log, with the parts that surprised me and the parts I got wrong the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother
&lt;/h2&gt;

&lt;p&gt;The honest reason: I was using LangChain's &lt;code&gt;RetrievalQA&lt;/code&gt; chain and getting answers I didn't trust. Sometimes the model would say "according to the document" when the document didn't say that. Sometimes the citations were wrong. I had no way to know if the chunker was dropping important context, or if the cosine similarity was picking the wrong neighbors, or if the prompt was actually constraining the model. The library was a black box.&lt;/p&gt;

&lt;p&gt;When you build it yourself, every layer is inspectable. When the answer is wrong, you can add a print statement in &lt;code&gt;pipeline.py&lt;/code&gt; line 102 and see exactly which chunks were sent to the LLM. When the chunker cuts a sentence in half, you see it in the test fixtures. When the embedding model gives garbage for some inputs, you can swap in a different model with one constructor parameter. None of that is possible when the whole thing is &lt;code&gt;RetrievalQA.from_chain_type(llm=..., retriever=...)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The other reason: the code I wrote is 500 lines, and it covers the same ground as a 50-line LangChain script. The extra 450 lines are comments, type hints, tests, and explicit error handling. That's the actual complexity. LangChain hides it; building it yourself makes you confront it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;The whole pipeline is six modules, each doing one thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ PDF file ]
      |
      v
+-----------+        text         +--------------+
| loaders.py| -------------------&amp;gt;|  chunker.py  |
| (pypdf)   |                      | (sliding     |
+-----------+                      |  window)     |
                                   +------+-------+
                                          |
                                     embeddings
                                          |
                                          v
                                   +--------------+        question
                                   |  store.py    | &amp;lt;------ (also embedded)
                                   | (ChromaDB)   |
                                   +------+-------+
                                          |
                                  top_k similar chunks
                                          |
                                          v
                                   +--------------+        +-----------+
                                   |  pipeline.py | -----&amp;gt; |  llm.py   |
                                   | (orchestr.)  |        | (Ollama)  |
                                   +--------------+        +-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each module has a single responsibility. Each is testable in isolation. Each can be swapped without touching the others. That's the design constraint that kept the code small — and the thing that made the difference between "toy" and "thing I trust in production."&lt;/p&gt;
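
&lt;p&gt;As a rough sketch of what "swappable" can look like in code: the names below (&lt;code&gt;Embedder&lt;/code&gt;, &lt;code&gt;FakeEmbedder&lt;/code&gt;, &lt;code&gt;index_chunks&lt;/code&gt;) are assumptions made for illustration and may not match the repo. The caller depends only on a method shape, so a deterministic fake can stand in for the Ollama-backed embedder in tests.&lt;/p&gt;

```python
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical interface: anything with an embed(text) method qualifies."""

    def embed(self, text: str) -> list[float]: ...


class FakeEmbedder:
    """Deterministic stand-in for tests; makes no network or model call."""

    def embed(self, text: str) -> list[float]:
        # Two toy features: character count and number of spaces.
        return [float(len(text)), float(text.count(" "))]


def index_chunks(chunks: list[str], embedder: Embedder) -> list[tuple[str, list[float]]]:
    """Pair each chunk with its vector; knows nothing about the embedder's internals."""
    return [(chunk, embedder.embed(chunk)) for chunk in chunks]


indexed = index_chunks(["hello world", "rag"], FakeEmbedder())
print(indexed)
```

&lt;p&gt;Swapping in a real embedder would mean passing a different object with the same &lt;code&gt;embed&lt;/code&gt; method; &lt;code&gt;index_chunks&lt;/code&gt; itself would not change.&lt;/p&gt;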

&lt;h2&gt;
  
  
  Part 1 — the chunker
&lt;/h2&gt;

&lt;p&gt;The chunker is the part most tutorials skip. They say "split the text into chunks" and move on. But chunking is where you decide what the model can and cannot find later. A 5,000-character chunk with no overlap is going to miss the answer to a question that lives at the boundary between two chunks. A 200-character chunk with no semantic awareness is going to split sentences and lose context.&lt;/p&gt;
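
&lt;p&gt;To make the boundary problem concrete, here is a toy sketch. &lt;code&gt;split_windows&lt;/code&gt; is a hypothetical helper written for this illustration, not the repo's &lt;code&gt;chunk_text&lt;/code&gt;, and the sample sentence and sizes are made up. It checks whether a phrase survives intact in at least one chunk, with and without overlap.&lt;/p&gt;

```python
def split_windows(text, size, overlap=0):
    """Toy fixed-size splitter (illustrative only): windows of `size` chars,
    each starting `size - overlap` chars after the previous one."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


text = "The warranty period is ninety days from delivery."
phrase = "ninety days"  # 11 characters, straddles position 28

no_overlap = split_windows(text, size=28)
with_overlap = split_windows(text, size=28, overlap=12)

# Is the phrase fully contained in any single chunk?
found_without = any(phrase in chunk for chunk in no_overlap)
found_with = any(phrase in chunk for chunk in with_overlap)

print("no overlap:", no_overlap, found_without)
print("overlap 12:", with_overlap, found_with)
```

&lt;p&gt;Without overlap the phrase is cut in two and neither chunk contains it. With an overlap at least as long as the phrase, one window holds it whole. That is the trade: overlap duplicates some text in the store, but anything shorter than the overlap can no longer be split across a boundary.&lt;/p&gt;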

&lt;p&gt;I went with a sliding-window chunker with character-level overlap, normalized whitespace, and original-offset tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Split text into overlapping windows of approximately `chunk_size` characters.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_size must be &amp;gt; 0, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_overlap must be &amp;gt;= 0, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_overlap (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) must be &amp;lt; chunk_size (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;piece&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Find the original-text char range for this normalized slice
&lt;/span&gt;        &lt;span class="n"&gt;char_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_normalized_to_original_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;char_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_normalized_to_original_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;piece&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;char_start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;char_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;char_end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;char_end&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to notice.&lt;/p&gt;

&lt;p&gt;First, the &lt;strong&gt;whitespace normalization&lt;/strong&gt; is a small thing that makes a big difference. PDF text comes out with weird whitespace: newlines mid-sentence, tabs from table cells, double spaces after periods. If you chunk on the raw text, your "800-character" chunks carry wildly different amounts of real content, and so wildly different token counts. Normalizing first means &lt;code&gt;chunk_size=800&lt;/code&gt; actually means "about 800 useful characters."&lt;/p&gt;

&lt;p&gt;Second, the &lt;strong&gt;100-character overlap&lt;/strong&gt; is the difference between "I found this" and "I missed the answer because it spans a chunk boundary." When a sentence lives across two chunks, the overlap means both chunks contain the bridge words, so the cosine similarity can match either side.&lt;/p&gt;

&lt;p&gt;Third, the &lt;strong&gt;original-offset tracking&lt;/strong&gt; (&lt;code&gt;char_start&lt;/code&gt;, &lt;code&gt;char_end&lt;/code&gt; in the &lt;code&gt;Chunk&lt;/code&gt; dataclass) is the feature I didn't know I needed until I built the source highlighter in the UI. With it, when the model says "see passage 4," I can show the user exactly which characters in the original PDF that came from. Without it, I'd have to store the whole document in memory and do a fuzzy text match. The cost is 16 bytes per chunk. The payoff is "this citation is real, not a hallucination."&lt;/p&gt;
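&lt;p&gt;The chunker calls two helpers, &lt;code&gt;_normalize&lt;/code&gt; and &lt;code&gt;_normalized_to_original_offset&lt;/code&gt;, that aren't shown above. Here is a minimal sketch of how such helpers could work, assuming normalization means "collapse every whitespace run to one space and trim the ends." The names &lt;code&gt;normalize_whitespace&lt;/code&gt; and &lt;code&gt;to_original_offset&lt;/code&gt; and their bodies are illustrative and may differ from what the repo does.&lt;/p&gt;

```python
def normalize_whitespace(text: str) -> str:
    """Collapse each whitespace run to a single space and trim both ends."""
    return " ".join(text.split())


def to_original_offset(text: str, norm_offset: int) -> int:
    """Map an offset in normalize_whitespace(text) back to an offset in text.

    An offset equal to the normalized length maps to just past the last
    non-whitespace character, so it works as an exclusive slice end.
    """
    produced = 0           # normalized characters accounted for so far
    pending_space = None   # original position of a space not yet emitted
    end = 0                # original position just past the last real char
    for pos, ch in enumerate(text):
        if ch.isspace():
            # Only the first whitespace after real text can become a space;
            # leading whitespace and the rest of a run are dropped.
            if produced > 0 and pending_space is None:
                pending_space = pos
            continue
        if pending_space is not None:
            if produced == norm_offset:
                return pending_space
            produced += 1
            pending_space = None
        if produced == norm_offset:
            return pos
        produced += 1
        end = pos + 1
    return end


raw = "  Hello \n\n world  "
clean = normalize_whitespace(raw)
start = to_original_offset(raw, 0)
stop = to_original_offset(raw, len(clean))
print(repr(clean), repr(raw[start:stop]))
```

&lt;p&gt;This version rescans the text from the beginning on every call, which is fine for a sketch. For long documents, building the offset table once and indexing into it would likely be the better trade.&lt;/p&gt;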

&lt;h2&gt;
  
  
  Part 2 — the embedding swap
&lt;/h2&gt;

&lt;p&gt;The single best refactor I did in this project was making &lt;code&gt;Embedder&lt;/code&gt; a &lt;code&gt;Protocol&lt;/code&gt;. Two lines of typing, infinite flexibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I can write a &lt;code&gt;FakeEmbedder&lt;/code&gt; for tests that returns deterministic vectors, and &lt;code&gt;OllamaEmbedder&lt;/code&gt; for production that hits the local Ollama API. The pipeline doesn't know or care which one it's talking to. This is what dependency injection looks like when you do it by hand instead of letting a framework do it for you.&lt;/p&gt;
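&lt;p&gt;For a sense of what the test double can look like, here is a minimal sketch of a &lt;code&gt;FakeEmbedder&lt;/code&gt;. It assumes that "deterministic" only needs to mean "same text in, same vector out," so it derives the vector from a SHA-256 digest. The &lt;code&gt;dim&lt;/code&gt; parameter and the &lt;code&gt;embed_all&lt;/code&gt; helper are hypothetical names used for illustration, not the repo's API.&lt;/p&gt;

```python
import hashlib
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


class FakeEmbedder:
    """Deterministic stand-in for tests: no network, no model."""

    def __init__(self, dim: int = 8):
        # A SHA-256 digest has 32 bytes, so that is the largest dim we can fill.
        if dim not in range(1, 33):
            raise ValueError(f"dim must be between 1 and 32, got {dim}")
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte into [0, 1]; the values carry no semantic meaning.
        return [digest[i] / 255.0 for i in range(self.dim)]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self.embed(t) for t in texts]


def embed_all(embedder: Embedder, texts: list[str]) -> list[list[float]]:
    """Any object with the right methods satisfies the Protocol structurally."""
    return embedder.embed_batch(texts)


vectors = embed_all(FakeEmbedder(), ["first chunk", "second chunk"])
print(len(vectors), len(vectors[0]))
```

&lt;p&gt;Note that &lt;code&gt;FakeEmbedder&lt;/code&gt; never inherits from &lt;code&gt;Embedder&lt;/code&gt;. A type checker accepts it because the method signatures match, which is the whole point of using a &lt;code&gt;Protocol&lt;/code&gt;.&lt;/p&gt;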

&lt;p&gt;The actual &lt;code&gt;OllamaEmbedder&lt;/code&gt; is 20 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OllamaEmbedder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Embedding via local Ollama HTTP API. Free, no API key.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="c1"&gt;# /api/embeddings takes one prompt per request, so this sends one request per text, sequentially
&lt;/span&gt;        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/api/embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be clear about what &lt;code&gt;embed_batch&lt;/code&gt; does and does not buy you. As written, it still sends one HTTP request per chunk, which is 800 requests for an 800-chunk document. At 50ms per request, that's 40 seconds. The batch method is an interface decision rather than a speed-up: the pipeline hands over the whole list in one call, so a faster implementation (one that sends many texts in a single request, for example) can replace the loop later without touching any caller.&lt;/p&gt;

&lt;p&gt;The reason the per-batch loop is sequential and not &lt;code&gt;concurrent.futures.ThreadPoolExecutor&lt;/code&gt;: when I tried threading, Ollama's HTTP server dropped connections under load. The sequential version is slower in wall-clock terms but reliable. Trade-offs.&lt;/p&gt;
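&lt;p&gt;If you want a true single-request batch, newer Ollama releases document an &lt;code&gt;/api/embed&lt;/code&gt; endpoint that accepts a list of strings under &lt;code&gt;input&lt;/code&gt; and returns an &lt;code&gt;embeddings&lt;/code&gt; list. The sketch below assumes that endpoint shape, so verify it against your installed version before relying on it. The function names are hypothetical, and it uses &lt;code&gt;urllib&lt;/code&gt; instead of &lt;code&gt;requests&lt;/code&gt; only to stay dependency-free.&lt;/p&gt;

```python
import json
from urllib import request


def build_embed_request(base_url: str, model: str, texts: list[str]) -> request.Request:
    """Build one POST carrying every text (assumed /api/embed request shape)."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return request.Request(
        f"{base_url.rstrip('/')}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_batch(
    texts: list[str],
    model: str = "nomic-embed-text",
    base_url: str = "http://localhost:11434",
    timeout: float = 60.0,
) -> list[list[float]]:
    if not texts:
        return []  # nothing to embed, so skip the network entirely
    req = build_embed_request(base_url, model, texts)
    with request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    embeddings = payload["embeddings"]  # assumed response key
    if len(embeddings) != len(texts):
        raise ValueError(
            f"expected {len(texts)} embeddings, got {len(embeddings)}"
        )
    return embeddings
```

&lt;p&gt;Because this is a single connection per batch, it also sidesteps the dropped-connection problem that threading ran into. For very large documents you would probably still want to cap the batch size.&lt;/p&gt;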

&lt;h2&gt;
  
  
  Part 3 — the vector store
&lt;/h2&gt;

&lt;p&gt;I used ChromaDB. Not because it's the best, but because it's the easiest to set up correctly. &lt;code&gt;pip install chromadb&lt;/code&gt;, three lines of code, and you have a persistent, queryable, cosine-similarity vector store on disk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Thin wrapper around a ChromaDB collection.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PersistentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anonymized_telemetry&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allow_reset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# cosine space — works regardless of embedding norm and is standard for semantic search
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hnsw:space&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;hnsw:space: cosine&lt;/code&gt; metadata is the one line that matters. ChromaDB's default is L2 (Euclidean) distance, which ranks results the same way as cosine when every embedding has unit length, but lets vector length affect the ranking when it doesn't. Cosine distance is "angle between vectors, ignoring length," which is what you want for semantic search. Two sentences that mean the same thing should have vectors pointing in the same direction, regardless of how long those vectors are.&lt;/p&gt;

&lt;p&gt;The search method does one non-obvious conversion: ChromaDB returns distances in &lt;code&gt;[0, 2]&lt;/code&gt;, and I convert to similarity in &lt;code&gt;[-1, 1]&lt;/code&gt; (clamped to &lt;code&gt;[0, 1]&lt;/code&gt; for display). The line &lt;code&gt;similarity = max(0.0, 1.0 - float(dist))&lt;/code&gt; is the only math in the file. Everything else is glue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;SearchHit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chunk_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why clamp to 0? Because cosine distance is greater than 1 whenever two vectors are more than 90 degrees apart (it reaches 2 when they point in exactly opposite directions), which would give a "negative similarity." For UI display, you don't want to show "this chunk is -12% similar to your question." Clamping to 0 says "irrelevant" and is honest.&lt;/p&gt;
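&lt;p&gt;To make the ranges concrete, here is a small worked example in plain Python. It assumes cosine distance is defined as one minus cosine similarity, and the helper names are illustrative rather than taken from the repo.&lt;/p&gt;

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(angle): 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def display_similarity(dist: float) -> float:
    """The conversion from the post: flip distance to similarity, floor at 0."""
    return max(0.0, 1.0 - float(dist))


query = [1.0, 0.0]
examples = {
    "same direction, twice as long": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
for label, vec in examples.items():
    dist = cosine_distance(query, vec)
    print(f"{label}: distance={dist:.1f} similarity={display_similarity(dist):.1f}")
```

&lt;p&gt;The "opposite" case is the one the clamp exists for: its raw similarity would be -1, and the UI shows 0 instead.&lt;/p&gt;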

&lt;h2&gt;
  
  
  Part 4 — the prompt is the whole product
&lt;/h2&gt;

&lt;p&gt;The most important 20 lines in the project are in &lt;code&gt;pipeline.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a careful assistant that answers questions based ONLY on the
provided document context. Follow these rules strictly:

1. Use ONLY the information in the context below. Do not use outside knowledge.
2. If the context does not contain the answer, say: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I cannot find this in the
   provided document.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; Do NOT guess.
3. Quote or paraphrase the relevant passages. Keep answers concise.
4. When you use information from a passage, mention which passage number it came from.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I rewrote this prompt six times. The first version said "answer based on the context" and the model happily invented facts 40% of the time. The current version, with the explicit numbered rules and the refusal template, has the model invent facts in maybe 5% of cases. The difference is 8x fewer hallucinations, with no other change to the pipeline.&lt;/p&gt;

&lt;p&gt;The single most important sentence is #2: "If the context does not contain the answer, say: 'I cannot find this in the provided document.'" Without that exact refusal template, the model would rather guess than admit ignorance. With it, the model has a safe, grammatically correct way to say "I don't know," and it takes that exit ramp instead of fabricating.&lt;/p&gt;

&lt;p&gt;The second most important sentence is #4: "mention which passage number it came from." This forces the model to engage with the structure of what I sent it. The model can still misattribute a passage, but a wrong passage number is something I can check against the chunks I actually retrieved. The citations are now verifiable.&lt;/p&gt;

&lt;p&gt;The third most important sentence is "Use ONLY the information in the context below." That single word — ONLY — does most of the work. Without it, the model treats the context as a suggestion and falls back on its training data. With it, the model treats the context as a constraint.&lt;/p&gt;
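&lt;p&gt;Rule #4 only works if the passages are actually numbered when they reach the model. Here is a minimal sketch of how the context could be assembled and how a citation could be sanity-checked afterwards. The &lt;code&gt;[Passage N]&lt;/code&gt; label format and both function names are assumptions for illustration, not necessarily what &lt;code&gt;pipeline.py&lt;/code&gt; does.&lt;/p&gt;

```python
import re


def build_user_message(question: str, passages: list[str]) -> str:
    """Number each retrieved passage so the model can cite it by number."""
    numbered = "\n\n".join(
        f"[Passage {n}]\n{text}" for n, text in enumerate(passages, start=1)
    )
    return f"Context:\n{numbered}\n\nQuestion: {question}"


def invalid_citations(answer: str, passage_count: int) -> set[int]:
    """Return cited passage numbers that were never sent to the model."""
    cited = {int(n) for n in re.findall(r"[Pp]assage\s+(\d+)", answer)}
    return {n for n in cited if n not in range(1, passage_count + 1)}


passages = ["Refunds are issued within 14 days.", "Shipping takes 3 to 5 days."]
print(build_user_message("How long do refunds take?", passages))
print(invalid_citations("Per passage 1, within 14 days. See also passage 7.", len(passages)))
```

&lt;p&gt;A check like this only catches citations to passages that don't exist. Whether passage 1 really supports the claim still needs a human or a second pass.&lt;/p&gt;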

&lt;h2&gt;
  
  
  Part 5 — what I got wrong
&lt;/h2&gt;

&lt;p&gt;Five things, in order of how much they cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Embedding the whole PDF
&lt;/h3&gt;

&lt;p&gt;First version: I embedded the entire 40-page PDF as one document and asked questions against the single vector. The result was uniformly bad — every question returned the same vaguely-related passage, regardless of what was actually being asked.&lt;/p&gt;

&lt;p&gt;I had to read three papers and one textbook chapter to figure out why. Embedding a 50,000-character document and embedding a 200-character chunk don't produce vectors with the same semantics. The whole-document vector is an average, and averages are useless for finding specific answers. Chunking is not an optimization. Chunking is the algorithm.&lt;/p&gt;

&lt;p&gt;Fix: chunk first, embed chunks. Obvious in hindsight. Took me an embarrassing amount of time to figure out the first time.&lt;/p&gt;
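&lt;p&gt;A toy example shows why the averaged vector fails. It assumes a deliberately simplified model in which the whole-document embedding is the plain mean of two chunk embeddings on different topics. Real embedding models are not literally mean-pooling over chunks, so read this as intuition, not as a description of &lt;code&gt;nomic-embed-text&lt;/code&gt;.&lt;/p&gt;

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise average, standing in for a whole-document embedding."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Two made-up 2-D "embeddings" for chunks about unrelated topics.
pricing_chunk = [1.0, 0.0]
security_chunk = [0.0, 1.0]
whole_document = mean_vector([pricing_chunk, security_chunk])

pricing_query = [1.0, 0.0]
security_query = [0.0, 1.0]

# The averaged vector scores both questions identically...
print(cosine_similarity(pricing_query, whole_document))
print(cosine_similarity(security_query, whole_document))
# ...while the chunk vectors separate them cleanly.
print(cosine_similarity(pricing_query, pricing_chunk))
print(cosine_similarity(pricing_query, security_chunk))
```

&lt;p&gt;With one vector per document, every query lands at the same middling score. With one vector per chunk, the right chunk wins by a wide margin.&lt;/p&gt;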

&lt;h3&gt;
  
  
  5.2 Using the L2 distance by default
&lt;/h3&gt;

&lt;p&gt;ChromaDB's default distance metric is L2 (Euclidean). I shipped the first version with the default and the search results were "kind of relevant but not really." I spent two hours tweaking the chunker and the embedder before I realized the distance metric was the problem.&lt;/p&gt;

&lt;p&gt;The fix is one line: &lt;code&gt;metadata={"hnsw:space": "cosine"}&lt;/code&gt; when creating the collection. But the symptom is the same as "the chunker is wrong" or "the embedder is wrong." Without a strong intuition for what each component does, you can chase the wrong layer for hours.&lt;/p&gt;

&lt;p&gt;The lesson: when the search results are bad, check the distance metric before you check anything else. The cost of an L2-vs-cosine mix-up is invisible until you know to look for it.&lt;/p&gt;
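&lt;p&gt;Here is a small sketch of how the two metrics can disagree on the same data. The vectors are made up, and the candidate names are only labels. It assumes embeddings that are not unit-length, which is the case where the choice of metric changes the ranking.&lt;/p&gt;

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.dist(a, b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 0.0]
candidates = {
    "same_direction_but_long": [10.0, 0.0],  # identical angle, much larger norm
    "nearby_but_off_topic": [1.0, 1.0],      # close in space, 45 degrees away
}


def nearest(metric) -> str:
    """Name of the candidate with the smallest distance under the given metric."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))


print("L2 picks:    ", nearest(l2_distance))
print("cosine picks:", nearest(cosine_distance))
```

&lt;p&gt;L2 penalizes the long vector for its length and prefers the off-topic neighbor. Cosine ignores length and picks the vector that points the same way as the query.&lt;/p&gt;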

&lt;h3&gt;
  
  
  5.3 The "always answer" reflex
&lt;/h3&gt;

&lt;p&gt;The first version of the system prompt said "answer the question based on the context." The model would answer every question, including ones the document didn't cover. "What year was the company founded?" on a 2024 product spec returned "2020" because the model fell back on its training data, even though no founding year appears anywhere in the spec.&lt;/p&gt;

&lt;p&gt;The fix is the refusal template, as discussed in Part 4. The hard part was not writing the prompt — it was accepting that the model is fundamentally a completer, not an oracle. A completer with a good prompt is a useful tool. A completer with a vague prompt is a hallucination engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 No idempotency on re-ingest
&lt;/h3&gt;

&lt;p&gt;I re-ran the ingest command on the same PDF three times while debugging. Each run added 800 new chunks. After three runs, the same query returned three identical passages, ranked by score. The answer was fine (the top chunk was the right one), but the UI was showing duplicates.&lt;/p&gt;

&lt;p&gt;The fix: derive &lt;code&gt;document_id&lt;/code&gt; from a hash of the file path, and use that as the prefix for chunk IDs in ChromaDB. Re-ingesting the same file generates the same IDs, and ChromaDB's &lt;code&gt;.add()&lt;/code&gt; is idempotent on ID. This is 5 lines of code. I should have written it on day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.5 Not testing the chunker first
&lt;/h3&gt;

&lt;p&gt;I wrote the pipeline top-down: PDF → embed → store → query → answer. Tests came later, when the answer was wrong and I didn't know which layer was the problem. I ended up writing the chunker tests last, which was backwards.&lt;/p&gt;

&lt;p&gt;The right order: chunker tests first (pure functions, no I/O, no network, fast), then embedder (with a fake), then store (with an in-memory ChromaDB or a mock), then pipeline (integration test with fakes for everything). When you do tests last, you write tests for the code as it is, not the code as you intended. The chunks were off-by-one on the overlap calculation for two weeks because no test caught it.&lt;/p&gt;
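&lt;p&gt;Here is roughly what "chunker tests first" could look like. To keep the sketch self-contained it tests a simplified, hypothetical &lt;code&gt;chunk_windows&lt;/code&gt; that returns plain strings with the same windowing rule as the chunker above (no normalization, no offsets). The test names are mine, and they run under pytest or as plain function calls.&lt;/p&gt;

```python
def chunk_windows(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Simplified stand-in for the real chunker: fixed-size sliding windows."""
    if 0 >= chunk_size:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")
    if 0 > chunk_overlap:
        raise ValueError(f"chunk_overlap must not be negative, got {chunk_overlap}")
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def test_consecutive_chunks_share_exactly_the_overlap():
    chunks = chunk_windows("abcdefghij", chunk_size=4, chunk_overlap=2)
    for left, right in zip(chunks, chunks[1:]):
        # The tail of each chunk must reappear at the head of the next one.
        assert left[-2:] == right[:2]


def test_every_character_is_covered():
    text = "abcdefghij"
    chunks = chunk_windows(text, chunk_size=4, chunk_overlap=1)
    step = 4 - 1
    # Dropping each later chunk's overlapping head must rebuild the input.
    rebuilt = chunks[0] + "".join(c[1:] for c in chunks[1:])
    assert rebuilt == text
    assert chunks[1] == text[step : step + 4]


def test_empty_text_gives_no_chunks():
    assert chunk_windows("", chunk_size=4, chunk_overlap=2) == []


def test_overlap_must_be_smaller_than_size():
    try:
        chunk_windows("abc", chunk_size=4, chunk_overlap=4)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

&lt;p&gt;The first test is the one that would have caught an off-by-one in the overlap: it fails the moment two neighboring chunks stop sharing exactly &lt;code&gt;chunk_overlap&lt;/code&gt; characters.&lt;/p&gt;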

&lt;h2&gt;
  
  
  The code and how to run it
&lt;/h2&gt;

&lt;p&gt;The full source is at &lt;a href="https://github.com/ZalaAvinash/rag-from-scratch-python" rel="noopener noreferrer"&gt;github.com/ZalaAvinash/rag-from-scratch-python&lt;/a&gt;. 14 tests pass. CI runs on Python 3.11, 3.12, 3.13. MIT license.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ZalaAvinash/rag-from-scratch-python.git
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-from-scratch-python
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# One-time: pull the models&lt;/span&gt;
ollama pull nomic-embed-text
ollama pull llama3.2

&lt;span class="c"&gt;# Ingest&lt;/span&gt;
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; rag.cli ingest path/to/document.pdf

&lt;span class="c"&gt;# Ask&lt;/span&gt;
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; rag.cli ask &lt;span class="s2"&gt;"What is the main conclusion?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use it as a library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RAGPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OllamaEmbedder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OllamaLLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorStore&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RAGPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OllamaEmbedder&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OllamaLLM&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/document.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the key points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] score=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;If you have used LangChain or LlamaIndex for RAG and you have a nagging feeling that you don't actually understand what's happening, build it yourself. The exercise takes a weekend. The 500 lines of code are not the point — the 500 lines of &lt;em&gt;thinking&lt;/em&gt; about chunk sizes, distance metrics, prompt design, and idempotency are the point. You will never use LangChain the same way again.&lt;/p&gt;

&lt;p&gt;The most valuable thing I learned is that RAG is not "an algorithm." It's five different algorithms stacked on top of each other (chunking, embedding, retrieval, prompt construction, generation), and each one has its own failure modes. The high-level libraries hide the stack. The stack is the product.&lt;/p&gt;

&lt;p&gt;If you build something similar, send me a PR. The repo is open. I've got an open issue for persistent in-process ChromaDB that nobody has claimed yet, and the test suite is the kind of thing that grows by accretion over years.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Built with:&lt;/strong&gt; Python 3.11+ · pypdf · ChromaDB · Ollama · nomic-embed-text · llama3.2 · click · pytest&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ZalaAvinash/rag-from-scratch-python" rel="noopener noreferrer"&gt;ZalaAvinash/rag-from-scratch-python&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Avinash Zala is a senior .NET engineer in Surat, India, with 7+ years building enterprise web apps, APIs, and ERP systems. He is currently adding AI/LLM capabilities to his stack and writing about what he learns. &lt;a href="https://github.com/ZalaAvinash" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/avinash-zala/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Build a Local RAG Chatbot in 30 Minutes with .NET 8, Ollama, and React</title>
      <dc:creator>Avinash Zala</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:43:15 +0000</pubDate>
      <link>https://dev.to/avinash_zala_1c6f5e7c4af9/build-a-local-rag-chatbot-in-30-minutes-with-net-8-ollama-and-react-55go</link>
      <guid>https://dev.to/avinash_zala_1c6f5e7c4af9/build-a-local-rag-chatbot-in-30-minutes-with-net-8-ollama-and-react-55go</guid>
      <description>&lt;p&gt;I uploaded a 40-page PDF of an internal API spec, asked "what's the rate limit for the search endpoint?", and got back: "100 requests per minute per API key, with bursts up to 200. See section 4.2 of the document." With citations. In about three seconds. The whole stack runs on my laptop. It cost me $0 in LLM credits during development because &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is free and local, and the embedder I used is also free and local. The repo is &lt;a href="https://github.com/ZalaAvinash/AI-Document-Chatbot-RAG-" rel="noopener noreferrer"&gt;here&lt;/a&gt; — issues and PRs welcome.&lt;/p&gt;

&lt;p&gt;This is the build log. Not a tutorial where every step works the first time — a build log where I tell you which decisions held up and which ones I redid.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem most "chat with your PDF" demos have
&lt;/h2&gt;

&lt;p&gt;Every "chat with your PDF" tutorial I read in early 2025 had the same shape: open OpenAI, paste your API key, call &lt;code&gt;gpt-4&lt;/code&gt; with a 50-page PDF stuffed into the context window, get an answer, pay $0.03 per question, repeat. That works for a demo. It does not work for a tool you'd actually use at work, because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The PDF might contain customer data, internal pricing, or unreleased features. You do not want that going to OpenAI's training pipeline or anyone's logs.&lt;/li&gt;
&lt;li&gt;The cost adds up. At $0.03 per question, someone who asks 50 questions a day costs about $45/month per seat ($0.03 × 50 × 30).&lt;/li&gt;
&lt;li&gt;The model hallucinates on long PDFs anyway. Stuff 100 pages into a 128k context window and the model starts forgetting the middle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt;: don't send the whole PDF, send only the 3-5 chunks that are actually relevant to the question. The mechanics are simple: embed the chunks, embed the question, find the closest matches, and send those to the LLM along with the question. The payoff is in cost and privacy: each question carries a few chunks instead of the whole document, so the prompt shrinks roughly in proportion to the document's length, and with a local model none of it leaves the machine.&lt;/p&gt;
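
&lt;p&gt;A toy sketch of that retrieval step, under loud assumptions: &lt;code&gt;toy_embed&lt;/code&gt; is a word-count stand-in for a real embedding model, and every function name is made up for this example. It shows only the shape of the flow: embed the chunks, embed the question, rank by cosine similarity, and put the winners in the prompt.&lt;/p&gt;

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedder: a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Only the retrieved chunks go to the LLM, not the whole document."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "The search endpoint has a rate limit of 100 requests per minute.",
    "Authentication uses API keys passed in a header.",
    "Invoices are generated on the first day of each month.",
]
question = "What is the rate limit for the search endpoint?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
```

&lt;p&gt;With a real embedder the ranking is semantic rather than lexical, but the control flow stays the same.&lt;/p&gt;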

&lt;p&gt;The actual ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Upload a PDF. Ask questions. Get answers from the document with citations, in under 5 seconds, with no data leaving my laptop and no monthly bill.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;One .NET 8 solution, one React app, one Ollama process, zero cloud dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ PDF Upload ]
      |
      v
+-------------------+        chunks         +---------------------+
|  PdfService       | ---------------------&amp;gt; |  VectorStore        |
|  (PdfPig)         |                        |  (in-memory)        |
+--------+----------+                        +----------+----------+
         |                                           |
    embeddings (nomic-embed-text)                    | search by cosine similarity
         |                                           |
         v                                           v
+-------------------+                        +---------------------+
|  EmbeddingService | &amp;lt;--------------------- |  ChatService        |
|  (Ollama /embed)  |                        |  (RAG pipeline)     |
+-------------------+                        +----------+----------+
                                                       |
                                              answer (llama3.2)
                                                       |
                                                       v
                                              +------------------+
                                              |  React frontend  |
                                              |  (ChatInterface) |
                                              +------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial detail is that &lt;strong&gt;everything runs on localhost&lt;/strong&gt;. Ollama listens on &lt;code&gt;http://localhost:11434&lt;/code&gt;. The .NET API listens on &lt;code&gt;http://localhost:5000&lt;/code&gt;. The React dev server listens on &lt;code&gt;http://localhost:5173&lt;/code&gt;. No document data leaves the machine. The only outbound network calls are setup-time downloads (the Ollama models, plus NuGet and npm packages), and even those can be served from a cache if you need to work offline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 — the PDF ingestion
&lt;/h2&gt;

&lt;p&gt;The whole ingestion pipeline is two services: &lt;code&gt;PdfService&lt;/code&gt; for text extraction + chunking, and &lt;code&gt;EmbeddingService&lt;/code&gt; to vectorize each chunk. Then the chunks go into &lt;code&gt;VectorStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PdfService&lt;/code&gt; uses &lt;a href="https://github.com/UglyToad/PdfPig" rel="noopener noreferrer"&gt;PdfPig&lt;/a&gt; — a pure C# PDF library, no native dependencies. The text extraction is the easy part. The interesting part is the chunking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ExtractAndChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;documentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Stream&lt;/span&gt; &lt;span class="n"&gt;pdfStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ExtractTextFromPdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdfStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ChunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;ChunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;documentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;chunkSize&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StringSplitOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RemoveEmptyEntries&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chunkWords&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToArray&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunkWords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;DocumentChunk&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;DocumentId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;DocumentName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunkWords&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;ChunkIndex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

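        &lt;span class="c1"&gt;// Stop once this chunk reaches the end of the text; otherwise the next chunk would only repeat the overlap&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunkSize&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunkSize&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;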
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice.&lt;/p&gt;

&lt;p&gt;First, I chunk by &lt;strong&gt;words&lt;/strong&gt;, not characters or tokens. Word-based chunking is dumb-simple and the size is predictable: 500 words ≈ 650 tokens, well within the embedder's input limit. Token-aware chunking is "more correct" but requires a tokenizer dependency, and for &lt;code&gt;nomic-embed-text&lt;/code&gt; with its 8k context, word-based works fine.&lt;/p&gt;

&lt;p&gt;Second, the &lt;strong&gt;50-word overlap&lt;/strong&gt; is not decoration. It's the difference between "I found this" and "I missed the answer because it spans a chunk boundary." When a key sentence lives across two chunks, the overlap means both chunks contain the bridge words, so the cosine similarity can match either side.&lt;/p&gt;
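
&lt;p&gt;To make the boundary case concrete, here is a small sketch in Python. The helper names are hypothetical, and the stride mirrors &lt;code&gt;chunkSize - overlap&lt;/code&gt; from the C# above. A ten-word sentence that straddles word 500 is cut in half by the first chunk but sits whole inside the second:&lt;/p&gt;

```python
def chunk_spans(total_words: int, chunk_size: int = 500, overlap: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) word offsets, end exclusive, using a stride of chunk_size - overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    spans = []
    start = 0
    while total_words > start:
        spans.append((start, min(start + chunk_size, total_words)))
        start += chunk_size - overlap
    return spans


def spans_containing(spans, phrase_start: int, phrase_end: int):
    """Spans that hold the whole phrase [phrase_start, phrase_end)."""
    return [(s, e) for (s, e) in spans if phrase_start >= s and e >= phrase_end]


spans = chunk_spans(1200)
# A 10-word sentence occupying words 495..504 crosses the end of the first chunk.
whole = spans_containing(spans, 495, 505)
print(spans)   # [(0, 500), (450, 950), (900, 1200)]
print(whole)   # [(450, 950)]
```

&lt;p&gt;The guarantee only covers phrases no longer than the overlap. A passage longer than 50 words that crosses a boundary can still be split.&lt;/p&gt;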

&lt;h2&gt;
  
  
  Part 2 — the embeddings
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;EmbeddingService&lt;/code&gt; is a thin wrapper around Ollama's &lt;code&gt;/api/embeddings&lt;/code&gt; endpoint. Three lines of real code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;]&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GenerateEmbeddingAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;EmbeddingRequest&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Free, fast embedding model&lt;/span&gt;
        &lt;span class="n"&gt;Prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_httpClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PostAsJsonAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/embeddings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EnsureSuccessStatusCode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFromJsonAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;EmbeddingResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to generate embedding"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nomic-embed-text&lt;/code&gt; is a 137M-parameter embedding model. It runs on CPU, takes ~50ms per chunk on my M1, and produces 768-dimensional vectors. The dimension doesn't matter to my code: &lt;code&gt;VectorStore&lt;/code&gt; treats it as &lt;code&gt;float[]&lt;/code&gt;. When I want to swap to a different embedder later, I change one model name string and re-embed the stored chunks, because vectors from different models are not comparable. The rest works.&lt;/p&gt;
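
&lt;p&gt;For comparison, the same call sketched in Python with only the standard library. It assumes Ollama is running on its default port and that &lt;code&gt;nomic-embed-text&lt;/code&gt; has been pulled. The function names are mine, not part of any client library. Request building and response parsing are split out so they can be tested without a running server:&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address, as used in this post


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(body: bytes) -> list[float]:
    """Extract the vector, failing loudly if the field is missing or empty."""
    embedding = json.loads(body).get("embedding")
    if not embedding:
        raise ValueError("Ollama response did not contain an embedding")
    return [float(x) for x in embedding]


def embed(text: str, timeout_seconds: float = 300.0) -> list[float]:
    """Call Ollama and return the embedding (requires a running Ollama server)."""
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return parse_embedding_response(response.read())
```

&lt;p&gt;Only &lt;code&gt;embed&lt;/code&gt; touches the network, so a test suite can cover the other two functions with canned bytes.&lt;/p&gt;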

&lt;p&gt;The important wiring is in &lt;code&gt;Program.cs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddHttpClient&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OllamaService&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BaseAddress&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollamaBaseUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// LLM generation can be slow on first run&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That 5-minute timeout is not paranoia. The first time you ask Ollama a question, the model has to load from disk into memory. On a cold start with &lt;code&gt;llama3.2&lt;/code&gt;, that takes 8-15 seconds. On a CPU-only machine, the actual generation can take 30-60 seconds for a long answer. The default &lt;code&gt;HttpClient&lt;/code&gt; timeout is 100 seconds, so a cold start plus a long answer already lands close to it, and a slower machine or a bigger model goes over. That will bite you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 — the vector store
&lt;/h2&gt;

&lt;p&gt;I almost reached for a real vector database here. ChromaDB, Qdrant, pgvector — all good options. I shipped an in-memory list with a lock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_chunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;object&lt;/span&gt; &lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;AddChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;lock&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;(&lt;/span&gt;&lt;span class="n"&gt;DocumentChunk&lt;/span&gt; &lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;)&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;topK&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;lock&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;scored&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_chunks&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Embedding&lt;/span&gt;&lt;span class="p"&gt;!)))&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;scored&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;dotProduct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;magnitudeA&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;magnitudeB&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dotProduct&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;magnitudeA&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;magnitudeB&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;magA&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;magnitudeA&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;magB&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;magnitudeB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;magA&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;magB&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dotProduct&lt;/span&gt; &lt;span class="p"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;magA&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;magB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cosine similarity is the standard textbook formula. No tricks. The brute-force scan is &lt;code&gt;O(n * d)&lt;/code&gt; where &lt;code&gt;n&lt;/code&gt; is the number of chunks and &lt;code&gt;d&lt;/code&gt; is the embedding dimension. For n=1000 chunks and d=768, that's 768k multiply-adds per query for the dot products alone, and about three times that once you count the two magnitude sums the loop also accumulates. On a modern CPU, that runs in about 5ms. For a personal-use chatbot with a few PDFs uploaded, brute force is the right answer.&lt;/p&gt;

&lt;p&gt;When would I switch to a real vector database? When &lt;code&gt;n&lt;/code&gt; exceeds ~50,000 chunks (which is roughly 200 large PDFs), or when the search latency budget drops below 20ms. Neither of those is the case for this app.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;lock&lt;/code&gt; is there because the React frontend can hit &lt;code&gt;/api/chat&lt;/code&gt; from multiple browser tabs simultaneously, and &lt;code&gt;AddChunks&lt;/code&gt; runs on the upload endpoint. &lt;code&gt;List&amp;lt;T&amp;gt;&lt;/code&gt; is not thread-safe: enumerating it while another thread adds to it can throw an &lt;code&gt;InvalidOperationException&lt;/code&gt; or return inconsistent results. A 5-line &lt;code&gt;lock&lt;/code&gt; is cheaper than a real database for this scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 — the RAG pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;ChatService.AnswerQuestionAsync&lt;/code&gt; is the whole RAG pipeline. Five steps, all in one method, all readable in 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;AnswerQuestionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Embed the user's question using free local model&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;questionEmbedding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_embeddingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateEmbeddingAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Question&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Find the top 5 most similar chunks via cosine similarity&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;relevantChunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_vectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questionEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevantChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Answer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"No relevant context found in the uploaded documents. Please upload a PDF first."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Sources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SourceReference&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Build the prompt with context&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n\n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevantChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are a helpful assistant that answers questions based on the provided document context. Answer using ONLY the context provided. If the context doesn't contain enough information, say so."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;$@"Context from uploaded documents:&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Question&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="s"&gt;Answer the question using ONLY the context above. Include relevant citations from the context where possible."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// 4. Call free local LLM via Ollama&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateChatAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 5. Return answer with source references&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Answer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Sources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relevantChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SourceReference&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;DocumentName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DocumentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;[..&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"..."&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Score&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;ChunkIndex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChunkIndex&lt;/span&gt;
        &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt is the most important line in the whole file:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Answer using ONLY the context provided. If the context doesn't contain enough information, say so."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my experience, that single sentence cuts hallucination by about 80%. Without it, &lt;code&gt;llama3.2&lt;/code&gt; happily answers "the rate limit is 100/min" even when the PDF says something else, because 100/min is the generic answer it learned from training. With it, the model either finds the answer in the chunks I sent or admits it can't find the answer.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;topK: 5&lt;/code&gt; is a magic number I should defend. Five chunks × 500 words = 2,500 words of context. That's a comfortable prompt size for &lt;code&gt;llama3.2&lt;/code&gt; (8k context) and gives the model enough rope to actually answer compound questions like "compare the rate limits for the search and upload endpoints." Three was too few. Ten started to introduce noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5 — what I got wrong
&lt;/h2&gt;

&lt;p&gt;This is the part you came for. Five things that bit me, in order of how much they cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 The "in-memory vector store" trade-off
&lt;/h3&gt;

&lt;p&gt;I shipped an in-memory &lt;code&gt;List&amp;lt;DocumentChunk&amp;gt;&lt;/code&gt; because it was fast to write. The cost: when you restart the .NET API, all uploaded documents are gone. The user has to re-upload.&lt;/p&gt;

&lt;p&gt;That is fine for a demo. It is not fine for a real tool. The fix is to persist embeddings to SQLite on &lt;code&gt;AddChunks&lt;/code&gt; and load on startup. About 30 lines of code. I haven't done it yet because I keep telling myself "next weekend" and then I don't. If you fork this and add it, send me a PR.&lt;/p&gt;
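
&lt;p&gt;A minimal sketch of what that persistence layer could look like, assuming the &lt;code&gt;Microsoft.Data.Sqlite&lt;/code&gt; NuGet package. &lt;code&gt;StoredChunk&lt;/code&gt; and &lt;code&gt;SqliteChunkStore&lt;/code&gt; are hypothetical names (the repo's &lt;code&gt;DocumentChunk&lt;/code&gt; may be shaped differently), and the table layout is only one reasonable choice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch only: requires the Microsoft.Data.Sqlite NuGet package.
// StoredChunk and SqliteChunkStore are illustrative names, not types from the repo.
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Microsoft.Data.Sqlite;

public sealed record StoredChunk(string DocumentName, int ChunkIndex, string Text, float[] Embedding);

public sealed class SqliteChunkStore
{
    private readonly string _connectionString;

    public SqliteChunkStore(string dbPath)
    {
        _connectionString = $"Data Source={dbPath}";
        using var connection = new SqliteConnection(_connectionString);
        connection.Open();
        using var create = connection.CreateCommand();
        create.CommandText = @"CREATE TABLE IF NOT EXISTS Chunks (
            DocumentName TEXT NOT NULL,
            ChunkIndex   INTEGER NOT NULL,
            ChunkText    TEXT NOT NULL,
            Embedding    BLOB NOT NULL)";
        create.ExecuteNonQuery();
    }

    // Intended to be called from AddChunks, alongside the in-memory update.
    public void Save(IEnumerable&amp;lt;StoredChunk&amp;gt; chunks)
    {
        using var connection = new SqliteConnection(_connectionString);
        connection.Open();
        using var transaction = connection.BeginTransaction();
        foreach (var chunk in chunks)
        {
            using var insert = connection.CreateCommand();
            insert.Transaction = transaction;
            insert.CommandText = "INSERT INTO Chunks VALUES ($doc, $idx, $text, $emb)";
            insert.Parameters.AddWithValue("$doc", chunk.DocumentName);
            insert.Parameters.AddWithValue("$idx", chunk.ChunkIndex);
            insert.Parameters.AddWithValue("$text", chunk.Text);
            // Store the float[] as raw bytes (4 bytes per dimension).
            insert.Parameters.AddWithValue("$emb", MemoryMarshal.AsBytes(chunk.Embedding.AsSpan()).ToArray());
            insert.ExecuteNonQuery();
        }
        transaction.Commit();
    }

    // Intended to be called once at startup to refill the in-memory list.
    public List&amp;lt;StoredChunk&amp;gt; LoadAll()
    {
        var result = new List&amp;lt;StoredChunk&amp;gt;();
        using var connection = new SqliteConnection(_connectionString);
        connection.Open();
        using var select = connection.CreateCommand();
        select.CommandText = "SELECT DocumentName, ChunkIndex, ChunkText, Embedding FROM Chunks";
        using var reader = select.ExecuteReader();
        while (reader.Read())
        {
            var bytes = (byte[])reader["Embedding"];
            // Reinterpret the stored bytes as floats and copy them into a new array.
            var embedding = MemoryMarshal.Cast&amp;lt;byte, float&amp;gt;(bytes.AsSpan()).ToArray();
            result.Add(new StoredChunk(reader.GetString(0), reader.GetInt32(1), reader.GetString(2), embedding));
        }
        return result;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Embeddings are stored as raw bytes, so a 768-dimension vector costs about 3 KB per row. Wrapping the inserts in one transaction keeps a large upload from paying a disk sync per chunk.&lt;/p&gt;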

&lt;h3&gt;
  
  
  5.2 The PDF text extraction order
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;PdfPig&lt;/code&gt; extracts text in the order it appears in the PDF's content stream. For most PDFs that's the order you'd read it. For some PDFs (academic papers, multi-column layouts, scanned-and-OCR'd docs), the order is completely wrong. A page might come back as "Conclusion Section 1 Introduction ... Discussion" with no paragraph breaks.&lt;/p&gt;

&lt;p&gt;The fix is to replace the plain &lt;code&gt;page.Text&lt;/code&gt; call with PdfPig's layout analysis (a page segmenter plus a reading-order detector such as &lt;code&gt;UnsupervisedReadingOrderDetector&lt;/code&gt;), or to fall back to OCR (Tesseract via the Tesseract NuGet wrapper) for the broken cases. For my actual use case (internal API docs, well-formatted PDFs), the default works. For scanned PDFs, it does not. I document this limitation in the README and I am honest with users when their PDF doesn't work.&lt;/p&gt;
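
&lt;p&gt;A rough sketch of the reading-order approach, for orientation only. The type names below are PdfPig's layout-analysis classes as I recall them from its documentation; which package they ship in depends on the PdfPig version, so check them against the one you have installed. &lt;code&gt;OrderedPdfText&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sketch only: assumes PdfPig's document layout analysis types are available.
using System.Linq;
using System.Text;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.PageSegmenter;
using UglyToad.PdfPig.DocumentLayoutAnalysis.ReadingOrderDetector;
using UglyToad.PdfPig.DocumentLayoutAnalysis.WordExtractor;

public static class OrderedPdfText
{
    // Returns the document text with each page's blocks sorted into reading order.
    public static string Extract(string pdfPath)
    {
        var text = new StringBuilder();
        using var document = PdfDocument.Open(pdfPath);
        foreach (var page in document.GetPages())
        {
            // 1. Group letters into words, 2. words into blocks, 3. order the blocks.
            var words = page.GetWords(NearestNeighbourWordExtractor.Instance);
            var blocks = DocstrumBoundingBoxes.Instance.GetBlocks(words);
            var ordered = UnsupervisedReadingOrderDetector.Instance
                .Get(blocks)
                .OrderBy(b =&amp;gt; b.ReadingOrder);

            foreach (var block in ordered)
            {
                text.AppendLine(block.Text);
                text.AppendLine(); // blank line between blocks keeps paragraph breaks
            }
        }
        return text.ToString();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only helps when the PDF has a real text layer. A scanned page with no text layer yields no words to segment, which is where the OCR fallback comes in.&lt;/p&gt;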

&lt;h3&gt;
  
  
  5.3 The default HTTP timeout almost ate my first real session
&lt;/h3&gt;

&lt;p&gt;I mentioned this earlier. The default &lt;code&gt;HttpClient&lt;/code&gt; timeout is 100 seconds. On my machine, a &lt;code&gt;llama3.2&lt;/code&gt; response to a 4-paragraph RAG context takes 35-50 seconds. On a slower CPU, it can take 90 seconds. The first three end-to-end tests I ran timed out at 100 seconds and I thought my RAG pipeline was broken. It wasn't. The model was just slow.&lt;/p&gt;

&lt;p&gt;I now set &lt;code&gt;client.Timeout = TimeSpan.FromMinutes(5)&lt;/code&gt; for the Ollama client. That gives a roughly 3x safety margin over the worst case I've seen. The 5-minute timeout also helps on the first request after Ollama starts: the model is loaded into memory lazily on that request, and the load can take 2-3 minutes. The download itself happens in the separate &lt;code&gt;ollama pull&lt;/code&gt; step.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 No correlation between a chat answer and the document chunk
&lt;/h3&gt;

&lt;p&gt;When the model says "see section 4.2," the user wants to know &lt;em&gt;which document chunk&lt;/em&gt; in their PDF section 4.2 corresponds to. I do return &lt;code&gt;Sources&lt;/code&gt; with &lt;code&gt;chunkIndex&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt;, and a 200-character text excerpt. But the React frontend just shows the answer — it doesn't render the sources inline.&lt;/p&gt;

&lt;p&gt;That's a UI bug, not a backend bug. The data is there. I just haven't built the source-citation UI yet. When I do, the assistant message will look like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The rate limit for the search endpoint is 100 requests per minute per API key. [Source: api-spec.pdf, chunk 23, score 0.89]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the kind of detail that separates "demo" from "tool I trust." It's on my list.&lt;/p&gt;
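
&lt;p&gt;The rendering itself is a small formatting job. A minimal sketch, written server-side in C# to match the rest of the code here. The &lt;code&gt;SourceReference&lt;/code&gt; record below is a stand-in carrying the same fields the API returns, and &lt;code&gt;CitationFormatter&lt;/code&gt; is a hypothetical helper, not something in the repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Stand-in for the repo's SourceReference, with the same field names.
public sealed record SourceReference(string DocumentName, int ChunkIndex, double Score, string Text);

public static class CitationFormatter
{
    // Appends one bracketed citation per source to the model's answer.
    public static string WithCitations(string answer, IReadOnlyList&amp;lt;SourceReference&amp;gt; sources)
    {
        if (sources.Count == 0) return answer;

        // InvariantCulture keeps the score as "0.89" regardless of server locale.
        var citations = sources.Select(s =&amp;gt; string.Format(
            CultureInfo.InvariantCulture,
            "[Source: {0}, chunk {1}, score {2:0.00}]",
            s.DocumentName, s.ChunkIndex, s.Score));

        return answer + " " + string.Join(" ", citations);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For a single source from &lt;code&gt;api-spec.pdf&lt;/code&gt; with chunk index 23 and score 0.89, this produces the bracketed suffix shown in the example above. The same formatting could just as easily live in the React component.&lt;/p&gt;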

&lt;h3&gt;
  
  
  5.5 The "free" in "free local LLM" has a hidden cost
&lt;/h3&gt;

&lt;p&gt;Ollama is free. The models are free. Running them on your laptop is free. What's not free is your time the first time you set it up.&lt;/p&gt;

&lt;p&gt;On Windows, Ollama installs as a system service. The first &lt;code&gt;ollama pull nomic-embed-text&lt;/code&gt; downloads 274MB. The first &lt;code&gt;ollama pull llama3.2&lt;/code&gt; downloads 2.0GB. On a 10Mbps connection that's 30 minutes. On a metered connection (hotel WiFi, mobile hotspot), it's an hour. On a corporate laptop behind a strict firewall, it might not work at all because Ollama uses HTTPS but the model blobs are fetched from a CDN that some corporate proxies block.&lt;/p&gt;

&lt;p&gt;The honest marketing line is: "free at runtime, 2GB download and 30 minutes of setup the first time." I'm fine with that trade. But I learned not to demo this tool to a non-technical stakeholder without first running &lt;code&gt;ollama pull&lt;/code&gt; on their machine and waiting for the model to load. Cold-start time on a 5-year-old laptop can be 20+ seconds for the first question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo and how to run it
&lt;/h2&gt;

&lt;p&gt;The full source is at &lt;a href="https://github.com/ZalaAvinash/AI-Document-Chatbot-RAG-" rel="noopener noreferrer"&gt;github.com/ZalaAvinash/AI-Document-Chatbot-RAG-&lt;/a&gt;. To run it locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install Ollama and pull the two models (~2.3 GB total)&lt;/span&gt;
ollama pull nomic-embed-text
ollama pull llama3.2

&lt;span class="c"&gt;# 2. Backend (.NET 8)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;backend
dotnet run          &lt;span class="c"&gt;# http://localhost:5000 (Swagger at /swagger)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Frontend (in a new terminal)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev          &lt;span class="c"&gt;# http://localhost:5173&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with Docker (which handles Ollama for you, including the first-time model download):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;span class="c"&gt;# Wait ~5 minutes the first time for the model download&lt;/span&gt;
&lt;span class="c"&gt;# Open http://localhost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Docker route is what I recommend for non-.NET teammates. The native route is what I use day-to-day because it's faster on subsequent runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;A local RAG chatbot is one of the few AI features that is &lt;em&gt;actually&lt;/em&gt; ready for production use today, in 2026, on a $0 budget. The pieces are all there: a free local LLM runner (Ollama), a free local embedder (nomic-embed-text), a textbook RAG pipeline in 30 lines of C#, and a React frontend that anyone who has used ChatGPT already knows how to operate.&lt;/p&gt;

&lt;p&gt;The thing that surprised me most is how often "the right answer is in the PDF, the user just couldn't find it" is a real problem worth solving. I've used this on four different real documents in the last two weeks: an API spec, a vendor contract, a 200-page compliance document, and a research paper. In every case the chatbot gave me the answer in under a minute, with sources I could verify against the chunks returned in the &lt;code&gt;Sources&lt;/code&gt; array. The hallucinations are rare and easy to spot because the model is forced to cite.&lt;/p&gt;

&lt;p&gt;If you build something similar and run into the same five problems, I'd love to hear about it. The repo is open for issues, PRs, and stories about your 30-minute Ollama download. We've all been there.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Built with:&lt;/strong&gt; .NET 8 · ASP.NET Core · React (Vite) · PdfPig · Ollama · nomic-embed-text · llama3.2&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ZalaAvinash/AI-Document-Chatbot-RAG-" rel="noopener noreferrer"&gt;ZalaAvinash/AI-Document-Chatbot-RAG-&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Avinash Zala is a senior .NET engineer in Surat, India, with 7+ years building enterprise web apps, APIs, and ERP systems. He is currently adding AI/LLM capabilities to his stack and writing about what he learns. &lt;a href="https://github.com/ZalaAvinash" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/avinash-zala/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>ai</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>From Stack Trace to Suggested Fix in 4 Seconds: Building a Self-Healing .NET API Gateway.</title>
      <dc:creator>Avinash Zala</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:36:18 +0000</pubDate>
      <link>https://dev.to/avinash_zala_1c6f5e7c4af9/from-stack-trace-to-suggested-fix-in-4-seconds-building-a-self-healing-net-api-gateway-v1-3k5d</link>
      <guid>https://dev.to/avinash_zala_1c6f5e7c4af9/from-stack-trace-to-suggested-fix-in-4-seconds-building-a-self-healing-net-api-gateway-v1-3k5d</guid>
      <description>&lt;p&gt;Last Tuesday my API gateway caught a &lt;code&gt;NullReferenceException&lt;/code&gt;, streamed it to a dashboard in real-time, and pushed a draft code fix to the browser tab of the on-call engineer — before I finished reading the error myself. That sentence used to be vendor marketing. Now it's just my &lt;code&gt;Program.cs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the architecture post-mortem. I built it on weekends. It runs in Docker. It cost me exactly $0 in LLM credits during development because &lt;a href="https://groq.com" rel="noopener noreferrer"&gt;Groq's free tier&lt;/a&gt; is generous and &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; works as a swap-in. The repo is &lt;a href="https://github.com/ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway" rel="noopener noreferrer"&gt;here&lt;/a&gt; — issues and PRs welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem most .NET teams have
&lt;/h2&gt;

&lt;p&gt;Production errors are caught, logged to a file, and forgotten. Engineers find out from a Slack ping twenty minutes later, if at all. By the time someone looks, the original request context is gone, the user's session has expired, and the stack trace is buried four layers deep in &lt;code&gt;System.*&lt;/code&gt; calls.&lt;/p&gt;

&lt;p&gt;"Self-healing" is a word vendors use to mean "auto-restart the pod." I wanted something better. The actual ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When an exception is thrown in service A, give the engineer (a) a clear root cause, (b) a suggested fix, and (c) a draft code patch — in under 30 seconds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not a magic black box. Not an auto-applied patch. Just: &lt;strong&gt;catch the error, give the model the right context, push the analysis to a human in real-time, and let the human close the loop.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;One .NET solution, four projects, four NuGet packages, no new infrastructure beyond what you probably already have.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ HTTP request ]
       |
       v
+-------------------+        enqueue         +---------------------+
| SmartLogAnalyzer. | ---------------------&amp;gt; |  Hangfire (Redis)   |
|      Api          |                        +----------+----------+
|  (ErrorHandling   |                                   |
|   Middleware)     |                                   v
+-------------------+                           +-----------------------+
                                                | SmartLogAnalyzer.     |
                                                |     Worker            |
                                                |(ErrorProcessingWorker)|
                                                +-----+-------+---------+
                                                      |       |
                                          AI call     |       |  persist
                                                      v       v
                                            +-----------+   +-----------+
                                            | Semantic  |   | MSSQL     |
                                            | Kernel +  |   | (ErrorLog |
                                            | Groq LLM  |   |  table)   |
                                            +-----+-----+   +-----------+
                                                  |
                                                  v
                                        +---------------------+
                                        |  SignalR Hub        |
                                        |  (ErrorHub)         |
                                        +----------+----------+
                                                   |
                                              broadcast
                                                   v
                                        +---------------------+
                                        |  React Dashboard    |
                                        |  (live update)      |
                                        +---------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial detail is &lt;em&gt;where&lt;/em&gt; the AI call happens. It does &lt;strong&gt;not&lt;/strong&gt; happen in the request thread. The middleware returns the 500 in milliseconds; the AI work happens inside a Hangfire background job, in a different process, possibly on a different machine. Two different response times, one user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 — the capture
&lt;/h2&gt;

&lt;p&gt;The middleware is fifty lines including the &lt;code&gt;using&lt;/code&gt; statements. Here is the whole thing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Hangfire&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;SmartLogAnalyzer.Core.Models&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;SmartLogAnalyzer.Core.Workers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Text.Json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;SmartLogAnalyzer.Api.Middleware&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ErrorHandlingMiddleware&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;RequestDelegate&lt;/span&gt; &lt;span class="n"&gt;_next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IBackgroundJobClient&lt;/span&gt; &lt;span class="n"&gt;_backgroundJobClient&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ErrorHandlingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;RequestDelegate&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;IBackgroundJobClient&lt;/span&gt; &lt;span class="n"&gt;backgroundJobClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;_next&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;_backgroundJobClient&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;backgroundJobClient&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;InvokeAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;HandleExceptionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;HandleExceptionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ContentType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;errorLog&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ErrorLog&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;ErrorMessage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;StackTrace&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StackTrace&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Empty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;RoutePath&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;
            &lt;span class="p"&gt;};&lt;/span&gt;

            &lt;span class="c1"&gt;// The line that does the work. Enqueue is non-blocking;&lt;/span&gt;
            &lt;span class="c1"&gt;// the response is sent before the AI is ever called.&lt;/span&gt;
            &lt;span class="n"&gt;_backgroundJobClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Enqueue&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ErrorProcessingWorker&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;
                &lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ProcessErrorAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"An internal error has been logged and is being analyzed."&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice.&lt;/p&gt;

&lt;p&gt;First, the &lt;code&gt;Enqueue&lt;/code&gt; call returns as soon as the job is written to storage. Hangfire's &lt;code&gt;IBackgroundJobClient&lt;/code&gt; is a thin client over the Hangfire storage (Redis in this case): it serializes the job, stores it, and leaves the pickup to a worker. We don't &lt;code&gt;await&lt;/code&gt; an AI call here. The user gets their 500 in single-digit milliseconds.&lt;/p&gt;

&lt;p&gt;Second, the response body — &lt;code&gt;"An internal error has been logged and is being analyzed."&lt;/code&gt; — is itself a feature. The user (or the calling frontend) now knows the error is being handled. It is not a lie, it is a contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2 — the worker
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ErrorProcessingWorker&lt;/code&gt; is a plain C# class. Hangfire instantiates it from the DI container, calls &lt;code&gt;ProcessErrorAsync&lt;/code&gt;, and (if it throws) retries up to three times with exponential backoff.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;AutomaticRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Attempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;ProcessErrorAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ErrorLog&lt;/span&gt; &lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Hash the stack trace to dedupe identical errors&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;stackTraceHash&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ComputeHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StackTrace&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. If we've seen this stack trace in the last 24h, just bump the count&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_redisCacheService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;KeyExistsAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stackTraceHash&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;existingLog&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_errorLogRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOrUpdateErrorLogAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_hubContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;All&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"ReceiveErrorUpdate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existingLog&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. New error — claim the hash so duplicates skip the AI call&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_redisCacheService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetKeyAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stackTraceHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromHours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;24&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// 4. Ask the LLM&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;analyzedLog&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_aiAnalysisService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AnalyzeErrorAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 5. Persist + push to the dashboard&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;savedLog&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_errorLogRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOrUpdateErrorLogAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analyzedLog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_hubContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;All&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"ReceiveErrorUpdate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;savedLog&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nf"&gt;ComputeHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;md5&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MD5&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;hashBytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ComputeHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UTF8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;BitConverter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashBytes&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToLowerInvariant&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Redis dedupe step is the difference between a $0 demo and a $200 Groq bill. The first occurrence of a &lt;code&gt;NullReferenceException&lt;/code&gt; at &lt;code&gt;/api/users/{id}&lt;/code&gt; costs one LLM call. The next 10,000 occurrences inside the 24-hour window cost nothing. The 24-hour TTL is a knob you will want to tune.&lt;/p&gt;
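
&lt;p&gt;One thing to watch in the worker above: the existence check and the key write are two separate Redis round trips, so two workers handling the same brand-new error at the same moment could both pass the check and both call the LLM. What follows is a minimal sketch of the claim-once idea, assuming a single atomic operation decides who is first. &lt;code&gt;DedupeClaims&lt;/code&gt; is a hypothetical in-memory stand-in, not code from the repo; against Redis the same effect usually comes from a single &lt;code&gt;SET&lt;/code&gt; with the &lt;code&gt;NX&lt;/code&gt; option and an expiry.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for the Redis key store, for illustration only.
public sealed class DedupeClaims
{
    private readonly ConcurrentDictionary&amp;lt;string, DateTimeOffset&amp;gt; _expiresAt = new();

    // Returns true only for the first caller inside the TTL window.
    // The check and the write happen as one atomic step per key.
    public bool TryClaim(string hash, TimeSpan ttl, DateTimeOffset now)
    {
        var newExpiry = now + ttl;
        while (true)
        {
            // No entry yet: we are first.
            if (_expiresAt.TryAdd(hash, newExpiry)) return true;
            // Entry vanished between the two calls: start over.
            if (!_expiresAt.TryGetValue(hash, out var current)) continue;
            // Entry still live: someone else owns this hash.
            if (current &amp;gt; now) return false;
            // Entry expired: re-claim it, unless another caller beat us to it.
            if (_expiresAt.TryUpdate(hash, newExpiry, current)) return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var claims = new DedupeClaims();
        var ttl = TimeSpan.FromHours(24);
        var t0 = DateTimeOffset.UtcNow;

        Console.WriteLine(claims.TryClaim("a1b2c3", ttl, t0));              // True: first sighting
        Console.WriteLine(claims.TryClaim("a1b2c3", ttl, t0.AddHours(1)));  // False: duplicate inside the window
        Console.WriteLine(claims.TryClaim("a1b2c3", ttl, t0.AddHours(25))); // True: the claim has expired
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the caller that gets &lt;code&gt;true&lt;/code&gt; would go on to the AI call; everyone else takes the "bump the count" path.&lt;/p&gt;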

&lt;h2&gt;
  
  
  Part 3 — the AI call
&lt;/h2&gt;

&lt;p&gt;I started this project with a fancy design: a Semantic Kernel &lt;code&gt;KernelPlugin&lt;/code&gt; that would have the AI fetch the offending source file from GitHub, look at the test that covers it, and then propose a diff grounded in real code. It was clever. It was also over-engineered for v1.&lt;/p&gt;

&lt;p&gt;The version that shipped is fifteen lines.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;$@"&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s"&gt;You are a Senior .NET Engineer. Analyze the following error&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s"&gt;and provide a JSON response with exactly three keys:&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s"&gt;RootCause, FixSuggestion, and CodePatch.&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="s"&gt;Error Message: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ErrorMessage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s"&gt;Stack Trace: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;errorLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StackTrace&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="s"&gt;JSON Response:&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_kernel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;InvokePromptAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;responseText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I parse the response into the three fields with a &lt;code&gt;JsonDocument&lt;/code&gt;, and on failure, fall back to a hand-rolled string parser. We will get back to that parser in the next section — it is the most important code in the whole project and also the part I am least proud of.&lt;/p&gt;

&lt;p&gt;Why Semantic Kernel if the call is this simple? Two reasons.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider swap.&lt;/strong&gt; The Groq wire-up is one line: &lt;code&gt;AddOpenAIChatCompletion(modelId: "llama-3.3-70b-versatile", apiKey: [your-groq-key], endpoint: new Uri("https://api.groq.com/openai/v1"))&lt;/code&gt; — where the key is loaded from &lt;code&gt;.env&lt;/code&gt; at startup. Swapping to OpenAI, Azure OpenAI, or local Ollama is one constructor call. If I had called Groq directly via &lt;code&gt;HttpClient&lt;/code&gt;, I would be rewriting the call site for every provider I tried.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in retries and timeouts.&lt;/strong&gt; &lt;code&gt;Kernel.InvokePromptAsync&lt;/code&gt; handles 429s and 5xxs with a default policy. That is one less thing to get wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can absolutely build this with raw &lt;code&gt;HttpClient&lt;/code&gt; and a POST to the OpenAI-compatible &lt;code&gt;/chat/completions&lt;/code&gt; endpoint. You will write the retry logic yourself. I have done that. I do not recommend it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 — what I got wrong
&lt;/h2&gt;

&lt;p&gt;This is the part you came for. Five things that bit me, in order of how much they cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 The JSON parser that almost shipped
&lt;/h3&gt;

&lt;p&gt;The first version of the response parser used &lt;code&gt;JsonDocument.Parse&lt;/code&gt; and threw on any malformed output. About 15% of Groq responses came back wrapped in &lt;code&gt;```json ... ```&lt;/code&gt; markdown fences, despite the prompt saying "JSON Response:" right at the end. I added a stripper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;var cleaned = responseText.Trim();
if (cleaned.StartsWith("```json")) cleaned = cleaned.Substring(7);
if (cleaned.StartsWith("```"))     cleaned = cleaned.Substring(3);
if (cleaned.EndsWith("```"))       cleaned = cleaned.Substring(0, cleaned.Length - 3);
cleaned = cleaned.Trim();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That fixed 90% of it. The other 10% needed a hand-rolled regex parser that walks the string looking for &lt;code&gt;"RootCause": "..."&lt;/code&gt; and respects backslash escapes. Do not be too proud to write a regex parser. When the upstream is an LLM and the contract is "please return JSON," the LLM is sometimes wrong and you need a fallback.&lt;/p&gt;

&lt;p&gt;The pattern in &lt;code&gt;AiAnalysisService.cs&lt;/code&gt; is the right one: try the strict parser, catch the exception, try the lenient one, and only then give up and store the raw text with a "Failed to parse" flag. The dashboard renders the raw text anyway, so the engineer still gets value.&lt;/p&gt;
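
&lt;p&gt;The strict-then-lenient flow described above can be sketched roughly as follows. This is not the code from &lt;code&gt;AiAnalysisService.cs&lt;/code&gt;: the &lt;code&gt;AiAnalysis&lt;/code&gt; record, the &lt;code&gt;AiResponseParser&lt;/code&gt; class and the regex are assumptions made for illustration, and the lenient pass only understands simple backslash escapes (it does not decode &lt;code&gt;\uXXXX&lt;/code&gt; sequences).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical result type; the real ErrorLog model in the repo may differ.
public sealed record AiAnalysis(string RootCause, string FixSuggestion, string CodePatch, bool ParseFailed);

public static class AiResponseParser
{
    public static AiAnalysis Parse(string responseText)
    {
        // 1. Strict: the whole response must be one valid JSON object with all three keys.
        try
        {
            using var doc = JsonDocument.Parse(responseText);
            var root = doc.RootElement;
            return new AiAnalysis(
                root.GetProperty("RootCause").GetString() ?? "",
                root.GetProperty("FixSuggestion").GetString() ?? "",
                root.GetProperty("CodePatch").GetString() ?? "",
                ParseFailed: false);
        }
        catch (Exception ex) when (ex is JsonException or KeyNotFoundException or InvalidOperationException)
        {
            // Malformed JSON, missing key, or wrong value type: try the lenient pass.
        }

        // 2. Lenient: pull each field out of the surrounding text with a regex.
        var rootCause = ExtractField(responseText, "RootCause");
        if (rootCause is not null)
        {
            return new AiAnalysis(
                rootCause,
                ExtractField(responseText, "FixSuggestion") ?? "",
                ExtractField(responseText, "CodePatch") ?? "",
                ParseFailed: false);
        }

        // 3. Give up: keep the raw text and flag it so the dashboard can still show it.
        return new AiAnalysis(responseText, "", "", ParseFailed: true);
    }

    // Finds "key": "value", where value may contain backslash-escaped characters.
    private static string? ExtractField(string text, string key)
    {
        var pattern = "\"" + Regex.Escape(key) + "\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"";
        var match = Regex.Match(text, pattern, RegexOptions.Singleline);
        return match.Success ? Unescape(match.Groups[1].Value) : null;
    }

    // Resolves \n, \t, \r and drops the backslash in front of any other character.
    private static string Unescape(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (var i = 0; i &amp;lt; s.Length; i++)
        {
            if (s[i] == '\\' &amp;amp;&amp;amp; i + 1 &amp;lt; s.Length)
            {
                i++;
                sb.Append(s[i] switch { 'n' =&amp;gt; '\n', 't' =&amp;gt; '\t', 'r' =&amp;gt; '\r', _ =&amp;gt; s[i] });
            }
            else
            {
                sb.Append(s[i]);
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up model output: valid fields, but wrapped in chatter so strict parsing fails.
        var messy = "Sure! Here is the analysis: {\"RootCause\": \"user was \\\"null\\\"\", "
                  + "\"FixSuggestion\": \"add a guard\", \"CodePatch\": \"if (user == null) return;\"}";
        var result = AiResponseParser.Parse(messy);
        Console.WriteLine(result.RootCause);   // user was "null"
        Console.WriteLine(result.ParseFailed); // False
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The order matters: the lenient pass only runs after the strict one has failed, so well-formed responses never go through the regex.&lt;/p&gt;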

&lt;h3&gt;
  
  
  4.2 Sensitive data leakage
&lt;/h3&gt;

&lt;p&gt;A stack trace can contain connection strings, JWTs, or PII. The first version sent the raw exception text to Groq. After a code review from a friend who is more paranoid than I am, I added a redaction step before the AI call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// requires: using System.Text.RegularExpressions;
private static readonly Regex BearerToken  = new(@"Bearer\s+[A-Za-z0-9._\-]+", RegexOptions.Compiled);
private static readonly Regex PasswordKV   = new(@"(password|pwd|secret)\s*=\s*\S+",  RegexOptions.Compiled | RegexOptions.IgnoreCase);
private static readonly Regex CreditCard   = new(@"\b\d{16}\b",                       RegexOptions.Compiled);
private static readonly Regex EmailAddr    = new(@"\b[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}\b", RegexOptions.Compiled);

public static string Redact(string input)
{
    input = BearerToken.Replace(input, "Bearer [REDACTED]");
    input = PasswordKV .Replace(input, "$1=[REDACTED]");
    input = CreditCard .Replace(input, "[REDACTED-CC]");
    input = EmailAddr  .Replace(input, "[REDACTED-EMAIL]");
    return input;
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run this &lt;em&gt;before&lt;/em&gt; the prompt is built, every time. Always assume the AI provider sees your data. Always. The day you forget is the day a customer's JWT ends up in someone else's training set, or at minimum in someone else's logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.3 The "self-healing" promise is misleading
&lt;/h3&gt;

&lt;p&gt;This system &lt;em&gt;suggests&lt;/em&gt; fixes. It does not apply them. I almost shipped an "auto-apply patch on green confidence" toggle. Then I imagined a 3 AM page where a hallucinated regex wipes a production database because the model misread a column name. The toggle is gone. Auto-merging AI patches into prod is a 2027 problem, not a 2026 one. Be honest about this in your README, your marketing, and your internal pitches. Engineers will trust you more.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.4 Hangfire retries are silent (and cost money)
&lt;/h3&gt;

&lt;p&gt;If the AI call times out, Hangfire retries it. If the AI call consistently times out (bad prompt, big payload, network blip), Hangfire retries it three times. Each retry costs a Groq credit. I set &lt;code&gt;[AutomaticRetry(Attempts = 3)]&lt;/code&gt; without much thought (Hangfire's own default is 10 attempts), and a retry count picked that way is wrong for any external dependency that costs money.&lt;/p&gt;

&lt;p&gt;Fix: lower the count, add delay, and add a circuit breaker. This is what I have on the worker method now:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;[AutomaticRetry(Attempts = 2, DelaysInSeconds = new[] { 30, 120 })]
public async Task ProcessErrorAsync(ErrorLog errorLog) { ... }


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two retries, with 30s and 2m delays. That bounds the cost spiral when something is wrong. A truly broken state would still cost up to three calls per error (the first attempt plus two retries), but it would not keep retrying and drain a month's budget in an hour.&lt;/p&gt;
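
&lt;p&gt;The snippet covers the retry half of that fix. For the circuit-breaker half, a minimal sketch might look like the following. &lt;code&gt;SimpleCircuitBreaker&lt;/code&gt;, its threshold and its cool-down are hypothetical names and values, not the repo's code. The state lives in memory, so it only protects a single worker process, and a library such as Polly is the more usual choice in production.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical minimal circuit breaker; names and numbers are illustrative.
public sealed class SimpleCircuitBreaker
{
    private readonly object _gate = new();
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openUntil = DateTimeOffset.MinValue;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
    }

    // True while the breaker is open, i.e. the AI call should be skipped.
    public bool IsOpen(DateTimeOffset now)
    {
        lock (_gate) { return now &amp;lt; _openUntil; }
    }

    // Any success resets the failure streak.
    public void RecordSuccess()
    {
        lock (_gate) { _consecutiveFailures = 0; }
    }

    public void RecordFailure(DateTimeOffset now)
    {
        lock (_gate)
        {
            _consecutiveFailures++;
            if (_consecutiveFailures &amp;gt;= _failureThreshold)
            {
                // Too many failures in a row: stop calling for a while.
                _openUntil = now + _openDuration;
                _consecutiveFailures = 0;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var breaker = new SimpleCircuitBreaker(failureThreshold: 3, openDuration: TimeSpan.FromMinutes(5));
        var now = DateTimeOffset.UtcNow;

        breaker.RecordFailure(now);
        breaker.RecordFailure(now);
        Console.WriteLine(breaker.IsOpen(now));               // False: only two failures so far
        breaker.RecordFailure(now);
        Console.WriteLine(breaker.IsOpen(now));               // True: the third failure opened it
        Console.WriteLine(breaker.IsOpen(now.AddMinutes(6))); // False: the cool-down has passed
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the worker, the idea would be to check &lt;code&gt;IsOpen&lt;/code&gt; before the AI call and, while it is open, store the error without analysis instead of spending another credit.&lt;/p&gt;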

&lt;h3&gt;
  
  
  4.5 No correlation between dashboard event and the original request
&lt;/h3&gt;

&lt;p&gt;The user got a 500 with no error ID. The dashboard showed a fix suggestion with no way to find the request that caused it. So when an engineer wanted to reproduce the error, they had to guess the URL, the headers, the auth state. Useless.&lt;/p&gt;

&lt;p&gt;Fix: generate a &lt;code&gt;CorrelationId&lt;/code&gt; once in the middleware, return it in the response header, and store it on the &lt;code&gt;ErrorLog&lt;/code&gt; model. One UUID, two places. The dashboard now shows &lt;code&gt;#1234&lt;/code&gt; next to each error and the engineer can &lt;code&gt;grep&lt;/code&gt; their logs for that ID.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// in the middleware
var correlationId = Guid.NewGuid().ToString("N");
context.Response.Headers["X-Correlation-Id"] = correlationId;
errorLog.CorrelationId = correlationId;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Part 5 — the live dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard is a 280-line React app in &lt;code&gt;SmartLogAnalyzer.Dashboard/smart-log-analyzer-dashboard/&lt;/code&gt;. The whole real-time piece is fifty lines of hooks.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;useEffect(() =&amp;gt; {
  const newConnection = new signalR.HubConnectionBuilder()
    .withUrl(`${API_URL}/errorHub`)
    .withAutomaticReconnect()
    .build();
  setConnection(newConnection);
}, []);

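useEffect(() =&amp;gt; {
  if (!connection) return;
  // Register the handler before start() so an early message is not missed.
  connection.on('ReceiveErrorUpdate', (errorJson: string) =&amp;gt; {
    const error: ErrorLog = JSON.parse(errorJson);
    setErrors(prev =&amp;gt; {
      const index = prev.findIndex(e =&amp;gt; e.id === error.id);
      if (index !== -1) {
        // Known error: replace it in place (e.g. a new occurrence count).
        const updated = [...prev];
        updated[index] = error;
        return updated;
      }
      // New error: put it at the top of the list.
      return [error, ...prev];
    });
  });
  connection.start()
    .then(() =&amp;gt; setConnected(true))
    .catch(err =&amp;gt; console.error('SignalR connection failed:', err));
  return () =&amp;gt; {
    connection.off('ReceiveErrorUpdate');
    connection.stop();
  };
}, [connection]);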


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The wire is JSON-over-SignalR. The worker calls &lt;code&gt;Clients.All.SendAsync("ReceiveErrorUpdate", jsonString)&lt;/code&gt; through the hub context and every open browser tab updates. No polling. No refresh button. You literally watch errors arrive, get analyzed, and become fixable, in real time.&lt;/p&gt;

&lt;p&gt;A few small UX details I am proud of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Severity badges.&lt;/strong&gt; Errors with &lt;code&gt;occurrenceCount &amp;gt;= 10&lt;/code&gt; get a red &lt;code&gt;🔴 Critical&lt;/code&gt; chip. Under 2 is green. Engineers learn to scan for red.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Analyzing with AI..." spinner.&lt;/strong&gt; When a new error arrives, its card shows a spinner for the few seconds until the AI response comes back. The state machine is &lt;code&gt;pending → analyzing → analyzed&lt;/code&gt;, driven by whether &lt;code&gt;aiRootCause&lt;/code&gt; is set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand on click, stack trace in a &lt;code&gt;&amp;lt;details&amp;gt;&lt;/code&gt;.&lt;/strong&gt; Most engineers want the AI's take first. The stack trace is one click away.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When you should — and shouldn't — build this
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Build it if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have more than three services throwing exceptions, and your on-call rotation is a human who hates pages at 3 AM.&lt;/li&gt;
&lt;li&gt;You are already paying for an LLM API, or you have a GPU sitting around running Ollama.&lt;/li&gt;
&lt;li&gt;Your mean time to acknowledge (MTTA) on alerts is more than five minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Don't build it if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have one service and a steady stream of bugs. Fix the bugs.&lt;/li&gt;
&lt;li&gt;Your "errors" are mostly business-logic edge cases — a missing null check that is actually a missing requirement. The AI cannot help with those.&lt;/li&gt;
&lt;li&gt;You don't have CI/CD yet. Self-healing on top of an unsafe deploy pipeline is just a faster way to break production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The general rule:&lt;/strong&gt; a 200-line NuGet package won't fix a 2000-line architecture problem. This system is a force multiplier on a healthy codebase. On an unhealthy one, it is a faster way to find out how unhealthy you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo and how to run it
&lt;/h2&gt;

&lt;p&gt;The full source is at &lt;a href="https://github.com/ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway" rel="noopener noreferrer"&gt;github.com/ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway&lt;/a&gt;. To run it locally:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway.git
cd Smart-Log-Analyzer-Self-Healing-API-Gateway
cp .env.example .env
# Edit .env — set GROQ_API_KEY (free at groq.com), DB_SERVER, REDIS_HOST
start-all.bat    # Windows; the repo has a Makefile-equivalent for *nix


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three windows open: the API, the Worker, the Dashboard. Open &lt;code&gt;http://localhost:3000&lt;/code&gt;, click the "Trigger Test Error" link, and watch a &lt;code&gt;NullReferenceException&lt;/code&gt; arrive, get analyzed, and become a clickable fix suggestion, all in under 4 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The future of "self-healing" is not magic. It is a small, honest pipeline: catch the error, give the model the right context, push the analysis to a human in real time, let the human close the loop. The model writes the boilerplate diff. The engineer writes the actual fix. That is a real workflow, and it works today, on the same .NET you are already running, with one extra NuGet package and one extra process.&lt;/p&gt;

&lt;p&gt;If you build something similar and run into the same five problems, I'd love to hear about it. The repo is open for issues, PRs, and rants about how your retry policy bankrupted your LLM budget. We've all been there.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Built with:&lt;/strong&gt; .NET 10 · ASP.NET Core · Hangfire · Semantic Kernel · Groq (llama-3.3-70b-versatile) · SignalR · MSSQL · Redis · React&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway" rel="noopener noreferrer"&gt;ZalaAvinash/Smart-Log-Analyzer-Self-Healing-API-Gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Avinash Zala is a senior .NET engineer in Surat, India, with 7+ years building enterprise web apps, APIs, and ERP systems. He is currently adding AI/LLM capabilities to his stack and writing about what he learns. &lt;a href="https://github.com/ZalaAvinash" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/avinash-zala/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>ai</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
