<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sharad Kumar</title>
    <description>The latest articles on DEV Community by Sharad Kumar (@sharad_kumar_45b990921489).</description>
    <link>https://dev.to/sharad_kumar_45b990921489</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3916780%2Fac06fee2-be79-4289-8730-b12b23a3416a.png</url>
      <title>DEV Community: Sharad Kumar</title>
      <link>https://dev.to/sharad_kumar_45b990921489</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sharad_kumar_45b990921489"/>
    <language>en</language>
    <item>
      <title>I Built a Production AI Layer Inside a Legacy ASP.NET Core App — and It Broke in Ways Tutorials Never Mention</title>
      <dc:creator>Sharad Kumar</dc:creator>
      <pubDate>Tue, 12 May 2026 00:34:14 +0000</pubDate>
      <link>https://dev.to/sharad_kumar_45b990921489/i-built-a-production-ai-layer-inside-a-legacy-aspnet-core-app-and-it-broke-in-ways-tutorials-4edg</link>
      <guid>https://dev.to/sharad_kumar_45b990921489/i-built-a-production-ai-layer-inside-a-legacy-aspnet-core-app-and-it-broke-in-ways-tutorials-4edg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most LLM tutorials assume you start from nothing. A blank project. A clean architecture. No constraints. No legacy code. No deployment history. No production traffic.&lt;/p&gt;

&lt;p&gt;That is not how real systems work.&lt;/p&gt;

&lt;p&gt;I spent one week integrating a production-grade AI service layer into an existing ASP.NET Core MVC e-commerce system that was already live, already structured, and already dependent on architectural decisions I couldn't change. The challenge wasn't calling an LLM API. It was designing an AI layer that could survive inside a real backend system without breaking testability, without leaking cost, without becoming tightly coupled to the domain, and without collapsing the moment the model or provider behaved unexpectedly.&lt;/p&gt;

&lt;p&gt;The feature itself, a tone-aware product description generator, is simple. The system design behind it is not. This article is about the architectural decisions, the production failure modes, and the assumptions that broke the moment an LLM entered a real backend system.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI System Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Provider/Domain Seam: The Most Important Structural Decision
&lt;/h3&gt;

&lt;p&gt;The first instinct when adding AI to any backend is to create one service class that does everything. One class, one interface, done.&lt;/p&gt;

&lt;p&gt;That instinct produces a system you cannot test, cannot swap, and cannot extend without touching everything.&lt;/p&gt;

&lt;p&gt;The correct design draws a hard seam between two concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provider layer&lt;/strong&gt; (&lt;code&gt;IAIService&lt;/code&gt; / &lt;code&gt;AzureOpenAIService&lt;/code&gt;): knows how to talk to an LLM. Accepts two strings (system prompt, user prompt), returns a string. Knows nothing about what a book or product is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain layer&lt;/strong&gt; (&lt;code&gt;IProductAIService&lt;/code&gt; / &lt;code&gt;BookAIService&lt;/code&gt;): knows what a description request looks like, what tone means, and how prompts should be constructed. Knows nothing about Azure, HTTP, or Semantic Kernel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Above that seam: domain logic, testable with zero real network calls. Below it: provider mechanics, swappable without touching anything above. Swap Azure OpenAI for Ollama, write one new class, and change one registration. The domain layer doesn't notice.&lt;/p&gt;
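&lt;p&gt;As a minimal sketch of that seam (hypothetical type names, synchronous signatures for brevity; the real methods are async and return wrapped results), the two layers and the fake provider that makes the domain layer testable look roughly like this:&lt;/p&gt;

```csharp
// Sketch only: hypothetical names, sync signatures for brevity.
public enum DescriptionTone { Professional, Casual }

// Provider layer: knows how to talk to an LLM, nothing about books.
public interface IAIService
{
    string Generate(string systemPrompt, string userPrompt);
}

// Domain layer: knows tone and prompt construction, nothing about Azure or HTTP.
public interface IProductAIService
{
    string GenerateDescription(string title, DescriptionTone tone);
}

public sealed class BookAIService : IProductAIService
{
    private readonly IAIService _provider;
    public BookAIService(IAIService provider) => _provider = provider;

    public string GenerateDescription(string title, DescriptionTone tone) =>
        _provider.Generate(
            tone == DescriptionTone.Professional
                ? "You are a professional copywriter."
                : "You are a friendly book recommender.",
            $"Write a product description for: {title}");
}

// Zero network calls needed to test the domain layer.
public sealed class FakeProvider : IAIService
{
    public string LastSystemPrompt = "";
    public string Generate(string systemPrompt, string userPrompt)
    {
        LastSystemPrompt = systemPrompt;
        return "stub description";
    }
}
```

&lt;p&gt;Swapping Azure OpenAI for Ollama means writing one new &lt;code&gt;IAIService&lt;/code&gt; implementation; &lt;code&gt;BookAIService&lt;/code&gt; never changes.&lt;/p&gt;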

&lt;blockquote&gt;
&lt;p&gt;"Naming is a design smell detector. My original class was called &lt;code&gt;OpenAIService&lt;/code&gt; and it implemented both interfaces. When I tried to give it an honest name, I couldn't because it was doing two things."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Generic Wrapper That Eliminates Try/Catch Everywhere
&lt;/h3&gt;

&lt;p&gt;With the seam in place, the next question was: what does every AI call return? My first draft returned plain strings or domain objects directly. That looked fine until I had to handle failures, and I started writing the same &lt;code&gt;try/catch&lt;/code&gt; block in three different places.&lt;/p&gt;

&lt;p&gt;Every AI call in the system returns &lt;code&gt;AIResponse&amp;lt;T&amp;gt;&lt;/code&gt;, not a raw result type. The wrapper carries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;    &lt;span class="n"&gt;Success&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;init&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;      &lt;span class="n"&gt;Data&lt;/span&gt;         &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;init&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;ErrorMessage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;init&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;    &lt;span class="n"&gt;FromCache&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;init&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="n"&gt;TokensUsed&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;init&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;fromCache&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The payoff is architectural: &lt;strong&gt;no caller above the provider layer ever writes a try/catch.&lt;/strong&gt; Every feature just checks &lt;code&gt;result.Success&lt;/code&gt;. Error handling is decided once, at the boundary where it's caught, and that discipline holds automatically across every AI feature you add later.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AIResponse&amp;lt;string&amp;gt;&lt;/code&gt; at the provider layer. &lt;code&gt;AIResponse&amp;lt;ProductDescriptionResult&amp;gt;&lt;/code&gt; at the domain layer. &lt;code&gt;AIResponse&amp;lt;ChatResult&amp;gt;&lt;/code&gt; when the chatbot arrives in Week 3. Same envelope, different payloads.&lt;/p&gt;

&lt;p&gt;The static factory methods aren't just convenience; they make invalid states unrepresentable. You cannot call &lt;code&gt;Ok()&lt;/code&gt; and get &lt;code&gt;Success = false&lt;/code&gt;. You cannot call &lt;code&gt;Fail()&lt;/code&gt; and accidentally leave &lt;code&gt;ErrorMessage&lt;/code&gt; null.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Inheritance Trap I Almost Fell Into
&lt;/h3&gt;

&lt;p&gt;My first instinct was to write &lt;code&gt;ProductDescriptionResult : AIResponse&lt;/code&gt;. It compiles. It works. But the moment you try to store a &lt;code&gt;ProductDescriptionResult&lt;/code&gt; in a database or pass it across a service boundary, it drags &lt;code&gt;Success&lt;/code&gt;, &lt;code&gt;ErrorMessage&lt;/code&gt;, and &lt;code&gt;TokensUsed&lt;/code&gt; with it: infrastructure concerns that mean nothing in the domain layer.&lt;/p&gt;

&lt;p&gt;Composition is correct here. The result class is a pure data payload. The wrapper is the envelope. Neither inherits from the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Prompts as Architectural Contracts
&lt;/h3&gt;

&lt;p&gt;This is where prompt engineering actually lives in the codebase, not scattered across controllers, not inline in HTTP calls, but centralised in a single switch expression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nf"&gt;BuildSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DescriptionTone&lt;/span&gt; &lt;span class="n"&gt;tone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tone&lt;/span&gt; &lt;span class="k"&gt;switch&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;DescriptionTone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Professional&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="s"&gt;"You are a professional copywriter for a premium book retailer. "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
        &lt;span class="s"&gt;"Write concise, authoritative product descriptions. "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
        &lt;span class="s"&gt;"Never fabricate awards, authors, or facts not provided."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DescriptionTone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Casual&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="s"&gt;"You are a friendly book recommender. Keep it warm and enthusiastic."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things worth calling out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;BuildSystemPrompt()&lt;/code&gt; and &lt;code&gt;BuildUserPrompt()&lt;/code&gt; are &lt;em&gt;separate private methods by design&lt;/em&gt;. Developer rules and user input must never accidentally merge. That separation is what makes the service layer both testable and secure.&lt;/li&gt;
&lt;li&gt;Hardcoded system prompts are a deliberate decision, not laziness. They are fixed feature contracts: they don't change at runtime, they don't bleed across features, and they're easy to audit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The temperature misconception is worth correcting explicitly: &lt;strong&gt;temperature is not a safety control.&lt;/strong&gt; It's a creativity dial. The system prompt is where your behavioural constraints live. Three controls, three separate jobs: system prompt = instructions, temperature = style, &lt;code&gt;max_tokens&lt;/code&gt; = budget.&lt;/p&gt;
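&lt;p&gt;In Semantic Kernel terms, two of those three dials live on the execution settings object. A fragment (the values here are illustrative, not necessarily the ones this project uses):&lt;/p&gt;

```csharp
// Fragment: requires the Microsoft.SemanticKernel OpenAI connector.
var settings = new OpenAIPromptExecutionSettings
{
    Temperature = 0.7,  // style: how varied the wording is - not a safety control
    MaxTokens = 400     // budget: hard cap on response length, and therefore cost
};
// The third control, the system prompt, is where the behavioural constraints live.
```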

&lt;h3&gt;
  
  
  ChatHistory: Single-Turn Today, Multi-Turn Ready Tomorrow
&lt;/h3&gt;

&lt;p&gt;Most beginners send one big string to an LLM API. Using Semantic Kernel's &lt;code&gt;ChatHistory&lt;/code&gt; correctly is a signal that you understand how chat models actually work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chatHistory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ChatHistory&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;chatHistory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddSystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;chatHistory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddUserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description generator creates a new &lt;code&gt;ChatHistory&lt;/code&gt; per call by design. When the chatbot arrives in Week 3, the same pattern extends with no architecture change, just persistence added.&lt;/p&gt;

&lt;h3&gt;
  
  
  CancellationToken: The Hidden Cost Leak
&lt;/h3&gt;

&lt;p&gt;Every async AI method in the chain accepts and propagates a &lt;code&gt;CancellationToken&lt;/code&gt;. This isn't just politeness; it's cost control. If a user closes their browser tab mid-request and the token isn't propagated all the way to the Azure SDK call, your app completes the API call and gets billed for a response nobody receives.&lt;/p&gt;

&lt;p&gt;A token accepted but not passed to the next &lt;code&gt;await&lt;/code&gt; is worse than not accepting it at all; it gives false confidence that cancellation is handled. The chain must be unbroken: Controller → Service → Provider → Azure SDK call.&lt;/p&gt;
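&lt;p&gt;The leak is easy to reproduce with nothing but &lt;code&gt;Task.Delay&lt;/code&gt; standing in for the SDK call (hypothetical method names):&lt;/p&gt;

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Broken: accepts a token but never forwards it. The "API call" always runs
    // to completion (and would be billed) even after the user is gone.
    public static async Task BrokenAsync(CancellationToken ct) =>
        await Task.Delay(200);

    // Correct: the token reaches the innermost await, so an abandoned request
    // actually stops the work.
    public static async Task CorrectAsync(CancellationToken ct) =>
        await Task.Delay(200, ct);
}
```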

&lt;h3&gt;
  
  
  Response Caching: Proving It Works in the UI
&lt;/h3&gt;

&lt;p&gt;Caching is wired at the provider layer with a hash-based key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;cacheKey&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;$"ai:text:&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetHashCode&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetHashCode&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;AIResponse&amp;lt;T&amp;gt;&lt;/code&gt; wrapper carries a &lt;code&gt;FromCache&lt;/code&gt; boolean all the way to the admin UI, where it renders as &lt;code&gt;⚡ Cached&lt;/code&gt; vs &lt;code&gt;✨ Generated&lt;/code&gt;. That's not decoration; it's verification. You can prove your caching is working in a live demo without opening Application Insights.&lt;/p&gt;
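&lt;p&gt;One caveat worth flagging on the key itself: &lt;code&gt;string.GetHashCode()&lt;/code&gt; is randomized per process in modern .NET. That's harmless while the cache is an in-process &lt;code&gt;IMemoryCache&lt;/code&gt;, but it breaks the moment the cache moves out of process. A stable hash (a sketch, not this project's code) avoids that:&lt;/p&gt;

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AICacheKeys
{
    // Deterministic across processes, unlike string.GetHashCode(), which .NET
    // randomizes per process. The same key scheme keeps working if IMemoryCache
    // is ever swapped for Redis.
    public static string For(string systemPrompt, string userPrompt)
    {
        var bytes = Encoding.UTF8.GetBytes(systemPrompt + "\n" + userPrompt);
        return "ai:text:" + Convert.ToHexString(SHA256.HashData(bytes));
    }
}
```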

&lt;p&gt;&lt;code&gt;EnableCaching&lt;/code&gt; is a feature flag in &lt;code&gt;AISettings&lt;/code&gt;, not a hardcoded &lt;code&gt;true&lt;/code&gt;. You can flip it in the Azure App Service configuration without redeploying. That's a production pattern, not a tutorial habit.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Cost Dashboard: Why Almost Nobody Builds This
&lt;/h3&gt;

&lt;p&gt;Most AI portfolio projects show features. Mine also shows cost. The admin dashboard tracks tokens per feature per day, cost per request, and cache hit rate. This makes the economics of AI visible and is the difference between someone who &lt;em&gt;built&lt;/em&gt; a feature and someone who &lt;em&gt;shipped&lt;/em&gt; one.&lt;/p&gt;

&lt;p&gt;Concrete number: approximately $15 total API spend over 8 weeks of development, with &lt;code&gt;gpt-4o-mini&lt;/code&gt; during development (roughly 15x cheaper than &lt;code&gt;gpt-4o&lt;/code&gt;) and caching in place.&lt;/p&gt;




&lt;h2&gt;
  
  
  Supporting Architecture Decisions
&lt;/h2&gt;

&lt;p&gt;Three decisions shaped how the AI layer was placed and wired. They're here because they caused real problems, not because they're interesting trivia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI infrastructure belongs at the application root, not inside feature folders.&lt;/strong&gt; The app used area-based routing. The instinct was to put &lt;code&gt;Services/AI/&lt;/code&gt; inside an Area. Wrong call: UI grouping doesn't determine where cross-cutting infrastructure lives. AI features span the whole application. Scoping them to an Area implies boundaries that don't exist and creates coupling that becomes painful when the chatbot and search features arrive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All AI wiring lives in one extension method.&lt;/strong&gt; One line in &lt;code&gt;Program.cs&lt;/code&gt;: &lt;code&gt;builder.Services.AddAIServices(builder.Configuration)&lt;/code&gt;. Everything else is encapsulated. The subtle thing to know: if you register the same concrete class twice under different interfaces without careful use of &lt;code&gt;GetRequiredService&lt;/code&gt;, you can end up with two separate instances per request instead of one. Most tutorials never flag this.&lt;/p&gt;
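&lt;p&gt;One way to avoid the two-instances trap is a forwarding registration. A sketch using the non-generic overloads (the generic &lt;code&gt;AddScoped&amp;lt;T&amp;gt;&lt;/code&gt; forms are equivalent; &lt;code&gt;AzureOpenAIService&lt;/code&gt; here stands for whatever concrete class implements the interfaces):&lt;/p&gt;

```csharp
// Register the concrete type once, then forward each interface to it, so every
// interface resolves to the same instance within a scope. Repeat the forwarding
// line for any additional interface the class implements.
services.AddScoped(typeof(AzureOpenAIService));
services.AddScoped(typeof(IAIService),
    sp => sp.GetRequiredService(typeof(AzureOpenAIService)));
```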

&lt;p&gt;&lt;strong&gt;Secrets never touch the config file.&lt;/strong&gt; &lt;code&gt;ApiKey&lt;/code&gt; exists in the settings class but is absent from &lt;code&gt;appsettings.json&lt;/code&gt;. User Secrets fill it locally, App Service Application Settings fill it in production. The application code is identical in both environments; everything is injected through the same configuration pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges &amp;amp; Learnings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Debugging Arc: 404 → 400 → 200
&lt;/h3&gt;

&lt;p&gt;Getting the AI endpoint working produced three sequential errors, each from a different layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;404&lt;/strong&gt;: routing was treating &lt;code&gt;AI&lt;/code&gt; as an area name. Fix: an explicit &lt;code&gt;[Route("AI")]&lt;/code&gt; on the controller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;400&lt;/strong&gt;: the JavaScript payload sends &lt;code&gt;"Professional"&lt;/code&gt; as a string, while the C# model declares a &lt;code&gt;DescriptionTone&lt;/code&gt; enum. Without a converter bridging them, ASP.NET Core's default deserializer rejects the request with a silent 400. Fix: add &lt;code&gt;JsonStringEnumConverter&lt;/code&gt;. This is an AI-specific integration point: the LLM-facing tone selector has to round-trip correctly through the API layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;200&lt;/strong&gt;: success.&lt;/li&gt;
&lt;/ol&gt;
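&lt;p&gt;The 400 fix can be reproduced with plain &lt;code&gt;System.Text.Json&lt;/code&gt;, no web stack required (enum shortened to two values here):&lt;/p&gt;

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public enum DescriptionTone { Professional, Casual }

public static class ToneJson
{
    public static readonly JsonSerializerOptions Options = Create();

    private static JsonSerializerOptions Create()
    {
        var options = new JsonSerializerOptions();
        // Without this converter, the JSON string "Professional" fails to bind
        // to the enum - the same mismatch that produced the silent 400.
        options.Converters.Add(new JsonStringEnumConverter());
        return options;
    }
}
```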

&lt;p&gt;Each error was a different subsystem. Real integrations rarely have a single root cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cascade Failure at Deployment
&lt;/h3&gt;

&lt;p&gt;At this point, the feature was working locally. Deployment looked straightforward.&lt;/p&gt;

&lt;p&gt;It wasn't.&lt;/p&gt;

&lt;p&gt;After deploying, &lt;code&gt;MaxTokens&lt;/code&gt; was reading as &lt;code&gt;0&lt;/code&gt;, a symptom that pointed nowhere near the actual cause: a missing Azure App Settings entry was crashing &lt;code&gt;AIServiceExtensions.cs&lt;/code&gt; at startup, leaving the settings object in a zeroed-out state. A &lt;code&gt;NullReferenceException&lt;/code&gt; deep in Semantic Kernel setup was the symptom. A missing config key was the cause.&lt;/p&gt;

&lt;p&gt;The lesson applies specifically to AI service registration: &lt;strong&gt;validate that your AI configuration is present and well-formed at startup, loudly, before any service gets built.&lt;/strong&gt; A &lt;code&gt;?? throw&lt;/code&gt; on the config read gives you a readable failure message at the right location, not a cryptic error three layers downstream.&lt;/p&gt;
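&lt;p&gt;A minimal sketch of that guard (a hypothetical helper; in the real extension method the &lt;code&gt;?? throw&lt;/code&gt; sits directly on each configuration read):&lt;/p&gt;

```csharp
using System;

public static class AIConfigGuard
{
    // Turns a missing key into a loud, named failure at startup instead of a
    // NullReferenceException three layers down in Semantic Kernel setup.
    public static string Require(string key, string value) =>
        value ?? throw new InvalidOperationException(
            $"AI configuration key '{key}' is missing. " +
            "Set it in User Secrets locally or App Service settings in production.");
}
```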

&lt;h3&gt;
  
  
  The Two Hours I Lost to a URL
&lt;/h3&gt;

&lt;p&gt;This one stings to write. I spent the better part of an afternoon convinced my Semantic Kernel wiring was broken. Checked the DI registration twice. Re-read the SK docs. Added logging everywhere. Silent failure, no exception, no response.&lt;/p&gt;

&lt;p&gt;The actual problem: I had copied the full REST endpoint URL from the Azure portal, something like &lt;code&gt;https://your-resource.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-01&lt;/code&gt;, directly into my config. Semantic Kernel only wants the base domain. It constructs the rest itself. One wrong URL format, zero helpful error messages, two hours gone.&lt;/p&gt;

&lt;p&gt;I'm including this because the Azure portal genuinely shows you the full path and it looks correct. It isn't.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;https://your-resource.openai.azure.com/&lt;/code&gt;, the base domain only. Semantic Kernel builds the path from there based on your deployment name and API version.&lt;/p&gt;
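&lt;p&gt;In Semantic Kernel's builder wiring that looks roughly like this (a fragment; the deployment name and &lt;code&gt;apiKey&lt;/code&gt; variable are placeholders):&lt;/p&gt;

```csharp
// Fragment: requires the Microsoft.SemanticKernel Azure OpenAI connector.
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-mini",
    endpoint: "https://your-resource.openai.azure.com/", // base domain ONLY -
    apiKey: apiKey);                                     // never the full /chat/completions URL
```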

&lt;h3&gt;
  
  
  LLM Error Messages Are a Security Leak
&lt;/h3&gt;

&lt;p&gt;A raw Azure OpenAI &lt;code&gt;401&lt;/code&gt; or timeout exception carries endpoint URLs, subscription hints, and SDK internals in &lt;code&gt;ex.Message&lt;/code&gt;. None of that should reach a client. The AI service boundary catches the exception, logs it internally with full context, and returns a sanitised message to the caller. One line of difference, significant surface reduction.&lt;/p&gt;
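&lt;p&gt;The boundary catch is small enough to sketch in full (hypothetical names; a custom delegate stands in for the real async provider call):&lt;/p&gt;

```csharp
using System;

public delegate string AICall();

public static class AIBoundary
{
    public static string InternalLog = ""; // stands in for a real ILogger

    // The single catch at the provider boundary: the raw exception (endpoint
    // URLs, SDK internals) goes to the internal log only; the caller sees a
    // sanitized message.
    public static string Invoke(AICall call)
    {
        try { return call(); }
        catch (Exception ex)
        {
            InternalLog = ex.ToString();                     // full context, server-side
            return "AI service is temporarily unavailable."; // safe for clients
        }
    }
}
```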

&lt;h3&gt;
  
  
  The TinyMCE Silent Failure
&lt;/h3&gt;

&lt;p&gt;The AI card writes to the product description field via &lt;code&gt;textarea.value = text&lt;/code&gt;. When TinyMCE is active, this silently does nothing, because TinyMCE replaces the DOM element entirely. No error, no feedback. The button appears to work, but nothing changes. Lesson: the AI integration point in the UI needs to know what the UI is actually made of.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Draw the seam before you write any code.&lt;/strong&gt; The provider/domain split is the foundational decision. Everything else (testability, swappability, cost control) follows from it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make failure a return value, not a control flow mechanism.&lt;/strong&gt; An AI system has expected failure modes: timeouts, rate limits, and degraded responses. &lt;code&gt;AIResponse&amp;lt;T&amp;gt;&lt;/code&gt; handles them at the boundary. No caller above it needs a try/catch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System prompts are architectural contracts, not strings.&lt;/strong&gt; They live in one place, they don't change at runtime, and they're the only place your business rules for AI behaviour exist. Treat them with the same discipline as interface contracts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temperature is a creativity dial, not a safety control.&lt;/strong&gt; The system prompt is where constraints live. Knowing the difference affects every feature you build on top of the same provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Propagate &lt;code&gt;CancellationToken&lt;/code&gt; all the way to the LLM call.&lt;/strong&gt; A broken chain gives false confidence and silently leaks API cost when users abandon requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make cache hits observable.&lt;/strong&gt; The &lt;code&gt;FromCache&lt;/code&gt; signal in the response wrapper isn't decoration; it's the only way to verify your caching is working without opening a monitoring dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost is a first-class concern, not an afterthought.&lt;/strong&gt; Token tracking per feature per day is what separates a portfolio project from a production system. Almost no one builds this. Build it anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI startup failures should be loud and specific.&lt;/strong&gt; A missing config key that crashes deep inside Semantic Kernel setup is far worse than a clean, early "AI configuration is missing" exception at the service boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion &amp;amp; What's Next
&lt;/h2&gt;

&lt;p&gt;Week 1 is one AI feature with a production-grade system behind it: a layered architecture with a hard provider/domain seam, a generic response envelope, observable caching, structured logging, graceful degradation, and cost tracking. The feature is modest. The system thinking is not.&lt;/p&gt;

&lt;p&gt;Week 2 is RAG semantic search using &lt;code&gt;text-embedding-3-small&lt;/code&gt; and Azure SQL Vector, with hybrid search (vector + keyword in parallel) to handle exact matches that pure vector retrieval handles poorly. The agentic chatbot in Weeks 3 and 4 depends on RAG to ground its responses, which is why RAG comes first. Dependency-first sequencing isn't just planning hygiene; it prevents retrofitting at the seam where two features meet.&lt;/p&gt;

&lt;p&gt;The design decisions in Week 1 were made with Week 4 in mind. That's what production AI system design actually is.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built on: ASP.NET Core MVC · Azure OpenAI · Semantic Kernel · Microsoft.Extensions.AI · gpt-4.1-mini · IMemoryCache&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🤝 Connect with Me
&lt;/h2&gt;

&lt;p&gt;If you're building AI into .NET or just following along, let's connect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💼 &lt;a href="https://www.linkedin.com/in/sharad9kumar/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🐙 &lt;a href="https://github.com/sharad99kr" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>azure</category>
      <category>csharp</category>
    </item>
  </channel>
</rss>
