<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TheProdSDE</title>
    <description>The latest articles on DEV Community by TheProdSDE (@theprodsde).</description>
    <link>https://dev.to/theprodsde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png</url>
      <title>DEV Community: TheProdSDE</title>
      <link>https://dev.to/theprodsde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/theprodsde"/>
    <language>en</language>
    <item>
      <title>Azure Key Vault vs AWS Secrets Manager: Cutting Through the Cloud Marketing</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Tue, 16 Jun 2026 10:56:10 +0000</pubDate>
      <link>https://dev.to/theprodsde/azure-key-vault-vs-aws-secrets-manager-cutting-through-the-cloud-marketing-5dmh</link>
      <guid>https://dev.to/theprodsde/azure-key-vault-vs-aws-secrets-manager-cutting-through-the-cloud-marketing-5dmh</guid>
      <description>&lt;h2&gt;
  
  
  "A hands-on breakdown of Azure Key Vault, AWS Secrets Manager, IRSA, Azure Managed Identity, and when HashiCorp Vault actually matters."
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This is Part 3 of our 4-part series on secret management for cloud-native teams. &lt;a href="https://dev.tolink-to-part-1"&gt;Start with Part 1 if you haven't&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most cloud teams face this decision: should we use Azure Key Vault, AWS Secrets Manager, Google Secret Manager, or HashiCorp Vault?&lt;/p&gt;

&lt;p&gt;The marketing departments of Azure and AWS have strong opinions. The internet has conflicting opinions.&lt;/p&gt;

&lt;p&gt;We're going to give you the &lt;em&gt;technical&lt;/em&gt; answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scoreboard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Azure Key Vault&lt;/th&gt;
&lt;th&gt;AWS Secrets Manager&lt;/th&gt;
&lt;th&gt;Google SM&lt;/th&gt;
&lt;th&gt;Vault&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Managed Identity Integration&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐ (IRSA)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto DB Credential Rotation&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic Credentials&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PKI / TLS Management&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;⭐&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Cloud Support&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;N/A (AWS-only)&lt;/td&gt;
&lt;td&gt;N/A (GCP-only)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price per Secret&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Self-hosted / Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to Use Each
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Azure Key Vault
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use this if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90% of your workloads live in Azure (App Service, Functions, AKS)&lt;/li&gt;
&lt;li&gt;Your team is Azure-first and has Entra ID mastery&lt;/li&gt;
&lt;li&gt;You need managed identity integration out-of-the-box&lt;/li&gt;
&lt;li&gt;Compliance requirement: "secrets must be in the same cloud"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Kubernetes Pod with Azure Workload Identity&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp:v1&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_VAULT_URL&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://myvault.vault.azure.net/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use this if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90% of your workloads live in AWS (ECS, Lambda, EKS)&lt;/li&gt;
&lt;li&gt;You need automatic RDS/Redshift credential rotation (strongest feature)&lt;/li&gt;
&lt;li&gt;Your database already has IAM auth enabled&lt;/li&gt;
&lt;li&gt;You want per-secret CloudTrail audit events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;YAML Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Kubernetes with IRSA (IAM Roles for Service Accounts)&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::ACCOUNT:role/app-role&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp:v1&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_ROLE_ARN&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::ACCOUNT:role/app-role&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/eks.amazonaws.com/serviceaccount/token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Google Secret Manager
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use this if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're GCP-native (Cloud Run, GKE, Cloud Functions)&lt;/li&gt;
&lt;li&gt;You need deep integration with GCP Workload Identity&lt;/li&gt;
&lt;li&gt;You want automatic secret versioning and access logs&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  HashiCorp Vault
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use this if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your infrastructure is multi-cloud (AWS + Azure + On-Prem)&lt;/li&gt;
&lt;li&gt;You need dynamic credentials (short-lived, auto-generated)&lt;/li&gt;
&lt;li&gt;You need PKI/TLS certificate management&lt;/li&gt;
&lt;li&gt;You manage databases, SSH keys, or APIs that require rotation&lt;/li&gt;
&lt;li&gt;Compliance requirement: centralized secret audit trail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Production Pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ExternalSecret syncs from Vault to Kubernetes&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;refreshInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1h&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault-backend&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secret&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db_password&lt;/span&gt;
      &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;secret/data/prod/app/db&lt;/span&gt;
        &lt;span class="na"&gt;property&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vault rotates the secret in the backend → ESO picks it up on the next refresh → App reads the updated file.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Truth
&lt;/h2&gt;

&lt;p&gt;Your choice isn't "Key Vault vs Secrets Manager."&lt;/p&gt;

&lt;p&gt;Your choice is: &lt;strong&gt;Do I need a secrets storage solution, or a secrets management solution?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; = Key Vault, Secrets Manager (static secrets only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management&lt;/strong&gt; = Vault (rotation, dynamic creds, PKI, audit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams end up with &lt;strong&gt;both&lt;/strong&gt;: Cloud platform for workload identity credentials (zero storage, pure auth), and Vault for everything else.&lt;/p&gt;




&lt;h2&gt;
  
  
  Read the Full Technical Deep Dive
&lt;/h2&gt;

&lt;p&gt;We broke down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Managed Identity internals (how IRSA actually works)&lt;/li&gt;
&lt;li&gt;✅ Database credential rotation patterns&lt;/li&gt;
&lt;li&gt;✅ Cost analysis (which is cheaper at scale?)&lt;/li&gt;
&lt;li&gt;✅ Security architecture (etcd encryption, audit trails)&lt;/li&gt;
&lt;li&gt;✅ When to choose which tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://medium.com/towards-artificial-intelligence/azure-key-vault-vs-aws-secrets-manager-managed-identity-irsa-and-cloud-native-secret-management-725b4762a1a4" rel="noopener noreferrer"&gt;Read Part 3: Azure Key Vault vs AWS Secrets Manager (Full Technical Breakdown)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Series
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.tolink-to-part-1"&gt;Part 1: Your Secrets Are Probably Leaking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.tolink-to-part-2"&gt;Part 2: HashiCorp Vault Deep Dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 3: Azure Key Vault vs AWS Secrets Manager (you are here)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Part 4: Kubernetes Secrets &amp;amp; Production Patterns (coming soon)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>I Built a Finance App at a Hackathon, Abandoned It for Months, Then GitHub Copilot Helped Me Finally Ship It</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Sat, 06 Jun 2026 19:01:35 +0000</pubDate>
      <link>https://dev.to/theprodsde/financialedapp-analyse-your-expenses-55fc</link>
      <guid>https://dev.to/theprodsde/financialedapp-analyse-your-expenses-55fc</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Where did my money go?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every expense tracker ever built answers that question. I wanted to build something that answered a harder one:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Why did my money go there — and what should I do about it?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/karangehlod/FinancialEdApp" rel="noopener noreferrer"&gt;FinancialEdApp&lt;/a&gt;&lt;/strong&gt; is an AI-powered personal finance platform that combines expense tracking, budget planning, loan/EMI analysis, and a conversational AI assistant — all working together on your &lt;em&gt;actual&lt;/em&gt; financial data.&lt;/p&gt;

&lt;p&gt;It didn't start that way though. It started as a half-working prototype I built under pressure, shipped zero of, and left to collect dust in a private folder  for months. This challenge gave me the push to finally finish it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Comeback Story
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where It Was: An Honest Before
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep51he3cxgbtrudx0tsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep51he3cxgbtrudx0tsz.png" alt="Dashboard which was not working " width="799" height="413"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7u7h7l215ldy4ps5ge7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7u7h7l215ldy4ps5ge7.png" alt="Chat UI too bad" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project started at Economic hackathon i saw somewhere. I had a vision for a &lt;em&gt;smart&lt;/em&gt; finance app — not just a tracker, but something that would actually &lt;em&gt;explain&lt;/em&gt; your financial behavior.&lt;/p&gt;

&lt;p&gt;What I shipped at the deadline was considerably humbler:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;State at Abandonment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Expense tracking UI&lt;/td&gt;
&lt;td&gt;✅ Worked, barely styled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget planning&lt;/td&gt;
&lt;td&gt;⚠️ UI existed, no logic behind it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Financial Assistant&lt;/td&gt;
&lt;td&gt;❌ Completely mocked — returned hardcoded strings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loan / EMI Analysis&lt;/td&gt;
&lt;td&gt;❌ Static inputs, zero calculation logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data persistence&lt;/td&gt;
&lt;td&gt;❌ Everything reset on refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;❌ None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD&lt;/td&gt;
&lt;td&gt;❌ "Deploy" meant running &lt;code&gt;npm run build&lt;/code&gt; on my laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud infrastructure&lt;/td&gt;
&lt;td&gt;❌ Never left localhost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI assistant — the &lt;em&gt;entire point of the app&lt;/em&gt; — was a facade. When a user asked "Why did my savings drop?", it returned the same canned response regardless of their data. I knew it. I just ran out of time.&lt;/p&gt;

&lt;p&gt;Then life happened. The code sat untouched for 7months.&lt;/p&gt;


&lt;h3&gt;
  
  
  The Turning Point
&lt;/h3&gt;

&lt;p&gt;When I saw this challenge, I opened the repo expecting to feel overwhelmed. That was hard to find the repo and code in machine sitting in some mycode folder.&lt;/p&gt;

&lt;p&gt;I made a list of what "actually finished" would mean:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Real AI assistant connected to the user's own financial data&lt;/li&gt;
&lt;li&gt;Working EMI calculator with full amortization logic&lt;/li&gt;
&lt;li&gt;Budget tracking that actually saves and persists state&lt;/li&gt;
&lt;li&gt;A CI/CD pipeline so deployments aren't a prayer&lt;/li&gt;
&lt;li&gt;Production deployment on Azure Kubernetes Service (AKS)
That's what I set out to finish. Here's how it went.&lt;/li&gt;
&lt;/ol&gt;


&lt;h3&gt;
  
  
  What Changed: The Real After
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The AI Assistant — from fake to functional&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest gap. The original "AI" was &lt;code&gt;if (query.includes("savings")) return hardcodedString&lt;/code&gt;. Embarrassing in retrospect.&lt;/p&gt;

&lt;p&gt;The rewrite connected the assistant to the user's actual expense history, budget targets, and loan data. Now when you ask &lt;em&gt;"Why did my savings decrease last month?"&lt;/em&gt; — the model gets your real data as context and responds with specific observations: &lt;em&gt;"Your dining expenses increased 34% in March and exceeded your ₹4,000 budget by ₹1,200."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loan &amp;amp; EMI Calculator — from stub to substance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The EMI form previously accepted inputs and did absolutely nothing with them. I built out the full amortization engine: monthly breakdowns, total interest calculations, and an early repayment simulator so users can see the actual rupee impact of prepaying a loan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget Tracking — actually persistent now&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Budgets used to vanish on refresh. The state management overhaul means budgets, categories, and goals persist properly. The budget vs. actual comparison view now shows live overrun indicators instead of static placeholder data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure — from "it works on my machine" to AKS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Containerized with Docker, automated with GitHub Actions, deployed to Azure Kubernetes Service with Azure Container Apps handling scaling. The CI/CD pipeline runs on every push to &lt;code&gt;main&lt;/code&gt; — lint, build, test, deploy. No more laptop deploys.&lt;/p&gt;


&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch7b5xs104t1lim7j6zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch7b5xs104t1lim7j6zf.png" alt="Dashboard part 1" width="799" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tvugchq6yqete5q2imb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tvugchq6yqete5q2imb.png" alt="Dashboard part 2" width="800" height="412"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxk2epk2p4o7zyiejwdq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxk2epk2p4o7zyiejwdq1.png" alt="Expense" width="799" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje974qyxxcimcm0isdlx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje974qyxxcimcm0isdlx.png" alt="Budget" width="800" height="411"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzvnw86qvqniqrkfdg8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzvnw86qvqniqrkfdg8s.png" alt="Loans" width="800" height="410"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgei56jjlovo9kaqc9zpg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgei56jjlovo9kaqc9zpg.png" alt="EMI calculations" width="799" height="406"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmv74w5mccxnem15l29c1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmv74w5mccxnem15l29c1.png" alt="AI chat on Data" width="799" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtoh0vfz7q8y4xf88nf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtoh0vfz7q8y4xf88nf9.png" alt="User Profile" width="799" height="409"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  My Experience with GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;Copilot wasn't just a code autocomplete tool on this project. It was the reason I didn't abandon the repo &lt;em&gt;again&lt;/em&gt; the moment I reopened it.&lt;/p&gt;
&lt;h3&gt;
  
  
  Breaking Through the Paralysis
&lt;/h3&gt;

&lt;p&gt;The hardest part of returning to an old project is re-orientation. What does this file do? Why is this state structured this way? I found myself using Copilot Chat as a &lt;strong&gt;code archaeologist&lt;/strong&gt; — pasting in old functions and asking &lt;em&gt;"What was this trying to do?"&lt;/em&gt; before I could safely extend it.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Specific Moments That Mattered
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. The EMI Amortization Engine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote one comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Calculate monthly amortization schedule for a loan&lt;/span&gt;
&lt;span class="c1"&gt;// given principal, annual interest rate, and tenure in months&lt;/span&gt;
&lt;span class="c1"&gt;// Return array of { month, principal, interest, balance }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot generated a complete, correct implementation. I verified the math against known loan calculators, found one edge case in the final-month rounding, asked Copilot to fix it — and it did. What would have taken me 90 minutes of formula-checking took 15.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. CI/CD Pipeline from Scratch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I had never set up an AKS deployment pipeline before. I described what I wanted in a comment block in the YAML file and Copilot drafted the full GitHub Actions workflow: Docker build → push to Azure Container Registry → rolling deploy to AKS. It wasn't perfect on the first pass — the registry auth step needed fixing — but having a near-complete scaffold to edit was dramatically faster than starting from a blank file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. AI Context Injection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The trickiest engineering problem was safely injecting financial context into the AI prompts without exceeding token limits or leaking sensitive data. Copilot suggested a summarization pattern I hadn't considered — pre-processing transactions into statistical summaries (category totals, percentage changes, budget deviation) rather than passing raw transaction arrays. This made the prompts leaner and the AI responses noticeably more useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The Bug I'd Been Staring At&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There was a budget comparison calculation that returned correct totals for some months and wrong ones for others. I'd been looking at it for an hour. I opened Copilot Chat, described the symptom, and it spotted the issue in under a minute: a timezone offset was causing transactions near midnight to be bucketed into the wrong month. I would not have found that quickly on my own.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Broader Lesson
&lt;/h3&gt;

&lt;p&gt;I used AI tools across the entire development lifecycle on this project — not just for writing code. Planning, architecture review, debugging, documentation, infrastructure, and CI/CD all benefited.&lt;/p&gt;

&lt;p&gt;The most valuable insight: &lt;strong&gt;AI is most useful when you're specific.&lt;/strong&gt; Vague prompts get vague code. When I described the exact shape of data I had, the exact output I needed, and the exact constraint I was working within — Copilot delivered something I could actually use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React, Next.js, TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;OpenAI API, GitHub Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Docker, Azure Kubernetes Service, Azure Container Apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevOps&lt;/td&gt;
&lt;td&gt;GitHub Actions, CI/CD pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This project is now a real foundation, not a prototype. Planned next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bank account integrations via open banking APIs&lt;/li&gt;
&lt;li&gt;Automatic transaction import and categorization&lt;/li&gt;
&lt;li&gt;Financial forecasting with ML-based trend detection&lt;/li&gt;
&lt;li&gt;Family / shared budgeting mode&lt;/li&gt;
&lt;li&gt;Personalized financial literacy modules based on spending patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The app I demo today is not the app that existed when I reopened this repo. The core idea was always solid — the execution just needed time, focus, and honestly, a development partner that never got tired of my questions at 1am.&lt;/p&gt;

&lt;p&gt;That's what GitHub Copilot was on this project. Not magic. Just a remarkably patient collaborator.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/karangehlod/FinancialEdApp" rel="noopener noreferrer"&gt;https://github.com/karangehlod/FinancialEdApp&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your JWT Is Lying to You — The Authorization Problem Nobody Solves Correctly</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Tue, 02 Jun 2026 16:53:16 +0000</pubDate>
      <link>https://dev.to/theprodsde/your-jwt-is-lying-to-you-the-authorization-problem-nobody-solves-correctly-10a9</link>
      <guid>https://dev.to/theprodsde/your-jwt-is-lying-to-you-the-authorization-problem-nobody-solves-correctly-10a9</guid>
      <description>&lt;p&gt;A valid token proves who you are. It says almost nothing about what you're actually allowed to do.&lt;br&gt;
That gap is where most authorization architectures silently collapse — and where botnets quietly walk in.&lt;br&gt;
I published a deep-dive on Medium covering the full picture of what happens after token validation — the layer most applications get dangerously wrong.&lt;br&gt;
What's covered:&lt;br&gt;
Why JWTs aren't enough&lt;br&gt;
JWTs encode static claims at issuance time. Authorization decisions are almost never static. Resource state, approval workflows, time-based rules, and revocation events all live outside the token.&lt;br&gt;
The four authorization models&lt;br&gt;
RBAC, ABAC, ReBAC, and Policy-as-Code — when each applies, and when you need to combine them.&lt;br&gt;
Open Policy Agent (OPA) — in depth&lt;br&gt;
Full Rego policy walkthrough, real HTTP request/response, unit test examples, and Kubernetes admission control via Gatekeeper.&lt;br&gt;
The policy engine landscape&lt;br&gt;
Honest tradeoffs between OPA, Cedar (AWS), Cerbos, Casbin, and SpiceDB — including who each one is actually best for.&lt;br&gt;
Threat-aware authorization&lt;br&gt;
How botnets pass authentication and exploit weak authorization, how to wire IP reputation, velocity, and device signals into your OPA policy, and why BOLA has been OWASP's #1 API security risk since 2019.&lt;/p&gt;

&lt;p&gt;👉 Read the full article on Medium: &lt;a href="https://medium.com/towards-artificial-intelligence/every-developer-gets-auth-wrong-until-they-understand-this-0e34b66ed0d3" rel="noopener noreferrer"&gt;AuthZ&lt;/a&gt;&lt;br&gt;
(Part 1 — covering OAuth 2.0, OIDC, SAML, JWT internals, Okta vs Keycloak — is linked at the top of the article.)&lt;/p&gt;

</description>
      <category>security</category>
      <category>webdev</category>
      <category>architecture</category>
      <category>devops</category>
    </item>
    <item>
      <title>Vibe-Coding Works. That's Exactly Why It Will Destroy Your Codebase at Scale.</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Mon, 11 May 2026 07:43:47 +0000</pubDate>
      <link>https://dev.to/theprodsde/vibe-coding-works-thats-exactly-why-it-will-destroy-your-codebase-at-scale-1e56</link>
      <guid>https://dev.to/theprodsde/vibe-coding-works-thats-exactly-why-it-will-destroy-your-codebase-at-scale-1e56</guid>
      <description>&lt;p&gt;&lt;strong&gt;The productivity gains are real. The compound debt is realer. And almost no team is measuring the right thing.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Let me say the quiet part out loud: &lt;strong&gt;vibe-coding is not the villain in this story.&lt;/strong&gt; The critics who call it "reckless" are wrong. The evangelists who call it "the future of engineering" are also wrong. And if your team is using it at scale without a framework, you are quietly building a time bomb inside your own codebase.&lt;/p&gt;

&lt;p&gt;I've watched this pattern play out across multiple production systems. The first 90 days are miraculous. Features ship in hours. Backlogs shrink. Stakeholders are ecstatic. Then, somewhere around the 4-month mark, something subtle changes. PRs get harder to review. Bugs surface in places nobody thought to check. Onboarding new engineers takes longer. And the team can't quite explain why.&lt;/p&gt;

&lt;p&gt;Nobody tracks the inflection point. That's the whole problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The AI didn't make your codebase worse. It made your bad habits move at the speed of light."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The consensus is measuring the wrong metric
&lt;/h2&gt;

&lt;p&gt;When teams evaluate vibe-coding at scale, they measure &lt;strong&gt;output velocity&lt;/strong&gt; — lines shipped, tickets closed, deployment frequency. These numbers go up. Sometimes dramatically. So the conclusion seems obvious: AI-assisted coding works.&lt;/p&gt;

&lt;p&gt;But output velocity is a leading indicator. The lagging indicators — the ones that actually matter for a codebase that needs to survive past your next sprint cycle — are mostly invisible until they aren't. Coupling density. Contextual coherence across modules. The ratio of code that can be safely changed by an engineer who didn't write it. These don't show up in your Jira dashboard.&lt;/p&gt;

&lt;p&gt;Vibe-coding is extraordinarily good at generating &lt;strong&gt;locally correct&lt;/strong&gt; code. A function that does what its prompt described. A class that passes its tests. An endpoint that returns the right shape. What it's structurally blind to — without deliberate scaffolding — is &lt;strong&gt;global coherence&lt;/strong&gt;. And global coherence is exactly what decides whether your codebase scales or suffocates.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 4 hidden costs nobody's budgeting for
&lt;/h2&gt;

&lt;p&gt;These aren't theoretical. They're what I see in every team that's been vibe-coding at scale for more than a quarter without a deliberate system around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 01 — The coherence gap
&lt;/h3&gt;

&lt;p&gt;AI generates code that fits the prompt, not the system. Over hundreds of PRs, modules develop subtly incompatible assumptions. Nobody catches it because each PR looks fine in isolation. The architecture degrades not through big, obvious mistakes but through the slow accumulation of locally correct, globally incompatible decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 02 — The ownership void
&lt;/h3&gt;

&lt;p&gt;Engineers stop building mental models of their code. When a bug surfaces in AI-written logic, the team debugs blind. Nobody truly "owns" what they didn't think through. This isn't a character flaw — it's a structural consequence of a workflow that optimizes generation over comprehension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 03 — The test confidence trap
&lt;/h3&gt;

&lt;p&gt;AI writes tests optimistically — testing what the code &lt;em&gt;does&lt;/em&gt;, not what it &lt;em&gt;should&lt;/em&gt; do. Green CI becomes a false floor. Production failures start happening in the gap between "tests pass" and "system works." The more AI-generated your test suite, the more confidently wrong your confidence intervals are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 04 — The context window debt
&lt;/h3&gt;

&lt;p&gt;Every future AI-assisted change now requires feeding the model more context to compensate for the incoherence introduced by previous AI-assisted changes. Debt compounds automatically. The very tool you're using to pay down technical debt is silently issuing new debt instruments faster than you can retire the old ones — unless you architect for it deliberately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Here's what the critics get wrong
&lt;/h2&gt;

&lt;p&gt;The anti-vibe-coding camp makes a seductive argument: the AI doesn't understand your system, therefore it cannot write code that belongs in your system. This sounds rigorous. It is also completely beside the point.&lt;/p&gt;

&lt;p&gt;No tool understands your system. Junior engineers don't understand your system on day one. Contractors don't understand your system. Copy-pasted Stack Overflow answers definitely don't understand your system. We have always used code-generation mechanisms that produce locally correct, globally risky output — and we've always managed that risk through process: code review, architecture docs, team conventions, onboarding.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The failure mode isn't that AI writes bad code. The failure mode is that teams stop doing the process work that made previous bad code safe to ship."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When vibe-coding goes wrong at scale, the AI isn't the reason. The reason is that the team scaled their output without scaling their quality infrastructure. They optimized the generation step without investing in the integration step. And because the AI made generation so effortless, the integration step started to feel like bureaucratic overhead — until it wasn't optional anymore.&lt;/p&gt;




&lt;h2&gt;
  
  
  The inflection point nobody is tracking
&lt;/h2&gt;

&lt;p&gt;In my experience, there's a reliable pattern: teams hit an invisible inflection point around the 90-day mark. Before it, vibe-coding feels like a superpower. After it, the team is spending an increasing proportion of its time managing the artifacts of its own productivity.&lt;/p&gt;

&lt;p&gt;The marker isn't a metric. It's a feeling. Engineers start saying things like &lt;em&gt;"I'm not sure how this connects to the rest of the system"&lt;/em&gt; in PR reviews. Architectural discussions get longer and more circular. Postmortems trace bugs back not to a bad line of code but to a wrong assumption that was encoded in 30 files simultaneously.&lt;/p&gt;

&lt;p&gt;At that point, you haven't hit a vibe-coding problem. You've hit a systems thinking deficit — and the AI accelerated your arrival at a destination you were already heading toward.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually works at scale (and why it's not what you think)
&lt;/h2&gt;

&lt;p&gt;The answer is not to vibe-code less. The answer is to build the infrastructure that makes vibe-coding safe to do at velocity. Here is the framework I've seen work in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Architect before you prompt.&lt;/strong&gt;&lt;br&gt;
Every AI-generated module should begin with a human-written architectural decision record — not a full spec, just a paragraph: what this does, what it doesn't do, and what global assumptions it's allowed to make. This becomes the ground truth the AI generates against. Without it, the AI invents assumptions and you pay for them later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Review for coherence, not correctness.&lt;/strong&gt;&lt;br&gt;
Most teams review AI-generated PRs the same way they review human-written PRs: does this code do what it claims? The more important question at scale is: does this code &lt;em&gt;belong&lt;/em&gt; in this system the way we want this system to evolve? That's a different review entirely, and almost no team has built the habit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rotate ownership explicitly.&lt;/strong&gt;&lt;br&gt;
If an engineer didn't write a module by hand, assign them a scheduled "understanding session" — read it, trace the dependency graph, add comments that explain the why. This is not about distrust. It's about preventing the ownership void that turns a future bug into a week-long archaeology expedition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Test the boundary, not the happy path.&lt;/strong&gt;&lt;br&gt;
Mandate that every AI-generated test suite gets reviewed for what it &lt;em&gt;doesn't&lt;/em&gt; test. AI writes optimistically. Your job is to be a pessimist: what happens when the upstream service returns a 200 with an empty body? What happens at the 2GB payload? AI will not volunteer these scenarios. Someone has to own the adversarial imagination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Track the lagging indicators.&lt;/strong&gt;&lt;br&gt;
Instrument what matters: time-to-understand for new engineers on existing modules, mean time to debug production incidents that touch AI-generated code, coupling metrics across AI-generated vs. human-written boundaries. You cannot manage what you don't measure — and right now, most teams are only measuring velocity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real contrarian take
&lt;/h2&gt;

&lt;p&gt;Here it is: the engineers who will thrive in the next five years are not the ones who use AI the most aggressively. They are the ones who understand &lt;strong&gt;when AI-generated code becomes a liability and how to defer that moment as long as possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Senior engineers have always done this. We've always taken code that worked locally and asked: does this belong here? Does this fit our operational model? Can someone wake up at 3am and understand this? Vibe-coding doesn't change those questions. It makes them more urgent and more frequent.&lt;/p&gt;

&lt;p&gt;The teams that treat vibe-coding as a pure velocity tool will discover its costs on a 6-month delay with compounded interest. The teams that treat it as a powerful new generation mechanism that requires a proportionally upgraded integration discipline will build the fastest, most coherent codebases the industry has ever seen.&lt;/p&gt;




&lt;p&gt;The productivity gains are real. They are also a trap — if you only measure the gains and not the price. Vibe-coding works. That's exactly why it deserves to be taken seriously enough to build a real system around it.&lt;/p&gt;

&lt;p&gt;The engineers who will get this right aren't the loudest advocates or the loudest skeptics. They're the ones quietly asking: &lt;em&gt;what does this code cost us in six months?&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this changed how you think about AI at scale — share it with your engineering team. The conversation your team hasn't had yet is the one this article is trying to start.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; AI Engineering, Software Architecture, Vibe Coding, Technical Debt, Engineering Leadership, LLM Tools, Production Systems, Code Quality, Senior Engineers, Developer Productivity&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>MCP for Backend Engineers: When to Use It (and When to Skip It)</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:12:51 +0000</pubDate>
      <link>https://dev.to/theprodsde/mcp-for-backend-engineers-when-to-use-it-and-when-to-skip-it-l3g</link>
      <guid>https://dev.to/theprodsde/mcp-for-backend-engineers-when-to-use-it-and-when-to-skip-it-l3g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"If you only have one consumer, MCP is a liability."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Subtitle:&lt;/strong&gt; Real code, real cost, and an honest answer to the question every AI team gets wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1 app · 1 team · 1 toolset → use plain tool calling&lt;/li&gt;
&lt;li&gt;Multiple teams or AI surfaces → use MCP&lt;/li&gt;
&lt;li&gt;Existing APIs → wrap them in MCP&lt;/li&gt;
&lt;li&gt;Early-stage / MVP → skip MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MCP is a scaling decision, not a starting point.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Exists
&lt;/h2&gt;

&lt;p&gt;A client had a simple ask: &lt;em&gt;"Let our AI assistant use our REST API."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three teams built three integrations. All three were wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Model called /getCustomer when it needed /getOrders
- Auth header dropped silently on retry
- Three teams hardcoded three different endpoint assumptions

Result: 3 integrations. 3 slightly different bugs. 1 very awkward client call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's when MCP became the answer. One server wrapping the API. Any AI host connects to it. The client got exactly what they asked for — and we got a pattern reused three times since.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MCP is not about smarter models — it's about cleaner boundaries.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  MCP in One Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP = API Gateway + Plugin System + Contract Layer for LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without MCP&lt;/th&gt;
&lt;th&gt;With MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate integrations&lt;/td&gt;
&lt;td&gt;Single contract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth bugs everywhere&lt;/td&gt;
&lt;td&gt;Centralized auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardcoded tools&lt;/td&gt;
&lt;td&gt;Dynamic discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fragile wrappers&lt;/td&gt;
&lt;td&gt;Reusable servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rebuild per host&lt;/td&gt;
&lt;td&gt;Plug into existing system&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Hero Diagram: MCP vs Tool Calling
&lt;/h2&gt;

&lt;p&gt;This is the architecture difference that actually matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    subgraph TC["❌ Tool Calling — Tightly Coupled"]
        direction TB
        TC_H["AI Host"]
        TC_T1["Tool: get_customer()"]
        TC_T2["Tool: list_orders()"]
        TC_T3["Tool: check_inventory()"]
        TC_H --&amp;gt; TC_T1
        TC_H --&amp;gt; TC_T2
        TC_H --&amp;gt; TC_T3
    end

    subgraph MCP["✅ MCP — Loosely Coupled via Contract"]
        direction TB
        MH1["Host: Web Chat"]
        MH2["Host: Slack Bot"]
        MH3["Host: VS Code"]

        MS1["customer-mcp-server\nOwner: CRM Team"]
        MS2["payments-mcp-server\nOwner: Payments Team"]
        MS3["infra-mcp-server\nOwner: Platform Team"]

        MH1 --&amp;gt; MS1
        MH1 --&amp;gt; MS2
        MH2 --&amp;gt; MS1
        MH2 --&amp;gt; MS3
        MH3 --&amp;gt; MS2
        MH3 --&amp;gt; MS3
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The shift:&lt;/strong&gt; Tool calling is a local decision. MCP is an organizational contract.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What MCP Actually Is
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an open protocol that standardises how AI applications connect to external tools and data sources. Think of it as USB-C for AI tools — one connector, many servers, any host.&lt;/p&gt;

&lt;p&gt;Instead of every IDE, chat UI, or agent framework inventing its own plugin format, MCP defines a common contract using &lt;strong&gt;JSON-RPC 2.0&lt;/strong&gt; over Streamable HTTP or stdio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three roles in the spec:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host&lt;/strong&gt; — the application the user sees (IDE, CLI, chat UI, agent framework)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; — a connector inside the host that speaks the MCP protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt; — a service that exposes tools, resources, and prompts to the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The client sits between host and server. It connects to one server at a time and speaks the protocol; the host can orchestrate multiple clients simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    subgraph HOST["AI Host (Orchestrates Everything)"]
        direction TB
        H["Orchestrator Logic + LLM Calls"]
        C1["MCP Client 1"]
        C2["MCP Client 2"]
        H --&amp;gt; C1
        H --&amp;gt; C2
    end

    subgraph SERVERS["MCP Servers (Owned per Team)"]
        S1["customer-mcp-server"]
        S2["infra-mcp-server"]
    end

    C1 --&amp;gt; S1
    C2 --&amp;gt; S2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt; The model never talks to your backend directly. It only talks to tools exposed by MCP servers. That's the boundary.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Case Study: Wrapping a Client REST API
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Three separate UIs (web, Slack bot, IDE) called the REST API directly. Each integrator made slightly different assumptions about endpoints and auth — producing three brittle integrations and repeated bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; One MCP server wrapping the existing API. Same tool list and prompts exposed to every host.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results in production:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration time reduced by ~60% for new hosts&lt;/li&gt;
&lt;li&gt;Tool-call auth failures dropped from multiple incidents to zero in the first month (root cause: centralized token handling)&lt;/li&gt;
&lt;li&gt;Two new hosts onboarded with zero backend changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the practical payoff: a small upfront cost for long-term reduction in duplicated integration work and clearer ownership.&lt;/p&gt;




&lt;h2&gt;
  
  
  Should You Use MCP? — Decide Before Reading Further
&lt;/h2&gt;

&lt;p&gt;Most tutorials make you read 80% before answering this. Here it is upfront.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you pick MCP too early, you're trading ~1–2 extra days of setup for zero immediate benefit. Start with the simplest thing. Reach for MCP when the "three wrappers" problem is already real.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Three questions that determine everything:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How many consumers will call these tools?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One app, one team → plain tool calling, done&lt;/li&gt;
&lt;li&gt;Multiple apps or teams → MCP starts making sense&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Who owns the tools vs who owns the AI host?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same team owns both → plain tool calling, tight coupling is fine&lt;/li&gt;
&lt;li&gt;Different teams → MCP gives you the clean boundary&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Do these tools already exist as APIs?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → wrap them as an MCP server, zero changes to existing backend&lt;/li&gt;
&lt;li&gt;No → build them either way, MCP adds structure from day one&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Five Real Scenarios
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1 — Internal chatbot for one team&lt;/strong&gt;&lt;br&gt;
A team wants an AI that queries their own database and sends Slack alerts. One app, one team, they control everything.&lt;br&gt;
→ &lt;strong&gt;Plain tool calling.&lt;/strong&gt; MCP adds overhead with zero benefit here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2 — Your client's REST API (the exact story above)&lt;/strong&gt;&lt;br&gt;
Client has an existing API. Wants multiple AI surfaces to consume it — web app today, Slack bot next month, VS Code extension later.&lt;br&gt;
→ &lt;strong&gt;MCP.&lt;/strong&gt; Wrap the API once, every host connects via the same protocol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3 — Enterprise platform, multiple teams&lt;/strong&gt;&lt;br&gt;
Payments team, infra team, CRM team — each owns their domain. One central AI console orchestrates across all three.&lt;br&gt;
→ &lt;strong&gt;MCP per domain.&lt;/strong&gt; Each team ships and secures their own server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 4 — Quick prototype / MVP&lt;/strong&gt;&lt;br&gt;
You're validating an idea over a weekend. Speed matters more than architecture.&lt;br&gt;
→ &lt;strong&gt;Plain tool calling always.&lt;/strong&gt; Get the idea working first. MCP is a refactor you do when it proves valuable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 5 — Agentic workflow with long-running tasks&lt;/strong&gt;&lt;br&gt;
Agent coordinates across multiple systems, tasks need to be traceable and retryable across sessions.&lt;br&gt;
→ &lt;strong&gt;MCP.&lt;/strong&gt; The clean server/client boundary makes observability and retry logic significantly easier to implement.&lt;/p&gt;
&lt;h3&gt;
  
  
  Decision Flow
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    Start([Start])
    Start --&amp;gt; Prototype{Prototype / MVP?}
    Prototype -- Yes --&amp;gt; Plain[Plain tool calling]
    Prototype -- No --&amp;gt; Multi{Multiple hosts / teams?}
    Multi -- No --&amp;gt; Plain
    Multi -- Yes --&amp;gt; Ownership{Same team owns tools &amp;amp; host?}
    Ownership -- Yes --&amp;gt; Plain
    Ownership -- No --&amp;gt; APICheck{APIs already exist?}
    APICheck -- Yes --&amp;gt; MCP[Use MCP]
    APICheck -- No --&amp;gt; Consider[Build either way — MCP adds structure]
    classDef recommend fill:#f9f,stroke:#333,stroke-width:1px;
    class MCP recommend;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt; MCP is triggered by scale of ownership and consumption — not complexity.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Where MCP Is a Bad Idea
&lt;/h2&gt;

&lt;p&gt;MCP is the wrong choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're at the early-stage or MVP phase&lt;/li&gt;
&lt;li&gt;Only one host consumes the tools&lt;/li&gt;
&lt;li&gt;One team owns the full stack&lt;/li&gt;
&lt;li&gt;Latency is business-critical&lt;/li&gt;
&lt;li&gt;Tool shapes are still evolving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only have one consumer, MCP is architecture cosplay.&lt;/p&gt;


&lt;h2&gt;
  
  
  Core MCP Concepts
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Server-side primitives
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Tools&lt;/strong&gt;&lt;br&gt;
Executable actions — often with side effects. The LLM decides when to call them.&lt;br&gt;
Examples: &lt;code&gt;create_invoice&lt;/code&gt;, &lt;code&gt;deploy_service&lt;/code&gt;, &lt;code&gt;run_sql_query&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Resources&lt;/strong&gt;&lt;br&gt;
Read-only data and context the model is allowed to see — not actions.&lt;br&gt;
Examples: docs, config files, logs, user profile JSON, query results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Prompts&lt;/strong&gt;&lt;br&gt;
Reusable workflow templates — Postman collections for LLM behaviour. Server-defined recipes the host can invoke.&lt;br&gt;
Examples: &lt;code&gt;bug_report_prompt&lt;/code&gt;, &lt;code&gt;deployment_checklist_prompt&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Client-side capabilities
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;4. Sampling&lt;/strong&gt;&lt;br&gt;
The server asks the client: "Run a model completion for me using your model access." Useful when servers don't hold model keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Roots&lt;/strong&gt;&lt;br&gt;
The server asks: "What file paths or URIs can I operate on?" Organises where tools can act.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Elicitation&lt;/strong&gt;&lt;br&gt;
The server asks the client to collect more information from the user.&lt;br&gt;
Example: "Ask the user which environment to deploy to: dev/staging/prod."&lt;/p&gt;


&lt;h2&gt;
  
  
  Request / Response Lifecycle
&lt;/h2&gt;

&lt;p&gt;The most important thing to understand: &lt;strong&gt;the Host orchestrates LLM calls — not the MCP Client&lt;/strong&gt;. The client only handles server communication. This distinction is what most sequence diagrams get wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant U as User
    participant H as Host / Orchestrator
    participant L as LLM
    participant C as MCP Client
    participant S as MCP Server

    U -&amp;gt;&amp;gt; H : Query
    H -&amp;gt;&amp;gt; C : discover tools
    C -&amp;gt;&amp;gt; S : GET /mcp/tools
    S --&amp;gt;&amp;gt; C : tool schemas
    C --&amp;gt;&amp;gt; H : return schemas
    H -&amp;gt;&amp;gt; L : prompt + tool schemas
    L --&amp;gt;&amp;gt; H : tool call decision
    H -&amp;gt;&amp;gt; C : execute tool call
    C -&amp;gt;&amp;gt; S : call_tool(name, args)
    S --&amp;gt;&amp;gt; C : tool result
    C --&amp;gt;&amp;gt; H : return result
    H -&amp;gt;&amp;gt; L : result + continue prompt
    L --&amp;gt;&amp;gt; H : final answer
    H --&amp;gt;&amp;gt; U : output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt; The model decides &lt;em&gt;what&lt;/em&gt; to call, but the server controls &lt;em&gt;what exists&lt;/em&gt;. That's the contract.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Minimal MCP Server (Python)
&lt;/h2&gt;

&lt;p&gt;Three things to watch in this block:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@mcp.tool()&lt;/code&gt; is an executable action with side effects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@mcp.resource()&lt;/code&gt; is read-only context — the model sees it, doesn't call it&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;/health&lt;/code&gt; endpoint is not optional — Container Apps and AKS need it for readiness probes. Skip it and your pods restart on a loop.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# server.py
# requires: fastmcp&amp;gt;=2.0.0, python-dotenv
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer service MCP server.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── Tool: get_customer ────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_customer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch customer profile by ID.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace with your real DB call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Priya Sharma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priya@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_order_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-03-15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ── Tool: get_orders ─────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch recent orders for a customer.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delivered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ── Tool: create_support_ticket ───────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_support_ticket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a support ticket for a customer issue.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TKT-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ── Resource: customer playbook ───────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-playbook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;customer_playbook&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Internal guidance for handling customer incidents.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
# Customer Handling Playbook

- Always greet the customer by name.
- Premium customers: consider goodwill credit for severe issues.
- Escalate to L3 if incident impacts &amp;gt; 10 customers.
- SLA: 4 hours for premium, 24 hours for standard.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_playbook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal guidance for handling customer incidents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;mimeType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── Prompt: email draft ───────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_draft_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;email_draft_prompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reusable workflow template for customer apology emails.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_draft_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Template for customer apology emails.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior customer success manager.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short apology email to {{customer_name}} about: {{issue}}.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tone: professional, empathetic. Under 100 words.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ── Health check — required for ACA and AKS readiness probes ─────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.custom_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;streamable-http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"fastmcp&amp;gt;=2.0.0"&lt;/span&gt; python-dotenv
python server.py
&lt;span class="c"&gt;# Live at http://localhost:8000/mcp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inspect without writing a single line of client code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @modelcontextprotocol/inspector http://localhost:8000/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  MCP Client + Agent Loop
&lt;/h2&gt;

&lt;p&gt;The key line is &lt;code&gt;client.list_tools()&lt;/code&gt; — tools are discovered &lt;strong&gt;dynamically&lt;/strong&gt; from the server, not hardcoded in the client. Every host that connects to this server gets the same tool list automatically, even as you add new tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# client.py
# requires: fastmcp&amp;gt;=2.0.0, openai&amp;gt;=1.0.0, python-dotenv
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncOpenAI&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;MCP_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP_SERVER_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MCP_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="c1"&gt;# Discover tools from the server — not hardcoded here
&lt;/span&gt;        &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;openai_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[MCP] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tools discovered: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openai_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# No tool calls = final answer
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

            &lt;span class="c1"&gt;# Execute each tool call via the MCP server
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Tool] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;tool_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up customer C-123 and tell me their tier.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the last 3 orders for customer C-123.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C-123 had a late delivery. Open a ticket and draft an apology email.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Query] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;─&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it (server must be up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python client.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Auth — The Part Most Tutorials Skip
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;
Internal / pod-to-pod → skip auth entirely, use network policy.&lt;br&gt;
Public-facing → OAuth 2.1 + PKCE, no exceptions. The spec mandates it.&lt;br&gt;
Everything else is a practical middle ground.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For remote MCP servers (Streamable HTTP), the spec prescribes &lt;strong&gt;OAuth 2.1 Authorization Code + PKCE&lt;/strong&gt;. The MCP client acts as the OAuth client; your server acts as the resource server; an Authorization Server (Keycloak, Auth0, Azure AD) issues the tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Client hits server without a token → &lt;code&gt;401&lt;/code&gt; with &lt;code&gt;WWW-Authenticate&lt;/code&gt; pointing to &lt;code&gt;/.well-known/oauth-protected-resource&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Client discovers the auth server from that metadata&lt;/li&gt;
&lt;li&gt;Client runs OAuth 2.1 Authorization Code + PKCE&lt;/li&gt;
&lt;li&gt;Client retries MCP requests with &lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Server validates the token and serves tools/resources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the token validation middleware — the piece that took longest to get right because the &lt;strong&gt;&lt;code&gt;aud&lt;/code&gt; (audience) claim check&lt;/strong&gt; is easy to miss. Without it, a token issued for a different service will pass validation silently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# auth_middleware.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;authlib.jose&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JsonWebKey&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;

&lt;span class="n"&gt;MCP_RESOURCE_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP_RESOURCE_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-mcp-server/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;AUTH_SERVER_URL&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUTH_SERVER_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_jwks_cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;SKIP_PATHS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/.well-known/oauth-protected-resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_jwks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_jwks_cache&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_jwks_cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_jwks_cache&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;AUTH_SERVER_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/protocol/openid-connect/certs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;_jwks_cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_jwks_cache&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;oauth_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;SKIP_PATHS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WWW-Authenticate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bearer realm=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MCP_RESOURCE_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;jwks&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_jwks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JsonWebKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;import_key_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jwks&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# ── The check most tutorials miss ───────────────────────────
&lt;/span&gt;        &lt;span class="n"&gt;aud&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aud&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;aud&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;aud&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;MCP_RESOURCE_ID&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;aud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wrong_audience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wire it into your server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add to server.py after: mcp = FastMCP(...)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;auth_middleware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;oauth_middleware&lt;/span&gt;
&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;oauth_middleware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Real-World Cost (Numbers, Not Theory)
&lt;/h2&gt;

&lt;p&gt;Expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;+3–100ms&lt;/strong&gt; MCP protocol overhead per call (varies by gateway)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;+280–950ms&lt;/strong&gt; total round-trip latency including tool execution (p50–p95)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;+1.8–4.2 seconds&lt;/strong&gt; for multi-step agentic flows (3-agent chains, p50–p95)&lt;/li&gt;
&lt;li&gt;Extra infra: containers, auth service, monitoring&lt;/li&gt;
&lt;li&gt;Non-trivial engineering overhead during setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your system is latency-sensitive or single-tool focused, MCP will feel slower — because it is.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You're buying &lt;strong&gt;reuse and consistency&lt;/strong&gt;, not speed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Observability &amp;amp; Debugging
&lt;/h2&gt;

&lt;p&gt;Make MCP servers and client-host interactions observable from day one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trace every request&lt;/strong&gt; — propagate a correlation ID across host → client → server → backend and include it in logs and responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics to track&lt;/strong&gt; — &lt;code&gt;mcp_tool_call_latency&lt;/code&gt;, &lt;code&gt;mcp_tool_call_errors&lt;/code&gt;, &lt;code&gt;mcp_discovery_time&lt;/code&gt;, &lt;code&gt;mcp_auth_failures&lt;/code&gt;, per-tool success rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured JSON logs&lt;/strong&gt; — &lt;code&gt;(timestamp, level, correlation_id, tool, args, duration, error)&lt;/code&gt; to make postmortems fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed tracing&lt;/strong&gt; — use OpenTelemetry to connect host traces to server traces for full request/response visibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Troubleshooting checklist&lt;/strong&gt; — confirm &lt;code&gt;/.well-known/oauth-protected-resource&lt;/code&gt;, check token &lt;code&gt;aud&lt;/code&gt; claim, verify &lt;code&gt;/health&lt;/code&gt; readiness, replay failing tool calls against a local mock server&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Testing &amp;amp; QA
&lt;/h2&gt;

&lt;p&gt;Treat the MCP server like a public contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contract tests&lt;/strong&gt; — verify the tool list and input/output schemas remain compatible across releases. Fail CI on breaking schema changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit tests&lt;/strong&gt; — each &lt;code&gt;@mcp.tool()&lt;/code&gt; should have unit tests for edge cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration tests&lt;/strong&gt; — run the server against a mocked backend and a simulated client that performs discovery + tool calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smoke tests&lt;/strong&gt; — lightweight end-to-end checks that run on deploy (e.g., a small request to &lt;code&gt;get_customer&lt;/code&gt; and &lt;code&gt;get_orders&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Production Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Health/readiness: expose &lt;code&gt;/health&lt;/code&gt; and use readiness probes (ACA/AKS)&lt;/li&gt;
&lt;li&gt;[ ] Replicas: &lt;code&gt;min-replicas &amp;gt;= 1&lt;/code&gt; for interactive workloads to avoid cold starts&lt;/li&gt;
&lt;li&gt;[ ] Auth: validate &lt;code&gt;aud&lt;/code&gt; claim and enforce least-privilege scopes&lt;/li&gt;
&lt;li&gt;[ ] Rate limiting &amp;amp; quotas: protect downstream systems from runaway agents&lt;/li&gt;
&lt;li&gt;[ ] Secrets: store credentials in Key Vault / secret manager — avoid plain env vars for DB credentials&lt;/li&gt;
&lt;li&gt;[ ] Observability: metrics, logs, and traces wired to your platform&lt;/li&gt;
&lt;li&gt;[ ] Cost &amp;amp; latency: budget for an extra 280–950ms network hop per tool call&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Deploying on Azure
&lt;/h2&gt;

&lt;p&gt;MCP is not Azure-specific — it runs on any platform that supports HTTP. But if you're already in the Azure ecosystem, here's the honest breakdown.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A["Infra context?"]
    A --&amp;gt;|Greenfield| B["Azure Container Apps\n✅ Default choice"]
    A --&amp;gt;|Existing cluster| C["AKS + Workload Identity"]
    A --&amp;gt;|Low usage / event-driven| D["Azure Functions"]
    A --&amp;gt;|Existing App Service| E["App Service /mcp"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Deployment Decision Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Best fit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Greenfield, no existing infra&lt;/td&gt;
&lt;td&gt;Azure Container Apps&lt;/td&gt;
&lt;td&gt;Least ops overhead, managed ingress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already running AKS&lt;/td&gt;
&lt;td&gt;AKS + Ingress&lt;/td&gt;
&lt;td&gt;Reuse cluster auth, colocate workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple servers, same team&lt;/td&gt;
&lt;td&gt;ACA shared environment&lt;/td&gt;
&lt;td&gt;Internal discovery, Dapr sidecars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrequent tools / cost-sensitive&lt;/td&gt;
&lt;td&gt;Azure Functions&lt;/td&gt;
&lt;td&gt;Per-invocation billing, zero idle cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Existing App Service web app&lt;/td&gt;
&lt;td&gt;App Service /mcp&lt;/td&gt;
&lt;td&gt;Zero infrastructure change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise, Istio service mesh&lt;/td&gt;
&lt;td&gt;AKS + mTLS&lt;/td&gt;
&lt;td&gt;Zero trust alongside OAuth 2.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Option 1 — Azure Container Apps (start here)
&lt;/h3&gt;

&lt;p&gt;Recommended for most teams starting fresh. Handles autoscaling, ingress, Entra ID integration, and scale-to-zero with no MCP-specific platform knowledge needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;RG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mcp-rg"&lt;/span&gt;
&lt;span class="nv"&gt;ACR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;your-acr-name&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Build and push&lt;/span&gt;
az acr build &lt;span class="nt"&gt;--registry&lt;/span&gt; &lt;span class="nv"&gt;$ACR&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt; customer-mcp-server:latest &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Create environment&lt;/span&gt;
az containerapp &lt;span class="nb"&gt;env &lt;/span&gt;create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; mcp-env &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RG&lt;/span&gt; &lt;span class="nt"&gt;--location&lt;/span&gt; eastus

&lt;span class="c"&gt;# Store credentials in Key Vault — never directly in env vars&lt;/span&gt;
az keyvault secret &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vault-name&lt;/span&gt; mcp-keyvault &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; database-url &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s2"&gt;"postgresql://user:pass@host:5432/db"&lt;/span&gt;

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
az containerapp create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; customer-mcp-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--environment&lt;/span&gt; mcp-env &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; &lt;span class="nv"&gt;$ACR&lt;/span&gt;.azurecr.io/customer-mcp-server:latest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target-port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ingress&lt;/span&gt; external &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-replicas&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-replicas&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secrets&lt;/span&gt; db-url&lt;span class="o"&gt;=&lt;/span&gt;keyvaultref:&amp;lt;key-vault-secret-uri&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env-vars&lt;/span&gt; &lt;span class="nv"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;secretref:db-url
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;min-replicas 1&lt;/code&gt; is critical for interactive use — scale-to-zero causes cold start that breaks mid-conversation flow. Expect ~1–3s latency even on warm replicas; design your agent loop timeouts accordingly (30s is a safe outer bound).&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2 — AKS (if you already run a cluster)
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;Workload Identity&lt;/strong&gt; over storing credentials in env vars. Reuse cluster auth and network policies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mcp-deployment.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-server&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;your-acr&amp;gt;.azurecr.io/customer-mcp-server:latest&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-secrets&lt;/span&gt;
                  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database-url&lt;/span&gt;
          &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
            &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
            &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
            &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;
            &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp.yourdomain.internal&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/mcp&lt;/span&gt;
            &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
            &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-mcp-server&lt;/span&gt;
                &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace mcp-system
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; mcp-deployment.yaml
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; mcp-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3 — Azure Functions (infrequent, event-driven tools)
&lt;/h3&gt;

&lt;p&gt;Best for tools called rarely — scheduled jobs, webhook-triggered actions, batch processors. Per-invocation billing means zero idle cost. &lt;strong&gt;Not suitable for interactive agent loops&lt;/strong&gt; — cold start breaks conversational flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 4 — App Service (add &lt;code&gt;/mcp&lt;/code&gt; to an existing service)
&lt;/h3&gt;

&lt;p&gt;Mount the MCP server on &lt;code&gt;/mcp&lt;/code&gt; alongside existing routes. Zero infrastructure change. Lowest friction path for adding MCP to a service that already exists.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Failure Modes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token misconfiguration&lt;/strong&gt; — auth breaks silently on retry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool explosion&lt;/strong&gt; — too many vaguely named tools confuse the model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency stacking&lt;/strong&gt; — chained MCP calls compound response time (budget 280–950ms per hop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business logic duplication&lt;/strong&gt; — logic living in both the server and the calling agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak observability&lt;/strong&gt; — no tracing across the client–server boundary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No ownership boundaries&lt;/strong&gt; — multiple teams modifying the same server&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Critical rule:&lt;/strong&gt; One MCP server = one owning team. Violating this recreates the exact problem MCP was meant to solve.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What MCP Does NOT Solve
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bad tool design&lt;/li&gt;
&lt;li&gt;Weak or vague prompts&lt;/li&gt;
&lt;li&gt;Latency at the model level&lt;/li&gt;
&lt;li&gt;Capability gaps in the underlying LLM&lt;/li&gt;
&lt;li&gt;Business logic duplication inside your tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP standardises &lt;em&gt;access&lt;/em&gt;. It does not improve &lt;em&gt;what you expose&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migration Path
&lt;/h2&gt;

&lt;p&gt;Don't start with MCP. Migrate to it when the pain appears:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt; with plain tool calling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identify&lt;/strong&gt; tools being duplicated across teams or surfaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrap&lt;/strong&gt; those shared tools in an MCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add&lt;/strong&gt; auth, deployment, and observability incrementally&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Avoid premature standardisation. It's just premature optimisation with a fancier name.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Further Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP spec:&lt;/strong&gt; &lt;a href="https://modelcontextprotocol.io/specification" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inspector:&lt;/strong&gt; &lt;code&gt;npx @modelcontextprotocol/inspector&lt;/code&gt; — discover and test any MCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastMCP:&lt;/strong&gt; &lt;a href="https://github.com/jlowin/fastmcp" rel="noopener noreferrer"&gt;https://github.com/jlowin/fastmcp&lt;/a&gt; — the Python server/client used in this guide&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If this saved you from building an MCP server you didn't need — hit clap. It takes one second and tells the algorithm this is worth distributing. If you've already wired MCP into production, drop your setup in the comments — specifically curious what the auth layer looks like on your end.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Verdict
&lt;/h2&gt;

&lt;p&gt;MCP won't make your architecture simpler. It will make it &lt;em&gt;honest&lt;/em&gt; — every tool, resource, and auth scope exactly where it belongs, exposed through a contract any host can consume.&lt;/p&gt;

&lt;p&gt;If you're building for one app, one team: skip it. Plain tool calling ships faster and is entirely adequate.&lt;/p&gt;

&lt;p&gt;But if you're building something multiple systems or teams will consume — especially if those tools already exist as APIs — MCP ends the "three copies of the same integration, each slightly wrong" problem permanently.&lt;/p&gt;

&lt;p&gt;That's the only problem it solves. It solves it very well.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/towards-data-engineering/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-dcca8cd49a41" rel="noopener noreferrer"&gt;Why Most RAG Systems Fail in Production&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/towards-artificial-intelligence/langgraph-vs-semantic-kernel-the-one-decision-that-will-shape-your-ai-agent-architecture-636f5e32eef2" rel="noopener noreferrer"&gt;LangGraph vs Semantic Kernel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/towards-artificial-intelligence/most-ai-agent-frameworks-are-overkill-heres-how-to-choose-the-right-one-in-30-seconds-b0a77c894fb7" rel="noopener noreferrer"&gt;Most AI Agent Frameworks Are Overkill&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://medium.com/@theprodsde" rel="noopener noreferrer"&gt;TheProdSDE&lt;/a&gt; · Follow for production-grade AI systems, backend architecture, and honest engineering takes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>backend</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Worth Each minute read if you work on RAG</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Mon, 06 Apr 2026 09:59:14 +0000</pubDate>
      <link>https://dev.to/theprodsde/worth-each-minute-read-if-you-work-on-rag-g1d</link>
      <guid>https://dev.to/theprodsde/worth-each-minute-read-if-you-work-on-rag-g1d</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55" class="crayons-story__hidden-navigation-link"&gt;Why Most RAG Systems Fail in Production (And How to Design One That Actually Works)&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/theprodsde" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" alt="theprodsde profile" class="crayons-avatar__image" width="500" height="500"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/theprodsde" class="crayons-story__secondary fw-medium m:hidden"&gt;
              TheProdSDE
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                TheProdSDE
                
              
              &lt;div id="story-author-preview-content-3367068" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/theprodsde" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" class="crayons-avatar__image" alt="" width="500" height="500"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;TheProdSDE&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 24&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55" id="article-link-3367068"&gt;
          Why Most RAG Systems Fail in Production (And How to Design One That Actually Works)
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/rag"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;rag&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/software"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;software&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              1&lt;span class="hidden s:inline"&gt;&amp;nbsp;comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Agentic AI Fails in Production for Simple Reasons — What MLDS 2026 Taught Me</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:56:35 +0000</pubDate>
      <link>https://dev.to/theprodsde/agentic-ai-fails-in-production-for-simple-reasons-what-mlds-2026-taught-me-2ec</link>
      <guid>https://dev.to/theprodsde/agentic-ai-fails-in-production-for-simple-reasons-what-mlds-2026-taught-me-2ec</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;
Most agentic AI failures in production are not caused by weak models, but by &lt;strong&gt;stale data, poor validation, lost context, and lack of governance&lt;/strong&gt;. MLDS 2026 reinforced that enterprise‑grade agentic AI is a &lt;strong&gt;system design problem&lt;/strong&gt;, requiring validation‑first agents, structural intelligence, strong observability, memory discipline, and cost‑aware orchestration—not just bigger LLMs.&lt;/p&gt;

&lt;p&gt;I recently attended &lt;strong&gt;MLDS 2026 (Machine Learning Developer Summit)&lt;/strong&gt; by Analytics India Magazine (AIM) in Bangalore. While many sessions featured advanced models and agentic frameworks, the most valuable insight was unexpected:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most AI systems don’t fail in production because of bad models — they fail because of bad systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Across the summit, speakers repeatedly showed that issues like stale data, missing validation, poor observability, and uncontrolled execution are what derail agentic AI at scale—not lack of intelligence.&lt;/p&gt;

&lt;p&gt;A recurring theme across sessions was clear: &lt;strong&gt;the hardest problem in AI today is no longer building impressive demos, but running AI systems reliably at enterprise scale&lt;/strong&gt;. Many real-world failures stem from system design gaps rather than model limitations.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Key Shift: From Models to Systems
&lt;/h2&gt;

&lt;p&gt;One of the most important takeaways from the summit was that &lt;strong&gt;enterprise AI is fundamentally a system design problem&lt;/strong&gt;, not a model selection problem.&lt;/p&gt;

&lt;p&gt;Multiple speakers highlighted common failure modes seen in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stale or outdated data&lt;/li&gt;
&lt;li&gt;Poor data granularity&lt;/li&gt;
&lt;li&gt;Context loss across multi-step workflows&lt;/li&gt;
&lt;li&gt;False confidence and lack of validation&lt;/li&gt;
&lt;li&gt;Black-box decisions with no observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This explains why many AI solutions look powerful in prototypes but break down in real operational environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Policy Learning vs. Structural Intelligence
&lt;/h2&gt;

&lt;p&gt;A particularly insightful discussion contrasted two approaches:&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime Policy Learning
&lt;/h3&gt;

&lt;p&gt;Examples include &lt;strong&gt;Reinforcement Learning (RL)&lt;/strong&gt;, &lt;strong&gt;MADDPG&lt;/strong&gt;, and &lt;strong&gt;Graph Neural Networks (GNNs)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic decision-making&lt;/li&gt;
&lt;li&gt;GPU-intensive&lt;/li&gt;
&lt;li&gt;Higher cost and latency&lt;/li&gt;
&lt;li&gt;Harder to govern and observe&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structural Intelligence at Design Time
&lt;/h3&gt;

&lt;p&gt;In this approach, intelligence is &lt;strong&gt;encoded into the system structure itself&lt;/strong&gt;, often using graph-based designs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Relationships are resolved at construction time&lt;/li&gt;
&lt;li&gt;Minimal runtime inference&lt;/li&gt;
&lt;li&gt;Deterministic behavior&lt;/li&gt;
&lt;li&gt;Lower cost and faster response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Not every intelligent system needs continuous runtime learning. When relationships are stable, embedding intelligence structurally can be more efficient and reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Validation-First Agent Design
&lt;/h2&gt;

&lt;p&gt;Another strong theme was the shift toward &lt;strong&gt;validation-first agents&lt;/strong&gt;, not answer-first agents.&lt;/p&gt;

&lt;p&gt;Successful agentic systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ground every important output to source data&lt;/li&gt;
&lt;li&gt;Track freshness and provenance&lt;/li&gt;
&lt;li&gt;Validate semantics before taking actions&lt;/li&gt;
&lt;li&gt;Plan explicitly before executing&lt;/li&gt;
&lt;li&gt;Expose confidence where appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Several talks emphasized that observability should evolve from &lt;em&gt;“what happened?”&lt;/em&gt; to &lt;em&gt;“was the result actually correct?”&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agentic Memory: Accuracy, Cost, and Trust
&lt;/h2&gt;

&lt;p&gt;Sessions on agentic memory highlighted how &lt;strong&gt;short-term memory, long-term memory, and pruning strategies&lt;/strong&gt; directly influence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;User trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key takeaway was that memory should be treated as a &lt;strong&gt;first-class architectural concern&lt;/strong&gt;, with explicit design choices and benchmarks—rather than an ad-hoc cache bolted on later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Platforms and Practical Architecture Choices
&lt;/h2&gt;

&lt;p&gt;The summit also covered modern data platforms that unify &lt;strong&gt;OLTP and OLAP workloads&lt;/strong&gt;, with strong support for &lt;strong&gt;time-series data&lt;/strong&gt;. These architectures reduce complexity and make near–real-time analytics more accessible.&lt;/p&gt;

&lt;p&gt;A broader lesson emerged: &lt;strong&gt;cost, latency, reliability, and accuracy must be designed together&lt;/strong&gt;. Choosing larger models without optimizing workflows, routing, and memory leads to unnecessary compute cost and slower systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting Agents into Production: Real-World Risks
&lt;/h2&gt;

&lt;p&gt;One session focused entirely on lessons learned from deploying agents in production. Four recurring risks were highlighted:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Silent failures&lt;/strong&gt; – systems appear healthy but produce wrong outputs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Black-box decisions&lt;/strong&gt; – lack of explainability and traceability
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permission explosion&lt;/strong&gt; – agents accumulating excessive access
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runaway execution&lt;/strong&gt; – uncontrolled tool calls and rising costs
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These issues reinforce the importance of governance, guardrails, observability, and scoped execution from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI-Assisted Development Needs Guardrails
&lt;/h2&gt;

&lt;p&gt;Another notable takeaway was the need to pair &lt;strong&gt;AI-assisted code generation&lt;/strong&gt; with strong &lt;strong&gt;static analysis and security validation&lt;/strong&gt;. Integrations with tools like SonarQube demonstrate how AI-written and human-written code can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validated automatically&lt;/li&gt;
&lt;li&gt;Secured against vulnerabilities&lt;/li&gt;
&lt;li&gt;Fixed via generated pull requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This closes the gap between productivity gains and production reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Reflections
&lt;/h2&gt;

&lt;p&gt;MLDS 2026 reinforced a critical idea:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The future of AI in enterprises depends more on architecture, validation, and governance than on model strength alone.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Agentic AI succeeds when it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grounded in reliable data&lt;/li&gt;
&lt;li&gt;Observable and debuggable&lt;/li&gt;
&lt;li&gt;Cost-aware and execution-bounded&lt;/li&gt;
&lt;li&gt;Designed around real workflows&lt;/li&gt;
&lt;li&gt;Rolled out with clear trust and adoption strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest mindset shift is moving from &lt;em&gt;“How powerful is the model?”&lt;/em&gt; to &lt;em&gt;“How reliable and efficient is the end-to-end intelligent workflow?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That, more than anything, was the most valuable learning from the summit.&lt;/p&gt;




&lt;p&gt;If you’re working on agentic AI in production, I’d love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where have agents broken down for you?&lt;/li&gt;
&lt;li&gt;What controls or guardrails helped the most?&lt;/li&gt;
&lt;li&gt;Are you handling validation and memory explicitly—or implicitly?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s compare notes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>devops</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why Most RAG Systems Fail in Production (And How to Design One That Actually Works)</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Tue, 24 Mar 2026 12:23:04 +0000</pubDate>
      <link>https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55</link>
      <guid>https://dev.to/theprodsde/why-most-rag-systems-fail-in-production-and-how-to-design-one-that-actually-works-j55</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical, system design–focused breakdown of why RAG systems degrade after launch—and what actually works in production.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Everyone builds a RAG system.&lt;/p&gt;

&lt;p&gt;And almost all of them work — in demos.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean query&lt;/li&gt;
&lt;li&gt;Relevant chunks&lt;/li&gt;
&lt;li&gt;Decent answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ship it.&lt;/p&gt;

&lt;p&gt;Then production happens.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users ask vague follow-ups&lt;/li&gt;
&lt;li&gt;Retrieval returns partial context&lt;/li&gt;
&lt;li&gt;The model answers confidently… and incorrectly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Your “working” RAG system becomes unreliable.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Reality: RAG Fails Quietly
&lt;/h2&gt;

&lt;p&gt;RAG doesn’t crash. It degrades.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slightly wrong answers&lt;/li&gt;
&lt;li&gt;Missing context&lt;/li&gt;
&lt;li&gt;Hallucinated explanations with citations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which is worse than a system that fails loudly.&lt;/p&gt;

&lt;p&gt;Most teams blame:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embeddings&lt;/li&gt;
&lt;li&gt;vector database&lt;/li&gt;
&lt;li&gt;chunk size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in real systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;RAG failures are usually system design failures—not retrieval failures.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What a Production RAG System Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Not this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query → Vector DB → LLM&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2up9k15qpcbmdnpz2kto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2up9k15qpcbmdnpz2kto.png" alt="Prod Arch" width="775" height="1804"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Parsing Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;Most pipelines start like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where things already break.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;PDFs lose structure&lt;/li&gt;
&lt;li&gt;Tables turn into noise&lt;/li&gt;
&lt;li&gt;Headers/footers pollute chunks&lt;/li&gt;
&lt;li&gt;Sections lose meaning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Approach
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document → Layout-aware parsing → Structured sections → Clean chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preserve headings and hierarchy&lt;/li&gt;
&lt;li&gt;remove boilerplate&lt;/li&gt;
&lt;li&gt;chunk by meaning, not length&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If parsing is wrong, retrieval will always be wrong.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 2: Dense vs Sparse Retrieval (You Need Both)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dense Retrieval (Embeddings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;semantic similarity&lt;/li&gt;
&lt;li&gt;handles vague queries&lt;/li&gt;
&lt;li&gt;fails on exact matches&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Sparse Retrieval (BM25 / Keyword)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;exact term matching&lt;/li&gt;
&lt;li&gt;works for IDs, clauses&lt;/li&gt;
&lt;li&gt;ignores meaning&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Production Pattern: Hybrid Retrieval
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p3dos13k150u211qog5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p3dos13k150u211qog5.png" alt="Hybrid Retrieval" width="799" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic understanding&lt;/li&gt;
&lt;li&gt;exact precision&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Using only vector search is a common production mistake.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 3: Reranking (The Accuracy Multiplier)
&lt;/h2&gt;

&lt;p&gt;Top-K retrieval is noisy.&lt;/p&gt;

&lt;p&gt;Add a &lt;strong&gt;reranker&lt;/strong&gt; (cross-encoder):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;evaluates (query, chunk) pairs&lt;/li&gt;
&lt;li&gt;reorders by true relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly improves answer quality without changing your database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Context Building (Where Systems Win or Lose)
&lt;/h2&gt;

&lt;p&gt;Even with good retrieval, most failures happen here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Mistakes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;stuffing too many chunks&lt;/li&gt;
&lt;li&gt;mixing unrelated documents&lt;/li&gt;
&lt;li&gt;ignoring token limits&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Production Approach
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;select top-ranked chunks only&lt;/li&gt;
&lt;li&gt;preserve document structure&lt;/li&gt;
&lt;li&gt;enforce token budget&lt;/li&gt;
&lt;li&gt;maintain ordering&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Better context &amp;gt; more context&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Vector DB vs Graph DB — When to Use What
&lt;/h2&gt;




&lt;h3&gt;
  
  
  Use Vector Database When
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;unstructured data&lt;/li&gt;
&lt;li&gt;semantic search&lt;/li&gt;
&lt;li&gt;document retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctfdm6ce0okep4c7z159.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctfdm6ce0okep4c7z159.png" alt="Vector DB usecase" width="794" height="764"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Use Graph Database When
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;relationships matter&lt;/li&gt;
&lt;li&gt;multi-hop reasoning&lt;/li&gt;
&lt;li&gt;structured entities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu2uqier2j76b1rqzyyq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu2uqier2j76b1rqzyyq.png" alt="Graph DB usecase" width="722" height="972"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Hybrid (Real Systems)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulorxshvfzyn486hmf8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulorxshvfzyn486hmf8u.png" alt="HYbrid " width="794" height="972"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use graph when relationships matter.&lt;br&gt;
Use vector when meaning matters.&lt;br&gt;
Use both when systems get complex.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  RAG Is Not Single-Turn — Managing Context Over Time
&lt;/h2&gt;

&lt;p&gt;Most systems fail here.&lt;/p&gt;

&lt;p&gt;RAG is not just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;retrieve → answer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;retrieve → answer → follow-up → correction → refinement&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem: Context Drift
&lt;/h2&gt;

&lt;p&gt;If you blindly append chat history:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token usage explodes&lt;/li&gt;
&lt;li&gt;wrong answers get reinforced&lt;/li&gt;
&lt;li&gt;relevance drops&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Production Strategy: Context Is a Filter
&lt;/h2&gt;

&lt;p&gt;Not a dump.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zmb0bj9k0do74h101no.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zmb0bj9k0do74h101no.png" alt=" " width="514" height="1180"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Context Layers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Store full history&lt;/li&gt;
&lt;li&gt;Select only relevant turns&lt;/li&gt;
&lt;li&gt;Exclude invalid or corrected responses&lt;/li&gt;
&lt;li&gt;Combine with retrieved context&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  When to Summarize vs Include Raw History
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Include Raw
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;short conversations&lt;/li&gt;
&lt;li&gt;active refinement&lt;/li&gt;
&lt;li&gt;recent corrections&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Summarize
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;long conversations (&amp;gt;5–7 turns)&lt;/li&gt;
&lt;li&gt;approaching token limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm6w9gprolpifc5ct0t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm6w9gprolpifc5ct0t8.png" alt=" " width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Critical Rule
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summarize facts—not hallucinations.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If a previous answer was wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exclude it&lt;/li&gt;
&lt;li&gt;prioritize user correction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Handling User Corrections (Critical for Trust)
&lt;/h2&gt;

&lt;p&gt;Users will fix your system.&lt;/p&gt;

&lt;p&gt;If you ignore that, the system feels broken.&lt;/p&gt;




&lt;h3&gt;
  
  
  Strategy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;mark incorrect responses&lt;/li&gt;
&lt;li&gt;exclude them from future context&lt;/li&gt;
&lt;li&gt;boost corrected information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"turn_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"corrected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agentic RAG (When Retrieval Needs Reasoning)
&lt;/h2&gt;

&lt;p&gt;Basic RAG is static.&lt;/p&gt;

&lt;p&gt;Agentic RAG adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;iteration&lt;/li&gt;
&lt;li&gt;tool usage&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcerbgga5knsatpplsf9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcerbgga5knsatpplsf9z.png" alt=" " width="649" height="1099"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Use It When
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;multi-step queries&lt;/li&gt;
&lt;li&gt;missing context&lt;/li&gt;
&lt;li&gt;dynamic retrieval&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Avoid It When
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;simple Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;strict latency requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Otherwise you're adding complexity without ROI.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Confidence Scores and Citations (Trust Layer)
&lt;/h2&gt;

&lt;p&gt;Without trust signals, users don’t trust answers.&lt;/p&gt;




&lt;h3&gt;
  
  
  Citations
&lt;/h3&gt;

&lt;p&gt;Always return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source document&lt;/li&gt;
&lt;li&gt;section or chunk reference&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Confidence Score (Simple Heuristic)
&lt;/h3&gt;

&lt;p&gt;Combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval score&lt;/li&gt;
&lt;li&gt;reranker score&lt;/li&gt;
&lt;li&gt;validation signal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;confidence =
  0.4 * retrieval +
  0.4 * reranker +
  0.2 * validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Optional Validation Step
&lt;/h3&gt;

&lt;p&gt;Ask the model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is this answer fully supported by the context?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lower confidence if not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Guardrail: Don’t Trust the Model Alone
&lt;/h2&gt;

&lt;p&gt;Even with RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hallucinations still happen&lt;/li&gt;
&lt;li&gt;citations can be fabricated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enforce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;answers must reference retrieved chunks&lt;/li&gt;
&lt;li&gt;no context → no answer&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Architecture (Multi-Turn RAG System)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft7ekmt83a7u0btw68yq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft7ekmt83a7u0btw68yq.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Checklist
&lt;/h2&gt;

&lt;p&gt;If your system doesn’t have these, it will fail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structured parsing&lt;/li&gt;
&lt;li&gt;hybrid retrieval&lt;/li&gt;
&lt;li&gt;reranking&lt;/li&gt;
&lt;li&gt;controlled context building&lt;/li&gt;
&lt;li&gt;memory filtering&lt;/li&gt;
&lt;li&gt;correction handling&lt;/li&gt;
&lt;li&gt;confidence + citations&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Real Rule
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;RAG is not a retrieval problem. It’s a system design problem.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;The best RAG systems are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simple&lt;/li&gt;
&lt;li&gt;structured&lt;/li&gt;
&lt;li&gt;observable&lt;/li&gt;
&lt;li&gt;measurable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not over-engineered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;If your system only works when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the query is perfect&lt;/li&gt;
&lt;li&gt;the data is clean&lt;/li&gt;
&lt;li&gt;the demo is controlled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it doesn’t work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;Once RAG works, the next bottleneck is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why LLM systems become expensive in production—and how to control it without killing performance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>agents</category>
      <category>software</category>
    </item>
    <item>
      <title>Most AI Agent Frameworks Are Overkill — Here's How to Choose the Right One in 30 Seconds</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Wed, 18 Mar 2026 12:58:03 +0000</pubDate>
      <link>https://dev.to/theprodsde/most-ai-agent-frameworks-are-overkill-heres-how-to-choose-the-right-one-in-30-seconds-37k3</link>
      <guid>https://dev.to/theprodsde/most-ai-agent-frameworks-are-overkill-heres-how-to-choose-the-right-one-in-30-seconds-37k3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A senior engineer's field-tested breakdown of LangGraph, AutoGen, CrewAI, Microsoft Agent Framework, and Haystack — from reviewing real production systems across teams.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Everyone is building AI agents right now.&lt;/p&gt;

&lt;p&gt;LangGraph. AutoGen. CrewAI. Semantic Kernel. Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;But most production AI systems don't actually need an agent framework.&lt;/p&gt;

&lt;p&gt;Across multiple teams and production codebases I've reviewed, the same two failure modes appear constantly — over-engineering and under-engineering. In one case, replacing a complex agent framework with ~200 lines of plain tool-calling code made the system 3× faster, easier to debug, and easier to maintain. In another, the absence of a framework caused a codebase to collapse under its own complexity.&lt;/p&gt;

&lt;p&gt;Both failures had the same root cause:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The problem isn't choosing the wrong framework. It's choosing a framework before understanding the workflow.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This guide is the decision framework I now use before touching any agent tooling.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR — Pick Your Approach in 30 Seconds
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow Shape&lt;/th&gt;
&lt;th&gt;Recommended Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single request → tools → response&lt;/td&gt;
&lt;td&gt;Plain tool calling (no framework)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-correcting loops, retries, re-evaluation&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel specialist agents&lt;/td&gt;
&lt;td&gt;AutoGen 0.7.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise persistent agents (Azure)&lt;/td&gt;
&lt;td&gt;Microsoft Agent Framework RC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential role-based task delegation&lt;/td&gt;
&lt;td&gt;CrewAI 1.10.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document analysis and extraction pipelines&lt;/td&gt;
&lt;td&gt;Haystack 2.x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Workflow Shapes at a Glance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ss9dwvby161agwccf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ss9dwvby161agwccf7.png" alt="How to decide framework?" width="800" height="823"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Real Failure Modes I've Seen Across Teams
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The over-engineering case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During a cross-team architecture review, I found a financial data analysis pipeline — pull market metrics, cross-reference filings, produce a risk score — built with three microservices, a LangGraph orchestrator with twelve nodes, Redis for inter-agent memory, and a separate evaluation loop. Six weeks of engineering. It worked.&lt;/p&gt;

&lt;p&gt;What it actually needed: 180 lines of FastAPI with direct OpenAI tool calling. Same output. 3× faster inference. Any engineer on the team could debug it in minutes.&lt;/p&gt;

&lt;p&gt;After the review, the team simplified it together. The framework had been adopted before anyone mapped the workflow shape — a straight line — and LangGraph's graph model added cost with no return.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The under-engineering case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a separate review, I found the opposite problem. An infrastructure incident response system — Azure alerts, metric retrieval, remediation decisions, rollback logic, retry conditions, escalation thresholds — built with hand-rolled state machines, custom retry logic, and a bespoke tool orchestration layer. Three weeks in, the codebase was a maze. Every new remediation path required rewriting core routing logic.&lt;/p&gt;

&lt;p&gt;A proper agent framework would have provided state management, conditional branching, retry handling, and checkpointing for free. Instead the team was reinventing those primitives by hand — the most expensive kind of technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern is the same in both directions: the architecture was chosen before the workflow was understood.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Question: Does Your Workflow Resist Simplicity?
&lt;/h2&gt;

&lt;p&gt;Before touching any framework, draw the workflow on paper. Then answer these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does step N's output determine whether to &lt;strong&gt;redo&lt;/strong&gt; step N-1? → &lt;em&gt;You have a loop&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Do multiple specialized agents need to run &lt;strong&gt;simultaneously&lt;/strong&gt;? → &lt;em&gt;You have parallelism&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Does the workflow run for &lt;strong&gt;minutes or hours&lt;/strong&gt;, surviving restarts? → &lt;em&gt;You need persistent state&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Does the agent need to &lt;strong&gt;decide its next action&lt;/strong&gt; from intermediate results? → &lt;em&gt;Dynamic planning&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Do independent agents need to &lt;strong&gt;hand off context&lt;/strong&gt; to each other? → &lt;em&gt;Multi-agent delegation&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;None of these? Use plain tool calling. Here's what that looks like at production quality.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Baseline: Plain Tool Calling (No Framework Needed)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Real-time ESG (Environmental, Social, Governance) risk scoring. Given a ticker, pull sustainability metrics, cross-reference regulatory filings, produce a risk-adjusted score, and persist an audit trail — one clean, observable service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzm121pgrz784peoebgn5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzm121pgrz784peoebgn5.png" alt="Real-time ESG (Environmental, Social, Governance) risk scoring" width="800" height="388"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# requirements: fastapi, openai, psycopg2-binary, httpx, pydantic
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ESGRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;portfolio_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_esg_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.sustainalytics.com/v1/esg/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ESG_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_regulatory_flags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                SELECT flag_type, severity, filing_date, description
                FROM regulatory_flags WHERE ticker = %s
                AND filing_date &amp;gt; NOW() - INTERVAL &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2 years&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
                ORDER BY severity DESC LIMIT 10
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
            &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_esg_metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetch ESG sustainability scores for a stock ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_regulatory_flags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieve regulatory violations and compliance flags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;TOOL_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_esg_metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_esg_metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_regulatory_flags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_regulatory_flags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/assess-esg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assess_esg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ESGRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run a full ESG risk assessment for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in portfolio &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;portfolio_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Produce a risk-adjusted score with recommendation: HOLD, REDUCE, or DIVEST.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)}]&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assessment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_MAP&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multi-step tool calling, audit trail, fully debuggable by any engineer. If this solves the problem — stop here. No framework needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. LangGraph — Stateful Cyclic Workflows
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version:&lt;/strong&gt; 1.1.2 | &lt;code&gt;pip install langgraph langgraph-checkpoint-redis&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt; Your workflow has genuine loops — the result of one step conditionally re-runs a previous step. Redis checkpointing lets state survive service restarts mid-workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid when:&lt;/strong&gt; The workflow is strictly sequential with no conditional branching. The graph model adds measurable overhead for zero architectural return.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Automated cloud cost optimization — scan Azure VMs for underutilization, simulate right-sizing savings, apply low-risk changes, re-scan. The loop continues until no optimization exceeds the savings threshold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc30200qg4ej5ma44nsk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc30200qg4ej5ma44nsk.png" alt="Automated cloud cost optimization" width="800" height="1411"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# requirements: langgraph, langgraph-checkpoint-redis, openai
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.redis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RedisSaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;oai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;subscription_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;applied&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;total_savings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scan_resources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;azure_monitor_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_resources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subscription_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;underutilized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_cpu_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_memory_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;underutilized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_savings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            Analyze these underutilized Azure resources and recommend right-sizing.
            For each: resource_id, current_sku, recommended_sku, monthly_savings_usd, risk_level (LOW/MEDIUM/HIGH).
            Only include LOW or MEDIUM risk recommendations.
            Resources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
            Respond: {{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [...]}}
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_savings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monthly_savings_usd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_changes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;applied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;azure_compute&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize_vm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommended_sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;applied&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;applied&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_threshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_savings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CostState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scan_resources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_savings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;apply_changes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check_threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RedisSaver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_conn_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subscription_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AZURE_SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_savings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this justifies LangGraph:&lt;/strong&gt; The self-correcting re-scan is genuinely hard to express cleanly in plain tool calling without writing a custom state machine — which is exactly the under-engineering trap described earlier.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. AutoGen 0.7.5 — Parallel Multi-Agent Collaboration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version:&lt;/strong&gt; 0.7.5 | &lt;code&gt;pip install "autogen-agentchat&amp;gt;=0.7.5" "autogen-ext[openai,redis]"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Key additions in 0.7.5: linear memory via &lt;code&gt;RedisMemory&lt;/code&gt;, fixed &lt;code&gt;GraphFlow&lt;/code&gt; cycle detection, Anthropic thinking mode support, &lt;code&gt;reasoning_effort&lt;/code&gt; parameter for GPT-5 models, improved Azure AI client streaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt; Multiple specialized agents run independent analysis simultaneously. Async, event-driven — agents can be deployed on separate containers with zero blocking between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid when:&lt;/strong&gt; The task is sequential and single-agent. Multi-agent coordination overhead only pays off when genuine parallelism exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Automated M&amp;amp;A due diligence — Legal, Financial, and Tech Audit agents work in parallel, then a Synthesis agent consolidates findings into an investment decision. Sequential execution would add hours of unnecessary latency to a time-sensitive deal process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps8lxqs6tzetlgw95uif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps8lxqs6tzetlgw95uif.png" alt="Automated M&amp;amp;A due diligence" width="800" height="404"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# requirements: autogen-agentchat&amp;gt;=0.7.5, autogen-ext[openai,redis]
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AssistantAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.teams&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SelectorGroupChat&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.conditions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextMentionTermination&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_ext.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatCompletionClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_ext.memory.redis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RedisMemory&lt;/span&gt;  

&lt;span class="n"&gt;model_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIChatCompletionClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;legal_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssistantAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LegalAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;RedisMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ma-legal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;M&amp;amp;A legal counsel. Identify: IP gaps, change-of-control clauses,
    litigation exposure, employment liabilities.
    Output JSON: {risk_items, severity: BLOCKER|HIGH|MEDIUM|LOW, deal_impact}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fetch_contract_repository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_litigation_database&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;financial_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssistantAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FinancialAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;RedisMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ma-financial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;M&amp;amp;A financial analyst. Identify: revenue quality, off-balance-sheet
    liabilities, working capital needs post-acquisition, EBITDA normalization.
    Output JSON: {financial_flags, normalized_ebitda, recommended_valuation_range}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fetch_financial_statements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compute_dcf_model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tech_audit_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssistantAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TechAuditAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Technical due diligence expert. Assess: tech debt score (1–10),
    security vulnerabilities, scalability ceiling, bus-factor risk.
    Output JSON: {tech_risks, estimated_remediation_cost_usd, integration_complexity}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;clone_and_analyze_repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_dependency_scan&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;synthesis_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssistantAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SynthesisAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;M&amp;amp;A deal lead. Wait for all domain agents to complete.
    Synthesize: PROCEED / PROCEED_WITH_CONDITIONS / ABORT.
    Include: top 5 risks, price adjustment recommendation, 90-day priorities.
    End your message with: ANALYSIS_COMPLETE&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;generate_pdf_memo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;notify_deal_team_slack&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_ma_due_diligence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SelectorGroupChat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;participants&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;legal_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financial_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tech_audit_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesis_agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;termination_condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TextMentionTermination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANALYSIS_COMPLETE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;selector_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run LegalAgent, FinancialAgent, TechAuditAgent first (order flexible,
        can run in parallel). Only select SynthesisAgent after all three have reported.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Full M&amp;amp;A due diligence for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_ma_due_diligence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TargetCorp Inc.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Microsoft Agent Framework RC — Enterprise Production Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version:&lt;/strong&gt; Release Candidate, Feb 19, 2026 | &lt;code&gt;pip install agent-framework --pre&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The most significant shift in the Microsoft AI ecosystem right now. This framework &lt;strong&gt;unifies Semantic Kernel and AutoGen into a single SDK&lt;/strong&gt; — both are entering maintenance mode as of this writing, and all new Microsoft investment flows into Agent Framework first.&lt;/p&gt;

&lt;p&gt;RC signals a frozen, stable API surface with GA targeting Q1 2026. Core capabilities: persistent threads (Cosmos DB), Service Bus integration, MCP + A2A protocol support, multi-agent orchestration with handoff and group chat patterns, streaming checkpointing for long-running agents, full .NET and Python support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt; Azure-first enterprise teams, production 24/7 background agents, regulated systems requiring full audit trails, or any team currently building on Semantic Kernel or AutoGen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid when:&lt;/strong&gt; Greenfield Python-only stacks with no Azure dependency, or where GA stability is a hard requirement before adoption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Autonomous DevOps deployment agent — monitors CI/CD completion events, validates health via Application Insights, promotes through dev → staging → prod automatically, pages on-call only when a gate fails. Runs continuously as a persistent, resumable agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6hu6bcqcyl1ww6pgal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6hu6bcqcyl1ww6pgal.png" alt="Autonomous DevOps deployment agent" width="800" height="845"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# requirements: agent-framework --pre, azure-identity, azure-monitor-query, kubernetes
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;

&lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agent_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AZURE_AI_FOUNDRY_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credential&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_app_insights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_minutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.monitor.query&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MetricsQueryClient&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;
    &lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MetricsQueryClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;resource_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requests/failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requests/duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;timespan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window_minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;environment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_rate_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_calculate_error_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p99_latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_calculate_p99&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_calculate_error_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;_calculate_p99&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;promote_deployment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kubernetes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;k8s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_incluster_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;k8s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AppsV1Api&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;patch_namespaced_deployment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;containers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACR_REGISTRY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_tag&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]}}}}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;promoted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;environment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Persistent thread — Cosmos DB preserves state across restarts
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DeploymentOrchestrator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Autonomous DevOps deployment agent.
    1. Validate health: error rate &amp;lt; 1%, p99 &amp;lt; 500ms, zero pod restarts
    2. Run smoke tests on the deployed environment
    3. All gates pass → promote to next environment
    4. Any gate fails → halt, capture full diagnostics, page on-call
    5. After prod deploy: monitor 15 minutes, auto-rollback if error rate &amp;gt; 2%
    Log every action: timestamp, metric values, decision, outcome.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_app_insights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;promote_deployment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_smoke_tests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;rollback_deployment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_oncall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_deployment_event&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_pipeline_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Azure Event Grid webhook — fires on every pipeline completion&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;agent_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployment complete — Service: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tag: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_tag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Environment: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;environment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Begin validation and promotion workflow.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agent_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_and_process_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. CrewAI 1.10.1 — Role-Based Sequential Pipelines
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version:&lt;/strong&gt; 1.10.1 (March 3, 2026) | &lt;code&gt;pip install crewai==1.10.1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚠️ Note:&lt;/strong&gt; v1.10.0 was yanked from PyPI due to an AMP runtime issue. Always pin to &lt;code&gt;1.10.1&lt;/code&gt; directly. This release lazy loading in Memory module and resolves a concurrent multi-process &lt;code&gt;LockException&lt;/code&gt; in production flows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt; Clearly defined specialist roles, sequential task handoff, and fastest path from idea to working prototype. CrewAI is the most opinionated framework and the fastest to build with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid when:&lt;/strong&gt; Fine-grained conditional edge control is needed, or throughput-sensitive production paths where abstraction overhead is measurable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Automated system architecture pipeline — an Architect designs from requirements, a Reviewer stress-tests for production failures, a Documentation Lead produces the Architecture Decision Record for the engineering wiki.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq63pavxrx7inawd3vbfv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq63pavxrx7inawd3vbfv.png" alt="Automated system architecture pipeline" width="799" height="255"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# requirements: crewai==1.10.1
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;

&lt;span class="n"&gt;architect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Principal Solutions Architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design a scalable, cloud-native system architecture from requirements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10+ years on Azure and AWS. Expert in event-driven architecture, CQRS, and zero-trust.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staff Engineer — Technical Reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Identify scalability bottlenecks, security vulnerabilities, and operational risks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Former SRE. Has seen every production failure mode. Only approves designs that hold at 10x load.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;doc_lead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Technical Documentation Lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Produce a complete Architecture Decision Record capturing every design trade-off&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Undocumented architecture is a liability. Writes for the engineer joining 2 years later.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;design_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Design a complete architecture for: {requirements}
    Output JSON: components (name + responsibility), data_flows (source → destination),
    tech_choices (with justification), scaling_strategy per component, security_controls.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSON architecture specification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;architect&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;review_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Stress-test the architecture:
    1. 10x traffic spike — what breaks first?
    2. Single region failure — what is the blast radius?
    3. Compromised service account — what can an attacker reach?
    Return: {approved: YES/NO, blocking_issues: [], recommended_changes: []}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review JSON with approval and findings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;design_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;adr_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a complete ADR: Context, Decision, Consequences, Alternatives Considered, Risk Register. Format: Markdown.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Complete ADR in Markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc_lead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;design_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_lead&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;design_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adr_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requirements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Real-time fraud detection — 50K TPS, sub-100ms decisions, 99.99% uptime, multi-region active-active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Haystack 2.x — Document Intelligence Pipelines
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version:&lt;/strong&gt; 2.x | &lt;code&gt;pip install haystack-ai&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt; Document processing, extraction, or compliance analysis at scale is the core product — not general agentic behavior. Purpose-built for this and measurably outperforms general frameworks on document-centric workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid when:&lt;/strong&gt; The problem is general agentic orchestration, multi-agent coordination, or any real-time interactive system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Automated SOC 2 evidence collection — ingests all internal policy documents, maps clauses against Trust Service Criteria, produces a compliance gap report showing which controls are missing or non-conformant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65yg8it7bjtpknlvy5uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65yg8it7bjtpknlvy5uo.png" alt="Automated SOC 2 evidence collection" width="800" height="1952"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pipeline maps extracted content against all six SOC 2 Trust Service Criteria (CC6, CC7, CC8, CC9, A1, C1), flags NOT_COVERED gaps with descriptions, and returns a structured compliance report with overall coverage percentage — no custom parsing layer required.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Practical Decision Rule
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If your AI system needs:

  Retries or self-correction    → LangGraph
  Multiple parallel specialists → AutoGen
  Enterprise Azure persistence  → Microsoft Agent Framework
  Sequential role-based tasks   → CrewAI
  Document extraction at scale  → Haystack
  None of the above             → No framework. Plain tool calling.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Framework Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Primary Strength&lt;/th&gt;
&lt;th&gt;Use When&lt;/th&gt;
&lt;th&gt;Avoid When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Plain tool calling&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Speed, simplicity, debuggability&lt;/td&gt;
&lt;td&gt;Straight-line workflow&lt;/td&gt;
&lt;td&gt;Never — if no loops exist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;1.1.2&lt;/td&gt;
&lt;td&gt;Cyclic graphs, checkpointing&lt;/td&gt;
&lt;td&gt;Self-correcting loops, retries&lt;/td&gt;
&lt;td&gt;No conditional branching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.7.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Parallel async agents, RedisMemory&lt;/td&gt;
&lt;td&gt;Multiple specialists in parallel&lt;/td&gt;
&lt;td&gt;Sequential single-agent tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft Agent Framework&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RC Feb 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise persistence, SK + AutoGen unified&lt;/td&gt;
&lt;td&gt;Azure production 24/7 agents&lt;/td&gt;
&lt;td&gt;Python-only, no Azure stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.10.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Role-based prototyping, fast iteration&lt;/td&gt;
&lt;td&gt;Sequential delegation, fast prototyping&lt;/td&gt;
&lt;td&gt;Fine-grained production control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haystack&lt;/td&gt;
&lt;td&gt;2.x&lt;/td&gt;
&lt;td&gt;Document extraction pipelines&lt;/td&gt;
&lt;td&gt;Document processing as core product&lt;/td&gt;
&lt;td&gt;General agentic tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Production Lessons From Real Systems
&lt;/h2&gt;

&lt;p&gt;After reviewing AI systems across multiple teams, a few patterns appear consistently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Most workflows are simpler than they look.&lt;/strong&gt;&lt;br&gt;
The instinct when building with LLMs is to reach for orchestration layers. That instinct is usually wrong. Start with the simplest thing that could work, measure it, then add complexity only where the problem resists simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Agent frameworks pay off only when the workflow has the right shape.&lt;/strong&gt;&lt;br&gt;
Loops, parallel specialists, or long-running persistent state. If none of these exist, the framework is cost with no architectural return.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Debuggability matters more than clever architecture.&lt;/strong&gt;&lt;br&gt;
A system the team can debug at 2AM is worth more than an elegant multi-agent pipeline nobody fully understands. Production incidents don't wait for framework comprehension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Hand-rolling framework primitives is the most expensive mistake.&lt;/strong&gt;&lt;br&gt;
Custom state machines, retry logic, and checkpointing written to avoid a framework dependency consistently cost more engineering time than learning the framework properly. Both failure modes described earlier confirm this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Decide the architecture before writing the first line.&lt;/strong&gt;&lt;br&gt;
Draw the workflow. Identify the shape. Pick the tool. That decision should take 30 minutes, not six weeks of refactoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Rule
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If you can draw your workflow as a straight line — plain tool calling is your production architecture.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;If that line needs to loop, branch, or coordinate parallel specialists — match the tool to the shape.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The best production AI systems are architecturally boring. One clean service, typed tool schemas, structured output, observable logs. No framework overhead unless the problem demands it.&lt;/p&gt;

&lt;p&gt;Add complexity only when the problem resists simplicity. That's the whole framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next in This Series
&lt;/h2&gt;

&lt;p&gt;This post focused on orchestration — how to structure and run AI workflows in production.&lt;/p&gt;

&lt;p&gt;But once orchestration is solved, most teams run into a different problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Their RAG system doesn’t actually work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not in demos — those look fine.&lt;br&gt;
In production — it breaks.&lt;/p&gt;

&lt;p&gt;Wrong answers. Missing context. Hallucinations with high confidence.&lt;/p&gt;

&lt;p&gt;And in most cases, the root cause is &lt;em&gt;not&lt;/em&gt; the vector database or the embedding model.&lt;/p&gt;

&lt;p&gt;It’s the architecture around it.&lt;/p&gt;

&lt;p&gt;The next article breaks down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why most RAG systems fail after launch&lt;/li&gt;
&lt;li&gt;The common design mistakes teams repeat&lt;/li&gt;
&lt;li&gt;And the production architecture that actually fixes it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re building anything on top of retrieval, this is where things either scale — or quietly fail.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Are you running an agent framework in production — or did you strip one out and go back to basics?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What made that decision clear? Drop your stack and the reason in the comments. The real production stories are always more useful than the official docs.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#llm&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#softwareengineering&lt;/code&gt; &lt;code&gt;#devops&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Version data sourced from official release channels: &lt;a href="//github.com/microsoft/autogen/releases"&gt;AutoGen 0.7.5&lt;/a&gt; , &lt;a href="//learn.microsoft.com/agent-framework"&gt;Microsoft Agent Framework RC&lt;/a&gt;, &lt;a href="//docs.crewai.com/changelog,%20pypi.org/project/crewai"&gt;CrewAI 1.10.1&lt;/a&gt;, &lt;a href="//langchain-ai.github.io/langgraph"&gt;LangGraph&lt;/a&gt;, &lt;a href="//docs.haystack.deepset.ai"&gt;Haystack&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>software</category>
      <category>agents</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Wed, 18 Mar 2026 09:37:06 +0000</pubDate>
      <link>https://dev.to/theprodsde/-1582</link>
      <guid>https://dev.to/theprodsde/-1582</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m" class="crayons-story__hidden-navigation-link"&gt;Stop Guessing Your LLM Replacement&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/theprodsde" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" alt="theprodsde profile" class="crayons-avatar__image" width="500" height="500"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/theprodsde" class="crayons-story__secondary fw-medium m:hidden"&gt;
              TheProdSDE
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                TheProdSDE
                
              
              &lt;div id="story-author-preview-content-3348480" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/theprodsde" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" class="crayons-avatar__image" alt="" width="500" height="500"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;TheProdSDE&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 14&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m" id="article-link-3348480"&gt;
          Stop Guessing Your LLM Replacement
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/cloud"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;cloud&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/systemdesign"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;systemdesign&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt;&amp;nbsp;reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>cloud</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Stop Guessing Your LLM Replacement</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Sat, 14 Mar 2026 11:32:29 +0000</pubDate>
      <link>https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m</link>
      <guid>https://dev.to/theprodsde/stop-guessing-your-llm-replacement-5f7m</guid>
      <description>&lt;h2&gt;
  
  
  A Practical Guide to Migrating GPT Apps Across Azure, AWS, and GCP
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Most LLM migrations are not caused by model performance. They happen because of &lt;strong&gt;data residency laws&lt;/strong&gt;, &lt;strong&gt;enterprise deployment requirements&lt;/strong&gt;, or &lt;strong&gt;cloud standardisation decisions&lt;/strong&gt;. This guide helps you narrow the search space to the right replacement candidates — not replace real testing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Companies Actually Migrate LLMs
&lt;/h2&gt;

&lt;p&gt;Engineers rarely wake up and decide to migrate their AI stack. Most migrations are triggered by &lt;strong&gt;business constraints&lt;/strong&gt;, not technical ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feemc41l3n4xa67lsj8mv.png" alt="Why Companies Actually Migrate LLMs" width="799" height="237"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Common Migration Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1️⃣ Expanding into Regions With Data Residency Laws
&lt;/h3&gt;

&lt;p&gt;Your product currently runs on Azure OpenAI (&lt;code&gt;gpt-4o-mini&lt;/code&gt;), but a new region requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EU sovereign cloud&lt;/li&gt;
&lt;li&gt;Local data processing&lt;/li&gt;
&lt;li&gt;Provider-specific compliance certifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You may need to move to &lt;strong&gt;AWS Bedrock&lt;/strong&gt;, &lt;strong&gt;Google Vertex AI&lt;/strong&gt;, or &lt;strong&gt;Azure AI Foundry with open models&lt;/strong&gt; — even though your application logic stays identical.&lt;/p&gt;




&lt;h3&gt;
  
  
  2️⃣ Enterprise Customers Want AI Inside Their Environment
&lt;/h3&gt;

&lt;p&gt;This is extremely common in &lt;strong&gt;B2B SaaS&lt;/strong&gt;. Enterprise customers often require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private cloud deployment&lt;/li&gt;
&lt;li&gt;VPC-only access&lt;/li&gt;
&lt;li&gt;On-prem inference&lt;/li&gt;
&lt;li&gt;Sovereign cloud environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your API-based model suddenly needs to become &lt;strong&gt;Llama&lt;/strong&gt;, &lt;strong&gt;Qwen&lt;/strong&gt;, &lt;strong&gt;DeepSeek&lt;/strong&gt;, or &lt;strong&gt;Mistral&lt;/strong&gt; — running inside &lt;em&gt;their&lt;/em&gt; infrastructure.&lt;/p&gt;




&lt;h3&gt;
  
  
  3️⃣ Corporate Cloud Standardisation
&lt;/h3&gt;

&lt;p&gt;A classic scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI team → Azure&lt;/li&gt;
&lt;li&gt;Platform team → AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leadership decides: &lt;em&gt;"All workloads must run on AWS."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now your team must translate &lt;code&gt;gpt-4o-mini&lt;/code&gt; into an AWS Bedrock equivalent — and the model catalog doesn't make that obvious.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Model Names Don't Translate
&lt;/h2&gt;

&lt;p&gt;Each vendor uses completely different naming schemes. There is no official cross-provider model map.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Entry Model&lt;/th&gt;
&lt;th&gt;Mid Model&lt;/th&gt;
&lt;th&gt;Reasoning Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gpt-4o-mini&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gpt-4o&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;o-series&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flash&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;Pro reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scout&lt;/td&gt;
&lt;td&gt;Maverick&lt;/td&gt;
&lt;td&gt;Large variants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small (7B–14B)&lt;/td&gt;
&lt;td&gt;72B–110B&lt;/td&gt;
&lt;td&gt;235B Thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;V3 (non-thinking)&lt;/td&gt;
&lt;td&gt;V3 standard&lt;/td&gt;
&lt;td&gt;R1 (reasoning)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Common mistakes teams make
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;❌ Picking the &lt;strong&gt;cheapest&lt;/strong&gt; model in the new catalog&lt;/li&gt;
&lt;li&gt;❌ Picking the &lt;strong&gt;newest&lt;/strong&gt; model by release date&lt;/li&gt;
&lt;li&gt;❌ Picking the &lt;strong&gt;highest benchmark&lt;/strong&gt; model regardless of tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three approaches can silently break production behaviour.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tier Model That Actually Works
&lt;/h2&gt;

&lt;p&gt;Instead of comparing names, compare &lt;strong&gt;capability tiers&lt;/strong&gt;. Every major provider follows the same five-tier structure.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Typical Use Cases&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Relative Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mini / Flash / Small&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chatbots, RAG, classification&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;Lowest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standard / Mid&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Assistants, summarisation, coding&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reasoning / Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agents, planning, complex Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontier / Flagship&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Research workloads, safety-critical&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once you know your current model's tier, finding candidates becomes systematic — not guesswork.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tier 1 — Replacing &lt;code&gt;gpt-4o-mini&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Typical workloads:&lt;/strong&gt; chat assistants, RAG pipelines, tool calling, lightweight coding&lt;/p&gt;

&lt;h3&gt;
  
  
  Candidates by cloud
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud&lt;/th&gt;
&lt;th&gt;Replacement Candidates&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure AI Foundry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 4 Scout, Qwen3-8B/14B, DeepSeek V3, Claude Haiku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Haiku, Mistral Small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP Vertex AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini Flash-Lite, Gemini Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Behaviour differences at this tier
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Watch out for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Haiku&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reliable, low hallucination rate&lt;/td&gt;
&lt;td&gt;~7× more expensive than &lt;code&gt;gpt-4o-mini&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extremely fast, 1M token context&lt;/td&gt;
&lt;td&gt;GCP-only; not available on Azure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama 4 Scout&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-weight, 10M token context, Azure-hosted&lt;/td&gt;
&lt;td&gt;Not a pure reasoning-tuned model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek V3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unusually strong reasoning (MMLU-Pro ~75.9, GPQA ~59.1) for this price tier&lt;/td&gt;
&lt;td&gt;Direct API or Azure Foundry; no native AWS/GCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-8B/14B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong multilingual + math, Apache 2.0&lt;/td&gt;
&lt;td&gt;Smaller context than Gemini/Llama&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Tier 2 — Replacing &lt;code&gt;gpt-4o&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Typical workloads:&lt;/strong&gt; document summarisation, coding assistance, enterprise chat assistants&lt;/p&gt;

&lt;h3&gt;
  
  
  Candidates by cloud
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud&lt;/th&gt;
&lt;th&gt;Replacement Candidates&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure AI Foundry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Sonnet, Llama 4 Maverick, DeepSeek V3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Sonnet, Mistral Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP Vertex AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini Flash, Gemini 2.5 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Benchmark reference (reasoning quality)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MMLU&lt;/th&gt;
&lt;th&gt;MMLU-Pro&lt;/th&gt;
&lt;th&gt;GPQA-Diamond&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~88+&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Competitive&lt;/td&gt;
&lt;td&gt;Best SWE-bench coding score at this tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama 4 Maverick&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~85+&lt;/td&gt;
&lt;td&gt;~80.5&lt;/td&gt;
&lt;td&gt;~69.8&lt;/td&gt;
&lt;td&gt;Beats GPT-4o on Meta's coding benchmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek V3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;88.5&lt;/td&gt;
&lt;td&gt;75.9–81.2&lt;/td&gt;
&lt;td&gt;59.1–68.4&lt;/td&gt;
&lt;td&gt;Frontier-class at mid-tier pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Flash (GCP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Competitive&lt;/td&gt;
&lt;td&gt;~78% SWE-bench&lt;/td&gt;
&lt;td&gt;GCP-only; fastest in this tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;DeepSeek V3 on Azure often outperforms &lt;code&gt;gpt-4o&lt;/code&gt; on raw reasoning benchmarks at significantly lower cost. Treat it as a tier upgrade, not just a replacement.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Tier 3 — Replacing Reasoning Models (&lt;code&gt;gpt-4.1&lt;/code&gt; / &lt;code&gt;gpt-5&lt;/code&gt; / o-series)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Typical workloads:&lt;/strong&gt; agent systems, research workflows, complex multi-step reasoning&lt;/p&gt;

&lt;h3&gt;
  
  
  Candidates by cloud
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud&lt;/th&gt;
&lt;th&gt;Replacement Candidates&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure AI Foundry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Opus, DeepSeek R1, Qwen3-235B Thinking (via Foundry)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Opus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP Vertex AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Pro, Gemini 3.1 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Reasoning benchmark reference (HLE + advanced)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;HLE Score&lt;/th&gt;
&lt;th&gt;MMLU-Pro&lt;/th&gt;
&lt;th&gt;GPQA-Diamond&lt;/th&gt;
&lt;th&gt;AIME-25&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top-tier (Anthropic reports #1 on HLE leaderboard)&lt;/td&gt;
&lt;td&gt;~90+&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-235B-A22B Thinking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~18%&lt;/strong&gt; (one of few published open-weight HLE scores)&lt;/td&gt;
&lt;td&gt;~84.4%&lt;/td&gt;
&lt;td&gt;~81%&lt;/td&gt;
&lt;td&gt;~92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek R1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not widely published&lt;/td&gt;
&lt;td&gt;~81.2&lt;/td&gt;
&lt;td&gt;~68.4&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini 2.5 / 3.1 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Competitive&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Qwen3-235B-A22B Thinking&lt;/strong&gt; is currently one of the few open-weight models with a &lt;strong&gt;published Humanity's Last Exam score (~18%)&lt;/strong&gt; — putting it in the same conversation as frontier closed models for reasoning-heavy tasks.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Architecture Pattern: Make LLMs Replaceable
&lt;/h2&gt;

&lt;p&gt;The biggest mistake teams make is &lt;strong&gt;hard-coding a model into their architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ The fragile pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application → GPT-4o-mini (hardcoded)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any migration requires touching application logic, service config, and prompt templates.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ The replaceable pattern
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu04ixmxhxb99gc7dfe5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu04ixmxhxb99gc7dfe5s.png" alt="The replaceable pattern" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vendor independence — swap providers via config&lt;/li&gt;
&lt;li&gt;Easier model upgrades without app rewrites&lt;/li&gt;
&lt;li&gt;Enables cost-optimised routing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Cost Optimisation: Intelligent Request Routing
&lt;/h2&gt;

&lt;p&gt;Many production AI systems at scale route requests by &lt;strong&gt;task complexity&lt;/strong&gt; rather than using one model for everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qr9b496z534g9hna69x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qr9b496z534g9hna69x.png" alt="Cost Optimisation: Intelligent Request Routing" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This pattern can reduce LLM costs by &lt;strong&gt;60–90%&lt;/strong&gt; in workloads with a mix of simple and complex requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt Regression Testing — Non-Negotiable
&lt;/h2&gt;

&lt;p&gt;Before committing to any model swap, run &lt;strong&gt;prompt regression tests&lt;/strong&gt; on your real production prompts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simple regression harness
&lt;/span&gt;&lt;span class="n"&gt;test_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_production_samples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;output_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;old_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score_a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;# correctness, format, hallucination
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score_b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score_b&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;score_a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;regressions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Regression rate: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regressions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Correctness&lt;/strong&gt; — does the answer change?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format compliance&lt;/strong&gt; — does it still follow your output structure?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination rate&lt;/strong&gt; — does it fabricate facts?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; — does it still meet your SLA?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚠️ Important Disclaimer
&lt;/h2&gt;

&lt;p&gt;The mappings in this guide are a &lt;strong&gt;starting point&lt;/strong&gt;, not guaranteed drop-in replacements.&lt;/p&gt;

&lt;p&gt;Models in the same tier can have meaningfully different behaviour on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your specific domain vocabulary&lt;/li&gt;
&lt;li&gt;Your prompt style&lt;/li&gt;
&lt;li&gt;Edge cases in your data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every migration must include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prompt regression testing on real data&lt;/li&gt;
&lt;li&gt;Human evaluation of sampled outputs&lt;/li&gt;
&lt;li&gt;Shadow traffic validation (run both models in parallel, compare outputs)&lt;/li&gt;
&lt;li&gt;Gradual rollout (5% → 25% → 100%)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Recommended Migration Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjhlhmxlmltol3rm2f5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjhlhmxlmltol3rm2f5d.png" alt="Recommended Migration Workflow" width="336" height="1080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do not skip shadow traffic. It catches subtle regressions that prompt tests miss.&lt;/p&gt;




&lt;h2&gt;
  
  
  📋 LLM Migration Cheat Sheet
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mini Tier (&lt;code&gt;gpt-4o-mini&lt;/code&gt; equivalent)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt; → Llama 4 Scout, Qwen3-8B/14B, DeepSeek V3, Claude Haiku&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt; → Claude Haiku, Mistral Small&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP&lt;/strong&gt; → Gemini Flash-Lite, Gemini Flash&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Standard Tier (&lt;code&gt;gpt-4o&lt;/code&gt; equivalent)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt; → Claude Sonnet, Llama 4 Maverick, DeepSeek V3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt; → Claude Sonnet, Mistral Medium&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP&lt;/strong&gt; → Gemini Flash, Gemini 2.5 Pro&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reasoning Tier (o-series / &lt;code&gt;gpt-5&lt;/code&gt; equivalent)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt; → Claude Opus, DeepSeek R1, Qwen3-235B Thinking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt; → Claude Opus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP&lt;/strong&gt; → Gemini 2.5 Pro, Gemini 3.1 Pro&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;The LLM ecosystem is evolving too fast to depend on a single provider.&lt;/p&gt;

&lt;p&gt;Design your systems so that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPT → Claude → Gemini → DeepSeek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…is a &lt;strong&gt;configuration change&lt;/strong&gt;, not a &lt;strong&gt;system rewrite&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When that happens, migrations become boring infrastructure work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And boring infrastructure is exactly what you want in production.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Follow &lt;a href="https://dev.to/theprodsde"&gt;TheProdSDE&lt;/a&gt; for more practical engineering guides on AI systems, cloud architecture, and developer tooling.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#llm&lt;/code&gt; &lt;code&gt;#cloud&lt;/code&gt; &lt;code&gt;#azure&lt;/code&gt; &lt;code&gt;#systemdesign&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>cloud</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Worth your 10min of time if deciding which framework suits your use case better</title>
      <dc:creator>TheProdSDE</dc:creator>
      <pubDate>Fri, 13 Mar 2026 14:48:04 +0000</pubDate>
      <link>https://dev.to/theprodsde/worth-your-10min-of-time-if-deciding-which-framework-suits-your-use-case-better-3gjp</link>
      <guid>https://dev.to/theprodsde/worth-your-10min-of-time-if-deciding-which-framework-suits-your-use-case-better-3gjp</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/theprodsde/langgraph-vs-semantic-kernel-python-ai-agents-in-2026-1p4g" class="crayons-story__hidden-navigation-link"&gt;LangGraph vs Semantic Kernel: Python AI Agents in 2026&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/theprodsde" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" alt="theprodsde profile" class="crayons-avatar__image" width="500" height="500"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/theprodsde" class="crayons-story__secondary fw-medium m:hidden"&gt;
              TheProdSDE
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                TheProdSDE
                
              
              &lt;div id="story-author-preview-content-3339533" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/theprodsde" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752909%2F36ae54b2-61fd-4d29-b097-ed6fbe2cd7ce.png" class="crayons-avatar__image" alt="" width="500" height="500"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;TheProdSDE&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/theprodsde/langgraph-vs-semantic-kernel-python-ai-agents-in-2026-1p4g" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 11&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/theprodsde/langgraph-vs-semantic-kernel-python-ai-agents-in-2026-1p4g" id="article-link-3339533"&gt;
          LangGraph vs Semantic Kernel: Python AI Agents in 2026
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/langchain"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;langchain&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/theprodsde/langgraph-vs-semantic-kernel-python-ai-agents-in-2026-1p4g" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/theprodsde/langgraph-vs-semantic-kernel-python-ai-agents-in-2026-1p4g#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              2&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            8 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
