<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cursuri AI</title>
    <description>The latest articles on DEV Community by Cursuri AI (cursuri-ai).</description>
    <link>https://dev.to/cursuri-ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12719%2Ffd877e1e-b068-40d1-90c2-438ed313f3e4.png</url>
      <title>DEV Community: Cursuri AI</title>
      <link>https://dev.to/cursuri-ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cursuri-ai"/>
    <language>en</language>
    <item>
      <title>Cursor vs GitHub Copilot vs Claude Code: Which AI Coding Tool in 2026?</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 29 Jun 2026 14:59:29 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/cursor-vs-github-copilot-vs-claude-code-which-ai-coding-tool-in-2026-6c8</link>
      <guid>https://dev.to/cursuri-ai/cursor-vs-github-copilot-vs-claude-code-which-ai-coding-tool-in-2026-6c8</guid>
      <description>&lt;p&gt;If you write code for a living in 2026, you're not asking &lt;em&gt;whether&lt;/em&gt; to use an AI coding tool — you're asking &lt;em&gt;which one&lt;/em&gt;. And the three names that dominate every team's Slack debate are &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;GitHub Copilot&lt;/strong&gt;, and &lt;strong&gt;Claude Code&lt;/strong&gt;. They look similar from a distance (type intent, get code) but they're built on three genuinely different bets about how software gets written.&lt;/p&gt;

&lt;p&gt;I've spent serious time in all three on real, multi-file, multi-repo work — not toy demos — and this is the comparison I wish someone had handed me before I burned a month figuring it out. I write and teach about agentic engineering at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, Eastern Europe's AI education platform, so I'll keep this grounded in how these tools actually behave in production, not in launch-day marketing.&lt;/p&gt;

&lt;p&gt;A note before we start: pricing and features in this category change almost monthly. Everything below is a mid-2026 snapshot — verify the current numbers on each tool's official page before you budget for a team.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR — three different philosophies
&lt;/h2&gt;

&lt;p&gt;Here's the one-sentence version of each, before we go deep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; is an &lt;strong&gt;AI-native editor&lt;/strong&gt; — it rebuilt the IDE around the agent. Best for developers who want fast, fluid, in-the-flow generation with deep editor integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Copilot&lt;/strong&gt; is the &lt;strong&gt;ecosystem play&lt;/strong&gt; — it lives where your code, issues, and PRs already are. Best for teams standardized on GitHub who want AI woven through the whole SDLC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; is the &lt;strong&gt;terminal-first agent&lt;/strong&gt; — it treats the command line as the primary surface and excels at autonomous, multi-step, multi-file work. Best for engineers comfortable orchestrating agents rather than babysitting autocomplete.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of them is "the best." They optimize for different moments, and the real skill is knowing which to reach for. Let's break down why.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Cursor?
&lt;/h2&gt;

&lt;p&gt;Cursor is an AI-native IDE built as a fork of VS Code, so the editor feels instantly familiar — your extensions, keybindings, and themes mostly carry over. What's different is that the AI isn't bolted on as a plugin; the whole editing experience is designed around it.&lt;/p&gt;

&lt;p&gt;Its signature features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tab completion&lt;/strong&gt; — a multi-line, context-aware autocomplete that predicts your &lt;em&gt;next edit&lt;/em&gt;, not just the next token. It's the feature people miss most when they switch away.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composer&lt;/strong&gt; — Cursor's agentic, multi-file editing mode. You describe a change in natural language and it edits across files, runs commands, and iterates. Cursor now ships &lt;strong&gt;Composer 2.5&lt;/strong&gt;, its own model trained specifically for agentic coding, alongside routing to frontier models from Anthropic, OpenAI, and Google.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Agents&lt;/strong&gt; — introduced in the Cursor 3.5 release (May 20, 2026), these run in isolated cloud VMs with terminal and browser access, can work across multiple repos in parallel, and report results back to your IDE asynchronously. It's Cursor's answer to "I want the agent working while I do something else."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor's center of gravity is &lt;strong&gt;in-the-flow coding&lt;/strong&gt;: you stay in the editor, you see every diff, and the AI keeps pace with your thinking. It rewards developers who want speed without giving up granular control over the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is GitHub Copilot?
&lt;/h2&gt;

&lt;p&gt;Copilot is the most widely deployed of the three, and its biggest advantage is gravitational: it lives inside the tools and platform most teams already use. It runs in VS Code, JetBrains IDEs, Visual Studio, and on GitHub itself.&lt;/p&gt;

&lt;p&gt;By 2026 Copilot has grown well past autocomplete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent mode&lt;/strong&gt; became generally available across both VS Code and JetBrains in March 2026 (previously VS Code only) — a multi-step agent that plans, edits across files, and runs commands inside your editor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The autonomous coding agent&lt;/strong&gt; is the standout. You assign a GitHub issue to Copilot, and it works asynchronously in the background — analyzing the repo, making changes, and opening a ready-to-review pull request. Assign, walk away, come back to a PR. It's the closest any mainstream tool comes to "fire-and-forget" feature work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic code review&lt;/strong&gt; gathers full project context before suggesting changes and can hand fixes straight to the coding agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Spark&lt;/strong&gt; lets you describe an app in plain English and get generated code with a live preview.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strategic point: Copilot's value isn't any single feature — it's that AI is now threaded through the entire GitHub-centric SDLC, from issue to PR to review. If your team lives on GitHub, that integration is hard to beat.&lt;/p&gt;

&lt;p&gt;One billing change worth flagging: as of June 1, 2026, GitHub moved to &lt;strong&gt;GitHub AI Credits&lt;/strong&gt; (token-based billing) in place of the older Premium Request Units. You're now billed by tokens processed at published model rates, which makes heavy agent usage more transparent — and easier to accidentally overspend if you're not watching.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Claude Code?
&lt;/h2&gt;

&lt;p&gt;Claude Code, from Anthropic, takes the opposite stance from Cursor: instead of building an editor, it makes the &lt;strong&gt;terminal&lt;/strong&gt; the primary surface (with IDE extensions available on top). That sounds minimalist until you see what it does with full shell access.&lt;/p&gt;

&lt;p&gt;Its defining strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic, multi-file, repo-aware work&lt;/strong&gt; from the command line — it reads your codebase, makes coordinated changes across many files, runs your tests, and handles git operations and CI-aware workflows natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents&lt;/strong&gt; — reusable agent configurations with their own custom prompts and tool access, so you can define a "reviewer," a "test-writer," or a "migration" agent and invoke it on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent teams and multi-agent orchestration&lt;/strong&gt; — coordinate multiple agent sessions working in parallel, with an agent view dashboard to manage them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code runs on Anthropic's models — currently Claude Opus 4.8 as the default, with the newer Claude Fable 5 as the most capable tier — and it's deliberately model-opinionated rather than a router. The tradeoff is real: it's the most powerful for autonomous, complex tasks, and the least hand-holdy. It assumes you're comfortable thinking like an &lt;em&gt;orchestrator of agents&lt;/em&gt; rather than a writer of lines.&lt;/p&gt;

&lt;p&gt;A word of caution that applies to every agent platform but bites hardest here: &lt;strong&gt;parallel agents multiply your token spend.&lt;/strong&gt; Running ten agents at once consumes your quota roughly ten times faster. The autonomy is exhilarating; the bill is real. Set limits before you scale up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-head: the dimensions that actually matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The editing model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; wins on &lt;em&gt;in-editor flow&lt;/em&gt;. Tab completion and inline diffs keep you in control of every change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; wins on &lt;em&gt;breadth of surface&lt;/em&gt; — it's good everywhere your code already is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; wins on &lt;em&gt;autonomous depth&lt;/em&gt; — it goes furthest without supervision, but you give up the inline, line-by-line feel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Agents and autonomy
&lt;/h3&gt;

&lt;p&gt;All three now have agents, but the philosophy differs. Cursor's Cloud Agents and Copilot's coding agent are both "assign work, get a result later." Claude Code goes further with explicit multi-agent orchestration and reusable subagents. If your work is increasingly &lt;em&gt;delegating&lt;/em&gt; rather than &lt;em&gt;typing&lt;/em&gt;, this is the dimension to weigh most — and it's exactly the shift that makes understanding &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI agent architecture and automation&lt;/a&gt; a genuine career edge rather than a nice-to-have.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem and integration
&lt;/h3&gt;

&lt;p&gt;This is Copilot's home turf. The issue-to-PR loop, native code review, and presence across every major IDE make it the path of least resistance for GitHub-standardized teams. Cursor integrates deeply but inside &lt;em&gt;its&lt;/em&gt; editor; Claude Code integrates deeply with your &lt;em&gt;shell and git&lt;/em&gt;, which is either liberating or intimidating depending on your comfort with the command line.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models
&lt;/h3&gt;

&lt;p&gt;Cursor routes across many frontier models and adds its own Composer model. Copilot offers a model picker. Claude Code is Anthropic-only by design. If model choice matters to you (and for some workloads it genuinely does), Cursor and Copilot give you more knobs; Claude Code bets that a tightly-integrated, top-tier model beats a buffet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing, side by side (mid-2026 snapshot)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Entry&lt;/th&gt;
&lt;th&gt;Mid tier&lt;/th&gt;
&lt;th&gt;Power / team&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hobby (free)&lt;/td&gt;
&lt;td&gt;Pro — $20/user/mo&lt;/td&gt;
&lt;td&gt;Teams — $40/user/mo (Standard), $120/user/mo (Premium)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Pro — $10/mo · Pro+ — $39/mo&lt;/td&gt;
&lt;td&gt;Max — $100/mo · Business / Enterprise seats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pro — $20/mo&lt;/td&gt;
&lt;td&gt;Max 5× — $100/mo&lt;/td&gt;
&lt;td&gt;Max 20× — $200/mo · API pay-per-token&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few honest caveats on cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; has the cheapest entry paid tier ($10), but token-based AI Credits mean heavy agent use can climb fast beyond the included allotment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor's&lt;/strong&gt; $20 Pro includes a fixed amount of frontier-model usage; power users hit the ceiling and either upgrade or switch to its cheaper Auto/Composer routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code's&lt;/strong&gt; Max tiers are priced for sustained, agent-heavy sessions — and again, parallel agents are a multiplier, not an add.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prices and tiers shift constantly in this category. Treat the table as a snapshot, not a quote, and confirm before committing a team budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which one should you choose?
&lt;/h2&gt;

&lt;p&gt;Here's the honest, persona-based answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if&lt;/strong&gt; you want the best in-editor experience, you value fast inline generation and tight control over every diff, and you're happy living inside a (very good) VS Code fork. It's the most natural upgrade for a developer who loves their editor and wants AI to keep pace with their flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose GitHub Copilot if&lt;/strong&gt; your team is standardized on GitHub and you want AI woven through the entire lifecycle — issues, PRs, reviews — across whatever IDEs your team already uses. The issue-to-PR autonomous agent alone can change how a team ships. It's the safest institutional bet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude Code if&lt;/strong&gt; you're comfortable in the terminal, your work skews toward complex multi-file refactors and autonomous tasks, and you want to orchestrate agents rather than supervise autocomplete. It has the highest ceiling for autonomy — and asks the most of you in return.&lt;/p&gt;

&lt;p&gt;And the answer most senior engineers actually land on? &lt;strong&gt;More than one.&lt;/strong&gt; Plenty of us keep Cursor open for flow-state editing, lean on Copilot inside the GitHub workflow, and fire up Claude Code for the gnarly autonomous jobs. The tools overlap, but they're not redundant — they're a toolkit. The real meta-skill isn't loyalty to one editor; it's &lt;strong&gt;fluency across the category&lt;/strong&gt; so you instinctively reach for the right one per task.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill underneath the tools
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth that the demos hide: these tools amplify the engineer you already are. Point a powerful agent at a vague intent and you get a fast, confident wall of code you didn't design and can't fully maintain. The developers getting outsized leverage from Cursor, Copilot, and Claude Code aren't the ones who learned the keyboard shortcuts — they're the ones who understand agent architecture, context engineering, and how to specify intent precisely enough that autonomy becomes an asset instead of a liability.&lt;/p&gt;

&lt;p&gt;That foundation is exactly what we build at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI education platform&lt;/a&gt; for Eastern Europe — practical, project-based courses taught around real repositories with an interactive AI instructor, not slide decks. If you want to go from "I use these tools" to "I get serious leverage from them," we maintain dedicated, hands-on tracks for &lt;a href="https://cursuri-ai.ro/courses/cursor-pro" rel="noopener noreferrer"&gt;using Cursor as a pro&lt;/a&gt; and for &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;agentic coding with Claude Code&lt;/a&gt; — both built around real multi-file, real-repo workflows rather than toy examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In 2026, "AI coding tool" isn't one product category — it's three philosophies wearing similar clothes. Cursor bet on the editor, Copilot bet on the ecosystem, and Claude Code bet on the terminal-native agent. Each is genuinely excellent at the thing it optimized for, and genuinely compromised at the things it didn't.&lt;/p&gt;

&lt;p&gt;So don't ask "which is best." Ask "best at what, for whom, doing which task" — and then build the judgment to switch fluently between them. That judgment, not the tool, is what compounds over a career. Try each one on a real feature, not a demo, and you'll feel the differences fast.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by the team at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — practical, hands-on AI engineering courses for developers and professionals across Eastern Europe, from agentic coding and AI agents to context engineering and the modern AI-native IDE workflow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://cursor.com/docs/models-and-pricing" rel="noopener noreferrer"&gt;Cursor Models &amp;amp; Pricing&lt;/a&gt; · &lt;a href="https://github.com/features/copilot/plans" rel="noopener noreferrer"&gt;GitHub Copilot Plans &amp;amp; Pricing&lt;/a&gt; · &lt;a href="https://docs.github.com/en/copilot/get-started/plans" rel="noopener noreferrer"&gt;GitHub Copilot Plans (Docs)&lt;/a&gt; · &lt;a href="https://claude.com/pricing" rel="noopener noreferrer"&gt;Claude Pricing&lt;/a&gt; · &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;Claude Platform Docs — Pricing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Vibe-Checking Your LLM: A Developer's Guide to Evals</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 22 Jun 2026 08:24:22 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/stop-vibe-checking-your-llm-a-developers-guide-to-evals-3oed</link>
      <guid>https://dev.to/cursuri-ai/stop-vibe-checking-your-llm-a-developers-guide-to-evals-3oed</guid>
      <description>&lt;p&gt;You tweaked the system prompt, ran the same two test questions you always run, the answers looked good, and you shipped. A week later support is forwarding you screenshots of the model confidently doing the exact thing your prompt was supposed to stop. You never saw it, because "did it get better?" was answered by vibes.&lt;/p&gt;

&lt;p&gt;This is the single most common failure mode in shipping LLM features, and it has nothing to do with which model you picked. &lt;strong&gt;If your only quality gate is reading a handful of outputs and nodding, every change you make is a coin flip.&lt;/strong&gt; You can't tell whether a prompt edit helped, hurt, or just moved the failures somewhere you didn't look. Evals are how you replace the nod with a number.&lt;/p&gt;

&lt;p&gt;This is a practical guide to building that number — from a 30-row eval set you can write this afternoon, through code-based checks and LLM-as-judge scoring, to wiring the whole thing into CI so regressions get blocked instead of discovered by users. No new framework to adopt; just the discipline that separates a demo from a system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you can't just &lt;code&gt;assert output == expected&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Traditional tests work because the output space is small and exact. &lt;code&gt;add(2, 2)&lt;/code&gt; is &lt;code&gt;4&lt;/code&gt; or it's a bug. LLM output breaks all three assumptions that make &lt;code&gt;assertEqual&lt;/code&gt; work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's non-deterministic.&lt;/strong&gt; The same prompt can produce different text on two calls. Even at temperature 0 you are not guaranteed byte-identical output across runs or model versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's open-ended.&lt;/strong&gt; "Summarize this ticket" has thousands of correct answers. None of them are string-equal to your reference, and that's fine — a good summary isn't &lt;em&gt;the&lt;/em&gt; summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It fails softly.&lt;/strong&gt; A wrong answer isn't a stack trace. It's a fluent, plausible, well-formatted paragraph that happens to be incorrect. Nothing crashes. Nothing logs an error.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the goal of an eval isn't "is the output identical to the expected string." It's "does the output satisfy the &lt;em&gt;properties&lt;/em&gt; I care about" — is it grounded in the provided context, does it stay on policy, does it actually answer the question, is it valid JSON. You're testing behavior against criteria, not bytes against bytes. Once that clicks, the rest is mechanics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the eval set, not the metric
&lt;/h2&gt;

&lt;p&gt;The instinct is to reach for a fancy metric first. Wrong order. The asset that makes everything else work is a small, representative &lt;strong&gt;eval set&lt;/strong&gt;: a fixed collection of inputs paired with what a good output looks like (or the criteria a good output must meet). This is your golden dataset, your regression suite, your source of truth.&lt;/p&gt;

&lt;p&gt;You do not need thousands of examples to start. &lt;strong&gt;Thirty to fifty well-chosen pairs&lt;/strong&gt; turn LLM tuning from vibes into engineering, because now every change is measured against the same fixed bar. Build the set like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mine real failures.&lt;/strong&gt; Every time the system gets something wrong in dev or prod, that exact input goes into the eval set with a note on what the right behavior is. Your bug reports &lt;em&gt;are&lt;/em&gt; your test cases. This is the highest-signal source you have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cover the categories, not just the happy path.&lt;/strong&gt; Easy questions, ambiguous ones, adversarial ones, out-of-scope ones ("I don't know" is the correct answer and you should test that it says so), and the edge cases specific to your domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freeze it and version it.&lt;/strong&gt; The eval set lives in your repo next to the code. When you add a case, that's a commit. A moving target can't measure progress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a holdout.&lt;/strong&gt; If you start tuning prompts &lt;em&gt;against&lt;/em&gt; the eval set, you'll overfit to it. Keep a slice you don't look at until you think you're done.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal eval set is just data — JSON, a CSV, a Python list. Here's the shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/dataset.py
&lt;/span&gt;&lt;span class="n"&gt;EVAL_SET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund-window-basic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is our refund window?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds are accepted within 14 days of purchase.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_not_say&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no refunds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;out-of-scope&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Cluj tomorrow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds are accepted within 14 days of purchase.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# correct behavior: decline, don't invent
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 30-50 of these, grown from real failures
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the foundation. Everything below scores outputs against this set.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two halves of every LLM eval
&lt;/h2&gt;

&lt;p&gt;Separate two questions that get mushed together when you eval by eyeball, because they have different fixes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Did the system retrieve / set up the right context?&lt;/strong&gt; (a retrieval or pipeline question)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Given that context, did the model produce a good answer?&lt;/strong&gt; (a generation question)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building RAG, the first half is its own discipline — measuring recall@k and precision@k on questions with known relevant documents tells you whether the right chunk even reached the prompt. That's a deep enough topic that it deserves its own treatment; a dedicated &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;course on RAG and retrieval-augmented generation&lt;/a&gt; spends real time there, and the failure modes are different from the ones below. This guide focuses on the second half: scoring the generated answer. The techniques split into two families — &lt;strong&gt;code-based checks&lt;/strong&gt; and &lt;strong&gt;model-based judges&lt;/strong&gt; — and you want both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code-based checks: cheaper and more reliable than you think
&lt;/h2&gt;

&lt;p&gt;Before you reach for an LLM to grade an LLM, a surprising amount of quality is checkable with plain code. These checks are deterministic, free, instant, and never hallucinate. Use them for everything they can cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural validity.&lt;/strong&gt; If the output should be JSON matching a schema, validate it. A response that doesn't parse is a hard failure, no judgment call needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Must-contain / must-not-contain.&lt;/strong&gt; The answer about a 14-day refund window must contain "14" and must not contain "30." Keyword and regex assertions catch a whole class of factual regressions for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format and bounds.&lt;/strong&gt; Length limits, required citations present, no leaked system-prompt text, no forbidden phrases (the "as an AI language model" tax), valid enum values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic similarity.&lt;/strong&gt; For open-ended answers, embed the output and your reference answer and check cosine similarity passes a threshold. It's fuzzy, but it catches "the answer wandered off topic" without needing a judge model.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/checks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_structural&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;schema_keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_must_not_say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;banned&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;banned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule of thumb: &lt;strong&gt;anything a regex or a schema can catch, don't pay a model to catch.&lt;/strong&gt; Reserve the expensive, fuzzy judge for the genuinely subjective stuff.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM-as-judge: powerful, biased, and fixable
&lt;/h2&gt;

&lt;p&gt;For the subjective half — "is this answer faithful to the source?", "is this helpful?", "is the tone right?" — you use a strong model to grade outputs. This is &lt;strong&gt;LLM-as-judge&lt;/strong&gt;, and it scales human-quality judgment to thousands of examples for the price of an API call. Two metrics carry most of the weight for RAG-style apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness / groundedness&lt;/strong&gt; — does every claim in the answer trace back to the provided context, or did the model invent things? This is your hallucination detector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer relevance&lt;/strong&gt; — does the response actually address the question that was asked, or is it a fluent dodge?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The catch: &lt;strong&gt;LLM judges have well-documented biases&lt;/strong&gt;, and if you ignore them your eval numbers are noise dressed up as signal. The big ones, all reported in the research on using models as evaluators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Position bias&lt;/strong&gt; — when comparing two answers, judges favor the one shown first (or in a fixed slot) regardless of quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbosity bias&lt;/strong&gt; — judges tend to rate longer, more elaborate answers higher even when a short answer is more correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-preference&lt;/strong&gt; — a judge model can favor text written in its own style or by its own family.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't abandon the technique; you engineer around the bias:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Score against a rubric, not a vibe.&lt;/strong&gt; Ask for a 1–5 score with explicit criteria for each level, and require the judge to output its reasoning &lt;em&gt;before&lt;/em&gt; the score. A judge forced to justify itself is more consistent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For pairwise comparisons, randomize and swap.&lt;/strong&gt; Run each comparison twice with the order flipped; only count it as a win if the judge picks the same answer both times. This cancels position bias directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calibrate against humans.&lt;/strong&gt; Hand-label 20–30 examples yourself, run the judge on them, and check it agrees with you. If it doesn't, fix the rubric before trusting it on 2,000. An uncalibrated judge is a random number generator with good grammar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a strong model as the judge.&lt;/strong&gt; Grading is harder than answering. Use a current frontier model for the judge even if your app runs on a smaller, cheaper one.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/judge.py — sketch of a rubric-based faithfulness judge
&lt;/span&gt;&lt;span class="n"&gt;JUDGE_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are grading whether an ANSWER is fully supported by the CONTEXT.

CONTEXT:
{context}

ANSWER:
{answer}

Rules:
- A claim is &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supported&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; only if the CONTEXT states or directly implies it.
- Outside knowledge does NOT count as support.

First write one sentence of reasoning. Then output a JSON object:
{{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: true|false}}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;judge_faithfulness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;JUDGE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Designing judges that hold up — picking the rubric, calibrating, knowing when a model is the wrong tool for the grade — is exactly the muscle a &lt;a href="https://cursuri-ai.ro/courses/ai-evals-llm-productie" rel="noopener noreferrer"&gt;course on AI evals in production&lt;/a&gt; builds, because it's the difference between "the new prompt feels better" and "faithfulness went from 0.78 to 0.91 on the holdout."&lt;/p&gt;

&lt;h2&gt;
  
  
  Wire it into CI, or it won't survive contact with deadlines
&lt;/h2&gt;

&lt;p&gt;An eval you run by hand when you remember to is an eval you'll stop running the week things get busy. The whole point is to make regressions &lt;em&gt;impossible to ship silently&lt;/em&gt;, and that means the eval runs automatically on every change to a prompt, a retrieval setting, or a model version.&lt;/p&gt;

&lt;p&gt;The pattern is a regression gate: run the eval set, compute the aggregate score, and &lt;strong&gt;fail the build if the score drops below a threshold&lt;/strong&gt; (or below the last known-good baseline). It looks like an ordinary test suite, because that's what it is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/test_evals.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;evals.dataset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EVAL_SET&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;evals.checks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;check_must_not_say&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;answer_question&lt;/span&gt;

&lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;  &lt;span class="c1"&gt;# 90% of eval cases must pass to ship
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;answer_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;i don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;check_must_not_say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_not_say&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_eval_suite_meets_threshold&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;run_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;EVAL_SET&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;failed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EVAL_SET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Eval score &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few practical notes that keep this sane in CI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pin the model version.&lt;/strong&gt; Provider model IDs update, and an unpinned model means your eval baseline shifts under you for reasons unrelated to your code. Pin it, and treat a model upgrade as its own deliberate eval run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget for cost and flakiness.&lt;/strong&gt; LLM calls cost money and occasionally time out. Cache where you can, run the judge-heavy suite on a schedule rather than every commit if needed, and set a slightly forgiving threshold so one stochastic blip doesn't red-X a good PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log the failures, not just the score.&lt;/strong&gt; When the gate trips, the output should name &lt;em&gt;which&lt;/em&gt; cases regressed so the fix is obvious. A bare "0.86 &amp;lt; 0.90" sends you debugging blind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now a prompt change is a PR with a number attached. The reviewer sees faithfulness went up and refusal rate held steady, or they see it tanked and the build is red. That's the entire difference between hoping and knowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five mistakes that quietly poison your evals
&lt;/h2&gt;

&lt;p&gt;Even teams that build evals often undermine them. Watch for these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Testing only the happy path.&lt;/strong&gt; If every case in your set is a question the system already answers well, your score is a flattering lie. Adversarial and out-of-scope cases are where the signal is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuning on your test set.&lt;/strong&gt; Optimize prompts against the same examples you grade on and you'll overfit to them. Keep a holdout you don't peek at.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An uncalibrated judge.&lt;/strong&gt; Trusting an LLM judge you never checked against your own labels is trusting a number you made up. Calibrate first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One giant blended score.&lt;/strong&gt; A single average hides that faithfulness improved while refusals broke. Track metrics &lt;em&gt;separately&lt;/em&gt; so a regression in one can't be masked by a gain in another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letting the set rot.&lt;/strong&gt; Your product changes; cases that no longer reflect real usage drag the signal down. Prune and grow the set as part of normal work, the same way you maintain any test suite.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these are exotic. They're the eval equivalent of not testing error paths — obvious in hindsight, easy to skip under deadline.&lt;/p&gt;

&lt;h2&gt;
  
  
  How this connects to the rest of your LLM stack
&lt;/h2&gt;

&lt;p&gt;Evals aren't a standalone chore; they're the measurement layer that makes every other improvement legible. When you tighten a prompt, evals tell you if it worked — which is why &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;structured prompt engineering&lt;/a&gt; and a real eval loop are two halves of the same skill. When you redesign what goes into the context window — what to include, what to cut, how to order it — evals are how you know the redesign helped rather than just &lt;em&gt;felt&lt;/em&gt; cleaner; that discipline of deciding what earns a place in the prompt is increasingly called context engineering and has &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;its own dedicated course&lt;/a&gt;. And when you wire up function calling, multi-tool orchestration, and the production concerns of a real integration, evals are what keep the whole pipeline honest as it grows — the kind of end-to-end build covered in a deeper &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;course on advanced LLM integration&lt;/a&gt;. The pattern is always the same: build the measurement first, then every change becomes verifiable instead of hopeful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The teams whose LLM features actually hold up in production aren't using a secret model or a magic prompt. They're disciplined about measurement. They have a versioned eval set grown from real failures, code-based checks for everything a regex can catch, calibrated LLM judges for the subjective rest, and a CI gate that blocks regressions before users find them.&lt;/p&gt;

&lt;p&gt;Start smaller than you think you can. Write thirty cases this afternoon — half of them things your system currently gets &lt;em&gt;wrong&lt;/em&gt; — add three code checks and one rubric-based judge, and put a threshold in your test suite. The first time a red build stops you from shipping a prompt change that would have quietly broken refusals, you'll never go back to vibe-checking. That's the moment an LLM demo becomes an LLM system people can trust.&lt;/p&gt;

&lt;p&gt;The courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/ai-evals-llm-productie" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an AI-learning platform with hands-on, current tracks on evaluating AI systems in production, prompt engineering, RAG, and advanced LLM integration.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources &amp;amp; further reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zheng et al. — &lt;a href="https://arxiv.org/abs/2306.05685" rel="noopener noreferrer"&gt;Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena&lt;/a&gt; (documents position, verbosity, and self-enhancement bias in LLM judges)&lt;/li&gt;
&lt;li&gt;Liu et al. — &lt;a href="https://arxiv.org/abs/2303.16634" rel="noopener noreferrer"&gt;G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Liang et al. — &lt;a href="https://arxiv.org/abs/2211.09110" rel="noopener noreferrer"&gt;Holistic Evaluation of Language Models (HELM)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This article is educational content. Techniques and tooling evolve quickly; validate approaches against your own data and current library documentation.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>Claude Fable 5: A Developer's Guide to Anthropic's New Top</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:22:18 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</link>
      <guid>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</guid>
      <description>&lt;p&gt;Anthropic just moved the ceiling again. &lt;strong&gt;Claude Fable 5&lt;/strong&gt; is the company's most powerful, most intelligent model to date — and it isn't "Opus 4.9." It's a &lt;strong&gt;new tier that sits above the entire Opus family&lt;/strong&gt;. If you build with LLMs, that distinction matters: it changes how you think about model routing, cost, and which tasks deserve your most capable (and most expensive) reasoning.&lt;/p&gt;

&lt;p&gt;This is a practical, no-hype guide for developers. We'll cover what Claude Fable 5 actually is, how it slots into Anthropic's 2026 lineup, what changes in the API surface, when the premium is justified, and how to migrate existing code. Everything here is grounded in Anthropic's own model and API documentation — no invented benchmarks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Claude Fable 5?
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 is Anthropic's flagship reasoning model, exposed through the API as &lt;code&gt;claude-fable-5&lt;/code&gt;. The headline facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A new tier above Opus.&lt;/strong&gt; Until now, "Opus" was the top of the Claude lineup. Fable 5 establishes a level above it — positioned for the hardest reasoning, planning, and long-horizon agentic work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1M-token context window&lt;/strong&gt;, with up to &lt;strong&gt;128K tokens of output&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium pricing&lt;/strong&gt;: roughly &lt;strong&gt;$10 / $50 per million input / output tokens&lt;/strong&gt; — about double Opus 4.8's $5 / $25. That price tag is the whole point: Fable 5 is a precision tool you point at the problems that justify it, not a default for every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive thinking only.&lt;/strong&gt; The fixed "thinking budget" knob is gone. The model decides how much to reason per request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental model to internalize: &lt;strong&gt;Fable 5 is the peak of a four-tier lineup, and capability scales with cost.&lt;/strong&gt; You don't run your whole pipeline on it any more than you'd render every frame of a film at maximum quality regardless of the shot. You route the hard parts to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Fable 5 Fits in the 2026 Anthropic Lineup
&lt;/h2&gt;

&lt;p&gt;Anthropic's current family is a ladder of capability-vs-cost. Picking the right rung per task is one of the highest-leverage habits an AI engineer can build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Reach for it when…&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Absolute peak capability; premium price&lt;/td&gt;
&lt;td&gt;The hardest reasoning, planning, cross-cutting refactors, and long-running agent loops where correctness outweighs cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top of the Opus family; a strong default in Claude Code&lt;/td&gt;
&lt;td&gt;Complex day-to-day work — planning, large refactors, tricky debugging — with a better capability/cost ratio than Fable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balanced, fast, 1M context&lt;/td&gt;
&lt;td&gt;The bulk of everyday coding, reading, and iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Haiku 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Light, fast, cheap&lt;/td&gt;
&lt;td&gt;High-volume small operations, classification, auxiliary steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;model choice is a cost-and-quality lever.&lt;/strong&gt; A well-designed system routes each sub-task to the cheapest model that can do it well, and escalates to Fable 5 only where the payoff is real. If you want a structured, side-by-side breakdown of the 2026 models and how to choose between them, there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI model comparison course&lt;/a&gt; that goes deeper than any single table can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes in the API
&lt;/h2&gt;

&lt;p&gt;This is the part developers actually care about. Fable 5 shares the modern Claude request surface (the same one introduced with Opus 4.7/4.8), with a couple of sharp edges worth knowing before you ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adaptive thinking, not a token budget
&lt;/h3&gt;

&lt;p&gt;Fable 5 supports a single thinking mode: &lt;strong&gt;adaptive&lt;/strong&gt;. You no longer pass a fixed &lt;code&gt;budget_tokens&lt;/code&gt; value — the model regulates its own reasoning depth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;        &lt;span class="c1"&gt;# adaptive is the only thinking mode
&lt;/span&gt;    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;# strong default for coding/agentic work
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this module and add unit tests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things that will save you a debugging session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, or &lt;code&gt;budget_tokens&lt;/code&gt;.&lt;/strong&gt; They're removed on this generation and return &lt;code&gt;400&lt;/code&gt;. Steer behavior with prompting and the effort parameter instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt; on Fable 5.&lt;/strong&gt; Unlike Opus 4.8/4.7, an explicit &lt;code&gt;disabled&lt;/code&gt; returns &lt;code&gt;400&lt;/code&gt; here. To run without thinking, &lt;strong&gt;omit the &lt;code&gt;thinking&lt;/code&gt; parameter entirely&lt;/strong&gt;. This is the one genuinely new breaking change relative to the Opus 4.x line — easy to miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking text is omitted by default.&lt;/strong&gt; Thinking blocks still stream, but their content is empty unless you opt in with &lt;code&gt;thinking={"type": "adaptive", "display": "summarized"}&lt;/code&gt;. If your UI shows reasoning progress, set this or your users will see a long pause before output.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The effort parameter is your real control knob
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;output_config.effort&lt;/code&gt; accepts &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;, and &lt;code&gt;max&lt;/code&gt;. It controls how much the model thinks &lt;em&gt;and&lt;/em&gt; acts — not just thinking depth. For coding and agentic workloads, &lt;strong&gt;&lt;code&gt;xhigh&lt;/code&gt; is the sweet spot&lt;/strong&gt; and is the effort level Claude Code defaults to. Treat effort as something to tune per route: &lt;code&gt;max&lt;/code&gt; for correctness-critical work, &lt;code&gt;medium&lt;/code&gt;/&lt;code&gt;low&lt;/code&gt; for latency-sensitive or simple steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large outputs need streaming
&lt;/h3&gt;

&lt;p&gt;With up to 128K output tokens available, non-streaming requests will hit SDK HTTP timeouts well before that ceiling. For anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;, stream and collect the final message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate the full migration plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_final_message&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What it still supports
&lt;/h3&gt;

&lt;p&gt;Fable 5 keeps the modern toolbox: &lt;strong&gt;structured outputs&lt;/strong&gt; (&lt;code&gt;output_config.format&lt;/code&gt;), &lt;strong&gt;prompt caching&lt;/strong&gt; (minimum cacheable prefix ~2,048 tokens), &lt;strong&gt;server-side compaction&lt;/strong&gt; for very long conversations, &lt;strong&gt;web search with dynamic filtering&lt;/strong&gt;, and &lt;strong&gt;task budgets&lt;/strong&gt; (beta) for telling an agent how many tokens it has for a full loop. If you're wiring these into a real application, the patterns matter as much as the model — that's the focus of this hands-on course on &lt;a href="https://cursuri-ai.ro/courses/construire-aplicatii-ai-python-sdk" rel="noopener noreferrer"&gt;building AI apps with the Anthropic and OpenAI SDKs&lt;/a&gt;, which walks from raw API calls to a production-shaped product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fable 5 for Agentic Coding
&lt;/h2&gt;

&lt;p&gt;The reason Fable 5 is interesting to developers specifically is long-horizon agentic execution: multi-file refactors, overnight runs, and tasks that span dozens of tool calls without a human correcting course.&lt;/p&gt;

&lt;p&gt;Three habits get the most out of it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Give the full task spec up front in one well-formed turn.&lt;/strong&gt; Fable 5 plans better when it has the complete goal early; drip-feeding requirements across many turns tends to cost more tokens and sometimes performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run at high or &lt;code&gt;xhigh&lt;/code&gt; effort with generous &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/strong&gt; Long-horizon coherence comes partly from the model reasoning more at each step — give it room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route deliberately.&lt;/strong&gt; Use Fable 5 for the planning and the genuinely hard edits; delegate mechanical or high-volume sub-steps to Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If terminal-first agentic coding is your world, the workflow discipline — &lt;code&gt;CLAUDE.md&lt;/code&gt; project memory, plan/edit/review loops, hooks as deterministic guardrails, and model routing across the lineup — is exactly what a dedicated &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code mastery course&lt;/a&gt; covers end to end. Agent architecture beyond a single tool (orchestration, delegation, parallelism) is its own discipline, well covered in this &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;course on designing autonomous AI agents&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context is a resource, even at 1M tokens
&lt;/h3&gt;

&lt;p&gt;A 1M-token window is not a license to dump everything into context. Irrelevant context dilutes the model's attention and costs tokens on every turn, no matter how capable the model is. The skill that separates engineers who "get lucky" with agents from those who ship reliable ones is deliberate &lt;strong&gt;context engineering&lt;/strong&gt; — what to load, what to compact, what to persist as memory across sessions. It's enough of a topic to warrant &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;its own course on context engineering and memory for agents&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Fable 5 Is Actually Worth the Premium
&lt;/h2&gt;

&lt;p&gt;Here's the honest cost reasoning, because "use the best model" is bad engineering advice.&lt;/p&gt;

&lt;p&gt;At roughly &lt;strong&gt;double the per-token cost of Opus 4.8&lt;/strong&gt;, Fable 5 pays off when the &lt;em&gt;cost of a wrong answer&lt;/em&gt; is high relative to the token bill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worth it:&lt;/strong&gt; a complex cross-service refactor where a subtle regression costs hours of human review; a planning step that determines the trajectory of a long agent run; an analysis where correctness is non-negotiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not worth it:&lt;/strong&gt; routine edits, summaries, classifications, and the long tail of mechanical sub-tasks — those belong on Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful rule of thumb: let &lt;strong&gt;Fable 5 plan and decide&lt;/strong&gt;, and let cheaper models &lt;strong&gt;execute&lt;/strong&gt; the parts that are already well-specified. That keeps your bill proportional to difficulty instead of flat-out maximal.&lt;/p&gt;

&lt;p&gt;The other lever is effort. Because effort matters more on this generation than on any prior Opus, a Fable 5 call at &lt;code&gt;medium&lt;/code&gt; effort can be both cheaper and faster than an Opus 4.8 call at &lt;code&gt;xhigh&lt;/code&gt; for some tasks — so benchmark on your own workload rather than assuming "bigger model = always slower and pricier in practice."&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating from Opus 4.8 / 4.7
&lt;/h2&gt;

&lt;p&gt;If you're already on the modern Claude surface, moving to Fable 5 is mostly a model-ID swap plus a couple of checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swap the model string&lt;/strong&gt; to &lt;code&gt;claude-fable-5&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove &lt;code&gt;budget_tokens&lt;/code&gt;&lt;/strong&gt; if any remain → use &lt;code&gt;thinking={"type": "adaptive"}&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strip &lt;code&gt;temperature&lt;/code&gt; / &lt;code&gt;top_p&lt;/code&gt; / &lt;code&gt;top_k&lt;/code&gt;&lt;/strong&gt; — they &lt;code&gt;400&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace last-assistant-turn prefills&lt;/strong&gt; with structured outputs (&lt;code&gt;output_config.format&lt;/code&gt;) or a system-prompt instruction — prefills &lt;code&gt;400&lt;/code&gt; on this generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit for &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt;&lt;/strong&gt; — it &lt;code&gt;400&lt;/code&gt;s on Fable 5. Omit &lt;code&gt;thinking&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-tune &lt;code&gt;effort&lt;/code&gt; per route&lt;/strong&gt; — start at &lt;code&gt;high&lt;/code&gt;, use &lt;code&gt;xhigh&lt;/code&gt; for coding/agentic, reserve &lt;code&gt;max&lt;/code&gt; for correctness-critical work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;display: "summarized"&lt;/code&gt;&lt;/strong&gt; if you surface reasoning in a UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Steering this generation is done through prompting and effort rather than sampling parameters, so the quality of your instructions matters more than ever. If your prompts were tuned years ago for older models, they're probably leaving capability on the table — a structured refresh of &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;prompt engineering fundamentals&lt;/a&gt; tends to pay for itself quickly on a model this capable.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on Hype vs. Reality
&lt;/h2&gt;

&lt;p&gt;Two guardrails worth keeping as the launch noise settles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fable 5 is the most capable model — not necessarily the default everywhere.&lt;/strong&gt; In Claude Code, for instance, Opus 4.8 remains a strong default; Fable 5 is the tier you select for the hardest work. "Most capable" and "default" are different claims.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version hygiene matters.&lt;/strong&gt; Fable 5 is the current peak, Opus 4.8 is the top of the Opus family, and Opus 4.7 is the previous Opus generation. Anything from the Claude 3.x line (or GPT-4-class / Gemini 2.x models) is outdated and shouldn't be treated as current when you're evaluating tutorials or benchmarks. Always confirm model IDs, limits, and pricing against the official docs, since they shift between releases.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  TL;DR Cheat Sheet
&lt;/h2&gt;

&lt;p&gt;For quick reference when you wire Claude Fable 5 into a real codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model ID:&lt;/strong&gt; &lt;code&gt;claude-fable-5&lt;/code&gt;. Context window 1M tokens, output up to 128K.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking:&lt;/strong&gt; &lt;code&gt;{"type": "adaptive"}&lt;/code&gt; is the only mode. To run without it, &lt;strong&gt;omit the parameter&lt;/strong&gt; — never send &lt;code&gt;{"type": "disabled"}&lt;/code&gt; (it returns &lt;code&gt;400&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effort:&lt;/strong&gt; &lt;code&gt;output_config.effort&lt;/code&gt; is your main control — &lt;code&gt;xhigh&lt;/code&gt; for coding and agents, &lt;code&gt;max&lt;/code&gt; when correctness is critical, &lt;code&gt;low&lt;/code&gt;/&lt;code&gt;medium&lt;/code&gt; for simple or latency-sensitive steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Removed (all &lt;code&gt;400&lt;/code&gt; if sent):&lt;/strong&gt; &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, &lt;code&gt;budget_tokens&lt;/code&gt;, and last-assistant-turn prefills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning in your UI:&lt;/strong&gt; add &lt;code&gt;"display": "summarized"&lt;/code&gt; to the thinking config, or the thinking text comes back empty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large outputs:&lt;/strong&gt; stream anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; send the hard reasoning to Fable 5; keep routine and high-volume work on Sonnet 4.6 and Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt; isn't just a bigger Opus — it's a new top tier that reframes how you should think about model routing in 2026. The winning pattern is the same as it's always been, just sharper: use the most capable model where correctness compounds, push everything else down the ladder to cheaper models, and tune effort per route. Master that, and Fable 5 becomes a precision instrument rather than a line item that surprises you on the invoice.&lt;/p&gt;

&lt;p&gt;If you want to go from "I read about it" to "I ship with it," the courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, a Romanian AI-learning platform with deep, hands-on tracks on Claude Code, agent architecture, the Anthropic SDK, context engineering, and model selection — all kept current with the 2026 lineup, Fable 5 included.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? Save it, and drop your Fable 5 routing strategy in the comments — what are you sending to the top tier, and what stays on Sonnet?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Prompt Caching with Claude: How We Cut AI API Costs by 90% in Production (2026 Guide)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 01 Jun 2026 09:02:05 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</link>
      <guid>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Anthropic's prompt caching gives you a &lt;strong&gt;90% discount&lt;/strong&gt; on cached input tokens and up to &lt;strong&gt;85% lower latency&lt;/strong&gt; on long-context calls. But the wins only show up if you understand cache breakpoints, TTLs, and what actually invalidates the cache. This guide walks through 5 production patterns we use, real benchmarks, and the pitfalls that silently kill your hit rate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost problem nobody warns you about
&lt;/h2&gt;

&lt;p&gt;When you ship anything serious with Claude — an agent, a RAG system, a code assistant, a customer support bot — you discover the same uncomfortable truth: &lt;strong&gt;your input token bill dwarfs your output bill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A typical agent loop looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt: ~3,000 tokens (instructions, persona, constraints)&lt;/li&gt;
&lt;li&gt;Tool definitions: ~4,000 tokens (JSON schemas for 10–20 tools)&lt;/li&gt;
&lt;li&gt;Conversation history: 5,000–50,000 tokens (grows every turn)&lt;/li&gt;
&lt;li&gt;RAG context: 5,000–20,000 tokens per query&lt;/li&gt;
&lt;li&gt;User message: ~200 tokens&lt;/li&gt;
&lt;li&gt;Model output: ~500 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every single turn, you re-send the same system prompt, the same tool definitions, and most of the conversation history. On Claude Sonnet 4.6 at $3 per million input tokens, a 15,000-token prefix sent across 20 conversation turns costs you &lt;strong&gt;$0.90 per conversation in input alone&lt;/strong&gt; — before you've generated a single useful token of output.&lt;/p&gt;

&lt;p&gt;Multiply that by 10,000 daily active users and you're burning &lt;strong&gt;$9,000/day&lt;/strong&gt; just to re-tokenize content you already sent.&lt;/p&gt;

&lt;p&gt;This is exactly what prompt caching fixes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude's prompt caching actually does
&lt;/h2&gt;

&lt;p&gt;Anthropic's prompt caching lets the API store the internal state for a prefix of your prompt and reuse it on subsequent requests. Two numbers matter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Pricing relative to base input&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache write&lt;/strong&gt; (first time a prefix is seen)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1.25×&lt;/strong&gt; base input cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache read&lt;/strong&gt; (subsequent hits)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.10×&lt;/strong&gt; base input cost (90% off)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You pay a small one-time premium to write the cache, then every hit after that is 10% of the normal price. The break-even point is &lt;strong&gt;after the second request&lt;/strong&gt; — anything more than one read and you're saving money.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mental model
&lt;/h3&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;prefix tree&lt;/strong&gt; with checkpoints. You mark up to 4 points in your prompt with &lt;code&gt;cache_control&lt;/code&gt;, and Claude caches everything from the start of the prompt up to each breakpoint. On the next request, if the prefix matches &lt;strong&gt;byte-for-byte&lt;/strong&gt;, you get a cache hit.&lt;/p&gt;

&lt;p&gt;The order Claude processes the prompt is fixed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools → system → messages (oldest → newest)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your cache breakpoints must respect that order. You cannot cache a later block without caching everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TTL trap
&lt;/h3&gt;

&lt;p&gt;The default cache TTL is &lt;strong&gt;5 minutes&lt;/strong&gt;, refreshed on every read. A 1-hour TTL is available as a premium option (costs more on write, same on read). Most teams over-pay for the 1-hour cache when 5 minutes would have served them fine — if your traffic is steady, every request refreshes the TTL and the cache effectively lives forever.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to go deeper on Claude's API mechanics in production? Prompt caching, tool use, batch API, streaming, and cost optimization are covered in depth in the &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration course on Cursuri-AI.ro&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 1: Cache the system prompt and tool definitions
&lt;/h2&gt;

&lt;p&gt;This is the highest-ROI change you can make, and most codebases get it wrong on the first try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong&lt;/strong&gt; (no caching):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Right&lt;/strong&gt; (cached):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# ... more tools ...
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache breakpoint on the last tool
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the system block&lt;/strong&gt; caches everything up through the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the last tool&lt;/strong&gt; caches everything through the tool definitions — this is critical because tools are evaluated &lt;em&gt;before&lt;/em&gt; system per the processing order above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Wait — that's actually wrong as stated. Let me correct: because the order is &lt;code&gt;tools → system → messages&lt;/code&gt;, putting &lt;code&gt;cache_control&lt;/code&gt; on the &lt;strong&gt;last tool&lt;/strong&gt; caches just the tools, and putting it on &lt;strong&gt;system&lt;/strong&gt; caches tools + system. You typically only need the system breakpoint; it covers everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading the response
&lt;/h3&gt;

&lt;p&gt;The API returns cache stats in &lt;code&gt;response.usage&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_creation_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# tokens written to cache (1.25x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_read_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# tokens read from cache (0.10x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# uncached tokens (1x cost)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the first request: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is high, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is 0.&lt;br&gt;
On every subsequent request within 5 minutes: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is 0, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is high. That's the win condition.&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 2: Cache conversation history with rolling breakpoints
&lt;/h2&gt;

&lt;p&gt;In a multi-turn agent, the conversation grows on every turn. If you only cache the system prompt, you're still re-sending and re-billing every prior turn at full price.&lt;/p&gt;

&lt;p&gt;The trick is to add a &lt;strong&gt;second cache breakpoint&lt;/strong&gt; on the most recent assistant message, so the entire conversation up to that point is cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_messages_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    history: list of {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: ...}
    new_user_message: str
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Add cache breakpoint on the last historical message
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every new turn reads the entire prior conversation from cache. Cost per turn becomes nearly constant instead of growing linearly with conversation length.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4-breakpoint budget
&lt;/h3&gt;

&lt;p&gt;Claude allows up to &lt;strong&gt;4 cache breakpoints&lt;/strong&gt; per request. A common production layout uses all four:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 1&lt;/strong&gt;: end of tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 2&lt;/strong&gt;: end of system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 3&lt;/strong&gt;: end of "stable" conversation history (turns 1 through N-2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 4&lt;/strong&gt;: end of "recent" history (turn N-1)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you a layered cache: tools rarely change, system rarely changes, old history never changes, recent history is sliding. Each layer hits or misses independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Cache few-shot examples separately from the user query
&lt;/h2&gt;

&lt;p&gt;Few-shot prompting is one of the highest-leverage techniques in production LLM apps — and one of the most expensive if you don't cache. A typical few-shot block with 5–10 examples can run 8,000–15,000 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Example 1:
Input: ...
Output: ...

Example 2:
Input: ...
Output: ...

[... 8 more examples ...]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a classifier. Categorize support tickets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache the examples
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_ticket&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical rule: &lt;strong&gt;put the variable content last&lt;/strong&gt;. Cache only works on prefix matches. If your user-specific data is in the middle of the prompt, everything after it becomes uncacheable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: RAG with cached document chunks
&lt;/h2&gt;

&lt;p&gt;RAG systems are notorious for blowing up token bills because the retrieved context is large and unique per query. You can't cache the retrieved chunks themselves (they change), but you &lt;em&gt;can&lt;/em&gt; cache the surrounding framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ~2000 tokens, stable
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For RAG with a stable knowledge base (corporate docs, product manuals, codebases), there's a more advanced pattern: &lt;strong&gt;pre-tile your documents into fixed-size cacheable blocks&lt;/strong&gt; and choose your retrieval strategy to favor returning whole blocks rather than slices. You trade some retrieval precision for massive cost savings on hot documents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you build RAG systems for production, the &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation) course on Cursuri-AI.ro&lt;/a&gt; covers caching strategies, retrieval optimization, hybrid search, and eval pipelines end-to-end.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 5: Cache tool results in long-running agents
&lt;/h2&gt;

&lt;p&gt;Agent loops are caching's sweet spot. An agent runs &lt;code&gt;tool_call → tool_result → tool_call → tool_result&lt;/code&gt; cycles, and each iteration the prompt grows by the new tool result. Without caching, you re-bill the entire history every iteration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add cache breakpoint to the latest message
&lt;/span&gt;        &lt;span class="n"&gt;cached_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}],&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

        &lt;span class="c1"&gt;# Append assistant turn + tool results, loop
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a 15-step agent run with a 4,000-token system prompt and 8,000-token tools, this pattern cuts input cost by &lt;strong&gt;~80–88%&lt;/strong&gt; versus uncached.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent loops, tool design, multi-step planning and cost modeling are the focus of the &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation course on Cursuri-AI.ro&lt;/a&gt; — built around the same Claude Agent SDK patterns shown here.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Real benchmarks: before vs after
&lt;/h2&gt;

&lt;p&gt;These numbers are from a production code-review agent running on Claude Sonnet 4.6, averaged over 1,000 conversations of 12 turns each.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Uncached&lt;/th&gt;
&lt;th&gt;Cached&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg input tokens per turn&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg billed input cost per turn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0552&lt;/td&gt;
&lt;td&gt;$0.0061&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−89%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg time-to-first-token&lt;/td&gt;
&lt;td&gt;1,840 ms&lt;/td&gt;
&lt;td&gt;380 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg total cost per 12-turn conversation&lt;/td&gt;
&lt;td&gt;$0.66&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate (warm)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;96.3%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency win surprised us as much as the cost win. Cache reads skip the prompt processing phase entirely, which dominates time-to-first-token for long contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pitfalls that silently kill your hit rate
&lt;/h2&gt;

&lt;p&gt;These are mistakes we've made or seen in production code reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Whitespace and formatting drift
&lt;/h3&gt;

&lt;p&gt;Cache hits require &lt;strong&gt;byte-exact prefix matches&lt;/strong&gt;. If your system prompt is built with f-strings and you add a timestamp, conditional newline, or trailing space, you invalidate the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BREAKS the cache every minute
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Current time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Works
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Pass time as a separate user message field if needed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Audit your prompts for hidden variability: locale-formatted numbers, dict iteration order in older Pythons, tool definitions where field order changes between deploys.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Reordering tool definitions
&lt;/h3&gt;

&lt;p&gt;If you generate tool schemas from a dict and the dict iteration order changes between runs, your cache evaporates. &lt;strong&gt;Always sort tool definitions&lt;/strong&gt; before sending:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_tools&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Wrong breakpoint placement
&lt;/h3&gt;

&lt;p&gt;Breakpoints must come &lt;strong&gt;after&lt;/strong&gt; the content you want to cache, not before. The breakpoint marks "cache everything up to here." Putting it on the user message instead of the system prompt is a common rookie mistake.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Caching tiny prefixes
&lt;/h3&gt;

&lt;p&gt;There's a minimum cacheable size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet &amp;amp; Opus&lt;/strong&gt;: 1,024 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt;: 2,048 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below the minimum, the &lt;code&gt;cache_control&lt;/code&gt; is silently ignored — the API doesn't error, it just doesn't cache. Always check &lt;code&gt;response.usage.cache_creation_input_tokens &amp;gt; 0&lt;/code&gt; on your first request to confirm the cache actually wrote.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ignoring the 5-minute TTL on bursty traffic
&lt;/h3&gt;

&lt;p&gt;If your traffic is bursty — heavy during business hours, dead overnight — the 5-minute cache will expire between sessions and you'll pay the write premium every time. For bursty patterns, either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the 1-hour TTL (more expensive write, same read price)&lt;/li&gt;
&lt;li&gt;Or send a small "keep-alive" request every 4 minutes during expected idle windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Mixing cached and uncached models
&lt;/h3&gt;

&lt;p&gt;Cache is &lt;strong&gt;model-specific&lt;/strong&gt;. If your code falls back from Sonnet 4.6 to Haiku 4.5 on rate limit, the Haiku call has no cache history. Either keep fallback paths uncached, or build separate caches per model.&lt;/p&gt;




&lt;h2&gt;
  
  
  When NOT to use prompt caching
&lt;/h2&gt;

&lt;p&gt;Caching has overhead. Skip it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-shot calls with no shared prefix&lt;/strong&gt; — single-request classification, one-off summarization. The 1.25× write premium is pure loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-variability prompts&lt;/strong&gt; — if each request has different boilerplate, you're paying write premium for nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts below the minimum&lt;/strong&gt; — short prompts can't be cached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost is already negligible&lt;/strong&gt; — if you spend $20/month on the API, the engineering time to optimize caching costs more than the savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful heuristic: &lt;strong&gt;if your stable prefix is ≥2,000 tokens AND you make ≥3 requests per 5-minute window with that prefix, cache it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting it together: a production checklist
&lt;/h2&gt;

&lt;p&gt;Before you ship a Claude integration in 2026, run this list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] System prompt has &lt;code&gt;cache_control&lt;/code&gt; set&lt;/li&gt;
&lt;li&gt;[ ] Tool definitions are sorted and stable&lt;/li&gt;
&lt;li&gt;[ ] User-variable content is at the end of the prompt, not in the middle&lt;/li&gt;
&lt;li&gt;[ ] Cache stats (&lt;code&gt;cache_read_input_tokens&lt;/code&gt;) are logged and dashboarded&lt;/li&gt;
&lt;li&gt;[ ] Cache hit rate is monitored — alert if it drops below 80%&lt;/li&gt;
&lt;li&gt;[ ] No timestamps, request IDs, or random data injected into cached blocks&lt;/li&gt;
&lt;li&gt;[ ] First-request cache write is verified in tests&lt;/li&gt;
&lt;li&gt;[ ] Fallback model paths handle cache absence cleanly&lt;/li&gt;
&lt;li&gt;[ ] 5-minute vs 1-hour TTL choice is documented with reasoning&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Prompt caching is the single highest-leverage cost optimization for Claude in production. The mechanics are simple, but the gotchas — formatting drift, reorder bugs, minimum sizes, TTL mismatches — are where teams leave money on the table.&lt;/p&gt;

&lt;p&gt;If you treat caching as a first-class concern from day one, you ship AI features that are 5–10× cheaper to operate than the naive implementation. If you bolt it on later, you spend weeks chasing cache misses through your logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to go deeper
&lt;/h3&gt;

&lt;p&gt;I write about production AI engineering — Claude API, multi-agent systems, RAG, cost optimization — on &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an interactive learning platform with an always-available AI tutor that walks you through every concept and reviews your code. The four courses most relevant to what's in this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration&lt;/a&gt;&lt;/strong&gt; — Claude API in production: prompt caching, tool use, batch API, streaming, error handling, retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;&lt;/strong&gt; — structured prompting, few-shot patterns, evaluation, prompt versioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation&lt;/a&gt;&lt;/strong&gt; — agent loops, tool design, multi-agent orchestration, cost modeling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt;&lt;/strong&gt; — retrieval, embeddings, hybrid search, caching, eval pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Course content is delivered in Romanian (the platform's primary audience), but the code, frameworks, and patterns are language-agnostic — the IT Pro track is built specifically for engineers shipping AI in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your cache hit rate in production?&lt;/strong&gt; Drop a comment with your setup — I'm collecting patterns for a follow-up post on &lt;strong&gt;caching at the multi-tenant scale&lt;/strong&gt; (per-customer cache namespaces, cache warm-up strategies, and the cost model when you have 10,000+ concurrent users).&lt;/p&gt;

&lt;p&gt;If this helped, a ❤️ or a 🦄 keeps it visible for other devs hitting the same cost wall. Follow for more deep-dives on Claude in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic's official prompt caching docs: &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;docs.anthropic.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude API pricing: &lt;a href="https://www.anthropic.com/pricing" rel="noopener noreferrer"&gt;anthropic.com/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full IT Pro AI engineering catalog: &lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro/courses&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI for Influencers in 2026: How to Build a Content Engine That Runs Itself</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 19 May 2026 13:34:41 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</link>
      <guid>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</guid>
      <description>&lt;p&gt;The influencer economy is no longer about who posts the most. It's about who has built the smartest &lt;strong&gt;AI content system&lt;/strong&gt; behind the scenes.&lt;/p&gt;

&lt;p&gt;In 2026, the top 1% of creators aren't outworking everyone else. They're out-engineering them. They've turned what used to be a 60-hour-a-week grind into a streamlined pipeline where AI handles 80% of the production work — and they keep 100% of the creative direction.&lt;/p&gt;

&lt;p&gt;Over the past two years, working with hundreds of creators and educators through &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — Eastern Europe's leading AI education platform — I've watched this shift happen in real time. The patterns are consistent, the playbook is replicable, and the gap between those who adopt it and those who don't is widening every month.&lt;/p&gt;

&lt;p&gt;This article breaks down exactly how it works, what tools they use, and how you can build the same stack — whether you're an influencer who codes, or a developer building tools for creators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Changed the Influencer Game (Permanently)
&lt;/h2&gt;

&lt;p&gt;Three years ago, an influencer's competitive advantage was personality plus consistency. Today, that's table stakes.&lt;/p&gt;

&lt;p&gt;The real moat now is &lt;strong&gt;operational leverage&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How fast can you identify a trending topic?&lt;/li&gt;
&lt;li&gt;How quickly can you produce content across 5+ formats?&lt;/li&gt;
&lt;li&gt;How precisely can you target each piece to its platform?&lt;/li&gt;
&lt;li&gt;How much of this can run without your direct involvement?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators who answered "all of it, mostly automated" are the ones scaling past 1M followers, 7-figure revenues, and 50+ pieces of content per week — solo or with tiny teams.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. It's already happening. The question is whether you're building the system or watching others build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5-Layer AI Stack for Modern Influencers
&lt;/h2&gt;

&lt;p&gt;Every high-output creator I've analyzed runs some version of this five-layer architecture. The tools change. The structure doesn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Intelligence (Research &amp;amp; Trend Detection)
&lt;/h3&gt;

&lt;p&gt;Before you create, you need to know what to create.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors trending topics, keywords, and conversations in your niche&lt;/li&gt;
&lt;li&gt;Analyzes competitor content performance&lt;/li&gt;
&lt;li&gt;Identifies content gaps and opportunities&lt;/li&gt;
&lt;li&gt;Surfaces audience questions before they become saturated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools and APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity API&lt;/strong&gt; — for real-time research with citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa AI&lt;/strong&gt; — semantic search for niche topics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Trends API&lt;/strong&gt; + &lt;strong&gt;YouTube Data API&lt;/strong&gt; — for trend signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit API&lt;/strong&gt; + &lt;strong&gt;Twitter/X API&lt;/strong&gt; — for audience listening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BuzzSumo&lt;/strong&gt; or &lt;strong&gt;SparkToro&lt;/strong&gt; — for content gap analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Don't just track what's popular. Track what's &lt;em&gt;about to&lt;/em&gt; become popular by monitoring signal velocity (rate of change), not absolute volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Ideation (Concept &amp;amp; Angle Generation)
&lt;/h3&gt;

&lt;p&gt;This is where most creators waste the most time — staring at a blank page deciding what to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI does well here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates 30+ angle variations from a single topic&lt;/li&gt;
&lt;li&gt;Adapts ideas to your specific voice and audience&lt;/li&gt;
&lt;li&gt;Identifies counterintuitive takes that drive engagement&lt;/li&gt;
&lt;li&gt;Maps ideas to platform-specific formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended approach:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a custom GPT or Claude project trained on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your past top-performing content (with metrics)&lt;/li&gt;
&lt;li&gt;Your audience persona and voice guidelines&lt;/li&gt;
&lt;li&gt;Your content pillars and forbidden topics&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If you've never structured a voice profile before, this is one of the highest-leverage skills you can develop. We dedicate an entire module to it inside &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI for Content Creators track on Cursuri-AI.ro&lt;/a&gt; — including the exact prompts and templates we use internally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then prompt it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a content strategist for an influencer with this profile:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Generate angles that are specific, counterintuitive, and aligned with their voice.
        Avoid generic takes. Each angle should be testable as a hook.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; distinct angles for content about: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;angles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;building a personal brand in 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, data-driven, contrarian, B2B-focused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angles&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this single function call can fuel a month of content. Cost: ~$0.15.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Production (Multi-Format Content Generation)
&lt;/h3&gt;

&lt;p&gt;This is the heaviest-lifting layer — and where AI compounds value most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The repurposing principle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One "pillar" piece (a long-form video, podcast, or article) should generate 10–15 derivative pieces with minimal manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample workflow for a 30-minute podcast episode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transcription&lt;/strong&gt; → Whisper API or AssemblyAI ($0.36 for 30 min)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-form blog post&lt;/strong&gt; → Claude/GPT generates structured article from transcript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn carousel&lt;/strong&gt; → 8–10 slide deck with key insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twitter/X thread&lt;/strong&gt; → 10-tweet thread with the strongest takes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-form clips&lt;/strong&gt; → Opus Clip or Riverside AI extracts viral moments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Newsletter&lt;/strong&gt; → Personalized summary with commentary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Shorts&lt;/strong&gt; → Auto-captioned vertical clips&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quote graphics&lt;/strong&gt; → Designed via Canva API or Bannerbear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instagram Reels&lt;/strong&gt; → Repurposed clips with platform-native captions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO blog series&lt;/strong&gt; → 3–5 articles targeting specific search queries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total human time: 1–2 hours of review and approval, instead of 30+ hours of production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Distribution (Platform-Native Publishing)
&lt;/h3&gt;

&lt;p&gt;Most creators lose performance here by posting the same content identically across platforms. AI fixes this by adapting each piece to the platform's native expectations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive distribution looks like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn → Professional tone, longer-form, hook in first 2 lines&lt;/li&gt;
&lt;li&gt;Twitter/X → Punchy, opinionated, thread-friendly&lt;/li&gt;
&lt;li&gt;Instagram → Visual-first, emotion-driven captions&lt;/li&gt;
&lt;li&gt;TikTok → Hook in 1 second, vertical, trend-aware&lt;/li&gt;
&lt;li&gt;YouTube → SEO-optimized titles, timestamps, structured descriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Buffer&lt;/strong&gt;, &lt;strong&gt;Hypefury&lt;/strong&gt;, or &lt;strong&gt;Typefully&lt;/strong&gt; — scheduling with AI optimization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make&lt;/strong&gt; or &lt;strong&gt;n8n&lt;/strong&gt; — custom automation workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postiz&lt;/strong&gt; (open source) — self-hosted social scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 5: Optimization (Performance Feedback Loop)
&lt;/h3&gt;

&lt;p&gt;This is the layer most creators skip — and it's the one that compounds the hardest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hook performance (which first lines drive scroll-stops?)&lt;/li&gt;
&lt;li&gt;Format performance (which content types convert best per platform?)&lt;/li&gt;
&lt;li&gt;Topic performance (which themes consistently win?)&lt;/li&gt;
&lt;li&gt;Audience signals (which content brings in your ICP vs. tourists?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How AI helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes patterns across hundreds of posts in seconds&lt;/li&gt;
&lt;li&gt;Identifies non-obvious performance correlations&lt;/li&gt;
&lt;li&gt;Suggests next-week content based on last week's winners&lt;/li&gt;
&lt;li&gt;Drafts variations of top performers for retesting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build a simple dashboard that ingests your analytics from each platform and feeds it back to your ideation layer. This closes the loop — every post makes the next one smarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimal Working Example: Content Repurposing Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's a stripped-down Python pipeline that takes a transcript and produces three platform-adapted outputs. Useful as a starting point you can extend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate LinkedIn post, Twitter thread, and newsletter from a transcript.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an expert content strategist. The creator&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s voice is: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    From the transcript below, produce THREE outputs in JSON:
    1. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 200-word LinkedIn post with strong hook
    2. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 8-tweet thread (array of strings, max 280 chars each)
    3. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 400-word personal newsletter section

    Each must feel platform-native, not copy-pasted.

    Transcript:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Return only valid JSON.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sample_transcript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;[Your podcast/video transcript here]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;voice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, contrarian, B2B-focused, data-driven&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== LINKEDIN ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== TWITTER THREAD ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== NEWSLETTER ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extend this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whisper for audio-to-text input&lt;/li&gt;
&lt;li&gt;A queue system (Redis + Celery) for batch processing&lt;/li&gt;
&lt;li&gt;A simple Streamlit UI for non-technical creator team members&lt;/li&gt;
&lt;li&gt;Webhook integration with Buffer or Typefully for direct publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The 5 Mistakes That Kill AI Content Pipelines
&lt;/h2&gt;

&lt;p&gt;I've audited dozens of creator AI workflows. The same mistakes appear over and over.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Treating AI as a Writer Instead of a Drafter
&lt;/h3&gt;

&lt;p&gt;AI-generated text published without human editing is detectable, generic, and erodes trust. Use AI for the first 80%, but always edit the final 20% — that's where your voice lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Skipping the Voice Calibration Step
&lt;/h3&gt;

&lt;p&gt;Without a documented voice profile (tone, vocabulary, forbidden phrases, examples), every output regresses to the mean. Spend 4 hours documenting your voice once. It pays back for years. If you want a structured framework for this, we walk through the full process in &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI workflow courses&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Building Without Measurement
&lt;/h3&gt;

&lt;p&gt;Pipelines without analytics are vibes-based content factories. If you can't tell which output formats win, you're optimizing blind.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Over-Automating Distribution
&lt;/h3&gt;

&lt;p&gt;Full automation of posting (no human in the loop) is how creators end up with embarrassing posts going live during global news events. Keep a 1-click approval step at minimum.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Choosing Tools Over Architecture
&lt;/h3&gt;

&lt;p&gt;The creators who win don't have the best tools. They have the clearest workflow. Tools change every quarter. Architecture compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming Next (2026–2027)
&lt;/h2&gt;

&lt;p&gt;A few signals worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personalized AI clones&lt;/strong&gt; — creators training models on their voice/likeness to scale 1:1 audience interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal generation at scale&lt;/strong&gt; — single prompts producing full video, audio, and graphics in one pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-native platforms&lt;/strong&gt; — new social networks built around AI-generated content as a first-class citizen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-driven content ops&lt;/strong&gt; — autonomous agents that research, produce, schedule, and optimize with minimal human input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators preparing for this now — by building modular, API-driven systems — will be the ones operating at unprecedented scale by 2027.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: AI for Influencers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Do I need to code to use AI as an influencer?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Many top creators use no-code tools (Zapier, Make, ChatGPT, Claude Projects). But knowing even basic Python unlocks 10x more customization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Will AI-generated content hurt my reach?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Only if it sounds generic. Platforms penalize low-effort content, not AI assistance. Original voice + AI scaffolding consistently outperforms 100% human or 100% AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How much should I budget for AI tools?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A solo creator can build a complete stack for $50–150/month. Larger operations run $500–2000/month. ROI is usually measured in weeks, not months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is this ethical? Should I disclose AI usage?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Be transparent about &lt;em&gt;what&lt;/em&gt; AI does in your workflow (research, drafting, editing), but you don't need to flag every AI-touched word. The standard: would your audience feel deceived if they saw your process? If no, you're fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Which AI model should I use as a creator?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For creative content: Claude tends to lead. For research with citations: Perplexity. For images: Midjourney or Flux. For video: Runway or Sora. Test all of them — they each have strengths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Build the System, Not the Output
&lt;/h2&gt;

&lt;p&gt;The influencer economy is splitting into two clear tiers.&lt;/p&gt;

&lt;p&gt;The first tier still manually crafts every piece of content. They post when they have time. They burn out. They plateau.&lt;/p&gt;

&lt;p&gt;The second tier has built systems. AI handles the heavy lifting. They post consistently across every platform. Their content compounds because their architecture compounds.&lt;/p&gt;

&lt;p&gt;The gap between these two tiers is widening every month. And by 2027, it will be unbridgeable for those who waited too long to start.&lt;/p&gt;

&lt;p&gt;The good news: building your AI content engine doesn't require a team or a six-figure budget. It requires clear thinking, a few APIs, and the willingness to treat content like the engineering problem it actually is.&lt;/p&gt;

&lt;p&gt;Start with one layer. Make it work. Add the next.&lt;/p&gt;

&lt;p&gt;That's how the top 1% built it. And it's how you build it too.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want to Go Deeper?
&lt;/h2&gt;

&lt;p&gt;If this resonated and you want a structured path instead of piecing it together from scattered blog posts and YouTube videos:&lt;/p&gt;

&lt;p&gt;🎓 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — Our complete AI education platform covers the entire creator stack: prompting, automation, content pipelines, AI workflows for business, and how to build production-grade AI systems. Interactive courses with an AI tutor that adapts to how you learn — not passive video watching.&lt;/p&gt;

&lt;p&gt;Whether you're a creator looking to scale, a developer building tools for the creator economy, or a business owner figuring out how to integrate AI into your operations — &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;start here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm the founder of &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, where I help thousands of creators, professionals, and businesses build with AI. I write about AI workflows, content automation, and the engineering side of the creator economy.&lt;/p&gt;

&lt;p&gt;If this article helped, drop a reaction and follow for more deep dives. &lt;strong&gt;What layer of your content stack are you working on right now?&lt;/strong&gt; Let me know in the comments — I read every one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contentcreation</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>7 Production Patterns for AI Agents That Don't Break in 2026</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 13 May 2026 11:38:37 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</link>
      <guid>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</guid>
      <description>&lt;p&gt;A demo agent that loops three times, calls one tool, and returns "Hello, I helped you" is easy. A production agent that handles 10k requests a day across paying customers, without lighting your API bill on fire or hallucinating tool arguments at 3am, is a different animal.&lt;/p&gt;

&lt;p&gt;I've shipped AI agents in production for the last 18 months — search, content generation, support triage, document analysis. The same seven patterns keep showing up in every codebase that &lt;em&gt;actually&lt;/em&gt; works. None of them are exotic. Most of them are boring. That's the point: production agents are boring on purpose.&lt;/p&gt;

&lt;p&gt;Here are the patterns, with Python examples you can drop into your own loop today.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Tool Result Validator
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; LLMs hallucinate tool arguments. They will confidently call &lt;code&gt;send_email(to="user@example.com", subject="Refund", body="...")&lt;/code&gt; when the user never asked for an email. They will pass &lt;code&gt;user_id="123abc"&lt;/code&gt; to a function that requires an integer. They will invent product SKUs that don't exist.&lt;/p&gt;

&lt;p&gt;If your tool layer trusts the model's output, every hallucination becomes a production incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Validate tool arguments at the &lt;em&gt;tool boundary&lt;/em&gt;, not inside the tool. Reject early with a structured error the model can recover from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SendEmailArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_SCHEMAS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid_arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool call rejected. Fix these fields: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_confirmation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Always return the validation error &lt;em&gt;back to the model&lt;/em&gt; as a tool result. Don't raise it. The agent can usually self-correct in the next turn — but only if it sees the error.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bounded Memory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Naive agent loops accumulate every tool call, every observation, every reasoning step into the conversation history. After 15 turns, you're sending 80k tokens per request. Your latency doubles. Your cost goes up 10x. The model starts losing track of what it was doing because the relevant context is buried under five tool dumps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat conversation history as a finite resource. Compress aggressively, summarize old turns, and keep tool outputs out of the main thread when you can.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Keep system message + last 4 turns verbatim
&lt;/span&gt;        &lt;span class="n"&gt;keep_recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;to_summarize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_with_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;earlier_context&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/earlier_context&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;keep_recent&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't summarize tool &lt;em&gt;call&lt;/em&gt; messages — the model needs the exact arguments to chain reasoning. Summarize only the &lt;em&gt;observations&lt;/em&gt;, and only when they're old enough that detail no longer matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Observable Loop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent is in production. A user complains it gave them garbage. You have... a final string output and a vague memory of what the loop does. Good luck debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Emit a structured event for every state transition in the loop. Every model call, every tool call, every retry, every error. Ship them to whatever observability stack you already use (Datadog, Honeycomb, OpenTelemetry, even just structured JSON to stdout).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;

&lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;span_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                  &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max turns exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Include a stable &lt;code&gt;run_id&lt;/code&gt; on &lt;em&gt;every&lt;/em&gt; event. When a customer reports an issue, you want one query that returns the entire trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent depends on three external services and a vector store. One of them is having a bad day. Your agent now returns a 500 to the user, even though for &lt;em&gt;this particular query&lt;/em&gt; the broken dependency wasn't actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Wrap dependencies in fallback chains. If the primary fails, the agent should know that capability is degraded — not crash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;impl&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;impl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
                &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.fallback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;degraded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is unavailable. Try a different approach.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial bit is the &lt;code&gt;degraded&lt;/code&gt; response — it goes back to the model as a tool result, and a well-prompted agent will re-plan. Maybe it tries a different tool. Maybe it tells the user "I can't check live inventory right now, but here's what I know." Either is better than a 500.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Surface the degraded status in your prompt. A line like &lt;em&gt;"If a tool returns status=degraded, do not retry it. Acknowledge the limitation in your final response."&lt;/em&gt; prevents the model from looping on a dead service.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The Cost Circuit Breaker
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; A bug or an adversarial input puts your agent in a tool-calling loop. By the time you notice, you've spent $400 in 20 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Track cumulative cost per run and per session. Hard-stop when limits are exceeded. This is not optional in production — it's the difference between a bad day and a layoff conversation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.00&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run exceeded $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spent_today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_today&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; exceeded daily budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Different limits for different surfaces. An internal batch job can have a $5 ceiling per run. A free-tier chat user gets $0.10. A paying enterprise customer gets $2. Hardcoding one number is a footgun.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Deterministic Critic
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "LLM-as-a-judge" sounds clever, but using a model to grade itself is unreliable and slow. Two model calls per output, both hallucinate, both cost money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; For checks you can express as code, &lt;em&gt;use code&lt;/em&gt;. Reserve LLM grading for genuinely subjective dimensions, and only after the deterministic checks pass.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_cite_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\[\d+\]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;too_long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;BANNED_PHRASES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;banned_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_keywords:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;llm_grade&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the critic rejects, feed the issues back to the agent as a "revise this" instruction. After two rejections, return whatever you have with a flag — infinite revision loops are their own bug class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't make the critic too strict. If your accept rate is below 70%, your prompt is broken, not your output.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Stateless Replay (Idempotency)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent half-completed a task — it sent the email, then crashed before logging the result. The user retries. Now they get two emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat every external side-effect as idempotent by design. Use deterministic IDs derived from the input, dedupe at the tool layer, and make agent runs &lt;em&gt;replayable&lt;/em&gt; from any saved checkpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;canonical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if the agent retries the same step within the run, it gets the cached result. If you persist the cache across runs (with a longer TTL), you get cross-run idempotency too — which is what you want for anything that costs money or sends messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Be careful what you put in the idempotency key. Timestamps, request IDs, or random nonces in the args will defeat it. Strip them before hashing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;A production agent loop using all seven patterns is roughly 200 lines of Python. Not glamorous, but it survives. Here's the skeleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_production&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;task_context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revise: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task incomplete after max turns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the loop. Drop in your favorite model API (Claude, GPT, open source — patterns work the same), wire up your tools with the validator from pattern 1, and you have something that won't embarrass you in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Read Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's "Building effective agents" guide&lt;/a&gt; — the canonical reference on when to use agents vs simple workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;OpenAI's Agents SDK docs&lt;/a&gt; — clean reference implementation of multi-agent handoffs.&lt;/li&gt;
&lt;li&gt;For Romanian-speaking developers building agents in production, the &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;AI Agents course on Cursuri-AI.ro&lt;/a&gt; goes deeper on these patterns with hands-on exercises.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've shipped agents in production, what patterns did I miss? Drop them in the comments — I'll add the best ones to a follow-up post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by a developer who has paged themselves at 3am because an agent went into a tool-calling loop. Don't be that developer. Use the circuit breaker.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Fine-Tuning LLMs in 2026: A Practical Guide for Engineers (LoRA, QLoRA, DPO, GRPO)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Fri, 01 May 2026 20:31:02 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/fine-tuning-llms-in-2026-a-practical-guide-for-engineers-lora-qlora-dpo-grpo-jjo</link>
      <guid>https://dev.to/cursuri-ai/fine-tuning-llms-in-2026-a-practical-guide-for-engineers-lora-qlora-dpo-grpo-jjo</guid>
      <description>&lt;p&gt;Fine-tuning has gone from "research lab toy" to a &lt;strong&gt;first-class production technique&lt;/strong&gt; for AI engineers. With LoRA-class adapters, modern alignment algorithms (DPO, GRPO, RLVR), and serving stacks like vLLM, you can ship a custom model on a single H100 — sometimes on a single 4090.&lt;/p&gt;

&lt;p&gt;But the question isn't &lt;em&gt;can&lt;/em&gt; you fine-tune. It's: &lt;strong&gt;should you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This guide is the engineering checklist I wish I'd had two years ago. It covers the decision tree, the modern toolchain, the gotchas, and the EU compliance constraints you can't ignore in 2026.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🇪🇺 Romanian / EU readers: the full hands-on Romanian-language program is at &lt;a href="https://cursuri-ai.ro/courses/fine-tuning-modele-ai" rel="noopener noreferrer"&gt;Fine-Tuning și Adaptarea Modelelor AI — Enterprise Edition&lt;/a&gt;. It includes a complete end-to-end project, EU AI Act governance, and FinOps modeling.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't fine-tune first.&lt;/strong&gt; Try prompting → RAG → fine-tuning. In that order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LoRA / QLoRA&lt;/strong&gt; is the default in 2026. Full fine-tuning is rarely the right call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alignment ≠ SFT.&lt;/strong&gt; SFT teaches &lt;em&gt;format&lt;/em&gt;; DPO/GRPO/RLVR teach &lt;em&gt;preferences and reasoning&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation is the hard part.&lt;/strong&gt; Loss curves don't tell you if the model is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving matters.&lt;/strong&gt; A great fine-tune served badly is just an expensive demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act applies.&lt;/strong&gt; Document your data, your evals, and your model card.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. When fine-tuning is actually the right tool
&lt;/h2&gt;

&lt;p&gt;Most teams reach for fine-tuning too early. Here's the honest decision tree:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;First try&lt;/th&gt;
&lt;th&gt;Fine-tune only if&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inconsistent output format&lt;/td&gt;
&lt;td&gt;Prompting + structured outputs&lt;/td&gt;
&lt;td&gt;Format breaks &amp;gt; 5% even with strict prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge cutoff / private data&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;RAG retrieves the right chunks but the model still misuses them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain-specific style/voice&lt;/td&gt;
&lt;td&gt;System prompt + few-shot&lt;/td&gt;
&lt;td&gt;You need it baked in across thousands of calls (latency/cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized reasoning (math, code, legal)&lt;/td&gt;
&lt;td&gt;Better base model + CoT&lt;/td&gt;
&lt;td&gt;You have a clean preference dataset and need stable behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool use / agents&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; + good prompts&lt;/td&gt;
&lt;td&gt;Tool-call accuracy is below your SLA after prompt iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; if you can't articulate &lt;em&gt;what your fine-tune teaches that a 200-line system prompt can't&lt;/em&gt;, you're not ready to fine-tune.&lt;/p&gt;

&lt;p&gt;If you're earlier in the journey, the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; and &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration&lt;/a&gt; cover the cheaper alternatives in depth.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The 2026 technique landscape
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Full fine-tuning
&lt;/h3&gt;

&lt;p&gt;Updates every parameter. Maximum capacity, maximum cost, maximum risk of catastrophic forgetting. Justified for: foundational training, large domain shifts, or when you own the inference path and the dataset is huge (&amp;gt;1M high-quality examples).&lt;/p&gt;

&lt;h3&gt;
  
  
  LoRA (Low-Rank Adaptation)
&lt;/h3&gt;

&lt;p&gt;The original &lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;LoRA paper (Hu et al., 2021)&lt;/a&gt; is still required reading. You freeze the base weights and train two small low-rank matrices &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; per attention layer. Typical adapter is 0.1–1% of the model's parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                       &lt;span class="c1"&gt;# rank
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# scaling
&lt;/span&gt;    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# trainable params: 8.4M || all params: 7.2B || trainable%: 0.12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  QLoRA
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2305.14314" rel="noopener noreferrer"&gt;QLoRA (Dettmers et al., 2023)&lt;/a&gt; loads the base model in 4-bit (NF4) and trains LoRA adapters on top. This is what lets you fine-tune a 70B model on a single 80GB GPU. Use &lt;code&gt;bitsandbytes&lt;/code&gt; + &lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;HuggingFace PEFT&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  DoRA, OLoRA, rsLoRA
&lt;/h3&gt;

&lt;p&gt;Newer variants that decouple magnitude/direction (DoRA), use orthogonal init (OLoRA), or rescale rank (rsLoRA). Marginal gains in most cases — start with vanilla LoRA, only switch if you've measured a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Alignment: SFT is just step one
&lt;/h2&gt;

&lt;p&gt;Supervised Fine-Tuning (SFT) teaches the model &lt;em&gt;what good output looks like&lt;/em&gt;. It does &lt;strong&gt;not&lt;/strong&gt; teach preferences, refusals, or reasoning quality. That's what alignment is for.&lt;/p&gt;

&lt;h3&gt;
  
  
  DPO (Direct Preference Optimization)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2305.18290" rel="noopener noreferrer"&gt;DPO (Rafailov et al., 2023)&lt;/a&gt; replaces the RLHF pipeline (reward model + PPO) with a single classification-style loss on preference pairs. Simpler, more stable, and the de facto default in 2026.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DPOTrainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DPOConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DPOConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                   &lt;span class="c1"&gt;# KL regularization
&lt;/span&gt;    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5e-7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DPOTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sft_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# PEFT auto-handles reference
&lt;/span&gt;    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;preference_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GRPO and RLVR
&lt;/h3&gt;

&lt;p&gt;GRPO (Group Relative Policy Optimization, popularized by DeepSeek-R1) and RLVR (RL with Verifiable Rewards) are the techniques behind the reasoning-model wave. If you're training for math, code, or anything with a programmatic verifier — these matter.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://huggingface.co/docs/trl" rel="noopener noreferrer"&gt;HuggingFace TRL library&lt;/a&gt; now ships first-class support for SFT, DPO, GRPO, and KTO.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The data pipeline is the moat
&lt;/h2&gt;

&lt;p&gt;A bad dataset will defeat a perfect training loop every time. Things that actually move metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Diversity over volume.&lt;/strong&gt; 5K diverse examples beats 50K near-duplicates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard negatives.&lt;/strong&gt; For preference data, pairs where chosen and rejected are &lt;em&gt;almost equally good&lt;/em&gt; teach more than obvious wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decontamination.&lt;/strong&gt; Strip eval-set leakage from training data. &lt;em&gt;Always.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format consistency.&lt;/strong&gt; Tokenize early to catch chat-template mismatches before you waste 10 GPU-hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PII and licensing.&lt;/strong&gt; This is where the EU AI Act lives. Document provenance.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  5. The 2026 tooling stack
&lt;/h2&gt;

&lt;p&gt;Here's what a production-grade fine-tuning project looks like today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Training framework&lt;/td&gt;
&lt;td&gt;&lt;a href="https://huggingface.co/docs/trl" rel="noopener noreferrer"&gt;HuggingFace TRL&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adapters&lt;/td&gt;
&lt;td&gt;&lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;HuggingFace PEFT&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bitsandbytes&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distributed&lt;/td&gt;
&lt;td&gt;Accelerate / DeepSpeed ZeRO-3 / FSDP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiment tracking&lt;/td&gt;
&lt;td&gt;Weights &amp;amp; Biases or MLflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serving&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval harness&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lm-evaluation-harness&lt;/code&gt; + custom domain evals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closed-source baseline&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/fine-tuning" rel="noopener noreferrer"&gt;OpenAI fine-tuning&lt;/a&gt; for comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wiring all of this into a real CI/CD lifecycle is what separates a notebook experiment from a deployable system. That's the focus of &lt;a href="https://cursuri-ai.ro/courses/mlops-prototip-productie" rel="noopener noreferrer"&gt;MLOps: Prototype to Production&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Evaluation: where most projects quietly fail
&lt;/h2&gt;

&lt;p&gt;Loss curves go down. The model "feels better." You ship. Production complaints spike. Sound familiar?&lt;/p&gt;

&lt;p&gt;Build a &lt;strong&gt;holistic eval suite&lt;/strong&gt; before you start training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capability evals&lt;/strong&gt; — domain-specific tasks scored by rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regression evals&lt;/strong&gt; — verify the model didn't lose abilities (catastrophic forgetting is real).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety evals&lt;/strong&gt; — refusals, jailbreak resistance, policy adherence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-as-judge&lt;/strong&gt; — useful, but bias-corrected with human spot-checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost &amp;amp; latency&lt;/strong&gt; — TTFT, throughput, p95 — these &lt;em&gt;are&lt;/em&gt; product metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your eval suite isn't version-controlled and reproducible, you don't have an eval suite. You have vibes.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Serving: the part nobody talks about until it breaks
&lt;/h2&gt;

&lt;p&gt;LoRA adapters can be &lt;strong&gt;hot-swapped&lt;/strong&gt; at inference time. vLLM, SGLang, and TensorRT-LLM all support multi-LoRA serving — meaning you can host one base model and dozens of fine-tuned adapters with near-zero overhead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# vLLM with LoRA adapters&lt;/span&gt;
vllm serve meta-llama/Llama-3.1-8B-Instruct &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-lora&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lora-modules&lt;/span&gt; legal-adapter&lt;span class="o"&gt;=&lt;/span&gt;./adapters/legal sales-adapter&lt;span class="o"&gt;=&lt;/span&gt;./adapters/sales &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-loras&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the architectural unlock that makes fine-tuning economically viable for SaaS multi-tenancy.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. EU AI Act: not optional in 2026
&lt;/h2&gt;

&lt;p&gt;If you're shipping in the EU, fine-tuning a foundation model can put you in the &lt;em&gt;deployer&lt;/em&gt; or &lt;em&gt;provider&lt;/em&gt; category under the &lt;a href="https://artificialintelligenceact.eu/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt;. Practical consequences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model card&lt;/strong&gt; documenting training data, intended use, limitations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; if the use case touches Annex III (HR, education, critical infrastructure, law enforcement, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging&lt;/strong&gt; of significant model updates and eval results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency obligations&lt;/strong&gt; to end users for AI-generated content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't lawyer paranoia — auditors are already asking. Bake it into your pipeline from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. The mistakes I see most often
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning before exhausting prompting and RAG.&lt;/strong&gt; Cheaper, faster, easier to roll back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using &lt;code&gt;r=64&lt;/code&gt; because "bigger is better".&lt;/strong&gt; Most tasks saturate at &lt;code&gt;r=8&lt;/code&gt; to &lt;code&gt;r=16&lt;/code&gt;. Measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mismatched chat template&lt;/strong&gt; between training and inference. Silent quality killer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training on the eval set.&lt;/strong&gt; Decontaminate. Then decontaminate again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping the SFT-only baseline.&lt;/strong&gt; You can't claim DPO helped if you didn't measure SFT-only first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring catastrophic forgetting.&lt;/strong&gt; Always run a regression eval against the base model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting the FinOps math.&lt;/strong&gt; A $400 fine-tune that adds $0.002/request to inference is not a win at 1M requests/day.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;p&gt;If you want a structured path that goes from prompt engineering to deploying fine-tuned models in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Foundation:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/introducere-ai-engineering" rel="noopener noreferrer"&gt;Introduction to AI Engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before fine-tuning:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; → &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG: Retrieval-Augmented Generation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The full deep dive:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/fine-tuning-modele-ai" rel="noopener noreferrer"&gt;Fine-Tuning and Model Adaptation — Enterprise Edition&lt;/a&gt; (LoRA/QLoRA/DoRA, DPO/GRPO/RLVR, vLLM serving, EU AI Act, end-to-end project)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productionization:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/mlops-prototip-productie" rel="noopener noreferrer"&gt;MLOps: Prototype to Production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration layer:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP — Model Context Protocol&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browse the full IT engineering track at &lt;a href="https://cursuri-ai.ro/cursuri/it" rel="noopener noreferrer"&gt;cursuri-ai.ro/cursuri/it&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Fine-tuning in 2026 is no longer about &lt;em&gt;can the model learn the task&lt;/em&gt;. It's about &lt;strong&gt;whether your dataset, eval suite, serving stack, and governance process are good enough to deserve a custom model&lt;/strong&gt;. Get those right, and a single adapter can be the difference between a feature that costs you money and a feature that defines your product.&lt;/p&gt;

&lt;p&gt;If this resonated, I'd love to hear what fine-tuning problem you're actually stuck on — drop it in the comments. 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — the AI engineering education platform for Romanian and EU professionals.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Claude Opus 4.7 vs GPT-5.5: A Developer's Pragmatic Comparison Guide (2026)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:03:06 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/claude-opus-47-vs-gpt-55-a-developers-pragmatic-comparison-guide-2026-11jb</link>
      <guid>https://dev.to/cursuri-ai/claude-opus-47-vs-gpt-55-a-developers-pragmatic-comparison-guide-2026-11jb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — In 2026, choosing an LLM is no longer about picking "the best model." It's about understanding which model solves &lt;em&gt;your specific problem&lt;/em&gt; at the lowest total cost and risk. Claude Opus 4.7 brings a 1M token context window and exceptional reasoning. GPT-5.5 brings ecosystem maturity and multimodal strength. The right answer for production is almost always &lt;strong&gt;multi-model orchestration&lt;/strong&gt;, not allegiance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're a backend engineer, ML engineer, or solutions architect choosing a foundation model in 2026, this guide is for you. No marketing fluff. Just patterns I've validated on real projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Quick Note on Honesty
&lt;/h2&gt;

&lt;p&gt;Before we go further: &lt;strong&gt;I'm not going to fabricate specs.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; is verified to ship with a &lt;strong&gt;1M token context window&lt;/strong&gt; (Anthropic's official spec).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; remains in active production as the cost-efficient predecessor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt; is OpenAI's current flagship at the time of writing. For exact context window, pricing, and benchmark numbers, &lt;strong&gt;always check OpenAI's official documentation&lt;/strong&gt; — those numbers shift between point releases, and any blog quoting them risks being stale within a month.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article focuses on &lt;strong&gt;architectural and methodological differences&lt;/strong&gt; that age well, not spec-sheet trivia that doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Comparison Matters Differently in 2026
&lt;/h2&gt;

&lt;p&gt;Three years ago, picking a model meant running it through a weekend benchmark and shipping. Today, the calculus has changed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context windows have stopped being a bottleneck.&lt;/strong&gt; With Opus 4.7's 1M token window, the question is no longer "can I fit my codebase?" — it's "should I, given attention dynamics and cost?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Cost of Ownership has become non-trivial.&lt;/strong&gt; API price-per-token is maybe 30% of what you actually pay in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory pressure is real.&lt;/strong&gt; The EU AI Act and GDPR are no longer theoretical — they shape architecture decisions for any team with European users.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Engineers who still treat model selection as a 2-hour decision are leaving serious money and reliability on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architectural Differences That Actually Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context Window
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Practical Implication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;td&gt;Full enterprise codebases, long-form legal docs, multi-document RAG without chunking compromises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;(See Anthropic docs)&lt;/td&gt;
&lt;td&gt;Cost-optimized workhorse for everyday agentic workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;(See OpenAI docs)&lt;/td&gt;
&lt;td&gt;Tight integration with Azure OpenAI, mature tooling ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The 1M context window is not just bigger — it changes architectural patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you have a million tokens, you stop building chunked RAG pipelines for many use cases. You stop fighting context truncation. You can pass a full repo, a full deposition, a full quarterly filing — and ask the model to reason over it directly.&lt;/p&gt;

&lt;p&gt;But this comes with a real trade-off: &lt;strong&gt;attention quality degrades unevenly across very long contexts.&lt;/strong&gt; Just because you &lt;em&gt;can&lt;/em&gt; stuff 800K tokens in doesn't mean the model will reliably find the needle. Always run targeted &lt;strong&gt;needle-in-haystack&lt;/strong&gt; evals on &lt;em&gt;your&lt;/em&gt; data structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Style
&lt;/h3&gt;

&lt;p&gt;This is hard to quantify but easy to feel after enough projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; tends to reason more conservatively. It pushes back on ambiguity, asks clarifying questions, and produces structured outputs that hold up well under JSON schema validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt; tends to be more proactive and creative. It will often produce a complete answer where Claude would ask "did you mean X or Y?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither is universally better. Conservative reasoning saves you from hallucinated database queries in production. Proactive reasoning ships features faster in a hackathon.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Use &amp;amp; Agentic Workflows
&lt;/h3&gt;

&lt;p&gt;Both models support function calling and agentic loops. In my experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude's tool use feels more deterministic. JSON schemas hold. Parallel tool calls behave predictably.&lt;/li&gt;
&lt;li&gt;GPT's tool use has a more mature ecosystem (Assistants API, more SDK examples, broader community).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building a &lt;strong&gt;pure agent system&lt;/strong&gt;, both work. If you're integrating into an existing &lt;strong&gt;Azure / Microsoft stack&lt;/strong&gt;, GPT-5.5 has lower friction. If you're building a &lt;strong&gt;regulated workflow with strict guarantees&lt;/strong&gt;, Claude's structured output behavior wins on reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  When To Choose Each — A Decision Framework
&lt;/h2&gt;

&lt;p&gt;Stop asking "which is best?" Start asking these four questions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. What problem am I actually solving?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-form document reasoning, code analysis at scale, regulated decision support&lt;/strong&gt; → Claude Opus 4.7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal user-facing features, real-time voice, ecosystem-heavy integrations&lt;/strong&gt; → GPT-5.5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume cost-sensitive agentic workloads&lt;/strong&gt; → Claude Opus 4.6 (or smaller models)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. What's my failure cost?
&lt;/h3&gt;

&lt;p&gt;A chatbot that recommends the wrong product costs a sale. An assistant that misreads a contract clause costs a lawsuit. Match the model's reliability profile to your downside risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Who maintains this in 18 months?
&lt;/h3&gt;

&lt;p&gt;Models get deprecated. Pricing changes. APIs evolve. Pick the model whose &lt;strong&gt;migration path&lt;/strong&gt; you can stomach. If your answer is "we can't migrate" — you've built tech debt, not capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. What's my regulatory surface?
&lt;/h3&gt;

&lt;p&gt;For EU-resident users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act&lt;/strong&gt; classifies systems by risk tier — high-risk systems carry significant compliance overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR&lt;/strong&gt; still applies to any prompt containing personal data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor concentration risk&lt;/strong&gt; is now a documented audit concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Single-vendor architectures are increasingly hard to defend in compliance reviews.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build Your Own Evaluation Harness (Don't Trust Public Benchmarks)
&lt;/h2&gt;

&lt;p&gt;Public benchmarks measure general capability. Your production system needs &lt;em&gt;domain-specific&lt;/em&gt; capability. Here's a minimal evaluation pattern I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;anthropic_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run a single task against a model and return structured output.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# openai
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;evaluate_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_eval_suite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Compare both models on the same tasks.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few principles for building your eval suite:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use real production data&lt;/strong&gt; (anonymized). Synthetic tasks lie.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include adversarial cases&lt;/strong&gt; — ambiguous inputs, near-duplicates, edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure cost-per-correct-answer&lt;/strong&gt;, not just accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run it weekly&lt;/strong&gt; — model behavior drifts between point releases.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Hidden Costs Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;API price-per-token is the smallest part of your real cost. Here's the full picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost Layer&lt;/th&gt;
&lt;th&gt;Typical Range&lt;/th&gt;
&lt;th&gt;What Drives It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct API tokens&lt;/td&gt;
&lt;td&gt;20-30% of total&lt;/td&gt;
&lt;td&gt;Pricing tier, prompt size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Re-prompting on errors&lt;/td&gt;
&lt;td&gt;10-20%&lt;/td&gt;
&lt;td&gt;Model reliability, validation strictness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop validation&lt;/td&gt;
&lt;td&gt;15-30%&lt;/td&gt;
&lt;td&gt;Use case sensitivity, regulatory requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching infrastructure&lt;/td&gt;
&lt;td&gt;5-10%&lt;/td&gt;
&lt;td&gt;Architecture, library choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vendor migration overhead&lt;/td&gt;
&lt;td&gt;10-25% (when triggered)&lt;/td&gt;
&lt;td&gt;Lock-in level, abstraction quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance audits&lt;/td&gt;
&lt;td&gt;5-15%&lt;/td&gt;
&lt;td&gt;Regulatory environment, data sensitivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;A model that's "20% cheaper at the API" can be 2x more expensive in TCO&lt;/strong&gt; if it triggers more re-prompts or requires heavier human validation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Model Orchestration: The Pattern That Wins
&lt;/h2&gt;

&lt;p&gt;In 2026, the production-grade answer is rarely "one model for everything." Common patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│  Router (lightweight model)                                 │
│  ├── Classifies request complexity &amp;amp; sensitivity            │
│  └── Routes to appropriate model                            │
└─────────────────────────────────────────────────────────────┘
            │
   ┌────────┼────────┐
   ▼        ▼        ▼
[Haiku]  [Opus 4.6]  [Opus 4.7]
 cheap    balanced    deep reasoning
 fast     production  complex docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern routinely cuts costs by &lt;strong&gt;40-60%&lt;/strong&gt; versus single-model architectures, with no quality loss when the router is well-calibrated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Going Deeper: Resources
&lt;/h2&gt;

&lt;p&gt;If you want to go beyond this article and build genuine expertise in model selection, evaluation, and multi-model architecture, I've put together a structured course covering exactly these topics:&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI Model Comparison 2026 — Enterprise Edition&lt;/a&gt;&lt;/strong&gt; &lt;em&gt;(course is in Romanian)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full enterprise evaluation methodology — from benchmark to production&lt;/li&gt;
&lt;li&gt;How to interpret 2026 benchmarks correctly (signal vs. marketing noise)&lt;/li&gt;
&lt;li&gt;Structured selection frameworks based on cost / risk / use case&lt;/li&gt;
&lt;li&gt;Complete landscape: Anthropic, OpenAI, Google, Meta, Mistral&lt;/li&gt;
&lt;li&gt;Multi-model architectures and cost optimization strategies&lt;/li&gt;
&lt;li&gt;Applied case studies with European regulatory context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔗 Full platform: &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — single subscription, full catalog of AI courses for IT and non-IT professionals.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;The real edge in 2026 isn't access to AI — it's &lt;strong&gt;methodological maturity in choosing, evaluating, and governing AI&lt;/strong&gt;. Model access has become a commodity. The competence to architect around models is the scarce resource.&lt;/p&gt;

&lt;p&gt;If you take one thing from this article, let it be this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Stop asking "which model is best?" Start asking "which model best fits this specific decision, and what's my exit if I'm wrong?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single shift in framing will save your team thousands of hours and tens of thousands of euros over the next twelve months.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a comment with your current model stack — I'm always curious how teams are actually orchestrating these in production.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>anthropic</category>
      <category>openai</category>
    </item>
    <item>
      <title>The Anatomy of a Modern AI Marketing Curriculum in 2026 — What It Covers and Why It Matters</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:13:27 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/the-anatomy-of-a-modern-ai-marketing-curriculum-in-2026-what-it-covers-and-why-it-matters-mh6</link>
      <guid>https://dev.to/cursuri-ai/the-anatomy-of-a-modern-ai-marketing-curriculum-in-2026-what-it-covers-and-why-it-matters-mh6</guid>
      <description>&lt;h1&gt;
  
  
  The Anatomy of a Modern AI Marketing Curriculum in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;"Digital marketing is no longer a copywriting discipline with an analytics layer on top. In 2026, it's a distributed system of generative models, data pipelines, and cross-channel automations — strategically orchestrated by a human who understands both AI and the market."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The global AI-in-marketing market hit &lt;strong&gt;$45.8 billion&lt;/strong&gt; in 2026, up from $21.5 billion in 2024.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;78% of B2B and B2C companies&lt;/strong&gt; now use at least one AI tool in their marketing stack.&lt;/li&gt;
&lt;li&gt;A modern AI Marketing curriculum covers &lt;strong&gt;9 core areas&lt;/strong&gt;: fundamentals, content and SEO, social media, email and automation, paid ads, analytics, video/audio/visual, ethics and legislation, and applied projects.&lt;/li&gt;
&lt;li&gt;The dominant tech stack: &lt;strong&gt;GPT-5.4, Claude Opus 4.6, Performance Max, Meta Advantage+, Jasper, Canva AI&lt;/strong&gt;, integrated with modern CRMs and data warehouses.&lt;/li&gt;
&lt;li&gt;This article maps, section by section, what such a curriculum should look like if you want to move from "I've heard of AI" to "I run an AI-first department."&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this article lives on dev.to
&lt;/h2&gt;

&lt;p&gt;Plenty of developers build MarTech tools, work at startups where they wear multiple hats, or run side projects that require them to understand funnels, SEO, and conversions. Over the last 18 months, AI has fundamentally rewritten how marketing gets done — and the line between "developer" and "growth engineer" has visibly thinned.&lt;/p&gt;

&lt;p&gt;This article is an X-ray of the skills a modern AI Marketing specialist needs in 2026. It's useful if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a product and want to understand how it gets promoted in the AI era&lt;/li&gt;
&lt;li&gt;Freelance or consult and integrate AI into client deliverables&lt;/li&gt;
&lt;li&gt;Work at the MarTech intersection — data engineering, analytics, experimentation&lt;/li&gt;
&lt;li&gt;Want a solid baseline for evaluating or hiring specialists in this field&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — a Romanian platform focused exclusively on professional AI education — and this article reflects the curriculum we've designed for the marketing track.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 2026 numbers you need to know
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;2024&lt;/th&gt;
&lt;th&gt;2026&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global AI Marketing market&lt;/td&gt;
&lt;td&gt;$21.5B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$45.8B&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Companies using AI in marketing&lt;/td&gt;
&lt;td&gt;37%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROI — AI-augmented vs. traditional campaigns&lt;/td&gt;
&lt;td&gt;+10-15%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+35-50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per lead reduction&lt;/td&gt;
&lt;td&gt;-8%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-28%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content production time reduction&lt;/td&gt;
&lt;td&gt;-25%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-65%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Romania:&lt;/strong&gt; 52% of digital agencies and 34% of companies with marketing budgets above €10,000/month actively use AI in their workflows (iSense Solutions for IAB Romania, 2026).&lt;/p&gt;

&lt;p&gt;The takeaway is unambiguous: a marketer who doesn't operate with AI in 2026 is no longer competitive. And a developer building products can no longer afford to treat marketing as a black box.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 9 areas of a modern curriculum
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AI fundamentals for digital marketing
&lt;/h3&gt;

&lt;p&gt;Without a proper grasp of generative models, everything else stays shallow. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operational differences between &lt;strong&gt;GPT-5.4&lt;/strong&gt; (1M token context, excellent for content at scale) and &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; (complex analytical reasoning, strategy)&lt;/li&gt;
&lt;li&gt;The architecture of a modern &lt;strong&gt;MarTech stack&lt;/strong&gt;: CRM → CDP → AI orchestrator → channels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation levels&lt;/strong&gt; (L1-L5) — from manual prompting to fully autonomous systems with human-in-the-loop&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Content and SEO with AI
&lt;/h3&gt;

&lt;p&gt;Content generation was the first battlefield AI won. In 2026, it's no longer "I wrote a blog post with ChatGPT" — it's full pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalable content generation aligned with brand voice&lt;/li&gt;
&lt;li&gt;Optimization for &lt;strong&gt;Google AI Overviews&lt;/strong&gt; — the new ranking model partially replacing classic SERPs&lt;/li&gt;
&lt;li&gt;Differentiated copywriting for &lt;strong&gt;ads, email, and landing pages&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Editorial calendars orchestrated by AI based on trending signals and seasonality&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Social media and community
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cross-channel automation (LinkedIn, Instagram, TikTok, X) while respecting each platform's tone&lt;/li&gt;
&lt;li&gt;Visual and video content generation straight from prompts (&lt;strong&gt;Sora, Runway, Midjourney&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;Intelligent &lt;strong&gt;social listening&lt;/strong&gt; — automatic sentiment detection and reputation-crisis alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Email marketing and automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Campaigns with &lt;strong&gt;1:1 personalization&lt;/strong&gt; driven by hundreds of behavioral signals&lt;/li&gt;
&lt;li&gt;Adaptive funnels that self-optimize based on segment reactions&lt;/li&gt;
&lt;li&gt;Predictive segmentation — you no longer slice the list demographically; you slice it by &lt;strong&gt;intent score&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Paid ads and performance marketing
&lt;/h3&gt;

&lt;p&gt;This is where the gap between "marketing with AI" and "AI-first marketing" is most visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Performance Max&lt;/strong&gt; — campaigns that simultaneously optimize bid, creative, and audience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta Advantage+&lt;/strong&gt; — the Meta equivalent, with product catalog and automated targeting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROAS&lt;/strong&gt; optimization and budgeting with predictive models (not static rules)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Analytics and data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictive customer analytics&lt;/strong&gt; — churn prediction, LTV forecasting, next-best-action&lt;/li&gt;
&lt;li&gt;Personalization at scale using &lt;strong&gt;vector embeddings&lt;/strong&gt; and behavioral similarity&lt;/li&gt;
&lt;li&gt;Decision dashboards that propose actions, not just display metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Video, audio, and visual marketing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Image generation and visual design (Midjourney, DALL-E, Adobe Firefly)&lt;/li&gt;
&lt;li&gt;End-to-end video marketing: &lt;strong&gt;script → voiceover → editing → subtitles → distribution&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Podcast and voice marketing&lt;/strong&gt; — a fast-growing niche in 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Ethics, legislation, and AI-first strategy
&lt;/h3&gt;

&lt;p&gt;The most underrated area — and the riskiest if ignored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand safety&lt;/strong&gt; in the age of generated content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act&lt;/strong&gt; — practical requirements for marketing applications (risk classification, transparency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR&lt;/strong&gt; applied specifically to personalization and algorithmic profiling&lt;/li&gt;
&lt;li&gt;AI-First transformation roadmap for an organization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Case studies and applied projects
&lt;/h3&gt;

&lt;p&gt;Any serious curriculum closes with real application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-to-end AI digital transformation of a &lt;strong&gt;Romanian e-commerce business&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;AI strategy for a local &lt;strong&gt;marketing agency&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final capstone project&lt;/strong&gt; — building your own AI-first marketing strategy, ready to implement&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The dominant 2026 tech stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
txt
── Foundation models ──
• GPT-5.4 (OpenAI)                 — 1M token context, content at scale
• Claude Opus 4.6 (Anthropic)      — analytical reasoning, strategy, long docs
• Claude Sonnet 4.6                — operational workloads, cost-efficient

── Advertising platforms ──
• Google Performance Max + Gemini  — fully orchestrated campaigns
• Meta Advantage+                  — equivalent on Meta Ads

── Specialized tools ──
• Jasper, Copy.ai                  — ad-focused copywriting
• Canva AI, Adobe Firefly          — visual design
• Midjourney, DALL-E 3+            — premium imagery
• Runway, Sora                     — video generation
• ElevenLabs                       — voice generation

── Analytics &amp;amp; data ──
• Segment / RudderStack            — CDP
• Snowflake / BigQuery             — data warehouse
• Hex, Mode                        — AI-assisted analytics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>marketing</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>MCP (Model Context Protocol): The Complete Guide to Building AI-Powered Integrations in 2026</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Sun, 19 Apr 2026 20:18:08 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/mcp-model-context-protocol-the-complete-guide-to-building-ai-powered-integrations-in-2026-5bnd</link>
      <guid>https://dev.to/cursuri-ai/mcp-model-context-protocol-the-complete-guide-to-building-ai-powered-integrations-in-2026-5bnd</guid>
      <description>&lt;p&gt;Every developer building AI apps hits the same problem: connecting an LLM to real tools means writing custom glue code for every single integration. Different schemas, different auth, different error handling — repeated for every model and every data source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; fixes this. It's an open standard — think USB-C for AI connectivity — that lets any AI client talk to any tool server through one universal interface. And it's not theoretical: OpenAI, Google, Microsoft, Salesforce, and thousands of developers already use it in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Actually Does
&lt;/h2&gt;

&lt;p&gt;Before MCP, connecting Claude or GPT to your database meant writing a custom function, defining a JSON schema, handling auth, and repeating all of that for every tool. Scale that to 30 integrations across multiple environments — it breaks fast.&lt;/p&gt;

&lt;p&gt;MCP replaces all of that with a single protocol based on JSON-RPC 2.0. A server declares what it can do; a client discovers it automatically. No hardcoding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Your App (Host)  →  MCP Client  →  MCP Server (tools, data, prompts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A server can expose three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — functions the AI can call (&lt;code&gt;query_database&lt;/code&gt;, &lt;code&gt;send_email&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt; — structured data it can read (schemas, file contents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt; — reusable templates (code review checklist, SQL generator)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Working Example in Python
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query users filtered by status.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;get_db_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, email FROM users WHERE status = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema://users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_users_schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns the users table schema.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR, email VARCHAR, status VARCHAR);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;15 lines. Your AI agent can now query your database and understand its schema through any MCP-compatible client.&lt;/p&gt;

&lt;h2&gt;
  
  
  TypeScript Works Too
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GitHub Assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;list_issues&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;List open issues for a repository&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`https://api.github.com/repos/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/issues?state=open&amp;amp;per_page=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Two Transports, Different Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;stdio&lt;/strong&gt; — local tools. Server runs as a child process, zero network overhead. Great for file access, local DBs, CLI tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streamable HTTP&lt;/strong&gt; — remote/shared servers. Runs as a web service, supports OAuth 2.0. Ideal for SaaS integrations and team-shared tools.&lt;/p&gt;

&lt;p&gt;Most production setups use both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Won
&lt;/h2&gt;

&lt;p&gt;The adoption timeline tells the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nov 2024&lt;/strong&gt; — Anthropic launches MCP as open-source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 2025&lt;/strong&gt; — OpenAI adopts MCP officially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;May 2025&lt;/strong&gt; — Microsoft joins the MCP steering committee&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jun 2025&lt;/strong&gt; — Salesforce builds Agentforce 3 on MCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dec 2025&lt;/strong&gt; — MCP moves to the Linux Foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today: 10,000+ servers in production, 70%+ of major SaaS brands ship MCP servers, every major AI platform supports it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Done Right
&lt;/h2&gt;

&lt;p&gt;MCP's security model is one of its strongest features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Granular permissions&lt;/strong&gt; — each server declares capabilities, the host controls access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User consent&lt;/strong&gt; — critical actions need explicit approval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process isolation&lt;/strong&gt; — servers run in separate processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full audit trail&lt;/strong&gt; — every invocation is logged&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From Demo to Production
&lt;/h2&gt;

&lt;p&gt;A tutorial MCP server and a production one are very different. Production needs OAuth 2.0, rate limiting, Docker/Kubernetes deployment, CI/CD pipelines, GDPR compliance, and threat modeling.&lt;/p&gt;

&lt;p&gt;If you want the full path — from fundamentals to deploying enterprise-grade MCP servers with Python and TypeScript — check out this &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;complete MCP course&lt;/a&gt;. 24 hours of hands-on content with real projects: PostgreSQL, external APIs, multi-server gateways, and production security patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Here
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Install Claude Desktop or Cursor as your MCP host&lt;/li&gt;
&lt;li&gt;Try a pre-built server (filesystem, PostgreSQL)&lt;/li&gt;
&lt;li&gt;Build a custom server with FastMCP or the TypeScript SDK&lt;/li&gt;
&lt;li&gt;Add HTTP transport and OAuth for remote access&lt;/li&gt;
&lt;li&gt;Deploy with Docker&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MCP is infrastructure, not a trend. The developers who learn it now will build the next generation of AI applications.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want more production-focused AI engineering content? Visit &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — courses built for developers who ship.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>🤖 How a Virtual AI Professor Is Changing the Way Romania Learns</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 13 Apr 2026 22:02:49 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/how-a-virtual-ai-professor-is-changing-the-way-romania-learns-2957</link>
      <guid>https://dev.to/cursuri-ai/how-a-virtual-ai-professor-is-changing-the-way-romania-learns-2957</guid>
      <description>&lt;h2&gt;
  
  
  🏫 The Classroom Has No Walls Anymore
&lt;/h2&gt;

&lt;p&gt;Romania isn't usually the first country that comes to mind when you think about AI-driven education. But something interesting is happening here — a small team built &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, a platform where an AI virtual professor teaches structured, university-grade courses entirely in Romanian. 🇷🇴&lt;/p&gt;

&lt;h2&gt;
  
  
  🎓 What Makes an AI Professor Different?
&lt;/h2&gt;

&lt;p&gt;Traditional e-learning platforms rely on human instructors recording content once, then distributing it forever. The content ages. The examples become irrelevant. The quizzes stay the same. 😴&lt;/p&gt;

&lt;p&gt;An AI-powered professor flips this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔄 &lt;strong&gt;Content stays current.&lt;/strong&gt; Courses reference 2025–2026 frameworks, tools, and regulations — including Romania-specific fiscal and legal context.&lt;/li&gt;
&lt;li&gt;📏 &lt;strong&gt;Every learner gets the same depth.&lt;/strong&gt; There's no "phoning it in" on module 7 because the instructor got tired. Each of the 29 courses on the platform has the same structured depth: modules, lessons, practical exercises, and quizzes.&lt;/li&gt;
&lt;li&gt;🤝 &lt;strong&gt;Non-technical people aren't left behind.&lt;/strong&gt; Half the catalog is designed for business professionals — marketing, HR, finance, real estate, entrepreneurship — not just developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; doesn't just teach you what a prompt is. It walks through advanced techniques like chain-of-thought reasoning, few-shot patterns, and evaluation frameworks — structured the way a university course would be, but accessible to anyone. 💡&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ The Technical Architecture (for the Devs Reading This)
&lt;/h2&gt;

&lt;p&gt;Behind the scenes wih:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;📋 &lt;strong&gt;Plans&lt;/strong&gt; the full course structure (modules, lessons, learning objectives)&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Generates&lt;/strong&gt; each lesson in parallel using LLMs&lt;/li&gt;
&lt;li&gt;🧩 &lt;strong&gt;Assembles&lt;/strong&gt; the course with quizzes, practical exercises, and narrated audio&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Validates&lt;/strong&gt; output quality — structure, factual accuracy, quiz correctness&lt;/li&gt;
&lt;li&gt;🚢 &lt;strong&gt;Deploys&lt;/strong&gt; to production on AWS ECS Fargate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The generation pipeline catches its own mistakes — mismatched quiz keys, malformed options, missing content — and fixes them before anything goes live. It's a real production system, not a ChatGPT wrapper with a UI on top. 😏&lt;/p&gt;

&lt;h2&gt;
  
  
  🇷🇴 Why Romania, Why Now?
&lt;/h2&gt;

&lt;p&gt;Romania has a massive tech talent pool but a persistent gap in AI-specific education — especially in Romanian. Most high-quality AI content is in English, paywalled, or assumes you already have a CS degree. 😤&lt;/p&gt;

&lt;p&gt;Cursuri-AI.ro fills that gap with courses like &lt;a href="https://cursuri-ai.ro/courses/ai-lideri-business" rel="noopener noreferrer"&gt;AI for Business Leaders&lt;/a&gt;, which teaches executives how to evaluate AI projects, manage AI teams, and understand ROI — without writing a single line of code. That kind of course simply didn't exist in Romanian before. 🏆&lt;/p&gt;

&lt;p&gt;The bet is simple: &lt;strong&gt;if you lower the barrier to AI literacy in a country's native language, adoption accelerates across every industry&lt;/strong&gt; — not just tech. 📈&lt;/p&gt;

&lt;h2&gt;
  
  
  🔮 What This Means for EdTech
&lt;/h2&gt;

&lt;p&gt;The virtual AI professor model isn't just a novelty. It points to a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 Course catalogs can &lt;strong&gt;scale to hundreds of topics&lt;/strong&gt; without hiring hundreds of instructors&lt;/li&gt;
&lt;li&gt;♻️ Content can be &lt;strong&gt;regenerated&lt;/strong&gt; when the field evolves, instead of becoming stale&lt;/li&gt;
&lt;li&gt;🌍 &lt;strong&gt;Localization&lt;/strong&gt; becomes trivial — the same system can teach in any language with the same depth&lt;/li&gt;
&lt;li&gt;💎 &lt;strong&gt;Quality is consistent&lt;/strong&gt; — every module, every quiz, every explanation meets the same standard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This doesn't replace human mentorship. But it democratizes the structured knowledge layer that most people need before mentorship even becomes useful. 🙌&lt;/p&gt;

&lt;h2&gt;
  
  
  👀 Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you're curious, browse the course catalog at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;cursuri-ai.ro&lt;/a&gt;. The platform has 29 courses across IT and non-IT tracks, all in Romanian, all taught by the AI professor. 🎓&lt;/p&gt;

&lt;p&gt;Whether you're a developer who wants to go deep on RAG and AI agents, or a marketing lead trying to figure out how AI fits into your workflow — there's probably a course for you. ✨&lt;/p&gt;

</description>
      <category>ai</category>
      <category>web</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>How AI Is Reshaping Romania's Financial System — And What Developers Should Know</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 07 Apr 2026 23:13:38 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/how-ai-is-reshaping-romanias-financial-system-and-what-developers-should-know-2a1h</link>
      <guid>https://dev.to/cursuri-ai/how-ai-is-reshaping-romanias-financial-system-and-what-developers-should-know-2a1h</guid>
      <description>&lt;h2&gt;
  
  
  🏦 Romania's Financial Sector Is Quietly Becoming an AI Playground
&lt;/h2&gt;

&lt;p&gt;While Western Europe dominates the AI headlines, Romania's financial ecosystem is undergoing a silent transformation. From automated tax compliance to real-time fraud detection, AI is no longer a PowerPoint slide in board meetings — it's in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 The Current Landscape
&lt;/h2&gt;

&lt;p&gt;Romania's financial system is ripe for AI adoption: a complex tax code (VAT 21%, micro-enterprise thresholds at 100k EUR, multiple regimes in parallel), rapid digitization mandated by law (e-Factura, e-Transport, SAF-T, RO e-TVA), a strong developer talent pool, and full EU regulatory alignment (GDPR, EU AI Act, PSD2, DORA). High regulatory complexity + strong tech talent + EU digital mandates = massive opportunity.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Where AI Is Already Deployed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fraud Detection &amp;amp; AML&lt;/strong&gt; — Banks like Banca Transilvania, BRD, and ING Romania use ML-based transaction monitoring with gradient-boosted trees, graph neural networks, and real-time streaming, reducing false positives by up to 60%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Tax Compliance&lt;/strong&gt; — e-Factura generates millions of XMLs monthly. AI handles auto-classification by tax category, VAT anomaly detection, and predictive compliance before ANAF flags you. ANAF itself uses AI to cross-reference e-Factura with e-Transport and SAF-T.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credit Scoring &amp;amp; Lending&lt;/strong&gt; — Beyond Biroul de Credit, fintechs like Mokka, iWanto, and Salarium integrate PSD2 transaction history, behavioral patterns, and NLP on financial documents for instant creditworthiness assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversational AI&lt;/strong&gt; — Romanian-language NLU models fine-tuned on banking domain, intent classification for transaction queries, voice AI for phone banking. The challenge: Romanian is a low-resource language for NLP.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚖️ Regulatory Framework
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;EU AI Act&lt;/strong&gt; — Credit scoring and financial risk AI = high-risk. Mandatory risk assessments, human oversight, transparency, bias testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GDPR Art. 22&lt;/strong&gt; — Citizens have the right not to be subject to purely automated decisions with legal effects. You need human-in-the-loop, explainability, and contestation mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DORA (Jan 2025)&lt;/strong&gt; — Stress-test AI models, maintain audit trails for all decisions, report AI incidents to BNR.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 Common Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choices&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kafka, AWS Kinesis, RabbitMQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL, ClickHouse, S3 + Parquet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PyTorch, scikit-learn, XGBoost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serving&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI + Docker, SageMaker, MLflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLMs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude API, OpenAI API, fine-tuned Llama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Evidently AI, Grafana, OpenTelemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Opportunities
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Banking + AI&lt;/strong&gt; — PSD2 opened the doors but few build intelligent products on it. Personal finance, automated savings, SME cash flow prediction — all underserved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RegTech Automation&lt;/strong&gt; — e-Factura validation, SAF-T generation, tax optimization. Massive market from freelancers to enterprises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Romanian Financial NLP&lt;/strong&gt; — Huge gap in domain-specific Romanian models for finance/legal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Accounting&lt;/strong&gt; — ~70,000 Romanian accounting firms still semi-manual. Auto-categorization, reconciliation, and declaration generation would be transformative.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Want to dive deeper? &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; covers AI applications across finance, business, and tech — 28 professional courses in Romanian, each with an integrated AI tutor 24/7.&lt;/p&gt;




&lt;h2&gt;
  
  
  📈 The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fintech sector: &lt;strong&gt;34% YoY&lt;/strong&gt; growth in transaction volume&lt;/li&gt;
&lt;li&gt;e-Factura: &lt;strong&gt;200M+ invoices/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Banking IT spending: &lt;strong&gt;+28%&lt;/strong&gt; in two years&lt;/li&gt;
&lt;li&gt;EU AI Act compliance: creating a new wave of demand for regulation-aware AI engineers&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Romania's financial system is at an inflection point. Mandatory digitization + EU regulation + strong dev community = AI isn't optional, it's required. Whether you're building fraud models, automating tax compliance, or creating Romanian-language financial assistants — the demand is real and growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your experience with AI in financial systems? Drop a comment 👇&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Learn AI hands-on, in Romanian: &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — 28 professional courses from AI Engineering to Finance AI, each with a 24/7 AI tutor built into every lesson.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
