<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: galian</title>
    <description>The latest articles on DEV Community by galian (@galian).</description>
    <link>https://dev.to/galian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827330%2F5a53ab61-2fc1-4072-a44e-873913dd8cd7.png</url>
      <title>DEV Community: galian</title>
      <link>https://dev.to/galian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/galian"/>
    <language>en</language>
    <item>
      <title>Claude Sonnet 5 Just Made Running Agents Cheap — What Builders Actually Need to Know</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 30 Jun 2026 22:06:37 +0000</pubDate>
      <link>https://dev.to/galian/claude-sonnet-5-just-made-running-agents-cheap-what-builders-actually-need-to-know-11j7</link>
      <guid>https://dev.to/galian/claude-sonnet-5-just-made-running-agents-cheap-what-builders-actually-need-to-know-11j7</guid>
      <description>&lt;p&gt;Anthropic shipped &lt;strong&gt;Claude Sonnet 5&lt;/strong&gt; on June 30, 2026, and the framing in the announcement is unusually blunt for a model launch: it's pitched as the most &lt;em&gt;agentic&lt;/em&gt; Sonnet yet — a model built to make plans, drive tools like browsers and terminals, and run autonomously at a level that, a few months ago, took something bigger and more expensive.&lt;/p&gt;

&lt;p&gt;For anyone building on top of these models — agents, pipelines, coding tools — that's the headline that matters. Not "it's smarter," but "near-frontier capability just got cheaper to run in a loop." I write and teach about agentic engineering at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, Eastern Europe's AI education platform, so I'll keep this grounded in what changes for people who actually ship on these APIs — not the launch-day benchmark theater.&lt;/p&gt;

&lt;p&gt;One disclaimer up front: model pricing and availability in this space change almost monthly, and this is a day-one snapshot. Verify the current numbers on Anthropic's official pages before you wire anything to a budget. I'm deliberately &lt;em&gt;not&lt;/em&gt; quoting benchmark scores here — the launch materials presented them in a way that's easy to misread, so for hard numbers go straight to the &lt;a href="https://www.anthropic.com/claude-sonnet-5-system-card" rel="noopener noreferrer"&gt;Sonnet 5 System Card&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one-sentence version
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Sonnet 5 moves "good enough to run agents autonomously" down a price tier — and ships a new tokenizer that can quietly inflate your token counts by up to 35%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both halves of that sentence matter, and the second one is the part nobody puts on a launch slide. Let's take them in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually new
&lt;/h2&gt;

&lt;p&gt;Stripping the marketing down to verifiable claims from Anthropic's own announcement, here's what Sonnet 5 is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The most agentic Sonnet so far.&lt;/strong&gt; It's described as able to "make plans, use tools like browsers and terminals, and run autonomously," with improvements specifically in multi-step tool use — the exact workload that defines an agent rather than a chatbot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Close to Opus 4.8 — at a lower price.&lt;/strong&gt; Anthropic's own phrasing is that its "performance is close to that of Opus 4.8, but at lower prices." That's the whole pitch: most of the capability, a fraction of the cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A real step up from Sonnet 4.6.&lt;/strong&gt; Called a "substantial improvement over its predecessor, Sonnet 4.6, on important aspects of agentic performance like reasoning, tool use, coding, and knowledge work."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safer in agentic contexts.&lt;/strong&gt; Anthropic reports an "overall lower rate of undesirable behaviors than Sonnet 4.6," plus lower rates of hallucination and sycophancy — which matters more than it sounds when a model is acting in a loop without a human reading every step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliberately weaker at offensive cyber.&lt;/strong&gt; It shows "substantially poorer performance than models such as Opus 4.8" on dangerous cyber tasks and was "never able to develop a full working exploit." That's a safety design choice, not an oversight — worth knowing if security tooling is your domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two things Anthropic did &lt;strong&gt;not&lt;/strong&gt; publish that I'm not going to invent for you: an official &lt;strong&gt;context window&lt;/strong&gt; and &lt;strong&gt;max output token&lt;/strong&gt; figure for Sonnet 5 weren't stated in the launch materials at the time of writing. If you need those for capacity planning, pull them from the official API docs rather than trusting a blog (including this one). Guessing is how teams ship broken truncation logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The economics shift is the real story
&lt;/h2&gt;

&lt;p&gt;Here's why builders should care more than end users.&lt;/p&gt;

&lt;p&gt;When you chat with a model, price-per-token is almost noise — you send a few thousand tokens and read the answer. When you run an &lt;strong&gt;agent&lt;/strong&gt;, the model is in a loop: read context, call a tool, read the result, reason, call another tool, repeat. A single "task" can burn hundreds of thousands of tokens across dozens of turns. At that volume, the price-per-million-tokens line &lt;em&gt;is&lt;/em&gt; your unit economics.&lt;/p&gt;

&lt;p&gt;So a model that lands near Opus-4.8 quality at Sonnet pricing doesn't just make chat cheaper — it changes which agent designs are economically viable at all. Workflows you'd previously gate behind Opus (multi-step research, autonomous refactors, long tool-using runs) become defensible on a Sonnet budget. That's the unlock.&lt;/p&gt;

&lt;p&gt;Here's the day-one pricing picture, with the rest of the current Anthropic lineup for context:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input / 1M tokens&lt;/th&gt;
&lt;th&gt;Output / 1M tokens&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Sonnet 5&lt;/strong&gt; (intro, through Aug 31 2026)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Promotional launch pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Sonnet 5&lt;/strong&gt; (standard, from Sep 1 2026)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$15&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same as Sonnet 4.6's tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.8&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;td&gt;Top accuracy; default in Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haiku 4.5&lt;/td&gt;
&lt;td&gt;$1&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;Cheapest / fastest tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few honest notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;introductory $2 / $10&lt;/strong&gt; runs through &lt;strong&gt;August 31, 2026&lt;/strong&gt;, then settles to &lt;strong&gt;$3 / $15&lt;/strong&gt; — the same standard tier Sonnet has occupied. So the long-run story isn't "Sonnet got cheaper"; it's "the Sonnet tier got dramatically more capable for the same price."&lt;/li&gt;
&lt;li&gt;Sonnet 5 is the &lt;strong&gt;default model on Free and Pro plans&lt;/strong&gt;, and is available to Max, Team, and Enterprise users — in Claude Code, the Claude platform, and the API. So if you're on Claude Code, you may already be one model-switch away from it.&lt;/li&gt;
&lt;li&gt;Against Opus 4.8 the price ratio is roughly &lt;strong&gt;1.7×&lt;/strong&gt; (output $25 vs $15). When you're running agents at scale, that multiple compounds fast — which is exactly why the "close to Opus" claim is worth pressure-testing on &lt;em&gt;your&lt;/em&gt; workload, not taking on faith.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The tokenizer gotcha that will mess up your cost math
&lt;/h2&gt;

&lt;p&gt;This is the part I most want builders to internalize, because it's the easiest way to get a nasty surprise on your next invoice.&lt;/p&gt;

&lt;p&gt;Sonnet 5 ships with an &lt;strong&gt;updated tokenizer&lt;/strong&gt;. Anthropic states that the same input text now maps to &lt;strong&gt;roughly 1.0–1.35× as many tokens&lt;/strong&gt; as before, depending on content type. Read that again: identical prompts can cost up to &lt;strong&gt;35% more tokens&lt;/strong&gt; on Sonnet 5 than the token count you measured on an older model — &lt;em&gt;before&lt;/em&gt; any change in per-token price.&lt;/p&gt;

&lt;p&gt;Why it bites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your cost dashboards, budget alerts, and per-request estimates were calibrated on the old tokenizer. Swap the model without re-measuring and your "same" workload silently costs more.&lt;/li&gt;
&lt;li&gt;Code, structured data (JSON/XML), and non-English text tend to sit at the higher end of that multiplier — and those are precisely the inputs agentic and coding workloads are made of.&lt;/li&gt;
&lt;li&gt;It interacts with context windows and truncation: more tokens for the same text means you hit limits sooner than your old math predicts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The fix is boring and non-negotiable: re-baseline.&lt;/strong&gt; Before you flip production traffic to Sonnet 5, measure real token counts on a representative sample of &lt;em&gt;your&lt;/em&gt; prompts with the new tokenizer, recompute cost per task, and update your budgets and alerts. The headline price drop is real — but the effective saving is &lt;code&gt;(price delta) × (token inflation)&lt;/code&gt;, and you can't know the second factor without measuring. Anyone who tells you "it's 33% cheaper" did half the arithmetic.&lt;/p&gt;

&lt;p&gt;This is also where good &lt;strong&gt;evals&lt;/strong&gt; earn their keep. A model swap isn't just a cost change; it's a behavior change. Run your task suite on Sonnet 5 against the model you're replacing before you commit — quality, tool-call success rate, and cost together. If you don't have an eval harness yet, this is the launch that should convince you to build one; it's a discipline we treat as core, not optional, in our &lt;a href="https://cursuri-ai.ro/courses/ai-evals-llm-productie" rel="noopener noreferrer"&gt;course on building LLM evals for production&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to still reach for Opus 4.8
&lt;/h2&gt;

&lt;p&gt;"Close to Opus" is not "Opus." The honest read on where Sonnet 5 fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reach for Sonnet 5&lt;/strong&gt; as your default agent workhorse: high-volume tool-using loops, coding assistance, research and summarization, anything where you're paying per turn and the marginal quality of Opus isn't worth ~1.7× the output cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay on Opus 4.8&lt;/strong&gt; for the hardest reasoning, the highest-stakes accuracy, and security-sensitive work where Sonnet 5 is &lt;em&gt;intentionally&lt;/em&gt; weaker (offensive-cyber tasks). When a wrong answer is expensive, the price gap is cheap insurance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern most production teams land on isn't "pick one." It's a &lt;strong&gt;router&lt;/strong&gt;: Sonnet 5 handles the bulk of turns, and you escalate to Opus 4.8 for the steps that genuinely need it — with a human in the loop on the consequential ones. Getting that routing logic right (and knowing which task belongs in which tier) is a real engineering skill, and it's the through-line of our &lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;model-comparison course&lt;/a&gt;, which treats "which model for which job" as a decision you make with data rather than vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  A pragmatic migration checklist
&lt;/h2&gt;

&lt;p&gt;If you're considering moving an agent or pipeline to Sonnet 5, here's the order I'd do it in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Re-baseline tokens.&lt;/strong&gt; Run a representative sample through the new tokenizer. Recompute cost per task. Update budget alerts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run your evals.&lt;/strong&gt; Quality, tool-call success, latency, and cost, head-to-head against the model you're replacing. No eval suite? Build a small one first — even 30 representative tasks beats a gut call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shadow, then canary.&lt;/strong&gt; Route a slice of real traffic to Sonnet 5, compare outputs, then scale gradually. Don't flip 100% on day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep an escalation path.&lt;/strong&gt; Wire Opus 4.8 as the fallback for tasks that fail Sonnet 5's quality bar. Routing beats an all-or-nothing bet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-read your safety posture.&lt;/strong&gt; Lower hallucination and sycophancy is good news for autonomous runs, but "safer" isn't "supervise nothing." Keep guardrails and human checkpoints where consequences are real.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is exotic. It's the same discipline that separates teams who run agents in production from teams who demo them — and it's exactly the muscle we build in our hands-on track on &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI agents and automation&lt;/a&gt;, taught around real repositories rather than toy notebooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Claude Sonnet 5 better than Opus 4.8?
&lt;/h3&gt;

&lt;p&gt;Not across the board. Anthropic positions Sonnet 5's performance as &lt;em&gt;close to&lt;/em&gt; Opus 4.8 at a lower price — so for high-volume agentic and coding work it's often the better &lt;em&gt;value&lt;/em&gt;, but Opus 4.8 still leads on the hardest reasoning, top-end accuracy, and (deliberately) on offensive-cyber capability. Match the tier to the task instead of picking a favorite.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does Claude Sonnet 5 cost?
&lt;/h3&gt;

&lt;p&gt;It launched with introductory pricing of &lt;strong&gt;$2 per million input tokens and $10 per million output tokens through August 31, 2026&lt;/strong&gt;, then moves to a standard &lt;strong&gt;$3 / $15&lt;/strong&gt; — the same tier Sonnet 4.6 occupied. Your &lt;em&gt;effective&lt;/em&gt; cost also depends on the new tokenizer (see below), so measure before you budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the new tokenizer really change my costs?
&lt;/h3&gt;

&lt;p&gt;Yes. Anthropic states the same input can map to roughly &lt;strong&gt;1.0–1.35× as many tokens&lt;/strong&gt; under Sonnet 5's updated tokenizer, depending on content type — code and structured data sit at the higher end. Re-measure your real prompts before assuming the headline price drop equals your actual saving.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Sonnet 5 in Claude Code?
&lt;/h3&gt;

&lt;p&gt;Yes. It's available in Claude Code, the Claude platform, and the API, and it's the default model on Free and Pro plans (and available to Max, Team, and Enterprise). If you're already in Claude Code, switching is a model selection, not a migration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I migrate my agents to Sonnet 5 immediately?
&lt;/h3&gt;

&lt;p&gt;Don't flip production on day one. Re-baseline token counts, run your eval suite head-to-head against your current model, then canary a slice of traffic before scaling — and keep an escalation path to Opus 4.8 for tasks that need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill underneath the model
&lt;/h2&gt;

&lt;p&gt;Here's the part the launch posts skip: a cheaper, more agentic model doesn't make anyone a better builder. It just makes the &lt;em&gt;consequences&lt;/em&gt; of your design bigger — cheaper to be right at scale, and cheaper to be confidently wrong at scale. Point Sonnet 5's autonomy at a vague spec and you get a fast, plausible wall of actions you didn't design and can't fully audit.&lt;/p&gt;

&lt;p&gt;The developers getting real leverage from this launch aren't the ones who memorized the new price-per-token. They're the ones who understand agent architecture, context engineering, evals, and cost modeling well enough to know &lt;em&gt;when&lt;/em&gt; the cheap-and-autonomous option is the right call and when it's a trap. That foundation — taught around real repositories with an interactive AI instructor, not slide decks — is what we build at our &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Eastern European AI education platform&lt;/a&gt;, including a dedicated, hands-on track on &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;agentic coding with Claude Code&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Claude Sonnet 5 is a genuinely significant release for builders, but not for the reason most coverage leads with. The story isn't a benchmark number — it's that near-frontier agentic capability just moved down a price tier, which changes which agent designs are economically worth shipping. The catch is the new tokenizer: the real saving is the price drop &lt;em&gt;minus&lt;/em&gt; token inflation, and you only learn the second number by measuring.&lt;/p&gt;

&lt;p&gt;So don't migrate on the headline. Re-baseline your tokens, run your evals, canary your traffic, and keep Opus 4.8 one route away for the work that needs it. Do that, and Sonnet 5 is one of the better deals in the 2026 model lineup. Skip it, and you'll find out the hard way — on your invoice.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by the team at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — practical, hands-on AI engineering courses for developers and professionals across Eastern Europe, from agentic coding and AI agents to evals, context engineering, and the modern AI-native workflow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://www.anthropic.com/news/claude-sonnet-5" rel="noopener noreferrer"&gt;Introducing Claude Sonnet 5 — Anthropic&lt;/a&gt; · &lt;a href="https://www.anthropic.com/claude-sonnet-5-system-card" rel="noopener noreferrer"&gt;Claude Sonnet 5 System Card&lt;/a&gt; · &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;Claude Platform — Pricing&lt;/a&gt; · &lt;a href="https://claude.com/pricing" rel="noopener noreferrer"&gt;Claude Pricing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>cursor</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cursor vs GitHub Copilot vs Claude Code: Which AI Coding Tool in 2026?</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 29 Jun 2026 14:59:29 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/cursor-vs-github-copilot-vs-claude-code-which-ai-coding-tool-in-2026-6c8</link>
      <guid>https://dev.to/cursuri-ai/cursor-vs-github-copilot-vs-claude-code-which-ai-coding-tool-in-2026-6c8</guid>
      <description>&lt;p&gt;If you write code for a living in 2026, you're not asking &lt;em&gt;whether&lt;/em&gt; to use an AI coding tool — you're asking &lt;em&gt;which one&lt;/em&gt;. And the three names that dominate every team's Slack debate are &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;GitHub Copilot&lt;/strong&gt;, and &lt;strong&gt;Claude Code&lt;/strong&gt;. They look similar from a distance (type intent, get code) but they're built on three genuinely different bets about how software gets written.&lt;/p&gt;

&lt;p&gt;I've spent serious time in all three on real, multi-file, multi-repo work — not toy demos — and this is the comparison I wish someone had handed me before I burned a month figuring it out. I write and teach about agentic engineering at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, Eastern Europe's AI education platform, so I'll keep this grounded in how these tools actually behave in production, not in launch-day marketing.&lt;/p&gt;

&lt;p&gt;A note before we start: pricing and features in this category change almost monthly. Everything below is a mid-2026 snapshot — verify the current numbers on each tool's official page before you budget for a team.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR — three different philosophies
&lt;/h2&gt;

&lt;p&gt;Here's the one-sentence version of each, before we go deep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; is an &lt;strong&gt;AI-native editor&lt;/strong&gt; — it rebuilt the IDE around the agent. Best for developers who want fast, fluid, in-the-flow generation with deep editor integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Copilot&lt;/strong&gt; is the &lt;strong&gt;ecosystem play&lt;/strong&gt; — it lives where your code, issues, and PRs already are. Best for teams standardized on GitHub who want AI woven through the whole SDLC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; is the &lt;strong&gt;terminal-first agent&lt;/strong&gt; — it treats the command line as the primary surface and excels at autonomous, multi-step, multi-file work. Best for engineers comfortable orchestrating agents rather than babysitting autocomplete.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of them is "the best." They optimize for different moments, and the real skill is knowing which to reach for. Let's break down why.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Cursor?
&lt;/h2&gt;

&lt;p&gt;Cursor is an AI-native IDE built as a fork of VS Code, so the editor feels instantly familiar — your extensions, keybindings, and themes mostly carry over. What's different is that the AI isn't bolted on as a plugin; the whole editing experience is designed around it.&lt;/p&gt;

&lt;p&gt;Its signature features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tab completion&lt;/strong&gt; — a multi-line, context-aware autocomplete that predicts your &lt;em&gt;next edit&lt;/em&gt;, not just the next token. It's the feature people miss most when they switch away.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composer&lt;/strong&gt; — Cursor's agentic, multi-file editing mode. You describe a change in natural language and it edits across files, runs commands, and iterates. Cursor now ships &lt;strong&gt;Composer 2.5&lt;/strong&gt;, its own model trained specifically for agentic coding, alongside routing to frontier models from Anthropic, OpenAI, and Google.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Agents&lt;/strong&gt; — introduced in the Cursor 3.5 release (May 20, 2026), these run in isolated cloud VMs with terminal and browser access, can work across multiple repos in parallel, and report results back to your IDE asynchronously. It's Cursor's answer to "I want the agent working while I do something else."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor's center of gravity is &lt;strong&gt;in-the-flow coding&lt;/strong&gt;: you stay in the editor, you see every diff, and the AI keeps pace with your thinking. It rewards developers who want speed without giving up granular control over the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is GitHub Copilot?
&lt;/h2&gt;

&lt;p&gt;Copilot is the most widely deployed of the three, and its biggest advantage is gravitational: it lives inside the tools and platform most teams already use. It runs in VS Code, JetBrains IDEs, Visual Studio, and on GitHub itself.&lt;/p&gt;

&lt;p&gt;By 2026 Copilot has grown well past autocomplete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent mode&lt;/strong&gt; became generally available across both VS Code and JetBrains in March 2026 (previously VS Code only) — a multi-step agent that plans, edits across files, and runs commands inside your editor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The autonomous coding agent&lt;/strong&gt; is the standout. You assign a GitHub issue to Copilot, and it works asynchronously in the background — analyzing the repo, making changes, and opening a ready-to-review pull request. Assign, walk away, come back to a PR. It's the closest any mainstream tool comes to "fire-and-forget" feature work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic code review&lt;/strong&gt; gathers full project context before suggesting changes and can hand fixes straight to the coding agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Spark&lt;/strong&gt; lets you describe an app in plain English and get generated code with a live preview.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strategic point: Copilot's value isn't any single feature — it's that AI is now threaded through the entire GitHub-centric SDLC, from issue to PR to review. If your team lives on GitHub, that integration is hard to beat.&lt;/p&gt;

&lt;p&gt;One billing change worth flagging: as of June 1, 2026, GitHub moved to &lt;strong&gt;GitHub AI Credits&lt;/strong&gt; (token-based billing) in place of the older Premium Request Units. You're now billed by tokens processed at published model rates, which makes heavy agent usage more transparent — and easier to accidentally overspend if you're not watching.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Claude Code?
&lt;/h2&gt;

&lt;p&gt;Claude Code, from Anthropic, takes the opposite stance from Cursor: instead of building an editor, it makes the &lt;strong&gt;terminal&lt;/strong&gt; the primary surface (with IDE extensions available on top). That sounds minimalist until you see what it does with full shell access.&lt;/p&gt;

&lt;p&gt;Its defining strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic, multi-file, repo-aware work&lt;/strong&gt; from the command line — it reads your codebase, makes coordinated changes across many files, runs your tests, and handles git operations and CI-aware workflows natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents&lt;/strong&gt; — reusable agent configurations with their own custom prompts and tool access, so you can define a "reviewer," a "test-writer," or a "migration" agent and invoke it on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent teams and multi-agent orchestration&lt;/strong&gt; — coordinate multiple agent sessions working in parallel, with an agent view dashboard to manage them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code runs on Anthropic's models — currently Claude Opus 4.8 as the default, with the newer Claude Fable 5 as the most capable tier — and it's deliberately model-opinionated rather than a router. The tradeoff is real: it's the most powerful for autonomous, complex tasks, and the least hand-holdy. It assumes you're comfortable thinking like an &lt;em&gt;orchestrator of agents&lt;/em&gt; rather than a writer of lines.&lt;/p&gt;

&lt;p&gt;A word of caution that applies to every agent platform but bites hardest here: &lt;strong&gt;parallel agents multiply your token spend.&lt;/strong&gt; Running ten agents at once consumes your quota roughly ten times faster. The autonomy is exhilarating; the bill is real. Set limits before you scale up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-head: the dimensions that actually matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The editing model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; wins on &lt;em&gt;in-editor flow&lt;/em&gt;. Tab completion and inline diffs keep you in control of every change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; wins on &lt;em&gt;breadth of surface&lt;/em&gt; — it's good everywhere your code already is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; wins on &lt;em&gt;autonomous depth&lt;/em&gt; — it goes furthest without supervision, but you give up the inline, line-by-line feel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Agents and autonomy
&lt;/h3&gt;

&lt;p&gt;All three now have agents, but the philosophy differs. Cursor's Cloud Agents and Copilot's coding agent are both "assign work, get a result later." Claude Code goes further with explicit multi-agent orchestration and reusable subagents. If your work is increasingly &lt;em&gt;delegating&lt;/em&gt; rather than &lt;em&gt;typing&lt;/em&gt;, this is the dimension to weigh most — and it's exactly the shift that makes understanding &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI agent architecture and automation&lt;/a&gt; a genuine career edge rather than a nice-to-have.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem and integration
&lt;/h3&gt;

&lt;p&gt;This is Copilot's home turf. The issue-to-PR loop, native code review, and presence across every major IDE make it the path of least resistance for GitHub-standardized teams. Cursor integrates deeply but inside &lt;em&gt;its&lt;/em&gt; editor; Claude Code integrates deeply with your &lt;em&gt;shell and git&lt;/em&gt;, which is either liberating or intimidating depending on your comfort with the command line.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models
&lt;/h3&gt;

&lt;p&gt;Cursor routes across many frontier models and adds its own Composer model. Copilot offers a model picker. Claude Code is Anthropic-only by design. If model choice matters to you (and for some workloads it genuinely does), Cursor and Copilot give you more knobs; Claude Code bets that a tightly-integrated, top-tier model beats a buffet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing, side by side (mid-2026 snapshot)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Entry&lt;/th&gt;
&lt;th&gt;Mid tier&lt;/th&gt;
&lt;th&gt;Power / team&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hobby (free)&lt;/td&gt;
&lt;td&gt;Pro — $20/user/mo&lt;/td&gt;
&lt;td&gt;Teams — $40/user/mo (Standard), $120/user/mo (Premium)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Pro — $10/mo · Pro+ — $39/mo&lt;/td&gt;
&lt;td&gt;Max — $100/mo · Business / Enterprise seats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pro — $20/mo&lt;/td&gt;
&lt;td&gt;Max 5× — $100/mo&lt;/td&gt;
&lt;td&gt;Max 20× — $200/mo · API pay-per-token&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few honest caveats on cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; has the cheapest entry paid tier ($10), but token-based AI Credits mean heavy agent use can climb fast beyond the included allotment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor's&lt;/strong&gt; $20 Pro includes a fixed amount of frontier-model usage; power users hit the ceiling and either upgrade or switch to its cheaper Auto/Composer routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code's&lt;/strong&gt; Max tiers are priced for sustained, agent-heavy sessions — and again, parallel agents are a multiplier, not an add.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prices and tiers shift constantly in this category. Treat the table as a snapshot, not a quote, and confirm before committing a team budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which one should you choose?
&lt;/h2&gt;

&lt;p&gt;Here's the honest, persona-based answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if&lt;/strong&gt; you want the best in-editor experience, you value fast inline generation and tight control over every diff, and you're happy living inside a (very good) VS Code fork. It's the most natural upgrade for a developer who loves their editor and wants AI to keep pace with their flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose GitHub Copilot if&lt;/strong&gt; your team is standardized on GitHub and you want AI woven through the entire lifecycle — issues, PRs, reviews — across whatever IDEs your team already uses. The issue-to-PR autonomous agent alone can change how a team ships. It's the safest institutional bet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude Code if&lt;/strong&gt; you're comfortable in the terminal, your work skews toward complex multi-file refactors and autonomous tasks, and you want to orchestrate agents rather than supervise autocomplete. It has the highest ceiling for autonomy — and asks the most of you in return.&lt;/p&gt;

&lt;p&gt;And the answer most senior engineers actually land on? &lt;strong&gt;More than one.&lt;/strong&gt; Plenty of us keep Cursor open for flow-state editing, lean on Copilot inside the GitHub workflow, and fire up Claude Code for the gnarly autonomous jobs. The tools overlap, but they're not redundant — they're a toolkit. The real meta-skill isn't loyalty to one editor; it's &lt;strong&gt;fluency across the category&lt;/strong&gt; so you instinctively reach for the right one per task.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill underneath the tools
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth that the demos hide: these tools amplify the engineer you already are. Point a powerful agent at a vague intent and you get a fast, confident wall of code you didn't design and can't fully maintain. The developers getting outsized leverage from Cursor, Copilot, and Claude Code aren't the ones who learned the keyboard shortcuts — they're the ones who understand agent architecture, context engineering, and how to specify intent precisely enough that autonomy becomes an asset instead of a liability.&lt;/p&gt;

&lt;p&gt;That foundation is exactly what we build at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI education platform&lt;/a&gt; for Eastern Europe — practical, project-based courses taught around real repositories with an interactive AI instructor, not slide decks. If you want to go from "I use these tools" to "I get serious leverage from them," we maintain dedicated, hands-on tracks for &lt;a href="https://cursuri-ai.ro/courses/cursor-pro" rel="noopener noreferrer"&gt;using Cursor as a pro&lt;/a&gt; and for &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;agentic coding with Claude Code&lt;/a&gt; — both built around real multi-file, real-repo workflows rather than toy examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In 2026, "AI coding tool" isn't one product category — it's three philosophies wearing similar clothes. Cursor bet on the editor, Copilot bet on the ecosystem, and Claude Code bet on the terminal-native agent. Each is genuinely excellent at the thing it optimized for, and genuinely compromised at the things it didn't.&lt;/p&gt;

&lt;p&gt;So don't ask "which is best." Ask "best at what, for whom, doing which task" — and then build the judgment to switch fluently between them. That judgment, not the tool, is what compounds over a career. Try each one on a real feature, not a demo, and you'll feel the differences fast.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by the team at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — practical, hands-on AI engineering courses for developers and professionals across Eastern Europe, from agentic coding and AI agents to context engineering and the modern AI-native IDE workflow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://cursor.com/docs/models-and-pricing" rel="noopener noreferrer"&gt;Cursor Models &amp;amp; Pricing&lt;/a&gt; · &lt;a href="https://github.com/features/copilot/plans" rel="noopener noreferrer"&gt;GitHub Copilot Plans &amp;amp; Pricing&lt;/a&gt; · &lt;a href="https://docs.github.com/en/copilot/get-started/plans" rel="noopener noreferrer"&gt;GitHub Copilot Plans (Docs)&lt;/a&gt; · &lt;a href="https://claude.com/pricing" rel="noopener noreferrer"&gt;Claude Pricing&lt;/a&gt; · &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;Claude Platform Docs — Pricing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Kiro: A Practical Guide to AWS's Spec-Driven Agentic IDE"</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Fri, 26 Jun 2026 21:45:36 +0000</pubDate>
      <link>https://dev.to/galian/kiro-a-practical-guide-to-awss-spec-driven-agentic-ide-26o9</link>
      <guid>https://dev.to/galian/kiro-a-practical-guide-to-awss-spec-driven-agentic-ide-26o9</guid>
      <description>&lt;p&gt;If you've spent any time with AI coding assistants, you know the failure mode: you write a vague prompt, the agent generates a wall of plausible-looking code, and twenty minutes later you're debugging something you didn't design and don't fully understand. &lt;strong&gt;Kiro&lt;/strong&gt;, the agentic IDE from AWS, is a bet that the fix isn't a smarter autocomplete — it's making a &lt;em&gt;specification&lt;/em&gt; the unit of work instead of a prompt.&lt;/p&gt;

&lt;p&gt;I've been digging into how Kiro actually works, and this is the practical guide I wish I'd had on day one: what spec-driven development really means, how agent hooks and steering files change your workflow, where Kiro fits next to tools like Cursor and Claude Code, and when it's worth it. I write and teach about agentic engineering at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, Eastern Europe's AI education platform, so I'll keep this grounded in how these tools behave in real projects rather than in launch-day hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Kiro?
&lt;/h2&gt;

&lt;p&gt;Kiro is an agentic IDE built on the Code OSS platform — the same open-source foundation behind VS Code — which means the editor itself feels immediately familiar. What's different is the engine. Instead of treating each request as a one-off chat turn, Kiro is designed to turn a high-level prompt into a structured &lt;strong&gt;spec&lt;/strong&gt;, then drive implementation, tests, and documentation from that spec.&lt;/p&gt;

&lt;p&gt;The headline idea, in Kiro's own framing, is "moving beyond AI coding to agentic engineering." That sounds like marketing until you see the artifacts it produces. A feature request doesn't become a blob of code — it becomes three reviewable files: requirements, design, and tasks. You stay in the loop at each stage. The agent does the typing; you keep the judgment.&lt;/p&gt;

&lt;p&gt;It's worth being precise about what Kiro is &lt;em&gt;not&lt;/em&gt;: it isn't an AWS cloud service you provision in a console, and it doesn't lock you into AWS infrastructure to write code. It's a desktop IDE. You can point it at any project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec-driven development: the core idea
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools optimize for speed-to-first-keystroke. Spec-driven development optimizes for &lt;em&gt;correctness-to-intent&lt;/em&gt; — does the code match what you actually meant? Kiro does this by formalizing the part of engineering we usually skip when we're moving fast: writing down what we're building before we build it.&lt;/p&gt;

&lt;p&gt;When you describe a feature, Kiro generates a spec in three phases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Requirements
&lt;/h3&gt;

&lt;p&gt;Kiro turns your prompt into user stories with explicit acceptance criteria, written in &lt;strong&gt;EARS notation&lt;/strong&gt; (Easy Approach to Requirements Syntax). EARS is a lightweight, real technique for writing testable requirements — patterns like &lt;em&gt;"When [trigger], the system shall [response]"&lt;/em&gt;. The value is that ambiguity gets surfaced &lt;em&gt;before&lt;/em&gt; code exists. If your one-line prompt was underspecified, you'll see it in the requirements draft and can correct it in seconds, not after a debugging session.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Design
&lt;/h3&gt;

&lt;p&gt;Next, Kiro produces a technical design: the architecture, the components, the data flow, and the implementation approach. This is the document a senior engineer would normally write (or wish a junior had written) before touching the codebase. You review it, push back, and refine.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tasks
&lt;/h3&gt;

&lt;p&gt;Finally, the design becomes a sequenced task list — discrete, trackable units of work the agent implements in order. Because tasks are explicit, you get accountability: you can see what's done, what's in progress, and what's left, instead of trusting a black box.&lt;/p&gt;

&lt;p&gt;The payoff is maintainability. A spec that lives in your repo is documentation that doesn't rot, because it &lt;em&gt;is&lt;/em&gt; the thing the agent built from. Six months later, the requirements and design files explain &lt;em&gt;why&lt;/em&gt; the code looks the way it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent hooks: automation that runs itself
&lt;/h2&gt;

&lt;p&gt;The second pillar is &lt;strong&gt;agent hooks&lt;/strong&gt; — automated triggers that fire agent prompts or shell commands when something happens in your IDE. Instead of remembering to run the linter, regenerate tests, or scan for secrets, you wire those actions to events once and forget about them.&lt;/p&gt;

&lt;p&gt;Hooks can be triggered by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File events&lt;/strong&gt; — a file is created, saved, or deleted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt and agent lifecycle events&lt;/strong&gt; — prompt submit, agent stop, pre/post tool use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec task events&lt;/strong&gt; — before or after a task executes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual triggers&lt;/strong&gt; — a button you press on demand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, hooks are just JSON files. Workspace-level hooks live in &lt;code&gt;.kiro/hooks/&lt;/code&gt;, and user-level hooks in &lt;code&gt;~/.kiro/hooks/&lt;/code&gt;. You can create them three ways: describe what you want in plain English and let Kiro generate the JSON, fill out a form, or write the JSON by hand. The practical version of this: every time you save a file, a hook can run your tests and a security scan automatically, so problems surface the moment they're introduced — not in CI an hour later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steering files: stop repeating yourself
&lt;/h2&gt;

&lt;p&gt;If you've ever pasted "remember, we use tabs not spaces, we use Vitest not Jest, and never import from the legacy module" into chat for the hundredth time, &lt;strong&gt;steering files&lt;/strong&gt; are the fix. Steering gives Kiro persistent knowledge about your project through markdown files, so your conventions, libraries, and standards are applied consistently without re-explaining them every session.&lt;/p&gt;

&lt;p&gt;Steering files can be scoped two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workspace steering&lt;/strong&gt; lives in &lt;code&gt;.kiro/steering/&lt;/code&gt; and applies only to that project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global steering&lt;/strong&gt; applies across everything you build&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is essentially context engineering applied to a coding agent — encoding the durable knowledge an agent needs so it behaves like a teammate who's read your style guide, not a contractor seeing the repo for the first time. If you want to go deep on the discipline behind this, persistent memory and context strategy for agents is exactly what our &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;context engineering and agent memory course&lt;/a&gt; covers end to end.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP and agentic chat
&lt;/h2&gt;

&lt;p&gt;Beyond specs, hooks, and steering, Kiro ships the features you'd expect from a modern AI editor. It supports the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; for connecting external tools and data sources to the agent, and it includes an agentic chat with context providers for files, URLs, and docs for the ad-hoc work that doesn't justify a full spec.&lt;/p&gt;

&lt;p&gt;MCP support matters more than it sounds. It's the open standard that lets an agent reach your database, your ticketing system, your internal docs — without bespoke glue for each one. If MCP is new to you, building and integrating MCP servers is its own skill set; our &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP course&lt;/a&gt; walks through standing up real servers and wiring them into agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your first hour with Kiro
&lt;/h2&gt;

&lt;p&gt;The fastest way to understand the workflow is to feel the loop once on a real, small feature. In practice it looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Describe the feature in plain language.&lt;/strong&gt; Not "build an app" — something concrete like "add an endpoint that returns a user's last five orders, paginated."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review the requirements.&lt;/strong&gt; Kiro drafts user stories and acceptance criteria. This is where you catch the ambiguity: did you mean five orders total, or five per page? Fix it in the spec, where it costs nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review the design.&lt;/strong&gt; Check that the proposed architecture matches your codebase's conventions — and if it doesn't, that's a sign your steering files need to capture those conventions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let it work the task list.&lt;/strong&gt; The agent implements tasks in sequence; you watch and intervene where judgment is needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire one hook.&lt;/strong&gt; Even a single "run tests on save" hook changes how the session feels — feedback becomes immediate instead of deferred.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Do that once and the abstract pitch — "specs as the unit of work" — turns concrete. The discipline isn't heavy; it's front-loaded, and the front-loading is where the bugs you didn't ship were quietly avoided.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kiro vs. vibe coding
&lt;/h2&gt;

&lt;p&gt;"Vibe coding" — prompting your way to an app on feel, accepting whatever the model produces — is genuinely useful for prototypes, throwaway scripts, and learning. It's also where a lot of teams get burned when that "prototype" quietly becomes production.&lt;/p&gt;

&lt;p&gt;Kiro is, in a sense, the structured opposite. The spec phase forces the requirements-and-design thinking that vibe coding skips. That doesn't make vibe coding wrong — it makes them tools for different moments. Reaching for a spec to build a one-off script is overkill; vibe-coding a payment flow is asking for trouble. Knowing &lt;em&gt;which mode fits which task&lt;/em&gt; is the actual skill, and it's the through-line of our &lt;a href="https://cursuri-ai.ro/courses/vibe-coding-prompt-la-aplicatie" rel="noopener noreferrer"&gt;vibe coding course&lt;/a&gt;, which treats prompt-to-app speed and structured engineering as complementary, not rival, philosophies.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Kiro compares to other agentic tools
&lt;/h2&gt;

&lt;p&gt;Kiro isn't alone — the agentic IDE space is crowded, and the tools overlap. A few honest distinctions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; is an AI-native editor built around fast in-editor generation, multi-file edits, and an agent mode. Its center of gravity is fluid, in-the-flow coding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; is a terminal-first agentic tool that excels at multi-file changes, git operations, and CI-aware work from the command line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro&lt;/strong&gt; distinguishes itself by making the &lt;em&gt;spec&lt;/em&gt; the artifact — front-loading requirements and design before implementation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't mutually exclusive; plenty of engineers use more than one and switch by task. The meta-skill is fluency across the category rather than loyalty to one editor. If you want a structured path through these tools, we maintain dedicated, hands-on courses on &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;agentic coding with Claude Code&lt;/a&gt; and on &lt;a href="https://cursuri-ai.ro/courses/cursor-pro" rel="noopener noreferrer"&gt;Cursor as a pro&lt;/a&gt; — both built around real multi-file, real-repo workflows rather than toy demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;At the time of writing, Kiro uses a credit-based model measured in &lt;strong&gt;agent interactions&lt;/strong&gt;, with no daily or weekly rate limits and pre-paid overages so you don't hit a hard wall mid-task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt; — 50 agent interactions per user per month (fine for experimentation, not serious daily work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro&lt;/strong&gt; — $19 per user per month for 1,000 agent interactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro+&lt;/strong&gt; — $39 per user per month for 3,000 interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pricing and tiers for tools in this category change often, so verify the current numbers on Kiro's official pricing page before you budget for a team. Treat the figures above as a snapshot, not a contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Kiro is worth it — and when it isn't
&lt;/h2&gt;

&lt;p&gt;Spec-driven development has a cost: the spec phase is overhead. That overhead pays off when the work is durable and shared, and it's pure friction when the work is disposable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kiro shines when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building features meant to live and be maintained, not prototypes you'll throw away&lt;/li&gt;
&lt;li&gt;More than one person (or one agent) touches the codebase and conventions matter&lt;/li&gt;
&lt;li&gt;You want an auditable trail of &lt;em&gt;why&lt;/em&gt; the code is the way it is&lt;/li&gt;
&lt;li&gt;You're tired of re-explaining your standards every session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reach for something lighter when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're exploring, prototyping, or scripting something you'll delete tomorrow&lt;/li&gt;
&lt;li&gt;The task is small enough that writing the spec costs more than writing the code&lt;/li&gt;
&lt;li&gt;You just need a quick answer or a one-file change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest take: spec-driven development is a discipline, and Kiro is tooling that makes the discipline cheaper to follow. The tool won't supply the engineering judgment — it removes the excuse not to apply it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Want to go deeper?
&lt;/h2&gt;

&lt;p&gt;Tools like Kiro lower the cost of doing engineering properly, but they reward people who already understand specs, agent architecture, context management, and the MCP ecosystem underneath. That foundation is what turns an agentic IDE from a faster autocomplete into genuine leverage.&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt;, Eastern Europe's AI education platform, we build practical, project-based courses on exactly this stack — agentic coding, MCP, context engineering, and the modern AI-native IDE workflow — taught around real repositories with an interactive AI instructor, not slide decks. If Kiro made you curious about &lt;em&gt;agentic engineering&lt;/em&gt; as a craft rather than a buzzword, that's the rabbit hole our catalog is built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kiro's core bet is simple and, I think, correct: the bottleneck in AI-assisted development was never typing speed — it was the gap between what you meant and what the model built. By making specs the unit of work, adding agent hooks for automation and steering files for persistent context, Kiro turns "AI coding" into something closer to engineering with an agent.&lt;/p&gt;

&lt;p&gt;It won't replace judgment, and it isn't the right tool for every task. But for durable, maintainable software built with an AI in the loop, spec-driven development is a genuinely different — and more accountable — way to work. Try it on a real feature, not a toy, and you'll feel the difference fast.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by the team at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — practical, hands-on AI engineering courses for developers and professionals across Eastern Europe, from agentic coding and MCP to context engineering and AI-native IDE workflows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;kiro.dev&lt;/a&gt; · &lt;a href="https://kiro.dev/docs/specs/" rel="noopener noreferrer"&gt;Kiro Specs docs&lt;/a&gt; · &lt;a href="https://kiro.dev/docs/hooks/" rel="noopener noreferrer"&gt;Kiro Hooks docs&lt;/a&gt; · &lt;a href="https://kiro.dev/docs/steering/" rel="noopener noreferrer"&gt;Kiro Steering docs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Vibe-Checking Your LLM: A Developer's Guide to Evals</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 22 Jun 2026 08:24:22 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/stop-vibe-checking-your-llm-a-developers-guide-to-evals-3oed</link>
      <guid>https://dev.to/cursuri-ai/stop-vibe-checking-your-llm-a-developers-guide-to-evals-3oed</guid>
      <description>&lt;p&gt;You tweaked the system prompt, ran the same two test questions you always run, the answers looked good, and you shipped. A week later support is forwarding you screenshots of the model confidently doing the exact thing your prompt was supposed to stop. You never saw it, because "did it get better?" was answered by vibes.&lt;/p&gt;

&lt;p&gt;This is the single most common failure mode in shipping LLM features, and it has nothing to do with which model you picked. &lt;strong&gt;If your only quality gate is reading a handful of outputs and nodding, every change you make is a coin flip.&lt;/strong&gt; You can't tell whether a prompt edit helped, hurt, or just moved the failures somewhere you didn't look. Evals are how you replace the nod with a number.&lt;/p&gt;

&lt;p&gt;This is a practical guide to building that number — from a 30-row eval set you can write this afternoon, through code-based checks and LLM-as-judge scoring, to wiring the whole thing into CI so regressions get blocked instead of discovered by users. No new framework to adopt; just the discipline that separates a demo from a system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you can't just &lt;code&gt;assert output == expected&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Traditional tests work because the output space is small and exact. &lt;code&gt;add(2, 2)&lt;/code&gt; is &lt;code&gt;4&lt;/code&gt; or it's a bug. LLM output breaks all three assumptions that make &lt;code&gt;assertEqual&lt;/code&gt; work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's non-deterministic.&lt;/strong&gt; The same prompt can produce different text on two calls. Even at temperature 0 you are not guaranteed byte-identical output across runs or model versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's open-ended.&lt;/strong&gt; "Summarize this ticket" has thousands of correct answers. None of them are string-equal to your reference, and that's fine — a good summary isn't &lt;em&gt;the&lt;/em&gt; summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It fails softly.&lt;/strong&gt; A wrong answer isn't a stack trace. It's a fluent, plausible, well-formatted paragraph that happens to be incorrect. Nothing crashes. Nothing logs an error.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the goal of an eval isn't "is the output identical to the expected string." It's "does the output satisfy the &lt;em&gt;properties&lt;/em&gt; I care about" — is it grounded in the provided context, does it stay on policy, does it actually answer the question, is it valid JSON. You're testing behavior against criteria, not bytes against bytes. Once that clicks, the rest is mechanics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the eval set, not the metric
&lt;/h2&gt;

&lt;p&gt;The instinct is to reach for a fancy metric first. Wrong order. The asset that makes everything else work is a small, representative &lt;strong&gt;eval set&lt;/strong&gt;: a fixed collection of inputs paired with what a good output looks like (or the criteria a good output must meet). This is your golden dataset, your regression suite, your source of truth.&lt;/p&gt;

&lt;p&gt;You do not need thousands of examples to start. &lt;strong&gt;Thirty to fifty well-chosen pairs&lt;/strong&gt; turn LLM tuning from vibes into engineering, because now every change is measured against the same fixed bar. Build the set like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mine real failures.&lt;/strong&gt; Every time the system gets something wrong in dev or prod, that exact input goes into the eval set with a note on what the right behavior is. Your bug reports &lt;em&gt;are&lt;/em&gt; your test cases. This is the highest-signal source you have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cover the categories, not just the happy path.&lt;/strong&gt; Easy questions, ambiguous ones, adversarial ones, out-of-scope ones ("I don't know" is the correct answer and you should test that it says so), and the edge cases specific to your domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freeze it and version it.&lt;/strong&gt; The eval set lives in your repo next to the code. When you add a case, that's a commit. A moving target can't measure progress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a holdout.&lt;/strong&gt; If you start tuning prompts &lt;em&gt;against&lt;/em&gt; the eval set, you'll overfit to it. Keep a slice you don't look at until you think you're done.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal eval set is just data — JSON, a CSV, a Python list. Here's the shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/dataset.py
&lt;/span&gt;&lt;span class="n"&gt;EVAL_SET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund-window-basic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is our refund window?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds are accepted within 14 days of purchase.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_not_say&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no refunds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;out-of-scope&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Cluj tomorrow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds are accepted within 14 days of purchase.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# correct behavior: decline, don't invent
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 30-50 of these, grown from real failures
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the foundation. Everything below scores outputs against this set.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two halves of every LLM eval
&lt;/h2&gt;

&lt;p&gt;Separate two questions that get mushed together when you eval by eyeball, because they have different fixes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Did the system retrieve / set up the right context?&lt;/strong&gt; (a retrieval or pipeline question)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Given that context, did the model produce a good answer?&lt;/strong&gt; (a generation question)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building RAG, the first half is its own discipline — measuring recall@k and precision@k on questions with known relevant documents tells you whether the right chunk even reached the prompt. That's a deep enough topic that it deserves its own treatment; a dedicated &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;course on RAG and retrieval-augmented generation&lt;/a&gt; spends real time there, and the failure modes are different from the ones below. This guide focuses on the second half: scoring the generated answer. The techniques split into two families — &lt;strong&gt;code-based checks&lt;/strong&gt; and &lt;strong&gt;model-based judges&lt;/strong&gt; — and you want both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code-based checks: cheaper and more reliable than you think
&lt;/h2&gt;

&lt;p&gt;Before you reach for an LLM to grade an LLM, a surprising amount of quality is checkable with plain code. These checks are deterministic, free, instant, and never hallucinate. Use them for everything they can cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural validity.&lt;/strong&gt; If the output should be JSON matching a schema, validate it. A response that doesn't parse is a hard failure, no judgment call needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Must-contain / must-not-contain.&lt;/strong&gt; The answer about a 14-day refund window must contain "14" and must not contain "30." Keyword and regex assertions catch a whole class of factual regressions for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format and bounds.&lt;/strong&gt; Length limits, required citations present, no leaked system-prompt text, no forbidden phrases (the "as an AI language model" tax), valid enum values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic similarity.&lt;/strong&gt; For open-ended answers, embed the output and your reference answer and check cosine similarity passes a threshold. It's fuzzy, but it catches "the answer wandered off topic" without needing a judge model.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/checks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_structural&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;schema_keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_must_not_say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;banned&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;banned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule of thumb: &lt;strong&gt;anything a regex or a schema can catch, don't pay a model to catch.&lt;/strong&gt; Reserve the expensive, fuzzy judge for the genuinely subjective stuff.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM-as-judge: powerful, biased, and fixable
&lt;/h2&gt;

&lt;p&gt;For the subjective half — "is this answer faithful to the source?", "is this helpful?", "is the tone right?" — you use a strong model to grade outputs. This is &lt;strong&gt;LLM-as-judge&lt;/strong&gt;, and it scales human-quality judgment to thousands of examples for the price of an API call. Two metrics carry most of the weight for RAG-style apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness / groundedness&lt;/strong&gt; — does every claim in the answer trace back to the provided context, or did the model invent things? This is your hallucination detector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer relevance&lt;/strong&gt; — does the response actually address the question that was asked, or is it a fluent dodge?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The catch: &lt;strong&gt;LLM judges have well-documented biases&lt;/strong&gt;, and if you ignore them your eval numbers are noise dressed up as signal. The big ones, all reported in the research on using models as evaluators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Position bias&lt;/strong&gt; — when comparing two answers, judges favor the one shown first (or in a fixed slot) regardless of quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbosity bias&lt;/strong&gt; — judges tend to rate longer, more elaborate answers higher even when a short answer is more correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-preference&lt;/strong&gt; — a judge model can favor text written in its own style or by its own family.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't abandon the technique; you engineer around the bias:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Score against a rubric, not a vibe.&lt;/strong&gt; Ask for a 1–5 score with explicit criteria for each level, and require the judge to output its reasoning &lt;em&gt;before&lt;/em&gt; the score. A judge forced to justify itself is more consistent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For pairwise comparisons, randomize and swap.&lt;/strong&gt; Run each comparison twice with the order flipped; only count it as a win if the judge picks the same answer both times. This cancels position bias directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calibrate against humans.&lt;/strong&gt; Hand-label 20–30 examples yourself, run the judge on them, and check it agrees with you. If it doesn't, fix the rubric before trusting it on 2,000. An uncalibrated judge is a random number generator with good grammar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a strong model as the judge.&lt;/strong&gt; Grading is harder than answering. Use a current frontier model for the judge even if your app runs on a smaller, cheaper one.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# evals/judge.py — sketch of a rubric-based faithfulness judge
&lt;/span&gt;&lt;span class="n"&gt;JUDGE_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are grading whether an ANSWER is fully supported by the CONTEXT.

CONTEXT:
{context}

ANSWER:
{answer}

Rules:
- A claim is &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supported&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; only if the CONTEXT states or directly implies it.
- Outside knowledge does NOT count as support.

First write one sentence of reasoning. Then output a JSON object:
{{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: true|false}}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;judge_faithfulness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;JUDGE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Designing judges that hold up — picking the rubric, calibrating, knowing when a model is the wrong tool for the grade — is exactly the muscle a &lt;a href="https://cursuri-ai.ro/courses/ai-evals-llm-productie" rel="noopener noreferrer"&gt;course on AI evals in production&lt;/a&gt; builds, because it's the difference between "the new prompt feels better" and "faithfulness went from 0.78 to 0.91 on the holdout."&lt;/p&gt;

&lt;h2&gt;
  
  
  Wire it into CI, or it won't survive contact with deadlines
&lt;/h2&gt;

&lt;p&gt;An eval you run by hand when you remember to is an eval you'll stop running the week things get busy. The whole point is to make regressions &lt;em&gt;impossible to ship silently&lt;/em&gt;, and that means the eval runs automatically on every change to a prompt, a retrieval setting, or a model version.&lt;/p&gt;

&lt;p&gt;The pattern is a regression gate: run the eval set, compute the aggregate score, and &lt;strong&gt;fail the build if the score drops below a threshold&lt;/strong&gt; (or below the last known-good baseline). It looks like an ordinary test suite, because that's what it is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/test_evals.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;evals.dataset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EVAL_SET&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;evals.checks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;check_must_not_say&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;answer_question&lt;/span&gt;

&lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;  &lt;span class="c1"&gt;# 90% of eval cases must pass to ship
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;answer_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;i don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;check_must_not_say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_not_say&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_eval_suite_meets_threshold&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;run_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;EVAL_SET&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;failed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EVAL_SET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Eval score &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PASS_THRESHOLD&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few practical notes that keep this sane in CI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pin the model version.&lt;/strong&gt; Provider model IDs update, and an unpinned model means your eval baseline shifts under you for reasons unrelated to your code. Pin it, and treat a model upgrade as its own deliberate eval run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget for cost and flakiness.&lt;/strong&gt; LLM calls cost money and occasionally time out. Cache where you can, run the judge-heavy suite on a schedule rather than every commit if needed, and set a slightly forgiving threshold so one stochastic blip doesn't red-X a good PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log the failures, not just the score.&lt;/strong&gt; When the gate trips, the output should name &lt;em&gt;which&lt;/em&gt; cases regressed so the fix is obvious. A bare "0.86 &amp;lt; 0.90" sends you debugging blind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now a prompt change is a PR with a number attached. The reviewer sees faithfulness went up and refusal rate held steady, or they see it tanked and the build is red. That's the entire difference between hoping and knowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five mistakes that quietly poison your evals
&lt;/h2&gt;

&lt;p&gt;Even teams that build evals often undermine them. Watch for these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Testing only the happy path.&lt;/strong&gt; If every case in your set is a question the system already answers well, your score is a flattering lie. Adversarial and out-of-scope cases are where the signal is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuning on your test set.&lt;/strong&gt; Optimize prompts against the same examples you grade on and you'll overfit to them. Keep a holdout you don't peek at.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An uncalibrated judge.&lt;/strong&gt; Trusting an LLM judge you never checked against your own labels is trusting a number you made up. Calibrate first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One giant blended score.&lt;/strong&gt; A single average hides that faithfulness improved while refusals broke. Track metrics &lt;em&gt;separately&lt;/em&gt; so a regression in one can't be masked by a gain in another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letting the set rot.&lt;/strong&gt; Your product changes; cases that no longer reflect real usage drag the signal down. Prune and grow the set as part of normal work, the same way you maintain any test suite.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these are exotic. They're the eval equivalent of not testing error paths — obvious in hindsight, easy to skip under deadline.&lt;/p&gt;

&lt;h2&gt;
  
  
  How this connects to the rest of your LLM stack
&lt;/h2&gt;

&lt;p&gt;Evals aren't a standalone chore; they're the measurement layer that makes every other improvement legible. When you tighten a prompt, evals tell you if it worked — which is why &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;structured prompt engineering&lt;/a&gt; and a real eval loop are two halves of the same skill. When you redesign what goes into the context window — what to include, what to cut, how to order it — evals are how you know the redesign helped rather than just &lt;em&gt;felt&lt;/em&gt; cleaner; that discipline of deciding what earns a place in the prompt is increasingly called context engineering and has &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;its own dedicated course&lt;/a&gt;. And when you wire up function calling, multi-tool orchestration, and the production concerns of a real integration, evals are what keep the whole pipeline honest as it grows — the kind of end-to-end build covered in a deeper &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;course on advanced LLM integration&lt;/a&gt;. The pattern is always the same: build the measurement first, then every change becomes verifiable instead of hopeful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The teams whose LLM features actually hold up in production aren't using a secret model or a magic prompt. They're disciplined about measurement. They have a versioned eval set grown from real failures, code-based checks for everything a regex can catch, calibrated LLM judges for the subjective rest, and a CI gate that blocks regressions before users find them.&lt;/p&gt;

&lt;p&gt;Start smaller than you think you can. Write thirty cases this afternoon — half of them things your system currently gets &lt;em&gt;wrong&lt;/em&gt; — add three code checks and one rubric-based judge, and put a threshold in your test suite. The first time a red build stops you from shipping a prompt change that would have quietly broken refusals, you'll never go back to vibe-checking. That's the moment an LLM demo becomes an LLM system people can trust.&lt;/p&gt;

&lt;p&gt;The courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/ai-evals-llm-productie" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an AI-learning platform with hands-on, current tracks on evaluating AI systems in production, prompt engineering, RAG, and advanced LLM integration.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources &amp;amp; further reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zheng et al. — &lt;a href="https://arxiv.org/abs/2306.05685" rel="noopener noreferrer"&gt;Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena&lt;/a&gt; (documents position, verbosity, and self-enhancement bias in LLM judges)&lt;/li&gt;
&lt;li&gt;Liu et al. — &lt;a href="https://arxiv.org/abs/2303.16634" rel="noopener noreferrer"&gt;G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Liang et al. — &lt;a href="https://arxiv.org/abs/2211.09110" rel="noopener noreferrer"&gt;Holistic Evaluation of Language Models (HELM)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This article is educational content. Techniques and tooling evolve quickly; validate approaches against your own data and current library documentation.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>How Click Fraud Works — and How to Detect Bot Clicks in Real Time</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Thu, 18 Jun 2026 13:35:47 +0000</pubDate>
      <link>https://dev.to/galian/how-click-fraud-works-and-how-to-detect-bot-clicks-in-real-time-534d</link>
      <guid>https://dev.to/galian/how-click-fraud-works-and-how-to-detect-bot-clicks-in-real-time-534d</guid>
      <description>&lt;p&gt;Every time someone clicks your Google or Bing ad, you pay. The uncomfortable part:&lt;br&gt;
a meaningful share of those clicks were never going to become customers. They come&lt;br&gt;
from bots, competitors burning your budget, click farms, and misconfigured scripts&lt;br&gt;
hammering your landing page. Independent studies — and our own data at&lt;br&gt;
&lt;a href="https://protectads.com/?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;ProtectAds&lt;/a&gt; —&lt;br&gt;
put &lt;strong&gt;invalid traffic at roughly 15–30% of paid-search spend&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you run PPC, that's not a rounding error. On a €10,000/month budget, it's&lt;br&gt;
€1,500–€3,000 quietly leaking out every month. This article breaks down how click&lt;br&gt;
fraud actually works, why the ad platforms don't fully stop it for you, and why the&lt;br&gt;
"just block the IP" approach falls apart at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, the vocabulary: invalid traffic vs. click fraud
&lt;/h2&gt;

&lt;p&gt;The ad industry (IAB/MRC) splits &lt;strong&gt;Invalid Traffic (IVT)&lt;/strong&gt; into two buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GIVT — General Invalid Traffic:&lt;/strong&gt; the obvious stuff. Known data-center IPs,
declared bots and crawlers, spiders. Detectable with lists and simple rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIVT — Sophisticated Invalid Traffic:&lt;/strong&gt; the expensive stuff. Hijacked devices,
headless browsers pretending to be Chrome, residential-proxy networks, click
farms with real fingerprints, automation that mimics human timing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Click fraud&lt;/strong&gt; is the malicious, intent-driven slice of IVT aimed specifically at&lt;br&gt;
your ads — a competitor draining your daily budget, or a click farm monetizing fake&lt;br&gt;
engagement. GIVT you can filter with a blocklist. SIVT is where real detection&lt;br&gt;
earns its keep. (We go deeper on this distinction&lt;br&gt;
&lt;a href="https://protectads.com/what-is-invalid-traffic?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the platforms don't fully solve this for you
&lt;/h2&gt;

&lt;p&gt;Google and Microsoft do filter obvious invalid clicks and issue some credits. But&lt;br&gt;
their filtering is conservative, opaque, and applied &lt;em&gt;after the fact&lt;/em&gt; — you find out&lt;br&gt;
in aggregate, weeks later, with little per-click evidence. They're also structurally&lt;br&gt;
conflicted: invalid clicks are still billed first and credited later, if at all.&lt;/p&gt;

&lt;p&gt;That leaves a gap for sophisticated invalid traffic that looks human enough to pass&lt;br&gt;
platform filters but never converts. Closing that gap is an engineering problem:&lt;br&gt;
score every click in real time and act on it before the budget is gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why sophisticated click fraud is so hard to catch
&lt;/h2&gt;

&lt;p&gt;The obvious stuff (GIVT) is easy to filter. The expensive stuff (SIVT) is hard on&lt;br&gt;
purpose — it's built to look human:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It rotates through fresh IPs and residential proxies, so any static blocklist is
out of date the moment you save it.&lt;/li&gt;
&lt;li&gt;It runs on real or convincingly spoofed devices, so a single attribute rarely
gives it away.&lt;/li&gt;
&lt;li&gt;It mimics human timing and behavior well enough to slip past simple rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why "detect fraud" isn't one trick — it's continuous analysis at scale,&lt;br&gt;
correlating signals across many clicks and campaigns over time, with thresholds&lt;br&gt;
tuned so you catch the bad traffic &lt;em&gt;without&lt;/em&gt; blocking real buyers. Doing that&lt;br&gt;
reliably, and fast enough to act before the budget is gone, is the hard part — and&lt;br&gt;
it's why most advertisers are better off with a dedicated system than a homegrown&lt;br&gt;
script. (Here's&lt;br&gt;
&lt;a href="https://protectads.com/how-it-works?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;how ProtectAds approaches it&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "just block the IP" doesn't scale
&lt;/h2&gt;

&lt;p&gt;The first instinct is a spreadsheet of bad IPs pasted into Google Ads exclusions.&lt;br&gt;
It breaks down fast:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Manual blocklist&lt;/th&gt;
&lt;th&gt;Automated protection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New IPs every day&lt;/td&gt;
&lt;td&gt;You're always behind&lt;/td&gt;
&lt;td&gt;Handled continuously&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attackers rotate IPs constantly&lt;/td&gt;
&lt;td&gt;Whack-a-mole&lt;/td&gt;
&lt;td&gt;Doesn't rely on a static list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ad platforms cap exclusion lists&lt;/td&gt;
&lt;td&gt;Fills up fast&lt;/td&gt;
&lt;td&gt;Prioritizes the worst offenders automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evidence for refund claims&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Per-click log you can export&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Across many campaigns/accounts&lt;/td&gt;
&lt;td&gt;Doesn't&lt;/td&gt;
&lt;td&gt;Centralized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;IPs are cheap and disposable for attackers; your manual list is neither. Sustainable&lt;br&gt;
protection has to run continuously, automate the exclusions, and keep an audit trail&lt;br&gt;
you can take back to the ad platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning detection into protection
&lt;/h2&gt;

&lt;p&gt;Detection is only half the job — you have to &lt;em&gt;act&lt;/em&gt; on it. A working pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Observe&lt;/strong&gt; ad clicks as they hit your site.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assess&lt;/strong&gt; each one in real time to tell genuine visitors from invalid traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block&lt;/strong&gt; offending IPs by pushing them to your campaign exclusion lists
automatically — no manual copy-paste.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report&lt;/strong&gt; with a per-click, per-campaign log so you can request ad credits with
evidence instead of vibes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is exactly what &lt;a href="https://protectads.com/features?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;ProtectAds&lt;/a&gt;&lt;br&gt;
does for &lt;strong&gt;Google Ads&lt;/strong&gt; (including Performance Max / PMax) and &lt;strong&gt;Microsoft (Bing)&lt;br&gt;
Ads&lt;/strong&gt;. You connect your account once, and it runs the detect → block → report loop&lt;br&gt;
continuously. Agencies running multiple client accounts and domains get dedicated&lt;br&gt;
&lt;a href="https://protectads.com/prices/agency?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;agency plans&lt;/a&gt;&lt;br&gt;
for managing protection at scale.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A note on scope: ProtectAds protects Google Ads and Bing Ads. Meta/Facebook Ads&lt;br&gt;
aren't covered — paid &lt;em&gt;search&lt;/em&gt; is where competitor and bot click fraud hits&lt;br&gt;
hardest, so that's where we focus.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How much is this worth to you?
&lt;/h2&gt;

&lt;p&gt;Take your monthly paid-search spend and multiply by 15–30%. That range is your&lt;br&gt;
realistic exposure to invalid traffic. Even at the low end, recovering it usually&lt;br&gt;
dwarfs the cost of detection — which is the entire economic argument for automating&lt;br&gt;
this instead of eyeballing reports once a quarter. (You can sanity-check your own&lt;br&gt;
number with the&lt;br&gt;
&lt;a href="https://protectads.com/click-fraud-calculator?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;click fraud calculator&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it on your own campaigns
&lt;/h2&gt;

&lt;p&gt;If you're spending on Google or Bing ads and you've ever wondered why your&lt;br&gt;
click-through looks healthy but conversions don't follow, invalid traffic is a prime&lt;br&gt;
suspect. The fastest way to find out is to point real detection at your live&lt;br&gt;
campaigns and watch what it flags.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://protectads.com/free_trial?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;Start a free ProtectAds trial&lt;/a&gt;&lt;/strong&gt; —&lt;br&gt;
connect your Google Ads or Bing account, see the invalid clicks in your own data,&lt;br&gt;
and cancel anytime. Your ad budget is better spent on people who might actually buy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by the team at &lt;a href="https://protectads.com/?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=how-click-fraud-works" rel="noopener noreferrer"&gt;ProtectAds&lt;/a&gt; —&lt;br&gt;
real-time click fraud detection and protection for Google Ads and Microsoft (Bing) Ads.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>marketing</category>
      <category>fraud</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>Model Context Protocol Explained: Build Your First MCP Server in Python</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Thu, 18 Jun 2026 12:58:29 +0000</pubDate>
      <link>https://dev.to/galian/model-context-protocol-explained-build-your-first-mcp-server-in-python-ian</link>
      <guid>https://dev.to/galian/model-context-protocol-explained-build-your-first-mcp-server-in-python-ian</guid>
      <description>&lt;p&gt;If you've integrated an LLM with a database, a ticketing system, and an internal API, you've written the same glue three times — and you'll write it again for the next model and the next tool. That M×N integration problem is exactly what the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; was built to kill. Instead of every application hand-rolling a bespoke connector for every tool, MCP defines one open standard that any model and any tool can speak.&lt;/p&gt;

&lt;p&gt;The analogy its authors at Anthropic use is deliberately mundane: MCP is &lt;strong&gt;"a USB-C port for AI applications."&lt;/strong&gt; You don't wire each device to each laptop with a custom cable; you agree on one connector and everything interoperates. That framing is the whole point, and it's why MCP went from an Anthropic open-source release in late 2024 to something adopted across the industry — including by OpenAI and Google — by 2026.&lt;/p&gt;

&lt;p&gt;This is a practical guide. We'll cover what MCP actually is, the three-part architecture, the primitives you'll use every day, and then build a real, working MCP server in Python that a host like Claude Code or an IDE can call. No hand-waving — by the end you'll have code that runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What problem MCP actually solves
&lt;/h2&gt;

&lt;p&gt;Before MCP, "give the model access to our systems" meant writing function-calling glue specific to one provider's SDK, one tool's API, and one application's plumbing. Swap the model and you rewrote the tool layer. Add a tool and you touched every app that needed it. With &lt;em&gt;M&lt;/em&gt; applications and &lt;em&gt;N&lt;/em&gt; tools, you were on the hook for roughly &lt;em&gt;M×N&lt;/em&gt; integrations.&lt;/p&gt;

&lt;p&gt;MCP turns that into &lt;em&gt;M+N&lt;/em&gt;. Tool authors write &lt;strong&gt;one&lt;/strong&gt; MCP server. Application authors add &lt;strong&gt;one&lt;/strong&gt; MCP client. Any host that speaks MCP can use any server that speaks MCP — no per-pair glue. The server you write for your company's CRM works in Claude Code, in your custom agent, and in whatever host ships next year, without changes.&lt;/p&gt;

&lt;p&gt;That's the strategic shift: tools and models become decoupled, and the integration surface stops growing quadratically. Everything below is just the mechanics of how that's achieved.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: host, client, server
&lt;/h2&gt;

&lt;p&gt;MCP has exactly three roles. Getting these straight makes everything else click.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host&lt;/strong&gt; — the LLM application the user interacts with. Claude Code, an AI-enabled IDE, a desktop assistant, or your own agent. The host orchestrates the model and decides which servers to connect to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; — a connector that lives &lt;em&gt;inside&lt;/em&gt; the host. The host spins up one client per server, and each client keeps a dedicated &lt;strong&gt;1:1 connection&lt;/strong&gt; to its server. You rarely write this yourself; the host's framework provides it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt; — a lightweight program that exposes capabilities (tools, data, prompt templates) over the protocol. &lt;strong&gt;This is what you build.&lt;/strong&gt; A server can wrap a local SQLite file, a SaaS API, a filesystem, or anything you can reach with code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, client and server exchange &lt;strong&gt;JSON-RPC 2.0&lt;/strong&gt; messages over a transport. There are two you care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stdio&lt;/strong&gt; — the server runs as a local subprocess and communicates over standard input/output. Perfect for local tools, dev work, and anything that touches the user's own machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamable HTTP&lt;/strong&gt; — the server runs as a remote service reachable over HTTP, with streaming for long-running responses. This is the modern remote transport (it superseded the older HTTP+SSE approach) and it's what you deploy when the server lives somewhere central.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You write your server logic once; choosing stdio vs. HTTP is mostly a deployment decision, not a rewrite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three primitives you'll actually use
&lt;/h2&gt;

&lt;p&gt;MCP servers expose capability through three primitives. The distinction between them isn't bureaucratic — it encodes &lt;em&gt;who is in control&lt;/em&gt;, which matters enormously for safety and UX.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools — model-controlled
&lt;/h3&gt;

&lt;p&gt;Tools are functions the &lt;strong&gt;model&lt;/strong&gt; decides to call: query a database, send an email, hit an API, run a calculation. They can have side effects, so a well-behaved host asks for user approval before executing one. If you've used function calling, tools are the MCP-native, portable version of it. This is the primitive you'll reach for most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources — application-controlled
&lt;/h3&gt;

&lt;p&gt;Resources are read-only data the &lt;strong&gt;application&lt;/strong&gt; pulls into context: a file's contents, a database row, a config blob, a documentation page. They're identified by URI (for example &lt;code&gt;file:///logs/today.log&lt;/code&gt; or &lt;code&gt;db://customers/42&lt;/code&gt;) and they don't &lt;em&gt;do&lt;/em&gt; anything — they inform. The host decides when and whether to load them, which keeps the context window under deliberate control rather than at the model's whim.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompts — user-controlled
&lt;/h3&gt;

&lt;p&gt;Prompts are reusable templates the &lt;strong&gt;user&lt;/strong&gt; invokes intentionally — think a slash-command like "summarize this PR" or "draft a release note." They standardize the high-value interactions your server enables so users don't have to re-type elaborate instructions.&lt;/p&gt;

&lt;p&gt;The mental model: &lt;strong&gt;tools are for the model, resources are for the app, prompts are for the user.&lt;/strong&gt; Designing on the correct side of that line is the difference between an integration that feels safe and predictable and one that surprises people. That separation of control is also at the heart of building reliable agents, which is why a structured &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;course on designing autonomous AI agents&lt;/a&gt; spends real time on it rather than treating every capability as "just a tool."&lt;/p&gt;

&lt;h2&gt;
  
  
  Build your first MCP server in Python
&lt;/h2&gt;

&lt;p&gt;Enough theory. Let's build a server that exposes a tool, a resource, and a prompt — and actually runs.&lt;/p&gt;

&lt;p&gt;The official Python SDK ships a high-level helper, &lt;code&gt;FastMCP&lt;/code&gt;, that handles the JSON-RPC plumbing, schema generation, and transport for you. You describe capabilities with decorators; the SDK infers the input schema from your type hints and the description from your docstring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;The modern toolchain uses &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;&lt;code&gt;uv&lt;/code&gt;&lt;/a&gt;, but plain &lt;code&gt;pip&lt;/code&gt; works too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# with uv (recommended)&lt;/span&gt;
uv init mcp-demo &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;mcp-demo
uv add &lt;span class="s2"&gt;"mcp[cli]"&lt;/span&gt;

&lt;span class="c"&gt;# or with pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"mcp[cli]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The server
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;server.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="c1"&gt;# Name your server — hosts show this to the user.
&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count the number of words in a piece of text.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;


&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;days_between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the number of days between two ISO dates (YYYY-MM-DD).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes://team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;team_notes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Expose the team&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s shared notes as read-only context.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In real life this would read a file, a DB row, or an API.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Release freeze starts Friday. Owner: platform team.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="nd"&gt;@mcp.prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;code_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A reusable prompt template for reviewing a code snippet.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; engineer. Review the code below for &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correctness, security, and readability. Be specific.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Default transport is stdio — ideal for local hosts.
&lt;/span&gt;    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a complete, valid MCP server. Notice what you did &lt;strong&gt;not&lt;/strong&gt; write: no JSON-RPC handling, no schema definitions, no transport code. The type hints on &lt;code&gt;word_count(text: str) -&amp;gt; int&lt;/code&gt; become the tool's input/output schema automatically, and the docstring becomes the description the model reads to decide &lt;em&gt;when&lt;/em&gt; to call it. That docstring is not decoration — it's the model's only instruction manual for the tool, so write it like an API contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inspect it before wiring it to a model
&lt;/h3&gt;

&lt;p&gt;The SDK includes a dev inspector so you can poke at your server without an LLM in the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run mcp dev server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches the &lt;strong&gt;MCP Inspector&lt;/strong&gt;, a local UI where you can list the server's tools, resources, and prompts, call them with hand-entered arguments, and see exactly what comes back. Debugging here — &lt;em&gt;before&lt;/em&gt; a model is involved — is the single biggest time-saver in MCP development. If a tool misbehaves with the inspector, the problem is your server, not the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect it to a host
&lt;/h3&gt;

&lt;p&gt;To use the server from Claude Code or another MCP-aware host, you register it. For Claude Code, that's a one-liner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add demo-tools &lt;span class="nt"&gt;--&lt;/span&gt; uv run server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For hosts configured by file, you add an entry pointing at the command that launches your server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"demo-tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart the host, and your tools, resource, and prompt show up — the model can now call &lt;code&gt;word_count&lt;/code&gt;, the app can pull in &lt;code&gt;notes://team&lt;/code&gt;, and the user can invoke the &lt;code&gt;code_review&lt;/code&gt; prompt. The same &lt;code&gt;server.py&lt;/code&gt;, unchanged, works in every one of them. That portability is the entire payoff, and pushing a server like this from a local toy to something production-grade — auth, logging, error handling, deployment over Streamable HTTP — is exactly the jump covered in this hands-on course on &lt;a href="https://cursuri-ai.ro/courses/construire-aplicatii-ai-python-sdk" rel="noopener noreferrer"&gt;building real AI applications in Python&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  From toy to production: what the quickstart doesn't tell you
&lt;/h2&gt;

&lt;p&gt;The server above works, but shipping MCP to real users surfaces concerns the happy path hides. These are the ones that bite teams:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication and authorization.&lt;/strong&gt; A remote MCP server is a service on the internet. Streamable HTTP servers support OAuth-based auth, and you need it — an unauthenticated tool that can query your database or send email is an incident waiting to happen. Treat the server's tool surface as your real attack surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model can be tricked into calling tools.&lt;/strong&gt; Because tools are model-controlled, a prompt-injection payload hidden in a document or web page the model reads can try to coax it into calling a destructive tool. The mitigations are concrete: keep destructive tools behind user approval, scope each server's permissions narrowly, validate every argument server-side, and never assume the model's call is benign just because it's well-formed. This intersection of capability and risk is precisely why agentic systems need a security mindset, not just a features mindset — the subject of a dedicated &lt;a href="https://cursuri-ai.ro/courses/ai-security-ethics" rel="noopener noreferrer"&gt;course on AI security and ethical engineering&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool descriptions are part of your context budget.&lt;/strong&gt; Every tool's name, description, and schema get loaded into the model's context. Twenty sprawling tools with verbose docstrings quietly eat thousands of tokens and degrade the model's ability to choose well. Curate your tool surface like an API you have to maintain: fewer, sharper tools beat a kitchen sink. Managing what occupies the context window — tools included — is its own discipline, which a &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;course on context engineering for AI agents&lt;/a&gt; treats as a first-class skill rather than an afterthought.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Errors must be legible to a model, not just a human.&lt;/strong&gt; When a tool fails, return a structured, descriptive error the model can reason about and recover from — not a raw stack trace. "Customer 42 not found; verify the ID" lets the model self-correct; a 500 with a Python traceback does not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateful vs. stateless.&lt;/strong&gt; stdio servers are naturally per-session and local; HTTP servers may serve many clients and need you to think about concurrency and isolation. Decide early, because retrofitting state handling is painful.&lt;/p&gt;

&lt;p&gt;None of these are reasons to avoid MCP — they're the normal engineering of turning a protocol demo into a dependable system, and the same skills you'd apply to any service boundary apply here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for how you build in 2026
&lt;/h2&gt;

&lt;p&gt;MCP's quiet significance is that it makes &lt;strong&gt;tools a portable asset instead of a per-app liability.&lt;/strong&gt; Write a great server for your internal systems once, and it appreciates: every new host, every new model, every teammate's agent can use it without you lifting a finger. That's the opposite of the function-calling glue we used to throw away every time a model changed.&lt;/p&gt;

&lt;p&gt;It also pushes good architecture by default. The host/client/server split forces a clean seam between "the model and the app" and "the capability," which is exactly the boundary you want when models get swapped, upgraded, or — as 2026 has reminded everyone — occasionally yanked. Building agents on top of well-designed MCP servers, with the right model routed to the right step, is where a lot of the real engineering leverage lives now; if you want that end-to-end picture, there's a focused &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;course on the Model Context Protocol and building enterprise integrations&lt;/a&gt; that goes far deeper than a single tutorial can.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol is not hype — it's plumbing, and good plumbing is what lets a field scale. It replaces &lt;em&gt;M×N&lt;/em&gt; bespoke integrations with &lt;em&gt;M+N&lt;/em&gt; reusable ones, gives you three clear primitives with sane control boundaries, and lets you ship a server in a dozen lines of Python that works across every MCP-aware host.&lt;/p&gt;

&lt;p&gt;Start small: build the &lt;code&gt;demo-tools&lt;/code&gt; server above, poke it with the inspector, wire it into a host you already use. Then point it at something real in your own stack — a read-only resource over your logs, a single well-scoped tool over an internal API. The first time you watch a model use a capability you exposed once and never re-integrated, the &lt;em&gt;M+N&lt;/em&gt; promise stops being abstract.&lt;/p&gt;

&lt;p&gt;Write the server once. Let every model use it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources &amp;amp; further reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic — &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Introducing the Model Context Protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model Context Protocol — &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Official specification and documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model Context Protocol — &lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This article is educational content. APIs and SDK details evolve; check the official MCP documentation for the current specification before building production systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Anthropic disables access to Fable 5 and Mythos 5 to comply with government directive</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Sat, 13 Jun 2026 10:04:07 +0000</pubDate>
      <link>https://dev.to/galian/anthropic-disables-access-to-fable-5-and-mythos-5-to-comply-with-government-directive-2p77</link>
      <guid>https://dev.to/galian/anthropic-disables-access-to-fable-5-and-mythos-5-to-comply-with-government-directive-2p77</guid>
      <description>&lt;p&gt;On June 12, 2026, a frontier AI model disappeared overnight for everyone outside the United States. Not because of an outage, and not because the vendor changed its pricing — but because of a &lt;strong&gt;US government export-control directive&lt;/strong&gt;. Anthropic had to disable &lt;strong&gt;Claude Fable 5 and Claude Mythos 5&lt;/strong&gt; for &lt;em&gt;all&lt;/em&gt; customers, just three days after making Fable 5 — its first "Mythos-class" model — public.&lt;/p&gt;

&lt;p&gt;If you build on LLMs, the interesting part of this story isn't the geopolitics. It's the engineering question it forces: &lt;strong&gt;what happens to your application when your model is gone tomorrow morning, through no fault of your own?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a practical guide to answering that question. We'll cover what actually happened (briefly, and sourced), why a piece of software can fall under export controls at all, and then spend most of our time on the part that matters: how to architect an LLM application so that no single model — from any provider — can take it down.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened, briefly and sourced
&lt;/h2&gt;

&lt;p&gt;Anthropic published a statement titled "Statement on the US government directive to suspend access to Fable 5 and Mythos 5." Per that statement, the US government issued an &lt;strong&gt;export-control directive&lt;/strong&gt; requiring the company to suspend access to both models for &lt;strong&gt;any non-US national, whether inside or outside the United States&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To comply, Anthropic disabled both models for &lt;strong&gt;all&lt;/strong&gt; customers — not just the targeted group. Everything else stayed online: the statement notes that "access to all other Anthropic models will not be affected." Opus, Sonnet, and Haiku kept working; only the Mythos-class tier went dark.&lt;/p&gt;

&lt;p&gt;The government's stated rationale was that it believed it had identified a method of &lt;strong&gt;jailbreaking&lt;/strong&gt; Fable 5's safeguards. Anthropic publicly disagreed with the reasoning — arguing the vulnerability was narrow, that comparable capabilities already exist in other public models (it named GPT-5.5), and that applying this standard across the board "would essentially halt all new model deployments for all frontier model providers." The company called it a misunderstanding and said it was working to restore access.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The takeaway for builders has nothing to do with whether Anthropic or the government is right. It's this: a model you depend on can become unavailable for reasons entirely outside both your control &lt;em&gt;and&lt;/em&gt; your vendor's control — and on essentially no notice.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why software can be "export-controlled" in the first place
&lt;/h2&gt;

&lt;p&gt;It feels strange that an API you call over HTTPS can be subject to export law. It makes sense the moment you see frontier AI the way regulators do: as a &lt;strong&gt;dual-use technology&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A dual-use technology serves legitimate and potentially dangerous purposes at the same time. A sufficiently capable model can accelerate both useful research and things nobody wants proliferating — from assisting cyberattacks to sensitive biological domains. That's precisely why Anthropic layers safety classifiers (for cyber and bio) on top of its base model to produce the public Fable 5, while the unrestricted Mythos 5 ships only in a limited program.&lt;/p&gt;

&lt;p&gt;Under the US &lt;strong&gt;Export Administration Regulations (EAR)&lt;/strong&gt;, the government can restrict the export of strategic technology to non-US persons. The key concept is the &lt;strong&gt;"deemed export"&lt;/strong&gt;: giving a foreign national &lt;em&gt;access&lt;/em&gt; to controlled technology — even one physically located in the US — counts as an export in its own right. That's why the directive targeted "any non-US national, regardless of location," and why the simplest compliant move was to shut the models off for everyone rather than try to filter access by nationality in real time.&lt;/p&gt;

&lt;p&gt;If you're a developer in the EU, the UK, India, or anywhere outside the US, the practical reading is blunt: from the directive's point of view, &lt;strong&gt;you are the "foreign national,"&lt;/strong&gt; and the model can be pulled out from under you with no warning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real lesson: a model is a dependency, not a constant
&lt;/h2&gt;

&lt;p&gt;This episode is a textbook case of &lt;strong&gt;vendor risk&lt;/strong&gt; and &lt;strong&gt;business continuity&lt;/strong&gt;. The model wasn't deprecated, didn't get more expensive, and wasn't beaten by a competitor. It vanished for reasons external to your product and even to your vendor.&lt;/p&gt;

&lt;p&gt;For anyone who wired a product, an internal workflow, or a customer promise to one specific model, the message is clear: &lt;strong&gt;an LLM is an external dependency&lt;/strong&gt; — exactly like a cloud provider, a payments processor, or any third-party API. And you manage external dependencies with redundancy, not hope.&lt;/p&gt;

&lt;p&gt;The lesson is &lt;em&gt;not&lt;/em&gt; "don't use the best model" or "don't trust Anthropic." Anthropic remains one of the most serious vendors in the market, and the transparency of this disclosure proves it. The lesson is &lt;strong&gt;architectural&lt;/strong&gt;: don't build such that the disappearance of any single model stops your product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing for provider resilience
&lt;/h2&gt;

&lt;p&gt;Resilience here doesn't mean running ten models in parallel. It means being able to &lt;strong&gt;switch&lt;/strong&gt; — quickly, deliberately, and without rewriting your application. Four building blocks get you there.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. An abstraction layer (don't call the SDK from everywhere)
&lt;/h3&gt;

&lt;p&gt;The single most damaging habit is sprinkling &lt;code&gt;client.messages.create(...)&lt;/code&gt; across dozens of files. When you need to switch providers, you're now doing surgery on your whole codebase under time pressure. Put one internal interface between your app and any vendor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Protocol&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LLMProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every provider — Anthropic, OpenAI, Google, a self-hosted model — implements the same &lt;code&gt;complete()&lt;/code&gt; contract. Your application only ever talks to &lt;code&gt;LLMProvider&lt;/code&gt;. Swapping a vendor becomes a config change, not a refactor. Designing these seams well is core system-architecture work; if you want the full treatment of boundaries, adapters, and scaling concerns, this &lt;a href="https://cursuri-ai.ro/courses/ai-system-architecture" rel="noopener noreferrer"&gt;course on AI system architecture at scale&lt;/a&gt; goes well beyond a single code sample.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. A router with failover and a circuit breaker
&lt;/h3&gt;

&lt;p&gt;With a common interface, the router becomes simple: try the primary, fall back on failure, and stop hammering a provider that's clearly down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;LLMProvider&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cooldown_s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;providers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;providers&lt;/span&gt;           &lt;span class="c1"&gt;# ordered: primary first
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cooldown_s&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_down_until&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_available&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LLMProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_down_until&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;last_error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_available&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="c1"&gt;# timeout, 4xx/5xx, or a hard 403 like a suspension
&lt;/span&gt;                &lt;span class="n"&gt;last_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_down_until&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooldown_s&lt;/span&gt;   &lt;span class="c1"&gt;# trip the breaker
&lt;/span&gt;        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All providers unavailable; last error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;last_error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deliberately small. The point is the &lt;em&gt;shape&lt;/em&gt;: an ordered list of interchangeable providers, a breaker that quarantines a failing one for a cooldown, and an app that never sees the difference. A sudden &lt;code&gt;403&lt;/code&gt;/access revocation — exactly what a suspension looks like to your code — is just another failure that trips the breaker and routes to the next provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Redundancy across jurisdictions, not just vendors
&lt;/h3&gt;

&lt;p&gt;Keep at least two providers ready, ideally under &lt;strong&gt;different regulatory regimes&lt;/strong&gt;. The Fable 5 episode is precisely why jurisdictional diversity matters and not just vendor diversity: a single government action took out one vendor's top tier for a whole class of users. Two US-based providers don't fully protect you from a US-wide policy event; a mix does.&lt;/p&gt;

&lt;p&gt;Choosing those alternates well — on real capability, cost, and now &lt;em&gt;exposure&lt;/em&gt; — is a skill in itself. If you want a structured, side-by-side framework instead of vibes, there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI model comparison course&lt;/a&gt; that covers how to evaluate and route across the 2026 lineup.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. A safety net you actually control
&lt;/h3&gt;

&lt;p&gt;For your most critical paths, keep an &lt;strong&gt;open-weight model you can self-host&lt;/strong&gt; (something from the Llama or Qwen families). It doesn't have to be the best model in the world — it has to be &lt;em&gt;yours&lt;/em&gt; and &lt;em&gt;good enough&lt;/em&gt; to keep you running. A model on infrastructure you control cannot be revoked by anyone's directive. That's the difference between "degraded service" and "outage" on the day a managed model disappears.&lt;/p&gt;

&lt;h3&gt;
  
  
  Portability and switch drills
&lt;/h3&gt;

&lt;p&gt;Two cross-cutting habits make all of the above real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Portable prompts and evals.&lt;/strong&gt; If your prompts and evaluation suites are tuned to one model's quirks, you've created a hidden dependency. Treat them as portable artifacts and test them across providers, so a switch doesn't silently tank quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rehearsed failover.&lt;/strong&gt; A fallback plan you've never executed is an assumption, not a guarantee. Trigger a manual switch to your secondary on a schedule and watch what breaks — discover problems in a drill, not mid-incident.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your application is agentic — multi-step loops, tool use, sub-agents — provider resilience gets harder, because a mid-loop failover has to preserve state and tool context. Building that correctly is its own discipline, covered in this &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;course on designing autonomous AI agents&lt;/a&gt;, and the end-to-end practice of wiring real applications across SDKs (Anthropic, OpenAI, and self-hosted) is the focus of this hands-on course on &lt;a href="https://cursuri-ai.ro/courses/construire-aplicatii-ai-python-sdk" rel="noopener noreferrer"&gt;building AI apps in Python&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on governance, not just plumbing
&lt;/h2&gt;

&lt;p&gt;There's a second, quieter lesson here for technical leaders. The trigger was a &lt;strong&gt;safety-and-governance&lt;/strong&gt; dispute: who decides, on what evidence, and how fast, that a frontier capability is too risky for some users. As AI moves deeper into critical systems, "is this model available?" becomes a governance question as much as an uptime one — and understanding &lt;em&gt;why&lt;/em&gt; safety classifiers, dual-use controls, and red-teaming exist is part of building responsibly. That intersection of security, ethics, and engineering is exactly what this &lt;a href="https://cursuri-ai.ro/courses/ai-security-ethics" rel="noopener noreferrer"&gt;course on AI security and ethical engineering&lt;/a&gt; is about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The suspension of Fable 5 and Mythos 5 isn't, at its core, a story about a broken model. It's a story about how quickly a frontier capability can move from "available to everyone" to "unavailable by government order" — and about who pays the bill when it does. Not the vendor. The person who built on that model assuming it would always be there.&lt;/p&gt;

&lt;p&gt;Anthropic calls it a misunderstanding and is working to restore access; the models may well return in some form. But resilience isn't built on "probably." It's built on architecture: an abstraction layer, at least one fallback from a different jurisdiction, a safety net you control, and the habit of rehearsing the switch before you need it.&lt;/p&gt;

&lt;p&gt;Engineers who treat models like the external dependencies they are will look back on the Fable 5 episode as a useful lesson rather than a costly outage. The difference isn't which model you pick. It's the architecture you wrap around it.&lt;/p&gt;

&lt;p&gt;The courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/ai-system-architecture" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an AI-learning platform with deep, hands-on tracks on system architecture, model selection, agent design, and the practical engineering of LLM applications — kept current with the 2026 lineup.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic — &lt;a href="https://www.anthropic.com/news/fable-mythos-access" rel="noopener noreferrer"&gt;Statement on the US government directive to suspend access to Fable 5 and Mythos 5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic — &lt;a href="https://www.anthropic.com/news/claude-fable-5-mythos-5" rel="noopener noreferrer"&gt;Claude Fable 5 and Claude Mythos 5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CNBC — &lt;a href="https://www.cnbc.com/2026/06/12/anthropic-disables-access-to-fable-5-and-mythos-5-to-comply-with-government-directive.html" rel="noopener noreferrer"&gt;Anthropic disables access to Fable 5 and Mythos 5 to comply with government directive&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;9to5Mac — &lt;a href="https://9to5mac.com/2026/06/12/anthropic-pulls-claude-mythos-5-and-claude-fable-5-following-us-government-directive/" rel="noopener noreferrer"&gt;Anthropic pulls Claude Mythos 5 and Claude Fable 5 following US government directive&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;U.S. Bureau of Industry and Security — &lt;a href="https://www.bis.gov/regulations" rel="noopener noreferrer"&gt;Export Administration Regulations (EAR)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This article is for informational purposes and is not legal advice. Dates and named individuals drawn from press reporting may be updated as the situation evolves; check official sources for the current access status.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>learning</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Claude Fable 5: A Developer's Guide to Anthropic's New Top</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:22:18 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</link>
      <guid>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</guid>
      <description>&lt;p&gt;Anthropic just moved the ceiling again. &lt;strong&gt;Claude Fable 5&lt;/strong&gt; is the company's most powerful, most intelligent model to date — and it isn't "Opus 4.9." It's a &lt;strong&gt;new tier that sits above the entire Opus family&lt;/strong&gt;. If you build with LLMs, that distinction matters: it changes how you think about model routing, cost, and which tasks deserve your most capable (and most expensive) reasoning.&lt;/p&gt;

&lt;p&gt;This is a practical, no-hype guide for developers. We'll cover what Claude Fable 5 actually is, how it slots into Anthropic's 2026 lineup, what changes in the API surface, when the premium is justified, and how to migrate existing code. Everything here is grounded in Anthropic's own model and API documentation — no invented benchmarks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Claude Fable 5?
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 is Anthropic's flagship reasoning model, exposed through the API as &lt;code&gt;claude-fable-5&lt;/code&gt;. The headline facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A new tier above Opus.&lt;/strong&gt; Until now, "Opus" was the top of the Claude lineup. Fable 5 establishes a level above it — positioned for the hardest reasoning, planning, and long-horizon agentic work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1M-token context window&lt;/strong&gt;, with up to &lt;strong&gt;128K tokens of output&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium pricing&lt;/strong&gt;: roughly &lt;strong&gt;$10 / $50 per million input / output tokens&lt;/strong&gt; — about double Opus 4.8's $5 / $25. That price tag is the whole point: Fable 5 is a precision tool you point at the problems that justify it, not a default for every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive thinking only.&lt;/strong&gt; The fixed "thinking budget" knob is gone. The model decides how much to reason per request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental model to internalize: &lt;strong&gt;Fable 5 is the peak of a four-tier lineup, and capability scales with cost.&lt;/strong&gt; You don't run your whole pipeline on it any more than you'd render every frame of a film at maximum quality regardless of the shot. You route the hard parts to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Fable 5 Fits in the 2026 Anthropic Lineup
&lt;/h2&gt;

&lt;p&gt;Anthropic's current family is a ladder of capability-vs-cost. Picking the right rung per task is one of the highest-leverage habits an AI engineer can build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Reach for it when…&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Absolute peak capability; premium price&lt;/td&gt;
&lt;td&gt;The hardest reasoning, planning, cross-cutting refactors, and long-running agent loops where correctness outweighs cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top of the Opus family; a strong default in Claude Code&lt;/td&gt;
&lt;td&gt;Complex day-to-day work — planning, large refactors, tricky debugging — with a better capability/cost ratio than Fable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balanced, fast, 1M context&lt;/td&gt;
&lt;td&gt;The bulk of everyday coding, reading, and iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Haiku 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Light, fast, cheap&lt;/td&gt;
&lt;td&gt;High-volume small operations, classification, auxiliary steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;model choice is a cost-and-quality lever.&lt;/strong&gt; A well-designed system routes each sub-task to the cheapest model that can do it well, and escalates to Fable 5 only where the payoff is real. If you want a structured, side-by-side breakdown of the 2026 models and how to choose between them, there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI model comparison course&lt;/a&gt; that goes deeper than any single table can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes in the API
&lt;/h2&gt;

&lt;p&gt;This is the part developers actually care about. Fable 5 shares the modern Claude request surface (the same one introduced with Opus 4.7/4.8), with a couple of sharp edges worth knowing before you ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adaptive thinking, not a token budget
&lt;/h3&gt;

&lt;p&gt;Fable 5 supports a single thinking mode: &lt;strong&gt;adaptive&lt;/strong&gt;. You no longer pass a fixed &lt;code&gt;budget_tokens&lt;/code&gt; value — the model regulates its own reasoning depth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;        &lt;span class="c1"&gt;# adaptive is the only thinking mode
&lt;/span&gt;    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;# strong default for coding/agentic work
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this module and add unit tests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things that will save you a debugging session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, or &lt;code&gt;budget_tokens&lt;/code&gt;.&lt;/strong&gt; They're removed on this generation and return &lt;code&gt;400&lt;/code&gt;. Steer behavior with prompting and the effort parameter instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt; on Fable 5.&lt;/strong&gt; Unlike Opus 4.8/4.7, an explicit &lt;code&gt;disabled&lt;/code&gt; returns &lt;code&gt;400&lt;/code&gt; here. To run without thinking, &lt;strong&gt;omit the &lt;code&gt;thinking&lt;/code&gt; parameter entirely&lt;/strong&gt;. This is the one genuinely new breaking change relative to the Opus 4.x line — easy to miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking text is omitted by default.&lt;/strong&gt; Thinking blocks still stream, but their content is empty unless you opt in with &lt;code&gt;thinking={"type": "adaptive", "display": "summarized"}&lt;/code&gt;. If your UI shows reasoning progress, set this or your users will see a long pause before output.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The effort parameter is your real control knob
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;output_config.effort&lt;/code&gt; accepts &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;, and &lt;code&gt;max&lt;/code&gt;. It controls how much the model thinks &lt;em&gt;and&lt;/em&gt; acts — not just thinking depth. For coding and agentic workloads, &lt;strong&gt;&lt;code&gt;xhigh&lt;/code&gt; is the sweet spot&lt;/strong&gt; and is the effort level Claude Code defaults to. Treat effort as something to tune per route: &lt;code&gt;max&lt;/code&gt; for correctness-critical work, &lt;code&gt;medium&lt;/code&gt;/&lt;code&gt;low&lt;/code&gt; for latency-sensitive or simple steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large outputs need streaming
&lt;/h3&gt;

&lt;p&gt;With up to 128K output tokens available, non-streaming requests will hit SDK HTTP timeouts well before that ceiling. For anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;, stream and collect the final message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate the full migration plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_final_message&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What it still supports
&lt;/h3&gt;

&lt;p&gt;Fable 5 keeps the modern toolbox: &lt;strong&gt;structured outputs&lt;/strong&gt; (&lt;code&gt;output_config.format&lt;/code&gt;), &lt;strong&gt;prompt caching&lt;/strong&gt; (minimum cacheable prefix ~2,048 tokens), &lt;strong&gt;server-side compaction&lt;/strong&gt; for very long conversations, &lt;strong&gt;web search with dynamic filtering&lt;/strong&gt;, and &lt;strong&gt;task budgets&lt;/strong&gt; (beta) for telling an agent how many tokens it has for a full loop. If you're wiring these into a real application, the patterns matter as much as the model — that's the focus of this hands-on course on &lt;a href="https://cursuri-ai.ro/courses/construire-aplicatii-ai-python-sdk" rel="noopener noreferrer"&gt;building AI apps with the Anthropic and OpenAI SDKs&lt;/a&gt;, which walks from raw API calls to a production-shaped product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fable 5 for Agentic Coding
&lt;/h2&gt;

&lt;p&gt;The reason Fable 5 is interesting to developers specifically is long-horizon agentic execution: multi-file refactors, overnight runs, and tasks that span dozens of tool calls without a human correcting course.&lt;/p&gt;

&lt;p&gt;Three habits get the most out of it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Give the full task spec up front in one well-formed turn.&lt;/strong&gt; Fable 5 plans better when it has the complete goal early; drip-feeding requirements across many turns tends to cost more tokens and sometimes performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run at high or &lt;code&gt;xhigh&lt;/code&gt; effort with generous &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/strong&gt; Long-horizon coherence comes partly from the model reasoning more at each step — give it room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route deliberately.&lt;/strong&gt; Use Fable 5 for the planning and the genuinely hard edits; delegate mechanical or high-volume sub-steps to Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If terminal-first agentic coding is your world, the workflow discipline — &lt;code&gt;CLAUDE.md&lt;/code&gt; project memory, plan/edit/review loops, hooks as deterministic guardrails, and model routing across the lineup — is exactly what a dedicated &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code mastery course&lt;/a&gt; covers end to end. Agent architecture beyond a single tool (orchestration, delegation, parallelism) is its own discipline, well covered in this &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;course on designing autonomous AI agents&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context is a resource, even at 1M tokens
&lt;/h3&gt;

&lt;p&gt;A 1M-token window is not a license to dump everything into context. Irrelevant context dilutes the model's attention and costs tokens on every turn, no matter how capable the model is. The skill that separates engineers who "get lucky" with agents from those who ship reliable ones is deliberate &lt;strong&gt;context engineering&lt;/strong&gt; — what to load, what to compact, what to persist as memory across sessions. It's enough of a topic to warrant &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;its own course on context engineering and memory for agents&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Fable 5 Is Actually Worth the Premium
&lt;/h2&gt;

&lt;p&gt;Here's the honest cost reasoning, because "use the best model" is bad engineering advice.&lt;/p&gt;

&lt;p&gt;At roughly &lt;strong&gt;double the per-token cost of Opus 4.8&lt;/strong&gt;, Fable 5 pays off when the &lt;em&gt;cost of a wrong answer&lt;/em&gt; is high relative to the token bill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worth it:&lt;/strong&gt; a complex cross-service refactor where a subtle regression costs hours of human review; a planning step that determines the trajectory of a long agent run; an analysis where correctness is non-negotiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not worth it:&lt;/strong&gt; routine edits, summaries, classifications, and the long tail of mechanical sub-tasks — those belong on Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful rule of thumb: let &lt;strong&gt;Fable 5 plan and decide&lt;/strong&gt;, and let cheaper models &lt;strong&gt;execute&lt;/strong&gt; the parts that are already well-specified. That keeps your bill proportional to difficulty instead of flat-out maximal.&lt;/p&gt;

&lt;p&gt;The other lever is effort. Because effort matters more on this generation than on any prior Opus, a Fable 5 call at &lt;code&gt;medium&lt;/code&gt; effort can be both cheaper and faster than an Opus 4.8 call at &lt;code&gt;xhigh&lt;/code&gt; for some tasks — so benchmark on your own workload rather than assuming "bigger model = always slower and pricier in practice."&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating from Opus 4.8 / 4.7
&lt;/h2&gt;

&lt;p&gt;If you're already on the modern Claude surface, moving to Fable 5 is mostly a model-ID swap plus a couple of checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swap the model string&lt;/strong&gt; to &lt;code&gt;claude-fable-5&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove &lt;code&gt;budget_tokens&lt;/code&gt;&lt;/strong&gt; if any remain → use &lt;code&gt;thinking={"type": "adaptive"}&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strip &lt;code&gt;temperature&lt;/code&gt; / &lt;code&gt;top_p&lt;/code&gt; / &lt;code&gt;top_k&lt;/code&gt;&lt;/strong&gt; — they &lt;code&gt;400&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace last-assistant-turn prefills&lt;/strong&gt; with structured outputs (&lt;code&gt;output_config.format&lt;/code&gt;) or a system-prompt instruction — prefills &lt;code&gt;400&lt;/code&gt; on this generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit for &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt;&lt;/strong&gt; — it &lt;code&gt;400&lt;/code&gt;s on Fable 5. Omit &lt;code&gt;thinking&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-tune &lt;code&gt;effort&lt;/code&gt; per route&lt;/strong&gt; — start at &lt;code&gt;high&lt;/code&gt;, use &lt;code&gt;xhigh&lt;/code&gt; for coding/agentic, reserve &lt;code&gt;max&lt;/code&gt; for correctness-critical work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;display: "summarized"&lt;/code&gt;&lt;/strong&gt; if you surface reasoning in a UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Steering this generation is done through prompting and effort rather than sampling parameters, so the quality of your instructions matters more than ever. If your prompts were tuned years ago for older models, they're probably leaving capability on the table — a structured refresh of &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;prompt engineering fundamentals&lt;/a&gt; tends to pay for itself quickly on a model this capable.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on Hype vs. Reality
&lt;/h2&gt;

&lt;p&gt;Two guardrails worth keeping as the launch noise settles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fable 5 is the most capable model — not necessarily the default everywhere.&lt;/strong&gt; In Claude Code, for instance, Opus 4.8 remains a strong default; Fable 5 is the tier you select for the hardest work. "Most capable" and "default" are different claims.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version hygiene matters.&lt;/strong&gt; Fable 5 is the current peak, Opus 4.8 is the top of the Opus family, and Opus 4.7 is the previous Opus generation. Anything from the Claude 3.x line (or GPT-4-class / Gemini 2.x models) is outdated and shouldn't be treated as current when you're evaluating tutorials or benchmarks. Always confirm model IDs, limits, and pricing against the official docs, since they shift between releases.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  TL;DR Cheat Sheet
&lt;/h2&gt;

&lt;p&gt;For quick reference when you wire Claude Fable 5 into a real codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model ID:&lt;/strong&gt; &lt;code&gt;claude-fable-5&lt;/code&gt;. Context window 1M tokens, output up to 128K.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking:&lt;/strong&gt; &lt;code&gt;{"type": "adaptive"}&lt;/code&gt; is the only mode. To run without it, &lt;strong&gt;omit the parameter&lt;/strong&gt; — never send &lt;code&gt;{"type": "disabled"}&lt;/code&gt; (it returns &lt;code&gt;400&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effort:&lt;/strong&gt; &lt;code&gt;output_config.effort&lt;/code&gt; is your main control — &lt;code&gt;xhigh&lt;/code&gt; for coding and agents, &lt;code&gt;max&lt;/code&gt; when correctness is critical, &lt;code&gt;low&lt;/code&gt;/&lt;code&gt;medium&lt;/code&gt; for simple or latency-sensitive steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Removed (all &lt;code&gt;400&lt;/code&gt; if sent):&lt;/strong&gt; &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, &lt;code&gt;budget_tokens&lt;/code&gt;, and last-assistant-turn prefills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning in your UI:&lt;/strong&gt; add &lt;code&gt;"display": "summarized"&lt;/code&gt; to the thinking config, or the thinking text comes back empty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large outputs:&lt;/strong&gt; stream anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; send the hard reasoning to Fable 5; keep routine and high-volume work on Sonnet 4.6 and Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt; isn't just a bigger Opus — it's a new top tier that reframes how you should think about model routing in 2026. The winning pattern is the same as it's always been, just sharper: use the most capable model where correctness compounds, push everything else down the ladder to cheaper models, and tune effort per route. Master that, and Fable 5 becomes a precision instrument rather than a line item that surprises you on the invoice.&lt;/p&gt;

&lt;p&gt;If you want to go from "I read about it" to "I ship with it," the courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, a Romanian AI-learning platform with deep, hands-on tracks on Claude Code, agent architecture, the Anthropic SDK, context engineering, and model selection — all kept current with the 2026 lineup, Fable 5 included.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? Save it, and drop your Fable 5 routing strategy in the comments — what are you sending to the top tier, and what stays on Sonnet?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Claude Code Workflow: Best Practices That Ship Code"</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 08 Jun 2026 08:51:17 +0000</pubDate>
      <link>https://dev.to/galian/claude-code-workflow-best-practices-that-ship-code-na</link>
      <guid>https://dev.to/galian/claude-code-workflow-best-practices-that-ship-code-na</guid>
      <description>&lt;p&gt;Most posts about Claude Code stop at "install it and say hi." This guide goes further. A reliable &lt;strong&gt;Claude Code workflow&lt;/strong&gt; comes down to a handful of habits that actually ship code: a lean &lt;code&gt;CLAUDE.md&lt;/code&gt;, plan mode before any edit, subagents for noisy research, parallel agents in git worktrees, hooks as guardrails, and a verification loop that kills hallucinations. These are the Claude Code best practices worth adopting in 2026 — the opinionated, hands-on version, no fluff.&lt;/p&gt;

&lt;p&gt;Quick grounding for anyone new: &lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; is Anthropic's official agentic coding tool — an AI coding assistant that lives in your terminal. It reads your codebase, edits files, runs commands, and integrates with your dev tools through natural language. It runs in the terminal, in VS Code/Cursor and JetBrains, in a desktop app, on the web at &lt;code&gt;claude.ai/code&lt;/code&gt;, and in the Claude iOS app — all sharing the same engine, so your &lt;code&gt;CLAUDE.md&lt;/code&gt;, settings, and MCP servers travel with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most Claude Code workflow advice stops at "install and say hi"
&lt;/h2&gt;

&lt;p&gt;The beginner tutorials get you to a prompt and stop. The "Claude Code vs Cursor" posts argue about UI and never touch cost or parallelism. What nobody connects is the &lt;em&gt;layered setup&lt;/em&gt; — the thing that turns a clever autocomplete into a teammate you can delegate to. That layering (memory + skills + hooks + subagents + MCP) plus running several agents at once is where the real productivity lives, and it's exactly what gets fragmented across ten separate posts.&lt;/p&gt;

&lt;p&gt;For the structured version of everything below — terminal-first agentic coding, multi-file edits, git, MCP, headless CI/CD, subagents, and the security model — that's the spine of the &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code Mastery course on Cursuri-AI.ro&lt;/a&gt; (Romanian-language, so plan for that if English is your only language). A few relevant deep-dives are linked along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The right mental model: a teammate you delegate to, not autocomplete
&lt;/h2&gt;

&lt;p&gt;The single biggest mindset shift: stop typing instructions, start describing outcomes. Autocomplete predicts the next token. An agentic tool plans, executes with real dev tools, evaluates the result, and adjusts — a loop, not a guess. Treat it like autocomplete and you'll micromanage every line and get autocomplete-level value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Describe outcomes, not keystrokes — and let it interview you first
&lt;/h3&gt;

&lt;p&gt;Instead of "open &lt;code&gt;auth.ts&lt;/code&gt;, add a function &lt;code&gt;validateToken&lt;/code&gt;," describe the goal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add JWT validation to the login flow. Follow our existing error-handling pattern. Write tests and run them."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then — the part people skip — let it ask questions first. For anything non-trivial, add "ask me anything unclear before you start." A good agent will surface the ambiguity (which token library? refresh tokens?) before writing 200 lines down the wrong road. Treating the prompt like a brief to a competent junior, not a command to a compiler, is genuinely a prompting skill — the same one taught in the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;, and it transfers directly to writing a good &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CLAUDE.md that actually gets followed
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is persistent, per-project (or per-user) memory loaded every session. It's the highest-leverage file in the repo, and the most commonly abused.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep it under ~60 lines (and why long files get half-ignored)
&lt;/h3&gt;

&lt;p&gt;The temptation is to dump your entire style guide in there. Don't. A 400-line &lt;code&gt;CLAUDE.md&lt;/code&gt; competes with your actual prompt for attention and gets partially ignored. Keep it under ~60 lines as a rule of thumb — not an official limit, but it holds up. Stable rules only: conventions, the "definition of done," and the things that must never be touched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use pnpm, not npm.
&lt;span class="p"&gt;-&lt;/span&gt; TypeScript strict mode. No &lt;span class="sb"&gt;`any`&lt;/span&gt;.

&lt;span class="gu"&gt;## Definition of done&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Tests pass (&lt;span class="sb"&gt;`pnpm test`&lt;/span&gt;) before you say it's done.
&lt;span class="p"&gt;-&lt;/span&gt; No new lint warnings.

&lt;span class="gu"&gt;## Never touch&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not edit files in &lt;span class="sb"&gt;`/generated`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Do not change the public API in &lt;span class="sb"&gt;`src/api/`&lt;/span&gt; without asking.

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API style: see docs/api-style.md.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the shape of it: short, declarative, points to longer docs instead of inlining them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Let Claude update CLAUDE.md after every bug
&lt;/h3&gt;

&lt;p&gt;Here's the habit that compounds. After fixing a non-obvious bug, ask it to record the lesson: "add a one-line rule to &lt;code&gt;CLAUDE.md&lt;/code&gt; so this doesn't happen again." Over weeks, the file becomes a distilled record of the project's real gotchas — earned, not guessed. Recent versions also save learnings automatically, but curating the explicit rules by hand keeps the file tight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plan mode before code: catch the wrong problem early
&lt;/h2&gt;

&lt;p&gt;Plan mode is a read-only permission mode: Claude investigates and proposes an approach without touching a single file. You cycle into it with &lt;strong&gt;Shift+Tab&lt;/strong&gt;, or start there with &lt;code&gt;--permission-mode plan&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start a session in plan mode — read-only until you approve&lt;/span&gt;
claude &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Approve the plan, not the diff — correcting a plan is far cheaper
&lt;/h3&gt;

&lt;p&gt;This is the one to drill into any junior. If the plan is wrong, you fix it in one sentence. If the &lt;em&gt;diff&lt;/em&gt; is wrong, you've already paid for 300 lines of edits across five files, and now you're untangling them. Reviewing intent before implementation is where plan mode earns its keep on anything bigger than a one-liner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ask for step one, review, then step two
&lt;/h3&gt;

&lt;p&gt;For large features, don't let it run the whole plan unattended. "Do step one, then stop and show me." Review, then "continue." It feels slower; it's faster, because you catch a wrong turn at step one instead of step six.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layered setup: skills, hooks, subagents, and MCP
&lt;/h2&gt;

&lt;p&gt;This is the part the comparison posts never assemble into one picture. Four mechanisms, each solving a different problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills as folders with a "gotchas" section
&lt;/h3&gt;

&lt;p&gt;Skills are markdown-based reusable workflows, invoked with &lt;code&gt;/&amp;lt;name&amp;gt;&lt;/code&gt; or auto-loaded when relevant. As of 2026 skills and slash commands are unified — a skill &lt;em&gt;is&lt;/em&gt; its slash command. Bundled ones include &lt;code&gt;/code-review&lt;/code&gt;, &lt;code&gt;/debug&lt;/code&gt;, and &lt;code&gt;/batch&lt;/code&gt;. Write your own for anything you do more than twice (&lt;code&gt;/deploy&lt;/code&gt;, &lt;code&gt;/review-pr&lt;/code&gt;), and give every skill a "gotchas" section listing the ways this task usually goes wrong. That section does more for reliability than the happy-path instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooks for safety and automation
&lt;/h3&gt;

&lt;p&gt;This is the crucial distinction: &lt;strong&gt;instructions in &lt;code&gt;CLAUDE.md&lt;/code&gt; and skills are requests, not guarantees.&lt;/strong&gt; If something &lt;em&gt;must&lt;/em&gt; happen, it belongs in a hook. Hooks fire on lifecycle events — &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;SessionStart&lt;/code&gt; — and can run a shell command, an HTTP request, a prompt, or a subagent. Common uses: auto-format on every file write, run a quick test pass after edits, and block obviously unsafe shell commands before they execute. Guardrails go in hooks; everything else is a polite suggestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagents for noisy research so your main context stays clean
&lt;/h3&gt;

&lt;p&gt;A subagent runs its own agentic loop in an isolated context window and returns only a summary (spawned via the Task tool). The value is context hygiene: when it needs to read fifteen files to answer "where is rate limiting enforced," a subagent does the digging and hands back a paragraph, instead of flooding the main session with fifteen files no one will look at again. Don't confuse these with &lt;em&gt;agent teams&lt;/em&gt; — that's an experimental, disabled-by-default feature where independent sessions message each other and share a task list. Different thing.&lt;/p&gt;

&lt;p&gt;If multi-agent orchestration patterns (ReAct, planners, delegation) are the real goal, the &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents architecture and automation course&lt;/a&gt; covers the theory that Claude Code's subagents put into practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP: connecting the issue tracker, DB, and browser
&lt;/h3&gt;

&lt;p&gt;MCP (Model Context Protocol) is an open standard for wiring Claude Code to external data and tools — Google Drive, Jira, Slack, databases, browsers. Configure servers with &lt;code&gt;claude mcp&lt;/code&gt; or &lt;code&gt;--mcp-config&lt;/code&gt;. This is what moves it from "edits files" to "operates your actual workflow": read the design doc in Drive, update the Jira ticket, query the staging DB to confirm a schema. MCP tool search is on by default so all those tools don't blow up your context cost. Pair an MCP connection with a skill that &lt;em&gt;teaches&lt;/em&gt; Claude how to use it, and the combination is far better than either alone.&lt;/p&gt;

&lt;p&gt;Worth understanding the protocol itself if you build internal tools — Claude Code is a first-class MCP &lt;em&gt;client&lt;/em&gt;, and there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP (Model Context Protocol) course&lt;/a&gt; on building the servers it connects to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Claude Code parallel agents with git worktrees
&lt;/h2&gt;

&lt;p&gt;Git worktree isolation is native (&lt;code&gt;--worktree&lt;/code&gt; / &lt;code&gt;-w&lt;/code&gt;), and it's the single change that most increases throughput. The idea: each agent works in its own checkout of the repo, so two agents never fight over the same files.&lt;/p&gt;

&lt;h3&gt;
  
  
  One bugfix agent + one feature agent, zero file conflicts
&lt;/h3&gt;

&lt;p&gt;A typical setup: one agent grinding through a flaky-test fix, another building a small feature, each in its own worktree on its own branch. They can't step on each other because they're literally in different directories. You monitor both from the agent view.&lt;/p&gt;

&lt;h3&gt;
  
  
  The worktree commands and how to merge back
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Spin up an isolated worktree-backed session&lt;/span&gt;
claude &lt;span class="nt"&gt;--worktree&lt;/span&gt;

&lt;span class="c"&gt;# Or, manage git worktrees yourself and run an agent in each&lt;/span&gt;
git worktree add ../app-bugfix   bugfix/flaky-auth-test
git worktree add ../app-feature  feature/csv-export
&lt;span class="c"&gt;# then run `claude` inside each directory&lt;/span&gt;

&lt;span class="c"&gt;# Watch full sessions running in parallel&lt;/span&gt;
claude agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each lands on its own branch; review the diffs and merge back through normal PRs — same review bar as any human-authored branch, no exceptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many concurrent agents is actually sane
&lt;/h3&gt;

&lt;p&gt;Be honest about the bottleneck: it's &lt;em&gt;your&lt;/em&gt; review capacity, not the tool's. Two or three agents are supervisable. Push past that and you're rubber-stamping diffs you didn't really read, which defeats the point. The cap isn't the machine — it's how many parallel changes you can genuinely verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Killing hallucinations with an evidence-based verification loop
&lt;/h2&gt;

&lt;p&gt;Agentic tools confidently invent APIs, routes, config keys, and permissions. The fix isn't "be careful," it's a repeatable loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Show me the test output," not "it works"
&lt;/h3&gt;

&lt;p&gt;Never accept "it works." Accept the command it ran and the output it produced. "Run the tests and paste the output." "Show me the curl request and the actual response." An agent that claims a green build but can't show a passing run just hallucinated a green build. Make evidence the default, not a special request.&lt;/p&gt;

&lt;h3&gt;
  
  
  When it gets confused, stop and ask it to diagnose
&lt;/h3&gt;

&lt;p&gt;If it starts thrashing — two failed corrections in a row — stop asking for fixes and ask for a diagnosis: "Where exactly is this breaking, and what's your evidence?" Forcing it to locate the failure beats letting it patch blindly. Grounding the model in real, verifiable output is core to using any LLM in production reliably — the broader discipline is the subject of &lt;a href="https://cursuri-ai.ro/courses/introducere-ai-engineering" rel="noopener noreferrer"&gt;Introducere în AI Engineering&lt;/a&gt;, which covers evals and reliability alongside the tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context hygiene: the habit that changes everything
&lt;/h2&gt;

&lt;p&gt;Long sessions degrade. The model's attention spreads thin, old failed attempts pollute the context, and quality quietly drops. Managing context is 80% of getting consistent results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear the session after two failed corrections instead of arguing
&lt;/h3&gt;

&lt;p&gt;The hardest rule to follow: if you've corrected the same thing twice and it's still wrong, don't correct a third time. Clear the session (&lt;code&gt;/clear&lt;/code&gt;), re-state the goal cleanly with the lessons learned, and start fresh. Arguing with a confused context is the single biggest time sink there is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compact manually around 50% context
&lt;/h3&gt;

&lt;p&gt;Don't wait for auto-compaction. Around half-full, compact the session (&lt;code&gt;/compact&lt;/code&gt;) to summarize it and reclaim room, which keeps responses sharp. And when it goes genuinely off the rails, use conversation-rewind to roll back to an earlier point instead of trying to talk it back on course.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost math nobody shows you
&lt;/h2&gt;

&lt;p&gt;Here's what comparison posts skip. Two ways to pay: a Claude subscription (Pro, Max, Team, Enterprise) or pay-as-you-go on the Anthropic API. Most surfaces require one of these — Claude Code requires a paid plan, and the desktop app explicitly needs a paid subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro vs Max vs API — what to actually weigh
&lt;/h3&gt;

&lt;p&gt;As a snapshot (always confirm current numbers on &lt;code&gt;claude.com/pricing&lt;/code&gt; — plan structures shift): Pro is around &lt;strong&gt;$20/month&lt;/strong&gt;, Max is &lt;strong&gt;$100/month (5x)&lt;/strong&gt; or &lt;strong&gt;$200/month (20x)&lt;/strong&gt;, and the API is pure per-token. The lever that dominates the bill: &lt;strong&gt;output tokens cost ~5x input.&lt;/strong&gt; On the API, Opus 4.8 lists at &lt;strong&gt;$5/MTok input and $25/MTok output&lt;/strong&gt;; Sonnet 4.6 at &lt;strong&gt;$3/$15&lt;/strong&gt;; Haiku 4.5 at &lt;strong&gt;$1/$5&lt;/strong&gt;. Agentic coding generates a &lt;em&gt;lot&lt;/em&gt; of output, so output pricing — not input — is what you feel.&lt;/p&gt;

&lt;h3&gt;
  
  
  The model default is tier-dependent (a common myth)
&lt;/h3&gt;

&lt;p&gt;There is no single fixed default. It resolves by account type: Max, Team Premium, Enterprise pay-as-you-go, and the Anthropic API default to &lt;strong&gt;Opus 4.8&lt;/strong&gt;; Pro, Team Standard, and Enterprise seats default to &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;. Switch anytime with &lt;code&gt;/model&lt;/code&gt; or &lt;code&gt;--model&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pick a model by alias or full name&lt;/span&gt;
claude &lt;span class="nt"&gt;--model&lt;/span&gt; opus
claude &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-6

&lt;span class="c"&gt;# opusplan: Opus to plan, Sonnet to execute&lt;/span&gt;
/model opusplan

&lt;span class="c"&gt;# 1M-token context variant (Opus and Sonnet only — not Haiku)&lt;/span&gt;
/model claude-opus-4-8[1m]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;opusplan&lt;/code&gt; is a strong default — Opus's reasoning where it matters (planning), Sonnet's speed and lower cost for execution. Two caveats: Sonnet's 1M-token context needs usage credits on &lt;em&gt;every&lt;/em&gt; plan (including Max), while Opus 1M is included on Max/Team/Enterprise; and Haiku 4.5 is 200k context, not 1M. If a new model doesn't show up in &lt;code&gt;/model&lt;/code&gt;, you're probably on an older build — run &lt;code&gt;claude update&lt;/code&gt; and it'll appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  API vs Max: which actually wins
&lt;/h3&gt;

&lt;p&gt;On the API with heavy daily coding, the bill gets unpredictable enough that a flat Max plan removes the anxiety of watching output tokens tick up mid-refactor. Code with it most days and a flat plan usually wins over metered API billing on raw cost &lt;em&gt;and&lt;/em&gt; peace of mind. Use it occasionally, and the API's pay-only-for-what-you-use can be cheaper. Cost-optimizing LLM usage in production — caching, model routing, batching — is a discipline of its own, covered in &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Integrare Avansată LLM în Aplicații de Producție&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code vs Cursor — when to reach for each
&lt;/h2&gt;

&lt;p&gt;Not a winner-takes-all. Cursor's IDE-native, inline experience is excellent for tight edit-review-edit loops where you want to stay in the editor. Reach for Claude Code's terminal when you want &lt;em&gt;agentic autonomy&lt;/em&gt; — multi-file refactors it drives end to end, headless runs in CI, parallel worktree agents, scripting it into pipelines. And since the VS Code/Cursor extension shares the same engine, it's not strictly either/or: use the extension for inline diffs and the terminal for the heavy autonomous work. Pick by task, not tribe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and team scaling notes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Approval modes, sandboxing, and reviewing AI commits
&lt;/h3&gt;

&lt;p&gt;Permission modes are real guardrails: &lt;code&gt;default&lt;/code&gt;, &lt;code&gt;acceptEdits&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;, and &lt;code&gt;bypassPermissions&lt;/code&gt;, selectable via &lt;code&gt;--permission-mode&lt;/code&gt; and cycled with Shift+Tab. Keep destructive operations behind approval, use hooks to outright block unsafe shell commands, and — non-negotiable — &lt;strong&gt;every AI-authored commit goes through the same review as a human's.&lt;/strong&gt; An agent that can run commands is a powerful tool and a real attack surface; treat its output as untrusted until reviewed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What changes when more than one person shares the conventions
&lt;/h3&gt;

&lt;p&gt;Once a team shares a repo, &lt;code&gt;CLAUDE.md&lt;/code&gt; becomes shared infrastructure: a change to it changes everyone's agent behavior, so it goes through PR review like code. Skills and hooks get version-controlled and packaged as plugins (versioned bundles of skills, subagents, hooks, and MCP servers) distributed through a marketplace, so the whole team runs the same setup. The thing that breaks at scale is uncoordinated worktree agents on the same files — keep agents on separate branches and merge through PRs, exactly as you would with people.&lt;/p&gt;

&lt;h2&gt;
  
  
  A copy-paste Claude Code workflow starter setup
&lt;/h2&gt;

&lt;p&gt;Drop a &lt;code&gt;CLAUDE.md&lt;/code&gt; like the one above in your repo root. Add a couple of hooks (auto-format on &lt;code&gt;PostToolUse&lt;/code&gt;, block unsafe commands on &lt;code&gt;PreToolUse&lt;/code&gt;). Write one custom slash command for your most repeated task. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the CLI (macOS/Linux/WSL)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://claude.ai/install.sh | bash

&lt;span class="c"&gt;# Start in your project, in plan mode&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
claude &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; plan

&lt;span class="c"&gt;# Headless mode for scripting and CI&lt;/span&gt;
git diff main &lt;span class="nt"&gt;--name-only&lt;/span&gt; | claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"review these changed files for security issues"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's enough to feel the difference the same day.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR — the 8 habits that matter most
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Describe outcomes, not keystrokes&lt;/strong&gt; — and let it interview you before it starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep &lt;code&gt;CLAUDE.md&lt;/code&gt; under ~60 lines&lt;/strong&gt; — stable rules only; let it append lessons after bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan mode before code&lt;/strong&gt; — approving a plan is far cheaper than untangling a wrong diff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks for anything that must happen&lt;/strong&gt; — &lt;code&gt;CLAUDE.md&lt;/code&gt; and skills are requests, not guarantees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents for noisy research&lt;/strong&gt; — keep the main context clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel agents in git worktrees&lt;/strong&gt; — capped by &lt;em&gt;your&lt;/em&gt; review capacity, ~2-3 in practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demand evidence&lt;/strong&gt; — "show me the test output," never "it works."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context hygiene&lt;/strong&gt; — clear the session after two failed corrections; compact around 50%.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is exotic. It's the boring discipline of treating an agent like a capable teammate: clear briefs, guardrails, evidence, and review. That's the Claude Code workflow that turns &lt;strong&gt;agentic coding&lt;/strong&gt; from a party trick into the thing that clears your queue. For the structured, end-to-end path through all of it, the &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code Mastery course&lt;/a&gt; walks the same terrain in order — just note it's in Romanian. Either way, adopt the habits above and tune them to your own repo.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Prompt Caching with Claude: How We Cut AI API Costs by 90% in Production (2026 Guide)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 01 Jun 2026 09:02:05 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</link>
      <guid>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Anthropic's prompt caching gives you a &lt;strong&gt;90% discount&lt;/strong&gt; on cached input tokens and up to &lt;strong&gt;85% lower latency&lt;/strong&gt; on long-context calls. But the wins only show up if you understand cache breakpoints, TTLs, and what actually invalidates the cache. This guide walks through 5 production patterns we use, real benchmarks, and the pitfalls that silently kill your hit rate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost problem nobody warns you about
&lt;/h2&gt;

&lt;p&gt;When you ship anything serious with Claude — an agent, a RAG system, a code assistant, a customer support bot — you discover the same uncomfortable truth: &lt;strong&gt;your input token bill dwarfs your output bill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A typical agent loop looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt: ~3,000 tokens (instructions, persona, constraints)&lt;/li&gt;
&lt;li&gt;Tool definitions: ~4,000 tokens (JSON schemas for 10–20 tools)&lt;/li&gt;
&lt;li&gt;Conversation history: 5,000–50,000 tokens (grows every turn)&lt;/li&gt;
&lt;li&gt;RAG context: 5,000–20,000 tokens per query&lt;/li&gt;
&lt;li&gt;User message: ~200 tokens&lt;/li&gt;
&lt;li&gt;Model output: ~500 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every single turn, you re-send the same system prompt, the same tool definitions, and most of the conversation history. On Claude Sonnet 4.6 at $3 per million input tokens, a 15,000-token prefix sent across 20 conversation turns costs you &lt;strong&gt;$0.90 per conversation in input alone&lt;/strong&gt; — before you've generated a single useful token of output.&lt;/p&gt;

&lt;p&gt;Multiply that by 10,000 daily active users and you're burning &lt;strong&gt;$9,000/day&lt;/strong&gt; just to re-tokenize content you already sent.&lt;/p&gt;

&lt;p&gt;This is exactly what prompt caching fixes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude's prompt caching actually does
&lt;/h2&gt;

&lt;p&gt;Anthropic's prompt caching lets the API store the internal state for a prefix of your prompt and reuse it on subsequent requests. Two numbers matter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Pricing relative to base input&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache write&lt;/strong&gt; (first time a prefix is seen)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1.25×&lt;/strong&gt; base input cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache read&lt;/strong&gt; (subsequent hits)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.10×&lt;/strong&gt; base input cost (90% off)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You pay a small one-time premium to write the cache, then every hit after that is 10% of the normal price. The break-even point is &lt;strong&gt;after the second request&lt;/strong&gt; — anything more than one read and you're saving money.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mental model
&lt;/h3&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;prefix tree&lt;/strong&gt; with checkpoints. You mark up to 4 points in your prompt with &lt;code&gt;cache_control&lt;/code&gt;, and Claude caches everything from the start of the prompt up to each breakpoint. On the next request, if the prefix matches &lt;strong&gt;byte-for-byte&lt;/strong&gt;, you get a cache hit.&lt;/p&gt;

&lt;p&gt;The order Claude processes the prompt is fixed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools → system → messages (oldest → newest)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your cache breakpoints must respect that order. You cannot cache a later block without caching everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TTL trap
&lt;/h3&gt;

&lt;p&gt;The default cache TTL is &lt;strong&gt;5 minutes&lt;/strong&gt;, refreshed on every read. A 1-hour TTL is available as a premium option (costs more on write, same on read). Most teams over-pay for the 1-hour cache when 5 minutes would have served them fine — if your traffic is steady, every request refreshes the TTL and the cache effectively lives forever.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to go deeper on Claude's API mechanics in production? Prompt caching, tool use, batch API, streaming, and cost optimization are covered in depth in the &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration course on Cursuri-AI.ro&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 1: Cache the system prompt and tool definitions
&lt;/h2&gt;

&lt;p&gt;This is the highest-ROI change you can make, and most codebases get it wrong on the first try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong&lt;/strong&gt; (no caching):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Right&lt;/strong&gt; (cached):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# ... more tools ...
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache breakpoint on the last tool
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the system block&lt;/strong&gt; caches everything up through the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the last tool&lt;/strong&gt; caches everything through the tool definitions — this is critical because tools are evaluated &lt;em&gt;before&lt;/em&gt; system per the processing order above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Wait — that's actually wrong as stated. Let me correct: because the order is &lt;code&gt;tools → system → messages&lt;/code&gt;, putting &lt;code&gt;cache_control&lt;/code&gt; on the &lt;strong&gt;last tool&lt;/strong&gt; caches just the tools, and putting it on &lt;strong&gt;system&lt;/strong&gt; caches tools + system. You typically only need the system breakpoint; it covers everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading the response
&lt;/h3&gt;

&lt;p&gt;The API returns cache stats in &lt;code&gt;response.usage&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_creation_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# tokens written to cache (1.25x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_read_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# tokens read from cache (0.10x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# uncached tokens (1x cost)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the first request: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is high, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is 0.&lt;br&gt;
On every subsequent request within 5 minutes: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is 0, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is high. That's the win condition.&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 2: Cache conversation history with rolling breakpoints
&lt;/h2&gt;

&lt;p&gt;In a multi-turn agent, the conversation grows on every turn. If you only cache the system prompt, you're still re-sending and re-billing every prior turn at full price.&lt;/p&gt;

&lt;p&gt;The trick is to add a &lt;strong&gt;second cache breakpoint&lt;/strong&gt; on the most recent assistant message, so the entire conversation up to that point is cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_messages_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    history: list of {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: ...}
    new_user_message: str
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Add cache breakpoint on the last historical message
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every new turn reads the entire prior conversation from cache. Cost per turn becomes nearly constant instead of growing linearly with conversation length.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4-breakpoint budget
&lt;/h3&gt;

&lt;p&gt;Claude allows up to &lt;strong&gt;4 cache breakpoints&lt;/strong&gt; per request. A common production layout uses all four:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 1&lt;/strong&gt;: end of tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 2&lt;/strong&gt;: end of system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 3&lt;/strong&gt;: end of "stable" conversation history (turns 1 through N-2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 4&lt;/strong&gt;: end of "recent" history (turn N-1)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you a layered cache: tools rarely change, system rarely changes, old history never changes, recent history is sliding. Each layer hits or misses independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Cache few-shot examples separately from the user query
&lt;/h2&gt;

&lt;p&gt;Few-shot prompting is one of the highest-leverage techniques in production LLM apps — and one of the most expensive if you don't cache. A typical few-shot block with 5–10 examples can run 8,000–15,000 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Example 1:
Input: ...
Output: ...

Example 2:
Input: ...
Output: ...

[... 8 more examples ...]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a classifier. Categorize support tickets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache the examples
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_ticket&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical rule: &lt;strong&gt;put the variable content last&lt;/strong&gt;. Cache only works on prefix matches. If your user-specific data is in the middle of the prompt, everything after it becomes uncacheable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: RAG with cached document chunks
&lt;/h2&gt;

&lt;p&gt;RAG systems are notorious for blowing up token bills because the retrieved context is large and unique per query. You can't cache the retrieved chunks themselves (they change), but you &lt;em&gt;can&lt;/em&gt; cache the surrounding framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ~2000 tokens, stable
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For RAG with a stable knowledge base (corporate docs, product manuals, codebases), there's a more advanced pattern: &lt;strong&gt;pre-tile your documents into fixed-size cacheable blocks&lt;/strong&gt; and choose your retrieval strategy to favor returning whole blocks rather than slices. You trade some retrieval precision for massive cost savings on hot documents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you build RAG systems for production, the &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation) course on Cursuri-AI.ro&lt;/a&gt; covers caching strategies, retrieval optimization, hybrid search, and eval pipelines end-to-end.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 5: Cache tool results in long-running agents
&lt;/h2&gt;

&lt;p&gt;Agent loops are caching's sweet spot. An agent runs &lt;code&gt;tool_call → tool_result → tool_call → tool_result&lt;/code&gt; cycles, and each iteration the prompt grows by the new tool result. Without caching, you re-bill the entire history every iteration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add cache breakpoint to the latest message
&lt;/span&gt;        &lt;span class="n"&gt;cached_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}],&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

        &lt;span class="c1"&gt;# Append assistant turn + tool results, loop
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a 15-step agent run with a 4,000-token system prompt and 8,000-token tools, this pattern cuts input cost by &lt;strong&gt;~80–88%&lt;/strong&gt; versus uncached.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent loops, tool design, multi-step planning and cost modeling are the focus of the &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation course on Cursuri-AI.ro&lt;/a&gt; — built around the same Claude Agent SDK patterns shown here.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Real benchmarks: before vs after
&lt;/h2&gt;

&lt;p&gt;These numbers are from a production code-review agent running on Claude Sonnet 4.6, averaged over 1,000 conversations of 12 turns each.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Uncached&lt;/th&gt;
&lt;th&gt;Cached&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg input tokens per turn&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg billed input cost per turn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0552&lt;/td&gt;
&lt;td&gt;$0.0061&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−89%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg time-to-first-token&lt;/td&gt;
&lt;td&gt;1,840 ms&lt;/td&gt;
&lt;td&gt;380 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg total cost per 12-turn conversation&lt;/td&gt;
&lt;td&gt;$0.66&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate (warm)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;96.3%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency win surprised us as much as the cost win. Cache reads skip the prompt processing phase entirely, which dominates time-to-first-token for long contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pitfalls that silently kill your hit rate
&lt;/h2&gt;

&lt;p&gt;These are mistakes we've made or seen in production code reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Whitespace and formatting drift
&lt;/h3&gt;

&lt;p&gt;Cache hits require &lt;strong&gt;byte-exact prefix matches&lt;/strong&gt;. If your system prompt is built with f-strings and you add a timestamp, conditional newline, or trailing space, you invalidate the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BREAKS the cache every minute
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Current time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Works
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Pass time as a separate user message field if needed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Audit your prompts for hidden variability: locale-formatted numbers, dict iteration order in older Pythons, tool definitions where field order changes between deploys.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Reordering tool definitions
&lt;/h3&gt;

&lt;p&gt;If you generate tool schemas from a dict and the dict iteration order changes between runs, your cache evaporates. &lt;strong&gt;Always sort tool definitions&lt;/strong&gt; before sending:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_tools&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Wrong breakpoint placement
&lt;/h3&gt;

&lt;p&gt;Breakpoints must come &lt;strong&gt;after&lt;/strong&gt; the content you want to cache, not before. The breakpoint marks "cache everything up to here." Putting it on the user message instead of the system prompt is a common rookie mistake.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Caching tiny prefixes
&lt;/h3&gt;

&lt;p&gt;There's a minimum cacheable size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet &amp;amp; Opus&lt;/strong&gt;: 1,024 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt;: 2,048 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below the minimum, the &lt;code&gt;cache_control&lt;/code&gt; is silently ignored — the API doesn't error, it just doesn't cache. Always check &lt;code&gt;response.usage.cache_creation_input_tokens &amp;gt; 0&lt;/code&gt; on your first request to confirm the cache actually wrote.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ignoring the 5-minute TTL on bursty traffic
&lt;/h3&gt;

&lt;p&gt;If your traffic is bursty — heavy during business hours, dead overnight — the 5-minute cache will expire between sessions and you'll pay the write premium every time. For bursty patterns, either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the 1-hour TTL (more expensive write, same read price)&lt;/li&gt;
&lt;li&gt;Or send a small "keep-alive" request every 4 minutes during expected idle windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Mixing cached and uncached models
&lt;/h3&gt;

&lt;p&gt;Cache is &lt;strong&gt;model-specific&lt;/strong&gt;. If your code falls back from Sonnet 4.6 to Haiku 4.5 on rate limit, the Haiku call has no cache history. Either keep fallback paths uncached, or build separate caches per model.&lt;/p&gt;




&lt;h2&gt;
  
  
  When NOT to use prompt caching
&lt;/h2&gt;

&lt;p&gt;Caching has overhead. Skip it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-shot calls with no shared prefix&lt;/strong&gt; — single-request classification, one-off summarization. The 1.25× write premium is pure loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-variability prompts&lt;/strong&gt; — if each request has different boilerplate, you're paying write premium for nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts below the minimum&lt;/strong&gt; — short prompts can't be cached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost is already negligible&lt;/strong&gt; — if you spend $20/month on the API, the engineering time to optimize caching costs more than the savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful heuristic: &lt;strong&gt;if your stable prefix is ≥2,000 tokens AND you make ≥3 requests per 5-minute window with that prefix, cache it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting it together: a production checklist
&lt;/h2&gt;

&lt;p&gt;Before you ship a Claude integration in 2026, run this list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] System prompt has &lt;code&gt;cache_control&lt;/code&gt; set&lt;/li&gt;
&lt;li&gt;[ ] Tool definitions are sorted and stable&lt;/li&gt;
&lt;li&gt;[ ] User-variable content is at the end of the prompt, not in the middle&lt;/li&gt;
&lt;li&gt;[ ] Cache stats (&lt;code&gt;cache_read_input_tokens&lt;/code&gt;) are logged and dashboarded&lt;/li&gt;
&lt;li&gt;[ ] Cache hit rate is monitored — alert if it drops below 80%&lt;/li&gt;
&lt;li&gt;[ ] No timestamps, request IDs, or random data injected into cached blocks&lt;/li&gt;
&lt;li&gt;[ ] First-request cache write is verified in tests&lt;/li&gt;
&lt;li&gt;[ ] Fallback model paths handle cache absence cleanly&lt;/li&gt;
&lt;li&gt;[ ] 5-minute vs 1-hour TTL choice is documented with reasoning&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Prompt caching is the single highest-leverage cost optimization for Claude in production. The mechanics are simple, but the gotchas — formatting drift, reorder bugs, minimum sizes, TTL mismatches — are where teams leave money on the table.&lt;/p&gt;

&lt;p&gt;If you treat caching as a first-class concern from day one, you ship AI features that are 5–10× cheaper to operate than the naive implementation. If you bolt it on later, you spend weeks chasing cache misses through your logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to go deeper
&lt;/h3&gt;

&lt;p&gt;I write about production AI engineering — Claude API, multi-agent systems, RAG, cost optimization — on &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an interactive learning platform with an always-available AI tutor that walks you through every concept and reviews your code. The four courses most relevant to what's in this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration&lt;/a&gt;&lt;/strong&gt; — Claude API in production: prompt caching, tool use, batch API, streaming, error handling, retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;&lt;/strong&gt; — structured prompting, few-shot patterns, evaluation, prompt versioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation&lt;/a&gt;&lt;/strong&gt; — agent loops, tool design, multi-agent orchestration, cost modeling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt;&lt;/strong&gt; — retrieval, embeddings, hybrid search, caching, eval pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Course content is delivered in Romanian (the platform's primary audience), but the code, frameworks, and patterns are language-agnostic — the IT Pro track is built specifically for engineers shipping AI in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your cache hit rate in production?&lt;/strong&gt; Drop a comment with your setup — I'm collecting patterns for a follow-up post on &lt;strong&gt;caching at the multi-tenant scale&lt;/strong&gt; (per-customer cache namespaces, cache warm-up strategies, and the cost model when you have 10,000+ concurrent users).&lt;/p&gt;

&lt;p&gt;If this helped, a ❤️ or a 🦄 keeps it visible for other devs hitting the same cost wall. Follow for more deep-dives on Claude in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic's official prompt caching docs: &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;docs.anthropic.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude API pricing: &lt;a href="https://www.anthropic.com/pricing" rel="noopener noreferrer"&gt;anthropic.com/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full IT Pro AI engineering catalog: &lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro/courses&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI for Influencers in 2026: How to Build a Content Engine That Runs Itself</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 19 May 2026 13:34:41 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</link>
      <guid>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</guid>
      <description>&lt;p&gt;The influencer economy is no longer about who posts the most. It's about who has built the smartest &lt;strong&gt;AI content system&lt;/strong&gt; behind the scenes.&lt;/p&gt;

&lt;p&gt;In 2026, the top 1% of creators aren't outworking everyone else. They're out-engineering them. They've turned what used to be a 60-hour-a-week grind into a streamlined pipeline where AI handles 80% of the production work — and they keep 100% of the creative direction.&lt;/p&gt;

&lt;p&gt;Over the past two years, working with hundreds of creators and educators through &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — Eastern Europe's leading AI education platform — I've watched this shift happen in real time. The patterns are consistent, the playbook is replicable, and the gap between those who adopt it and those who don't is widening every month.&lt;/p&gt;

&lt;p&gt;This article breaks down exactly how it works, what tools they use, and how you can build the same stack — whether you're an influencer who codes, or a developer building tools for creators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Changed the Influencer Game (Permanently)
&lt;/h2&gt;

&lt;p&gt;Three years ago, an influencer's competitive advantage was personality plus consistency. Today, that's table stakes.&lt;/p&gt;

&lt;p&gt;The real moat now is &lt;strong&gt;operational leverage&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How fast can you identify a trending topic?&lt;/li&gt;
&lt;li&gt;How quickly can you produce content across 5+ formats?&lt;/li&gt;
&lt;li&gt;How precisely can you target each piece to its platform?&lt;/li&gt;
&lt;li&gt;How much of this can run without your direct involvement?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators who answered "all of it, mostly automated" are the ones scaling past 1M followers, 7-figure revenues, and 50+ pieces of content per week — solo or with tiny teams.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. It's already happening. The question is whether you're building the system or watching others build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5-Layer AI Stack for Modern Influencers
&lt;/h2&gt;

&lt;p&gt;Every high-output creator I've analyzed runs some version of this five-layer architecture. The tools change. The structure doesn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Intelligence (Research &amp;amp; Trend Detection)
&lt;/h3&gt;

&lt;p&gt;Before you create, you need to know what to create.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors trending topics, keywords, and conversations in your niche&lt;/li&gt;
&lt;li&gt;Analyzes competitor content performance&lt;/li&gt;
&lt;li&gt;Identifies content gaps and opportunities&lt;/li&gt;
&lt;li&gt;Surfaces audience questions before they become saturated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools and APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity API&lt;/strong&gt; — for real-time research with citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa AI&lt;/strong&gt; — semantic search for niche topics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Trends API&lt;/strong&gt; + &lt;strong&gt;YouTube Data API&lt;/strong&gt; — for trend signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit API&lt;/strong&gt; + &lt;strong&gt;Twitter/X API&lt;/strong&gt; — for audience listening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BuzzSumo&lt;/strong&gt; or &lt;strong&gt;SparkToro&lt;/strong&gt; — for content gap analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Don't just track what's popular. Track what's &lt;em&gt;about to&lt;/em&gt; become popular by monitoring signal velocity (rate of change), not absolute volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Ideation (Concept &amp;amp; Angle Generation)
&lt;/h3&gt;

&lt;p&gt;This is where most creators waste the most time — staring at a blank page deciding what to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI does well here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates 30+ angle variations from a single topic&lt;/li&gt;
&lt;li&gt;Adapts ideas to your specific voice and audience&lt;/li&gt;
&lt;li&gt;Identifies counterintuitive takes that drive engagement&lt;/li&gt;
&lt;li&gt;Maps ideas to platform-specific formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended approach:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a custom GPT or Claude project trained on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your past top-performing content (with metrics)&lt;/li&gt;
&lt;li&gt;Your audience persona and voice guidelines&lt;/li&gt;
&lt;li&gt;Your content pillars and forbidden topics&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If you've never structured a voice profile before, this is one of the highest-leverage skills you can develop. We dedicate an entire module to it inside &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI for Content Creators track on Cursuri-AI.ro&lt;/a&gt; — including the exact prompts and templates we use internally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then prompt it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a content strategist for an influencer with this profile:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Generate angles that are specific, counterintuitive, and aligned with their voice.
        Avoid generic takes. Each angle should be testable as a hook.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; distinct angles for content about: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;angles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;building a personal brand in 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, data-driven, contrarian, B2B-focused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angles&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this single function call can fuel a month of content. Cost: ~$0.15.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Production (Multi-Format Content Generation)
&lt;/h3&gt;

&lt;p&gt;This is the heaviest-lifting layer — and where AI compounds value most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The repurposing principle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One "pillar" piece (a long-form video, podcast, or article) should generate 10–15 derivative pieces with minimal manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample workflow for a 30-minute podcast episode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transcription&lt;/strong&gt; → Whisper API or AssemblyAI ($0.36 for 30 min)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-form blog post&lt;/strong&gt; → Claude/GPT generates structured article from transcript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn carousel&lt;/strong&gt; → 8–10 slide deck with key insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twitter/X thread&lt;/strong&gt; → 10-tweet thread with the strongest takes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-form clips&lt;/strong&gt; → Opus Clip or Riverside AI extracts viral moments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Newsletter&lt;/strong&gt; → Personalized summary with commentary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Shorts&lt;/strong&gt; → Auto-captioned vertical clips&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quote graphics&lt;/strong&gt; → Designed via Canva API or Bannerbear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instagram Reels&lt;/strong&gt; → Repurposed clips with platform-native captions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO blog series&lt;/strong&gt; → 3–5 articles targeting specific search queries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total human time: 1–2 hours of review and approval, instead of 30+ hours of production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Distribution (Platform-Native Publishing)
&lt;/h3&gt;

&lt;p&gt;Most creators lose performance here by posting the same content identically across platforms. AI fixes this by adapting each piece to the platform's native expectations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive distribution looks like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn → Professional tone, longer-form, hook in first 2 lines&lt;/li&gt;
&lt;li&gt;Twitter/X → Punchy, opinionated, thread-friendly&lt;/li&gt;
&lt;li&gt;Instagram → Visual-first, emotion-driven captions&lt;/li&gt;
&lt;li&gt;TikTok → Hook in 1 second, vertical, trend-aware&lt;/li&gt;
&lt;li&gt;YouTube → SEO-optimized titles, timestamps, structured descriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Buffer&lt;/strong&gt;, &lt;strong&gt;Hypefury&lt;/strong&gt;, or &lt;strong&gt;Typefully&lt;/strong&gt; — scheduling with AI optimization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make&lt;/strong&gt; or &lt;strong&gt;n8n&lt;/strong&gt; — custom automation workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postiz&lt;/strong&gt; (open source) — self-hosted social scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 5: Optimization (Performance Feedback Loop)
&lt;/h3&gt;

&lt;p&gt;This is the layer most creators skip — and it's the one that compounds the hardest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hook performance (which first lines drive scroll-stops?)&lt;/li&gt;
&lt;li&gt;Format performance (which content types convert best per platform?)&lt;/li&gt;
&lt;li&gt;Topic performance (which themes consistently win?)&lt;/li&gt;
&lt;li&gt;Audience signals (which content brings in your ICP vs. tourists?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How AI helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes patterns across hundreds of posts in seconds&lt;/li&gt;
&lt;li&gt;Identifies non-obvious performance correlations&lt;/li&gt;
&lt;li&gt;Suggests next-week content based on last week's winners&lt;/li&gt;
&lt;li&gt;Drafts variations of top performers for retesting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build a simple dashboard that ingests your analytics from each platform and feeds it back to your ideation layer. This closes the loop — every post makes the next one smarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimal Working Example: Content Repurposing Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's a stripped-down Python pipeline that takes a transcript and produces three platform-adapted outputs. Useful as a starting point you can extend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate LinkedIn post, Twitter thread, and newsletter from a transcript.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an expert content strategist. The creator&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s voice is: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    From the transcript below, produce THREE outputs in JSON:
    1. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 200-word LinkedIn post with strong hook
    2. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 8-tweet thread (array of strings, max 280 chars each)
    3. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 400-word personal newsletter section

    Each must feel platform-native, not copy-pasted.

    Transcript:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Return only valid JSON.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sample_transcript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;[Your podcast/video transcript here]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;voice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, contrarian, B2B-focused, data-driven&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== LINKEDIN ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== TWITTER THREAD ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== NEWSLETTER ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extend this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whisper for audio-to-text input&lt;/li&gt;
&lt;li&gt;A queue system (Redis + Celery) for batch processing&lt;/li&gt;
&lt;li&gt;A simple Streamlit UI for non-technical creator team members&lt;/li&gt;
&lt;li&gt;Webhook integration with Buffer or Typefully for direct publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The 5 Mistakes That Kill AI Content Pipelines
&lt;/h2&gt;

&lt;p&gt;I've audited dozens of creator AI workflows. The same mistakes appear over and over.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Treating AI as a Writer Instead of a Drafter
&lt;/h3&gt;

&lt;p&gt;AI-generated text published without human editing is detectable, generic, and erodes trust. Use AI for the first 80%, but always edit the final 20% — that's where your voice lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Skipping the Voice Calibration Step
&lt;/h3&gt;

&lt;p&gt;Without a documented voice profile (tone, vocabulary, forbidden phrases, examples), every output regresses to the mean. Spend 4 hours documenting your voice once. It pays back for years. If you want a structured framework for this, we walk through the full process in &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI workflow courses&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Building Without Measurement
&lt;/h3&gt;

&lt;p&gt;Pipelines without analytics are vibes-based content factories. If you can't tell which output formats win, you're optimizing blind.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Over-Automating Distribution
&lt;/h3&gt;

&lt;p&gt;Full automation of posting (no human in the loop) is how creators end up with embarrassing posts going live during global news events. Keep a 1-click approval step at minimum.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Choosing Tools Over Architecture
&lt;/h3&gt;

&lt;p&gt;The creators who win don't have the best tools. They have the clearest workflow. Tools change every quarter. Architecture compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming Next (2026–2027)
&lt;/h2&gt;

&lt;p&gt;A few signals worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personalized AI clones&lt;/strong&gt; — creators training models on their voice/likeness to scale 1:1 audience interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal generation at scale&lt;/strong&gt; — single prompts producing full video, audio, and graphics in one pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-native platforms&lt;/strong&gt; — new social networks built around AI-generated content as a first-class citizen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-driven content ops&lt;/strong&gt; — autonomous agents that research, produce, schedule, and optimize with minimal human input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators preparing for this now — by building modular, API-driven systems — will be the ones operating at unprecedented scale by 2027.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: AI for Influencers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Do I need to code to use AI as an influencer?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Many top creators use no-code tools (Zapier, Make, ChatGPT, Claude Projects). But knowing even basic Python unlocks 10x more customization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Will AI-generated content hurt my reach?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Only if it sounds generic. Platforms penalize low-effort content, not AI assistance. Original voice + AI scaffolding consistently outperforms 100% human or 100% AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How much should I budget for AI tools?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A solo creator can build a complete stack for $50–150/month. Larger operations run $500–2000/month. ROI is usually measured in weeks, not months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is this ethical? Should I disclose AI usage?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Be transparent about &lt;em&gt;what&lt;/em&gt; AI does in your workflow (research, drafting, editing), but you don't need to flag every AI-touched word. The standard: would your audience feel deceived if they saw your process? If no, you're fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Which AI model should I use as a creator?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For creative content: Claude tends to lead. For research with citations: Perplexity. For images: Midjourney or Flux. For video: Runway or Sora. Test all of them — they each have strengths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Build the System, Not the Output
&lt;/h2&gt;

&lt;p&gt;The influencer economy is splitting into two clear tiers.&lt;/p&gt;

&lt;p&gt;The first tier still manually crafts every piece of content. They post when they have time. They burn out. They plateau.&lt;/p&gt;

&lt;p&gt;The second tier has built systems. AI handles the heavy lifting. They post consistently across every platform. Their content compounds because their architecture compounds.&lt;/p&gt;

&lt;p&gt;The gap between these two tiers is widening every month. And by 2027, it will be unbridgeable for those who waited too long to start.&lt;/p&gt;

&lt;p&gt;The good news: building your AI content engine doesn't require a team or a six-figure budget. It requires clear thinking, a few APIs, and the willingness to treat content like the engineering problem it actually is.&lt;/p&gt;

&lt;p&gt;Start with one layer. Make it work. Add the next.&lt;/p&gt;

&lt;p&gt;That's how the top 1% built it. And it's how you build it too.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want to Go Deeper?
&lt;/h2&gt;

&lt;p&gt;If this resonated and you want a structured path instead of piecing it together from scattered blog posts and YouTube videos:&lt;/p&gt;

&lt;p&gt;🎓 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — Our complete AI education platform covers the entire creator stack: prompting, automation, content pipelines, AI workflows for business, and how to build production-grade AI systems. Interactive courses with an AI tutor that adapts to how you learn — not passive video watching.&lt;/p&gt;

&lt;p&gt;Whether you're a creator looking to scale, a developer building tools for the creator economy, or a business owner figuring out how to integrate AI into your operations — &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;start here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm the founder of &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, where I help thousands of creators, professionals, and businesses build with AI. I write about AI workflows, content automation, and the engineering side of the creator economy.&lt;/p&gt;

&lt;p&gt;If this article helped, drop a reaction and follow for more deep dives. &lt;strong&gt;What layer of your content stack are you working on right now?&lt;/strong&gt; Let me know in the comments — I read every one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contentcreation</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>7 Production Patterns for AI Agents That Don't Break in 2026</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 13 May 2026 11:38:37 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</link>
      <guid>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</guid>
      <description>&lt;p&gt;A demo agent that loops three times, calls one tool, and returns "Hello, I helped you" is easy. A production agent that handles 10k requests a day across paying customers, without lighting your API bill on fire or hallucinating tool arguments at 3am, is a different animal.&lt;/p&gt;

&lt;p&gt;I've shipped AI agents in production for the last 18 months — search, content generation, support triage, document analysis. The same seven patterns keep showing up in every codebase that &lt;em&gt;actually&lt;/em&gt; works. None of them are exotic. Most of them are boring. That's the point: production agents are boring on purpose.&lt;/p&gt;

&lt;p&gt;Here are the patterns, with Python examples you can drop into your own loop today.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Tool Result Validator
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; LLMs hallucinate tool arguments. They will confidently call &lt;code&gt;send_email(to="user@example.com", subject="Refund", body="...")&lt;/code&gt; when the user never asked for an email. They will pass &lt;code&gt;user_id="123abc"&lt;/code&gt; to a function that requires an integer. They will invent product SKUs that don't exist.&lt;/p&gt;

&lt;p&gt;If your tool layer trusts the model's output, every hallucination becomes a production incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Validate tool arguments at the &lt;em&gt;tool boundary&lt;/em&gt;, not inside the tool. Reject early with a structured error the model can recover from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SendEmailArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_SCHEMAS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid_arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool call rejected. Fix these fields: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_confirmation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Always return the validation error &lt;em&gt;back to the model&lt;/em&gt; as a tool result. Don't raise it. The agent can usually self-correct in the next turn — but only if it sees the error.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bounded Memory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Naive agent loops accumulate every tool call, every observation, every reasoning step into the conversation history. After 15 turns, you're sending 80k tokens per request. Your latency doubles. Your cost goes up 10x. The model starts losing track of what it was doing because the relevant context is buried under five tool dumps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat conversation history as a finite resource. Compress aggressively, summarize old turns, and keep tool outputs out of the main thread when you can.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Keep system message + last 4 turns verbatim
&lt;/span&gt;        &lt;span class="n"&gt;keep_recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;to_summarize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_with_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;earlier_context&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/earlier_context&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;keep_recent&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't summarize tool &lt;em&gt;call&lt;/em&gt; messages — the model needs the exact arguments to chain reasoning. Summarize only the &lt;em&gt;observations&lt;/em&gt;, and only when they're old enough that detail no longer matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Observable Loop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent is in production. A user complains it gave them garbage. You have... a final string output and a vague memory of what the loop does. Good luck debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Emit a structured event for every state transition in the loop. Every model call, every tool call, every retry, every error. Ship them to whatever observability stack you already use (Datadog, Honeycomb, OpenTelemetry, even just structured JSON to stdout).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;

&lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;span_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                  &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max turns exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Include a stable &lt;code&gt;run_id&lt;/code&gt; on &lt;em&gt;every&lt;/em&gt; event. When a customer reports an issue, you want one query that returns the entire trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent depends on three external services and a vector store. One of them is having a bad day. Your agent now returns a 500 to the user, even though for &lt;em&gt;this particular query&lt;/em&gt; the broken dependency wasn't actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Wrap dependencies in fallback chains. If the primary fails, the agent should know that capability is degraded — not crash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;impl&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;impl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
                &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.fallback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;degraded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is unavailable. Try a different approach.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial bit is the &lt;code&gt;degraded&lt;/code&gt; response — it goes back to the model as a tool result, and a well-prompted agent will re-plan. Maybe it tries a different tool. Maybe it tells the user "I can't check live inventory right now, but here's what I know." Either is better than a 500.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Surface the degraded status in your prompt. A line like &lt;em&gt;"If a tool returns status=degraded, do not retry it. Acknowledge the limitation in your final response."&lt;/em&gt; prevents the model from looping on a dead service.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The Cost Circuit Breaker
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; A bug or an adversarial input puts your agent in a tool-calling loop. By the time you notice, you've spent $400 in 20 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Track cumulative cost per run and per session. Hard-stop when limits are exceeded. This is not optional in production — it's the difference between a bad day and a layoff conversation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.00&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run exceeded $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spent_today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_today&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; exceeded daily budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Different limits for different surfaces. An internal batch job can have a $5 ceiling per run. A free-tier chat user gets $0.10. A paying enterprise customer gets $2. Hardcoding one number is a footgun.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Deterministic Critic
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "LLM-as-a-judge" sounds clever, but using a model to grade itself is unreliable and slow. Two model calls per output, both hallucinate, both cost money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; For checks you can express as code, &lt;em&gt;use code&lt;/em&gt;. Reserve LLM grading for genuinely subjective dimensions, and only after the deterministic checks pass.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_cite_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\[\d+\]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;too_long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;BANNED_PHRASES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;banned_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_keywords:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;llm_grade&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the critic rejects, feed the issues back to the agent as a "revise this" instruction. After two rejections, return whatever you have with a flag — infinite revision loops are their own bug class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't make the critic too strict. If your accept rate is below 70%, your prompt is broken, not your output.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Stateless Replay (Idempotency)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent half-completed a task — it sent the email, then crashed before logging the result. The user retries. Now they get two emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat every external side-effect as idempotent by design. Use deterministic IDs derived from the input, dedupe at the tool layer, and make agent runs &lt;em&gt;replayable&lt;/em&gt; from any saved checkpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;canonical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if the agent retries the same step within the run, it gets the cached result. If you persist the cache across runs (with a longer TTL), you get cross-run idempotency too — which is what you want for anything that costs money or sends messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Be careful what you put in the idempotency key. Timestamps, request IDs, or random nonces in the args will defeat it. Strip them before hashing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;A production agent loop using all seven patterns is roughly 200 lines of Python. Not glamorous, but it survives. Here's the skeleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_production&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;task_context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revise: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task incomplete after max turns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the loop. Drop in your favorite model API (Claude, GPT, open source — patterns work the same), wire up your tools with the validator from pattern 1, and you have something that won't embarrass you in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Read Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's "Building effective agents" guide&lt;/a&gt; — the canonical reference on when to use agents vs simple workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;OpenAI's Agents SDK docs&lt;/a&gt; — clean reference implementation of multi-agent handoffs.&lt;/li&gt;
&lt;li&gt;For Romanian-speaking developers building agents in production, the &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;AI Agents course on Cursuri-AI.ro&lt;/a&gt; goes deeper on these patterns with hands-on exercises.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've shipped agents in production, what patterns did I miss? Drop them in the comments — I'll add the best ones to a follow-up post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by a developer who has paged themselves at 3am because an agent went into a tool-calling loop. Don't be that developer. Use the circuit breaker.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
