<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: galian</title>
    <description>The latest articles on DEV Community by galian (@galian).</description>
    <link>https://dev.to/galian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827330%2F5a53ab61-2fc1-4072-a44e-873913dd8cd7.png</url>
      <title>DEV Community: galian</title>
      <link>https://dev.to/galian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/galian"/>
    <language>en</language>
    <item>
      <title>Claude Fable 5: A Developer's Guide to Anthropic's New Top</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:22:18 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</link>
      <guid>https://dev.to/cursuri-ai/claude-fable-5-a-developers-guide-to-anthropics-new-top-240m</guid>
      <description>&lt;p&gt;Anthropic just moved the ceiling again. &lt;strong&gt;Claude Fable 5&lt;/strong&gt; is the company's most powerful, most intelligent model to date — and it isn't "Opus 4.9." It's a &lt;strong&gt;new tier that sits above the entire Opus family&lt;/strong&gt;. If you build with LLMs, that distinction matters: it changes how you think about model routing, cost, and which tasks deserve your most capable (and most expensive) reasoning.&lt;/p&gt;

&lt;p&gt;This is a practical, no-hype guide for developers. We'll cover what Claude Fable 5 actually is, how it slots into Anthropic's 2026 lineup, what changes in the API surface, when the premium is justified, and how to migrate existing code. Everything here is grounded in Anthropic's own model and API documentation — no invented benchmarks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Claude Fable 5?
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 is Anthropic's flagship reasoning model, exposed through the API as &lt;code&gt;claude-fable-5&lt;/code&gt;. The headline facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A new tier above Opus.&lt;/strong&gt; Until now, "Opus" was the top of the Claude lineup. Fable 5 establishes a level above it — positioned for the hardest reasoning, planning, and long-horizon agentic work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1M-token context window&lt;/strong&gt;, with up to &lt;strong&gt;128K tokens of output&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium pricing&lt;/strong&gt;: roughly &lt;strong&gt;$10 / $50 per million input / output tokens&lt;/strong&gt; — about double Opus 4.8's $5 / $25. That price tag is the whole point: Fable 5 is a precision tool you point at the problems that justify it, not a default for every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive thinking only.&lt;/strong&gt; The fixed "thinking budget" knob is gone. The model decides how much to reason per request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental model to internalize: &lt;strong&gt;Fable 5 is the peak of a four-tier lineup, and capability scales with cost.&lt;/strong&gt; You don't run your whole pipeline on it any more than you'd render every frame of a film at maximum quality regardless of the shot. You route the hard parts to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Fable 5 Fits in the 2026 Anthropic Lineup
&lt;/h2&gt;

&lt;p&gt;Anthropic's current family is a ladder of capability-vs-cost. Picking the right rung per task is one of the highest-leverage habits an AI engineer can build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Reach for it when…&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Absolute peak capability; premium price&lt;/td&gt;
&lt;td&gt;The hardest reasoning, planning, cross-cutting refactors, and long-running agent loops where correctness outweighs cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top of the Opus family; a strong default in Claude Code&lt;/td&gt;
&lt;td&gt;Complex day-to-day work — planning, large refactors, tricky debugging — with a better capability/cost ratio than Fable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balanced, fast, 1M context&lt;/td&gt;
&lt;td&gt;The bulk of everyday coding, reading, and iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Haiku 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Light, fast, cheap&lt;/td&gt;
&lt;td&gt;High-volume small operations, classification, auxiliary steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;model choice is a cost-and-quality lever.&lt;/strong&gt; A well-designed system routes each sub-task to the cheapest model that can do it well, and escalates to Fable 5 only where the payoff is real. If you want a structured, side-by-side breakdown of the 2026 models and how to choose between them, there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI model comparison course&lt;/a&gt; that goes deeper than any single table can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes in the API
&lt;/h2&gt;

&lt;p&gt;This is the part developers actually care about. Fable 5 shares the modern Claude request surface (the same one introduced with Opus 4.7/4.8), with a couple of sharp edges worth knowing before you ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adaptive thinking, not a token budget
&lt;/h3&gt;

&lt;p&gt;Fable 5 supports a single thinking mode: &lt;strong&gt;adaptive&lt;/strong&gt;. You no longer pass a fixed &lt;code&gt;budget_tokens&lt;/code&gt; value — the model regulates its own reasoning depth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;        &lt;span class="c1"&gt;# adaptive is the only thinking mode
&lt;/span&gt;    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;# strong default for coding/agentic work
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this module and add unit tests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things that will save you a debugging session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, or &lt;code&gt;budget_tokens&lt;/code&gt;.&lt;/strong&gt; They're removed on this generation and return &lt;code&gt;400&lt;/code&gt;. Steer behavior with prompting and the effort parameter instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't send &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt; on Fable 5.&lt;/strong&gt; Unlike Opus 4.8/4.7, an explicit &lt;code&gt;disabled&lt;/code&gt; returns &lt;code&gt;400&lt;/code&gt; here. To run without thinking, &lt;strong&gt;omit the &lt;code&gt;thinking&lt;/code&gt; parameter entirely&lt;/strong&gt;. This is the one genuinely new breaking change relative to the Opus 4.x line — easy to miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking text is omitted by default.&lt;/strong&gt; Thinking blocks still stream, but their content is empty unless you opt in with &lt;code&gt;thinking={"type": "adaptive", "display": "summarized"}&lt;/code&gt;. If your UI shows reasoning progress, set this or your users will see a long pause before output.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The effort parameter is your real control knob
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;output_config.effort&lt;/code&gt; accepts &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;, and &lt;code&gt;max&lt;/code&gt;. It controls how much the model thinks &lt;em&gt;and&lt;/em&gt; acts — not just thinking depth. For coding and agentic workloads, &lt;strong&gt;&lt;code&gt;xhigh&lt;/code&gt; is the sweet spot&lt;/strong&gt; and is the effort level Claude Code defaults to. Treat effort as something to tune per route: &lt;code&gt;max&lt;/code&gt; for correctness-critical work, &lt;code&gt;medium&lt;/code&gt;/&lt;code&gt;low&lt;/code&gt; for latency-sensitive or simple steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large outputs need streaming
&lt;/h3&gt;

&lt;p&gt;With up to 128K output tokens available, non-streaming requests will hit SDK HTTP timeouts well before that ceiling. For anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;, stream and collect the final message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-fable-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;output_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xhigh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate the full migration plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_final_message&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What it still supports
&lt;/h3&gt;

&lt;p&gt;Fable 5 keeps the modern toolbox: &lt;strong&gt;structured outputs&lt;/strong&gt; (&lt;code&gt;output_config.format&lt;/code&gt;), &lt;strong&gt;prompt caching&lt;/strong&gt; (minimum cacheable prefix ~2,048 tokens), &lt;strong&gt;server-side compaction&lt;/strong&gt; for very long conversations, &lt;strong&gt;web search with dynamic filtering&lt;/strong&gt;, and &lt;strong&gt;task budgets&lt;/strong&gt; (beta) for telling an agent how many tokens it has for a full loop. If you're wiring these into a real application, the patterns matter as much as the model — that's the focus of this hands-on course on &lt;a href="https://cursuri-ai.ro/courses/construire-aplicatii-ai-python-sdk" rel="noopener noreferrer"&gt;building AI apps with the Anthropic and OpenAI SDKs&lt;/a&gt;, which walks from raw API calls to a production-shaped product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fable 5 for Agentic Coding
&lt;/h2&gt;

&lt;p&gt;The reason Fable 5 is interesting to developers specifically is long-horizon agentic execution: multi-file refactors, overnight runs, and tasks that span dozens of tool calls without a human correcting course.&lt;/p&gt;

&lt;p&gt;Three habits get the most out of it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Give the full task spec up front in one well-formed turn.&lt;/strong&gt; Fable 5 plans better when it has the complete goal early; drip-feeding requirements across many turns tends to cost more tokens and sometimes performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run at high or &lt;code&gt;xhigh&lt;/code&gt; effort with generous &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/strong&gt; Long-horizon coherence comes partly from the model reasoning more at each step — give it room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route deliberately.&lt;/strong&gt; Use Fable 5 for the planning and the genuinely hard edits; delegate mechanical or high-volume sub-steps to Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If terminal-first agentic coding is your world, the workflow discipline — &lt;code&gt;CLAUDE.md&lt;/code&gt; project memory, plan/edit/review loops, hooks as deterministic guardrails, and model routing across the lineup — is exactly what a dedicated &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code mastery course&lt;/a&gt; covers end to end. Agent architecture beyond a single tool (orchestration, delegation, parallelism) is its own discipline, well covered in this &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;course on designing autonomous AI agents&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context is a resource, even at 1M tokens
&lt;/h3&gt;

&lt;p&gt;A 1M-token window is not a license to dump everything into context. Irrelevant context dilutes the model's attention and costs tokens on every turn, no matter how capable the model is. The skill that separates engineers who "get lucky" with agents from those who ship reliable ones is deliberate &lt;strong&gt;context engineering&lt;/strong&gt; — what to load, what to compact, what to persist as memory across sessions. It's enough of a topic to warrant &lt;a href="https://cursuri-ai.ro/courses/context-engineering-memorie-agenti" rel="noopener noreferrer"&gt;its own course on context engineering and memory for agents&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Fable 5 Is Actually Worth the Premium
&lt;/h2&gt;

&lt;p&gt;Here's the honest cost reasoning, because "use the best model" is bad engineering advice.&lt;/p&gt;

&lt;p&gt;At roughly &lt;strong&gt;double the per-token cost of Opus 4.8&lt;/strong&gt;, Fable 5 pays off when the &lt;em&gt;cost of a wrong answer&lt;/em&gt; is high relative to the token bill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worth it:&lt;/strong&gt; a complex cross-service refactor where a subtle regression costs hours of human review; a planning step that determines the trajectory of a long agent run; an analysis where correctness is non-negotiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not worth it:&lt;/strong&gt; routine edits, summaries, classifications, and the long tail of mechanical sub-tasks — those belong on Sonnet 4.6 or Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful rule of thumb: let &lt;strong&gt;Fable 5 plan and decide&lt;/strong&gt;, and let cheaper models &lt;strong&gt;execute&lt;/strong&gt; the parts that are already well-specified. That keeps your bill proportional to difficulty instead of flat-out maximal.&lt;/p&gt;

&lt;p&gt;The other lever is effort. Because effort matters more on this generation than on any prior Opus, a Fable 5 call at &lt;code&gt;medium&lt;/code&gt; effort can be both cheaper and faster than an Opus 4.8 call at &lt;code&gt;xhigh&lt;/code&gt; for some tasks — so benchmark on your own workload rather than assuming "bigger model = always slower and pricier in practice."&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating from Opus 4.8 / 4.7
&lt;/h2&gt;

&lt;p&gt;If you're already on the modern Claude surface, moving to Fable 5 is mostly a model-ID swap plus a couple of checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swap the model string&lt;/strong&gt; to &lt;code&gt;claude-fable-5&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove &lt;code&gt;budget_tokens&lt;/code&gt;&lt;/strong&gt; if any remain → use &lt;code&gt;thinking={"type": "adaptive"}&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strip &lt;code&gt;temperature&lt;/code&gt; / &lt;code&gt;top_p&lt;/code&gt; / &lt;code&gt;top_k&lt;/code&gt;&lt;/strong&gt; — they &lt;code&gt;400&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace last-assistant-turn prefills&lt;/strong&gt; with structured outputs (&lt;code&gt;output_config.format&lt;/code&gt;) or a system-prompt instruction — prefills &lt;code&gt;400&lt;/code&gt; on this generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit for &lt;code&gt;thinking={"type": "disabled"}&lt;/code&gt;&lt;/strong&gt; — it &lt;code&gt;400&lt;/code&gt;s on Fable 5. Omit &lt;code&gt;thinking&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-tune &lt;code&gt;effort&lt;/code&gt; per route&lt;/strong&gt; — start at &lt;code&gt;high&lt;/code&gt;, use &lt;code&gt;xhigh&lt;/code&gt; for coding/agentic, reserve &lt;code&gt;max&lt;/code&gt; for correctness-critical work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;display: "summarized"&lt;/code&gt;&lt;/strong&gt; if you surface reasoning in a UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Steering this generation is done through prompting and effort rather than sampling parameters, so the quality of your instructions matters more than ever. If your prompts were tuned years ago for older models, they're probably leaving capability on the table — a structured refresh of &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;prompt engineering fundamentals&lt;/a&gt; tends to pay for itself quickly on a model this capable.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on Hype vs. Reality
&lt;/h2&gt;

&lt;p&gt;Two guardrails worth keeping as the launch noise settles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fable 5 is the most capable model — not necessarily the default everywhere.&lt;/strong&gt; In Claude Code, for instance, Opus 4.8 remains a strong default; Fable 5 is the tier you select for the hardest work. "Most capable" and "default" are different claims.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version hygiene matters.&lt;/strong&gt; Fable 5 is the current peak, Opus 4.8 is the top of the Opus family, and Opus 4.7 is the previous Opus generation. Anything from the Claude 3.x line (or GPT-4-class / Gemini 2.x models) is outdated and shouldn't be treated as current when you're evaluating tutorials or benchmarks. Always confirm model IDs, limits, and pricing against the official docs, since they shift between releases.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  TL;DR Cheat Sheet
&lt;/h2&gt;

&lt;p&gt;For quick reference when you wire Claude Fable 5 into a real codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model ID:&lt;/strong&gt; &lt;code&gt;claude-fable-5&lt;/code&gt;. Context window 1M tokens, output up to 128K.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking:&lt;/strong&gt; &lt;code&gt;{"type": "adaptive"}&lt;/code&gt; is the only mode. To run without it, &lt;strong&gt;omit the parameter&lt;/strong&gt; — never send &lt;code&gt;{"type": "disabled"}&lt;/code&gt; (it returns &lt;code&gt;400&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effort:&lt;/strong&gt; &lt;code&gt;output_config.effort&lt;/code&gt; is your main control — &lt;code&gt;xhigh&lt;/code&gt; for coding and agents, &lt;code&gt;max&lt;/code&gt; when correctness is critical, &lt;code&gt;low&lt;/code&gt;/&lt;code&gt;medium&lt;/code&gt; for simple or latency-sensitive steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Removed (all &lt;code&gt;400&lt;/code&gt; if sent):&lt;/strong&gt; &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, &lt;code&gt;budget_tokens&lt;/code&gt;, and last-assistant-turn prefills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning in your UI:&lt;/strong&gt; add &lt;code&gt;"display": "summarized"&lt;/code&gt; to the thinking config, or the thinking text comes back empty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large outputs:&lt;/strong&gt; stream anything above ~16K &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing:&lt;/strong&gt; send the hard reasoning to Fable 5; keep routine and high-volume work on Sonnet 4.6 and Haiku 4.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt; isn't just a bigger Opus — it's a new top tier that reframes how you should think about model routing in 2026. The winning pattern is the same as it's always been, just sharper: use the most capable model where correctness compounds, push everything else down the ladder to cheaper models, and tune effort per route. Master that, and Fable 5 becomes a precision instrument rather than a line item that surprises you on the invoice.&lt;/p&gt;

&lt;p&gt;If you want to go from "I read about it" to "I ship with it," the courses linked throughout are part of &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, a Romanian AI-learning platform with deep, hands-on tracks on Claude Code, agent architecture, the Anthropic SDK, context engineering, and model selection — all kept current with the 2026 lineup, Fable 5 included.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? Save it, and drop your Fable 5 routing strategy in the comments — what are you sending to the top tier, and what stays on Sonnet?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Claude Code Workflow: Best Practices That Ship Code"</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 08 Jun 2026 08:51:17 +0000</pubDate>
      <link>https://dev.to/galian/claude-code-workflow-best-practices-that-ship-code-na</link>
      <guid>https://dev.to/galian/claude-code-workflow-best-practices-that-ship-code-na</guid>
      <description>&lt;p&gt;Most posts about Claude Code stop at "install it and say hi." This guide goes further. A reliable &lt;strong&gt;Claude Code workflow&lt;/strong&gt; comes down to a handful of habits that actually ship code: a lean &lt;code&gt;CLAUDE.md&lt;/code&gt;, plan mode before any edit, subagents for noisy research, parallel agents in git worktrees, hooks as guardrails, and a verification loop that kills hallucinations. These are the Claude Code best practices worth adopting in 2026 — the opinionated, hands-on version, no fluff.&lt;/p&gt;

&lt;p&gt;Quick grounding for anyone new: &lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; is Anthropic's official agentic coding tool — an AI coding assistant that lives in your terminal. It reads your codebase, edits files, runs commands, and integrates with your dev tools through natural language. It runs in the terminal, in VS Code/Cursor and JetBrains, in a desktop app, on the web at &lt;code&gt;claude.ai/code&lt;/code&gt;, and in the Claude iOS app — all sharing the same engine, so your &lt;code&gt;CLAUDE.md&lt;/code&gt;, settings, and MCP servers travel with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most Claude Code workflow advice stops at "install and say hi"
&lt;/h2&gt;

&lt;p&gt;The beginner tutorials get you to a prompt and stop. The "Claude Code vs Cursor" posts argue about UI and never touch cost or parallelism. What nobody connects is the &lt;em&gt;layered setup&lt;/em&gt; — the thing that turns a clever autocomplete into a teammate you can delegate to. That layering (memory + skills + hooks + subagents + MCP) plus running several agents at once is where the real productivity lives, and it's exactly what gets fragmented across ten separate posts.&lt;/p&gt;

&lt;p&gt;For the structured version of everything below — terminal-first agentic coding, multi-file edits, git, MCP, headless CI/CD, subagents, and the security model — that's the spine of the &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code Mastery course on Cursuri-AI.ro&lt;/a&gt; (Romanian-language, so plan for that if English is your only language). A few relevant deep-dives are linked along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The right mental model: a teammate you delegate to, not autocomplete
&lt;/h2&gt;

&lt;p&gt;The single biggest mindset shift: stop typing instructions, start describing outcomes. Autocomplete predicts the next token. An agentic tool plans, executes with real dev tools, evaluates the result, and adjusts — a loop, not a guess. Treat it like autocomplete and you'll micromanage every line and get autocomplete-level value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Describe outcomes, not keystrokes — and let it interview you first
&lt;/h3&gt;

&lt;p&gt;Instead of "open &lt;code&gt;auth.ts&lt;/code&gt;, add a function &lt;code&gt;validateToken&lt;/code&gt;," describe the goal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add JWT validation to the login flow. Follow our existing error-handling pattern. Write tests and run them."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then — the part people skip — let it ask questions first. For anything non-trivial, add "ask me anything unclear before you start." A good agent will surface the ambiguity (which token library? refresh tokens?) before writing 200 lines down the wrong road. Treating the prompt like a brief to a competent junior, not a command to a compiler, is genuinely a prompting skill — the same one taught in the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;, and it transfers directly to writing a good &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CLAUDE.md that actually gets followed
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is persistent, per-project (or per-user) memory loaded every session. It's the highest-leverage file in the repo, and the most commonly abused.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep it under ~60 lines (and why long files get half-ignored)
&lt;/h3&gt;

&lt;p&gt;The temptation is to dump your entire style guide in there. Don't. A 400-line &lt;code&gt;CLAUDE.md&lt;/code&gt; competes with your actual prompt for attention and gets partially ignored. Keep it under ~60 lines as a rule of thumb — not an official limit, but it holds up. Stable rules only: conventions, the "definition of done," and the things that must never be touched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use pnpm, not npm.
&lt;span class="p"&gt;-&lt;/span&gt; TypeScript strict mode. No &lt;span class="sb"&gt;`any`&lt;/span&gt;.

&lt;span class="gu"&gt;## Definition of done&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Tests pass (&lt;span class="sb"&gt;`pnpm test`&lt;/span&gt;) before you say it's done.
&lt;span class="p"&gt;-&lt;/span&gt; No new lint warnings.

&lt;span class="gu"&gt;## Never touch&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not edit files in &lt;span class="sb"&gt;`/generated`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Do not change the public API in &lt;span class="sb"&gt;`src/api/`&lt;/span&gt; without asking.

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API style: see docs/api-style.md.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the shape of it: short, declarative, points to longer docs instead of inlining them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Let Claude update CLAUDE.md after every bug
&lt;/h3&gt;

&lt;p&gt;Here's the habit that compounds. After fixing a non-obvious bug, ask it to record the lesson: "add a one-line rule to &lt;code&gt;CLAUDE.md&lt;/code&gt; so this doesn't happen again." Over weeks, the file becomes a distilled record of the project's real gotchas — earned, not guessed. Recent versions also save learnings automatically, but curating the explicit rules by hand keeps the file tight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plan mode before code: catch the wrong problem early
&lt;/h2&gt;

&lt;p&gt;Plan mode is a read-only permission mode: Claude investigates and proposes an approach without touching a single file. You cycle into it with &lt;strong&gt;Shift+Tab&lt;/strong&gt;, or start there with &lt;code&gt;--permission-mode plan&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start a session in plan mode — read-only until you approve&lt;/span&gt;
claude &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Approve the plan, not the diff — correcting a plan is far cheaper
&lt;/h3&gt;

&lt;p&gt;This is the one to drill into any junior. If the plan is wrong, you fix it in one sentence. If the &lt;em&gt;diff&lt;/em&gt; is wrong, you've already paid for 300 lines of edits across five files, and now you're untangling them. Reviewing intent before implementation is where plan mode earns its keep on anything bigger than a one-liner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ask for step one, review, then step two
&lt;/h3&gt;

&lt;p&gt;For large features, don't let it run the whole plan unattended. "Do step one, then stop and show me." Review, then "continue." It feels slower; it's faster, because you catch a wrong turn at step one instead of step six.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layered setup: skills, hooks, subagents, and MCP
&lt;/h2&gt;

&lt;p&gt;This is the part the comparison posts never assemble into one picture. Four mechanisms, each solving a different problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills as folders with a "gotchas" section
&lt;/h3&gt;

&lt;p&gt;Skills are markdown-based reusable workflows, invoked with &lt;code&gt;/&amp;lt;name&amp;gt;&lt;/code&gt; or auto-loaded when relevant. As of 2026 skills and slash commands are unified — a skill &lt;em&gt;is&lt;/em&gt; its slash command. Bundled ones include &lt;code&gt;/code-review&lt;/code&gt;, &lt;code&gt;/debug&lt;/code&gt;, and &lt;code&gt;/batch&lt;/code&gt;. Write your own for anything you do more than twice (&lt;code&gt;/deploy&lt;/code&gt;, &lt;code&gt;/review-pr&lt;/code&gt;), and give every skill a "gotchas" section listing the ways this task usually goes wrong. That section does more for reliability than the happy-path instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooks for safety and automation
&lt;/h3&gt;

&lt;p&gt;This is the crucial distinction: &lt;strong&gt;instructions in &lt;code&gt;CLAUDE.md&lt;/code&gt; and skills are requests, not guarantees.&lt;/strong&gt; If something &lt;em&gt;must&lt;/em&gt; happen, it belongs in a hook. Hooks fire on lifecycle events — &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;SessionStart&lt;/code&gt; — and can run a shell command, an HTTP request, a prompt, or a subagent. Common uses: auto-format on every file write, run a quick test pass after edits, and block obviously unsafe shell commands before they execute. Guardrails go in hooks; everything else is a polite suggestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagents for noisy research so your main context stays clean
&lt;/h3&gt;

&lt;p&gt;A subagent runs its own agentic loop in an isolated context window and returns only a summary (spawned via the Task tool). The value is context hygiene: when it needs to read fifteen files to answer "where is rate limiting enforced," a subagent does the digging and hands back a paragraph, instead of flooding the main session with fifteen files no one will look at again. Don't confuse these with &lt;em&gt;agent teams&lt;/em&gt; — that's an experimental, disabled-by-default feature where independent sessions message each other and share a task list. Different thing.&lt;/p&gt;

&lt;p&gt;If multi-agent orchestration patterns (ReAct, planners, delegation) are the real goal, the &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents architecture and automation course&lt;/a&gt; covers the theory that Claude Code's subagents put into practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP: connecting the issue tracker, DB, and browser
&lt;/h3&gt;

&lt;p&gt;MCP (Model Context Protocol) is an open standard for wiring Claude Code to external data and tools — Google Drive, Jira, Slack, databases, browsers. Configure servers with &lt;code&gt;claude mcp&lt;/code&gt; or &lt;code&gt;--mcp-config&lt;/code&gt;. This is what moves it from "edits files" to "operates your actual workflow": read the design doc in Drive, update the Jira ticket, query the staging DB to confirm a schema. MCP tool search is on by default so all those tools don't blow up your context cost. Pair an MCP connection with a skill that &lt;em&gt;teaches&lt;/em&gt; Claude how to use it, and the combination is far better than either alone.&lt;/p&gt;

&lt;p&gt;Worth understanding the protocol itself if you build internal tools — Claude Code is a first-class MCP &lt;em&gt;client&lt;/em&gt;, and there's a dedicated &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP (Model Context Protocol) course&lt;/a&gt; on building the servers it connects to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Claude Code parallel agents with git worktrees
&lt;/h2&gt;

&lt;p&gt;Git worktree isolation is native (&lt;code&gt;--worktree&lt;/code&gt; / &lt;code&gt;-w&lt;/code&gt;), and it's the single change that most increases throughput. The idea: each agent works in its own checkout of the repo, so two agents never fight over the same files.&lt;/p&gt;

&lt;h3&gt;
  
  
  One bugfix agent + one feature agent, zero file conflicts
&lt;/h3&gt;

&lt;p&gt;A typical setup: one agent grinding through a flaky-test fix, another building a small feature, each in its own worktree on its own branch. They can't step on each other because they're literally in different directories. You monitor both from the agent view.&lt;/p&gt;

&lt;h3&gt;
  
  
  The worktree commands and how to merge back
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Spin up an isolated worktree-backed session&lt;/span&gt;
claude &lt;span class="nt"&gt;--worktree&lt;/span&gt;

&lt;span class="c"&gt;# Or, manage git worktrees yourself and run an agent in each&lt;/span&gt;
git worktree add ../app-bugfix   bugfix/flaky-auth-test
git worktree add ../app-feature  feature/csv-export
&lt;span class="c"&gt;# then run `claude` inside each directory&lt;/span&gt;

&lt;span class="c"&gt;# Watch full sessions running in parallel&lt;/span&gt;
claude agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each lands on its own branch; review the diffs and merge back through normal PRs — same review bar as any human-authored branch, no exceptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many concurrent agents is actually sane
&lt;/h3&gt;

&lt;p&gt;Be honest about the bottleneck: it's &lt;em&gt;your&lt;/em&gt; review capacity, not the tool's. Two or three agents are supervisable. Push past that and you're rubber-stamping diffs you didn't really read, which defeats the point. The cap isn't the machine — it's how many parallel changes you can genuinely verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Killing hallucinations with an evidence-based verification loop
&lt;/h2&gt;

&lt;p&gt;Agentic tools confidently invent APIs, routes, config keys, and permissions. The fix isn't "be careful," it's a repeatable loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Show me the test output," not "it works"
&lt;/h3&gt;

&lt;p&gt;Never accept "it works." Accept the command it ran and the output it produced. "Run the tests and paste the output." "Show me the curl request and the actual response." An agent that claims a green build but can't show a passing run just hallucinated a green build. Make evidence the default, not a special request.&lt;/p&gt;

&lt;h3&gt;
  
  
  When it gets confused, stop and ask it to diagnose
&lt;/h3&gt;

&lt;p&gt;If it starts thrashing — two failed corrections in a row — stop asking for fixes and ask for a diagnosis: "Where exactly is this breaking, and what's your evidence?" Forcing it to locate the failure beats letting it patch blindly. Grounding the model in real, verifiable output is core to using any LLM in production reliably — the broader discipline is the subject of &lt;a href="https://cursuri-ai.ro/courses/introducere-ai-engineering" rel="noopener noreferrer"&gt;Introducere în AI Engineering&lt;/a&gt;, which covers evals and reliability alongside the tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context hygiene: the habit that changes everything
&lt;/h2&gt;

&lt;p&gt;Long sessions degrade. The model's attention spreads thin, old failed attempts pollute the context, and quality quietly drops. Managing context is 80% of getting consistent results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear the session after two failed corrections instead of arguing
&lt;/h3&gt;

&lt;p&gt;The hardest rule to follow: if you've corrected the same thing twice and it's still wrong, don't correct a third time. Clear the session (&lt;code&gt;/clear&lt;/code&gt;), re-state the goal cleanly with the lessons learned, and start fresh. Arguing with a confused context is the single biggest time sink there is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compact manually around 50% context
&lt;/h3&gt;

&lt;p&gt;Don't wait for auto-compaction. Around half-full, compact the session (&lt;code&gt;/compact&lt;/code&gt;) to summarize it and reclaim room, which keeps responses sharp. And when it goes genuinely off the rails, use conversation-rewind to roll back to an earlier point instead of trying to talk it back on course.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost math nobody shows you
&lt;/h2&gt;

&lt;p&gt;Here's what comparison posts skip. Two ways to pay: a Claude subscription (Pro, Max, Team, Enterprise) or pay-as-you-go on the Anthropic API. Most surfaces require one of these — Claude Code requires a paid plan, and the desktop app explicitly needs a paid subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro vs Max vs API — what to actually weigh
&lt;/h3&gt;

&lt;p&gt;As a snapshot (always confirm current numbers on &lt;code&gt;claude.com/pricing&lt;/code&gt; — plan structures shift): Pro is around &lt;strong&gt;$20/month&lt;/strong&gt;, Max is &lt;strong&gt;$100/month (5x)&lt;/strong&gt; or &lt;strong&gt;$200/month (20x)&lt;/strong&gt;, and the API is pure per-token. The lever that dominates the bill: &lt;strong&gt;output tokens cost ~5x input.&lt;/strong&gt; On the API, Opus 4.8 lists at &lt;strong&gt;$5/MTok input and $25/MTok output&lt;/strong&gt;; Sonnet 4.6 at &lt;strong&gt;$3/$15&lt;/strong&gt;; Haiku 4.5 at &lt;strong&gt;$1/$5&lt;/strong&gt;. Agentic coding generates a &lt;em&gt;lot&lt;/em&gt; of output, so output pricing — not input — is what you feel.&lt;/p&gt;

&lt;h3&gt;
  
  
  The model default is tier-dependent (a common myth)
&lt;/h3&gt;

&lt;p&gt;There is no single fixed default. It resolves by account type: Max, Team Premium, Enterprise pay-as-you-go, and the Anthropic API default to &lt;strong&gt;Opus 4.8&lt;/strong&gt;; Pro, Team Standard, and Enterprise seats default to &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;. Switch anytime with &lt;code&gt;/model&lt;/code&gt; or &lt;code&gt;--model&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pick a model by alias or full name&lt;/span&gt;
claude &lt;span class="nt"&gt;--model&lt;/span&gt; opus
claude &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-6

&lt;span class="c"&gt;# opusplan: Opus to plan, Sonnet to execute&lt;/span&gt;
/model opusplan

&lt;span class="c"&gt;# 1M-token context variant (Opus and Sonnet only — not Haiku)&lt;/span&gt;
/model claude-opus-4-8[1m]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;opusplan&lt;/code&gt; is a strong default — Opus's reasoning where it matters (planning), Sonnet's speed and lower cost for execution. Two caveats: Sonnet's 1M-token context needs usage credits on &lt;em&gt;every&lt;/em&gt; plan (including Max), while Opus 1M is included on Max/Team/Enterprise; and Haiku 4.5 is 200k context, not 1M. If a new model doesn't show up in &lt;code&gt;/model&lt;/code&gt;, you're probably on an older build — run &lt;code&gt;claude update&lt;/code&gt; and it'll appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  API vs Max: which actually wins
&lt;/h3&gt;

&lt;p&gt;On the API with heavy daily coding, the bill gets unpredictable enough that a flat Max plan removes the anxiety of watching output tokens tick up mid-refactor. Code with it most days and a flat plan usually wins over metered API billing on raw cost &lt;em&gt;and&lt;/em&gt; peace of mind. Use it occasionally, and the API's pay-only-for-what-you-use can be cheaper. Cost-optimizing LLM usage in production — caching, model routing, batching — is a discipline of its own, covered in &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Integrare Avansată LLM în Aplicații de Producție&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code vs Cursor — when to reach for each
&lt;/h2&gt;

&lt;p&gt;Not a winner-takes-all. Cursor's IDE-native, inline experience is excellent for tight edit-review-edit loops where you want to stay in the editor. Reach for Claude Code's terminal when you want &lt;em&gt;agentic autonomy&lt;/em&gt; — multi-file refactors it drives end to end, headless runs in CI, parallel worktree agents, scripting it into pipelines. And since the VS Code/Cursor extension shares the same engine, it's not strictly either/or: use the extension for inline diffs and the terminal for the heavy autonomous work. Pick by task, not tribe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and team scaling notes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Approval modes, sandboxing, and reviewing AI commits
&lt;/h3&gt;

&lt;p&gt;Permission modes are real guardrails: &lt;code&gt;default&lt;/code&gt;, &lt;code&gt;acceptEdits&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;, and &lt;code&gt;bypassPermissions&lt;/code&gt;, selectable via &lt;code&gt;--permission-mode&lt;/code&gt; and cycled with Shift+Tab. Keep destructive operations behind approval, use hooks to outright block unsafe shell commands, and — non-negotiable — &lt;strong&gt;every AI-authored commit goes through the same review as a human's.&lt;/strong&gt; An agent that can run commands is a powerful tool and a real attack surface; treat its output as untrusted until reviewed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What changes when more than one person shares the conventions
&lt;/h3&gt;

&lt;p&gt;Once a team shares a repo, &lt;code&gt;CLAUDE.md&lt;/code&gt; becomes shared infrastructure: a change to it changes everyone's agent behavior, so it goes through PR review like code. Skills and hooks get version-controlled and packaged as plugins (versioned bundles of skills, subagents, hooks, and MCP servers) distributed through a marketplace, so the whole team runs the same setup. The thing that breaks at scale is uncoordinated worktree agents on the same files — keep agents on separate branches and merge through PRs, exactly as you would with people.&lt;/p&gt;

&lt;h2&gt;
  
  
  A copy-paste Claude Code workflow starter setup
&lt;/h2&gt;

&lt;p&gt;Drop a &lt;code&gt;CLAUDE.md&lt;/code&gt; like the one above in your repo root. Add a couple of hooks (auto-format on &lt;code&gt;PostToolUse&lt;/code&gt;, block unsafe commands on &lt;code&gt;PreToolUse&lt;/code&gt;). Write one custom slash command for your most repeated task. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the CLI (macOS/Linux/WSL)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://claude.ai/install.sh | bash

&lt;span class="c"&gt;# Start in your project, in plan mode&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
claude &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; plan

&lt;span class="c"&gt;# Headless mode for scripting and CI&lt;/span&gt;
git diff main &lt;span class="nt"&gt;--name-only&lt;/span&gt; | claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"review these changed files for security issues"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's enough to feel the difference the same day.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR — the 8 habits that matter most
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Describe outcomes, not keystrokes&lt;/strong&gt; — and let it interview you before it starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep &lt;code&gt;CLAUDE.md&lt;/code&gt; under ~60 lines&lt;/strong&gt; — stable rules only; let it append lessons after bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan mode before code&lt;/strong&gt; — approving a plan is far cheaper than untangling a wrong diff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks for anything that must happen&lt;/strong&gt; — &lt;code&gt;CLAUDE.md&lt;/code&gt; and skills are requests, not guarantees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents for noisy research&lt;/strong&gt; — keep the main context clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel agents in git worktrees&lt;/strong&gt; — capped by &lt;em&gt;your&lt;/em&gt; review capacity, ~2-3 in practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demand evidence&lt;/strong&gt; — "show me the test output," never "it works."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context hygiene&lt;/strong&gt; — clear the session after two failed corrections; compact around 50%.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is exotic. It's the boring discipline of treating an agent like a capable teammate: clear briefs, guardrails, evidence, and review. That's the Claude Code workflow that turns &lt;strong&gt;agentic coding&lt;/strong&gt; from a party trick into the thing that clears your queue. For the structured, end-to-end path through all of it, the &lt;a href="https://cursuri-ai.ro/courses/claude-code-mastery-coding-agentic" rel="noopener noreferrer"&gt;Claude Code Mastery course&lt;/a&gt; walks the same terrain in order — just note it's in Romanian. Either way, adopt the habits above and tune them to your own repo.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Prompt Caching with Claude: How We Cut AI API Costs by 90% in Production (2026 Guide)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 01 Jun 2026 09:02:05 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</link>
      <guid>https://dev.to/cursuri-ai/prompt-caching-with-claude-how-we-cut-ai-api-costs-by-90-in-production-2026-guide-35lo</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Anthropic's prompt caching gives you a &lt;strong&gt;90% discount&lt;/strong&gt; on cached input tokens and up to &lt;strong&gt;85% lower latency&lt;/strong&gt; on long-context calls. But the wins only show up if you understand cache breakpoints, TTLs, and what actually invalidates the cache. This guide walks through 5 production patterns we use, real benchmarks, and the pitfalls that silently kill your hit rate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost problem nobody warns you about
&lt;/h2&gt;

&lt;p&gt;When you ship anything serious with Claude — an agent, a RAG system, a code assistant, a customer support bot — you discover the same uncomfortable truth: &lt;strong&gt;your input token bill dwarfs your output bill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A typical agent loop looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt: ~3,000 tokens (instructions, persona, constraints)&lt;/li&gt;
&lt;li&gt;Tool definitions: ~4,000 tokens (JSON schemas for 10–20 tools)&lt;/li&gt;
&lt;li&gt;Conversation history: 5,000–50,000 tokens (grows every turn)&lt;/li&gt;
&lt;li&gt;RAG context: 5,000–20,000 tokens per query&lt;/li&gt;
&lt;li&gt;User message: ~200 tokens&lt;/li&gt;
&lt;li&gt;Model output: ~500 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every single turn, you re-send the same system prompt, the same tool definitions, and most of the conversation history. On Claude Sonnet 4.6 at $3 per million input tokens, a 15,000-token prefix sent across 20 conversation turns costs you &lt;strong&gt;$0.90 per conversation in input alone&lt;/strong&gt; — before you've generated a single useful token of output.&lt;/p&gt;

&lt;p&gt;Multiply that by 10,000 daily active users and you're burning &lt;strong&gt;$9,000/day&lt;/strong&gt; just to re-tokenize content you already sent.&lt;/p&gt;

&lt;p&gt;This is exactly what prompt caching fixes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude's prompt caching actually does
&lt;/h2&gt;

&lt;p&gt;Anthropic's prompt caching lets the API store the internal state for a prefix of your prompt and reuse it on subsequent requests. Two numbers matter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Pricing relative to base input&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache write&lt;/strong&gt; (first time a prefix is seen)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1.25×&lt;/strong&gt; base input cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cache read&lt;/strong&gt; (subsequent hits)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.10×&lt;/strong&gt; base input cost (90% off)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You pay a small one-time premium to write the cache, then every hit after that is 10% of the normal price. The break-even point is &lt;strong&gt;after the second request&lt;/strong&gt; — anything more than one read and you're saving money.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mental model
&lt;/h3&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;prefix tree&lt;/strong&gt; with checkpoints. You mark up to 4 points in your prompt with &lt;code&gt;cache_control&lt;/code&gt;, and Claude caches everything from the start of the prompt up to each breakpoint. On the next request, if the prefix matches &lt;strong&gt;byte-for-byte&lt;/strong&gt;, you get a cache hit.&lt;/p&gt;

&lt;p&gt;The order Claude processes the prompt is fixed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools → system → messages (oldest → newest)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your cache breakpoints must respect that order. You cannot cache a later block without caching everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TTL trap
&lt;/h3&gt;

&lt;p&gt;The default cache TTL is &lt;strong&gt;5 minutes&lt;/strong&gt;, refreshed on every read. A 1-hour TTL is available as a premium option (costs more on write, same on read). Most teams over-pay for the 1-hour cache when 5 minutes would have served them fine — if your traffic is steady, every request refreshes the TTL and the cache effectively lives forever.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to go deeper on Claude's API mechanics in production? Prompt caching, tool use, batch API, streaming, and cost optimization are covered in depth in the &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration course on Cursuri-AI.ro&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 1: Cache the system prompt and tool definitions
&lt;/h2&gt;

&lt;p&gt;This is the highest-ROI change you can make, and most codebases get it wrong on the first try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong&lt;/strong&gt; (no caching):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Right&lt;/strong&gt; (cached):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. [...3000 tokens of instructions...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# ... more tools ...
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache breakpoint on the last tool
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the system block&lt;/strong&gt; caches everything up through the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cache_control&lt;/code&gt; on the last tool&lt;/strong&gt; caches everything through the tool definitions — this is critical because tools are evaluated &lt;em&gt;before&lt;/em&gt; system per the processing order above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Wait — that's actually wrong as stated. Let me correct: because the order is &lt;code&gt;tools → system → messages&lt;/code&gt;, putting &lt;code&gt;cache_control&lt;/code&gt; on the &lt;strong&gt;last tool&lt;/strong&gt; caches just the tools, and putting it on &lt;strong&gt;system&lt;/strong&gt; caches tools + system. You typically only need the system breakpoint; it covers everything before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading the response
&lt;/h3&gt;

&lt;p&gt;The API returns cache stats in &lt;code&gt;response.usage&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_creation_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# tokens written to cache (1.25x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_read_input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# tokens read from cache (0.10x cost)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# uncached tokens (1x cost)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the first request: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is high, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is 0.&lt;br&gt;
On every subsequent request within 5 minutes: &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; is 0, &lt;code&gt;cache_read_input_tokens&lt;/code&gt; is high. That's the win condition.&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 2: Cache conversation history with rolling breakpoints
&lt;/h2&gt;

&lt;p&gt;In a multi-turn agent, the conversation grows on every turn. If you only cache the system prompt, you're still re-sending and re-billing every prior turn at full price.&lt;/p&gt;

&lt;p&gt;The trick is to add a &lt;strong&gt;second cache breakpoint&lt;/strong&gt; on the most recent assistant message, so the entire conversation up to that point is cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_messages_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    history: list of {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: ...}
    new_user_message: str
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Add cache breakpoint on the last historical message
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every new turn reads the entire prior conversation from cache. Cost per turn becomes nearly constant instead of growing linearly with conversation length.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4-breakpoint budget
&lt;/h3&gt;

&lt;p&gt;Claude allows up to &lt;strong&gt;4 cache breakpoints&lt;/strong&gt; per request. A common production layout uses all four:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 1&lt;/strong&gt;: end of tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 2&lt;/strong&gt;: end of system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 3&lt;/strong&gt;: end of "stable" conversation history (turns 1 through N-2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakpoint 4&lt;/strong&gt;: end of "recent" history (turn N-1)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you a layered cache: tools rarely change, system rarely changes, old history never changes, recent history is sliding. Each layer hits or misses independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Cache few-shot examples separately from the user query
&lt;/h2&gt;

&lt;p&gt;Few-shot prompting is one of the highest-leverage techniques in production LLM apps — and one of the most expensive if you don't cache. A typical few-shot block with 5–10 examples can run 8,000–15,000 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Example 1:
Input: ...
Output: ...

Example 2:
Input: ...
Output: ...

[... 8 more examples ...]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a classifier. Categorize support tickets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FEW_SHOT_EXAMPLES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# cache the examples
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_ticket&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical rule: &lt;strong&gt;put the variable content last&lt;/strong&gt;. Cache only works on prefix matches. If your user-specific data is in the middle of the prompt, everything after it becomes uncacheable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: RAG with cached document chunks
&lt;/h2&gt;

&lt;p&gt;RAG systems are notorious for blowing up token bills because the retrieved context is large and unique per query. You can't cache the retrieved chunks themselves (they change), but you &lt;em&gt;can&lt;/em&gt; cache the surrounding framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ~2000 tokens, stable
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For RAG with a stable knowledge base (corporate docs, product manuals, codebases), there's a more advanced pattern: &lt;strong&gt;pre-tile your documents into fixed-size cacheable blocks&lt;/strong&gt; and choose your retrieval strategy to favor returning whole blocks rather than slices. You trade some retrieval precision for massive cost savings on hot documents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you build RAG systems for production, the &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation) course on Cursuri-AI.ro&lt;/a&gt; covers caching strategies, retrieval optimization, hybrid search, and eval pipelines end-to-end.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pattern 5: Cache tool results in long-running agents
&lt;/h2&gt;

&lt;p&gt;Agent loops are caching's sweet spot. An agent runs &lt;code&gt;tool_call → tool_result → tool_call → tool_result&lt;/code&gt; cycles, and each iteration the prompt grows by the new tool result. Without caching, you re-bill the entire history every iteration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add cache breakpoint to the latest message
&lt;/span&gt;        &lt;span class="n"&gt;cached_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}],&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

        &lt;span class="c1"&gt;# Append assistant turn + tool results, loop
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_cache_breakpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a 15-step agent run with a 4,000-token system prompt and 8,000-token tools, this pattern cuts input cost by &lt;strong&gt;~80–88%&lt;/strong&gt; versus uncached.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent loops, tool design, multi-step planning and cost modeling are the focus of the &lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation course on Cursuri-AI.ro&lt;/a&gt; — built around the same Claude Agent SDK patterns shown here.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Real benchmarks: before vs after
&lt;/h2&gt;

&lt;p&gt;These numbers are from a production code-review agent running on Claude Sonnet 4.6, averaged over 1,000 conversations of 12 turns each.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Uncached&lt;/th&gt;
&lt;th&gt;Cached&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg input tokens per turn&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;18,400&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avg billed input cost per turn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0552&lt;/td&gt;
&lt;td&gt;$0.0061&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−89%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg time-to-first-token&lt;/td&gt;
&lt;td&gt;1,840 ms&lt;/td&gt;
&lt;td&gt;380 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg total cost per 12-turn conversation&lt;/td&gt;
&lt;td&gt;$0.66&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate (warm)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;96.3%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency win surprised us as much as the cost win. Cache reads skip the prompt processing phase entirely, which dominates time-to-first-token for long contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pitfalls that silently kill your hit rate
&lt;/h2&gt;

&lt;p&gt;These are mistakes we've made or seen in production code reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Whitespace and formatting drift
&lt;/h3&gt;

&lt;p&gt;Cache hits require &lt;strong&gt;byte-exact prefix matches&lt;/strong&gt;. If your system prompt is built with f-strings and you add a timestamp, conditional newline, or trailing space, you invalidate the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BREAKS the cache every minute
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Current time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Works
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Pass time as a separate user message field if needed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Audit your prompts for hidden variability: locale-formatted numbers, dict iteration order in older Pythons, tool definitions where field order changes between deploys.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Reordering tool definitions
&lt;/h3&gt;

&lt;p&gt;If you generate tool schemas from a dict and the dict iteration order changes between runs, your cache evaporates. &lt;strong&gt;Always sort tool definitions&lt;/strong&gt; before sending:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_tools&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Wrong breakpoint placement
&lt;/h3&gt;

&lt;p&gt;Breakpoints must come &lt;strong&gt;after&lt;/strong&gt; the content you want to cache, not before. The breakpoint marks "cache everything up to here." Putting it on the user message instead of the system prompt is a common rookie mistake.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Caching tiny prefixes
&lt;/h3&gt;

&lt;p&gt;There's a minimum cacheable size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet &amp;amp; Opus&lt;/strong&gt;: 1,024 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt;: 2,048 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below the minimum, the &lt;code&gt;cache_control&lt;/code&gt; is silently ignored — the API doesn't error, it just doesn't cache. Always check &lt;code&gt;response.usage.cache_creation_input_tokens &amp;gt; 0&lt;/code&gt; on your first request to confirm the cache actually wrote.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ignoring the 5-minute TTL on bursty traffic
&lt;/h3&gt;

&lt;p&gt;If your traffic is bursty — heavy during business hours, dead overnight — the 5-minute cache will expire between sessions and you'll pay the write premium every time. For bursty patterns, either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the 1-hour TTL (more expensive write, same read price)&lt;/li&gt;
&lt;li&gt;Or send a small "keep-alive" request every 4 minutes during expected idle windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Mixing cached and uncached models
&lt;/h3&gt;

&lt;p&gt;Cache is &lt;strong&gt;model-specific&lt;/strong&gt;. If your code falls back from Sonnet 4.6 to Haiku 4.5 on rate limit, the Haiku call has no cache history. Either keep fallback paths uncached, or build separate caches per model.&lt;/p&gt;




&lt;h2&gt;
  
  
  When NOT to use prompt caching
&lt;/h2&gt;

&lt;p&gt;Caching has overhead. Skip it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-shot calls with no shared prefix&lt;/strong&gt; — single-request classification, one-off summarization. The 1.25× write premium is pure loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-variability prompts&lt;/strong&gt; — if each request has different boilerplate, you're paying write premium for nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts below the minimum&lt;/strong&gt; — short prompts can't be cached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost is already negligible&lt;/strong&gt; — if you spend $20/month on the API, the engineering time to optimize caching costs more than the savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful heuristic: &lt;strong&gt;if your stable prefix is ≥2,000 tokens AND you make ≥3 requests per 5-minute window with that prefix, cache it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting it together: a production checklist
&lt;/h2&gt;

&lt;p&gt;Before you ship a Claude integration in 2026, run this list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] System prompt has &lt;code&gt;cache_control&lt;/code&gt; set&lt;/li&gt;
&lt;li&gt;[ ] Tool definitions are sorted and stable&lt;/li&gt;
&lt;li&gt;[ ] User-variable content is at the end of the prompt, not in the middle&lt;/li&gt;
&lt;li&gt;[ ] Cache stats (&lt;code&gt;cache_read_input_tokens&lt;/code&gt;) are logged and dashboarded&lt;/li&gt;
&lt;li&gt;[ ] Cache hit rate is monitored — alert if it drops below 80%&lt;/li&gt;
&lt;li&gt;[ ] No timestamps, request IDs, or random data injected into cached blocks&lt;/li&gt;
&lt;li&gt;[ ] First-request cache write is verified in tests&lt;/li&gt;
&lt;li&gt;[ ] Fallback model paths handle cache absence cleanly&lt;/li&gt;
&lt;li&gt;[ ] 5-minute vs 1-hour TTL choice is documented with reasoning&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Prompt caching is the single highest-leverage cost optimization for Claude in production. The mechanics are simple, but the gotchas — formatting drift, reorder bugs, minimum sizes, TTL mismatches — are where teams leave money on the table.&lt;/p&gt;

&lt;p&gt;If you treat caching as a first-class concern from day one, you ship AI features that are 5–10× cheaper to operate than the naive implementation. If you bolt it on later, you spend weeks chasing cache misses through your logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to go deeper
&lt;/h3&gt;

&lt;p&gt;I write about production AI engineering — Claude API, multi-agent systems, RAG, cost optimization — on &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, an interactive learning platform with an always-available AI tutor that walks you through every concept and reviews your code. The four courses most relevant to what's in this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration&lt;/a&gt;&lt;/strong&gt; — Claude API in production: prompt caching, tool use, batch API, streaming, error handling, retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;&lt;/strong&gt; — structured prompting, few-shot patterns, evaluation, prompt versioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents &amp;amp; Automation&lt;/a&gt;&lt;/strong&gt; — agent loops, tool design, multi-agent orchestration, cost modeling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt;&lt;/strong&gt; — retrieval, embeddings, hybrid search, caching, eval pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Course content is delivered in Romanian (the platform's primary audience), but the code, frameworks, and patterns are language-agnostic — the IT Pro track is built specifically for engineers shipping AI in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your cache hit rate in production?&lt;/strong&gt; Drop a comment with your setup — I'm collecting patterns for a follow-up post on &lt;strong&gt;caching at the multi-tenant scale&lt;/strong&gt; (per-customer cache namespaces, cache warm-up strategies, and the cost model when you have 10,000+ concurrent users).&lt;/p&gt;

&lt;p&gt;If this helped, a ❤️ or a 🦄 keeps it visible for other devs hitting the same cost wall. Follow for more deep-dives on Claude in production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic's official prompt caching docs: &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;docs.anthropic.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude API pricing: &lt;a href="https://www.anthropic.com/pricing" rel="noopener noreferrer"&gt;anthropic.com/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full IT Pro AI engineering catalog: &lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro/courses&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI for Influencers in 2026: How to Build a Content Engine That Runs Itself</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 19 May 2026 13:34:41 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</link>
      <guid>https://dev.to/cursuri-ai/ai-for-influencers-in-2026-how-to-build-a-content-engine-that-runs-itself-48h0</guid>
      <description>&lt;p&gt;The influencer economy is no longer about who posts the most. It's about who has built the smartest &lt;strong&gt;AI content system&lt;/strong&gt; behind the scenes.&lt;/p&gt;

&lt;p&gt;In 2026, the top 1% of creators aren't outworking everyone else. They're out-engineering them. They've turned what used to be a 60-hour-a-week grind into a streamlined pipeline where AI handles 80% of the production work — and they keep 100% of the creative direction.&lt;/p&gt;

&lt;p&gt;Over the past two years, working with hundreds of creators and educators through &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — Eastern Europe's leading AI education platform — I've watched this shift happen in real time. The patterns are consistent, the playbook is replicable, and the gap between those who adopt it and those who don't is widening every month.&lt;/p&gt;

&lt;p&gt;This article breaks down exactly how it works, what tools they use, and how you can build the same stack — whether you're an influencer who codes, or a developer building tools for creators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Changed the Influencer Game (Permanently)
&lt;/h2&gt;

&lt;p&gt;Three years ago, an influencer's competitive advantage was personality plus consistency. Today, that's table stakes.&lt;/p&gt;

&lt;p&gt;The real moat now is &lt;strong&gt;operational leverage&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How fast can you identify a trending topic?&lt;/li&gt;
&lt;li&gt;How quickly can you produce content across 5+ formats?&lt;/li&gt;
&lt;li&gt;How precisely can you target each piece to its platform?&lt;/li&gt;
&lt;li&gt;How much of this can run without your direct involvement?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators who answered "all of it, mostly automated" are the ones scaling past 1M followers, 7-figure revenues, and 50+ pieces of content per week — solo or with tiny teams.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. It's already happening. The question is whether you're building the system or watching others build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5-Layer AI Stack for Modern Influencers
&lt;/h2&gt;

&lt;p&gt;Every high-output creator I've analyzed runs some version of this five-layer architecture. The tools change. The structure doesn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Intelligence (Research &amp;amp; Trend Detection)
&lt;/h3&gt;

&lt;p&gt;Before you create, you need to know what to create.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors trending topics, keywords, and conversations in your niche&lt;/li&gt;
&lt;li&gt;Analyzes competitor content performance&lt;/li&gt;
&lt;li&gt;Identifies content gaps and opportunities&lt;/li&gt;
&lt;li&gt;Surfaces audience questions before they become saturated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools and APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity API&lt;/strong&gt; — for real-time research with citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa AI&lt;/strong&gt; — semantic search for niche topics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Trends API&lt;/strong&gt; + &lt;strong&gt;YouTube Data API&lt;/strong&gt; — for trend signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit API&lt;/strong&gt; + &lt;strong&gt;Twitter/X API&lt;/strong&gt; — for audience listening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BuzzSumo&lt;/strong&gt; or &lt;strong&gt;SparkToro&lt;/strong&gt; — for content gap analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Don't just track what's popular. Track what's &lt;em&gt;about to&lt;/em&gt; become popular by monitoring signal velocity (rate of change), not absolute volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Ideation (Concept &amp;amp; Angle Generation)
&lt;/h3&gt;

&lt;p&gt;This is where most creators waste the most time — staring at a blank page deciding what to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI does well here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates 30+ angle variations from a single topic&lt;/li&gt;
&lt;li&gt;Adapts ideas to your specific voice and audience&lt;/li&gt;
&lt;li&gt;Identifies counterintuitive takes that drive engagement&lt;/li&gt;
&lt;li&gt;Maps ideas to platform-specific formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended approach:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a custom GPT or Claude project trained on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your past top-performing content (with metrics)&lt;/li&gt;
&lt;li&gt;Your audience persona and voice guidelines&lt;/li&gt;
&lt;li&gt;Your content pillars and forbidden topics&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If you've never structured a voice profile before, this is one of the highest-leverage skills you can develop. We dedicate an entire module to it inside &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI for Content Creators track on Cursuri-AI.ro&lt;/a&gt; — including the exact prompts and templates we use internally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then prompt it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a content strategist for an influencer with this profile:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Generate angles that are specific, counterintuitive, and aligned with their voice.
        Avoid generic takes. Each angle should be testable as a hook.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; distinct angles for content about: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;angles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_content_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;building a personal brand in 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;voice_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, data-driven, contrarian, B2B-focused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angles&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this single function call can fuel a month of content. Cost: ~$0.15.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Production (Multi-Format Content Generation)
&lt;/h3&gt;

&lt;p&gt;This is the heaviest-lifting layer — and where AI compounds value most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The repurposing principle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One "pillar" piece (a long-form video, podcast, or article) should generate 10–15 derivative pieces with minimal manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample workflow for a 30-minute podcast episode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transcription&lt;/strong&gt; → Whisper API or AssemblyAI ($0.36 for 30 min)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-form blog post&lt;/strong&gt; → Claude/GPT generates structured article from transcript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn carousel&lt;/strong&gt; → 8–10 slide deck with key insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twitter/X thread&lt;/strong&gt; → 10-tweet thread with the strongest takes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-form clips&lt;/strong&gt; → Opus Clip or Riverside AI extracts viral moments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Newsletter&lt;/strong&gt; → Personalized summary with commentary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Shorts&lt;/strong&gt; → Auto-captioned vertical clips&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quote graphics&lt;/strong&gt; → Designed via Canva API or Bannerbear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instagram Reels&lt;/strong&gt; → Repurposed clips with platform-native captions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO blog series&lt;/strong&gt; → 3–5 articles targeting specific search queries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total human time: 1–2 hours of review and approval, instead of 30+ hours of production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Distribution (Platform-Native Publishing)
&lt;/h3&gt;

&lt;p&gt;Most creators lose performance here by posting the same content identically across platforms. AI fixes this by adapting each piece to the platform's native expectations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive distribution looks like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn → Professional tone, longer-form, hook in first 2 lines&lt;/li&gt;
&lt;li&gt;Twitter/X → Punchy, opinionated, thread-friendly&lt;/li&gt;
&lt;li&gt;Instagram → Visual-first, emotion-driven captions&lt;/li&gt;
&lt;li&gt;TikTok → Hook in 1 second, vertical, trend-aware&lt;/li&gt;
&lt;li&gt;YouTube → SEO-optimized titles, timestamps, structured descriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Buffer&lt;/strong&gt;, &lt;strong&gt;Hypefury&lt;/strong&gt;, or &lt;strong&gt;Typefully&lt;/strong&gt; — scheduling with AI optimization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make&lt;/strong&gt; or &lt;strong&gt;n8n&lt;/strong&gt; — custom automation workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postiz&lt;/strong&gt; (open source) — self-hosted social scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 5: Optimization (Performance Feedback Loop)
&lt;/h3&gt;

&lt;p&gt;This is the layer most creators skip — and it's the one that compounds the hardest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hook performance (which first lines drive scroll-stops?)&lt;/li&gt;
&lt;li&gt;Format performance (which content types convert best per platform?)&lt;/li&gt;
&lt;li&gt;Topic performance (which themes consistently win?)&lt;/li&gt;
&lt;li&gt;Audience signals (which content brings in your ICP vs. tourists?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How AI helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes patterns across hundreds of posts in seconds&lt;/li&gt;
&lt;li&gt;Identifies non-obvious performance correlations&lt;/li&gt;
&lt;li&gt;Suggests next-week content based on last week's winners&lt;/li&gt;
&lt;li&gt;Drafts variations of top performers for retesting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build a simple dashboard that ingests your analytics from each platform and feeds it back to your ideation layer. This closes the loop — every post makes the next one smarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimal Working Example: Content Repurposing Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's a stripped-down Python pipeline that takes a transcript and produces three platform-adapted outputs. Useful as a starting point you can extend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate LinkedIn post, Twitter thread, and newsletter from a transcript.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an expert content strategist. The creator&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s voice is: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    From the transcript below, produce THREE outputs in JSON:
    1. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 200-word LinkedIn post with strong hook
    2. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 8-tweet thread (array of strings, max 280 chars each)
    3. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: A 400-word personal newsletter section

    Each must feel platform-native, not copy-pasted.

    Transcript:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Return only valid JSON.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sample_transcript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;[Your podcast/video transcript here]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;voice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct, contrarian, B2B-focused, data-driven&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;repurpose_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== LINKEDIN ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== TWITTER THREAD ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== NEWSLETTER ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newsletter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extend this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whisper for audio-to-text input&lt;/li&gt;
&lt;li&gt;A queue system (Redis + Celery) for batch processing&lt;/li&gt;
&lt;li&gt;A simple Streamlit UI for non-technical creator team members&lt;/li&gt;
&lt;li&gt;Webhook integration with Buffer or Typefully for direct publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The 5 Mistakes That Kill AI Content Pipelines
&lt;/h2&gt;

&lt;p&gt;I've audited dozens of creator AI workflows. The same mistakes appear over and over.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Treating AI as a Writer Instead of a Drafter
&lt;/h3&gt;

&lt;p&gt;AI-generated text published without human editing is detectable, generic, and erodes trust. Use AI for the first 80%, but always edit the final 20% — that's where your voice lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Skipping the Voice Calibration Step
&lt;/h3&gt;

&lt;p&gt;Without a documented voice profile (tone, vocabulary, forbidden phrases, examples), every output regresses to the mean. Spend 4 hours documenting your voice once. It pays back for years. If you want a structured framework for this, we walk through the full process in &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;our AI workflow courses&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Building Without Measurement
&lt;/h3&gt;

&lt;p&gt;Pipelines without analytics are vibes-based content factories. If you can't tell which output formats win, you're optimizing blind.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Over-Automating Distribution
&lt;/h3&gt;

&lt;p&gt;Full automation of posting (no human in the loop) is how creators end up with embarrassing posts going live during global news events. Keep a 1-click approval step at minimum.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Choosing Tools Over Architecture
&lt;/h3&gt;

&lt;p&gt;The creators who win don't have the best tools. They have the clearest workflow. Tools change every quarter. Architecture compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming Next (2026–2027)
&lt;/h2&gt;

&lt;p&gt;A few signals worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personalized AI clones&lt;/strong&gt; — creators training models on their voice/likeness to scale 1:1 audience interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal generation at scale&lt;/strong&gt; — single prompts producing full video, audio, and graphics in one pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-native platforms&lt;/strong&gt; — new social networks built around AI-generated content as a first-class citizen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-driven content ops&lt;/strong&gt; — autonomous agents that research, produce, schedule, and optimize with minimal human input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The creators preparing for this now — by building modular, API-driven systems — will be the ones operating at unprecedented scale by 2027.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: AI for Influencers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Do I need to code to use AI as an influencer?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Many top creators use no-code tools (Zapier, Make, ChatGPT, Claude Projects). But knowing even basic Python unlocks 10x more customization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Will AI-generated content hurt my reach?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Only if it sounds generic. Platforms penalize low-effort content, not AI assistance. Original voice + AI scaffolding consistently outperforms 100% human or 100% AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How much should I budget for AI tools?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A solo creator can build a complete stack for $50–150/month. Larger operations run $500–2000/month. ROI is usually measured in weeks, not months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is this ethical? Should I disclose AI usage?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Be transparent about &lt;em&gt;what&lt;/em&gt; AI does in your workflow (research, drafting, editing), but you don't need to flag every AI-touched word. The standard: would your audience feel deceived if they saw your process? If no, you're fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Which AI model should I use as a creator?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For creative content: Claude tends to lead. For research with citations: Perplexity. For images: Midjourney or Flux. For video: Runway or Sora. Test all of them — they each have strengths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Build the System, Not the Output
&lt;/h2&gt;

&lt;p&gt;The influencer economy is splitting into two clear tiers.&lt;/p&gt;

&lt;p&gt;The first tier still manually crafts every piece of content. They post when they have time. They burn out. They plateau.&lt;/p&gt;

&lt;p&gt;The second tier has built systems. AI handles the heavy lifting. They post consistently across every platform. Their content compounds because their architecture compounds.&lt;/p&gt;

&lt;p&gt;The gap between these two tiers is widening every month. And by 2027, it will be unbridgeable for those who waited too long to start.&lt;/p&gt;

&lt;p&gt;The good news: building your AI content engine doesn't require a team or a six-figure budget. It requires clear thinking, a few APIs, and the willingness to treat content like the engineering problem it actually is.&lt;/p&gt;

&lt;p&gt;Start with one layer. Make it work. Add the next.&lt;/p&gt;

&lt;p&gt;That's how the top 1% built it. And it's how you build it too.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want to Go Deeper?
&lt;/h2&gt;

&lt;p&gt;If this resonated and you want a structured path instead of piecing it together from scattered blog posts and YouTube videos:&lt;/p&gt;

&lt;p&gt;🎓 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — Our complete AI education platform covers the entire creator stack: prompting, automation, content pipelines, AI workflows for business, and how to build production-grade AI systems. Interactive courses with an AI tutor that adapts to how you learn — not passive video watching.&lt;/p&gt;

&lt;p&gt;Whether you're a creator looking to scale, a developer building tools for the creator economy, or a business owner figuring out how to integrate AI into your operations — &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;start here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm the founder of &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, where I help thousands of creators, professionals, and businesses build with AI. I write about AI workflows, content automation, and the engineering side of the creator economy.&lt;/p&gt;

&lt;p&gt;If this article helped, drop a reaction and follow for more deep dives. &lt;strong&gt;What layer of your content stack are you working on right now?&lt;/strong&gt; Let me know in the comments — I read every one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contentcreation</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>7 Production Patterns for AI Agents That Don't Break in 2026</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Wed, 13 May 2026 11:38:37 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</link>
      <guid>https://dev.to/cursuri-ai/7-production-patterns-for-ai-agents-that-dont-break-in-2026-5g83</guid>
      <description>&lt;p&gt;A demo agent that loops three times, calls one tool, and returns "Hello, I helped you" is easy. A production agent that handles 10k requests a day across paying customers, without lighting your API bill on fire or hallucinating tool arguments at 3am, is a different animal.&lt;/p&gt;

&lt;p&gt;I've shipped AI agents in production for the last 18 months — search, content generation, support triage, document analysis. The same seven patterns keep showing up in every codebase that &lt;em&gt;actually&lt;/em&gt; works. None of them are exotic. Most of them are boring. That's the point: production agents are boring on purpose.&lt;/p&gt;

&lt;p&gt;Here are the patterns, with Python examples you can drop into your own loop today.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Tool Result Validator
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; LLMs hallucinate tool arguments. They will confidently call &lt;code&gt;send_email(to="user@example.com", subject="Refund", body="...")&lt;/code&gt; when the user never asked for an email. They will pass &lt;code&gt;user_id="123abc"&lt;/code&gt; to a function that requires an integer. They will invent product SKUs that don't exist.&lt;/p&gt;

&lt;p&gt;If your tool layer trusts the model's output, every hallucination becomes a production incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Validate tool arguments at the &lt;em&gt;tool boundary&lt;/em&gt;, not inside the tool. Reject early with a structured error the model can recover from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SendEmailArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_SCHEMAS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid_arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool call rejected. Fix these fields: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_user_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_confirmation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Always return the validation error &lt;em&gt;back to the model&lt;/em&gt; as a tool result. Don't raise it. The agent can usually self-correct in the next turn — but only if it sees the error.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bounded Memory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Naive agent loops accumulate every tool call, every observation, every reasoning step into the conversation history. After 15 turns, you're sending 80k tokens per request. Your latency doubles. Your cost goes up 10x. The model starts losing track of what it was doing because the relevant context is buried under five tool dumps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat conversation history as a finite resource. Compress aggressively, summarize old turns, and keep tool outputs out of the main thread when you can.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summarize_at&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Keep system message + last 4 turns verbatim
&lt;/span&gt;        &lt;span class="n"&gt;keep_recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;to_summarize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_with_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_summarize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;earlier_context&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/earlier_context&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;keep_recent&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't summarize tool &lt;em&gt;call&lt;/em&gt; messages — the model needs the exact arguments to chain reasoning. Summarize only the &lt;em&gt;observations&lt;/em&gt;, and only when they're old enough that detail no longer matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Observable Loop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent is in production. A user complains it gave them garbage. You have... a final string output and a vague memory of what the loop does. Good luck debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Emit a structured event for every state transition in the loop. Every model call, every tool call, every retry, every error. Ship them to whatever observability stack you already use (Datadog, Honeycomb, OpenTelemetry, even just structured JSON to stdout).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;

&lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;span_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step.end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                  &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max turns exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Include a stable &lt;code&gt;run_id&lt;/code&gt; on &lt;em&gt;every&lt;/em&gt; event. When a customer reports an issue, you want one query that returns the entire trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent depends on three external services and a vector store. One of them is having a bad day. Your agent now returns a 500 to the user, even though for &lt;em&gt;this particular query&lt;/em&gt; the broken dependency wasn't actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Wrap dependencies in fallback chains. If the primary fails, the agent should know that capability is degraded — not crash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;implementations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;impl&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;impl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
                &lt;span class="nf"&gt;log_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.fallback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;degraded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is unavailable. Try a different approach.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crucial bit is the &lt;code&gt;degraded&lt;/code&gt; response — it goes back to the model as a tool result, and a well-prompted agent will re-plan. Maybe it tries a different tool. Maybe it tells the user "I can't check live inventory right now, but here's what I know." Either is better than a 500.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Surface the degraded status in your prompt. A line like &lt;em&gt;"If a tool returns status=degraded, do not retry it. Acknowledge the limitation in your final response."&lt;/em&gt; prevents the model from looping on a dead service.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The Cost Circuit Breaker
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; A bug or an adversarial input puts your agent in a tool-calling loop. By the time you notice, you've spent $400 in 20 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Track cumulative cost per run and per session. Hard-stop when limits are exceeded. This is not optional in production — it's the difference between a bad day and a layoff conversation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.00&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_run&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_usd_per_user_per_day&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run exceeded $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_run&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spent_today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_today&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_day&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; exceeded daily budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Different limits for different surfaces. An internal batch job can have a $5 ceiling per run. A free-tier chat user gets $0.10. A paying enterprise customer gets $2. Hardcoding one number is a footgun.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Deterministic Critic
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "LLM-as-a-judge" sounds clever, but using a model to grade itself is unreliable and slow. Two model calls per output, both hallucinate, both cost money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; For checks you can express as code, &lt;em&gt;use code&lt;/em&gt;. Reserve LLM grading for genuinely subjective dimensions, and only after the deterministic checks pass.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_cite_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\[\d+\]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;too_long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;BANNED_PHRASES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;banned_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_keywords:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;llm_grade&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subjective_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deterministic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the critic rejects, feed the issues back to the agent as a "revise this" instruction. After two rejections, return whatever you have with a flag — infinite revision loops are their own bug class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't make the critic too strict. If your accept rate is below 70%, your prompt is broken, not your output.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Stateless Replay (Idempotency)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent half-completed a task — it sent the email, then crashed before logging the result. The user retries. Now they get two emails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Treat every external side-effect as idempotent by design. Use deterministic IDs derived from the input, dedupe at the tool layer, and make agent runs &lt;em&gt;replayable&lt;/em&gt; from any saved checkpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;canonical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if the agent retries the same step within the run, it gets the cached result. If you persist the cache across runs (with a longer TTL), you get cross-run idempotency too — which is what you want for anything that costs money or sends messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Be careful what you put in the idempotency key. Timestamps, request IDs, or random nonces in the args will defeat it. Strip them before hashing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;A production agent loop using all seven patterns is roughly 200 lines of Python. Not glamorous, but it survives. Here's the skeleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_production&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CostBudget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;precheck_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BoundedMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OutputCritic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TURNS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;task_context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revise: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task incomplete after max turns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the loop. Drop in your favorite model API (Claude, GPT, open source — patterns work the same), wire up your tools with the validator from pattern 1, and you have something that won't embarrass you in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Read Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's "Building effective agents" guide&lt;/a&gt; — the canonical reference on when to use agents vs simple workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;OpenAI's Agents SDK docs&lt;/a&gt; — clean reference implementation of multi-agent handoffs.&lt;/li&gt;
&lt;li&gt;For Romanian-speaking developers building agents in production, the &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;AI Agents course on Cursuri-AI.ro&lt;/a&gt; goes deeper on these patterns with hands-on exercises.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've shipped agents in production, what patterns did I miss? Drop them in the comments — I'll add the best ones to a follow-up post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by a developer who has paged themselves at 3am because an agent went into a tool-calling loop. Don't be that developer. Use the circuit breaker.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Fine-Tuning LLMs in 2026: A Practical Guide for Engineers (LoRA, QLoRA, DPO, GRPO)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Fri, 01 May 2026 20:31:02 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/fine-tuning-llms-in-2026-a-practical-guide-for-engineers-lora-qlora-dpo-grpo-jjo</link>
      <guid>https://dev.to/cursuri-ai/fine-tuning-llms-in-2026-a-practical-guide-for-engineers-lora-qlora-dpo-grpo-jjo</guid>
      <description>&lt;p&gt;Fine-tuning has gone from "research lab toy" to a &lt;strong&gt;first-class production technique&lt;/strong&gt; for AI engineers. With LoRA-class adapters, modern alignment algorithms (DPO, GRPO, RLVR), and serving stacks like vLLM, you can ship a custom model on a single H100 — sometimes on a single 4090.&lt;/p&gt;

&lt;p&gt;But the question isn't &lt;em&gt;can&lt;/em&gt; you fine-tune. It's: &lt;strong&gt;should you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This guide is the engineering checklist I wish I'd had two years ago. It covers the decision tree, the modern toolchain, the gotchas, and the EU compliance constraints you can't ignore in 2026.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🇪🇺 Romanian / EU readers: the full hands-on Romanian-language program is at &lt;a href="https://cursuri-ai.ro/courses/fine-tuning-modele-ai" rel="noopener noreferrer"&gt;Fine-Tuning și Adaptarea Modelelor AI — Enterprise Edition&lt;/a&gt;. It includes a complete end-to-end project, EU AI Act governance, and FinOps modeling.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't fine-tune first.&lt;/strong&gt; Try prompting → RAG → fine-tuning. In that order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LoRA / QLoRA&lt;/strong&gt; is the default in 2026. Full fine-tuning is rarely the right call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alignment ≠ SFT.&lt;/strong&gt; SFT teaches &lt;em&gt;format&lt;/em&gt;; DPO/GRPO/RLVR teach &lt;em&gt;preferences and reasoning&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation is the hard part.&lt;/strong&gt; Loss curves don't tell you if the model is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving matters.&lt;/strong&gt; A great fine-tune served badly is just an expensive demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act applies.&lt;/strong&gt; Document your data, your evals, and your model card.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. When fine-tuning is actually the right tool
&lt;/h2&gt;

&lt;p&gt;Most teams reach for fine-tuning too early. Here's the honest decision tree:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;First try&lt;/th&gt;
&lt;th&gt;Fine-tune only if&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inconsistent output format&lt;/td&gt;
&lt;td&gt;Prompting + structured outputs&lt;/td&gt;
&lt;td&gt;Format breaks &amp;gt; 5% even with strict prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge cutoff / private data&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;RAG retrieves the right chunks but the model still misuses them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain-specific style/voice&lt;/td&gt;
&lt;td&gt;System prompt + few-shot&lt;/td&gt;
&lt;td&gt;You need it baked in across thousands of calls (latency/cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized reasoning (math, code, legal)&lt;/td&gt;
&lt;td&gt;Better base model + CoT&lt;/td&gt;
&lt;td&gt;You have a clean preference dataset and need stable behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool use / agents&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; + good prompts&lt;/td&gt;
&lt;td&gt;Tool-call accuracy is below your SLA after prompt iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; if you can't articulate &lt;em&gt;what your fine-tune teaches that a 200-line system prompt can't&lt;/em&gt;, you're not ready to fine-tune.&lt;/p&gt;

&lt;p&gt;If you're earlier in the journey, the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; and &lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Advanced LLM Integration&lt;/a&gt; cover the cheaper alternatives in depth.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The 2026 technique landscape
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Full fine-tuning
&lt;/h3&gt;

&lt;p&gt;Updates every parameter. Maximum capacity, maximum cost, maximum risk of catastrophic forgetting. Justified for: foundational training, large domain shifts, or when you own the inference path and the dataset is huge (&amp;gt;1M high-quality examples).&lt;/p&gt;

&lt;h3&gt;
  
  
  LoRA (Low-Rank Adaptation)
&lt;/h3&gt;

&lt;p&gt;The original &lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;LoRA paper (Hu et al., 2021)&lt;/a&gt; is still required reading. You freeze the base weights and train two small low-rank matrices &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; per attention layer. Typical adapter is 0.1–1% of the model's parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                       &lt;span class="c1"&gt;# rank
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# scaling
&lt;/span&gt;    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# trainable params: 8.4M || all params: 7.2B || trainable%: 0.12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  QLoRA
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2305.14314" rel="noopener noreferrer"&gt;QLoRA (Dettmers et al., 2023)&lt;/a&gt; loads the base model in 4-bit (NF4) and trains LoRA adapters on top. This is what lets you fine-tune a 70B model on a single 80GB GPU. Use &lt;code&gt;bitsandbytes&lt;/code&gt; + &lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;HuggingFace PEFT&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  DoRA, OLoRA, rsLoRA
&lt;/h3&gt;

&lt;p&gt;Newer variants that decouple magnitude/direction (DoRA), use orthogonal init (OLoRA), or rescale rank (rsLoRA). Marginal gains in most cases — start with vanilla LoRA, only switch if you've measured a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Alignment: SFT is just step one
&lt;/h2&gt;

&lt;p&gt;Supervised Fine-Tuning (SFT) teaches the model &lt;em&gt;what good output looks like&lt;/em&gt;. It does &lt;strong&gt;not&lt;/strong&gt; teach preferences, refusals, or reasoning quality. That's what alignment is for.&lt;/p&gt;

&lt;h3&gt;
  
  
  DPO (Direct Preference Optimization)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2305.18290" rel="noopener noreferrer"&gt;DPO (Rafailov et al., 2023)&lt;/a&gt; replaces the RLHF pipeline (reward model + PPO) with a single classification-style loss on preference pairs. Simpler, more stable, and the de facto default in 2026.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DPOTrainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DPOConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DPOConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                   &lt;span class="c1"&gt;# KL regularization
&lt;/span&gt;    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5e-7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DPOTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sft_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# PEFT auto-handles reference
&lt;/span&gt;    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;preference_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GRPO and RLVR
&lt;/h3&gt;

&lt;p&gt;GRPO (Group Relative Policy Optimization, popularized by DeepSeek-R1) and RLVR (RL with Verifiable Rewards) are the techniques behind the reasoning-model wave. If you're training for math, code, or anything with a programmatic verifier — these matter.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://huggingface.co/docs/trl" rel="noopener noreferrer"&gt;HuggingFace TRL library&lt;/a&gt; now ships first-class support for SFT, DPO, GRPO, and KTO.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The data pipeline is the moat
&lt;/h2&gt;

&lt;p&gt;A bad dataset will defeat a perfect training loop every time. Things that actually move metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Diversity over volume.&lt;/strong&gt; 5K diverse examples beats 50K near-duplicates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard negatives.&lt;/strong&gt; For preference data, pairs where chosen and rejected are &lt;em&gt;almost equally good&lt;/em&gt; teach more than obvious wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decontamination.&lt;/strong&gt; Strip eval-set leakage from training data. &lt;em&gt;Always.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format consistency.&lt;/strong&gt; Tokenize early to catch chat-template mismatches before you waste 10 GPU-hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PII and licensing.&lt;/strong&gt; This is where the EU AI Act lives. Document provenance.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  5. The 2026 tooling stack
&lt;/h2&gt;

&lt;p&gt;Here's what a production-grade fine-tuning project looks like today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Training framework&lt;/td&gt;
&lt;td&gt;&lt;a href="https://huggingface.co/docs/trl" rel="noopener noreferrer"&gt;HuggingFace TRL&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adapters&lt;/td&gt;
&lt;td&gt;&lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;HuggingFace PEFT&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bitsandbytes&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distributed&lt;/td&gt;
&lt;td&gt;Accelerate / DeepSpeed ZeRO-3 / FSDP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiment tracking&lt;/td&gt;
&lt;td&gt;Weights &amp;amp; Biases or MLflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serving&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval harness&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lm-evaluation-harness&lt;/code&gt; + custom domain evals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closed-source baseline&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/fine-tuning" rel="noopener noreferrer"&gt;OpenAI fine-tuning&lt;/a&gt; for comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wiring all of this into a real CI/CD lifecycle is what separates a notebook experiment from a deployable system. That's the focus of &lt;a href="https://cursuri-ai.ro/courses/mlops-prototip-productie" rel="noopener noreferrer"&gt;MLOps: Prototype to Production&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Evaluation: where most projects quietly fail
&lt;/h2&gt;

&lt;p&gt;Loss curves go down. The model "feels better." You ship. Production complaints spike. Sound familiar?&lt;/p&gt;

&lt;p&gt;Build a &lt;strong&gt;holistic eval suite&lt;/strong&gt; before you start training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capability evals&lt;/strong&gt; — domain-specific tasks scored by rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regression evals&lt;/strong&gt; — verify the model didn't lose abilities (catastrophic forgetting is real).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety evals&lt;/strong&gt; — refusals, jailbreak resistance, policy adherence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-as-judge&lt;/strong&gt; — useful, but bias-corrected with human spot-checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost &amp;amp; latency&lt;/strong&gt; — TTFT, throughput, p95 — these &lt;em&gt;are&lt;/em&gt; product metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your eval suite isn't version-controlled and reproducible, you don't have an eval suite. You have vibes.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Serving: the part nobody talks about until it breaks
&lt;/h2&gt;

&lt;p&gt;LoRA adapters can be &lt;strong&gt;hot-swapped&lt;/strong&gt; at inference time. vLLM, SGLang, and TensorRT-LLM all support multi-LoRA serving — meaning you can host one base model and dozens of fine-tuned adapters with near-zero overhead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# vLLM with LoRA adapters&lt;/span&gt;
vllm serve meta-llama/Llama-3.1-8B-Instruct &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-lora&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lora-modules&lt;/span&gt; legal-adapter&lt;span class="o"&gt;=&lt;/span&gt;./adapters/legal sales-adapter&lt;span class="o"&gt;=&lt;/span&gt;./adapters/sales &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-loras&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the architectural unlock that makes fine-tuning economically viable for SaaS multi-tenancy.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. EU AI Act: not optional in 2026
&lt;/h2&gt;

&lt;p&gt;If you're shipping in the EU, fine-tuning a foundation model can put you in the &lt;em&gt;deployer&lt;/em&gt; or &lt;em&gt;provider&lt;/em&gt; category under the &lt;a href="https://artificialintelligenceact.eu/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt;. Practical consequences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model card&lt;/strong&gt; documenting training data, intended use, limitations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; if the use case touches Annex III (HR, education, critical infrastructure, law enforcement, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging&lt;/strong&gt; of significant model updates and eval results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency obligations&lt;/strong&gt; to end users for AI-generated content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't lawyer paranoia — auditors are already asking. Bake it into your pipeline from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. The mistakes I see most often
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning before exhausting prompting and RAG.&lt;/strong&gt; Cheaper, faster, easier to roll back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using &lt;code&gt;r=64&lt;/code&gt; because "bigger is better".&lt;/strong&gt; Most tasks saturate at &lt;code&gt;r=8&lt;/code&gt; to &lt;code&gt;r=16&lt;/code&gt;. Measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mismatched chat template&lt;/strong&gt; between training and inference. Silent quality killer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training on the eval set.&lt;/strong&gt; Decontaminate. Then decontaminate again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping the SFT-only baseline.&lt;/strong&gt; You can't claim DPO helped if you didn't measure SFT-only first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring catastrophic forgetting.&lt;/strong&gt; Always run a regression eval against the base model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting the FinOps math.&lt;/strong&gt; A $400 fine-tune that adds $0.002/request to inference is not a win at 1M requests/day.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;p&gt;If you want a structured path that goes from prompt engineering to deploying fine-tuned models in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Foundation:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/introducere-ai-engineering" rel="noopener noreferrer"&gt;Introduction to AI Engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before fine-tuning:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; → &lt;a href="https://cursuri-ai.ro/courses/rag-retrieval-augmented-generation" rel="noopener noreferrer"&gt;RAG: Retrieval-Augmented Generation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The full deep dive:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/fine-tuning-modele-ai" rel="noopener noreferrer"&gt;Fine-Tuning and Model Adaptation — Enterprise Edition&lt;/a&gt; (LoRA/QLoRA/DoRA, DPO/GRPO/RLVR, vLLM serving, EU AI Act, end-to-end project)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productionization:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/mlops-prototip-productie" rel="noopener noreferrer"&gt;MLOps: Prototype to Production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration layer:&lt;/strong&gt; &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP — Model Context Protocol&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browse the full IT engineering track at &lt;a href="https://cursuri-ai.ro/cursuri/it" rel="noopener noreferrer"&gt;cursuri-ai.ro/cursuri/it&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Fine-tuning in 2026 is no longer about &lt;em&gt;can the model learn the task&lt;/em&gt;. It's about &lt;strong&gt;whether your dataset, eval suite, serving stack, and governance process are good enough to deserve a custom model&lt;/strong&gt;. Get those right, and a single adapter can be the difference between a feature that costs you money and a feature that defines your product.&lt;/p&gt;

&lt;p&gt;If this resonated, I'd love to hear what fine-tuning problem you're actually stuck on — drop it in the comments. 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — the AI engineering education platform for Romanian and EU professionals.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Claude Opus 4.7 vs GPT-5.5: A Developer's Pragmatic Comparison Guide (2026)</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:03:06 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/claude-opus-47-vs-gpt-55-a-developers-pragmatic-comparison-guide-2026-11jb</link>
      <guid>https://dev.to/cursuri-ai/claude-opus-47-vs-gpt-55-a-developers-pragmatic-comparison-guide-2026-11jb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — In 2026, choosing an LLM is no longer about picking "the best model." It's about understanding which model solves &lt;em&gt;your specific problem&lt;/em&gt; at the lowest total cost and risk. Claude Opus 4.7 brings a 1M token context window and exceptional reasoning. GPT-5.5 brings ecosystem maturity and multimodal strength. The right answer for production is almost always &lt;strong&gt;multi-model orchestration&lt;/strong&gt;, not allegiance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're a backend engineer, ML engineer, or solutions architect choosing a foundation model in 2026, this guide is for you. No marketing fluff. Just patterns I've validated on real projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Quick Note on Honesty
&lt;/h2&gt;

&lt;p&gt;Before we go further: &lt;strong&gt;I'm not going to fabricate specs.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; is verified to ship with a &lt;strong&gt;1M token context window&lt;/strong&gt; (Anthropic's official spec).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; remains in active production as the cost-efficient predecessor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt; is OpenAI's current flagship at the time of writing. For exact context window, pricing, and benchmark numbers, &lt;strong&gt;always check OpenAI's official documentation&lt;/strong&gt; — those numbers shift between point releases, and any blog quoting them risks being stale within a month.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article focuses on &lt;strong&gt;architectural and methodological differences&lt;/strong&gt; that age well, not spec-sheet trivia that doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Comparison Matters Differently in 2026
&lt;/h2&gt;

&lt;p&gt;Three years ago, picking a model meant running it through a weekend benchmark and shipping. Today, the calculus has changed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context windows have stopped being a bottleneck.&lt;/strong&gt; With Opus 4.7's 1M token window, the question is no longer "can I fit my codebase?" — it's "should I, given attention dynamics and cost?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Cost of Ownership has become non-trivial.&lt;/strong&gt; API price-per-token is maybe 30% of what you actually pay in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory pressure is real.&lt;/strong&gt; The EU AI Act and GDPR are no longer theoretical — they shape architecture decisions for any team with European users.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Engineers who still treat model selection as a 2-hour decision are leaving serious money and reliability on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architectural Differences That Actually Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context Window
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Practical Implication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;td&gt;Full enterprise codebases, long-form legal docs, multi-document RAG without chunking compromises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;(See Anthropic docs)&lt;/td&gt;
&lt;td&gt;Cost-optimized workhorse for everyday agentic workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;(See OpenAI docs)&lt;/td&gt;
&lt;td&gt;Tight integration with Azure OpenAI, mature tooling ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The 1M context window is not just bigger — it changes architectural patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you have a million tokens, you stop building chunked RAG pipelines for many use cases. You stop fighting context truncation. You can pass a full repo, a full deposition, a full quarterly filing — and ask the model to reason over it directly.&lt;/p&gt;

&lt;p&gt;But this comes with a real trade-off: &lt;strong&gt;attention quality degrades unevenly across very long contexts.&lt;/strong&gt; Just because you &lt;em&gt;can&lt;/em&gt; stuff 800K tokens in doesn't mean the model will reliably find the needle. Always run targeted &lt;strong&gt;needle-in-haystack&lt;/strong&gt; evals on &lt;em&gt;your&lt;/em&gt; data structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Style
&lt;/h3&gt;

&lt;p&gt;This is hard to quantify but easy to feel after enough projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; tends to reason more conservatively. It pushes back on ambiguity, asks clarifying questions, and produces structured outputs that hold up well under JSON schema validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt; tends to be more proactive and creative. It will often produce a complete answer where Claude would ask "did you mean X or Y?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither is universally better. Conservative reasoning saves you from hallucinated database queries in production. Proactive reasoning ships features faster in a hackathon.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Use &amp;amp; Agentic Workflows
&lt;/h3&gt;

&lt;p&gt;Both models support function calling and agentic loops. In my experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude's tool use feels more deterministic. JSON schemas hold. Parallel tool calls behave predictably.&lt;/li&gt;
&lt;li&gt;GPT's tool use has a more mature ecosystem (Assistants API, more SDK examples, broader community).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building a &lt;strong&gt;pure agent system&lt;/strong&gt;, both work. If you're integrating into an existing &lt;strong&gt;Azure / Microsoft stack&lt;/strong&gt;, GPT-5.5 has lower friction. If you're building a &lt;strong&gt;regulated workflow with strict guarantees&lt;/strong&gt;, Claude's structured output behavior wins on reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  When To Choose Each — A Decision Framework
&lt;/h2&gt;

&lt;p&gt;Stop asking "which is best?" Start asking these four questions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. What problem am I actually solving?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-form document reasoning, code analysis at scale, regulated decision support&lt;/strong&gt; → Claude Opus 4.7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal user-facing features, real-time voice, ecosystem-heavy integrations&lt;/strong&gt; → GPT-5.5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume cost-sensitive agentic workloads&lt;/strong&gt; → Claude Opus 4.6 (or smaller models)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. What's my failure cost?
&lt;/h3&gt;

&lt;p&gt;A chatbot that recommends the wrong product costs a sale. An assistant that misreads a contract clause costs a lawsuit. Match the model's reliability profile to your downside risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Who maintains this in 18 months?
&lt;/h3&gt;

&lt;p&gt;Models get deprecated. Pricing changes. APIs evolve. Pick the model whose &lt;strong&gt;migration path&lt;/strong&gt; you can stomach. If your answer is "we can't migrate" — you've built tech debt, not capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. What's my regulatory surface?
&lt;/h3&gt;

&lt;p&gt;For EU-resident users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act&lt;/strong&gt; classifies systems by risk tier — high-risk systems carry significant compliance overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR&lt;/strong&gt; still applies to any prompt containing personal data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor concentration risk&lt;/strong&gt; is now a documented audit concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Single-vendor architectures are increasingly hard to defend in compliance reviews.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build Your Own Evaluation Harness (Don't Trust Public Benchmarks)
&lt;/h2&gt;

&lt;p&gt;Public benchmarks measure general capability. Your production system needs &lt;em&gt;domain-specific&lt;/em&gt; capability. Here's a minimal evaluation pattern I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;anthropic_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run a single task against a model and return structured output.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# openai
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;evaluate_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_eval_suite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Compare both models on the same tasks.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;evaluate_on_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few principles for building your eval suite:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use real production data&lt;/strong&gt; (anonymized). Synthetic tasks lie.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include adversarial cases&lt;/strong&gt; — ambiguous inputs, near-duplicates, edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure cost-per-correct-answer&lt;/strong&gt;, not just accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run it weekly&lt;/strong&gt; — model behavior drifts between point releases.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Hidden Costs Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;API price-per-token is the smallest part of your real cost. Here's the full picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost Layer&lt;/th&gt;
&lt;th&gt;Typical Range&lt;/th&gt;
&lt;th&gt;What Drives It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct API tokens&lt;/td&gt;
&lt;td&gt;20-30% of total&lt;/td&gt;
&lt;td&gt;Pricing tier, prompt size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Re-prompting on errors&lt;/td&gt;
&lt;td&gt;10-20%&lt;/td&gt;
&lt;td&gt;Model reliability, validation strictness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop validation&lt;/td&gt;
&lt;td&gt;15-30%&lt;/td&gt;
&lt;td&gt;Use case sensitivity, regulatory requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching infrastructure&lt;/td&gt;
&lt;td&gt;5-10%&lt;/td&gt;
&lt;td&gt;Architecture, library choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vendor migration overhead&lt;/td&gt;
&lt;td&gt;10-25% (when triggered)&lt;/td&gt;
&lt;td&gt;Lock-in level, abstraction quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance audits&lt;/td&gt;
&lt;td&gt;5-15%&lt;/td&gt;
&lt;td&gt;Regulatory environment, data sensitivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;A model that's "20% cheaper at the API" can be 2x more expensive in TCO&lt;/strong&gt; if it triggers more re-prompts or requires heavier human validation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Model Orchestration: The Pattern That Wins
&lt;/h2&gt;

&lt;p&gt;In 2026, the production-grade answer is rarely "one model for everything." Common patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│  Router (lightweight model)                                 │
│  ├── Classifies request complexity &amp;amp; sensitivity            │
│  └── Routes to appropriate model                            │
└─────────────────────────────────────────────────────────────┘
            │
   ┌────────┼────────┐
   ▼        ▼        ▼
[Haiku]  [Opus 4.6]  [Opus 4.7]
 cheap    balanced    deep reasoning
 fast     production  complex docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern routinely cuts costs by &lt;strong&gt;40-60%&lt;/strong&gt; versus single-model architectures, with no quality loss when the router is well-calibrated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Going Deeper: Resources
&lt;/h2&gt;

&lt;p&gt;If you want to go beyond this article and build genuine expertise in model selection, evaluation, and multi-model architecture, I've put together a structured course covering exactly these topics:&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/comparatie-modele-ai" rel="noopener noreferrer"&gt;AI Model Comparison 2026 — Enterprise Edition&lt;/a&gt;&lt;/strong&gt; &lt;em&gt;(course is in Romanian)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full enterprise evaluation methodology — from benchmark to production&lt;/li&gt;
&lt;li&gt;How to interpret 2026 benchmarks correctly (signal vs. marketing noise)&lt;/li&gt;
&lt;li&gt;Structured selection frameworks based on cost / risk / use case&lt;/li&gt;
&lt;li&gt;Complete landscape: Anthropic, OpenAI, Google, Meta, Mistral&lt;/li&gt;
&lt;li&gt;Multi-model architectures and cost optimization strategies&lt;/li&gt;
&lt;li&gt;Applied case studies with European regulatory context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔗 Full platform: &lt;strong&gt;&lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — single subscription, full catalog of AI courses for IT and non-IT professionals.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;The real edge in 2026 isn't access to AI — it's &lt;strong&gt;methodological maturity in choosing, evaluating, and governing AI&lt;/strong&gt;. Model access has become a commodity. The competence to architect around models is the scarce resource.&lt;/p&gt;

&lt;p&gt;If you take one thing from this article, let it be this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Stop asking "which model is best?" Start asking "which model best fits this specific decision, and what's my exit if I'm wrong?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single shift in framing will save your team thousands of hours and tens of thousands of euros over the next twelve months.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a comment with your current model stack — I'm always curious how teams are actually orchestrating these in production.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>anthropic</category>
      <category>openai</category>
    </item>
    <item>
      <title>🤖 OpenAI Codex in 2026: The Agentic Coding Era Has Arrived ⚡</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Fri, 24 Apr 2026 21:27:26 +0000</pubDate>
      <link>https://dev.to/galian/openai-codex-in-2026-the-agentic-coding-era-has-arrived-439m</link>
      <guid>https://dev.to/galian/openai-codex-in-2026-the-agentic-coding-era-has-arrived-439m</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Codex is no longer "autocomplete on steroids". It's a teammate that reads your repo, runs your tests, opens pull requests — and occasionally outperforms your juniors. Here's what changed, and how to actually get good at using it.&lt;/em&gt; 🚀&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧭 Quick recap: what Codex is today
&lt;/h2&gt;

&lt;p&gt;When OpenAI brought back the &lt;strong&gt;Codex&lt;/strong&gt; brand in 2025, most developers assumed it was just a marketing refresh. It wasn't. The new Codex is an &lt;strong&gt;agentic coding system&lt;/strong&gt; powered by a dedicated reasoning model (the &lt;code&gt;codex-*&lt;/code&gt; family, purpose-built for software tasks) that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 Read and understand an entire repository&lt;/li&gt;
&lt;li&gt;🖥️ Execute commands in an isolated sandbox&lt;/li&gt;
&lt;li&gt;🧪 Run your tests, read the failures, and iterate&lt;/li&gt;
&lt;li&gt;🔀 Produce real, reviewable git diffs and pull requests&lt;/li&gt;
&lt;li&gt;🔌 Talk to external tools through &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By April 2026, Codex ships in three flavors that matter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🧰 &lt;strong&gt;Codex CLI&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Open-source terminal agent&lt;/td&gt;
&lt;td&gt;Power users, scripts, CI pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;☁️ &lt;strong&gt;Codex Cloud&lt;/strong&gt; (inside ChatGPT)&lt;/td&gt;
&lt;td&gt;Parallel cloud agents working on your GitHub repos&lt;/td&gt;
&lt;td&gt;Long tasks, multi-step refactors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧩 &lt;strong&gt;Codex IDE extensions&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;VS Code / JetBrains integration&lt;/td&gt;
&lt;td&gt;Day-to-day development&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three share the &lt;strong&gt;same underlying agent loop&lt;/strong&gt;: plan → act → observe → verify. That loop is the thing you really need to understand — because if you don't, Codex will still look "magical", but you won't know why it fails, why it loops, or why it deletes your &lt;code&gt;.env&lt;/code&gt;. 😅&lt;/p&gt;




&lt;h2&gt;
  
  
  🆕 What's new in the latest version
&lt;/h2&gt;

&lt;p&gt;The current generation of Codex (powered by the newest &lt;code&gt;codex&lt;/code&gt; reasoning model released earlier this year) introduced a handful of upgrades that genuinely change the workflow:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. 🧠 Deeper, longer reasoning
&lt;/h3&gt;

&lt;p&gt;Codex can now think for minutes — sometimes tens of minutes — on a single task, backtracking when a test fails and revising its plan. That's great when you're refactoring auth middleware. It's a disaster when you're asking it to "fix a typo" and it decides to rewrite the file. &lt;strong&gt;Learning to scope tasks is now a core skill.&lt;/strong&gt; ✍️&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 🔐 Sandboxed execution by default
&lt;/h3&gt;

&lt;p&gt;Codex runs in &lt;strong&gt;ephemeral containers with network restrictions&lt;/strong&gt;. You can still punch holes in the sandbox (for example, to let it install dependencies), but you have to be explicit. This is a huge safety win compared to the "YOLO mode" of 2024 tooling. 🛡️&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 🔌 First-class MCP support
&lt;/h3&gt;

&lt;p&gt;This is the biggest shift. Codex doesn't just call tools baked in by OpenAI — it speaks &lt;strong&gt;MCP&lt;/strong&gt;, the open standard for wiring LLMs to external systems (databases, APIs, internal services, observability tools). If your company has a Jira MCP server, a Postgres MCP server, and an AWS MCP server, Codex can orchestrate all three in a single task. 🪄&lt;/p&gt;

&lt;h3&gt;
  
  
  4. 🤝 Multi-agent coordination
&lt;/h3&gt;

&lt;p&gt;Codex Cloud can now spawn &lt;strong&gt;multiple parallel agents&lt;/strong&gt; on independent subtasks of the same issue, then merge their work. Think: one agent writes the migration, another writes the API endpoint, a third updates the OpenAPI spec — and they reconcile before opening a single PR. 🧵&lt;/p&gt;

&lt;h3&gt;
  
  
  5. 📝 Repository-aware memory
&lt;/h3&gt;

&lt;p&gt;Through the &lt;code&gt;AGENTS.md&lt;/code&gt; convention (now supported by Codex, Cursor, Claude Code, and most major coding agents), Codex reads per-repo and per-folder instructions that shape its behavior. If you've ever written a &lt;code&gt;CLAUDE.md&lt;/code&gt;, you already know 80% of this. 📖&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 Why every developer should care (yes, even you)
&lt;/h2&gt;

&lt;p&gt;There's a narrative that goes: &lt;em&gt;"Agentic coding tools will replace developers."&lt;/em&gt; That's not what's happening. What's actually happening is more interesting — and more demanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎯 The floor of productivity has risen. A mediocre developer with a well-configured Codex setup ships faster than a senior without one.&lt;/li&gt;
&lt;li&gt;📉 The ceiling has &lt;strong&gt;also&lt;/strong&gt; risen. Seniors who master agents are now operating at output levels that looked impossible two years ago.&lt;/li&gt;
&lt;li&gt;🧪 The bottleneck has shifted from &lt;em&gt;writing code&lt;/em&gt; to &lt;em&gt;specifying intent, reviewing diffs, and designing systems&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: &lt;strong&gt;the skill stack changed&lt;/strong&gt;. If your skillset is still "I know React and I can Google" — you're in trouble. If your skillset is "I know how to orchestrate agents, design MCP integrations, and review AI-generated diffs critically" — you're the most valuable person in the room. 🏆&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Real-world workflows that actually work in 2026
&lt;/h2&gt;

&lt;p&gt;Here are four patterns that Romanian and European engineering teams are using in production right now:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧪 1. The "CI-first" loop
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Write failing tests → hand to Codex → let it iterate until green → human reviews the diff.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Perfect for well-scoped bug fixes and feature work where the acceptance criteria can be expressed as tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔁 2. The "refactor fleet"
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Open one Codex Cloud task per file or per module → merge in batches.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Works beautifully for things like "migrate from Moment.js to date-fns across 200 files" or "convert all class components to hooks". Parallelism turns a week of work into an afternoon. ⚡&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 3. The "MCP orchestra"
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Codex queries Sentry, reads the offending stack trace, pulls the relevant file, writes a fix, runs the tests, opens a PR with the Sentry link embedded.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the workflow that makes oncall bearable. 🎯&lt;/p&gt;

&lt;h3&gt;
  
  
  🗺️ 4. The "architect + executor"
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;A human writes a detailed plan (or has a reasoning model write one). Codex executes it step by step.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The split of &lt;strong&gt;planning vs. execution&lt;/strong&gt; is the single biggest productivity multiplier we've observed. 🧠&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ What still goes wrong (and why "just use Codex" isn't a strategy)
&lt;/h2&gt;

&lt;p&gt;Let's be honest: Codex fails a lot. The failures cluster into predictable categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔍 &lt;strong&gt;Context under-specification&lt;/strong&gt; — it can't read your mind; vague prompts = vague PRs&lt;/li&gt;
&lt;li&gt;🧱 &lt;strong&gt;Missing repo conventions&lt;/strong&gt; — no &lt;code&gt;AGENTS.md&lt;/code&gt; means it invents its own style&lt;/li&gt;
&lt;li&gt;🎭 &lt;strong&gt;Hallucinated APIs&lt;/strong&gt; — especially for internal libraries without good docstrings&lt;/li&gt;
&lt;li&gt;🌀 &lt;strong&gt;Over-eager refactoring&lt;/strong&gt; — it "cleans up" code that was deliberately non-obvious&lt;/li&gt;
&lt;li&gt;🔐 &lt;strong&gt;Security blind spots&lt;/strong&gt; — it will happily add &lt;code&gt;dangerouslySetInnerHTML&lt;/code&gt; if you ask nicely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every single one of these failures is a &lt;strong&gt;skill issue&lt;/strong&gt;, not a tool issue. The good news: skills are learnable. The bad news: you have to actually learn them. 📚&lt;/p&gt;




&lt;h2&gt;
  
  
  🎓 How to get seriously good at this — the structured path
&lt;/h2&gt;

&lt;p&gt;You don't become great at agentic coding by watching YouTube clips. You become great by building a layered skill stack: prompt engineering → LLM integration → agents → MCP → full AI-native workflows.&lt;/p&gt;

&lt;p&gt;Here's the path we recommend on &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — Romania's premium AI learning platform — mapped directly to Codex mastery:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧱 Foundation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🎯 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/introducere-ai-engineering" rel="noopener noreferrer"&gt;Introducere în AI Engineering&lt;/a&gt;&lt;/strong&gt; — the mental model of how modern AI systems actually work. Without this, everything else is cargo-culting.&lt;/li&gt;
&lt;li&gt;✍️ &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt;&lt;/strong&gt; — the single highest-ROI skill in 2026. Every Codex task starts with a prompt. Bad prompt in, bad PR out.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🚀 Intermediate
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/advanced-llm-integration" rel="noopener noreferrer"&gt;Integrare Avansată LLM în Aplicații de Producție&lt;/a&gt;&lt;/strong&gt; — if you want to build &lt;em&gt;your own&lt;/em&gt; Codex-style tools, or even just understand why Codex behaves the way it does, this is the course.&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/ai-agents-automatizare" rel="noopener noreferrer"&gt;AI Agents: Arhitectura și Automatizarea Sistemelor Autonome&lt;/a&gt;&lt;/strong&gt; — Codex is an agent. Learn the patterns (ReAct, reflection, planning, memory) that make agents actually work in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🏆 Advanced
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;💻 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/cursor-pro" rel="noopener noreferrer"&gt;Cursor ca Pro: IDE AI-Native, Composer și Multi-Agent 2026&lt;/a&gt;&lt;/strong&gt; — everything you learn here transfers directly to Codex. Agent-native IDE workflows are the 2026 equivalent of learning vim — once you're in, you can't go back.&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;MCP (Model Context Protocol) — Construirea de Servere și Integrări&lt;/a&gt;&lt;/strong&gt; — this is where senior engineers quietly separate themselves from the crowd. MCP is the &lt;em&gt;plumbing&lt;/em&gt; of the agentic era. Codex is the headline product; MCP is the reason it works with your stack.&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses/workflow-automation-zapier-n8n" rel="noopener noreferrer"&gt;Automatizare Workflow Enterprise 2026: Zapier, n8n, Make, Pipedream și Agentic Automation cu MCP&lt;/a&gt;&lt;/strong&gt; — how to wire Codex (and other agents) into real business workflows beyond the terminal.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧭 A realistic 30-day plan with Codex
&lt;/h2&gt;

&lt;p&gt;If you're starting from "I've used ChatGPT a few times" and want to be dangerous in a month, here's a plan:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Week&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Hours/week&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1️⃣&lt;/td&gt;
&lt;td&gt;Prompt engineering fundamentals + first Codex tasks on a side project&lt;/td&gt;
&lt;td&gt;6–8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2️⃣&lt;/td&gt;
&lt;td&gt;AI Engineering foundations + understanding the agent loop&lt;/td&gt;
&lt;td&gt;8–10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3️⃣&lt;/td&gt;
&lt;td&gt;LLM integration + writing your first MCP server&lt;/td&gt;
&lt;td&gt;8–10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4️⃣&lt;/td&gt;
&lt;td&gt;Agentic workflows end-to-end on a real repo + multi-agent experiments&lt;/td&gt;
&lt;td&gt;10–12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By day 30, you'll be more productive than 80% of developers who've been "using AI" for two years without a structured path. 📈&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 The honest takeaway
&lt;/h2&gt;

&lt;p&gt;Codex in 2026 is the best coding agent most teams have ever had access to. But "best tool" doesn't equal "best results" — the gap between developers who &lt;em&gt;use&lt;/em&gt; Codex and developers who &lt;em&gt;master&lt;/em&gt; agentic coding is widening every month.&lt;/p&gt;

&lt;p&gt;If you want to be on the right side of that gap:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;🧠 Build the mental model (AI Engineering)&lt;/li&gt;
&lt;li&gt;✍️ Nail the inputs (Prompt Engineering)&lt;/li&gt;
&lt;li&gt;🤖 Understand the loop (AI Agents)&lt;/li&gt;
&lt;li&gt;🔌 Master the plumbing (MCP)&lt;/li&gt;
&lt;li&gt;💻 Live in agent-native tools (Cursor, Codex CLI, Codex Cloud)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All five layers are taught, in Romanian, with practical exercises and interactive evaluation, on &lt;strong&gt;&lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;&lt;/strong&gt; — with an AI tutor integrated into every lesson so you get feedback as you learn. 🎓&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The developers who treat Codex as "just a fancier autocomplete" will plateau this year. The ones who treat it as a collaborator — and invest in the skills to direct it — are about to have the most productive year of their careers.&lt;/em&gt; 🚀✨&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Ready to level up?&lt;/strong&gt; Start with the foundation courses on &lt;a href="https://cursuri-ai.ro/courses" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; and build the stack that actually makes agentic tools pay off. Your 2026 self will thank you. 💜&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Anatomy of a Modern AI Marketing Curriculum in 2026 — What It Covers and Why It Matters</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:13:27 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/the-anatomy-of-a-modern-ai-marketing-curriculum-in-2026-what-it-covers-and-why-it-matters-mh6</link>
      <guid>https://dev.to/cursuri-ai/the-anatomy-of-a-modern-ai-marketing-curriculum-in-2026-what-it-covers-and-why-it-matters-mh6</guid>
      <description>&lt;h1&gt;
  
  
  The Anatomy of a Modern AI Marketing Curriculum in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;"Digital marketing is no longer a copywriting discipline with an analytics layer on top. In 2026, it's a distributed system of generative models, data pipelines, and cross-channel automations — strategically orchestrated by a human who understands both AI and the market."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The global AI-in-marketing market hit &lt;strong&gt;$45.8 billion&lt;/strong&gt; in 2026, up from $21.5 billion in 2024.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;78% of B2B and B2C companies&lt;/strong&gt; now use at least one AI tool in their marketing stack.&lt;/li&gt;
&lt;li&gt;A modern AI Marketing curriculum covers &lt;strong&gt;9 core areas&lt;/strong&gt;: fundamentals, content and SEO, social media, email and automation, paid ads, analytics, video/audio/visual, ethics and legislation, and applied projects.&lt;/li&gt;
&lt;li&gt;The dominant tech stack: &lt;strong&gt;GPT-5.4, Claude Opus 4.6, Performance Max, Meta Advantage+, Jasper, Canva AI&lt;/strong&gt;, integrated with modern CRMs and data warehouses.&lt;/li&gt;
&lt;li&gt;This article maps, section by section, what such a curriculum should look like if you want to move from "I've heard of AI" to "I run an AI-first department."&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this article lives on dev.to
&lt;/h2&gt;

&lt;p&gt;Plenty of developers build MarTech tools, work at startups where they wear multiple hats, or run side projects that require them to understand funnels, SEO, and conversions. Over the last 18 months, AI has fundamentally rewritten how marketing gets done — and the line between "developer" and "growth engineer" has visibly thinned.&lt;/p&gt;

&lt;p&gt;This article is an X-ray of the skills a modern AI Marketing specialist needs in 2026. It's useful if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a product and want to understand how it gets promoted in the AI era&lt;/li&gt;
&lt;li&gt;Freelance or consult and integrate AI into client deliverables&lt;/li&gt;
&lt;li&gt;Work at the MarTech intersection — data engineering, analytics, experimentation&lt;/li&gt;
&lt;li&gt;Want a solid baseline for evaluating or hiring specialists in this field&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — a Romanian platform focused exclusively on professional AI education — and this article reflects the curriculum we've designed for the marketing track.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 2026 numbers you need to know
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;2024&lt;/th&gt;
&lt;th&gt;2026&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global AI Marketing market&lt;/td&gt;
&lt;td&gt;$21.5B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$45.8B&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Companies using AI in marketing&lt;/td&gt;
&lt;td&gt;37%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROI — AI-augmented vs. traditional campaigns&lt;/td&gt;
&lt;td&gt;+10-15%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+35-50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per lead reduction&lt;/td&gt;
&lt;td&gt;-8%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-28%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content production time reduction&lt;/td&gt;
&lt;td&gt;-25%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-65%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Romania:&lt;/strong&gt; 52% of digital agencies and 34% of companies with marketing budgets above €10,000/month actively use AI in their workflows (iSense Solutions for IAB Romania, 2026).&lt;/p&gt;

&lt;p&gt;The takeaway is unambiguous: a marketer who doesn't operate with AI in 2026 is no longer competitive. And a developer building products can no longer afford to treat marketing as a black box.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 9 areas of a modern curriculum
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AI fundamentals for digital marketing
&lt;/h3&gt;

&lt;p&gt;Without a proper grasp of generative models, everything else stays shallow. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operational differences between &lt;strong&gt;GPT-5.4&lt;/strong&gt; (1M token context, excellent for content at scale) and &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; (complex analytical reasoning, strategy)&lt;/li&gt;
&lt;li&gt;The architecture of a modern &lt;strong&gt;MarTech stack&lt;/strong&gt;: CRM → CDP → AI orchestrator → channels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation levels&lt;/strong&gt; (L1-L5) — from manual prompting to fully autonomous systems with human-in-the-loop&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Content and SEO with AI
&lt;/h3&gt;

&lt;p&gt;Content generation was the first battlefield AI won. In 2026, it's no longer "I wrote a blog post with ChatGPT" — it's full pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalable content generation aligned with brand voice&lt;/li&gt;
&lt;li&gt;Optimization for &lt;strong&gt;Google AI Overviews&lt;/strong&gt; — the new ranking model partially replacing classic SERPs&lt;/li&gt;
&lt;li&gt;Differentiated copywriting for &lt;strong&gt;ads, email, and landing pages&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Editorial calendars orchestrated by AI based on trending signals and seasonality&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Social media and community
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cross-channel automation (LinkedIn, Instagram, TikTok, X) while respecting each platform's tone&lt;/li&gt;
&lt;li&gt;Visual and video content generation straight from prompts (&lt;strong&gt;Sora, Runway, Midjourney&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;Intelligent &lt;strong&gt;social listening&lt;/strong&gt; — automatic sentiment detection and reputation-crisis alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Email marketing and automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Campaigns with &lt;strong&gt;1:1 personalization&lt;/strong&gt; driven by hundreds of behavioral signals&lt;/li&gt;
&lt;li&gt;Adaptive funnels that self-optimize based on segment reactions&lt;/li&gt;
&lt;li&gt;Predictive segmentation — you no longer slice the list demographically; you slice it by &lt;strong&gt;intent score&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Paid ads and performance marketing
&lt;/h3&gt;

&lt;p&gt;This is where the gap between "marketing with AI" and "AI-first marketing" is most visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Performance Max&lt;/strong&gt; — campaigns that simultaneously optimize bid, creative, and audience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta Advantage+&lt;/strong&gt; — the Meta equivalent, with product catalog and automated targeting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROAS&lt;/strong&gt; optimization and budgeting with predictive models (not static rules)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Analytics and data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictive customer analytics&lt;/strong&gt; — churn prediction, LTV forecasting, next-best-action&lt;/li&gt;
&lt;li&gt;Personalization at scale using &lt;strong&gt;vector embeddings&lt;/strong&gt; and behavioral similarity&lt;/li&gt;
&lt;li&gt;Decision dashboards that propose actions, not just display metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Video, audio, and visual marketing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Image generation and visual design (Midjourney, DALL-E, Adobe Firefly)&lt;/li&gt;
&lt;li&gt;End-to-end video marketing: &lt;strong&gt;script → voiceover → editing → subtitles → distribution&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Podcast and voice marketing&lt;/strong&gt; — a fast-growing niche in 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Ethics, legislation, and AI-first strategy
&lt;/h3&gt;

&lt;p&gt;The most underrated area — and the riskiest if ignored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand safety&lt;/strong&gt; in the age of generated content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU AI Act&lt;/strong&gt; — practical requirements for marketing applications (risk classification, transparency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR&lt;/strong&gt; applied specifically to personalization and algorithmic profiling&lt;/li&gt;
&lt;li&gt;AI-First transformation roadmap for an organization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Case studies and applied projects
&lt;/h3&gt;

&lt;p&gt;Any serious curriculum closes with real application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-to-end AI digital transformation of a &lt;strong&gt;Romanian e-commerce business&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;AI strategy for a local &lt;strong&gt;marketing agency&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final capstone project&lt;/strong&gt; — building your own AI-first marketing strategy, ready to implement&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The dominant 2026 tech stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
txt
── Foundation models ──
• GPT-5.4 (OpenAI)                 — 1M token context, content at scale
• Claude Opus 4.6 (Anthropic)      — analytical reasoning, strategy, long docs
• Claude Sonnet 4.6                — operational workloads, cost-efficient

── Advertising platforms ──
• Google Performance Max + Gemini  — fully orchestrated campaigns
• Meta Advantage+                  — equivalent on Meta Ads

── Specialized tools ──
• Jasper, Copy.ai                  — ad-focused copywriting
• Canva AI, Adobe Firefly          — visual design
• Midjourney, DALL-E 3+            — premium imagery
• Runway, Sora                     — video generation
• ElevenLabs                       — voice generation

── Analytics &amp;amp; data ──
• Segment / RudderStack            — CDP
• Snowflake / BigQuery             — data warehouse
• Hex, Mode                        — AI-assisted analytics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>marketing</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>MCP (Model Context Protocol): The Complete Guide to Building AI-Powered Integrations in 2026</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Sun, 19 Apr 2026 20:18:08 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/mcp-model-context-protocol-the-complete-guide-to-building-ai-powered-integrations-in-2026-5bnd</link>
      <guid>https://dev.to/cursuri-ai/mcp-model-context-protocol-the-complete-guide-to-building-ai-powered-integrations-in-2026-5bnd</guid>
      <description>&lt;p&gt;Every developer building AI apps hits the same problem: connecting an LLM to real tools means writing custom glue code for every single integration. Different schemas, different auth, different error handling — repeated for every model and every data source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; fixes this. It's an open standard — think USB-C for AI connectivity — that lets any AI client talk to any tool server through one universal interface. And it's not theoretical: OpenAI, Google, Microsoft, Salesforce, and thousands of developers already use it in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Actually Does
&lt;/h2&gt;

&lt;p&gt;Before MCP, connecting Claude or GPT to your database meant writing a custom function, defining a JSON schema, handling auth, and repeating all of that for every tool. Scale that to 30 integrations across multiple environments — it breaks fast.&lt;/p&gt;

&lt;p&gt;MCP replaces all of that with a single protocol based on JSON-RPC 2.0. A server declares what it can do; a client discovers it automatically. No hardcoding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Your App (Host)  →  MCP Client  →  MCP Server (tools, data, prompts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A server can expose three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — functions the AI can call (&lt;code&gt;query_database&lt;/code&gt;, &lt;code&gt;send_email&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt; — structured data it can read (schemas, file contents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt; — reusable templates (code review checklist, SQL generator)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Working Example in Python
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query users filtered by status.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;get_db_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, email FROM users WHERE status = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema://users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_users_schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns the users table schema.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR, email VARCHAR, status VARCHAR);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;15 lines. Your AI agent can now query your database and understand its schema through any MCP-compatible client.&lt;/p&gt;

&lt;h2&gt;
  
  
  TypeScript Works Too
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GitHub Assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;list_issues&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;List open issues for a repository&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`https://api.github.com/repos/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/issues?state=open&amp;amp;per_page=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Two Transports, Different Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;stdio&lt;/strong&gt; — local tools. Server runs as a child process, zero network overhead. Great for file access, local DBs, CLI tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streamable HTTP&lt;/strong&gt; — remote/shared servers. Runs as a web service, supports OAuth 2.0. Ideal for SaaS integrations and team-shared tools.&lt;/p&gt;

&lt;p&gt;Most production setups use both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Won
&lt;/h2&gt;

&lt;p&gt;The adoption timeline tells the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nov 2024&lt;/strong&gt; — Anthropic launches MCP as open-source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 2025&lt;/strong&gt; — OpenAI adopts MCP officially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;May 2025&lt;/strong&gt; — Microsoft joins the MCP steering committee&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jun 2025&lt;/strong&gt; — Salesforce builds Agentforce 3 on MCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dec 2025&lt;/strong&gt; — MCP moves to the Linux Foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today: 10,000+ servers in production, 70%+ of major SaaS brands ship MCP servers, every major AI platform supports it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Done Right
&lt;/h2&gt;

&lt;p&gt;MCP's security model is one of its strongest features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Granular permissions&lt;/strong&gt; — each server declares capabilities, the host controls access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User consent&lt;/strong&gt; — critical actions need explicit approval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process isolation&lt;/strong&gt; — servers run in separate processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full audit trail&lt;/strong&gt; — every invocation is logged&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From Demo to Production
&lt;/h2&gt;

&lt;p&gt;A tutorial MCP server and a production one are very different. Production needs OAuth 2.0, rate limiting, Docker/Kubernetes deployment, CI/CD pipelines, GDPR compliance, and threat modeling.&lt;/p&gt;

&lt;p&gt;If you want the full path — from fundamentals to deploying enterprise-grade MCP servers with Python and TypeScript — check out this &lt;a href="https://cursuri-ai.ro/courses/mcp-model-context-protocol" rel="noopener noreferrer"&gt;complete MCP course&lt;/a&gt;. 24 hours of hands-on content with real projects: PostgreSQL, external APIs, multi-server gateways, and production security patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Here
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Install Claude Desktop or Cursor as your MCP host&lt;/li&gt;
&lt;li&gt;Try a pre-built server (filesystem, PostgreSQL)&lt;/li&gt;
&lt;li&gt;Build a custom server with FastMCP or the TypeScript SDK&lt;/li&gt;
&lt;li&gt;Add HTTP transport and OAuth for remote access&lt;/li&gt;
&lt;li&gt;Deploy with Docker&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MCP is infrastructure, not a trend. The developers who learn it now will build the next generation of AI applications.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want more production-focused AI engineering content? Visit &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — courses built for developers who ship.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>🤖 How a Virtual AI Professor Is Changing the Way Romania Learns</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Mon, 13 Apr 2026 22:02:49 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/how-a-virtual-ai-professor-is-changing-the-way-romania-learns-2957</link>
      <guid>https://dev.to/cursuri-ai/how-a-virtual-ai-professor-is-changing-the-way-romania-learns-2957</guid>
      <description>&lt;h2&gt;
  
  
  🏫 The Classroom Has No Walls Anymore
&lt;/h2&gt;

&lt;p&gt;Romania isn't usually the first country that comes to mind when you think about AI-driven education. But something interesting is happening here — a small team built &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt;, a platform where an AI virtual professor teaches structured, university-grade courses entirely in Romanian. 🇷🇴&lt;/p&gt;

&lt;h2&gt;
  
  
  🎓 What Makes an AI Professor Different?
&lt;/h2&gt;

&lt;p&gt;Traditional e-learning platforms rely on human instructors recording content once, then distributing it forever. The content ages. The examples become irrelevant. The quizzes stay the same. 😴&lt;/p&gt;

&lt;p&gt;An AI-powered professor flips this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔄 &lt;strong&gt;Content stays current.&lt;/strong&gt; Courses reference 2025–2026 frameworks, tools, and regulations — including Romania-specific fiscal and legal context.&lt;/li&gt;
&lt;li&gt;📏 &lt;strong&gt;Every learner gets the same depth.&lt;/strong&gt; There's no "phoning it in" on module 7 because the instructor got tired. Each of the 29 courses on the platform has the same structured depth: modules, lessons, practical exercises, and quizzes.&lt;/li&gt;
&lt;li&gt;🤝 &lt;strong&gt;Non-technical people aren't left behind.&lt;/strong&gt; Half the catalog is designed for business professionals — marketing, HR, finance, real estate, entrepreneurship — not just developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, the &lt;a href="https://cursuri-ai.ro/courses/prompt-engineering-masterclass" rel="noopener noreferrer"&gt;Prompt Engineering Masterclass&lt;/a&gt; doesn't just teach you what a prompt is. It walks through advanced techniques like chain-of-thought reasoning, few-shot patterns, and evaluation frameworks — structured the way a university course would be, but accessible to anyone. 💡&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ The Technical Architecture (for the Devs Reading This)
&lt;/h2&gt;

&lt;p&gt;Behind the scenes wih:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;📋 &lt;strong&gt;Plans&lt;/strong&gt; the full course structure (modules, lessons, learning objectives)&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Generates&lt;/strong&gt; each lesson in parallel using LLMs&lt;/li&gt;
&lt;li&gt;🧩 &lt;strong&gt;Assembles&lt;/strong&gt; the course with quizzes, practical exercises, and narrated audio&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Validates&lt;/strong&gt; output quality — structure, factual accuracy, quiz correctness&lt;/li&gt;
&lt;li&gt;🚢 &lt;strong&gt;Deploys&lt;/strong&gt; to production on AWS ECS Fargate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The generation pipeline catches its own mistakes — mismatched quiz keys, malformed options, missing content — and fixes them before anything goes live. It's a real production system, not a ChatGPT wrapper with a UI on top. 😏&lt;/p&gt;

&lt;h2&gt;
  
  
  🇷🇴 Why Romania, Why Now?
&lt;/h2&gt;

&lt;p&gt;Romania has a massive tech talent pool but a persistent gap in AI-specific education — especially in Romanian. Most high-quality AI content is in English, paywalled, or assumes you already have a CS degree. 😤&lt;/p&gt;

&lt;p&gt;Cursuri-AI.ro fills that gap with courses like &lt;a href="https://cursuri-ai.ro/courses/ai-lideri-business" rel="noopener noreferrer"&gt;AI for Business Leaders&lt;/a&gt;, which teaches executives how to evaluate AI projects, manage AI teams, and understand ROI — without writing a single line of code. That kind of course simply didn't exist in Romanian before. 🏆&lt;/p&gt;

&lt;p&gt;The bet is simple: &lt;strong&gt;if you lower the barrier to AI literacy in a country's native language, adoption accelerates across every industry&lt;/strong&gt; — not just tech. 📈&lt;/p&gt;

&lt;h2&gt;
  
  
  🔮 What This Means for EdTech
&lt;/h2&gt;

&lt;p&gt;The virtual AI professor model isn't just a novelty. It points to a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 Course catalogs can &lt;strong&gt;scale to hundreds of topics&lt;/strong&gt; without hiring hundreds of instructors&lt;/li&gt;
&lt;li&gt;♻️ Content can be &lt;strong&gt;regenerated&lt;/strong&gt; when the field evolves, instead of becoming stale&lt;/li&gt;
&lt;li&gt;🌍 &lt;strong&gt;Localization&lt;/strong&gt; becomes trivial — the same system can teach in any language with the same depth&lt;/li&gt;
&lt;li&gt;💎 &lt;strong&gt;Quality is consistent&lt;/strong&gt; — every module, every quiz, every explanation meets the same standard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This doesn't replace human mentorship. But it democratizes the structured knowledge layer that most people need before mentorship even becomes useful. 🙌&lt;/p&gt;

&lt;h2&gt;
  
  
  👀 Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you're curious, browse the course catalog at &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;cursuri-ai.ro&lt;/a&gt;. The platform has 29 courses across IT and non-IT tracks, all in Romanian, all taught by the AI professor. 🎓&lt;/p&gt;

&lt;p&gt;Whether you're a developer who wants to go deep on RAG and AI agents, or a marketing lead trying to figure out how AI fits into your workflow — there's probably a course for you. ✨&lt;/p&gt;

</description>
      <category>ai</category>
      <category>web</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>How AI Is Reshaping Romania's Financial System — And What Developers Should Know</title>
      <dc:creator>galian</dc:creator>
      <pubDate>Tue, 07 Apr 2026 23:13:38 +0000</pubDate>
      <link>https://dev.to/cursuri-ai/how-ai-is-reshaping-romanias-financial-system-and-what-developers-should-know-2a1h</link>
      <guid>https://dev.to/cursuri-ai/how-ai-is-reshaping-romanias-financial-system-and-what-developers-should-know-2a1h</guid>
      <description>&lt;h2&gt;
  
  
  🏦 Romania's Financial Sector Is Quietly Becoming an AI Playground
&lt;/h2&gt;

&lt;p&gt;While Western Europe dominates the AI headlines, Romania's financial ecosystem is undergoing a silent transformation. From automated tax compliance to real-time fraud detection, AI is no longer a PowerPoint slide in board meetings — it's in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 The Current Landscape
&lt;/h2&gt;

&lt;p&gt;Romania's financial system is ripe for AI adoption: a complex tax code (VAT 21%, micro-enterprise thresholds at 100k EUR, multiple regimes in parallel), rapid digitization mandated by law (e-Factura, e-Transport, SAF-T, RO e-TVA), a strong developer talent pool, and full EU regulatory alignment (GDPR, EU AI Act, PSD2, DORA). High regulatory complexity + strong tech talent + EU digital mandates = massive opportunity.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Where AI Is Already Deployed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fraud Detection &amp;amp; AML&lt;/strong&gt; — Banks like Banca Transilvania, BRD, and ING Romania use ML-based transaction monitoring with gradient-boosted trees, graph neural networks, and real-time streaming, reducing false positives by up to 60%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Tax Compliance&lt;/strong&gt; — e-Factura generates millions of XMLs monthly. AI handles auto-classification by tax category, VAT anomaly detection, and predictive compliance before ANAF flags you. ANAF itself uses AI to cross-reference e-Factura with e-Transport and SAF-T.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credit Scoring &amp;amp; Lending&lt;/strong&gt; — Beyond Biroul de Credit, fintechs like Mokka, iWanto, and Salarium integrate PSD2 transaction history, behavioral patterns, and NLP on financial documents for instant creditworthiness assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversational AI&lt;/strong&gt; — Romanian-language NLU models fine-tuned on banking domain, intent classification for transaction queries, voice AI for phone banking. The challenge: Romanian is a low-resource language for NLP.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚖️ Regulatory Framework
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;EU AI Act&lt;/strong&gt; — Credit scoring and financial risk AI = high-risk. Mandatory risk assessments, human oversight, transparency, bias testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GDPR Art. 22&lt;/strong&gt; — Citizens have the right not to be subject to purely automated decisions with legal effects. You need human-in-the-loop, explainability, and contestation mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DORA (Jan 2025)&lt;/strong&gt; — Stress-test AI models, maintain audit trails for all decisions, report AI incidents to BNR.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 Common Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choices&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kafka, AWS Kinesis, RabbitMQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL, ClickHouse, S3 + Parquet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PyTorch, scikit-learn, XGBoost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serving&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI + Docker, SageMaker, MLflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLMs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude API, OpenAI API, fine-tuned Llama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Evidently AI, Grafana, OpenTelemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Opportunities
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Banking + AI&lt;/strong&gt; — PSD2 opened the doors but few build intelligent products on it. Personal finance, automated savings, SME cash flow prediction — all underserved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RegTech Automation&lt;/strong&gt; — e-Factura validation, SAF-T generation, tax optimization. Massive market from freelancers to enterprises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Romanian Financial NLP&lt;/strong&gt; — Huge gap in domain-specific Romanian models for finance/legal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Accounting&lt;/strong&gt; — ~70,000 Romanian accounting firms still semi-manual. Auto-categorization, reconciliation, and declaration generation would be transformative.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Want to dive deeper? &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; covers AI applications across finance, business, and tech — 28 professional courses in Romanian, each with an integrated AI tutor 24/7.&lt;/p&gt;




&lt;h2&gt;
  
  
  📈 The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fintech sector: &lt;strong&gt;34% YoY&lt;/strong&gt; growth in transaction volume&lt;/li&gt;
&lt;li&gt;e-Factura: &lt;strong&gt;200M+ invoices/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Banking IT spending: &lt;strong&gt;+28%&lt;/strong&gt; in two years&lt;/li&gt;
&lt;li&gt;EU AI Act compliance: creating a new wave of demand for regulation-aware AI engineers&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Romania's financial system is at an inflection point. Mandatory digitization + EU regulation + strong dev community = AI isn't optional, it's required. Whether you're building fraud models, automating tax compliance, or creating Romanian-language financial assistants — the demand is real and growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your experience with AI in financial systems? Drop a comment 👇&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Learn AI hands-on, in Romanian: &lt;a href="https://cursuri-ai.ro" rel="noopener noreferrer"&gt;Cursuri-AI.ro&lt;/a&gt; — 28 professional courses from AI Engineering to Finance AI, each with a 24/7 AI tutor built into every lesson.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
