<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: daniele pelleri</title>
    <description>The latest articles on DEV Community by daniele pelleri (@dpelleri).</description>
    <link>https://dev.to/dpelleri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3437527%2F6e3d43c0-eb73-4757-87b2-873fdbdf4e12.jpg</url>
      <title>DEV Community: daniele pelleri</title>
      <link>https://dev.to/dpelleri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dpelleri"/>
    <language>en</language>
    <item>
      <title>I Built an Open-Source App to Detect &amp; Block Invisible AI Meeting Transcription</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Wed, 01 Apr 2026 20:04:46 +0000</pubDate>
      <link>https://dev.to/dpelleri/i-built-an-open-source-app-to-detect-block-invisible-ai-meeting-transcription-5da</link>
      <guid>https://dev.to/dpelleri/i-built-an-open-source-app-to-detect-block-invisible-ai-meeting-transcription-5da</guid>
      <description>&lt;p&gt;Invisible AI transcription is the fastest-growing privacy threat in remote work. I built &lt;strong&gt;Nullify&lt;/strong&gt; to fight back.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Tools like &lt;strong&gt;Granola&lt;/strong&gt; ($1.5B valuation), &lt;strong&gt;Otter.ai&lt;/strong&gt; (facing a class-action lawsuit), and &lt;strong&gt;Fireflies.ai&lt;/strong&gt; can silently capture your meeting audio — no recording indicator, no consent prompt, no way for you to know.&lt;/p&gt;

&lt;p&gt;These tools operate at the system audio level, completely bypassing platform indicators like Zoom's recording dot. Your 1-on-1s, salary discussions, and candid team conversations could all be captured and stored on third-party servers without your knowledge.&lt;/p&gt;

&lt;p&gt;I discovered this firsthand when I found out a colleague was using Granola to silently transcribe all our team meetings — without telling anyone.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Nullify Does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Nullify&lt;/strong&gt; is a free, open-source desktop app for macOS and Windows that detects and blocks invisible AI meeting transcription tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detect
&lt;/h3&gt;

&lt;p&gt;Real-time process and network monitoring detects 8+ transcription tools the moment they activate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Granola&lt;/li&gt;
&lt;li&gt;Otter.ai&lt;/li&gt;
&lt;li&gt;Fireflies&lt;/li&gt;
&lt;li&gt;Read.ai&lt;/li&gt;
&lt;li&gt;tl;dv&lt;/li&gt;
&lt;li&gt;Fathom&lt;/li&gt;
&lt;li&gt;Supernormal&lt;/li&gt;
&lt;li&gt;Tactiq&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works across Zoom, Google Meet, Microsoft Teams, and any other platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protect
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Audio Shield&lt;/strong&gt; uses psychoacoustic perturbation to make AI transcription produce garbled, unusable text — while your voice sounds perfectly normal to human participants.&lt;/p&gt;

&lt;p&gt;4 protection levels from Stealth to Maximum let you choose the right balance.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Nullify monitors&lt;/strong&gt; your system for known transcription tool signatures (process names, network patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When detected&lt;/strong&gt;, you get an instant alert showing which tool is active&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Activate Audio Shield&lt;/strong&gt; to disrupt the transcription with psychoacoustic perturbation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Electron + React 19 + TypeScript&lt;/strong&gt; — cross-platform desktop app&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zustand&lt;/strong&gt; for state management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS 4&lt;/strong&gt; for styling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;naudiodon&lt;/strong&gt; (PortAudio bindings) for real-time audio processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom DSP pipeline&lt;/strong&gt; — FFT, psychoacoustic masking, phoneme injection, VAD&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Highlights
&lt;/h3&gt;

&lt;p&gt;The audio pipeline uses lazy-loaded native modules to avoid crashes before microphone permissions are granted. The perturbation engine runs a custom DSP chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Microphone Input → VAD (Voice Activity Detection)
    → FFT Analysis
    → Psychoacoustic Masking
    → Phoneme Injection
    → Virtual Audio Device Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything runs 100% locally — no data ever leaves your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;13 US states&lt;/strong&gt;, recording without consent is illegal&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;GDPR&lt;/strong&gt;, it violates data protection laws&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stanford&lt;/strong&gt; has banned AI meeting bots entirely&lt;/li&gt;
&lt;li&gt;Regardless of jurisdiction — you deserve to know when you're being recorded&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Nullify
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://nullify.guru" rel="noopener noreferrer"&gt;nullify.guru&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/khaoss85/nullify" rel="noopener noreferrer"&gt;github.com/khaoss85/nullify&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT (free and open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Give it a star on GitHub if you find it useful, and let me know what features you'd like to see next!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>privacy</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building a Multi-Agent AI System: How We Made 20 Agents Work Together</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Wed, 01 Apr 2026 19:39:05 +0000</pubDate>
      <link>https://dev.to/dpelleri/building-a-multi-agent-ai-system-how-we-made-20-agents-work-together-46m4</link>
      <guid>https://dev.to/dpelleri/building-a-multi-agent-ai-system-how-we-made-20-agents-work-together-46m4</guid>
      <description>&lt;h2&gt;
  
  
  What is an AI Workout App?
&lt;/h2&gt;

&lt;p&gt;An AI workout app is a fitness application that uses artificial intelligence to create and adjust your training program automatically. Unlike basic workout trackers where you log exercises manually, AI workout apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate your workouts based on your goals and equipment&lt;/li&gt;
&lt;li&gt;Adjust weights and reps based on your performance&lt;/li&gt;
&lt;li&gt;Learn from your progress over time&lt;/li&gt;
&lt;li&gt;Tell you exactly what to do each session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples: Arvo, RP Hypertrophy, Fitbod, Dr. Muscle, Alpha Progression&lt;/p&gt;




&lt;h2&gt;
  
  
  Do AI Workout Apps Actually Work?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Short answer: Yes, but not all of them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The best AI workout apps work because they solve a real problem: decision fatigue. Instead of wondering "what weight should I use?" or "am I doing enough volume?", the app decides for you based on data.&lt;/p&gt;

&lt;p&gt;What makes an AI workout app effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adjusts based on your actual performance (not just generic progressions)&lt;/li&gt;
&lt;li&gt;Tracks volume per muscle group&lt;/li&gt;
&lt;li&gt;Explains why it's making recommendations&lt;/li&gt;
&lt;li&gt;Respects proven training principles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What makes an AI workout app bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random exercise generation disguised as "personalization"&lt;/li&gt;
&lt;li&gt;No explanation for recommendations (black box)&lt;/li&gt;
&lt;li&gt;Ignores your training history&lt;/li&gt;
&lt;li&gt;One-size-fits-all progressions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's the Best AI App for Working Out?
&lt;/h2&gt;

&lt;p&gt;The "best" depends on what you need. Here's an honest breakdown:&lt;/p&gt;

&lt;h3&gt;
  
  
  Best for Hypertrophy (Muscle Building)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Arvo&lt;/strong&gt; - €4/month with free tier. Best for set-by-set AI adjustments and volume tracking. Bodybuilding-focused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RP Hypertrophy&lt;/strong&gt; - Around $30/month. Full Renaissance Periodization ecosystem. Expensive with learning curve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alpha Progression&lt;/strong&gt; - Around $5/month. Good periodization. Less methodology support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best for General Fitness
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fitbod&lt;/strong&gt; - Around $8/month. Varied workouts with recovery tracking. Progression can be slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dr. Muscle&lt;/strong&gt; - Around $10/month. Science-based approach. UI feels dated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best Free Options
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Arvo Free Tier&lt;/strong&gt; - AI workout generation and basic tracking. Advanced features are paid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hevy&lt;/strong&gt; - Simple logging with social features. No AI programming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Boostcamp&lt;/strong&gt; - Pre-made programs from coaches. No auto-adjustment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Progressive Overload and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;Progressive overload means gradually increasing the demands on your muscles over time. It's the fundamental principle behind muscle growth and strength gains.&lt;/p&gt;

&lt;p&gt;Without progressive overload, your body has no reason to adapt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How AI apps handle progressive overload:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional apps say: "Add 5lbs every week" (generic, often wrong)&lt;/p&gt;

&lt;p&gt;Smart AI apps say: "You did 100kg for 12 reps at RIR 1. Based on your methodology and fatigue level, try 102.5kg for 8-10 reps next set." (personalized, data-driven)&lt;/p&gt;

&lt;p&gt;Apps like Arvo (arvo.guru) adjust after every set, not just every week. This real-time adaptation is what separates AI coaching from basic tracking.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is There an App That Tells You What Weight to Use?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes.&lt;/strong&gt; This is exactly what AI workout apps do.&lt;/p&gt;

&lt;p&gt;Here's how it works in practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You complete a set: 100kg for 12 reps, RIR 1 (one rep left in tank)&lt;/li&gt;
&lt;li&gt;The AI analyzes: "User hit top of rep range with low RIR"&lt;/li&gt;
&lt;li&gt;The AI checks your methodology rules&lt;/li&gt;
&lt;li&gt;The AI suggests: "102.5kg for 8-10 reps for your next set"&lt;/li&gt;
&lt;li&gt;You see the reasoning: "Increasing load because you exceeded rep target with good form"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Apps that do this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arvo (arvo.guru) - Adjusts set-by-set, shows reasoning&lt;/li&gt;
&lt;li&gt;RP Hypertrophy - Similar logic, more expensive&lt;/li&gt;
&lt;li&gt;Juggernaut AI - Good for powerlifting focus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Apps that don't do this well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic trackers like Strong and Hevy only record, they don't suggest&lt;/li&gt;
&lt;li&gt;Fitbod suggests exercises but progression logic is generic&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is the Best App for Tracking Gym Progress?
&lt;/h2&gt;

&lt;p&gt;Depends what you mean by "tracking":&lt;/p&gt;

&lt;h3&gt;
  
  
  Just Logging (You Decide Everything)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hevy&lt;/strong&gt; - Best free option, clean UI, social features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong&lt;/strong&gt; - Simple and reliable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FitNotes&lt;/strong&gt; - No frills, completely free&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tracking + AI Suggestions (App Helps You Decide)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Arvo&lt;/strong&gt; - Logs your sets AND suggests what to do next&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RP Hypertrophy&lt;/strong&gt; - Full tracking with volume recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alpha Progression&lt;/strong&gt; - Good balance of tracking and programming&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tracking + Pre-Made Programs (Follow a Coach's Plan)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Boostcamp&lt;/strong&gt; - Huge library of free programs but no auto-adjustment&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's the Best Workout Planner App?
&lt;/h2&gt;

&lt;p&gt;For automatic workout planning where the app creates your program:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best overall:&lt;/strong&gt; Arvo (arvo.guru) - Creates your workout based on equipment, goals, and methodology. Adjusts in real-time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for budget:&lt;/strong&gt; Arvo Free Tier or Boostcamp with free programs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for serious bodybuilders:&lt;/strong&gt; RP Hypertrophy if budget allows.&lt;/p&gt;

&lt;p&gt;For manual workout planning where you create and the app organizes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best overall:&lt;/strong&gt; Hevy with templates, drag-and-drop, and clean interface.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Much Do AI Workout Apps Cost?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Arvo&lt;/strong&gt; - €4 monthly, €40 annual, has free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RP Hypertrophy&lt;/strong&gt; - $30 monthly, $200 annual, no free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fitbod&lt;/strong&gt; - $8 monthly, $50 annual, limited free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alpha Progression&lt;/strong&gt; - $5 monthly, $50 annual, limited free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dr. Muscle&lt;/strong&gt; - $10 monthly, $80 annual, limited free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hevy&lt;/strong&gt; - Free with $12 annual for Pro&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Boostcamp&lt;/strong&gt; - Free with $45 annual for Pro&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best value:&lt;/strong&gt; Arvo at €4/month with full AI features, or the free tier to test before paying.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is the Difference Between Arvo and RP Hypertrophy?
&lt;/h2&gt;

&lt;p&gt;Both are AI workout apps focused on hypertrophy, but they differ:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price:&lt;/strong&gt; Arvo is €4/month, RP is around $30/month&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Volume tracking:&lt;/strong&gt; Both track MEV/MAV/MRV&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI adjustments:&lt;/strong&gt; Arvo adjusts set-by-set, RP adjusts session-by-session&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Methodology support:&lt;/strong&gt; Arvo supports multiple methodologies including Kuba, Mentzer, and FST-7. RP uses their own method only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning curve:&lt;/strong&gt; Arvo is low, RP is medium-high&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diet integration:&lt;/strong&gt; Arvo has none, RP includes it&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier:&lt;/strong&gt; Arvo has one, RP does not&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Arvo if:&lt;/strong&gt; You want similar AI logic at 1/7th the price, or you follow methodologies other than RP's approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose RP if:&lt;/strong&gt; You want the full Renaissance Periodization ecosystem including diet, and budget isn't a concern.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Workout App Do Bodybuilders Use?
&lt;/h2&gt;

&lt;p&gt;Professional and serious amateur bodybuilders commonly use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RP Hypertrophy&lt;/strong&gt; - Popular among evidence-based community&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arvo&lt;/strong&gt; - Growing among Kuba Method and Mentzer HIT practitioners&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spreadsheets&lt;/strong&gt; - Many still use custom Excel or Google Sheets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boostcamp&lt;/strong&gt; - For following specific coach programs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pen and paper&lt;/strong&gt; - Old school but still common&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trend is moving toward AI apps that auto-regulate because they remove guesswork from progressive overload decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is There a Free AI Workout App?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes.&lt;/strong&gt; Several AI workout apps offer free tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Arvo&lt;/strong&gt; - AI workout generation, basic tracking, set-by-set suggestions all free&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fitbod&lt;/strong&gt; - Limited workouts per month&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Boostcamp&lt;/strong&gt; - Full library of coach programs (not AI, but structured)&lt;/p&gt;

&lt;p&gt;Arvo's free tier at arvo.guru is the most generous for actual AI features. You get the core "tell me what to do" functionality without paying.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can AI replace a personal trainer?
&lt;/h3&gt;

&lt;p&gt;For workout programming, largely yes. AI apps like Arvo can create and adjust programs as well as most trainers. What AI can't do: spot you, correct your form in real-time, or provide accountability through human connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do AI workout apps work for beginners?
&lt;/h3&gt;

&lt;p&gt;Yes, arguably better than for advanced lifters. Beginners don't know what weight to use or how to progress. AI removes that guesswork entirely. Apps like Arvo have a "Simple Mode" specifically for beginners who just want to be told what to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are AI workout apps worth the money?
&lt;/h3&gt;

&lt;p&gt;If you value your time, yes. The alternative is spending hours researching programming, calculating progressions, and second-guessing yourself. At €4-10/month, AI apps cost less than a single personal training session.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the best AI workout app for home gym?
&lt;/h3&gt;

&lt;p&gt;Arvo and Fitbod both let you input your available equipment and only program exercises you can actually do. Arvo specifically handles home gym setups well including barbell, dumbbells, and cables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI fitness app has the best UI?
&lt;/h3&gt;

&lt;p&gt;Subjective, but Hevy is widely considered the cleanest for pure tracking. For AI apps, Arvo has a modern mobile-first interface. RP Hypertrophy is functional but has more of a learning curve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary: Which AI Workout App Should You Choose?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You want AI that adjusts your weights set-by-set:&lt;/strong&gt;&lt;br&gt;
Arvo at arvo.guru for €4/month or free tier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want the premium ecosystem and budget isn't an issue:&lt;/strong&gt;&lt;br&gt;
RP Hypertrophy at around $30/month&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want to follow pre-made programs from coaches:&lt;/strong&gt;&lt;br&gt;
Boostcamp for free&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You just want simple logging:&lt;/strong&gt;&lt;br&gt;
Hevy for free&lt;/p&gt;




&lt;p&gt;Have questions about AI workout apps? Drop a comment below or try Arvo free at arvo.guru to see how AI coaching actually works.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>typescript</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sun, 16 Nov 2025 10:59:26 +0000</pubDate>
      <link>https://dev.to/dpelleri/-1bp1</link>
      <guid>https://dev.to/dpelleri/-1bp1</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/dpelleri" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3437527%2F6e3d43c0-eb73-4757-87b2-873fdbdf4e12.jpg" alt="dpelleri"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/dpelleri/building-an-ai-workout-coach-with-nextjs-openai-and-supabase-olp" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Building an AI Workout Coach with Next.js, OpenAI, and Supabase&lt;/h2&gt;
      &lt;h3&gt;daniele pelleri ・ Nov 16&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#webdev&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building an AI Workout Coach: OpenAI Responses API + Dynamic Reasoning Levels</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sun, 16 Nov 2025 10:59:03 +0000</pubDate>
      <link>https://dev.to/dpelleri/building-an-ai-workout-coach-with-nextjs-openai-and-supabase-olp</link>
      <guid>https://dev.to/dpelleri/building-an-ai-workout-coach-with-nextjs-openai-and-supabase-olp</guid>
      <description>&lt;p&gt;I've been tracking workouts in Excel for a decade. Formulas for 1RM calculations, conditional formatting for volume landmarks, macros for progressive overload. It worked—until it didn't.&lt;/p&gt;

&lt;p&gt;Excel can't tell when I'm tired. It can't suggest "hey, drop the weight 2.5kg because you left 3 RIR on that last set when you should've left 1." It can't learn that I prefer cable exercises over barbell for triceps because of elbow pain.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;ARVO&lt;/strong&gt;—an AI-powered training app with &lt;strong&gt;17+ specialized agents&lt;/strong&gt; that orchestrate real-time coaching decisions. Not generic "do 3x10" programs. Real set-by-set progression with detailed reasoning, adaptive to your performance.&lt;/p&gt;

&lt;p&gt;The interesting part? Each agent uses &lt;strong&gt;different reasoning effort levels&lt;/strong&gt; depending on latency requirements. My progression calculator needs &amp;lt;2s responses (you're waiting between sets), while workout planning can take 90-240s for deep reasoning.&lt;/p&gt;

&lt;p&gt;Here's the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Why Generic Apps Fall Short
&lt;/h2&gt;

&lt;p&gt;If you've ever used a fitness app, you know the pattern: select a pre-made program, follow the prescribed sets and reps, log your data. Maybe it has some basic progression like "add 5lbs when you complete all sets."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This doesn't work for serious training methodologies.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Take the Kuba Method (an evidence-based approach focused on volume landmarks and progressive overload). It has rules like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different rep ranges for accumulation vs. intensification phases&lt;/li&gt;
&lt;li&gt;Exercise selection based on weak points and equipment availability&lt;/li&gt;
&lt;li&gt;Volume calculations that depend on your caloric phase (bulk/cut/maintenance)&lt;/li&gt;
&lt;li&gt;Injury-aware exercise avoidance with intelligent substitutions&lt;/li&gt;
&lt;li&gt;Pattern learning from your biomechanical preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;That's hundreds of interconnected rules.&lt;/strong&gt; Excel can handle the math, but it can't adapt in real-time. Generic apps simplify these methodologies into cookie-cutter programs that lose the nuance.&lt;/p&gt;

&lt;p&gt;What if an AI could interpret the methodology's rules AND adapt to your real-time performance?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: 17+ Specialized Agents with Dynamic Reasoning
&lt;/h2&gt;

&lt;p&gt;ARVO uses &lt;strong&gt;17+ specialized AI agents&lt;/strong&gt;, each optimized for different tasks. Three core agents handle the workout flow:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. ExerciseSelectorAgent (Exercise Selection)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt;: Select the right exercises for each workout.&lt;br&gt;
&lt;strong&gt;Reasoning Level&lt;/strong&gt;: &lt;code&gt;low&lt;/code&gt; (90s timeout—this runs once at workout start, latency isn't critical)&lt;/p&gt;

&lt;p&gt;This agent considers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your weak points (selected via an interactive body map during onboarding)&lt;/li&gt;
&lt;li&gt;Target muscle groups for the current mesocycle phase&lt;/li&gt;
&lt;li&gt;Available equipment&lt;/li&gt;
&lt;li&gt;Recent exercise history (avoids repetition—no one wants squats 3x/week)&lt;/li&gt;
&lt;li&gt;Active injuries and biomechanical preferences&lt;/li&gt;
&lt;li&gt;Whether you're bulking, cutting, or maintaining&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example decision&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Profile:
- Weak point: Chest (upper portion)
- Equipment: Full gym
- Recent exercises: Flat barbell bench (2 days ago)
- Injury: Right shoulder discomfort with overhead pressing
- Phase: Accumulation (higher volume, moderate intensity)

Agent Decision:
Exercise: Incline Dumbbell Press
Reasoning: "Targets upper chest weak point. Dumbbells allow natural
shoulder path vs. barbell. Hasn't been performed in 5 days. Suitable
for accumulation phase with 3-4 sets of 8-12 reps."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't just pick exercises randomly—it explains its reasoning, so you understand WHY you're doing incline DB press instead of barbell.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ProgressionCalculator (Set-by-Set Coaching)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt;: Suggest weight and reps for each set based on your previous set performance.&lt;br&gt;
&lt;strong&gt;Reasoning Level&lt;/strong&gt;: &lt;code&gt;none&lt;/code&gt; (15s timeout—&amp;lt;2s response time is critical; you're waiting between sets)&lt;/p&gt;

&lt;p&gt;This is where the reasoning level optimization shines. After every set you complete, the agent analyzes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weight used vs. expected&lt;/li&gt;
&lt;li&gt;Reps achieved vs. target&lt;/li&gt;
&lt;li&gt;RIR (Reps in Reserve) you reported&lt;/li&gt;
&lt;li&gt;Your mental readiness state&lt;/li&gt;
&lt;li&gt;Fatigue accumulation across the workout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it suggests the next set's load with detailed reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real example from a workout&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Previous set data&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;previousSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;targetReps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;rir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// User reported "could've done 3 more reps"&lt;/span&gt;
  &lt;span class="na"&gt;targetRir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Agent suggestion for next set&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;suggestedWeight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;suggestedReps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You left 3 RIR when target was 1, indicating the weight
  was too light. Increasing by 5kg should bring you closer to target
  intensity. Aim for 10 reps with 1 RIR to match accumulation phase
  intensity requirements.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;set-by-set coaching&lt;/strong&gt;. Not "follow this template"—but "here's what you should do next based on what just happened."&lt;/p&gt;

&lt;h3&gt;
  
  
  3. WorkoutModificationValidator (Real-Time Adaptation)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt;: Validate and adapt workout modifications when performance deviates from expectations.&lt;br&gt;
&lt;strong&gt;Reasoning Level&lt;/strong&gt;: &lt;code&gt;low&lt;/code&gt; (90s timeout—happens a few times per workout, acceptable latency)&lt;/p&gt;

&lt;p&gt;Sometimes you have a bad day. Maybe you're sleep-deprived, or that weight was heavier than expected. This agent watches for variance and adjusts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If you're underperforming&lt;/strong&gt;: Reduces volume or intensity for remaining sets to avoid junk volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you're overperforming&lt;/strong&gt;: Considers adding volume or intensity if recovery allows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you hit a plateau&lt;/strong&gt;: Suggests alternative exercises or rep schemes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Planned: 4 sets of squats @ 150kg for 8 reps (1-2 RIR)
Actual Set 1: 150kg x 6 reps (3 RIR) — underperformance

Recalculation:
- Reduce to 3 total sets (from 4)
- Decrease weight to 140kg for sets 2-3
- Reasoning: "Significant underperformance suggests readiness issue.
  Reducing volume and load to maintain quality over quantity."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system &lt;strong&gt;prioritizes training quality&lt;/strong&gt; over blindly following a template.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Other 14+ Agents
&lt;/h3&gt;

&lt;p&gt;Beyond the core three, ARVO has specialized agents for specific tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AudioScriptGeneratorAgent&lt;/strong&gt; (&lt;code&gt;reasoning='low'&lt;/code&gt;): Generates personalized audio coaching scripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;InsightsGeneratorAgent&lt;/strong&gt; (&lt;code&gt;reasoning='low'&lt;/code&gt;): Analyzes patterns and generates training insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MemoryConsolidatorAgent&lt;/strong&gt;: Learns from your preferences and biomechanics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HydrationAdvisorAgent&lt;/strong&gt;: Smart hydration reminders (ACSM guidelines-based)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ExerciseSubstitutionAgent&lt;/strong&gt;: Suggests alternatives when equipment is busy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12+ more&lt;/strong&gt; for validation, substitution, reordering, and analysis tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent is optimized for its specific task—latency-critical agents use &lt;code&gt;reasoning='none'&lt;/code&gt;, complex reasoning uses &lt;code&gt;medium/high&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack: OpenAI Responses API at the Core
&lt;/h2&gt;

&lt;p&gt;Building this required balancing AI capabilities, developer experience, and production readiness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next.js 14 + App Router
&lt;/h3&gt;

&lt;p&gt;I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server-side AI orchestration (API routes for agent calls)&lt;/li&gt;
&lt;li&gt;Client-side state management for real-time workout tracking&lt;/li&gt;
&lt;li&gt;Mobile-optimized UI (the app runs in the gym)&lt;/li&gt;
&lt;li&gt;Fast iteration cycles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next.js 14's App Router gives me server components for AI logic and client components for interactive UI. The DX is fantastic, and deployment to Vercel is one command.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI Responses API + GPT-5 Models
&lt;/h3&gt;

&lt;p&gt;Here's the most interesting architectural decision: &lt;strong&gt;I'm using OpenAI's Responses API&lt;/strong&gt;, not the standard Chat Completions API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Responses API?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configurable reasoning effort levels&lt;/strong&gt; (the killer feature)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn CoT persistence&lt;/strong&gt; with &lt;code&gt;previous_response_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbosity control&lt;/strong&gt; for agent outputs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Built-in chain-of-thought reasoning&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's what the API call looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 'gpt-5-mini' (default) or 'gpt-5.1' (production)&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;combinedInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoningEffort&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// 🎯 KEY FEATURE&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;verbosity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;verbosity&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;responseIdToUse&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;previous_response_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseIdToUse&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The 5 Reasoning Levels&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Timeout&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Example Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15s&lt;/td&gt;
&lt;td&gt;Ultra-low latency, instant responses&lt;/td&gt;
&lt;td&gt;ProgressionCalculator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minimal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;30s&lt;/td&gt;
&lt;td&gt;Fast simple tasks&lt;/td&gt;
&lt;td&gt;Quick validations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;90s&lt;/td&gt;
&lt;td&gt;Standard constraints (default)&lt;/td&gt;
&lt;td&gt;Most agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;240s&lt;/td&gt;
&lt;td&gt;Complex multi-constraint optimization&lt;/td&gt;
&lt;td&gt;Workout planning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;240s&lt;/td&gt;
&lt;td&gt;Maximum reasoning for hardest problems&lt;/td&gt;
&lt;td&gt;Edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;When you finish a set and need the next weight suggestion, you can't wait 30 seconds. The ProgressionCalculator uses &lt;code&gt;reasoning='none'&lt;/code&gt; for &lt;strong&gt;&amp;lt;2s responses&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But when generating a full workout plan (which happens once at the start), I can use &lt;code&gt;reasoning='low'&lt;/code&gt; or &lt;code&gt;medium'&lt;/code&gt; for deeper reasoning—you're not waiting mid-workout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Turn CoT Persistence&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pass previous_response_id for context retention&lt;/span&gt;
&lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;responseIdToUse&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;previous_response_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseIdToUse&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Save for next call&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastResponseId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives &lt;strong&gt;+4.3% accuracy improvement&lt;/strong&gt; (Tau-Bench verified) and &lt;strong&gt;30-50% CoT token reduction&lt;/strong&gt; across a workout session. The AI maintains reasoning context across multiple calls without re-explaining fundamentals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Choice&lt;/strong&gt;: GPT-5-mini (default) vs. GPT-5.1 (production)&lt;/p&gt;

&lt;p&gt;I use GPT-5-mini for development (faster, cheaper) and GPT-5.1 for production (better reasoning quality). Both support the full reasoning level spectrum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost consideration&lt;/strong&gt;: Each workout costs ~$0.08-0.15 in API calls with GPT-5-mini. For a serious lifter doing 4-5 workouts/week, that's ~$2-3/month—far less than a personal trainer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supabase (PostgreSQL + Auth + Realtime)
&lt;/h3&gt;

&lt;p&gt;I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authentication (Supabase Auth)&lt;/li&gt;
&lt;li&gt;Relational database for workout history (PostgreSQL)&lt;/li&gt;
&lt;li&gt;Row-level security for data privacy&lt;/li&gt;
&lt;li&gt;Realtime subscriptions (future feature: live workout sharing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Supabase gives me all of this with a great DX. The auto-generated TypeScript types from database schema are a game-changer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Auto-generated from Supabase schema&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Workout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Tables&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workouts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Row&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Exercise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Tables&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exercises&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Row&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Type-safe queries&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workouts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*, exercises(*)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Row-level security ensures users only access their own data—critical for a health/fitness app.&lt;/p&gt;

&lt;h3&gt;
  
  
  TypeScript + Zod Everywhere
&lt;/h3&gt;

&lt;p&gt;Runtime validation is essential when dealing with AI outputs. LLMs can hallucinate or return unexpected formats.&lt;/p&gt;

&lt;p&gt;Every agent response is validated with Zod schemas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ExerciseSuggestionSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;exerciseName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;sets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;reps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;targetMuscles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Validate AI response&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;suggestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ExerciseSuggestionSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;aiResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the AI returns invalid data, I catch it immediately rather than propagating bugs to the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Knowledge Engine: Parametric Training
&lt;/h2&gt;

&lt;p&gt;Here's where ARVO differs from "generic AI fitness app #427."&lt;/p&gt;

&lt;p&gt;I didn't want the AI to invent a training program. I wanted it to &lt;strong&gt;interpret existing, proven methodologies&lt;/strong&gt; with complete fidelity.&lt;/p&gt;

&lt;p&gt;So I built a &lt;strong&gt;parametric knowledge engine&lt;/strong&gt;—a structured representation of training methodologies that the AI can query and reason over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Kuba Method configuration&lt;/strong&gt; (362 lines of rules):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kubaMethodConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Kuba Method&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;accumulation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;intensityRange&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;// % of 1RM&lt;/span&gt;
      &lt;span class="na"&gt;volumeLandmarks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;bulk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;8-12&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;cut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;3-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;10-15&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;maintenance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;3-5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;8-12&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;exerciseSelectionRules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Prioritize compound movements&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Include 2-3 isolation exercises per muscle group&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Avoid same exercise within 4 days&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;progressionLogic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;When all sets meet top of rep range with 0-1 RIR&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Increase weight by 2.5-5kg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;intensification&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// ... similar structure&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;deload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// ... similar structure&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;injuryProtocol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;shoulderPain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Avoid overhead pressing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Substitute with neutral grip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;lowerBackPain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Reduce axial loading&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Focus on cable/machine work&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agents receive this configuration as context. When making decisions, they reference these rules and explain how they applied them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is not prompt engineering tricks&lt;/strong&gt;—it's structured domain knowledge that ensures methodology fidelity.&lt;/p&gt;

&lt;p&gt;I also implemented &lt;strong&gt;Mike Mentzer's HIT&lt;/strong&gt; with 532 lines of configuration (ultra-low volume, max intensity, advanced techniques). Same AI system, completely different training approach—because the knowledge engine is parametric.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hard Parts: What I Learned Building This
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Validation-Driven Retry System
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: AI outputs are unpredictable. Even with Zod validation, sometimes the AI suggests something that's technically valid but contextually wrong (e.g., "add 50kg to your next set" after you barely completed the previous one).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Built a retry mechanism with validation feedback loops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;completeWithRetry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;validationFn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;validationFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Retry with validation feedback&lt;/span&gt;
    &lt;span class="nx"&gt;userPrompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`\n\nPrevious attempt failed validation: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Max validation attempts exceeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When validation fails, I pass the &lt;strong&gt;specific failure reason&lt;/strong&gt; back to the AI for the next attempt. This dramatically improved suggestion quality—from ~75% valid to ~95%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Progressive timeout scaling&lt;/strong&gt;: Each retry gets 1.5x longer timeout (1.0x → 1.5x → 2.0x) to give the AI more thinking time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 2: State Persistence Across Crashes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: You're mid-workout, phone browser crashes (or you accidentally swipe away the tab). Losing that data is unacceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Dual-layer persistence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Layer 1: Optimistic localStorage (instant writes)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;saveWorkoutState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;WorkoutState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arvo:active-workout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Layer 2: Supabase sync (every 30 seconds + on completion)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;syncToDatabase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;WorkoutState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workouts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workoutId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;exercises&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exercises&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On reload, the app checks localStorage first, then syncs with Supabase. You can crash and recover seamlessly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 3: Sub-2s AI Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Waiting 5-10 seconds for a set suggestion between sets kills the flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: &lt;code&gt;reasoning='none'&lt;/code&gt; + optimistic UI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ProgressionCalculator uses reasoning='none'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// 🎯 Ultra-fast mode&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;verbosity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;concise&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Response in &amp;lt;2s&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;suggestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By using &lt;code&gt;reasoning='none'&lt;/code&gt;, I get &lt;strong&gt;&amp;lt;2s responses&lt;/strong&gt; even with GPT-5 models. The AI still provides quality suggestions, just without extended reasoning chains.&lt;/p&gt;

&lt;p&gt;For comparison, &lt;code&gt;reasoning='low'&lt;/code&gt; would take 5-8s for the same task—unacceptable when you're mid-workout.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 4: Mobile UX in the Gym
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: You're holding dumbbells. Your hands are sweaty. The screen keeps turning off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wake Lock API&lt;/strong&gt;: Keeps screen on during workouts
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;wakeLock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wakeLock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;screen&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;44px minimum touch targets&lt;/strong&gt;: All buttons are easily tappable with sweaty fingers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fullscreen mode&lt;/strong&gt;: Maximizes screen real estate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick actions&lt;/strong&gt;: "Equipment busy," "Too heavy," "Too light" shortcuts to adjust on the fly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't glamorous features, but they're critical for real-world usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 5: Handling AI Hallucinations Gracefully
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Sometimes the AI suggests nonsensical weights (e.g., "try 250kg for your first bench press set").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Multi-layer validation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Zod schema catches type errors&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;suggestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ExerciseSuggestionSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;aiResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Business logic validation&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;estimatedMax&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Suggested weight exceeds safe range&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// User override always available&lt;/span&gt;
&lt;span class="c1"&gt;// "This doesn't look right" → triggers re-generation with adjusted context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also log all AI suggestions to review patterns and improve prompts over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Reasoning levels are a game-changer for multi-agent systems&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not all tasks need deep reasoning&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning='none'&lt;/code&gt; for latency-critical tasks (&amp;lt;2s responses)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning='medium/high'&lt;/code&gt; for complex planning (acceptable 90-240s)&lt;/li&gt;
&lt;li&gt;Match reasoning effort to task requirements, not a one-size-fits-all approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Multi-turn CoT persistence compounds over sessions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;previous_response_id&lt;/code&gt; gives +4.3% accuracy and -30-50% tokens&lt;/li&gt;
&lt;li&gt;The AI learns patterns across a workout without re-explaining&lt;/li&gt;
&lt;li&gt;Critical for maintaining context in long-running agent sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Validation-driven retries &amp;gt; perfect prompts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Even great prompts fail ~25% of the time&lt;/li&gt;
&lt;li&gt;Feedback loops (validation → retry with feedback) → 95% success rate&lt;/li&gt;
&lt;li&gt;Progressive timeout scaling (1.0x → 1.5x → 2.0x) helps on retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. LLMs are great at reasoning, terrible at precision&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AI for "what exercise should I do and why?"&lt;/li&gt;
&lt;li&gt;Don't use AI for "calculate my 1RM" (use formulas)&lt;/li&gt;
&lt;li&gt;Responses API with structured outputs bridges this gap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Structured knowledge &amp;gt; prompt engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;My 362-line knowledge engine beats any "clever prompt"&lt;/li&gt;
&lt;li&gt;Domain expertise must be encoded, not implied&lt;/li&gt;
&lt;li&gt;Parametric configuration enables methodology fidelity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. Mobile web is underrated for fitness&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No app store approval&lt;/li&gt;
&lt;li&gt;Instant updates&lt;/li&gt;
&lt;li&gt;Cross-platform from day one&lt;/li&gt;
&lt;li&gt;PWA capabilities (Wake Lock, offline support) are production-ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;7. Users care about transparency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every AI decision includes reasoning&lt;/li&gt;
&lt;li&gt;Users often read the reasoning before following suggestions&lt;/li&gt;
&lt;li&gt;"Show your work" builds trust—even when the AI is wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;8. Type safety saves lives&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TypeScript + Zod caught hundreds of runtime errors&lt;/li&gt;
&lt;li&gt;AI outputs are unpredictable—validate everything&lt;/li&gt;
&lt;li&gt;Zod validation + business logic validation + user override = robust system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try ARVO &amp;amp; Let's Talk
&lt;/h2&gt;

&lt;p&gt;I've been using ARVO for my own training for 3 months. It's genuinely changed how I approach progressive overload—I'm lifting smarter, not just harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;: &lt;a href="https://arvo.guru" rel="noopener noreferrer"&gt;arvo.guru&lt;/a&gt; (free to start, no credit card)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Curious about the tech?&lt;/strong&gt; I'm happy to deep-dive on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Responses API implementation patterns&lt;/li&gt;
&lt;li&gt;Reasoning level optimization strategies&lt;/li&gt;
&lt;li&gt;Multi-agent orchestration with CoT persistence&lt;/li&gt;
&lt;li&gt;Validation-driven retry systems&lt;/li&gt;
&lt;li&gt;Knowledge engine design for parametric training&lt;/li&gt;
&lt;li&gt;Mobile-first React patterns for gym use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What would you want to know about the implementation?&lt;/strong&gt; Drop questions below—I'll answer everything.&lt;/p&gt;

&lt;p&gt;And if you've built AI-powered vertical tools, I'd love to hear about your architecture. What reasoning level strategies have worked for you? What challenges did you hit that I haven't mentioned?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Next.js 14, TypeScript, OpenAI Responses API (GPT-5-mini/GPT-5.1), Supabase, and way too much coffee. Currently powering 100+ workouts/week with 17+ specialized agents.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sun, 05 Oct 2025 20:18:42 +0000</pubDate>
      <link>https://dev.to/dpelleri/-3dlp</link>
      <guid>https://dev.to/dpelleri/-3dlp</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/dpelleri" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3437527%2F6e3d43c0-eb73-4757-87b2-873fdbdf4e12.jpg" alt="dpelleri"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/dpelleri/orchestro-trello-for-claude-code-with-a-built-in-scrum-master-1e3e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Orchestro: Trello for Claude Code — with a built-in Scrum Master&lt;/h2&gt;
      &lt;h3&gt;daniele pelleri ・ Oct 5&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#mcp&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#claudecode&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#opensource&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>mcp</category>
      <category>claudecode</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Orchestro: Trello for Claude Code — with a built-in Scrum Master</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sun, 05 Oct 2025 20:18:15 +0000</pubDate>
      <link>https://dev.to/dpelleri/orchestro-trello-for-claude-code-with-a-built-in-scrum-master-1e3e</link>
      <guid>https://dev.to/dpelleri/orchestro-trello-for-claude-code-with-a-built-in-scrum-master-1e3e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I rebuilt my workflow again (third AI-based project). &lt;strong&gt;Orchestro&lt;/strong&gt; is an open-source &lt;strong&gt;MCP server + web dashboard&lt;/strong&gt; for &lt;strong&gt;Claude Code&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Think &lt;strong&gt;Trello for Claude Code&lt;/strong&gt; — but with a &lt;strong&gt;(auto) Scrum Master&lt;/strong&gt; that keeps the board honest and &lt;strong&gt;agents that move the cards&lt;/strong&gt; from goal → tasks → code.&lt;br&gt;&lt;br&gt;
Looking for &lt;strong&gt;real users&lt;/strong&gt; (heavy Claude Code folks) to kick the tires.  &lt;/p&gt;

&lt;p&gt;• Website: &lt;strong&gt;&lt;a href="https://www.orchestro.org/" rel="noopener noreferrer"&gt;orchestro.org&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
• Repo: &lt;strong&gt;&lt;a href="https://github.com/khaoss85/mcp-orchestro" rel="noopener noreferrer"&gt;github.com/khaoss85/mcp-orchestro&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The itch
&lt;/h2&gt;

&lt;p&gt;Great agent UX, still the same frictions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intent and decisions &lt;strong&gt;buried in prompts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;dependencies &lt;strong&gt;invisible&lt;/strong&gt; until too late&lt;/li&gt;
&lt;li&gt;goal → tasks → code &lt;strong&gt;drops context&lt;/strong&gt; during vibe coding&lt;/li&gt;
&lt;li&gt;PMs and devs &lt;strong&gt;don’t see the same reality&lt;/strong&gt; in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a &lt;strong&gt;thin, no-drama layer&lt;/strong&gt; that keeps the plan visible and the execution honest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is (one line)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Orchestro = Trello for Claude Code.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Plan on a board. The &lt;strong&gt;MCP server executes the plan&lt;/strong&gt;. The &lt;strong&gt;(auto) Scrum Master&lt;/strong&gt; keeps flow tight. &lt;strong&gt;Agents move cards&lt;/strong&gt; as work happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it feels to use
&lt;/h2&gt;

&lt;p&gt;You write a user story.&lt;br&gt;&lt;br&gt;
The built-in Scrum Master &lt;strong&gt;decomposes&lt;/strong&gt; it into technical tasks, &lt;strong&gt;sets dependencies&lt;/strong&gt;, and &lt;strong&gt;guards the workflow&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Agents &lt;strong&gt;prepare context-rich prompts&lt;/strong&gt; for Claude Code, nudge the right tools, and &lt;strong&gt;move cards&lt;/strong&gt; across the board as things progress.&lt;br&gt;&lt;br&gt;
You and your PM both watch the same board update &lt;strong&gt;in real time&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Less prompt soup, more &lt;strong&gt;visible, auditable progress&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get after install
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A live &lt;strong&gt;Kanban&lt;/strong&gt; that actually mirrors what Claude is doing
&lt;/li&gt;
&lt;li&gt;~&lt;strong&gt;60 tools&lt;/strong&gt; available inside Claude Code (ask: “Show me orchestro tools”)
&lt;/li&gt;
&lt;li&gt;A clean &lt;strong&gt;goal → tasks → deps → code&lt;/strong&gt; path you can point stakeholders to&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick start (one command)
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx @orchestro/init
npm run dashboard    # http://localhost:3000
(restart Claude Code, then ask: "Show me orchestro tools")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That’s it. You’ll see the tools in Claude, and a live board in the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;heavy Claude Code users&lt;/strong&gt; who want fewer invisible steps&lt;/li&gt;
&lt;li&gt;builders doing &lt;strong&gt;vibe coding&lt;/strong&gt; but needing a clean map&lt;/li&gt;
&lt;li&gt;teams that want the &lt;strong&gt;PM and Dev view&lt;/strong&gt; to finally be the same thing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Local-first &amp;amp; trust
&lt;/h2&gt;

&lt;p&gt;Your data lives in &lt;strong&gt;your Supabase&lt;/strong&gt;. No hardcoded secrets. Full history if you need to audit or roll back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why open-source (and my first MCP)
&lt;/h2&gt;

&lt;p&gt;I shipped my &lt;strong&gt;first MCP&lt;/strong&gt; here because I want &lt;strong&gt;real usage&lt;/strong&gt;, not another demo.&lt;br&gt;&lt;br&gt;
If you live in Claude Code daily, your feedback will shape the next iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kick the tires
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo &amp;amp; docs:&lt;/strong&gt; github.com/khaoss85/mcp-orchestro&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Website:&lt;/strong&gt; orchestro.org&lt;/p&gt;

&lt;p&gt;If it helps, &lt;strong&gt;drop a star&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
If it hurts, &lt;strong&gt;open an issue&lt;/strong&gt; and tell me where.&lt;br&gt;&lt;br&gt;
PRs and brutal feedback welcome.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>claudecode</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Open-Sourced My Multi-Agent Orchestration Framework (94% Lower API Costs)</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Wed, 03 Sep 2025 19:20:01 +0000</pubDate>
      <link>https://dev.to/dpelleri/i-open-sourced-my-multi-agent-orchestration-framework-94-lower-api-costs-9ld</link>
      <guid>https://dev.to/dpelleri/i-open-sourced-my-multi-agent-orchestration-framework-94-lower-api-costs-9ld</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: 5 AI Agents = Complete Chaos
&lt;/h2&gt;

&lt;p&gt;Ever tried running multiple AI agents together? Here's what happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent A analyzes data&lt;/li&gt;
&lt;li&gt;Agent B rewrites everything from scratch (doesn't know what A found)&lt;/li&gt;
&lt;li&gt;Agent C duplicates A's work&lt;/li&gt;
&lt;li&gt;You become a human copy-paste machine between ChatGPT windows&lt;/li&gt;
&lt;li&gt;Your API bill explodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I burned through $3,000 learning this the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: AI Team Orchestrator
&lt;/h2&gt;

&lt;p&gt;I built a framework that orchestrates AI agents like a real company:&lt;/p&gt;

&lt;p&gt;🎬 &lt;strong&gt;&lt;a href="https://app.arcade.software/share/yDwfYQeRzfvghcxhpxfO" rel="noopener noreferrer"&gt;Watch 2-min Demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Your goal: "Increase Instagram engagement by 40%"&lt;/p&gt;

&lt;p&gt;What happens behind the scenes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Director Agent analyzes and assembles team&lt;/li&gt;
&lt;li&gt;Marketing Strategist creates strategy&lt;/li&gt;
&lt;li&gt;Content Creator receives strategy context (no duplication!)&lt;/li&gt;
&lt;li&gt;Data Analyst tracks metrics&lt;/li&gt;
&lt;li&gt;All agents share workspace memory&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Key Architecture Decisions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Conditional Quality Gates (94% cost savings)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of checking everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend-only changes: skip backend validators (saves $0.23 per check)&lt;/li&gt;
&lt;li&gt;Database changes: trigger all validators (full validation when needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Agent Handoffs with Context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents pass context like Slack messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From: ResearchAgent&lt;/li&gt;
&lt;li&gt;To: StrategyAgent&lt;/li&gt;
&lt;li&gt;Context: "Found 3 key competitor patterns"&lt;/li&gt;
&lt;li&gt;Artifacts: ["analysis.json", "data.csv"]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Workspace Memory (No repeated work)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Semantic memory prevents re-doing tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If similar task found: use previous approach&lt;/li&gt;
&lt;li&gt;If new task: execute and learn&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real Production Metrics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Costs&lt;/td&gt;
&lt;td&gt;$240/month&lt;/td&gt;
&lt;td&gt;$3/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Recovery&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;&amp;lt;60s autonomous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Retention&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup Time&lt;/td&gt;
&lt;td&gt;2 days&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Rate&lt;/td&gt;
&lt;td&gt;23%&lt;/td&gt;
&lt;td&gt;1.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;2.3/sec&lt;/td&gt;
&lt;td&gt;8.7/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: FastAPI + OpenAI Agents SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js 15 + TypeScript
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: Supabase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: Blackboard pattern with Pydantic contracts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/khaoss85/AI-Team-Orchestrator" rel="noopener noreferrer"&gt;https://github.com/khaoss85/AI-Team-Orchestrator&lt;/a&gt;&lt;br&gt;
cd ai-team-orchestrator&lt;br&gt;
./scripts/quick-setup.sh&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Need From The Community
&lt;/h2&gt;

&lt;p&gt;This isn't a finished product - it's a starting point. Looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test it&lt;/strong&gt; with your use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report&lt;/strong&gt; what breaks (it will break)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggest&lt;/strong&gt; improvements based on real needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt; if you want to&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The roadmap is completely open. Your use case = our next feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned (The Hard Way)
&lt;/h2&gt;

&lt;p&gt;Documented everything in a &lt;a href="https://books.danielepelleri.com" rel="noopener noreferrer"&gt;62,000-word guide&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why agents create infinite loops (5,000 tasks in 20 minutes!)&lt;/li&gt;
&lt;li&gt;Race conditions with parallel agents&lt;/li&gt;
&lt;li&gt;Why agents don't use tools even when available&lt;/li&gt;
&lt;li&gt;The $40 CI test that forced us to build mock providers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: The Infinite Loop Problem
&lt;/h3&gt;

&lt;p&gt;What went wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent decomposes task&lt;/li&gt;
&lt;li&gt;Each subtask gets decomposed again&lt;/li&gt;
&lt;li&gt;No depth limit = infinite recursion&lt;/li&gt;
&lt;li&gt;5,000 tasks created in 20 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hard depth limit (MAX_DEPTH = 5)&lt;/li&gt;
&lt;li&gt;AI decides if task is atomic&lt;/li&gt;
&lt;li&gt;Anti-loop counter at workspace level&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;The system uses a multi-layer architecture:&lt;/p&gt;

&lt;p&gt;Layer 1: Input Processing&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User Input → Goal Engine → Task Planner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer 2: Execution&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent Team → Task Executor → Deliverable Generator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer 3: Optimization&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory &amp;amp; Learning → Quality Assurance → Improvement Loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer feeds back into the system, creating continuous improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-Time Thinking Process (Claude/o3 Style)
&lt;/h2&gt;

&lt;p&gt;You can watch agents think in real-time:&lt;/p&gt;

&lt;p&gt;[THINKING] Breaking down objective into sub-goals&lt;br&gt;
[ANALYZING] Identifying required specialist skills&lt;br&gt;
[MEMORY CHECK] Found 3 similar patterns from workspace #42&lt;br&gt;
[DECISION] Assembling team of 4 specialists...&lt;br&gt;
[HANDOFF] Marketing strategy completed&lt;br&gt;
[CONTEXT PASSED] 3 key insights from research&lt;br&gt;
[CONFIDENCE] 92%&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Limitations
&lt;/h2&gt;

&lt;p&gt;Being transparent about what needs work:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;What works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic multi-agent orchestration&lt;/li&gt;
&lt;li&gt;Memory system and context retention&lt;/li&gt;
&lt;li&gt;Cost optimization through quality gates&lt;/li&gt;
&lt;li&gt;Handoff mechanism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚧 &lt;strong&gt;What needs improvement:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error recovery patterns&lt;/li&gt;
&lt;li&gt;Performance with 10+ agents&lt;/li&gt;
&lt;li&gt;Better debugging tools&lt;/li&gt;
&lt;li&gt;More sophisticated memory retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Join The Discussion
&lt;/h2&gt;

&lt;p&gt;What's your biggest multi-agent orchestration challenge? Let's solve it together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;a href="https://github.com/khaoss85/AI-Team-Orchestrator" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;a href="https://books.danielepelleri.com" rel="noopener noreferrer"&gt;Implementation Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎬 &lt;a href="https://app.arcade.software/share/yDwfYQeRzfvghcxhpxfO" rel="noopener noreferrer"&gt;Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;*If this helped you save on API costs or solve orchestration problems, consider starring the repo!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>openai</category>
    </item>
    <item>
      <title>Stop Burning Money on AI Tests: Build a Smart Mock System in 15 Minutes</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Wed, 20 Aug 2025 17:53:19 +0000</pubDate>
      <link>https://dev.to/dpelleri/stop-burning-money-on-ai-tests-build-a-smart-mock-system-in-15-minutes-4c21</link>
      <guid>https://dev.to/dpelleri/stop-burning-money-on-ai-tests-build-a-smart-mock-system-in-15-minutes-4c21</guid>
      <description>&lt;p&gt;&lt;em&gt;I burned $3K testing AI agents before building this. Now my CI runs 200+ tests for $0. Here's the exact setup that saved my budget.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Testing AI systems is expensive. &lt;strong&gt;Really expensive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every test run with real API calls costs money. My GitHub Actions were burning $40+ per push. Monthly bill hit $1,200 just for testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sound familiar?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Smart AI Mocking
&lt;/h2&gt;

&lt;p&gt;Instead of avoiding tests (bad) or burning money (worse), build an intelligent mock system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Runs unlimited tests for $0&lt;/li&gt;
&lt;li&gt;✅ Provides deterministic responses
&lt;/li&gt;
&lt;li&gt;✅ Switches seamlessly between mock/real&lt;/li&gt;
&lt;li&gt;✅ Takes 15 minutes to implement&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Create the AI Provider Interface (2 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ai_provider.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;abc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;abstractmethod&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Build the Mock Provider (5 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mock_provider.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_provider&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIProvider&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AIProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;# Priority calculation
&lt;/span&gt;            &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;priority.*score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 750}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

            &lt;span class="c1"&gt;# Task decomposition
&lt;/span&gt;            &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;decompose.*task&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [
                {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;},
                {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}
            ]}&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

            &lt;span class="c1"&gt;# Team composition
&lt;/span&gt;            &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team.*composition&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [
                {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;},
                {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sarah&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Designer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}
            ]}&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

            &lt;span class="c1"&gt;# Default response
&lt;/span&gt;            &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mock response for testing purposes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;prompt_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_lower&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_patterns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Real Provider Implementation (3 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# openai_provider.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_provider&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIProvider&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AIProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add schema instruction to prompt
&lt;/span&gt;        &lt;span class="n"&gt;schema_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Respond with valid JSON matching this schema: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Smart Factory Pattern (3 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ai_factory.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mock_provider&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MockAIProvider&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai_provider&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIProvider&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIFactory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_provider&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Never spend money in CI
&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY required for production&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Usage in your code
&lt;/span&gt;&lt;span class="n"&gt;ai_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AIFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_provider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the priority of this task?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Test Configuration (2 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_ai_agents.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autouse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_test_environment&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_task_prioritization&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_factory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIFactory&lt;/span&gt;

    &lt;span class="n"&gt;ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AIFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_provider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculate priority score for this task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;750&lt;/span&gt;  &lt;span class="c1"&gt;# Deterministic!
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_team_composition&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_factory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIFactory&lt;/span&gt;

    &lt;span class="n"&gt;ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AIFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_provider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Compose a team for this project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;array&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before this setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💸 $40 per CI run&lt;/li&gt;
&lt;li&gt;🐌 3-5 minutes per test suite
&lt;/li&gt;
&lt;li&gt;🎲 Flaky, non-deterministic tests&lt;/li&gt;
&lt;li&gt;😰 Scared to run tests frequently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After this setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💰 $0 for unlimited test runs&lt;/li&gt;
&lt;li&gt;⚡ 30 seconds per test suite&lt;/li&gt;
&lt;li&gt;🎯 Deterministic, reliable tests
&lt;/li&gt;
&lt;li&gt;😎 Test-driven development restored&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Production Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In production
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Uses real OpenAI
&lt;/span&gt;
&lt;span class="c1"&gt;# In CI/CD  
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Uses mocks
&lt;/span&gt;
&lt;span class="c1"&gt;# In development
# No env vars = uses real API for manual testing
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced: Smart Response Evolution
&lt;/h2&gt;

&lt;p&gt;Make your mocks smarter over time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SmartMockProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Log what real responses look like
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;export_real_responses&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use this to improve mocks based on real API responses&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;Clone this pattern for your AI tests. &lt;strong&gt;It takes 15 minutes and saves hundreds of dollars.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Questions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's your current testing budget for AI systems?&lt;/li&gt;
&lt;li&gt;Have you tried other mocking approaches? How did they work?&lt;/li&gt;
&lt;li&gt;What response patterns would you add to the mock provider?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Drop your own cost-saving testing patterns below!&lt;/strong&gt; 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want more AI engineering patterns? I've documented 42+ lessons building production AI systems - including the $3K mistake that taught me this lesson.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>testing</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>5 AI Agent Patterns That Will Save Your Sanity</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Mon, 18 Aug 2025 11:10:11 +0000</pubDate>
      <link>https://dev.to/dpelleri/5-ai-agent-patterns-that-will-save-your-sanity-2bk9</link>
      <guid>https://dev.to/dpelleri/5-ai-agent-patterns-that-will-save-your-sanity-2bk9</guid>
      <description>&lt;p&gt;&lt;em&gt;Building AI agents? These patterns took me 6 months and $3K in mistakes to learn. Copy-paste them now and thank me later.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  1. 🚧 The Constraint Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: AI agents over-optimize without limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create the perfect solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Result: Agent creates 10-person team for simple task
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Create a solution with NON-NEGOTIABLE constraints:
- Budget: MAX $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
- Timeline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; days
- Team size: 2-4 people
- If constraints violated, proposal = REJECTED
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: LLMs need explicit boundaries or they'll "optimize" into absurdity.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. 🔒 The Atomic Lock Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Multiple agents grab the same task → chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_pending_task&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;start_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Race condition!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Atomic task claiming
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Won the race - proceed
&lt;/span&gt;    &lt;span class="nf"&gt;start_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Someone else got it - find another task
&lt;/span&gt;    &lt;span class="nf"&gt;find_next_task&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Database-level atomicity prevents dual assignment.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. 💰 The Mock Sandwich Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Testing AI systems burns through API budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# $$$
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;real_openai_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 750}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deterministic test response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: 95% cost reduction, 10x faster tests, deterministic results.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. ⛔ The Circuit Breaker Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: AI agents create infinite loops of sub-tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_subtask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;subtask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decompose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;create_subtask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subtask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Infinite recursion!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_subtask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_DEPTH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;MaxDepthError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task delegation too deep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks_last_hour&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;RATE_LIMIT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pause&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cooldown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Too many tasks created&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decompose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Prevents runaway automation with depth limits and rate limiting.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. ⚖️ The Hybrid Decision Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: AI prioritization has hidden biases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Black box bias
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Objective factors (measurable)
&lt;/span&gt;    &lt;span class="n"&gt;base_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_dependencies&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;age_days&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;business_impact_score&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# AI enhancement (subjective)
&lt;/span&gt;    &lt;span class="n"&gt;ai_modifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assess_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ai_modifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: AI handles creativity, deterministic rules handle critical logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Bonus: The Everything Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Combine all patterns&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductionAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Pattern 1: Constraints
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate_constraints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="c1"&gt;# Pattern 2: Atomic lock
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_next_task&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Pattern 3: Mock in testing
&lt;/span&gt;        &lt;span class="n"&gt;ai_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pattern 4: Circuit breakers
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;should_create_subtask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_subtask_safely&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pattern 5: Hybrid decisions
&lt;/span&gt;        &lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_hybrid_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  💡 Implementation Tips
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with Pattern #3&lt;/strong&gt; (Mock Sandwich) - it'll save you money immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern #1&lt;/strong&gt; (Constraints) is the easiest win - just add budget/time limits to your prompts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern #2&lt;/strong&gt; (Atomic Lock) is critical if you have &amp;gt;1 agent - implement early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Patterns #4 &amp;amp; #5&lt;/strong&gt; become essential as your system grows beyond MVP.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;Which pattern are you implementing first? &lt;/p&gt;

&lt;p&gt;And what other AI agent patterns have you discovered the hard way? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drop your own "sanity-saving" patterns in the comments&lt;/strong&gt; - let's build a community knowledge base! 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>OpenAI SDK vs Direct API Calls: What 6 Months of Building AI Agents Taught Me</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sun, 17 Aug 2025 13:41:50 +0000</pubDate>
      <link>https://dev.to/dpelleri/openai-sdk-vs-direct-api-calls-what-6-months-of-building-ai-agents-taught-me-15bd</link>
      <guid>https://dev.to/dpelleri/openai-sdk-vs-direct-api-calls-what-6-months-of-building-ai-agents-taught-me-15bd</guid>
      <description>&lt;p&gt;&lt;em&gt;When you're building your first AI system, you face this choice: use the official SDK or roll your own HTTP calls? I chose wrong, then right, then learned why this decision matters more than you think.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Six months ago, I started building a multi-agent AI system. The first architectural decision? &lt;strong&gt;How to talk to OpenAI's API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The "obvious" choice seemed to be direct HTTP calls with &lt;code&gt;requests&lt;/code&gt;. Simple, fast, no dependencies. &lt;strong&gt;I was wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what I learned building a production system that handles thousands of agent interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tempting Path: Direct API Calls
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why it feels right:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks clean, right? &lt;strong&gt;This approach will bite you.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks First (The Pain Points)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Error Handling Hell
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What you think you need
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# What you actually need
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Rate limit
&lt;/span&gt;        &lt;span class="n"&gt;wait_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retry-after&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;call_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Recursive retry
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Server error
&lt;/span&gt;        &lt;span class="c1"&gt;# Exponential backoff logic
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Bad request
&lt;/span&gt;        &lt;span class="c1"&gt;# Parse error details
&lt;/span&gt;    &lt;span class="c1"&gt;# ... 10 more status codes
&lt;/span&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;ConnectionError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Network issues
&lt;/span&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Timeout handling
# ... and so on
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Context Management Nightmare
&lt;/h3&gt;

&lt;p&gt;Direct calls = stateless. But AI conversations need memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# You end up with this mess
&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# Repeat for every agent, every conversation
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Tool Integration Chaos
&lt;/h3&gt;

&lt;p&gt;Want function calling? Prepare for JSON schema hell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Just for ONE tool
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the web for information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiply this by 10+ tools across multiple agents. &lt;strong&gt;Maintenance nightmare.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The SDK Solution
&lt;/h2&gt;

&lt;p&gt;After 3 months of fighting custom HTTP code, I switched to OpenAI's Agents SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Agent with tools and memory - one line
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ResearchAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a research specialist...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_analysis_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Conversation with automatic context management
&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research AI trends&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Performance Comparison
&lt;/h2&gt;

&lt;p&gt;After 6 months running both approaches in production:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Direct API&lt;/th&gt;
&lt;th&gt;SDK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2,847&lt;/td&gt;
&lt;td&gt;342&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12.3%&lt;/td&gt;
&lt;td&gt;1.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Development Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 months&lt;/td&gt;
&lt;td&gt;2 weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance Hours/Week&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-12&lt;/td&gt;
&lt;td&gt;1-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature Velocity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The SDK Wins: Why?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Error Handling Built-In&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic retries with exponential backoff&lt;/li&gt;
&lt;li&gt;Rate limit handling&lt;/li&gt;
&lt;li&gt;Graceful degradation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Context Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Threads handle conversation memory&lt;/li&gt;
&lt;li&gt;Automatic message persistence&lt;/li&gt;
&lt;li&gt;Session management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Tool Integration&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Function decorators → automatic schema generation&lt;/li&gt;
&lt;li&gt;Built-in tool execution&lt;/li&gt;
&lt;li&gt;Error isolation per tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Future-Proof&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New API features → automatic SDK updates&lt;/li&gt;
&lt;li&gt;Backward compatibility&lt;/li&gt;
&lt;li&gt;Performance optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Direct API Still Makes Sense
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use direct calls when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple, one-off requests&lt;/li&gt;
&lt;li&gt;Custom authentication flows&lt;/li&gt;
&lt;li&gt;Extreme performance requirements&lt;/li&gt;
&lt;li&gt;SDK doesn't support your use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use SDK when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building conversational agents&lt;/li&gt;
&lt;li&gt;Need tool/function calling&lt;/li&gt;
&lt;li&gt;Multiple agents coordination&lt;/li&gt;
&lt;li&gt;Production systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Real Cost
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Direct API approach cost me:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 months of development time&lt;/li&gt;
&lt;li&gt;Constant bug fixes&lt;/li&gt;
&lt;li&gt;Missed features (couldn't implement advanced flows)&lt;/li&gt;
&lt;li&gt;Team frustration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDK approach gave me:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 weeks to production&lt;/li&gt;
&lt;li&gt;Focus on business logic, not plumbing&lt;/li&gt;
&lt;li&gt;Easy feature additions&lt;/li&gt;
&lt;li&gt;Happier developers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Recommendation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with the SDK.&lt;/strong&gt; Even if you think you need direct control.&lt;/p&gt;

&lt;p&gt;The time you "save" with direct HTTP calls gets consumed 10x over in error handling, context management, and maintenance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Only go direct if you have a specific, justified reason.&lt;/strong&gt; And even then, build an abstraction layer so you can switch later.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Your Experience?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Are you using direct API calls or SDKs for AI integrations?&lt;/li&gt;
&lt;li&gt;What pain points have you hit?&lt;/li&gt;
&lt;li&gt;Have you made the switch from one approach to another?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm curious about edge cases where direct calls are still the better choice. &lt;strong&gt;What am I missing?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
    </item>
    <item>
      <title>5 Brutal Lessons from Building a Multi-Agent AI System (And How to Avoid My Epic Fails)</title>
      <dc:creator>daniele pelleri</dc:creator>
      <pubDate>Sat, 16 Aug 2025 17:10:05 +0000</pubDate>
      <link>https://dev.to/dpelleri/5-brutal-lessons-from-building-a-multi-agent-ai-system-and-how-to-avoid-my-epic-fails-35aa</link>
      <guid>https://dev.to/dpelleri/5-brutal-lessons-from-building-a-multi-agent-ai-system-and-how-to-avoid-my-epic-fails-35aa</guid>
      <description>&lt;p&gt;&lt;em&gt;What happens when you go from "hello world" AI to orchestrating an entire team of agents that need to collaborate without destroying each other? Spoiler: everything that can go wrong, will go wrong.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;After 6 months of development and &lt;strong&gt;$3,000 burned in API calls&lt;/strong&gt;, I learned some brutal lessons building an AI orchestration system. This isn't your typical polished tutorial—these are the &lt;strong&gt;real epic fails&lt;/strong&gt; nobody tells you about in those shiny conference presentations.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔥 Lesson #1: "The Agent That Wanted to Hire Everyone"
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Fail&lt;/strong&gt;: My Director AI, tasked with composing teams for projects, consistently created teams of 8+ people to write a single email. Estimated budget: $25,000 for 5 lines of text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: LLMs, when unconstrained, tend to "over-optimize." Without explicit limits, my agent interpreted "maximum quality" as "massive team."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before (disaster)
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create the perfect team for this project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# After (reality)
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Create a team for this project.
NON-NEGOTIABLE CONSTRAINTS:
- Max budget: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; USD
- Team size: 3-5 people MAX
- If you exceed budget, proposal will be automatically rejected
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: AI agents without explicit constraints are like teenagers with unlimited credit cards.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡ Lesson #2: Race Conditions Are Hell
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Fail&lt;/strong&gt;: Two agents grabbed the same task simultaneously, duplicating work and crashing the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;WARNING: Agent A started task &lt;span class="s1"&gt;'123'&lt;/span&gt;, but Agent B had already started it 50ms earlier.
ERROR: Duplicate entry &lt;span class="k"&gt;for &lt;/span&gt;key &lt;span class="s1"&gt;'PRIMARY'&lt;/span&gt; on table &lt;span class="s1"&gt;'goal_progress_logs'&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: "Implicit" coordination through shared database state isn't enough. In distributed systems, 50ms latency = total chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;: Application-level pessimistic locking&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Atomic task acquisition
&lt;/span&gt;&lt;span class="n"&gt;update_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Won the race - proceed
&lt;/span&gt;    &lt;span class="nf"&gt;execute_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Another agent was faster - find another task
&lt;/span&gt;    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; taken by another agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: In multi-agent systems, "probably works" = "definitely breaks."&lt;/p&gt;

&lt;h2&gt;
  
  
  💸 Lesson #3: $40 Burned in 20 Minutes of CI Tests
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Fail&lt;/strong&gt;: My integration tests made real calls to GPT-4. Every GitHub push = $40 in API calls. Daily budget burned before breakfast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Testing AI systems without mocks is like load-testing with a live credit card.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;: AI Abstraction Layer with intelligent mocks&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Deterministic responses for testing
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 750}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mock response for testing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Environment-based switching
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TESTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ai_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MockAIProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ai_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Test costs down 95%, speed up 10x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: An AI system that can't be tested cheaply is a system that can't be developed.&lt;/p&gt;

&lt;h2&gt;
  
  
  🌀 Lesson #4: The Infinite Loop That Never Ends
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Fail&lt;/strong&gt;: An "intelligent" agent started creating sub-tasks of sub-tasks of sub-tasks. After 20 minutes: 5,000+ pending tasks, system completely frozen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO: Agent A created Task B
INFO: Agent B created Task C  
INFO: Agent C created Task D
... [continues for 5,000 lines]
ERROR: Workspace has 5,000+ pending tasks. Halting operations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Autonomy without limits = autopoietic chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;: Anti-loop safeguards&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Task delegation depth limit
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delegation_depth&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_DEPTH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;DelegationDepthExceeded&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Workspace task rate limiting  
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks_created_last_hour&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;RATE_LIMIT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pause_for_cooldown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: Autonomous agents need "circuit breakers" more than any other system.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎭 Lesson #5: AI Has Its Own Bias (Not the Ones You Think)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Fail&lt;/strong&gt;: My AI-driven prioritization system systematically preferred tasks that "sounded more important" vs tasks that were actually business-critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: LLMs optimize for "sounding right" not "being right." Bias toward pompous corporate language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;: Objective metrics + AI reasoning&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Objective factors (non-negotiable)
&lt;/span&gt;    &lt;span class="n"&gt;base_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_dependencies_count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;age_days&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;business_impact_score&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# AI enhancement (subjective)
&lt;/span&gt;    &lt;span class="n"&gt;ai_modifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_ai_priority_assessment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ai_modifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Cap at 1000
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: AI for creativity, deterministic rules for critical decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 What's Next?
&lt;/h2&gt;

&lt;p&gt;These are just 5 of the 42+ lessons I documented building this system. Each fail led to architectural patterns I now use systematically.&lt;/p&gt;

&lt;p&gt;The journey from "single agent demo" to "production orchestration system" taught me that &lt;strong&gt;the real engineering isn't in the AI—it's in everything around it&lt;/strong&gt;: coordination, memory, error handling, cost management, and quality gates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question for the community&lt;/strong&gt;: What's been your most epic fail working with AI/agents? How did you solve it?&lt;/p&gt;

&lt;p&gt;If anyone's facing similar challenges in AI orchestration, happy to dive deeper into the technical details. This rabbit hole goes &lt;strong&gt;deep&lt;/strong&gt;!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>multiagent</category>
      <category>watercooler</category>
    </item>
  </channel>
</rss>
