<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: angu10</title>
    <description>The latest articles on DEV Community by angu10 (@angu10).</description>
    <link>https://dev.to/angu10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1044663%2F6f0294a9-2404-41a6-b770-c2062d20a809.png</url>
      <title>DEV Community: angu10</title>
      <link>https://dev.to/angu10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/angu10"/>
    <language>en</language>
    <item>
      <title>Building CloudGuide: A Real-Time AI Assistant for Navigating Google Cloud</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 13 Mar 2026 05:34:40 +0000</pubDate>
      <link>https://dev.to/angu10/building-cloudguide-a-real-time-ai-assistant-for-navigating-google-cloud-m8b</link>
      <guid>https://dev.to/angu10/building-cloudguide-a-real-time-ai-assistant-for-navigating-google-cloud-m8b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Cloud platforms are incredibly powerful — but navigating them can be confusing even for experienced developers.&lt;/p&gt;

&lt;p&gt;Recently, while setting up a static website on Google Cloud Storage, I realized how easy it is to make small mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A permission checkbox hidden deep in the UI&lt;/li&gt;
&lt;li&gt;A configuration buried under another menu&lt;/li&gt;
&lt;li&gt;A setting that appears correct but fails silently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many cases, developers spend more time &lt;strong&gt;searching the console UI&lt;/strong&gt; than actually building their application.&lt;/p&gt;

&lt;p&gt;That led to a question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if an AI assistant could watch your screen, listen to your question, and guide you step‑by‑step through cloud configuration?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That idea became &lt;strong&gt;CloudGuide&lt;/strong&gt;, a multimodal AI agent built with &lt;strong&gt;Google AI models and Google Cloud&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project was developed specifically for the &lt;strong&gt;Google AI Hackathon&lt;/strong&gt;, and this post explains how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;CloudGuide is a &lt;strong&gt;voice-enabled AI assistant that helps users configure cloud resources in real time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of reading documentation or watching tutorials, users can simply say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Help me deploy a website on Google Cloud.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI agent then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Watches the user’s screen&lt;/li&gt;
&lt;li&gt;Listens to voice input&lt;/li&gt;
&lt;li&gt;Speaks instructions&lt;/li&gt;
&lt;li&gt;Highlights UI elements to click&lt;/li&gt;
&lt;li&gt;Verifies steps using real Google Cloud APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to turn complex cloud configuration into a &lt;strong&gt;guided interactive experience&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Key Capabilities
&lt;/h2&gt;

&lt;p&gt;CloudGuide combines several multimodal capabilities:&lt;/p&gt;
&lt;h3&gt;
  
  
  Screen Understanding
&lt;/h3&gt;

&lt;p&gt;The system captures periodic screenshots of the user's screen and sends them to the AI model for interpretation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Voice Interaction
&lt;/h3&gt;

&lt;p&gt;Users communicate naturally through a microphone, asking questions or requesting help.&lt;/p&gt;
&lt;h3&gt;
  
  
  Real-Time Voice Responses
&lt;/h3&gt;

&lt;p&gt;The AI responds with &lt;strong&gt;native audio output&lt;/strong&gt; using Gemini’s audio model.&lt;/p&gt;
&lt;h3&gt;
  
  
  API Grounding
&lt;/h3&gt;

&lt;p&gt;Instead of trusting screenshots alone, the system verifies actions using the &lt;strong&gt;Google Cloud Storage API&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Visual Click Guidance
&lt;/h3&gt;

&lt;p&gt;The system highlights the exact UI element users need to click.&lt;/p&gt;

&lt;p&gt;This dramatically reduces confusion when navigating complex cloud interfaces.&lt;/p&gt;


&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;The system consists of three primary layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Client&lt;/li&gt;
&lt;li&gt;Backend&lt;/li&gt;
&lt;li&gt;AI Model&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Client (User Machine)
&lt;/h3&gt;

&lt;p&gt;The client runs locally and handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Screen capture&lt;/li&gt;
&lt;li&gt;Microphone input&lt;/li&gt;
&lt;li&gt;Speaker output&lt;/li&gt;
&lt;li&gt;Browser highlighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technologies used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;mss&lt;/strong&gt; – screen capture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pyaudio&lt;/strong&gt; – audio streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright + Chrome DevTools Protocol&lt;/strong&gt; – UI highlighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The client streams screenshots and audio to the backend via WebSocket.&lt;/p&gt;
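&lt;p&gt;To make the streaming concrete, here is a minimal sketch of how a client could frame each screenshot and audio chunk as a WebSocket text message. The field names (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;mime_type&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;) are assumptions for illustration, not CloudGuide's actual wire protocol; in the real client the bytes would come from mss and pyaudio.&lt;/p&gt;

```python
import base64
import json

def build_frame_message(jpeg_bytes: bytes) -> str:
    # Wrap one captured screenshot as a JSON text frame for the WebSocket.
    # In the real client, jpeg_bytes would come from an mss screen grab.
    return json.dumps({
        "type": "screen",          # hypothetical field names for illustration
        "mime_type": "image/jpeg",
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    })

def build_audio_message(pcm_chunk: bytes) -> str:
    # Same framing for a raw PCM chunk read from the pyaudio input stream.
    return json.dumps({
        "type": "audio",
        "mime_type": "audio/pcm",
        "data": base64.b64encode(pcm_chunk).decode("ascii"),
    })
```

&lt;p&gt;Base64 keeps binary payloads safe inside JSON text frames, at the cost of roughly 33% size overhead.&lt;/p&gt;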


&lt;h3&gt;
  
  
  Backend (Google Cloud Run)
&lt;/h3&gt;

&lt;p&gt;The backend is deployed on &lt;strong&gt;Google Cloud Run&lt;/strong&gt; and built with &lt;strong&gt;FastAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing WebSocket connections&lt;/li&gt;
&lt;li&gt;Streaming data to the AI model&lt;/li&gt;
&lt;li&gt;Executing tool calls&lt;/li&gt;
&lt;li&gt;Verifying cloud configuration through APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All multimodal input flows through a request queue before being sent to the AI model.&lt;/p&gt;
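&lt;p&gt;A queue like this can be sketched with plain &lt;code&gt;asyncio&lt;/code&gt;: producers (the WebSocket handlers) put audio and screenshot items on a shared queue, and a single consumer forwards them to the model session in arrival order. The item shapes and the &lt;code&gt;None&lt;/code&gt; sentinel are illustrative assumptions, not the actual backend code.&lt;/p&gt;

```python
import asyncio

async def forward_requests(queue, send):
    # Drain the shared request queue in arrival order and forward each
    # multimodal item (audio chunk or screenshot) to the model session.
    while True:
        item = await queue.get()
        if item is None:  # sentinel: the client disconnected
            break
        await send(item)

async def demo():
    queue = asyncio.Queue()
    sent = []

    async def send(item):  # stand-in for the real model-session send call
        sent.append(item)

    consumer = asyncio.create_task(forward_requests(queue, send))
    for item in ({"type": "audio"}, {"type": "screen"}, None):
        await queue.put(item)
    await consumer
    return sent
```

&lt;p&gt;Serializing everything through one consumer keeps the interleaving of audio and frames deterministic, which matters when the model treats them as a single conversation stream.&lt;/p&gt;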


&lt;h3&gt;
  
  
  AI Model (Gemini Live)
&lt;/h3&gt;

&lt;p&gt;CloudGuide uses the &lt;strong&gt;Gemini Live API&lt;/strong&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemini-2.5-flash-native-audio-latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time audio streaming&lt;/li&gt;
&lt;li&gt;Image understanding&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;Natural voice output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model processes &lt;strong&gt;audio and screenshots together&lt;/strong&gt; within a single streaming session.&lt;/p&gt;

&lt;p&gt;This creates a natural conversational experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tool Calling and API Grounding
&lt;/h2&gt;

&lt;p&gt;One challenge with vision-based AI systems is that screenshots can be misleading.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A UI might show a resource as created even though the underlying API operation failed.&lt;/p&gt;

&lt;p&gt;To solve this, CloudGuide uses &lt;strong&gt;tool calling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example tools include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;check_bucket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;list_bucket_files&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;check_bucket_permissions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;diagnose_bucket_issues&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tools query the &lt;strong&gt;Google Cloud Storage API&lt;/strong&gt; directly.&lt;/p&gt;

&lt;p&gt;This allows the AI agent to verify that each step actually worked.&lt;/p&gt;
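&lt;p&gt;As a sketch of one such grounding tool (not CloudGuide's exact code): &lt;code&gt;google.cloud.storage.Client.lookup_bucket&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt; when a bucket does not exist, which lets the tool report ground truth regardless of what the console screenshot shows. The client is passed in as a parameter here so the tool is easy to test with a stub.&lt;/p&gt;

```python
def check_bucket(storage_client, bucket_name: str) -> dict:
    # Ground the agent's belief in reality: ask the Cloud Storage API
    # whether the bucket actually exists, instead of trusting pixels.
    # storage_client is a google.cloud.storage.Client; lookup_bucket()
    # returns None when the bucket is absent.
    bucket = storage_client.lookup_bucket(bucket_name)
    if bucket is None:
        return {"exists": False, "detail": f"Bucket '{bucket_name}' not found"}
    return {"exists": True, "location": bucket.location}
```

&lt;p&gt;The structured return value can be fed straight back to the model as a tool result, so its next spoken instruction is based on API state, not on what the screenshot appears to show.&lt;/p&gt;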




&lt;h2&gt;
  
  
  Visual UI Guidance
&lt;/h2&gt;

&lt;p&gt;Another major feature is &lt;strong&gt;element highlighting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before asking the user to click something, the system highlights the UI element.&lt;/p&gt;

&lt;p&gt;A pulsing rectangle appears around the button or input field.&lt;/p&gt;

&lt;p&gt;This is implemented using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Playwright&lt;/li&gt;
&lt;li&gt;Chrome DevTools Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend sends highlight instructions, and the client injects an overlay into the browser.&lt;/p&gt;
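&lt;p&gt;A minimal version of that overlay injection can be done with Playwright's &lt;code&gt;page.evaluate&lt;/code&gt;, which runs a JavaScript snippet inside the live tab. The selector, colors, and styling below are illustrative assumptions; a production overlay would also inject the CSS keyframes for the pulsing effect.&lt;/p&gt;

```python
# JavaScript evaluated inside the user's tab; the selector is passed as an argument.
HIGHLIGHT_JS = """
(selector) => {
  const el = document.querySelector(selector);
  if (!el) return false;                    // element not on this page
  el.style.outline = '3px solid #fbbc04';   // draw a visible rectangle
  el.style.outlineOffset = '2px';
  el.scrollIntoView({block: 'center'});     // make sure the user can see it
  return true;
}
"""

def highlight(page, selector: str) -> bool:
    # `page` is a Playwright Page attached over the Chrome DevTools Protocol;
    # evaluate() executes the snippet in the browser and returns its result.
    return page.evaluate(HIGHLIGHT_JS, selector)
```

&lt;p&gt;Returning &lt;code&gt;False&lt;/code&gt; when the element is missing lets the backend fall back to a spoken description instead of silently highlighting nothing.&lt;/p&gt;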




&lt;h2&gt;
  
  
  Challenges Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Audio Feedback Loops
&lt;/h3&gt;

&lt;p&gt;When the AI speaks through speakers, the microphone can pick up the sound and send it back to the model.&lt;/p&gt;

&lt;p&gt;This creates a feedback loop where the model responds to itself.&lt;/p&gt;

&lt;p&gt;Using headphones mitigates this issue, but future improvements could include built‑in echo cancellation.&lt;/p&gt;
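&lt;p&gt;One simple software mitigation (a hypothetical sketch, not something CloudGuide currently implements) is a half-duplex gate: drop microphone chunks while the assistant's audio is playing, plus a short tail afterwards, so the model never hears its own voice.&lt;/p&gt;

```python
import time

class HalfDuplexGate:
    # Hypothetical mitigation: mute the mic while (and briefly after)
    # assistant audio is playing, so speaker output never loops back in.
    def __init__(self, tail_s=0.5):
        self.tail_s = tail_s          # extra mute time after playback ends
        self._playing_until = 0.0

    def notify_playback(self, duration_s, now=None):
        # Call when a chunk of assistant audio starts playing.
        now = time.monotonic() if now is None else now
        self._playing_until = max(self._playing_until,
                                  now + duration_s + self.tail_s)

    def allow_mic_chunk(self, now=None):
        # Call per mic chunk; drop the chunk when this returns False.
        now = time.monotonic() if now is None else now
        return now >= self._playing_until
```

&lt;p&gt;This sacrifices barge-in (the user cannot interrupt the assistant mid-sentence), which is why proper acoustic echo cancellation remains the better long-term fix.&lt;/p&gt;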




&lt;h3&gt;
  
  
  Voice Activity Detection
&lt;/h3&gt;

&lt;p&gt;The Live API sometimes interprets background noise as speech.&lt;/p&gt;

&lt;p&gt;Adding voice activity detection would improve reliability.&lt;/p&gt;




&lt;h3&gt;
  
  
  UI Changes
&lt;/h3&gt;

&lt;p&gt;Cloud interfaces evolve frequently.&lt;/p&gt;

&lt;p&gt;During development, certain UI paths moved or appeared differently than documented, which required adjustments to the workflow detection logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;The backend runs on &lt;strong&gt;Google Cloud Run&lt;/strong&gt; and is deployed through a CI/CD pipeline using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Build&lt;/li&gt;
&lt;li&gt;Container Registry&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deployment is automated through a simple script that builds and deploys the backend service.&lt;/p&gt;

&lt;p&gt;This makes the system scalable and accessible from anywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building this project revealed several key insights:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal AI is powerful
&lt;/h3&gt;

&lt;p&gt;Combining screen understanding with voice interaction creates a much more natural interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  API grounding improves reliability
&lt;/h3&gt;

&lt;p&gt;Vision alone is not enough. Verifying system state using APIs is essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice interfaces still need infrastructure improvements
&lt;/h3&gt;

&lt;p&gt;Capabilities like echo cancellation and voice activity detection will make voice agents significantly more robust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Cloud platforms are incredibly capable, but their complexity often slows developers down.&lt;/p&gt;

&lt;p&gt;Projects like CloudGuide explore a new paradigm:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI agents that guide users through complex systems in real time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google AI models&lt;/li&gt;
&lt;li&gt;Gemini Live streaming&lt;/li&gt;
&lt;li&gt;Google Cloud Run&lt;/li&gt;
&lt;li&gt;Real API grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we can create assistants that truly understand what users are doing.&lt;/p&gt;

&lt;p&gt;This project was built with &lt;strong&gt;Google AI models and Google Cloud services&lt;/strong&gt;, and this post was written specifically as an entry to the &lt;strong&gt;Google AI Hackathon&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>geminiliveagentchallenge</category>
      <category>python</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Stop Print Debugging Your AI Agents: A Deep Dive into Agent Observability</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 04 Jan 2026 05:15:46 +0000</pubDate>
      <link>https://dev.to/angu10/stop-print-debugging-your-ai-agents-a-deep-dive-into-agent-observability-29eo</link>
      <guid>https://dev.to/angu10/stop-print-debugging-your-ai-agents-a-deep-dive-into-agent-observability-29eo</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Invisible Agent Problem&lt;/li&gt;
&lt;li&gt;Why Current Solutions Fall Short&lt;/li&gt;
&lt;li&gt;Introducing Agent Recorder&lt;/li&gt;
&lt;li&gt;How It Works: Technical Deep Dive&lt;/li&gt;
&lt;li&gt;Real-World Use Cases&lt;/li&gt;
&lt;li&gt;Comparing Approaches&lt;/li&gt;
&lt;li&gt;Building Production-Ready Agents&lt;/li&gt;
&lt;li&gt;The Road Ahead&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Invisible Agent Problem
&lt;/h2&gt;

&lt;p&gt;It's 2 AM. Your AI agent just went into an infinite loop consuming API credits. Again.&lt;/p&gt;

&lt;p&gt;You've built what should be a simple customer service agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse user question&lt;/li&gt;
&lt;li&gt;Search knowledge base&lt;/li&gt;
&lt;li&gt;Query database if needed&lt;/li&gt;
&lt;li&gt;Format response&lt;/li&gt;
&lt;li&gt;Maybe escalate to human support&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple, right? Except somewhere in those 5 steps, your agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Called the same database query 15 times&lt;/li&gt;
&lt;li&gt;Got stuck in a loop asking the LLM to "try again"&lt;/li&gt;
&lt;li&gt;Hallucinated data that doesn't exist&lt;/li&gt;
&lt;li&gt;Crashed with a cryptic error in step 4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And you have &lt;strong&gt;no idea which one&lt;/strong&gt; until you start debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Print Statement Spiral
&lt;/h3&gt;

&lt;p&gt;So you do what every developer does. You add logging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Calling LLM with: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] LLM took &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Got response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Searching DB: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_customer_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Getting customer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An hour later, your terminal looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DEBUG] Calling LLM with: Find all orders for customer John Smith...
[DEBUG] LLM took 1.23s
[DEBUG] Got response: I'll search for that customer...
[DEBUG] Searching DB: customer_name=John Smith
[DEBUG] Found 2 results
[DEBUG] Getting customer 123
[DEBUG] Customer: John Smith
[DEBUG] Calling LLM with: Here are the customer details: {'id': 123...
[DEBUG] LLM took 0.87s
[DEBUG] Got response: Let me get their orders...
[DEBUG] Searching DB: orders WHERE customer_id=123
[DEBUG] Found 3 results
[DEBUG] Calling LLM with: Here are the orders: [{'id': 1001, 'to...
[DEBUG] LLM took 1.45s
[DEBUG] Got response: The customer has 3 orders...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're staring at hundreds of lines of logs trying to answer basic questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many times did we call the LLM?&lt;/li&gt;
&lt;li&gt;What was the total execution time?&lt;/li&gt;
&lt;li&gt;Which step failed?&lt;/li&gt;
&lt;li&gt;What were the actual arguments passed to each function?&lt;/li&gt;
&lt;li&gt;When did it start looping?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is not sustainable.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Cost of Poor Observability
&lt;/h3&gt;

&lt;p&gt;Let me share some real numbers from my experience building AI agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time Spent Debugging:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Print debugging: 2-4 hours per bug&lt;/li&gt;
&lt;li&gt;Adding proper logging: 30 minutes per function&lt;/li&gt;
&lt;li&gt;Actually finding the bug: 15 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; 3-5 hours for issues that should take 15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Developer Frustration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Losing context between debugging sessions&lt;/li&gt;
&lt;li&gt;Unable to reproduce issues&lt;/li&gt;
&lt;li&gt;No way to compare "working" vs "broken" runs&lt;/li&gt;
&lt;li&gt;Every new team member asks: "How do I debug this?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Inefficiency:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents making 3x more API calls than necessary&lt;/li&gt;
&lt;li&gt;Inefficient prompts using excessive tokens&lt;/li&gt;
&lt;li&gt;Unable to identify performance bottlenecks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've spent decades building amazing developer tools for web apps, mobile apps, and backend services. But for AI agents? We're back to &lt;code&gt;print()&lt;/code&gt; statements like it's 1995.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Current Solutions Fall Short
&lt;/h2&gt;

&lt;p&gt;Before building Agent Recorder, I tried everything:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Standard Logging Libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling LLM with prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still just text logs in a file&lt;/li&gt;
&lt;li&gt;No structure, no visualization&lt;/li&gt;
&lt;li&gt;Manual instrumentation everywhere&lt;/li&gt;
&lt;li&gt;Hard to correlate across async calls&lt;/li&gt;
&lt;li&gt;No timing information without extra code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Cloud Observability Tools (DataDog, New Relic, etc.)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expensive for small teams and individuals&lt;/li&gt;
&lt;li&gt;Send your prompts/responses to third-party servers (security issue)&lt;/li&gt;
&lt;li&gt;Heavy SDKs that bloat your dependencies&lt;/li&gt;
&lt;li&gt;Designed for traditional apps, not agent workflows&lt;/li&gt;
&lt;li&gt;Over-engineered for "just see what my agent did"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. LLM Provider Dashboards (OpenAI, Anthropic)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only see LLM calls, not your tool calls&lt;/li&gt;
&lt;li&gt;No local context (what led to this call?)&lt;/li&gt;
&lt;li&gt;Delayed (not real-time)&lt;/li&gt;
&lt;li&gt;Can't see your custom logic&lt;/li&gt;
&lt;li&gt;Vendor lock-in&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Framework-Specific Tools (LangSmith for LangChain)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only works with that framework&lt;/li&gt;
&lt;li&gt;Requires rewriting code to use their patterns&lt;/li&gt;
&lt;li&gt;Still cloud-based with subscription fees&lt;/li&gt;
&lt;li&gt;What if you use raw APIs or multiple frameworks?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I needed was simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See every LLM call and tool call&lt;/li&gt;
&lt;li&gt;Local storage (my data, my machine)&lt;/li&gt;
&lt;li&gt;Framework-agnostic (works with anything)&lt;/li&gt;
&lt;li&gt;Minimal code changes&lt;/li&gt;
&lt;li&gt;Beautiful visualization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Free and open source&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That tool didn't exist. So I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Agent Recorder
&lt;/h2&gt;

&lt;p&gt;Agent Recorder is &lt;strong&gt;Redux DevTools for AI agents&lt;/strong&gt;. If you've ever used Redux DevTools for React development, you know the power of seeing every action, every state change, with the ability to inspect, time-travel, and understand your application flow.&lt;/p&gt;

&lt;p&gt;Now imagine that, but for your AI agent's execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Two-Decorator Solution
&lt;/h3&gt;

&lt;p&gt;Here's all you need to add to your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_recorder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;

&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_customer_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE customer_id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No context managers, no complex setup, no configuration files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;run_name&lt;/code&gt; parameter groups related calls together.&lt;/strong&gt; All functions decorated with &lt;code&gt;run_name="customer-service-agent"&lt;/code&gt; will be recorded in the same timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Gets Captured Automatically
&lt;/h3&gt;

&lt;p&gt;Every decorated function automatically logs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Function name&lt;/strong&gt; - What was called&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arguments&lt;/strong&gt; - All input parameters with their values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return value&lt;/strong&gt; - Complete output from the function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration&lt;/strong&gt; - Execution time in milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timestamp&lt;/strong&gt; - Exact time of invocation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors&lt;/strong&gt; - Full exception details if the call failed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parent tracking&lt;/strong&gt; - Caller/callee links for nested function calls&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;No manual annotation needed.&lt;/strong&gt; Just add the decorator.&lt;/p&gt;
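
&lt;p&gt;A minimal sketch of how a decorator can capture these fields (illustrative only, not Agent Recorder's actual implementation; the &lt;code&gt;record&lt;/code&gt; name is hypothetical):&lt;/p&gt;

```python
import functools
import json
import time
import traceback
from datetime import datetime

def record(func):
    """Sketch of a recording decorator capturing the fields listed above."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        event = {
            "function_name": func.__name__,       # 1. what was called
            "args": {"args": repr(args), "kwargs": repr(kwargs)},  # 2. inputs
            "timestamp": datetime.now().isoformat(),               # 5. when
            "error": None,
        }
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            event["result"] = repr(result)        # 3. output
            return result
        except Exception:
            event["error"] = traceback.format_exc()  # 6. full exception
            raise
        finally:
            event["duration_ms"] = int((time.perf_counter() - start) * 1000)  # 4.
            print(json.dumps(event))  # the real library appends to a JSONL file
    return wrapper
```

&lt;p&gt;Parent tracking (item 7) additionally requires a stack of in-flight event IDs, which is omitted here for brevity.&lt;/p&gt;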

&lt;h3&gt;
  
  
  Running Your Agent
&lt;/h3&gt;

&lt;p&gt;Use your functions exactly as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is your agent logic - unchanged!
&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find all orders for customer John Smith&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Ask LLM to understand the query
&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asks: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Search for the customer
&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_name=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;John Smith&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Get their orders
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_customer_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Summarize results
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize these orders: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is being recorded in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  Viewing the Timeline
&lt;/h3&gt;

&lt;p&gt;When your agent finishes (or crashes), run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your browser opens to a beautiful web-based timeline showing the complete execution flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: Technical Deep Dive
&lt;/h2&gt;

&lt;p&gt;Let me walk you through the architecture and implementation details.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Decorator-Based Instrumentation
&lt;/h3&gt;

&lt;p&gt;When you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what happens under the hood:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Registry Lookup:&lt;/strong&gt; Agent Recorder checks if a &lt;code&gt;Recorder&lt;/code&gt; instance exists for &lt;code&gt;"my-agent"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Creation:&lt;/strong&gt; If not, it creates one with a unique run ID (timestamp + UUID)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function Wrapping:&lt;/strong&gt; Your function gets wrapped with timing and logging logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution:&lt;/strong&gt; When called, it captures args, executes the function, captures the result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event Writing:&lt;/strong&gt; Writes a structured event to a JSONL file immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The actual implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Get or create a Recorder instance for this run_name
&lt;/span&gt;    &lt;span class="n"&gt;recorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_or_create_recorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Return the actual decorator that wraps your function
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recorder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Event Storage Format
&lt;/h3&gt;

&lt;p&gt;All events are stored as &lt;strong&gt;JSONL&lt;/strong&gt; (JSON Lines) - one JSON object per line. This format is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streamable:&lt;/strong&gt; Can write events as they happen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parseable:&lt;/strong&gt; Easy to read line-by-line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash-resistant:&lt;/strong&gt; If your program crashes, all events up to that point are saved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling-friendly:&lt;/strong&gt; Standard format used by many data tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20260103_192705_c2207bde"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4f85a880-2ab7-45bf-a0ba-9c776581a5de"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-03T19:27:06.097562"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User asks: Find all orders for customer John Smith"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;760&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'll help you find customer information. Let me search the database."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Storage location:&lt;/strong&gt; &lt;code&gt;~/.agent-recorder/runs/&amp;lt;run_id&amp;gt;.jsonl&lt;/code&gt;&lt;/p&gt;
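
&lt;p&gt;Because each line is a standalone JSON object, a run can be inspected with nothing beyond the standard library. A small sketch (the &lt;code&gt;load_events&lt;/code&gt; helper is hypothetical; the path follows the storage location above):&lt;/p&gt;

```python
import json
from pathlib import Path

# Storage location used by Agent Recorder
RUNS_DIR = Path.home() / ".agent-recorder" / "runs"

def load_events(path):
    """Parse one run's JSONL file into a list of event dicts."""
    events = []
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                events.append(json.loads(line))
    return events

# e.g. events = load_events(RUNS_DIR / "20260103_192705_c2207bde.jsonl")
```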

&lt;h3&gt;
  
  
  3. Event Types
&lt;/h3&gt;

&lt;p&gt;Agent Recorder tracks 5 event types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;run_start&lt;/strong&gt; - Marks the beginning of a run
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer-service-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20260103_192705_c2207bde"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-03T19:27:05.337192"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;llm_call&lt;/strong&gt; - LLM function execution
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;tool_call&lt;/strong&gt; - Tool function execution
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;340&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;error&lt;/strong&gt; - Exception that occurred
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"error_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ValueError"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Customer not found"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"traceback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;run_end&lt;/strong&gt; - Marks completion (optional in v0.1.1)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  4. Async Support
&lt;/h3&gt;

&lt;p&gt;The same decorators work seamlessly with async functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;async-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;openai_async&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;async-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.weather.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Use with asyncio
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_llm_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in SF?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;weather&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;San Francisco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent Recorder detects whether your function is a coroutine and wraps it accordingly.&lt;/p&gt;
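
&lt;p&gt;One plausible way to support both styles from a single decorator is to branch on &lt;code&gt;inspect.iscoroutinefunction&lt;/code&gt;. A sketch (not the library's actual code; it records only duration for brevity):&lt;/p&gt;

```python
import asyncio
import functools
import inspect
import time

def timed(func):
    """Wrap sync and async functions alike, reporting duration_ms."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = await func(*args, **kwargs)  # must be awaited
            duration_ms = int((time.perf_counter() - start) * 1000)
            print(f"{func.__name__}: {duration_ms}ms")
            return result
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration_ms = int((time.perf_counter() - start) * 1000)
        print(f"{func.__name__}: {duration_ms}ms")
        return result
    return sync_wrapper
```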

&lt;h3&gt;
  
  
  5. Web Viewer Architecture
&lt;/h3&gt;

&lt;p&gt;The viewer is a &lt;strong&gt;self-contained HTML file&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No external dependencies (no CDN calls)&lt;/li&gt;
&lt;li&gt;Vanilla JavaScript for parsing JSONL&lt;/li&gt;
&lt;li&gt;CSS for the timeline UI&lt;/li&gt;
&lt;li&gt;Syntax highlighting for JSON data&lt;/li&gt;
&lt;li&gt;Collapsible event cards&lt;/li&gt;
&lt;li&gt;Search and filter capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you run &lt;code&gt;agent-recorder view latest&lt;/code&gt;, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finds the latest run in &lt;code&gt;~/.agent-recorder/runs/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Starts a local HTTP server (default port 8765)&lt;/li&gt;
&lt;li&gt;Serves the HTML viewer + JSONL data&lt;/li&gt;
&lt;li&gt;Opens your browser to &lt;code&gt;http://localhost:8765/runs/&amp;lt;run_id&amp;gt;.html&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Everything stays local.&lt;/strong&gt; No data leaves your machine.&lt;/p&gt;
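
&lt;p&gt;The serving steps above can be approximated with Python's built-in &lt;code&gt;http.server&lt;/code&gt; under the same assumptions (port 8765, runs stored as &lt;code&gt;.jsonl&lt;/code&gt; files); &lt;code&gt;find_latest_run&lt;/code&gt; and &lt;code&gt;serve&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```python
import functools
import http.server
import socketserver
import webbrowser
from pathlib import Path

PORT = 8765  # default port mentioned above

def find_latest_run(runs_dir):
    """Pick the most recently modified .jsonl run file, or None."""
    runs = sorted(runs_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return runs[-1] if runs else None

def serve(runs_dir):
    """Serve the runs directory on localhost and open the newest run."""
    latest = find_latest_run(runs_dir)
    if latest is None:
        raise FileNotFoundError("no recorded runs")
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=str(runs_dir))
    with socketserver.TCPServer(("127.0.0.1", PORT), handler) as httpd:
        webbrowser.open(f"http://localhost:{PORT}/{latest.stem}.html")
        httpd.serve_forever()
```

&lt;p&gt;&lt;code&gt;serve_forever()&lt;/code&gt; blocks until interrupted; the real CLI presumably layers the self-contained HTML viewer on top of this.&lt;/p&gt;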

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Let me show you how Agent Recorder solves actual problems I've encountered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 1: Debugging Infinite Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Agent keeps calling the same tool over and over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Agent Recorder:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DEBUG] Calling search_database with query: customer_name='John'
[DEBUG] Got 0 results
[DEBUG] Calling LLM...
[DEBUG] LLM says: Let me search again
[DEBUG] Calling search_database with query: customer_name='John'
[DEBUG] Got 0 results
[DEBUG] Calling LLM...
[DEBUG] LLM says: Let me search again
... (500 more lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to manually count log lines and realize it's looping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the timeline and immediately see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. llm_call - "Find customer John"
2. tool_call - search_database(query="customer_name='John'") → []
3. llm_call - "I got no results, let me try again"
4. tool_call - search_database(query="customer_name='John'") → []
5. llm_call - "I got no results, let me try again"
6. tool_call - search_database(query="customer_name='John'") → []
... (pattern visible immediately)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; The database query is wrong (should be &lt;code&gt;customer_name='John Smith'&lt;/code&gt;). Also, the LLM needs explicit instruction to stop after 1 failed attempt.&lt;/p&gt;
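
&lt;p&gt;Because runs are plain JSONL, this looping pattern can even be detected programmatically. A sketch over the event format shown earlier (&lt;code&gt;find_repeated_calls&lt;/code&gt; is a hypothetical helper):&lt;/p&gt;

```python
import json
from collections import Counter

def find_repeated_calls(jsonl_path, threshold=3):
    """Flag tool calls repeated with identical arguments, a common
    signature of an agent stuck in a retry loop."""
    counts = Counter()
    with open(jsonl_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "tool_call":
                data = event["data"]
                # identical function + identical args = same call
                key = (data["function_name"],
                       json.dumps(data["args"], sort_keys=True))
                counts[key] += 1
    return {k: n for k, n in counts.items() if n >= threshold}
```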

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; 2 hours → 5 minutes&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 2: Performance Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Agent is slow but you don't know which part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at the timeline durations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. llm_call - 1.2s ⚡ (acceptable)
2. tool_call - search_database - 3.8s 🐌 (SLOW!)
3. tool_call - get_orders - 0.4s ⚡
4. llm_call - 0.9s ⚡
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add a database index on &lt;code&gt;customer_name&lt;/code&gt;. Duration drops to 0.2s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Total execution time: 6.3s → 2.7s (57% faster)&lt;/p&gt;
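&lt;p&gt;With SQLite, for example, that index is a single statement. The table and column names below mirror this post's example; adapt them to your own schema:&lt;/p&gt;

```python
# Sketch: create an index on the column used for lookups (sqlite3, stdlib).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")

# Confirm the query planner actually uses the index for name lookups
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE name = ?",
    ("John Smith",),
).fetchall()
print(plan)
```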

&lt;h3&gt;
  
  
  Use Case 3: Token Usage Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; High API usage, unclear why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Export the run to JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agent-recorder &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;run_id&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; run.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write a quick script to analyze:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;total_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;run.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;total_calls&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total LLM calls: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_calls&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average prompt length: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_calls&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Discovery:&lt;/strong&gt; One LLM call had a 5000-character prompt that included the entire knowledge base unnecessarily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pass only relevant excerpts to the LLM. Token usage drops significantly.&lt;/p&gt;
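&lt;p&gt;Even a naive keyword-overlap filter beats pasting the whole knowledge base into every prompt. This sketch is not part of Agent Recorder; it just shows the shape of the fix (real retrieval would use embeddings or BM25):&lt;/p&gt;

```python
# Sketch: send only the knowledge-base chunks most relevant to the question.
# Keyword overlap is a crude stand-in for real retrieval.
import re

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def select_excerpts(question, chunks, top_k=2):
    q = words(question)
    # Rank chunks by how many words they share with the question
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:top_k]

kb = [
    "Refunds are processed within 5 business days.",
    "Orders ship from our warehouse within 24 hours.",
    "Customers can check order status on the account page.",
]
excerpts = select_excerpts("How do I check my order status?", kb)
prompt = "Context:\n" + "\n".join(excerpts) + "\n\nQuestion: How do I check my order status?"
```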

&lt;h3&gt;
  
  
  Use Case 4: Comparing Runs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; "It worked yesterday, now it's broken."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all runs&lt;/span&gt;
agent-recorder list

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# 20260102_143022_abc123  customer-agent  2026-01-02 14:30:22 (working)&lt;/span&gt;
&lt;span class="c"&gt;# 20260103_192705_c2207b  customer-agent  2026-01-03 19:27:05 (broken)&lt;/span&gt;

&lt;span class="c"&gt;# Export both&lt;/span&gt;
agent-recorder &lt;span class="nb"&gt;export &lt;/span&gt;20260102_143022_abc123 &lt;span class="nt"&gt;-o&lt;/span&gt; working.json
agent-recorder &lt;span class="nb"&gt;export &lt;/span&gt;20260103_192705_c2207b &lt;span class="nt"&gt;-o&lt;/span&gt; broken.json

&lt;span class="c"&gt;# Compare with diff tool or custom script&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Discovery:&lt;/strong&gt; In the broken version, a new validation step was added that always returns empty results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; 4 hours → 15 minutes&lt;/p&gt;
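&lt;p&gt;Once both runs are exported, a short script can pinpoint where their event sequences diverge. The JSON shape here matches the earlier analysis script; inline sample dicts stand in for the two exported files:&lt;/p&gt;

```python
# Sketch: find the first step where two exported runs diverge.
# Inline dicts stand in for working.json and broken.json.
working = {"events": [{"type": "llm_call"}, {"type": "tool_call"}, {"type": "llm_call"}]}
broken = {"events": [{"type": "llm_call"}, {"type": "validation"}, {"type": "tool_call"}]}

def first_divergence(a, b):
    ta = [e["type"] for e in a["events"]]
    tb = [e["type"] for e in b["events"]]
    for i, (x, y) in enumerate(zip(ta, tb)):
        if x != y:
            return i, x, y  # step index plus the two differing event types
    if len(ta) != len(tb):
        return min(len(ta), len(tb)), None, None  # one run has extra steps
    return None

print(first_divergence(working, broken))
```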

&lt;h3&gt;
  
  
  Use Case 5: Onboarding New Team Members
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; "How does this agent work?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run a sample execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python examples/customer_service_agent.py
agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Show them the timeline. They instantly understand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent asks LLM to parse the query&lt;/li&gt;
&lt;li&gt;LLM decides which tools to call&lt;/li&gt;
&lt;li&gt;Agent executes tools (database, API calls)&lt;/li&gt;
&lt;li&gt;LLM synthesizes the response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;No documentation needed.&lt;/strong&gt; The timeline is living documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing Approaches
&lt;/h2&gt;

&lt;p&gt;Let me compare different debugging approaches with a real scenario:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; Debug why customer order lookup fails for "John Smith"&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Print Statements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Looking for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: No customers found, returning empty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Getting orders for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 30-60 minutes&lt;br&gt;
&lt;strong&gt;Lines of debug code:&lt;/strong&gt; 15-20&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Remove all print statements&lt;br&gt;
&lt;strong&gt;If it breaks again:&lt;/strong&gt; Add them all back&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 2: Logging Framework
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Looking for customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No customers found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Getting orders for customer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 20-30 minutes&lt;br&gt;
&lt;strong&gt;Lines of debug code:&lt;/strong&gt; 20-25 (permanent overhead)&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Logs stay (clutter over time)&lt;br&gt;
&lt;strong&gt;Visualization:&lt;/strong&gt; Still just text in a file&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 3: Cloud Observability (e.g., Datadog)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ddtrace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;

&lt;span class="nd"&gt;@tracer.wrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 10-15 minutes&lt;br&gt;
&lt;strong&gt;Setup time:&lt;/strong&gt; 2-3 hours (SDK, config, account)&lt;br&gt;
&lt;strong&gt;Ongoing:&lt;/strong&gt; Monthly subscription&lt;br&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Data sent to third-party&lt;br&gt;
&lt;strong&gt;Lines of instrumentation:&lt;/strong&gt; 15-20&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 4: Agent Recorder
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_recorder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM customers WHERE name = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE customer_id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 5-10 minutes&lt;br&gt;
&lt;strong&gt;Setup time:&lt;/strong&gt; 30 seconds (&lt;code&gt;pip install&lt;/code&gt;)&lt;br&gt;
&lt;strong&gt;Ongoing:&lt;/strong&gt; Free&lt;br&gt;
&lt;strong&gt;Security:&lt;/strong&gt; All data local&lt;br&gt;
&lt;strong&gt;Lines of instrumentation:&lt;/strong&gt; 3 decorators&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Decorators stay (useful for future debugging)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner:&lt;/strong&gt; Agent Recorder provides the best balance of simplicity, effectiveness, and privacy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building Production-Ready Agents
&lt;/h2&gt;

&lt;p&gt;Agent Recorder isn't just for debugging; it's also built for production agents.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Handling Sensitive Data
&lt;/h3&gt;

&lt;p&gt;Don't log API keys or personal information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm_with_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# api_key won't be logged
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Result won't be logged (but function call and args will)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user_sensitive_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
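&lt;p&gt;If an argument must be captured but contains a secret inside a larger string, you can also redact it yourself before it reaches any log. The &lt;code&gt;redact&lt;/code&gt; helper below is hypothetical, not an Agent Recorder API, and the key pattern is illustrative only:&lt;/p&gt;

```python
# Sketch: mask values that look like API keys before they reach any log.
import re

def redact(text):
    # Illustrative pattern; match whatever your real keys look like
    return re.sub(r"sk-[A-Za-z0-9]+", "sk-***", text)

print(redact("calling with key sk-abc123XYZ"))
```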



&lt;h3&gt;
  
  
  2. Custom Storage Location
&lt;/h3&gt;

&lt;p&gt;For production deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;storage_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/log/agent-recorder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Cleanup Old Runs
&lt;/h3&gt;

&lt;p&gt;Keep disk usage under control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete runs older than 7 days&lt;/span&gt;
agent-recorder cleanup &lt;span class="nt"&gt;--older-than&lt;/span&gt; 7d

&lt;span class="c"&gt;# Dry run to see what would be deleted&lt;/span&gt;
agent-recorder cleanup &lt;span class="nt"&gt;--older-than&lt;/span&gt; 7d &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Automated Analysis
&lt;/h3&gt;

&lt;p&gt;Export and analyze runs programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Export to JSON
&lt;/span&gt;    &lt;span class="n"&gt;export_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-recorder export &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -o &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;export_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load and analyze
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;export_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_llm_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_tool_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_llm_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_tool_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
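&lt;p&gt;The aggregation logic above can be sanity-checked without the CLI by running the same counters over a small in-memory event list (the sample events below are made up for illustration; they mirror the exported shape used in &lt;code&gt;analyze_run&lt;/code&gt;):&lt;/p&gt;

```python
# Stand-alone check of the stats aggregation, using hand-written sample events
events = [
    {"type": "llm_call", "data": {"duration_ms": 120}},
    {"type": "tool_call", "data": {"duration_ms": 30}},
    {"type": "error", "data": {"message": "timeout"}},
]

stats = {"total_llm_calls": 0, "total_tool_calls": 0, "total_duration": 0, "errors": []}
for event in events:
    if event["type"] == "llm_call":
        stats["total_llm_calls"] += 1
        stats["total_duration"] += event["data"]["duration_ms"]
    elif event["type"] == "tool_call":
        stats["total_tool_calls"] += 1
        stats["total_duration"] += event["data"]["duration_ms"]
    elif event["type"] == "error":
        stats["errors"].append(event["data"])

print(stats)
```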



&lt;h2&gt;
  
  
  The Road Ahead
&lt;/h2&gt;

&lt;p&gt;Agent Recorder v0.1.1 is just the beginning. Here's what's coming:&lt;/p&gt;

&lt;h3&gt;
  
  
  v0.2.0 - Enhanced Visualization (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tree/Graph View:&lt;/strong&gt; See nested calls as a visual tree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Counting:&lt;/strong&gt; Automatic token counting for OpenAI/Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Estimation:&lt;/strong&gt; Calculate API costs for each run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Metrics:&lt;/strong&gt; Identify bottlenecks automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export Formats:&lt;/strong&gt; PDF, HTML, CSV for reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.3.0 - Framework Integrations (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain Adapter:&lt;/strong&gt; Auto-instrument LangChain agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LlamaIndex Adapter:&lt;/strong&gt; Seamless integration with LlamaIndex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AutoGen Support:&lt;/strong&gt; Track multi-agent conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI Integration:&lt;/strong&gt; Monitor crew workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.4.0 - Advanced Features (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Streaming:&lt;/strong&gt; Watch agent execution live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent Support:&lt;/strong&gt; Track multiple agents interacting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff View:&lt;/strong&gt; Compare two runs side-by-side&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Events:&lt;/strong&gt; Log your own event types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin System:&lt;/strong&gt; Extend with custom visualizations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.5.0 - Language Ports (Community Welcome!)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript/Node.js SDK:&lt;/strong&gt; For JavaScript agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go SDK:&lt;/strong&gt; For Go-based agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rust SDK:&lt;/strong&gt; For high-performance agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Want to contribute?&lt;/strong&gt; Check out the GitHub repo for good first issues!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Observability Is Not Optional
&lt;/h2&gt;

&lt;p&gt;As AI agents move from prototypes to production, observability isn't a nice-to-have - &lt;strong&gt;it's essential.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't optimize what you can't measure.&lt;br&gt;
You can't debug what you can't see.&lt;br&gt;
You can't trust what you can't verify.&lt;/p&gt;

&lt;p&gt;Agent Recorder gives you that visibility with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Two simple decorators&lt;/li&gt;
&lt;li&gt;✅ Zero configuration&lt;/li&gt;
&lt;li&gt;✅ Local-first architecture&lt;/li&gt;
&lt;li&gt;✅ Framework-agnostic design&lt;/li&gt;
&lt;li&gt;✅ Beautiful visualization&lt;/li&gt;
&lt;li&gt;✅ Free and open source&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started Today
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone and install&lt;/span&gt;
git clone https://github.com/yourusername/agent-recorder.git
&lt;span class="nb"&gt;cd &lt;/span&gt;agent-recorder
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Try the example&lt;/span&gt;
python examples/simple_agent.py

&lt;span class="c"&gt;# View the recording&lt;/span&gt;
agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/yourusername/agent-recorder" rel="noopener noreferrer"&gt;https://github.com/yourusername/agent-recorder&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;br&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; See README.md for full documentation&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the Community
&lt;/h2&gt;

&lt;p&gt;Agent Recorder is open source and built for the community. Whether you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found a bug → Open an issue&lt;/li&gt;
&lt;li&gt;Have a feature idea → Start a discussion&lt;/li&gt;
&lt;li&gt;Want to contribute → Submit a PR&lt;/li&gt;
&lt;li&gt;Built something cool → Share your story&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building the future of agent observability together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Star the repo&lt;/strong&gt; if you find it useful - it helps others discover the project!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building Self-Correcting Database Agents with Meta's Llama-4-Scout: From Natural Language to SQL Insights</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Thu, 09 Oct 2025 15:09:03 +0000</pubDate>
      <link>https://dev.to/angu10/building-self-correcting-database-agents-with-metas-llama-4-scout-from-natural-language-to-sql-2k0a</link>
      <guid>https://dev.to/angu10/building-self-correcting-database-agents-with-metas-llama-4-scout-from-natural-language-to-sql-2k0a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The ability to query databases using natural language represents one of the most practical applications of large language models in enterprise settings. However, most text-to-SQL implementations suffer from a critical flaw: they generate queries once and hope for the best. When a query fails—due to incorrect table names, misunderstood schema relationships, or logical errors—the system simply returns an error message, leaving users frustrated.&lt;/p&gt;

&lt;p&gt;In this technical guide, we'll build a sophisticated database agent using Meta's Llama-4-Scout that doesn't just generate SQL queries—it thinks through the problem, validates its approach, and most importantly, learns from its mistakes to self-correct. This agent implements a five-phase cognitive framework: &lt;strong&gt;Understand → Plan → Generate → Validate → Execute&lt;/strong&gt;, with intelligent retry logic when queries fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Database Agents Matter
&lt;/h2&gt;

&lt;p&gt;Business analysts, product managers, and non-technical stakeholders frequently need data insights but lack SQL expertise. Traditional BI tools require pre-built dashboards, creating bottlenecks when ad-hoc questions arise. A well-designed database agent democratizes data access while maintaining query safety and accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Llama-4-Scout for Structured Data Tasks
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout excels at structured reasoning tasks for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema comprehension&lt;/strong&gt;: Understands relational database concepts including foreign keys, joins, and aggregations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error diagnosis&lt;/strong&gt;: Can parse SQL error messages and identify root causes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative refinement&lt;/strong&gt;: Maintains context across retry attempts to avoid repeating mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety awareness&lt;/strong&gt;: Can be prompted to avoid destructive operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Architecture
&lt;/h2&gt;

&lt;p&gt;Our implementation consists of four layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Database Layer&lt;/strong&gt;: SQLite database with realistic employee/department data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Cognitive Layer&lt;/strong&gt;: Five-phase reasoning pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety &amp;amp; Validation Layer&lt;/strong&gt;: SQL injection prevention and query validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Interface Layer&lt;/strong&gt;: Streamlit application for interaction&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's build this step by step.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Environment Setup and Dependencies
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Required packages (sqlite3 ships with Python's standard library)
# pip install streamlit together pandas

import streamlit as st
import sqlite3
import pandas as pd
from together import Together
import re
from datetime import datetime
import json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 2: Database Schema and Sample Data Creation
&lt;/h2&gt;

&lt;p&gt;First, we'll create a realistic database with three interconnected tables representing a company's organizational structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_sample_database():
    """Create a sample SQLite database with employee, department, and project tables"""
    conn = sqlite3.connect('company.db')
    cursor = conn.cursor()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Drop existing tables if they exist
cursor.execute('DROP TABLE IF EXISTS projects')
cursor.execute('DROP TABLE IF EXISTS employees')
cursor.execute('DROP TABLE IF EXISTS departments')

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL,
    budget DECIMAL(12, 2),
    location TEXT
)
''')

# Create employees table with self-referential manager relationship
cursor.execute('''
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT UNIQUE,
    hire_date DATE,
    salary DECIMAL(10, 2),
    department_id INTEGER,
    manager_id INTEGER,
    FOREIGN KEY (department_id) REFERENCES departments(department_id),
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
)
''')

# Create projects table
cursor.execute('''
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    project_name TEXT NOT NULL,
    department_id INTEGER,
    start_date DATE,
    end_date DATE,
    status TEXT,
    budget DECIMAL(10, 2),
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
)
''')

# Insert sample departments
departments = [
    (1, 'Engineering', 2500000.00, 'San Francisco'),
    (2, 'Sales', 1500000.00, 'New York'),
    (3, 'Marketing', 800000.00, 'Austin'),
    (4, 'Human Resources', 600000.00, 'San Francisco'),
    (5, 'Finance', 700000.00, 'New York')
]
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?, ?)', departments)

# Insert sample employees (managers first, then their reports)
employees = [
    # Managers (no manager_id)
    (1, 'Sarah', 'Johnson', 'sarah.j@company.com', '2018-01-15', 150000, 1, None),
    (2, 'Michael', 'Chen', 'michael.c@company.com', '2017-03-20', 145000, 2, None),
    (3, 'Emily', 'Rodriguez', 'emily.r@company.com', '2019-06-10', 125000, 3, None),
    (4, 'David', 'Kim', 'david.k@company.com', '2018-09-05', 130000, 4, None),
    (5, 'Lisa', 'Patel', 'lisa.p@company.com', '2017-11-12', 140000, 5, None),

    # Engineering team
    (6, 'James', 'Wilson', 'james.w@company.com', '2020-02-14', 115000, 1, 1),
    (7, 'Anna', 'Martinez', 'anna.m@company.com', '2021-05-18', 95000, 1, 1),
    (8, 'Robert', 'Taylor', 'robert.t@company.com', '2019-08-22', 120000, 1, 1),
    (9, 'Jennifer', 'Anderson', 'jennifer.a@company.com', '2022-01-10', 85000, 1, 6),
    (10, 'William', 'Thomas', 'william.t@company.com', '2021-11-30', 90000, 1, 6),

    # Sales team
    (11, 'Jessica', 'Moore', 'jessica.m@company.com', '2020-04-05', 95000, 2, 2),
    (12, 'Christopher', 'Jackson', 'chris.j@company.com', '2021-07-19', 88000, 2, 2),
    (13, 'Amanda', 'White', 'amanda.w@company.com', '2019-12-01', 105000, 2, 2),
    (14, 'Daniel', 'Harris', 'daniel.h@company.com', '2022-03-15', 75000, 2, 11),

    # Marketing team
    (15, 'Michelle', 'Martin', 'michelle.m@company.com', '2020-09-08', 82000, 3, 3),
    (16, 'Kevin', 'Thompson', 'kevin.t@company.com', '2021-10-20', 78000, 3, 3),
    (17, 'Rachel', 'Garcia', 'rachel.g@company.com', '2022-02-28', 72000, 3, 15),

    # HR team
    (18, 'Brian', 'Martinez', 'brian.m@company.com', '2019-05-14', 85000, 4, 4),
    (19, 'Nicole', 'Robinson', 'nicole.r@company.com', '2021-08-03', 75000, 4, 4),

    # Finance team
    (20, 'Andrew', 'Clark', 'andrew.c@company.com', '2018-07-22', 95000, 5, 5),
    (21, 'Stephanie', 'Lewis', 'stephanie.l@company.com', '2020-11-11', 88000, 5, 5),
    (22, 'Joshua', 'Lee', 'joshua.l@company.com', '2022-04-06', 80000, 5, 20)
]
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)', employees)

# Insert sample projects
projects = [
    (1, 'Cloud Migration', 1, '2024-01-01', '2024-12-31', 'In Progress', 500000),
    (2, 'Mobile App Redesign', 1, '2024-03-15', '2024-09-30', 'In Progress', 300000),
    (3, 'Q4 Sales Campaign', 2, '2024-10-01', '2024-12-31', 'Planning', 200000),
    (4, 'Enterprise Sales Initiative', 2, '2024-02-01', '2024-11-30', 'In Progress', 150000),
    (5, 'Brand Refresh', 3, '2024-05-01', '2024-08-31', 'Completed', 180000),
    (6, 'Social Media Strategy', 3, '2024-07-01', '2025-06-30', 'In Progress', 120000),
    (7, 'HR System Upgrade', 4, '2024-04-01', '2024-10-31', 'In Progress', 100000),
    (8, 'Financial Audit 2024', 5, '2024-01-01', '2024-03-31', 'Completed', 80000)
]
cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?, ?)', projects)

conn.commit()
conn.close()

return "Database created successfully!"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_database_schema():
    """Extract and format database schema for the agent"""
    conn = sqlite3.connect('company.db')
    cursor = conn.cursor()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_info = []

# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()

for table in tables:
    table_name = table[0]
    schema_info.append(f"\n--- Table: {table_name} ---")

    # Get column information
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()

    for col in columns:
        col_name = col[1]
        col_type = col[2]
        is_pk = " (PRIMARY KEY)" if col[5] else ""
        schema_info.append(f"  - {col_name}: {col_type}{is_pk}")

    # Get foreign key information
    cursor.execute(f"PRAGMA foreign_key_list({table_name})")
    fks = cursor.fetchall()

    if fks:
        schema_info.append("  Foreign Keys:")
        for fk in fks:
            schema_info.append(f"    - {fk[3]} -&amp;gt; {fk[2]}({fk[4]})")

conn.close()
return "\n".join(schema_info)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
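&lt;p&gt;The &lt;code&gt;PRAGMA table_info&lt;/code&gt; call used above is easy to try on a throwaway in-memory database; each column row comes back as &lt;code&gt;(cid, name, type, notnull, dflt_value, pk)&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

# Inspect a table's columns the same way get_database_schema does
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE departments ("
    "department_id INTEGER PRIMARY KEY, "
    "department_name TEXT NOT NULL)"
)
cols = conn.execute("PRAGMA table_info(departments)").fetchall()
for cid, name, ctype, notnull, default, pk in cols:
    print(name, ctype, bool(pk))
conn.close()
```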
&lt;h2&gt;
  
  
  Step 3: The Five-Phase Agent Cognitive Framework
&lt;/h2&gt;

&lt;p&gt;Now we implement the core intelligence of our agent. Each phase has a specific responsibility in the query generation pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Initialize Together AI client
together_client = Together(api_key="your_api_key_here")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_1_understand(question, schema):
    """
    PHASE 1: UNDERSTAND
    Analyze the natural language question and understand what data is needed
    """
    instruction = f"""
You are a database expert analyzing a natural language question.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

TASK: Analyze this question to understand:
1. What specific information is the user requesting?
2. Which tables contain this information?
3. What columns are relevant?
4. Are any calculations, aggregations, or groupings needed?
5. Are there any ambiguities in the question?

Provide a clear analysis of what data needs to be retrieved.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=400,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in understanding phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_2_plan(question, understanding, schema, previous_error=None):
    """
    PHASE 2: PLAN
    Create a detailed plan for constructing the SQL query
    """
    error_context = ""
    if previous_error:
        error_context = f"\n\nPREVIOUS ATTEMPT FAILED WITH ERROR:\n{previous_error}\n\nAdjust your plan to avoid this error."

    instruction = f"""
You are a database expert creating a query plan.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

YOUR UNDERSTANDING: {understanding}
{error_context}

TASK: Create a detailed SQL query plan:
1. Which tables need to be queried?
2. What JOIN operations are required (if any)?
3. What WHERE conditions are needed?
4. Are GROUP BY or aggregate functions needed?
5. Should results be sorted or limited?
6. What specific columns should be selected?

Provide a step-by-step plan in numbered format.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=500,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in planning phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_3_generate(question, understanding, plan, schema, previous_error=None, previous_sql=None):
    """
    PHASE 3: GENERATE
    Generate the actual SQL query based on the plan
    """
    error_context = ""
    if previous_error and previous_sql:
        error_context = f"""
PREVIOUS SQL ATTEMPT:
{previous_sql}

ERROR RECEIVED:
{previous_error}

IMPORTANT: Fix the specific error above. Common issues:
- Incorrect table or column names (check schema carefully)
- Missing JOIN conditions
- Syntax errors
- Incorrect aggregate functions
"""

    instruction = f"""
You are a SQL expert generating a query.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

YOUR UNDERSTANDING: {understanding}

YOUR PLAN: {plan}
{error_context}

TASK: Generate a valid SQLite SQL query that:
1. Follows the plan exactly
2. Uses correct table and column names from the schema
3. Includes proper JOIN syntax if needed
4. Has correct WHERE, GROUP BY, and ORDER BY clauses
5. Is properly formatted and includes comments

CRITICAL RULES:
- Only use tables and columns that exist in the schema above
- For SQLite, use proper syntax (e.g., no TOP keyword, use LIMIT instead)
- Always use table aliases for clarity in JOINs
- Double-check column names match the schema exactly

Respond with ONLY the SQL query, nothing else. Do not include markdown code blocks or explanations.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=600,
        temperature=0.1
    )
    sql_query = response.choices[0].message.content.strip()

    # Clean up the SQL (remove markdown code blocks if present)
    sql_query = re.sub(r'^```sql\s*', '', sql_query)
    sql_query = re.sub(r'^```\s*', '', sql_query)
    sql_query = re.sub(r'\s*```$', '', sql_query)

    return sql_query.strip()
except Exception as e:
    return f"Error in generation phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_4_validate(sql_query, schema):
    """
    PHASE 4: VALIDATE
    Validate the SQL query for safety and correctness before execution
    """
    validation_results = {
        "is_safe": True,
        "is_valid": True,
        "warnings": [],
        "errors": []
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Check for dangerous operations
dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'GRANT', 'REVOKE']
sql_upper = sql_query.upper()

for keyword in dangerous_keywords:
    if keyword in sql_upper:
        validation_results["is_safe"] = False
        validation_results["errors"].append(f"Dangerous operation detected: {keyword}")

# Check for SELECT statement
if not sql_upper.strip().startswith('SELECT'):
    validation_results["is_safe"] = False
    validation_results["errors"].append("Only SELECT queries are allowed")

# Check for semicolon (potential SQL injection)
if sql_query.count(';') &amp;gt; 1:
    validation_results["warnings"].append("Multiple statements detected - potential security risk")

# Basic syntax check
if 'FROM' not in sql_upper:
    validation_results["is_valid"] = False
    validation_results["errors"].append("Invalid SQL: Missing FROM clause")

return validation_results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
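&lt;p&gt;The rules above are deterministic, so they can be verified with a standalone sketch of the validator (simplified here to omit the warnings check) run against one safe and one unsafe query:&lt;/p&gt;

```python
# Simplified sketch of phase_4_validate's safety checks
def validate(sql_query):
    results = {"is_safe": True, "is_valid": True, "errors": []}
    dangerous = ["DROP", "DELETE", "UPDATE", "INSERT",
                 "ALTER", "TRUNCATE", "GRANT", "REVOKE"]
    sql_upper = sql_query.upper()
    for keyword in dangerous:
        if keyword in sql_upper:
            results["is_safe"] = False
            results["errors"].append(f"Dangerous operation detected: {keyword}")
    if not sql_upper.strip().startswith("SELECT"):
        results["is_safe"] = False
        results["errors"].append("Only SELECT queries are allowed")
    if "FROM" not in sql_upper:
        results["is_valid"] = False
        results["errors"].append("Invalid SQL: Missing FROM clause")
    return results

ok = validate("SELECT first_name FROM employees")
bad = validate("DROP TABLE employees")
print(ok["is_safe"], bad["is_safe"])
```

&lt;p&gt;One caveat: the keyword check is substring-based, so a column alias like &lt;code&gt;last_update&lt;/code&gt; would trip the UPDATE rule; matching on word boundaries with a regex is a reasonable hardening step.&lt;/p&gt;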

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_5_execute(sql_query):
    """
    PHASE 5: EXECUTE
    Execute the SQL query and return results
    """
    try:
        conn = sqlite3.connect('company.db')
        df = pd.read_sql_query(sql_query, conn)
        conn.close()
        return df, None
    except Exception as e:
        return None, str(e)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 4: Self-Correction and Retry Logic
&lt;/h2&gt;

&lt;p&gt;This is where our agent becomes truly intelligent. When a query fails, the agent analyzes the error and attempts to fix it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def diagnose_error(sql_query, error_message, schema):
    """
    Use the LLM to diagnose what went wrong and provide guidance
    """
    instruction = f"""
You are a SQL debugging expert.

DATABASE SCHEMA:
{schema}

FAILED SQL QUERY:
{sql_query}

ERROR MESSAGE:
{error_message}

TASK: Diagnose the specific problem:
1. What caused this error?
2. Which part of the query is incorrect?
3. What should be changed to fix it?

Be specific and concise. Focus on the exact fix needed.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=300,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in diagnosis: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def execute_query_with_retry(question, schema, max_attempts=3):
    """
    Complete pipeline with retry logic
    """
    attempt_history = []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for attempt in range(max_attempts):
    attempt_info = {
        "attempt_number": attempt + 1,
        "understanding": "",
        "plan": "",
        "sql": "",
        "validation": {},
        "result": None,
        "error": None,
        "diagnosis": ""
    }

    # Get previous error context if this is a retry
    previous_error = attempt_history[-1]["error"] if attempt_history else None
    previous_sql = attempt_history[-1]["sql"] if attempt_history else None

    # Phase 1: Understand
    understanding = phase_1_understand(question, schema)
    attempt_info["understanding"] = understanding

    # Phase 2: Plan (with error context if retrying)
    plan = phase_2_plan(question, understanding, schema, previous_error)
    attempt_info["plan"] = plan

    # Phase 3: Generate SQL (with error context if retrying)
    sql_query = phase_3_generate(question, understanding, plan, schema, previous_error, previous_sql)
    attempt_info["sql"] = sql_query

    # Phase 4: Validate
    validation = phase_4_validate(sql_query, schema)
    attempt_info["validation"] = validation

    if not validation["is_safe"] or not validation["is_valid"]:
        attempt_info["error"] = "Query failed validation: " + ", ".join(validation["errors"])
        attempt_history.append(attempt_info)
        continue

    # Phase 5: Execute
    result, error = phase_5_execute(sql_query)
    attempt_info["result"] = result
    attempt_info["error"] = error

    if error:
        # Diagnose the error
        diagnosis = diagnose_error(sql_query, error, schema)
        attempt_info["diagnosis"] = diagnosis
        attempt_history.append(attempt_info)
        continue

    # Success!
    attempt_history.append(attempt_info)
    return result, sql_query, attempt_history

# All attempts failed
return None, None, attempt_history
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def format_natural_language_response(question, df, sql_query):
    """
    Generate a natural language explanation of the results
    """
    if df is None or df.empty:
        return "No results found for your query."

    result_preview = df.head(10).to_string()
    row_count = len(df)

    instruction = f"""
You are a data analyst explaining query results to a business user.

USER QUESTION: "{question}"

SQL QUERY EXECUTED:
{sql_query}

RESULTS ({row_count} rows total):
{result_preview}

TASK: Provide a clear, business-friendly explanation of these results:

1. Directly answer the user's question
2. Highlight key insights from the data
3. Mention the total number of results if relevant
4. Keep it concise but informative

Do not show the SQL query or technical details. Focus on the business meaning.
"""

    try:
        response = together_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": instruction}],
            max_tokens=400,
            temperature=0.3
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Results retrieved successfully. {row_count} rows returned."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  Step 5: Building the Streamlit User Interface
&lt;/h2&gt;

&lt;p&gt;Now let's create an interactive interface that showcases the agent's cognitive process:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def main():
    st.set_page_config(page_title="SQL Agent with Llama-4-Scout", layout="wide")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.title("🤖 Self-Correcting Database Agent")
st.markdown("*Powered by Meta's Llama-4-Scout*")

# Initialize session state
if 'db_initialized' not in st.session_state:
    st.session_state.db_initialized = False
if 'query_history' not in st.session_state:
    st.session_state.query_history = []

# Sidebar for database setup
with st.sidebar:
    st.header("⚙️ Database Setup")

    if not st.session_state.db_initialized:
        if st.button("Initialize Sample Database", type="primary"):
            with st.spinner("Creating database..."):
                result = create_sample_database()
                st.session_state.db_initialized = True
                st.success(result)
                st.rerun()
    else:
        st.success("✅ Database ready")

        if st.button("Reset Database"):
            create_sample_database()
            st.session_state.query_history = []
            st.success("Database reset!")
            st.rerun()

    st.markdown("---")
    st.header("📊 Database Schema")

    if st.session_state.db_initialized:
        with st.expander("View Schema", expanded=False):
            schema = get_database_schema()
            st.code(schema, language="text")

    st.markdown("---")
    st.header("💡 Example Questions")
    st.markdown("""
    **Simple:**
    - How many employees do we have?
    - Show all departments

    **Medium:**
    - What's the average salary by department?
    - Who are the managers?

    **Complex:**
    - Which employees earn more than their manager?
    - Show departments with highest average salaries

    **Advanced:**
    - Compare project budgets to department budgets
    - Find managers with most direct reports
    """)

# Main content area
if not st.session_state.db_initialized:
    st.info("👈 Please initialize the database using the sidebar to get started.")
    return

st.header("💬 Ask Questions About Your Data")

# Query input
col1, col2 = st.columns([3, 1])

with col1:
    user_question = st.text_input(
        "Your Question:",
        placeholder="e.g., Who are the top 5 highest paid employees?",
        key="question_input"
    )

with col2:
    st.write("")  # Spacing
    st.write("")  # Spacing
    show_thinking = st.checkbox("Show reasoning", value=True)

if st.button("🔍 Get Answer", type="primary", use_container_width=True):
    if not user_question:
        st.warning("Please enter a question.")
        return

    # Get schema
    schema = get_database_schema()

    # Execute query with retry logic
    with st.spinner("🧠 Agent is thinking..."):
        result_df, final_sql, attempt_history = execute_query_with_retry(
            user_question, 
            schema, 
            max_attempts=3
        )

    # Display results
    if result_df is not None:
        st.success("✅ Query successful!")

        # Show the agent's thinking process
        if show_thinking:
            with st.expander("🧠 Agent's Cognitive Process", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    if len(attempt_history) &amp;gt; 1:
                        if i &amp;lt; len(attempt_history) - 1:
                            st.error(f"❌ Attempt {attempt['attempt_number']} (Failed)")
                        else:
                            st.success(f"✅ Attempt {attempt['attempt_number']} (Success)")

                    col1, col2 = st.columns(2)

                    with col1:
                        st.markdown("**1️⃣ UNDERSTAND**")
                        st.info(attempt['understanding'])

                    with col2:
                        st.markdown("**2️⃣ PLAN**")
                        st.info(attempt['plan'])

                    st.markdown("**3️⃣ GENERATE SQL**")
                    st.code(attempt['sql'], language="sql")

                    if attempt['validation']['warnings']:
                        st.warning("Validation Warnings: " + ", ".join(attempt['validation']['warnings']))

                    if attempt['error']:
                        st.error(f"**Error:** {attempt['error']}")

                        if attempt['diagnosis']:
                            st.markdown("**🔍 Diagnosis:**")
                            st.warning(attempt['diagnosis'])

                    if i &amp;lt; len(attempt_history) - 1:
                        st.markdown("---")
                        st.markdown("↻ **Retrying with corrections...**")
                        st.markdown("---")

        # Display natural language response
        st.markdown("### 📝 Answer")
        with st.spinner("Generating explanation..."):
            explanation = format_natural_language_response(user_question, result_df, final_sql)
        st.markdown(explanation)

        # Display results table
        st.markdown("### 📊 Data")
        st.dataframe(result_df, use_container_width=True)

        # Show final SQL
        with st.expander("View SQL Query"):
            st.code(final_sql, language="sql")

        # Add to history
        st.session_state.query_history.append({
            "question": user_question,
            "sql": final_sql,
            "rows": len(result_df),
            "attempts": len(attempt_history),
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        })

    else:
        st.error("❌ Query failed after all retry attempts")

        if show_thinking:
            with st.expander("🧠 All Attempts", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    st.error(f"**Attempt {attempt['attempt_number']}**")
                    st.code(attempt['sql'], language="sql")
                    st.error(f"Error: {attempt['error']}")
                    if attempt['diagnosis']:
                        st.warning(f"Diagnosis: {attempt['diagnosis']}")
                    st.markdown("---")

# Query history
if st.session_state.query_history:
    st.markdown("---")
    st.header("📜 Query History")

    history_df = pd.DataFrame(st.session_state.query_history)
    st.dataframe(history_df, use_container_width=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Technical Implementation Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Power of Multi-Phase Reasoning
&lt;/h3&gt;

&lt;p&gt;The five-phase approach significantly improves query accuracy compared to single-shot generation. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase Separation Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understand&lt;/strong&gt;: Forces the model to explicitly identify required data before coding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan&lt;/strong&gt;: Creates a logical roadmap, reducing the chance of structural errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt;: Focuses solely on syntax, with context from previous phases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt;: Catches dangerous operations before execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt;: Provides real-world feedback for learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation of concerns mirrors how human SQL experts approach complex queries—they don't immediately write code; they think through the problem first.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Self-Correction Through Error Analysis
&lt;/h3&gt;

&lt;p&gt;The retry mechanism is effective because each retry supplies the agent with explicit context about the previous failure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error_context = f"""
PREVIOUS SQL ATTEMPT:
{previous_sql}

ERROR RECEIVED:
{previous_error}

IMPORTANT: Fix the specific error above.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This context allows the model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;See exactly what it tried before&lt;/li&gt;
&lt;li&gt;Understand why it failed&lt;/li&gt;
&lt;li&gt;Adjust its approach specifically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most text-to-SQL systems fail silently. Our agent learns and adapts.&lt;/p&gt;
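A minimal sketch of how that context can be assembled before regenerating. The function name and prompt wording here are illustrative, not the article's exact template:

```python
def build_retry_prompt(base_prompt, previous_sql=None, previous_error=None):
    """Prepend the failure context to the generation prompt on retries."""
    if previous_sql is None or previous_error is None:
        return base_prompt  # first attempt: nothing to learn from yet
    error_context = (
        f"PREVIOUS SQL ATTEMPT:\n{previous_sql}\n\n"
        f"ERROR RECEIVED:\n{previous_error}\n\n"
        "IMPORTANT: Fix the specific error above.\n\n"
    )
    return error_context + base_prompt

# First attempt passes through unchanged; retries carry the exact failure forward.
first = build_retry_prompt("Generate SQL for: top 5 salaries")
retry = build_retry_prompt(
    "Generate SQL for: top 5 salaries",
    previous_sql="SELECT nme FROM employees",
    previous_error="no such column: nme",
)
```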

&lt;h3&gt;
  
  
  3. Safety Through Validation
&lt;/h3&gt;

&lt;p&gt;The validation phase prevents dangerous operations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is critical for production systems where a malicious or misunderstood prompt could corrupt data. The agent is constrained to read-only operations, making it safe for business users.&lt;/p&gt;
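A sketch of how such a check might look, assuming the same keyword list. Matching on word boundaries avoids false positives on identifiers like `updated_at`:

```python
import re

dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE']

def is_safe_query(sql_query):
    """Reject any statement containing a write/DDL keyword as a whole word."""
    upper = sql_query.upper()
    return not any(re.search(rf"\b{kw}\b", upper) for kw in dangerous_keywords)
```

A keyword denylist is a coarse filter; running the connection in read-only mode is a sensible second layer of defense.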

&lt;h3&gt;
  
  
  4. Schema-Aware Query Generation
&lt;/h3&gt;

&lt;p&gt;By providing the complete schema to the agent at each phase, we eliminate a common failure mode: queries that reference non-existent tables or columns. The schema serves as ground truth that the agent must respect.&lt;/p&gt;
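For SQLite, that schema text can be assembled directly from the database's own metadata, so it never drifts from the ground truth; a minimal sketch (the output formatting is illustrative):

```python
import sqlite3

def get_schema_text(conn):
    """Describe every table as 'TABLE name: col type, ...' from SQLite metadata."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    lines = []
    for (table,) in cur.fetchall():
        cur.execute(f"PRAGMA table_info({table})")
        # table_info rows are (cid, name, type, notnull, dflt_value, pk)
        cols = ", ".join(f"{col[1]} {col[2]}" for col in cur.fetchall())
        lines.append(f"TABLE {table}: {cols}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
schema = get_schema_text(conn)
```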




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building a self-correcting database agent with Llama-4-Scout demonstrates the power of structured agentic reasoning. Unlike simple prompt-and-hope approaches, our five-phase cognitive framework—&lt;strong&gt;Understand → Plan → Generate → Validate → Execute&lt;/strong&gt;—creates a system that thinks through problems methodically and learns from mistakes.&lt;/p&gt;

&lt;p&gt;The key innovations in this implementation are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-phase reasoning&lt;/strong&gt;: Each phase has a specific cognitive responsibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent retry logic&lt;/strong&gt;: The agent learns from failures and adapts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety-first design&lt;/strong&gt;: Validation prevents destructive operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural language interface&lt;/strong&gt;: Business users can access data without SQL knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent reasoning&lt;/strong&gt;: Users can inspect the agent's thought process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach can be extended beyond SQL to other structured tasks: API query generation, configuration file creation, or any domain where precision matters and mistakes have consequences. The think-plan-execute-validate-retry pattern is broadly applicable to agentic systems.&lt;/p&gt;

&lt;p&gt;As LLMs continue to improve, we can expect even more sophisticated error recovery, better schema understanding, and more nuanced query optimization. The combination of powerful base models like Llama-4-Scout with well-designed cognitive frameworks opens up new possibilities for making complex technical systems accessible to everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to build your own database agent?&lt;/strong&gt; The complete code is provided in this article. Start with the sample database, test with the example questions, and then adapt it to your own data infrastructure. The future of data access is conversational, safe, and intelligent.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>agenticai</category>
      <category>python</category>
    </item>
    <item>
      <title>Student-Teacher Distillation: A Complete Guide for Model Compression</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Tue, 26 Aug 2025 18:48:58 +0000</pubDate>
      <link>https://dev.to/angu10/student-teacher-distillation-a-complete-guide-for-model-compression-37ed</link>
      <guid>https://dev.to/angu10/student-teacher-distillation-a-complete-guide-for-model-compression-37ed</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of our Deep Learning Model Optimization Series&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the rapidly evolving world of machine learning, deploying large, powerful models in production environments often presents significant challenges. Enter student-teacher distillation—a powerful technique that allows us to compress the knowledge of complex models into smaller, more efficient ones without sacrificing too much accuracy. This comprehensive guide will walk you through everything you need to know about this fascinating approach to model optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Student-Teacher Distillation?
&lt;/h2&gt;

&lt;p&gt;Student-teacher distillation, also known as knowledge distillation, is a model compression technique where a smaller "student" model learns to mimic the behavior of a larger, more complex "teacher" model. Think of it as an experienced professor (teacher) passing on their knowledge and wisdom to a bright student who can then apply that knowledge more efficiently.&lt;/p&gt;

&lt;p&gt;The core idea is beautifully simple: instead of training the student model from scratch on the original data, we train it to match the teacher's outputs. The teacher model provides "soft targets" or probability distributions that contain richer information than the hard labels in the original dataset. These soft targets capture the teacher's uncertainty and the relationships between different classes, which helps the student learn more nuanced decision boundaries.&lt;/p&gt;
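In the standard formulation, the student minimizes a weighted mix of a soft-target loss against the teacher's distribution and the usual hard-label loss; a dependency-free sketch, with the weighting `alpha` as an assumed hyperparameter:

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log q_i over probability vectors."""
    return -sum(p * math.log(q + eps) for p, q in zip(p_true, q_pred))

def distillation_loss(student_probs, teacher_probs, hard_label, alpha=0.7):
    """Blend the soft-target loss (vs. teacher) with the hard-label loss."""
    soft_loss = cross_entropy(teacher_probs, student_probs)
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_probs))]
    hard_loss = cross_entropy(one_hot, student_probs)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Student vs. teacher over four classes; the true label is class 0.
loss = distillation_loss(
    student_probs=[0.7, 0.2, 0.07, 0.03],
    teacher_probs=[0.8, 0.15, 0.03, 0.02],
    hard_label=0,
)
```

In practice, frameworks compute the soft term as a KL divergence over temperature-scaled logits, but the weighted-sum structure is the same.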

&lt;h3&gt;
  
  
  The Magic Behind Soft Targets
&lt;/h3&gt;

&lt;p&gt;When a teacher model predicts a class, it doesn't just output a binary decision—it provides a probability distribution across all possible classes. For example, when classifying an image of a dog, the teacher might output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dog: 0.8&lt;/li&gt;
&lt;li&gt;Wolf: 0.15&lt;/li&gt;
&lt;li&gt;Cat: 0.03&lt;/li&gt;
&lt;li&gt;Other: 0.02&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distribution tells us that while the model is confident it's a dog, it also sees some wolf-like features. This nuanced understanding, captured in the soft targets, helps the student model learn more effectively than just knowing "this is a dog."&lt;/p&gt;
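Those soft targets come from a temperature-scaled softmax over the teacher's logits; raising the temperature flattens the distribution and surfaces more of the relationships between runner-up classes. A small sketch, with logit values chosen to roughly reproduce the dog/wolf/cat example above:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 3.3, 1.7, 1.3]  # dog, wolf, cat, other (illustrative values)
p_sharp = softmax(logits, temperature=1.0)  # close to [0.80, 0.15, 0.03, 0.02]
p_soft = softmax(logits, temperature=4.0)   # wolf's share grows noticeably
```

The student is typically trained against the high-temperature targets, which carry more information per example than the one-hot label.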

&lt;h2&gt;
  
  
  Student-Teacher Distillation vs. Fine-Tuning: Understanding the Key Differences
&lt;/h2&gt;

&lt;p&gt;While both techniques involve training smaller models, they serve fundamentally different purposes and follow distinct approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fine-Tuning: Adapting Pre-trained Knowledge
&lt;/h3&gt;

&lt;p&gt;Fine-tuning takes a pre-trained model (often trained on a large, general dataset) and adapts it to a specific task or domain. You start with a model that already understands general patterns and then specialize it for your particular use case. It's like taking a general physician and having them specialize in cardiology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key characteristics of fine-tuning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starts with a pre-trained model of similar or identical size&lt;/li&gt;
&lt;li&gt;Adapts existing knowledge to new domains or tasks&lt;/li&gt;
&lt;li&gt;Typically involves training on task-specific data&lt;/li&gt;
&lt;li&gt;The final model size remains roughly the same&lt;/li&gt;
&lt;li&gt;Primary goal is task adaptation, not compression&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Student-Teacher Distillation: Knowledge Compression
&lt;/h3&gt;

&lt;p&gt;Distillation, on the other hand, is primarily about compression and efficiency. You're taking a large, complex model and teaching a smaller model to replicate its behavior. The student model learns to approximate the teacher's decision-making process within a more constrained architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key characteristics of distillation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a smaller model from a larger one&lt;/li&gt;
&lt;li&gt;Focuses on knowledge transfer and compression&lt;/li&gt;
&lt;li&gt;The student learns from the teacher's outputs, not just original data&lt;/li&gt;
&lt;li&gt;Significant reduction in model size and computational requirements&lt;/li&gt;
&lt;li&gt;Primary goal is efficiency while maintaining accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use Which Approach
&lt;/h3&gt;

&lt;p&gt;Choose &lt;strong&gt;fine-tuning&lt;/strong&gt; when you have a model that's already the right size for your deployment constraints, but you need to adapt it to a specific domain or task. Choose &lt;strong&gt;distillation&lt;/strong&gt; when you have a high-performing model that's too large or slow for your production requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Choose the Right Teacher Model
&lt;/h2&gt;

&lt;p&gt;Selecting an appropriate teacher model is crucial for successful distillation. The teacher sets the accuracy ceiling for your student, so this decision significantly impacts your final results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accuracy is King
&lt;/h3&gt;

&lt;p&gt;Your teacher model should excel at the target task. There's no point in distilling from a mediocre teacher—the student can only learn what the teacher knows. Look for models with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High accuracy on your target dataset&lt;/li&gt;
&lt;li&gt;Strong generalization capabilities&lt;/li&gt;
&lt;li&gt;Robust accuracy across different data distributions&lt;/li&gt;
&lt;li&gt;Well-calibrated confidence scores&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Considerations
&lt;/h3&gt;

&lt;p&gt;While the teacher doesn't need to share the same architecture as the student, some considerations matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity advantage&lt;/strong&gt;: The teacher should be significantly more complex than the student to justify the distillation process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task alignment&lt;/strong&gt;: Models designed for similar tasks often make better teachers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output compatibility&lt;/strong&gt;: Ensure the teacher's output format aligns with your distillation setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Factors
&lt;/h3&gt;

&lt;p&gt;Don't overlook practical constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Computational resources&lt;/strong&gt;: You need to be able to run inference on the teacher model during training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Licensing and availability&lt;/strong&gt;: Ensure you have access to the teacher model and can use it for your purposes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data compatibility&lt;/strong&gt;: The teacher should work well with your training data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multi-Teacher Approaches
&lt;/h3&gt;

&lt;p&gt;Consider using multiple teacher models when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different teachers excel at different aspects of the task&lt;/li&gt;
&lt;li&gt;You want to ensemble knowledge from various sources&lt;/li&gt;
&lt;li&gt;You're working with complex, multi-modal tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Selecting the Perfect Student Model
&lt;/h2&gt;

&lt;p&gt;Choosing the student model involves balancing accuracy goals with deployment constraints. This is where the art of distillation really shines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Size and Efficiency Targets
&lt;/h3&gt;

&lt;p&gt;Start by defining your deployment requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency constraints&lt;/strong&gt;: How fast must inference be?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory limitations&lt;/strong&gt;: What's your RAM/storage budget?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power consumption&lt;/strong&gt;: Are you deploying on mobile or edge devices?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput requirements&lt;/strong&gt;: How many predictions per second do you need?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architectural Choices
&lt;/h3&gt;

&lt;p&gt;The student architecture should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Appropriately sized&lt;/strong&gt;: Small enough to meet deployment constraints, large enough to capture essential patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Well-suited to the task&lt;/strong&gt;: Some architectures naturally excel at certain types of problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiently designed&lt;/strong&gt;: Modern efficient architectures like MobileNets, EfficientNets, or DistilBERT are often good starting points&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Goldilocks Principle
&lt;/h3&gt;

&lt;p&gt;Your student model size should be "just right":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too small&lt;/strong&gt;: The model lacks the capacity to learn the teacher's knowledge effectively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too large&lt;/strong&gt;: You lose the efficiency benefits and might as well use a larger model directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Just right&lt;/strong&gt;: Provides the best trade-off between accuracy and efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced Strategy: Progressive Distillation
&lt;/h3&gt;

&lt;p&gt;For very large compression ratios, consider progressive distillation as your student model selection strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with a large teacher&lt;/li&gt;
&lt;li&gt;Distill to a medium-sized intermediate model&lt;/li&gt;
&lt;li&gt;Use the intermediate model as a teacher for an even smaller student&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This stepped approach often yields better results than trying to compress directly from very large to very small models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and Cons of Student-Teacher Distillation
&lt;/h2&gt;

&lt;p&gt;Like any technique, distillation comes with its own set of advantages and limitations. Understanding these will help you make informed decisions about when and how to apply this approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Compelling Advantages
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Significant Model Compression&lt;/strong&gt;&lt;br&gt;
The most obvious benefit is the dramatic reduction in model size. You can often achieve 5-10x compression while retaining 90-95% of the original accuracy. This makes deployment feasible in resource-constrained environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster Inference&lt;/strong&gt;&lt;br&gt;
Smaller models mean faster predictions. This translates to better user experience, lower latency, and the ability to serve more requests with the same hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lower Computational Costs&lt;/strong&gt;&lt;br&gt;
Reduced model size means lower memory usage, less power consumption, and cheaper inference costs—especially important when serving millions of requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preserved Knowledge Quality&lt;/strong&gt;&lt;br&gt;
Unlike simple pruning or quantization, distillation preserves the nuanced decision-making patterns of the teacher model. The student learns not just what to predict, but how to think about the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Generalization&lt;/strong&gt;&lt;br&gt;
Soft targets from the teacher model often help students generalize better than training on hard labels alone. The teacher's uncertainty provides valuable regularization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility in Architecture&lt;/strong&gt;&lt;br&gt;
You can distill knowledge across different architectures, allowing you to optimize for specific deployment requirements while retaining accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Notable Limitations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Accuracy Ceiling&lt;/strong&gt;&lt;br&gt;
The student can rarely exceed the teacher's accuracy. You're fundamentally limited by the teacher's knowledge and capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training Complexity&lt;/strong&gt;&lt;br&gt;
Distillation requires careful hyperparameter tuning, temperature selection, and loss function balancing. It's more complex than standard training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Computational Overhead During Training&lt;/strong&gt;&lt;br&gt;
You need to run both teacher and student models during training, which can be computationally expensive and time-consuming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teacher Dependency&lt;/strong&gt;&lt;br&gt;
The quality of your distillation is fundamentally limited by your teacher model. A biased or inaccurate teacher will pass those flaws on to the student.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diminishing Returns&lt;/strong&gt;&lt;br&gt;
Very aggressive compression (e.g., 100x smaller) often leads to significant accuracy degradation. There are practical limits to how much you can compress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task-Specific Effectiveness&lt;/strong&gt;&lt;br&gt;
Distillation works better for some tasks than others. Classification tasks often see better results than generation tasks, for instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Distillation Shines
&lt;/h3&gt;

&lt;p&gt;Student-teacher distillation is particularly effective when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a high-accuracy large model that's too slow for production&lt;/li&gt;
&lt;li&gt;Deployment constraints (mobile, edge devices) require smaller models&lt;/li&gt;
&lt;li&gt;You need to serve high-volume requests efficiently&lt;/li&gt;
&lt;li&gt;The task has clear input-output relationships&lt;/li&gt;
&lt;li&gt;You have sufficient computational resources for training&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Consider Alternatives
&lt;/h3&gt;

&lt;p&gt;Consider other approaches when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your teacher model isn't significantly better than smaller alternatives&lt;/li&gt;
&lt;li&gt;Training time and computational costs outweigh deployment benefits&lt;/li&gt;
&lt;li&gt;You need the absolute best accuracy regardless of size&lt;/li&gt;
&lt;li&gt;Your deployment environment can accommodate larger models&lt;/li&gt;
&lt;li&gt;The task requires capabilities that are hard to distill (like complex reasoning)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;Student-teacher distillation represents a powerful tool in the modern ML practitioner's toolkit. As models continue to grow larger and more capable, the ability to efficiently compress and deploy them becomes increasingly valuable.&lt;/p&gt;

&lt;p&gt;In our next article, we'll dive into the practical implementation details, including code examples, loss function design, and training strategies that will help you implement your own distillation pipeline. We'll explore different distillation variants, advanced techniques like attention transfer, and share best practices learned from real-world deployments.&lt;/p&gt;

&lt;p&gt;The journey from understanding the theory to implementing effective distillation systems is both challenging and rewarding. With the foundation we've built here, you're well-equipped to start exploring this fascinating area of machine learning optimization.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Stay tuned for Part 2, where we'll get our hands dirty with implementation details and practical code examples that will bring these concepts to life.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Beyond the Competition: How Claude Sonnet 4, GPT-4o, and Gemini 2.5 Can Work Together in Agent Harmony</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 22 Jun 2025 17:51:30 +0000</pubDate>
      <link>https://dev.to/angu10/beyond-the-competition-how-claude-sonnet-4-gpt-4o-and-gemini-25-can-work-together-in-agent-3o6b</link>
      <guid>https://dev.to/angu10/beyond-the-competition-how-claude-sonnet-4-gpt-4o-and-gemini-25-can-work-together-in-agent-3o6b</guid>
      <description>&lt;p&gt;The AI landscape is often portrayed as a zero-sum game where models compete for dominance. But what if we shifted our perspective? Instead of choosing one model to rule them all, what if we leveraged the unique strengths of each model to create a more powerful, complementary system?&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how Claude Sonnet 4, OpenAI's GPT-4o, and Google's Gemini 2.5 can work together in an agentic architecture, creating a symphony of AI capabilities that's greater than the sum of its parts.&lt;/p&gt;
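One hypothetical way to wire this up is a routing table that sends each task category to the model best suited for it, based on the strengths discussed in this article. The category names and model labels below are illustrative placeholders, not real API identifiers:

```python
# Hypothetical routing table mapping task categories to model labels.
ROUTES = {
    "analysis": "claude-sonnet-4",   # methodical reasoning and review
    "multimodal": "gpt-4o",          # text, vision, and audio inputs
    "math": "gemini-2.5",            # scientific and mathematical work
}

def route_task(category, default="gpt-4o"):
    """Pick the model for a task category, falling back to a generalist."""
    return ROUTES.get(category, default)
```

A production router would classify the incoming request first (possibly with one of these same models) before dispatching, but the dictionary captures the core idea.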

&lt;h2&gt;
  
  
  Understanding Each Model's Unique Strengths
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Sonnet 4: The Thoughtful Analyst
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceptional reasoning and analysis capabilities&lt;/li&gt;
&lt;li&gt;Strong ethical reasoning and safety considerations&lt;/li&gt;
&lt;li&gt;Excellent at breaking down complex problems methodically&lt;/li&gt;
&lt;li&gt;Superior performance in structured thinking and logical reasoning&lt;/li&gt;
&lt;li&gt;Excellent at handling nuanced conversations and context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code review and analysis&lt;/li&gt;
&lt;li&gt;Complex problem decomposition&lt;/li&gt;
&lt;li&gt;Ethical decision-making processes&lt;/li&gt;
&lt;li&gt;Research and analysis tasks&lt;/li&gt;
&lt;li&gt;Long-form content creation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GPT-4o: The Versatile Performer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excellent multimodal capabilities (text, vision, audio)&lt;/li&gt;
&lt;li&gt;Strong creative writing and content generation&lt;/li&gt;
&lt;li&gt;Robust API ecosystem and integration options&lt;/li&gt;
&lt;li&gt;Consistent performance across diverse tasks&lt;/li&gt;
&lt;li&gt;Great at following specific formatting instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content generation and creative writing&lt;/li&gt;
&lt;li&gt;Multimodal processing tasks&lt;/li&gt;
&lt;li&gt;API integrations and automation&lt;/li&gt;
&lt;li&gt;Quick prototyping and ideation&lt;/li&gt;
&lt;li&gt;Image analysis and description&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gemini 2.5: The Technical Powerhouse
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceptional mathematical and scientific reasoning&lt;/li&gt;
&lt;li&gt;Strong coding capabilities and technical documentation&lt;/li&gt;
&lt;li&gt;Excellent at handling large contexts and complex data&lt;/li&gt;
&lt;li&gt;Superior performance in research and technical analysis&lt;/li&gt;
&lt;li&gt;Great integration with Google's ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scientific research and analysis&lt;/li&gt;
&lt;li&gt;Complex mathematical computations&lt;/li&gt;
&lt;li&gt;Technical documentation&lt;/li&gt;
&lt;li&gt;Data analysis and processing&lt;/li&gt;
&lt;li&gt;Integration with Google services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Complementary Architecture: Building a Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;Instead of choosing one model, let's design a system where each model handles what it does best. Here's how we can create a complementary agentic architecture:&lt;/p&gt;
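
&lt;p&gt;The division of labor can be captured in a small routing sketch. This is illustrative only: the task-category names and the fallback choice here are assumptions for demonstration, not part of any vendor API, and the full implementation follows below.&lt;/p&gt;

```python
# Illustrative task router: maps task categories to the model
# best suited to them, per the strengths discussed above.
# Category names are assumptions chosen for this example.
ROUTING_TABLE = {
    "analysis": "claude-sonnet-4",        # structured reasoning, review
    "ethics_review": "claude-sonnet-4",
    "content_generation": "gpt-4o",       # creative synthesis, multimodal
    "image_description": "gpt-4o",
    "technical_research": "gemini-2.5",   # math, large-context analysis
    "data_analysis": "gemini-2.5",
}

def route_task(task_type: str) -> str:
    """Return the model assigned to a task type.

    GPT-4o is used as the general-purpose fallback here, reflecting
    its consistency across diverse tasks (an assumption, not a rule).
    """
    return ROUTING_TABLE.get(task_type, "gpt-4o")
```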

&lt;h2&gt;
  
  
  Implementation: Python-Based Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;Let's build a practical example that demonstrates how these models can work together. We'll create a research assistant that leverages all three models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;

&lt;span class="c1"&gt;# Mock API clients - Here we're going to have only Mock API 
# but reader can replace with actual API implementations
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;CLAUDE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;GPT4O&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;GEMINI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_claude_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_gpt4o_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_gemini_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_claude_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize Claude client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Claude Sonnet 4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_gpt4o_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize GPT-4o client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GPT-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_gemini_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize Gemini client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemini 2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_expert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Orchestrates a comprehensive research process using all three models
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔍 Starting research on: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 1: Claude analyzes and breaks down the topic
&lt;/span&gt;        &lt;span class="n"&gt;analysis_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_analyze_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 2: Gemini conducts technical research
&lt;/span&gt;        &lt;span class="n"&gt;technical_research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gemini_technical_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 3: GPT-4o creates comprehensive content
&lt;/span&gt;        &lt;span class="n"&gt;final_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gpt4o_synthesize_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 4: Claude reviews and provides final insights
&lt;/span&gt;        &lt;span class="n"&gt;final_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_review_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_summary&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_review&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_analyze_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Thoughtful analysis and problem decomposition&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate Claude's analytical approach
&lt;/span&gt;        &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Analysis of &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:

        1. Core Components:
           - Primary research areas to explore
           - Key stakeholders and perspectives
           - Potential challenges and considerations

        2. Research Strategy:
           - Technical aspects requiring deep expertise
           - Creative elements for engaging presentation
           - Ethical considerations and implications

        3. Success Metrics:
           - Accuracy and depth of information
           - Clarity of presentation
           - Practical applicability
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning_steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;considerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gemini_technical_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gemini&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Deep technical research and data analysis&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate Gemini's technical research capabilities
&lt;/span&gt;        &lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Technical Research for &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:

        📊 Data Analysis:
        - Latest statistical trends and patterns
        - Mathematical models and algorithms
        - Scientific papers and research findings

        🔬 Technical Implementation:
        - Code examples and technical specifications
        - Performance benchmarks and comparisons
        - Integration possibilities and frameworks

        📈 Quantitative Insights:
        - Market data and growth projections
        - Technical performance metrics
        - Scalability considerations
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gpt4o_synthesize_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                       &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;GPT-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Creative synthesis and content generation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate GPT-4o's content creation capabilities
&lt;/span&gt;        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        # Comprehensive Guide to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        ## Executive Summary
        Based on our multi-faceted analysis, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; represents a significant 
        opportunity with both technical and strategic implications.

        ## Key Findings
        - Strategic insights from analytical review
        - Technical breakthroughs from research data
        - Implementation roadmap for practical application

        ## Creative Applications
        - Innovative use cases and scenarios
        - Engaging examples and case studies
        - Visual concepts and presentation ideas

        ## Actionable Recommendations
        1. Immediate next steps
        2. Long-term strategic planning
        3. Risk mitigation strategies
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sections&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_review_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Final review and quality assurance&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Quality Review:

        ✅ Strengths:
        - Comprehensive coverage of key topics
        - Well-structured and logical flow
        - Balanced technical and strategic perspectives

        🔧 Recommendations:
        - Consider adding more specific examples
        - Strengthen the conclusion with actionable insights
        - Ensure accessibility for diverse audiences

        📋 Final Assessment:
        Content meets high standards for accuracy, clarity, and usefulness.
        Ready for publication with minor enhancements.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review_criteria&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a summary of the collaborative process&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        🤝 Collaborative Research Summary:

        Models Involved: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Total Tasks: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Average Confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Process Flow:
        1. Claude provided analytical framework and strategic thinking
        2. Gemini delivered technical depth and data-driven insights
        3. GPT-4o synthesized information into engaging, actionable content
        4. Claude conducted final quality review and validation

        This complementary approach leveraged each model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s unique strengths
        to produce a more comprehensive and valuable outcome.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Advanced Use Case: Code Review Pipeline
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeReviewPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Multi-model code review process
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Claude: Logical analysis and architecture review
&lt;/span&gt;        &lt;span class="n"&gt;claude_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_code_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gemini: Technical optimization and performance analysis
&lt;/span&gt;        &lt;span class="n"&gt;gemini_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gemini_performance_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# GPT-4o: Documentation and improvement suggestions
&lt;/span&gt;        &lt;span class="n"&gt;gpt4o_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gpt4o_documentation_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logical_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;claude_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gemini_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gpt4o_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;combined_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_combined_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="n"&gt;claude_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gemini_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gpt4o_review&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_code_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude analyzes code logic and architecture&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_logic_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Logical structure is sound with clear separation of concerns...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gemini_performance_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gemini analyzes performance and optimization opportunities&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Performance bottlenecks identified in data processing loops...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimizations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complexity_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;7.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gpt4o_documentation_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;GPT-4o reviews documentation and suggests improvements&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Documentation coverage is 73% with opportunities for improvement...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.73&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;improvement_areas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_calculate_combined_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate a weighted combined score&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_logic_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;total_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.33&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total_score&lt;/span&gt;

&lt;span class="c1"&gt;# Usage Example
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Initialize the multi-agent system
&lt;/span&gt;    &lt;span class="n"&gt;research_assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;code_reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeReviewPipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Example 1: Research a complex topic
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Research Assistant Example ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;research_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;research_assistant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implementing Microservices Architecture with Event-Driven Design&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research completed with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Example 2: Code review process
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Code Review Example ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sample_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    def process_data(data_list):
        result = []
        for item in data_list:
            if item &amp;gt; 0:
                result.append(item * 2)
        return result
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;review_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;code_reviewer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code review completed with combined score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;combined_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Applications and Benefits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Content Creation Pipeline
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Analyzes the audience and creates a content strategy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Verifies technical accuracy and validates supporting data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Generates engaging, well-formatted content&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Software Development
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Architectural decisions and code logic review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Performance optimization and technical implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Documentation, testing strategies, and user interface design&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Research and Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Problem decomposition and critical thinking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Data analysis and scientific methodology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Report writing and presentation creation&lt;/li&gt;
&lt;/ul&gt;
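
&lt;p&gt;The three role mappings above boil down to a dispatch table. A minimal, self-contained sketch (the pipeline and stage names here are illustrative, not part of any real API):&lt;/p&gt;

```python
from enum import Enum

class ModelType(Enum):
    CLAUDE = "claude"
    GEMINI = "gemini"
    GPT4O = "gpt-4o"

# Illustrative role map mirroring the three pipelines described above
PIPELINE_ROLES = {
    "content_creation": {
        "strategy": ModelType.CLAUDE,       # audience analysis, content strategy
        "fact_checking": ModelType.GEMINI,  # technical accuracy, data validation
        "drafting": ModelType.GPT4O,        # engaging, well-formatted content
    },
    "software_development": {
        "architecture_review": ModelType.CLAUDE,
        "performance_tuning": ModelType.GEMINI,
        "documentation": ModelType.GPT4O,
    },
    "research_analysis": {
        "problem_decomposition": ModelType.CLAUDE,
        "data_analysis": ModelType.GEMINI,
        "report_writing": ModelType.GPT4O,
    },
}

def model_for(pipeline: str, stage: str) -> ModelType:
    """Look up which model owns a given stage of a pipeline."""
    return PIPELINE_ROLES[pipeline][stage]

print(model_for("software_development", "performance_tuning").value)  # gemini
```

&lt;p&gt;Keeping the mapping in data rather than code makes it easy to re-route a stage to a different model as capabilities or pricing change.&lt;/p&gt;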

&lt;h2&gt;
  
  
  Implementation Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Task Orchestration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskOrchestrator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_capabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mathematical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formatting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assign_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Intelligently assign tasks based on model strengths&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_capabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;  &lt;span class="c1"&gt;# Default fallback
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
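
&lt;p&gt;To see the routing in action, here is the orchestrator exercised end to end (re-declared so the snippet runs on its own; the &lt;code&gt;ModelType&lt;/code&gt; values are placeholders):&lt;/p&gt;

```python
from enum import Enum

class ModelType(Enum):
    CLAUDE = "claude"
    GEMINI = "gemini"
    GPT4O = "gpt-4o"

class TaskOrchestrator:
    def __init__(self):
        # First model whose capability list contains the task type wins;
        # GPT-4o acts as the catch-all.
        self.model_capabilities = {
            ModelType.CLAUDE: ["analysis", "reasoning", "review"],
            ModelType.GEMINI: ["technical", "mathematical", "research"],
            ModelType.GPT4O: ["creative", "synthesis", "formatting"],
        }

    def assign_task(self, task_type: str, content: str) -> ModelType:
        for model, capabilities in self.model_capabilities.items():
            if task_type in capabilities:
                return model
        return ModelType.GPT4O  # default fallback

orchestrator = TaskOrchestrator()
print(orchestrator.assign_task("reasoning", "why does X happen?"))  # ModelType.CLAUDE
print(orchestrator.assign_task("poetry", "write a haiku"))          # ModelType.GPT4O (fallback)
```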



&lt;h3&gt;
  
  
  2. Quality Assurance
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QualityAssurance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate results across multiple models&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;avg_confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;avg_confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consensus_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if models agree on key points&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Implementation would compare semantic similarity
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Simplified for example
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
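
&lt;p&gt;The &lt;code&gt;consensus_check&lt;/code&gt; above is left simplified. One concrete (if crude) way to fill it in is pairwise token-overlap similarity; this Jaccard-based sketch is my own assumption, not the implementation above, and a production system would use embedding-based semantic similarity instead:&lt;/p&gt;

```python
from itertools import combinations
from typing import List

def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two result strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta.intersection(tb)) / len(ta.union(tb))

def consensus_check(texts: List[str], threshold: float = 0.7) -> bool:
    """True when every pair of result texts clears the similarity threshold."""
    return all(
        jaccard_similarity(a, b) >= threshold
        for a, b in combinations(texts, 2)
    )

print(consensus_check(["event driven design", "event driven architecture design"], 0.5))  # True
print(consensus_check(["alpha beta", "gamma delta"]))  # False
```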



&lt;h3&gt;
  
  
  3. Cost Optimization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostOptimizer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_costs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# per 1k tokens
&lt;/span&gt;            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.012&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.018&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimize_task_assignment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Assign tasks to minimize cost while maximizing quality&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;assignments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Logic to assign based on cost-effectiveness
&lt;/span&gt;            &lt;span class="n"&gt;assignments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_best_model_for_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;assignments&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
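&lt;p&gt;The &lt;code&gt;_best_model_for_task&lt;/code&gt; helper above is left abstract. Here is one minimal sketch of a cost-aware chooser using the per-1k-token rates from the example; the quality scores and the minimum-quality rule are illustrative assumptions, not part of the original design:&lt;/p&gt;

```python
from enum import Enum

class ModelType(Enum):
    CLAUDE = "claude"
    GEMINI = "gemini"
    GPT4O = "gpt4o"

# Per-1k-token rates from the example above; quality scores are made-up placeholders
MODEL_COSTS = {ModelType.CLAUDE: 0.015, ModelType.GEMINI: 0.012, ModelType.GPT4O: 0.018}
QUALITY = {ModelType.CLAUDE: 0.90, ModelType.GEMINI: 0.85, ModelType.GPT4O: 0.92}

def best_model_for_task(task: str, min_quality: float = 0.85) -> ModelType:
    """Pick the cheapest model whose assumed quality score meets the floor."""
    candidates = [m for m in ModelType if QUALITY[m] >= min_quality]
    return min(candidates, key=lambda m: MODEL_COSTS[m])
```

&lt;p&gt;With these assumed scores, raising the quality floor shifts assignments from the cheapest model (Gemini) to Claude, which is the cheaper of the two higher-quality options.&lt;/p&gt;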



&lt;h2&gt;
  
  
  The Future of Complementary AI
&lt;/h2&gt;

&lt;p&gt;As AI models continue to evolve, the concept of complementary architectures becomes even more powerful. We're moving toward a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specialized Models&lt;/strong&gt;: Each model excels in specific domains&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Orchestration&lt;/strong&gt;: Systems automatically choose the best model for each task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Learning&lt;/strong&gt;: Models learn from each other's outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt;: Users don't need to know which model is handling their request&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The future of AI isn't about one model dominating all others — it's about creating intelligent systems that leverage the unique strengths of each model. By building complementary architectures with Claude Sonnet 4, GPT-4o, and Gemini 2.5, we can create more robust, accurate, and efficient AI solutions.&lt;/p&gt;

&lt;p&gt;The examples and code provided in this article demonstrate practical approaches to implementing these complementary systems. As you build your own multi-agent architectures, remember that the goal isn't to replace human intelligence but to augment it with the best that each AI model has to offer.&lt;/p&gt;

&lt;p&gt;Start small, experiment with different task assignments, and gradually build more sophisticated orchestration systems. The complementary approach not only provides better results but also creates more resilient and adaptable AI solutions for the future.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Code Less, Prompt Better: Unlocking Python's Built-in LLM Enhancers</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 16 May 2025 22:07:23 +0000</pubDate>
      <link>https://dev.to/angu10/code-less-prompt-better-unlocking-pythons-built-in-llm-enhancers-3hdd</link>
      <guid>https://dev.to/angu10/code-less-prompt-better-unlocking-pythons-built-in-llm-enhancers-3hdd</guid>
      <description>&lt;p&gt;In the rapidly evolving landscape of Large Language Models (LLMs), effective prompt engineering has become a crucial skill. While much attention is given to the art of crafting effective prompts, less focus has been placed on how to efficiently manage these prompts programmatically. Python, with its rich set of built-in features, offers powerful tools to dynamically construct, optimize, and manage LLM prompts.&lt;br&gt;
This article explores how Python's built-in features can transform your approach to LLM prompt engineering, making your code more efficient, maintainable, and powerful.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;1. Using locals() for Dynamic Context Injection&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When working with LLMs, we often need to inject contextual information into our prompts. The traditional approach involves manual string formatting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_response(user_name, user_query, previous_context):
    prompt = f"""
    User name: {user_name}
    User query: {user_query}
    Previous context: {previous_context}

    Please respond to the user's query considering the context above.
    """

    return call_llm_api(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for simple cases, but becomes unwieldy as the number of variables increases. It's also error-prone – you might forget to include a variable or update a variable name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution with locals()&lt;/strong&gt;&lt;br&gt;
Python's locals() function returns a dictionary containing all local variables in the current scope. We can leverage this to automatically include all relevant context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_response(user_name, user_query, previous_context, user_preferences=None, user_history=None):
    # All local variables are now accessible
    context_dict = locals()

    # Build a dynamic prompt section with all available context
    context_sections = []
    for key, value in context_dict.items():
        if value is not None:  # Only include non-None values
            context_sections.append(f"{key}: {value}")

    context_text = "\n".join(context_sections)

    prompt = f"""
    Context information:
    {context_text}

    Please respond to the user's query considering the context above.
    """

    return call_llm_api(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automatic variable inclusion: If you add a new parameter to your function, it's automatically included in the context.&lt;br&gt;
Reduced errors: No need to manually update string formatting when variables change.&lt;br&gt;
Cleaner code: Separates the mechanism of context injection from the specific variables.&lt;/p&gt;
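&lt;p&gt;To see the pattern end to end, here is a minimal runnable version with &lt;code&gt;call_llm_api&lt;/code&gt; stubbed out to echo the prompt (a placeholder, as in the article):&lt;/p&gt;

```python
def call_llm_api(prompt):
    # Echo stub standing in for a real LLM call
    return prompt

def generate_response(user_name, user_query, previous_context=None):
    # Snapshot parameters first, before any other locals are defined
    context_dict = locals()
    context_sections = [f"{key}: {value}" for key, value in context_dict.items()
                        if value is not None]
    context_text = "\n".join(context_sections)
    prompt = f"""Context information:
{context_text}

Please respond to the user's query considering the context above."""
    return call_llm_api(prompt)

print(generate_response("Ada", "How do Python dicts work?"))
```

&lt;p&gt;Because &lt;code&gt;locals()&lt;/code&gt; is called before any other variables are defined, the snapshot contains exactly the function's parameters, and the &lt;code&gt;None&lt;/code&gt;-valued &lt;code&gt;previous_context&lt;/code&gt; is filtered out.&lt;/p&gt;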
&lt;h2&gt;
  
  
  2. Using inspect for Function Documentation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When creating LLM prompts that involve function execution or code generation, providing accurate function documentation is crucial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_function_prompt(func_name, params):
    prompt = f"""
    Create a Python function named '{func_name}' with the following parameters:
    {params}
    """
    return prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach requires manually specifying function details, which can be tedious and error-prone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution with inspect&lt;/strong&gt;&lt;br&gt;
Python's inspect module allows us to extract rich metadata from functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import inspect

def create_function_prompt(func_reference):
    # Get the function signature
    signature = inspect.signature(func_reference)

    # Get the function docstring
    doc = inspect.getdoc(func_reference) or "No documentation available"

    # Get source code if available
    try:
        source = inspect.getsource(func_reference)
    except (OSError, TypeError):
        source = "Source code not available"

    prompt = f"""
    Function name: {func_reference.__name__}

    Signature: {signature}

    Documentation:
    {doc}

    Original source code:
    {source}

    Please create an improved version of this function.
    """

    return prompt

# Example usage
def example_func(a, b=10):
    """This function adds two numbers together."""
    return a + b

improved_function_prompt = create_function_prompt(example_func)
# Send to LLM for improvement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dynamically extracts all relevant information about the function, making the prompt much more informative.&lt;/p&gt;
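&lt;p&gt;A quick sanity check of what &lt;code&gt;inspect&lt;/code&gt; extracts from &lt;code&gt;example_func&lt;/code&gt;:&lt;/p&gt;

```python
import inspect

def example_func(a, b=10):
    """This function adds two numbers together."""
    return a + b

# The signature renders exactly as written in the def line
signature = inspect.signature(example_func)
print(signature)  # (a, b=10)

# getdoc() returns the cleaned docstring
print(inspect.getdoc(example_func))
```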

&lt;h2&gt;
  
  
  3. Context Management with Class Attributes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
Managing conversation history and context with LLMs often leads to repetitive code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversation_history = []

def chat_with_llm(user_input):
    # Manually build the prompt with history
    prompt = "Previous conversation:\n"
    for entry in conversation_history:
        prompt += f"{entry['role']}: {entry['content']}\n"

    prompt += f"User: {user_input}\n"
    prompt += "Assistant: "

    response = call_llm_api(prompt)

    # Update history
    conversation_history.append({"role": "User", "content": user_input})
    conversation_history.append({"role": "Assistant", "content": response})

    return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Solution with Class Attributes and &lt;code&gt;__dict__&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
We can create a conversation manager class that uses Python's object attributes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class ConversationManager:
    def __init__(self, system_prompt=None, max_history=10):
        self.history = []
        self.system_prompt = system_prompt
        self.max_history = max_history
        self.user_info = {}
        self.conversation_attributes = {
            "tone": "helpful",
            "style": "concise",
            "knowledge_level": "expert"
        }

    def add_user_info(self, **kwargs):
        """Add user-specific information to the conversation context."""
        self.user_info.update(kwargs)

    def set_attribute(self, key, value):
        """Set a conversation attribute."""
        self.conversation_attributes[key] = value

    def build_prompt(self, user_input):
        """Build a complete prompt using object attributes."""
        prompt_parts = []

        # Add system prompt if available
        if self.system_prompt:
            prompt_parts.append(f"System: {self.system_prompt}")

        # Add conversation attributes
        prompt_parts.append("Conversation attributes:")
        for key, value in self.conversation_attributes.items():
            prompt_parts.append(f"- {key}: {value}")

        # Add user info if available
        if self.user_info:
            prompt_parts.append("\nUser information:")
            for key, value in self.user_info.items():
                prompt_parts.append(f"- {key}: {value}")

        # Add conversation history
        if self.history:
            prompt_parts.append("\nConversation history:")
            for entry in self.history[-self.max_history:]:
                prompt_parts.append(f"{entry['role']}: {entry['content']}")

        # Add current user input
        prompt_parts.append(f"\nUser: {user_input}")
        prompt_parts.append("Assistant:")

        return "\n".join(prompt_parts)

    def chat(self, user_input):
        """Process a user message and get response from LLM."""
        prompt = self.build_prompt(user_input)

        response = call_llm_api(prompt)

        # Update history
        self.history.append({"role": "User", "content": user_input})
        self.history.append({"role": "Assistant", "content": response})

        return response

    def get_state_as_dict(self):
        """Return a dictionary of the conversation state using __dict__."""
        return self.__dict__

    def save_state(self, filename):
        """Save the conversation state to a file."""
        import json
        with open(filename, 'w') as f:
            json.dump(self.get_state_as_dict(), f)

    def load_state(self, filename):
        """Load the conversation state from a file."""
        import json
        with open(filename, 'r') as f:
            state = json.load(f)
            self.__dict__.update(state)


# Using this approach:

# Create a conversation manager
convo = ConversationManager(system_prompt="You are a helpful assistant.")

# Add user information
convo.add_user_info(name="John", expertise="beginner", interests=["Python", "AI"])

# Set conversation attributes
convo.set_attribute("tone", "friendly")

# Chat with the LLM
response = convo.chat("Can you help me understand how Python dictionaries work?")
print(response)

# Later, save the conversation state
convo.save_state("conversation_backup.json")

# And load it back
new_convo = ConversationManager()
new_convo.load_state("conversation_backup.json")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;4. Using dir() for Object Exploration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When working with complex objects or APIs, it can be challenging to know what data is available to include in prompts:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def generate_data_analysis_prompt(dataset):
    # Manually specifying what we think is available
    prompt = f"""
    Dataset name: {dataset.name}
    Number of rows: {len(dataset)}

    Please analyze this dataset.
    """
    return prompt

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Solution with dir()&lt;/strong&gt;&lt;br&gt;
Python's dir() function lets us dynamically discover object attributes and methods:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
def generate_data_analysis_prompt(dataset):
    # Discover available attributes
    attributes = dir(dataset)

    # Filter out private attributes (those starting with _)
    public_attrs = [attr for attr in attributes if not attr.startswith('_')]

    # Build metadata section
    metadata = []
    for attr in public_attrs:
        try:
            value = getattr(dataset, attr)
            # Only include non-method attributes with simple values
            if not callable(value) and not hasattr(value, '__dict__'):
                metadata.append(f"{attr}: {value}")
        except Exception:
            pass  # Skip attributes that can't be accessed

    metadata_text = "\n".join(metadata)

    prompt = f"""
    Dataset metadata:
    {metadata_text}

    Please analyze this dataset based on the metadata above.
    """

    return prompt


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This approach automatically discovers and includes relevant metadata without requiring us to know the exact structure of the dataset object in advance.&lt;/p&gt;
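&lt;p&gt;Here is the same discovery logic run against a toy dataset object (&lt;code&gt;ToyDataset&lt;/code&gt; is a hypothetical stand-in; real dataset classes will expose different attributes):&lt;/p&gt;

```python
class ToyDataset:
    def __init__(self):
        self.name = "trial_results"
        self.num_rows = 120
        self.source = "synthetic"

    def summary(self):
        return "..."

dataset = ToyDataset()
metadata = []
for attr in dir(dataset):
    # Skip private attributes and methods, keep simple data attributes
    if attr.startswith("_"):
        continue
    value = getattr(dataset, attr)
    if not callable(value):
        metadata.append(f"{attr}: {value}")

print("\n".join(metadata))
```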

&lt;h2&gt;
  
  
  &lt;strong&gt;5. String Manipulation for Prompt Cleaning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
User inputs and other text data often contain formatting issues that can affect LLM performance:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def process_document(document_text):
    prompt = f"""
    Document:
    {document_text}

    Please summarize the key points from this document.
    """
    return call_llm_api(prompt)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Solution with String Methods&lt;/strong&gt;&lt;br&gt;
Python's rich set of string manipulation methods can clean and normalize text:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def process_document(document_text):
    # Remove excessive whitespace
    cleaned_text = ' '.join(document_text.split())

    # Normalize line breaks
    cleaned_text = cleaned_text.replace('\r\n', '\n').replace('\r', '\n')

    # Limit length (many LLMs have token limits)
    max_chars = 5000
    if len(cleaned_text) &amp;gt; max_chars:
        cleaned_text = cleaned_text[:max_chars] + "... [truncated]"

    # Replace problematic characters
    for char, replacement in [('\u2018', "'"), ('\u2019', "'"), ('\u201c', '"'), ('\u201d', '"')]:
        cleaned_text = cleaned_text.replace(char, replacement)

    prompt = f"""
    Document:
    {cleaned_text}

    Please summarize the key points from this document.
    """

    return call_llm_api(prompt)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
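&lt;p&gt;A quick demonstration of the normalization steps on a small messy string (the sample text is made up for illustration):&lt;/p&gt;

```python
raw = "This  is\r\na \u201cmessy\u201d   document\u2019s text"

# Collapse all whitespace runs (including \r\n line breaks) to single spaces
cleaned = " ".join(raw.split())

# Replace curly quotes with their ASCII equivalents
for char, replacement in [("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')]:
    cleaned = cleaned.replace(char, replacement)

print(cleaned)  # This is a "messy" document's text
```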



&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Python's built-in features offer powerful capabilities for enhancing LLM prompts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Context:&lt;/strong&gt; Using locals() and &lt;code&gt;__dict__&lt;/code&gt; to automatically include relevant variables&lt;br&gt;
&lt;strong&gt;Introspection:&lt;/strong&gt; Using inspect and dir() to extract rich metadata from objects and functions&lt;br&gt;
&lt;strong&gt;String Manipulation:&lt;/strong&gt; Using Python's string methods to clean and normalize text&lt;/p&gt;

&lt;p&gt;By leveraging these built-in features, you can create more robust, maintainable, and dynamic LLM interactions. The techniques in this article can help you move beyond static prompt templates to create truly adaptive and context-aware LLM applications.&lt;br&gt;
Most importantly, these approaches scale well as your LLM applications become more complex, allowing you to maintain clean, readable code while supporting sophisticated prompt engineering techniques.&lt;br&gt;
Whether you're building a simple chatbot or a complex AI assistant, Python's built-in features can help you create more effective LLM interactions with less code and fewer errors.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI in the Clinical Arena: Llama 4 Scout vs Claude 3.7 Statistical Showdown</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 11 Apr 2025 06:04:22 +0000</pubDate>
      <link>https://dev.to/angu10/ai-in-the-clinical-arena-llama-4-scout-vs-claude-37-statistical-showdown-2m43</link>
      <guid>https://dev.to/angu10/ai-in-the-clinical-arena-llama-4-scout-vs-claude-37-statistical-showdown-2m43</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As artificial intelligence advances, there is growing interest in evaluating how different AI models perform in specialized domains like clinical trial statistics. This article compares two state-of-the-art large language models — Llama 4 Scout Reasoning and Claude 3.7 — on their ability to solve common statistical problems in clinical trials. It’s important to emphasize that this study examines only a limited set of three clinical trial problems and should not be interpreted as a comprehensive assessment of these models’ overall capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Instruct Model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce1akyftlyw2brixi709.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce1akyftlyw2brixi709.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73vwb2fqhx2ryuehf2wz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73vwb2fqhx2ryuehf2wz.png" alt="Image description" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5le9dn9vslq0l3hov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5le9dn9vslq0l3hov.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgr8ystkgqldyvd68f0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgr8ystkgqldyvd68f0k.png" alt="Image description" width="800" height="1065"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08rsnioau33t3dmg3a1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08rsnioau33t3dmg3a1b.png" alt="Image description" width="786" height="828"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiac30d0svr17bjqmpr0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiac30d0svr17bjqmpr0i.png" alt="Image description" width="796" height="980"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Selection
&lt;/h2&gt;

&lt;p&gt;Three foundational clinical trial statistical problems were selected to evaluate the models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treatment Effect Analysis&lt;/strong&gt;: Calculating response rates, absolute risk reduction (ARR), and number needed to treat (NNT) in a cancer treatment study comparing experimental and control arms&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-inferiority Trial Design&lt;/strong&gt;: Determining the minimum cure rate required for a new antibiotic to be considered non-inferior to the standard of care&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interim Analysis Decision-Making&lt;/strong&gt;: Applying O’Brien-Fleming boundaries to decide whether to stop a trial early based on interim results&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation Criteria
&lt;/h2&gt;

&lt;p&gt;The outputs from both models were compared across several dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mathematical accuracy&lt;/li&gt;
&lt;li&gt;Statistical reasoning approach&lt;/li&gt;
&lt;li&gt;Clarity of explanation&lt;/li&gt;
&lt;li&gt;Contextual understanding&lt;/li&gt;
&lt;li&gt;Presentation format&lt;/li&gt;
&lt;li&gt;Result interpretation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Detailed Findings
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Mathematical Precision
&lt;/h2&gt;

&lt;p&gt;Both models demonstrated excellent mathematical precision, arriving at identical numerical answers for all three problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Problem 1, both correctly calculated the response rates (55.6% vs 44.4%), ARR (11.2%), and NNT (9)&lt;/li&gt;
&lt;li&gt;In Problem 2, both determined the minimum acceptable cure rate to be 70%&lt;/li&gt;
&lt;li&gt;In Problem 3, both correctly concluded that the trial should not be stopped based on the interim analysis&lt;/li&gt;
&lt;/ul&gt;
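&lt;p&gt;These figures are straightforward to verify with a few lines of Python (the response rates are taken as given; rounding NNT up to the next whole patient is the usual convention):&lt;/p&gt;

```python
import math

# Problem 1: treatment effect
experimental_rate = 0.556
control_rate = 0.444
arr = experimental_rate - control_rate  # absolute risk reduction
nnt = math.ceil(1 / arr)                # number needed to treat, rounded up
print(f"ARR = {arr:.1%}, NNT = {nnt}")

# Problem 3: O'Brien-Fleming interim decision
p_value = 0.01
boundary = 0.0001
# Stop early only if the p-value falls at or below the boundary
stop_early = boundary >= p_value
print("Stop trial early?", stop_early)
```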

&lt;h2&gt;
  
  
  Approach to Statistical Reasoning
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;p&gt;Llama 4 Scout approached problems with a highly structured methodology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Began by clearly organizing given information in bulleted lists&lt;/li&gt;
&lt;li&gt;Used explicit section headings to demarcate reasoning steps&lt;/li&gt;
&lt;li&gt;Provided direct formula applications with clear variable substitutions&lt;/li&gt;
&lt;li&gt;Included practical interpretations of the final statistical outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;p&gt;Claude 3.7 demonstrated a more narrative reasoning style:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used numbered steps with detailed explanations before formula application&lt;/li&gt;
&lt;li&gt;Provided more extensive context about the statistical principles being applied&lt;/li&gt;
&lt;li&gt;Explained the reasoning behind formula selection&lt;/li&gt;
&lt;li&gt;Included additional interpretation of why certain approaches were appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Explanation Quality
&lt;/h2&gt;

&lt;p&gt;The models differed somewhat in their explanatory approaches:&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Provided concise but complete explanations&lt;/li&gt;
&lt;li&gt;Excellent at clarifying practical implications (e.g., “This means we would need to treat 9 patients with the experimental treatment instead of the control treatment to achieve one additional response”)&lt;/li&gt;
&lt;li&gt;Included additional context about threshold interpretations&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Explicit about Type I error control in the interim analysis problem&lt;/p&gt;
&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Offered more detailed contextual explanations of statistical concepts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provided more extensive rationale for calculation approaches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explained why certain statistical formulas were appropriate in each context&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Included more discussion of underlying statistical principles (e.g., family-wise error rate)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Visual Presentation
&lt;/h2&gt;

&lt;p&gt;The models showed distinct presentation styles:&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More visually structured with effective use of bulleted lists&lt;/li&gt;
&lt;li&gt;Clear section labeling and organization&lt;/li&gt;
&lt;li&gt;Highlighted key information more prominently&lt;/li&gt;
&lt;li&gt;More consistent formatting throughout responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More narrative in style with detailed steps&lt;/li&gt;
&lt;li&gt;Used numbered steps with explanatory headings&lt;/li&gt;
&lt;li&gt;Integrated formulas within explanatory text&lt;/li&gt;
&lt;li&gt;Maintained professional presentation but with less visual structuring&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complementary Strengths&lt;/strong&gt;&lt;br&gt;
This comparative analysis revealed that both models possess strong capabilities in clinical trial statistical reasoning but with slightly different emphases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 4 Scout Reasoning&lt;/strong&gt; excelled in structural organization, practical interpretation, and concise presentation of statistical concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude 3.7&lt;/strong&gt; demonstrated particular strength in contextual explanation, statistical rationale, and discussion of underlying principles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical Implications
&lt;/h2&gt;

&lt;p&gt;For professionals working in clinical trial statistics, the choice between these models might depend on specific needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For quick statistical calculations with a clear structure: Llama 4 Scout Reasoning’s concise and well-organized approach might be preferred.&lt;/li&gt;
&lt;li&gt;For a deeper understanding of statistical principles: Claude 3.7’s more detailed explanations might offer greater educational value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, the mathematical precision demonstrated by both models suggests that either would be reliable for calculating key clinical trial statistics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Comparing Llama 4 Scout Reasoning vs Claude 3.7 in Clinical Trial Problem Solving
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Llama 4 Scout Reasoning&lt;/th&gt;
&lt;th&gt;Claude 3.7&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Information Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Used bulleted lists to highlight given information&lt;/td&gt;
&lt;td&gt;Presented information in numbered steps with headings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear section headings with step numbers&lt;/td&gt;
&lt;td&gt;Numbered steps with detailed headings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 1: Response Rates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experimental arm rate&lt;/td&gt;
&lt;td&gt;55.6%&lt;/td&gt;
&lt;td&gt;55.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control arm rate&lt;/td&gt;
&lt;td&gt;44.4%&lt;/td&gt;
&lt;td&gt;44.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARR calculation&lt;/td&gt;
&lt;td&gt;0.556 - 0.444 = 0.112 = 11.2%&lt;/td&gt;
&lt;td&gt;55.6% - 44.4% = 11.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NNT calculation&lt;/td&gt;
&lt;td&gt;1/0.112 = 8.93 → 9&lt;/td&gt;
&lt;td&gt;1/0.112 = 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional context&lt;/td&gt;
&lt;td&gt;Explained meaning of NNT (9 patients for one additional response)&lt;/td&gt;
&lt;td&gt;Provided explanation of why ARR used experimental minus control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 2: Non-inferiority Trial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concept explanation&lt;/td&gt;
&lt;td&gt;Concise explanation of non-inferiority concept&lt;/td&gt;
&lt;td&gt;More detailed explanation of non-inferiority margin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calculation approach&lt;/td&gt;
&lt;td&gt;Direct formula application&lt;/td&gt;
&lt;td&gt;Step-by-step explanation with formula justification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final answer&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional explanation&lt;/td&gt;
&lt;td&gt;Added what happens if cure rate is below/above threshold&lt;/td&gt;
&lt;td&gt;Included context about the meaning of non-inferiority margin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 3: O'Brien-Fleming Boundaries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision framework&lt;/td&gt;
&lt;td&gt;Clear comparison of p-value to boundary&lt;/td&gt;
&lt;td&gt;Detailed explanation of boundary concept&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision logic&lt;/td&gt;
&lt;td&gt;p-value (0.01) &amp;gt; boundary (0.0001) → don't stop&lt;/td&gt;
&lt;td&gt;Same conclusion with more contextual explanation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional explanation&lt;/td&gt;
&lt;td&gt;Included explanation of Type I error control&lt;/td&gt;
&lt;td&gt;Discussed family-wise error rate control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall Characteristics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formatting style&lt;/td&gt;
&lt;td&gt;More visually structured with bulleted lists&lt;/td&gt;
&lt;td&gt;More narrative with detailed steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mathematical accuracy&lt;/td&gt;
&lt;td&gt;Identical answers across all problems&lt;/td&gt;
&lt;td&gt;Identical answers across all problems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Result interpretation&lt;/td&gt;
&lt;td&gt;More explicit interpretation of final results&lt;/td&gt;
&lt;td&gt;More context on the statistical principles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explanation depth&lt;/td&gt;
&lt;td&gt;Concise but complete&lt;/td&gt;
&lt;td&gt;More detailed statistical context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
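&lt;p&gt;The Problem 1 arithmetic in the table above is easy to verify yourself. Here is a minimal Python sketch, using the response rates from the table, that reproduces the ARR and NNT values both models arrived at:&lt;/p&gt;

```python
import math

# Response rates from the comparison table above
experimental_rate = 0.556  # 55.6%
control_rate = 0.444       # 44.4%

# Absolute risk reduction (ARR): experimental minus control
arr = experimental_rate - control_rate  # 0.112, i.e. 11.2%

# Number needed to treat (NNT): 1 / ARR, rounded up to a whole patient
nnt = math.ceil(1 / arr)  # 8.93 rounds up to 9

print(f"ARR = {arr:.1%}, NNT = {nnt}")
```

&lt;p&gt;Both models report the same 11.2% and 9, matching this calculation.&lt;/p&gt;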

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This limited comparison suggests that both Llama 4 Scout Reasoning and Claude 3.7 demonstrate strong capabilities in solving clinical trial statistical problems. However, Llama 4 Scout is open-source, and fine-tuning it on your own data could make it even more powerful for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s worth emphasizing that this analysis is based on only three specific problems and should not be extrapolated to represent overall model capabilities across the broad and complex domain of clinical trial statistics. A more comprehensive evaluation would require testing across a broader range of problem types, complexity levels, and specialized statistical methods used in clinical trials.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llama</category>
      <category>ai</category>
      <category>clinical</category>
      <category>llm</category>
    </item>
    <item>
      <title>Document Whisperer: Llama-4-Scout and the Future of Intelligent Content Extraction</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 06 Apr 2025 05:28:55 +0000</pubDate>
      <link>https://dev.to/angu10/document-whisperer-llama-4-scout-and-the-future-of-intelligent-content-extraction-1g0k</link>
      <guid>https://dev.to/angu10/document-whisperer-llama-4-scout-and-the-future-of-intelligent-content-extraction-1g0k</guid>
      <description>&lt;p&gt;In today's data-driven world, the ability to quickly extract insights from documents is becoming increasingly valuable. Whether you're a researcher sifting through academic papers, a legal professional analyzing contracts, or a business analyst reviewing reports, the challenge of efficiently navigating through lengthy documents remains universal.&lt;/p&gt;

&lt;p&gt;Enter Llama-4-Scout, Meta's latest language model that's showing remarkable capabilities in document question-answering tasks. Released on April 5, 2025, Scout is a groundbreaking addition to the new Llama 4 family, specifically designed to be more reliable, precise, and effective at finding and extracting information from documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Llama-4-Scout Special?
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout utilizes a Mixture-of-Experts (MoE) architecture with 17 billion active parameters (out of 109 billion total parameters across 16 experts) that strikes an optimal balance between performance and efficiency. What truly sets Scout apart is its unprecedented 10-million-token context window, making it best in class for long-context tasks like summarizing huge codebases and analyzing extensive documents.&lt;/p&gt;

&lt;p&gt;Scout was pretrained on approximately 40 trillion tokens of multimodal data and excels at:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document comprehension&lt;/strong&gt;: Understanding context across extremely long documents&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information retrieval&lt;/strong&gt;: Precisely locating specific details in complex content&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source attribution&lt;/strong&gt;: Providing clear references to where information was found&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal reasoning&lt;/strong&gt;: Visual recognition, image reasoning, captioning, and answering questions about document visuals&lt;/p&gt;

&lt;p&gt;This focused design makes Scout particularly well-suited for professional document analysis tasks that require both accuracy and reasonable computational resources. Remarkably, the model fits on a single H100 GPU when quantized to Int4, making it accessible for organizations without massive computing infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Example: Document Q&amp;amp;A on the AGENTS Framework
&lt;/h2&gt;

&lt;p&gt;I recently had the opportunity to test Llama-4-Scout's capabilities by building a simple PDF Q&amp;amp;A application with a research paper on the AGENTS framework (a system for autonomous language agents). The results demonstrated Scout's remarkable ability to extract, understand, and contextualize information from technical documents. Here's what I observed:&lt;/p&gt;

&lt;h2&gt;
  
  
  Source Paper: &lt;a href="https://arxiv.org/pdf/2309.07870" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2309.07870&lt;/a&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Example 1: Identifying Core Framework Components
&lt;/h2&gt;

&lt;p&gt;When asked about the main features of the framework described in the document, Scout successfully extracted and summarized the key components:&lt;/p&gt;

&lt;p&gt;Query: "What are the main features of the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The main features of the AGENTS framework include planning, memory (specifically long-short term memory implementation), tool usage capabilities, multi-agent communication protocols, and fine-grained symbolic control through SOPs (Standard Operating Procedures). Source: Page 4, Framework Architecture section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2: Understanding Technical Implementation Details
&lt;/h2&gt;

&lt;p&gt;Scout excelled at extracting specific technical implementation details related to memory systems:&lt;/p&gt;

&lt;p&gt;Query: "How does long-short term memory work in AGENTS?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The AGENTS framework implements a dual-memory system where long-term memories are constructed as action histories embedded by sentence-transformers and stored in VectorDB for semantic search retrieval. In contrast, short-term memories (working memories) are maintained in natural language form and dynamically updated by an LLM through a carefully tuned prompt system. Source: Pages 7-8, Memory Architecture section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 3: Explaining Human-Agent Interaction Mechanisms
&lt;/h2&gt;

&lt;p&gt;The model demonstrated its ability to identify specific configuration details that enable key functionalities:&lt;/p&gt;

&lt;p&gt;Query: "How does the AGENTS framework support human-agent interaction?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The AGENTS framework supports human-agent interaction through a simple but effective configuration mechanism. Users can change an agent's 'is_human' field to 'True' in the config file, which enables humans to take the role of an agent and interact with other language agents in both single-agent and multi-agent scenarios. Source: Page 12, Human-Agent Interaction section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 4: Clarifying Domain-Specific Terminology
&lt;/h2&gt;

&lt;p&gt;When asked about specialized terminology in the document, Scout provided clear and accurate explanations:&lt;/p&gt;

&lt;p&gt;Query: "What is SOP in the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "In the AGENTS framework, SOP (Standard Operating Procedures) is a symbolic plan that provides fine-grained control of agents' behaviors. Technically, it's implemented as a graph of multiple states that define different situations an agent may encounter, with explicit transition rules between these states. This architecture makes agents' behavior more stable and predictable across various scenarios. Source: Page 9, Agent Control Mechanisms section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 5: Identifying Ecosystem Components
&lt;/h2&gt;

&lt;p&gt;Scout successfully extracted information about the broader ecosystem supporting the framework:&lt;/p&gt;

&lt;p&gt;Query: "What is the Agent Hub in the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The Agent Hub is a platform component of the AGENTS framework that enables users to share their fine-tuned language agents and search/download useful language agents created by others. This significantly reduces the effort required for designing, testing, and tuning language agents from scratch, creating a collaborative ecosystem for agent development. Source: Page 15, Ecosystem and Tools section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own Document Intelligence Tool
&lt;/h2&gt;

&lt;p&gt;The ability to build custom applications around Llama-4-Scout is straightforward through providers like Together.ai. I created a simple Streamlit application that allows users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload any PDF document&lt;/li&gt;
&lt;li&gt;Process both the text and visual content&lt;/li&gt;
&lt;li&gt;Ask questions in natural language&lt;/li&gt;
&lt;li&gt;Receive answers with source references&lt;/li&gt;
&lt;li&gt;Evaluate answer quality with multiple criteria&lt;/li&gt;
&lt;/ul&gt;
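&lt;p&gt;The core of such an application is a single question-answering call. Below is a stdlib-only Python sketch of that call; note that the endpoint URL and model name shown here are illustrative placeholders for a Together.ai-style OpenAI-compatible API, not details confirmed in this article:&lt;/p&gt;

```python
import json
import urllib.request

# Assumed OpenAI-compatible chat endpoint (illustrative, not confirmed here)
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_prompt(document_text, question):
    """Combine the extracted PDF text with the user's question, asking the
    model to cite where it found the answer (the 'Source: Page N' style
    shown in the examples above)."""
    return (
        "Answer using only the document below, and cite the page or "
        "section where you found the answer.\n\n"
        "Document:\n" + document_text + "\n\nQuestion: " + question
    )

def ask_scout(document_text, question, api_key):
    payload = {
        "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # illustrative name
        "messages": [
            {"role": "user", "content": build_prompt(document_text, question)}
        ],
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

&lt;p&gt;A Streamlit front end then only needs to handle the PDF upload, extract its text, and display the answer returned by this function.&lt;/p&gt;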

&lt;p&gt;This type of application can transform how teams work with documents, significantly reducing the time spent searching for information and improving the accuracy of extracted insights. Because Scout is "optimized for visual recognition, image reasoning, captioning, and answering general questions about an image," it can also work with charts, tables, diagrams, and other visual elements that are often critical for full document understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Capabilities and Performance
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout demonstrates impressive performance relative to competing models. In comparative evaluations, Scout has shown "superior performance relative to contemporary models such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across recognized benchmark datasets."&lt;/p&gt;

&lt;p&gt;What makes Scout particularly practical is its efficiency. Scout "fits on a single H100 GPU when quantized to Int4" while still delivering high-quality results. This efficiency means organizations can implement advanced document intelligence without requiring massive computational resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead: The Future of Document Intelligence
&lt;/h2&gt;

&lt;p&gt;As models like Llama-4-Scout continue to evolve, we can expect even more sophisticated document intelligence capabilities. Future developments will likely include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deeper reasoning across multiple documents&lt;/li&gt;
&lt;li&gt;More nuanced understanding of domain-specific content&lt;/li&gt;
&lt;li&gt;Better handling of ambiguity and uncertain information&lt;/li&gt;
&lt;li&gt;Enhanced multimodal capabilities for complex visual content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout represents a significant step forward in making advanced document intelligence accessible. Its balanced approach to performance and efficiency makes it particularly valuable for professional applications where accuracy and attribution matter.&lt;/p&gt;

&lt;p&gt;For organizations dealing with large volumes of documents, investing in tools built around models like Scout could yield substantial returns through improved information accessibility and insight generation. The model's ability to "process and work with extremely lengthy documents" makes it ideal for enterprises with extensive documentation needs.&lt;/p&gt;

&lt;p&gt;Have you experimented with Llama-4-Scout or similar models for document analysis? I'd love to hear about your experiences and applications in the comments below.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The examples provided are based on actual testing of Llama-4-Scout through Together.ai's API integration. Results may vary depending on document complexity and specific implementation details.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenAI - Gibili Portrait Assistance: AI-Powered Image Generation Made Simple</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Mon, 31 Mar 2025 17:50:22 +0000</pubDate>
      <link>https://dev.to/angu10/openai-gibili-portrait-assistance-ai-powered-image-generation-made-simple-285e</link>
      <guid>https://dev.to/angu10/openai-gibili-portrait-assistance-ai-powered-image-generation-made-simple-285e</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Ever wished you could create stunning portraits with just a few clicks? Meet Gibili Portrait Assistance, an AI-powered tool that makes generating high-quality portraits effortless. Whether you’re an artist, designer, or simply someone who loves experimenting with AI, Gibili can help bring your ideas to life.&lt;/p&gt;

&lt;p&gt;In this post, we’ll walk you through how to use Gibili Portrait Assistance and explore the OpenAI architecture behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use Gibili Portrait Assistance
&lt;/h2&gt;

&lt;p&gt;Using Gibili is straightforward and requires no prior technical knowledge. Here’s a simple step-by-step guide:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Enter Your Description or Upload an Image&lt;/strong&gt;&lt;br&gt;
You can either type a text description of the portrait you want or upload an existing image to be enhanced or transformed by AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Prompt Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“A realistic portrait of a woman with curly brown hair, wearing a red scarf, in a cinematic lighting style.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image Upload:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you have an image you want to modify or enhance, simply upload it, and Gibili will apply AI-powered enhancements or transformations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Customize Your Preferences&lt;/strong&gt;&lt;br&gt;
You can fine-tune details such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Art Style&lt;/strong&gt;: Realistic, digital painting, anime, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background&lt;/strong&gt;: Solid color, blurred, natural scenery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facial Expressions&lt;/strong&gt;: Smiling, neutral, surprised.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additional Features&lt;/strong&gt;: Glasses, hats, jewelry, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Generate the Image&lt;/strong&gt;&lt;br&gt;
Press Enter, and within seconds, Gibili will produce a high-resolution portrait based on your input or uploaded image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Refine and Download&lt;/strong&gt;&lt;br&gt;
If you want adjustments, you can tweak your input and regenerate until you’re satisfied. Once ready, download your portrait in high-quality format.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OpenAI Architecture Behind Gibili
&lt;/h2&gt;

&lt;p&gt;Gibili Portrait Assistance is powered by OpenAI’s advanced image generation models, leveraging diffusion models to create highly detailed and realistic portraits. Here’s a simplified breakdown:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Text-to-Image &amp;amp; Image-to-Image Generation&lt;/strong&gt;&lt;br&gt;
When you provide a text prompt, the AI model translates it into a visual representation using deep learning techniques. If you upload an image, the model can enhance, transform, or stylize it while maintaining its core structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fine-Tuned on Portrait Data&lt;/strong&gt;&lt;br&gt;
The model has been trained on a vast dataset of portraits across different styles, ensuring high accuracy and creativity in generated images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Iterative Refinement&lt;/strong&gt;&lt;br&gt;
Instead of creating the final image instantly, the AI gradually refines it through multiple steps, ensuring greater precision and quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. User-Guided Adjustments&lt;/strong&gt;&lt;br&gt;
Users can modify parameters like style and background, and the model will intelligently adjust the portrait while maintaining coherence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Gibili Portrait Assistance?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Easy to Use
&lt;/h3&gt;

&lt;p&gt;No need for advanced design skills — just describe what you want or upload an image, and AI does the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎨 Customizable Output
&lt;/h3&gt;

&lt;p&gt;From photorealistic portraits to artistic illustrations, you can tailor the results to your liking.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 Fast &amp;amp; High-Quality
&lt;/h3&gt;

&lt;p&gt;Generate high-resolution images within seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  🖌️ Creative Freedom
&lt;/h3&gt;

&lt;p&gt;Perfect for artists, marketers, and content creators looking for unique visuals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started with Gibili Today!
&lt;/h2&gt;

&lt;p&gt;Ready to create amazing AI-generated portraits? Try Gibili Portrait Assistance now and explore the limitless possibilities of AI-powered creativity!&lt;/p&gt;

</description>
      <category>openai</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Intelligent Loop: A Guide to Modern LLM Agents</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Mon, 24 Feb 2025 06:07:25 +0000</pubDate>
      <link>https://dev.to/angu10/the-intelligent-loop-a-guide-to-modern-llm-agents-1g85</link>
      <guid>https://dev.to/angu10/the-intelligent-loop-a-guide-to-modern-llm-agents-1g85</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Large Language Model (LLM) based AI agents represent a new paradigm in artificial intelligence. Unlike traditional software agents, these systems leverage the powerful capabilities of LLMs to understand, reason, and interact with their environment in more sophisticated ways. This guide will introduce you to the basics of LLM agents and their think-act-observe cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an LLM Agent?
&lt;/h2&gt;

&lt;p&gt;An LLM agent is a system that uses a large language model as its core reasoning engine to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Process natural language instructions&lt;/li&gt;
&lt;li&gt;Make decisions based on context and goals&lt;/li&gt;
&lt;li&gt;Generate human-like responses and actions&lt;/li&gt;
&lt;li&gt;Interact with external tools and APIs&lt;/li&gt;
&lt;li&gt;Learn from interactions and feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Think of an LLM agent as an AI assistant that can understand, respond, and take actions in the digital world, like searching the web, writing code, or analyzing data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04c8aj06tfamfymkf6qt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04c8aj06tfamfymkf6qt.png" alt=" " width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Think-Act-Observe Cycle in LLM Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Observe (Input Processing)
&lt;/h3&gt;

&lt;p&gt;LLM agents observe their environment through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Direct user instructions and queries&lt;/li&gt;
&lt;li&gt;Context from previous conversations&lt;/li&gt;
&lt;li&gt;Data from connected tools and APIs&lt;/li&gt;
&lt;li&gt;System prompts and constraints&lt;/li&gt;
&lt;li&gt;Environmental feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Think (LLM Processing)
&lt;/h3&gt;

&lt;p&gt;The thinking phase for LLM agents involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parsing and understanding input context&lt;/li&gt;
&lt;li&gt;Reasoning about the task and requirements&lt;/li&gt;
&lt;li&gt;Planning necessary steps to achieve goals&lt;/li&gt;
&lt;li&gt;Selecting appropriate tools or actions&lt;/li&gt;
&lt;li&gt;Generating natural language responses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The LLM is the "brain," using its trained knowledge to process information and make decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Act (Execution)
&lt;/h3&gt;

&lt;p&gt;LLM agents can take various actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate text responses&lt;/li&gt;
&lt;li&gt;Call external APIs&lt;/li&gt;
&lt;li&gt;Execute code&lt;/li&gt;
&lt;li&gt;Use specialized tools&lt;/li&gt;
&lt;li&gt;Store and retrieve information&lt;/li&gt;
&lt;li&gt;Request clarification from users&lt;/li&gt;
&lt;/ol&gt;
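&lt;p&gt;The think-act-observe cycle above can be sketched in a few lines of Python. This is a toy illustration: the &lt;code&gt;llm_think&lt;/code&gt; function is a deterministic stub standing in for a real model call, and the only tool is a calculator:&lt;/p&gt;

```python
def llm_think(observation, history):
    """Stub reasoning step. A real agent would prompt an LLM with the
    observation plus conversation history; this stub routes arithmetic
    tasks to the calculator tool, then summarizes the result."""
    if observation.startswith("calc:"):
        return ("calculator", observation[5:])
    if history and history[-1][0] == "calculator":
        return ("respond", "The result is " + observation)
    return ("respond", observation)

TOOLS = {
    # Act: each tool is a callable the agent may invoke
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "respond": lambda text: text,
}

def run_agent(task, max_steps=5):
    observation, history = task, []
    for _ in range(max_steps):
        action, action_input = llm_think(observation, history)  # Think
        result = TOOLS[action](action_input)                    # Act
        history.append((action, result))
        observation = result                                    # Observe
        if action == "respond":
            return result
    return observation

print(run_agent("calc:2+3"))
```

&lt;p&gt;The loop structure is the important part: the model's decision selects a tool, the tool's output becomes the next observation, and the cycle repeats until the agent decides to respond.&lt;/p&gt;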

&lt;h2&gt;
  
  
  Key Components of LLM Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core LLM
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Serves as the primary reasoning engine&lt;/li&gt;
&lt;li&gt;Processes natural language input&lt;/li&gt;
&lt;li&gt;Generates responses and decisions&lt;/li&gt;
&lt;li&gt;Maintains conversation context&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Working Memory
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Stores conversation history&lt;/li&gt;
&lt;li&gt;Maintains current context&lt;/li&gt;
&lt;li&gt;Tracks task progress&lt;/li&gt;
&lt;li&gt;Manages temporary information&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Tool Use
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;API integrations&lt;/li&gt;
&lt;li&gt;Code execution capabilities&lt;/li&gt;
&lt;li&gt;Data processing tools&lt;/li&gt;
&lt;li&gt;External knowledge bases&lt;/li&gt;
&lt;li&gt;File manipulation utilities&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Planning System
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Task decomposition&lt;/li&gt;
&lt;li&gt;Step-by-step reasoning&lt;/li&gt;
&lt;li&gt;Goal tracking&lt;/li&gt;
&lt;li&gt;Error handling and recovery&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Types of LLM Agent Architectures
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Simple Agents
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Single LLM with basic tool access&lt;/li&gt;
&lt;li&gt;Direct input-output processing&lt;/li&gt;
&lt;li&gt;Limited memory and context&lt;/li&gt;
&lt;li&gt;Example: Basic chatbots with API access&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ReAct Agents
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Reasoning and Acting framework&lt;/li&gt;
&lt;li&gt;Step-by-step thought process&lt;/li&gt;
&lt;li&gt;Explicit action planning&lt;/li&gt;
&lt;li&gt;Self-reflection capabilities&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Chain-of-Thought Agents
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Detailed reasoning steps&lt;/li&gt;
&lt;li&gt;Complex problem decomposition&lt;/li&gt;
&lt;li&gt;Transparent decision-making&lt;/li&gt;
&lt;li&gt;Better error handling&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Multi-Agent Systems
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Multiple LLM agents working together&lt;/li&gt;
&lt;li&gt;Specialized roles and capabilities&lt;/li&gt;
&lt;li&gt;Inter-agent communication&lt;/li&gt;
&lt;li&gt;Collaborative problem-solving&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Common Applications
&lt;/h2&gt;

&lt;p&gt;LLM agents are increasingly used for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Personal assistance and task automation&lt;/li&gt;
&lt;li&gt;Code generation and debugging&lt;/li&gt;
&lt;li&gt;Data analysis and research&lt;/li&gt;
&lt;li&gt;Content creation and editing&lt;/li&gt;
&lt;li&gt;Customer service and support&lt;/li&gt;
&lt;li&gt;Process automation and workflow management&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Best Practices for LLM Agent Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clear Instructions
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Provide explicit system prompts&lt;/li&gt;
&lt;li&gt;Define constraints and limitations&lt;/li&gt;
&lt;li&gt;Specify available tools and capabilities&lt;/li&gt;
&lt;li&gt;Set clear success criteria&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Effective Memory Management
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Implement efficient context tracking&lt;/li&gt;
&lt;li&gt;Prioritize relevant information&lt;/li&gt;
&lt;li&gt;Clean up unnecessary data&lt;/li&gt;
&lt;li&gt;Maintain conversation coherence&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Robust Tool Integration
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Define clear tool interfaces&lt;/li&gt;
&lt;li&gt;Handle API errors gracefully&lt;/li&gt;
&lt;li&gt;Validate tool outputs&lt;/li&gt;
&lt;li&gt;Monitor resource usage&lt;/li&gt;
&lt;/ol&gt;
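&lt;p&gt;The tool-integration practices above can be captured in a small, hypothetical wrapper: run the tool, validate its output, and return a fallback string instead of crashing the agent loop. The function and parameter names here are illustrative, not from any particular framework:&lt;/p&gt;

```python
def safe_tool_call(tool, tool_input, validate, fallback="TOOL_ERROR"):
    """Run a tool callable, validate its output, and degrade gracefully:
    any exception or invalid result becomes a fallback message the agent
    can reason about instead of an unhandled crash."""
    try:
        result = tool(tool_input)
    except Exception as error:
        return fallback + ": " + str(error)
    if not validate(result):
        return fallback + ": invalid output"
    return result

# Example: a parsing tool whose output must be an integer
ok = safe_tool_call(int, "42", validate=lambda r: isinstance(r, int))
bad = safe_tool_call(int, "oops", validate=lambda r: isinstance(r, int))
```

&lt;p&gt;Feeding the fallback string back to the LLM as an observation lets the agent retry or ask the user for clarification rather than failing silently.&lt;/p&gt;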

&lt;h3&gt;
  
  
  Safety and Control
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Implement ethical guidelines&lt;/li&gt;
&lt;li&gt;Add safety checks and filters&lt;/li&gt;
&lt;li&gt;Monitor agent behavior&lt;/li&gt;
&lt;li&gt;Maintain user control&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>Ever Wonder How AI "Sees" Like You Do? A Beginner's Guide to Attention</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Wed, 19 Feb 2025 02:05:21 +0000</pubDate>
      <link>https://dev.to/angu10/ever-wonder-how-ai-sees-like-you-do-a-beginners-guide-to-attention-30m3</link>
      <guid>https://dev.to/angu10/ever-wonder-how-ai-sees-like-you-do-a-beginners-guide-to-attention-30m3</guid>
      <description>&lt;h2&gt;
  
  
  Understanding Attention in Large Language Models: A Beginner's Guide
&lt;/h2&gt;

&lt;p&gt;Have you ever wondered how ChatGPT or other AI models can understand and respond to your messages so well? The secret lies in a mechanism called ATTENTION - a crucial component that helps these models understand relationships between words and generate meaningful responses. Let's break it down in simple terms!&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Attention?
&lt;/h3&gt;

&lt;p&gt;Imagine you're reading a long sentence: "The cat sat on the mat because it was comfortable." When you read "it," your brain naturally connects back to either "the cat" or "the mat" to understand what "it" refers to. This is exactly what attention does in AI models - it helps the model figure out which words are related to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Does Attention Work?
&lt;/h3&gt;

&lt;p&gt;The attention mechanism works like a spotlight that can focus on different words when processing each word in a sentence. Here's a simple breakdown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; For each word, the model calculates how important every other word is in relation to it.&lt;/li&gt;
&lt;li&gt; It then uses these importance scores to create a weighted combination of all words.&lt;/li&gt;
&lt;li&gt; This helps the model understand context and relationships between words.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's visualize this with an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnmsyiealc4pjtqe8ufm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnmsyiealc4pjtqe8ufm.png" alt="Image description" width="800" height="1309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this diagram, the word "it" is paying attention to all other words in the sentence. The thickness of the arrows could represent the attention weights. The model would likely assign higher attention weights to "cat" and "mat" to determine which one "it" refers to.&lt;/p&gt;
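&lt;p&gt;Here is a tiny numeric illustration of the three steps above, using made-up 2-dimensional word vectors (real models use hundreds of dimensions per word). The query "it" ends up weighting "cat" more heavily because their vectors are more similar:&lt;/p&gt;

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query word."""
    dim = len(query)
    # 1. Score every word against the query (dot product, scaled)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    # 2. Convert scores into attention weights
    weights = softmax(scores)
    # 3. Weighted combination of all word vectors
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

# Made-up vectors: "it" is closer to "cat" than to "mat"
cat, mat, it = [1.0, 0.2], [0.3, 0.9], [0.9, 0.1]
weights, _ = attend(it, [cat, mat], [cat, mat])
print(weights)  # the first weight (for "cat") is the larger one
```

&lt;p&gt;With real learned embeddings, the same mechanism is what lets the model resolve "it" to the right referent.&lt;/p&gt;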

&lt;h3&gt;
  
  
  Multi-Head Attention: Looking at Things from Different Angles
&lt;/h3&gt;

&lt;p&gt;In modern language models, we don't just use one attention mechanism - we use several in parallel! This is called Multi-Head Attention. Each "head" can focus on different types of relationships between words.&lt;/p&gt;

&lt;p&gt;Let's consider the sentence: The chef who won the competition prepared a delicious meal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Head 1 could focus on subject-verb relationships (chef - prepared)&lt;/li&gt;
&lt;li&gt;  Head 2 might attend to adjective-noun pairs (delicious - meal)&lt;/li&gt;
&lt;li&gt;  Head 3 could look at broader context (competition - meal)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg4zwcmb6dz3jfu31p40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg4zwcmb6dz3jfu31p40.png" alt="Image description" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This multi-headed approach helps the model understand text from different perspectives, just like how we humans might read a sentence multiple times to understand different aspects of its meaning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Attention Matters
&lt;/h3&gt;

&lt;p&gt;Attention mechanisms have revolutionized natural language processing because they:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Handle long-range dependencies better than previous methods.&lt;/li&gt;
&lt;li&gt; Can process input sequences in parallel.&lt;/li&gt;
&lt;li&gt; Create interpretable connections between words.&lt;/li&gt;
&lt;li&gt; Allow models to focus on relevant information while ignoring irrelevant parts.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Recent Developments and Research
&lt;/h3&gt;

&lt;p&gt;The field of LLMs is rapidly evolving, with new techniques and insights emerging regularly. Here are a few areas of active research:&lt;/p&gt;

&lt;h4&gt;
  
  
  Contextual Hallucinations
&lt;/h4&gt;

&lt;p&gt;Large language models (LLMs) can sometimes hallucinate: they produce details or unsubstantiated answers that are inaccurate with respect to the input context.&lt;/p&gt;

&lt;p&gt;The Lookback Lens technique analyzes attention patterns to detect when a model might be generating information not present in the input context.&lt;/p&gt;

&lt;h4&gt;
  
  
  Extending Context Window
&lt;/h4&gt;

&lt;p&gt;Researchers are working on extending the context window sizes of LLMs, allowing them to process longer text sequences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;While the math behind attention mechanisms can be complex, the core idea is simple: help the model focus on the most relevant parts of the input when processing each word. This allows language models to understand the context and relationships between words better, leading to more accurate and coherent responses.&lt;/p&gt;

&lt;p&gt;Remember, this is just a high-level overview; there's much more to learn about attention mechanisms! Hopefully, this gives you a good foundation for understanding how modern AI models process and understand text.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>chatgpt</category>
      <category>learning</category>
    </item>
    <item>
      <title>A Step-by-Step Guide to LLM Function Calling in Python</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Wed, 12 Feb 2025 23:06:38 +0000</pubDate>
      <link>https://dev.to/angu10/a-step-by-step-guide-to-llm-function-calling-in-python-4pg7</link>
      <guid>https://dev.to/angu10/a-step-by-step-guide-to-llm-function-calling-in-python-4pg7</guid>
      <description>&lt;p&gt;Function calling allows Claude to interact with external functions and tools in a structured way. This guide will walk you through implementing function calling with Claude using Python, complete with examples and best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To get started, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.7+&lt;/li&gt;
&lt;li&gt;anthropic Python package&lt;/li&gt;
&lt;li&gt;A valid API key from Anthropic&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Basic Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from anthropic import Anthropic
import json
# Initialize the client
anthropic = Anthropic(api_key='your-api-key')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Defining Functions
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a specific location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or coordinates"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making Function Calls
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A Step-by-Step Guide to LLM Function Calling in Python
Function calling allows Claude to interact with external functions and tools in a structured way. This guide will walk you through implementing function calling with Claude using Python, complete with examples and best practices.
Prerequisites
To get started, you'll need:
Python 3.7+
anthropic Python package
A valid API key from Anthropic

Basic Setup
from anthropic import Anthropic
import json
# Initialize the client
anthropic = Anthropic(api_key='your-api-key')
Defining Functions
function_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a specific location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or coordinates"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    }
}
Making Function Calls
def get_weather(location, unit="celsius"):
    # This is a mock implementation but you can all call your API
    return {
        "location": location,
        "temperature": 22 if unit == "celsius" else 72,
        "conditions": "sunny"
    }
def process_function_call(message):
    try:
        # Parse the function call parameters
        params = json.loads(message.content)
        # Call the appropriate function
        if message.name == "get_weather":
            result = get_weather(**params)
            return json.dumps(result)
        else:
            raise ValueError(f"Unknown function: {message.name}")
    except Exception as e:
        return json.dumps({"error": str(e)})
# Example conversation with function calling
messages = [
    {
        "role": "user",
        "content": "What's the weather like in Paris?"
    }
]
while True:
    response = anthropic.messages.create(
        model="claude-3-5-haiku-latest",
        messages=messages,
        tools=[function_schema]
    )
    # Check if Claude wants to call a function
    if response.tool_calls:
        for tool_call in response.tool_calls:
            # Execute the function
            result = process_function_call(tool_call)
            # Add the function result to the conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.name,
                "content": result
            })
    else:
        # Normal response - print and break
        print(response.content)
        break
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Clear Function Descriptions

&lt;ul&gt;
&lt;li&gt;Write detailed descriptions for your functions&lt;/li&gt;
&lt;li&gt;Specify parameter types and constraints clearly&lt;/li&gt;
&lt;li&gt;Include examples in the descriptions when helpful&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Input Validation

&lt;ul&gt;
&lt;li&gt;Validate all function inputs before processing&lt;/li&gt;
&lt;li&gt;Return meaningful error messages&lt;/li&gt;
&lt;li&gt;Handle edge cases gracefully&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Response Formatting

&lt;ul&gt;
&lt;li&gt;Return consistent JSON structures&lt;/li&gt;
&lt;li&gt;Include status indicators in responses&lt;/li&gt;
&lt;li&gt;Format error messages uniformly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Security Considerations

&lt;ul&gt;
&lt;li&gt;Validate and sanitize all inputs&lt;/li&gt;
&lt;li&gt;Implement rate limiting if needed&lt;/li&gt;
&lt;li&gt;Use appropriate authentication&lt;/li&gt;
&lt;li&gt;Don't expose sensitive information in function descriptions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
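&lt;p&gt;The input-validation and response-formatting guidance above can be sketched as a small pre-dispatch check. The helper names below (&lt;code&gt;validate_weather_input&lt;/code&gt;, &lt;code&gt;safe_call&lt;/code&gt;) are hypothetical, not part of the Anthropic SDK; the point is to reject bad inputs before calling the tool and to return one uniform JSON shape for both success and error.&lt;/p&gt;

```python
import json

VALID_UNITS = ("celsius", "fahrenheit")

def validate_weather_input(params):
    # Returns (ok, error_message); error_message is None when the input is valid.
    if "location" not in params:
        return False, "Missing required parameter: location"
    if not isinstance(params["location"], str) or not params["location"].strip():
        return False, "Parameter 'location' must be a non-empty string"
    unit = params.get("unit", "celsius")
    if unit not in VALID_UNITS:
        return False, f"Parameter 'unit' must be one of {list(VALID_UNITS)}"
    return True, None

def safe_call(params):
    # Validate first, then dispatch; errors come back in the same JSON envelope.
    ok, error = validate_weather_input(params)
    if not ok:
        return json.dumps({"status": "error", "message": error})
    return json.dumps({"status": "ok", "location": params["location"]})
```

&lt;p&gt;Because every branch returns the same &lt;code&gt;status&lt;/code&gt; field, the model (and your own logging) can always tell at a glance whether the tool call succeeded.&lt;/p&gt;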

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Function calling with Claude enables powerful integrations between the language model and external tools. By following these best practices and implementing proper error handling, you can create robust and reliable function-calling implementations.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>python</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
