<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yash Desai</title>
    <description>The latest articles on DEV Community by Yash Desai (@yashddesai).</description>
    <link>https://dev.to/yashddesai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1866199%2F29204bca-4d64-4873-85c7-74ca0b1bcb48.jpg</url>
      <title>DEV Community: Yash Desai</title>
      <link>https://dev.to/yashddesai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yashddesai"/>
    <language>en</language>
    <item>
      <title>The Ultimate Showdown: Grok Code Fast 1 vs Claude Sonnet 4 - Which AI Coding Assistant Will Win Your Heart (and Wallet)?</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Mon, 08 Sep 2025 11:01:08 +0000</pubDate>
      <link>https://dev.to/yashddesai/the-ultimate-showdown-grok-code-fast-1-vs-claude-sonnet-4-which-ai-coding-assistant-will-win-180n</link>
      <guid>https://dev.to/yashddesai/the-ultimate-showdown-grok-code-fast-1-vs-claude-sonnet-4-which-ai-coding-assistant-will-win-180n</guid>
      <description>&lt;p&gt;&lt;em&gt;The AI coding wars just got a major plot twist, and developers are choosing sides faster than you can say "Hello World"&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The race for the best AI coding assistant has reached fever pitch in 2025, and two titans have emerged from the battlefield: &lt;strong&gt;xAI's Grok Code Fast 1&lt;/strong&gt; and &lt;strong&gt;Anthropic's Claude Sonnet 4&lt;/strong&gt;. If you're a developer wondering which one deserves your precious time (and hard-earned money), you've landed in the right place.&lt;/p&gt;

&lt;p&gt;After diving deep into benchmarks, real-world testing, and developer feedback from across the internet, I'm here to break down everything you need to know about these two coding powerhouses. Spoiler alert: the "winner" might surprise you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Speed Demon vs The Perfectionist: Setting the Stage
&lt;/h2&gt;

&lt;p&gt;Picture this: You're deep in a coding session at 2 AM, trying to debug that stubborn function that's been haunting your dreams. Do you want lightning-fast suggestions that keep you in the flow, or do you prefer thoughtful, near-perfect code that might take a few extra seconds?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grok Code Fast 1&lt;/strong&gt; is the adrenaline junkie of the AI coding world – built for speed, priced for accessibility, and designed to keep you in the zone. &lt;strong&gt;Claude Sonnet 4&lt;/strong&gt;, on the other hand, is the meticulous craftsperson who thinks before speaking and rarely makes mistakes.&lt;/p&gt;

&lt;p&gt;But which approach actually wins in the trenches of real-world development? Let's find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1: Performance Benchmarks - The Numbers Game
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SWE-Bench Verified: The Gold Standard
&lt;/h3&gt;

&lt;p&gt;When it comes to solving real-world software engineering tasks, the numbers tell an interesting story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt;: &lt;strong&gt;72.7%&lt;/strong&gt; accuracy on SWE-Bench Verified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok Code Fast 1&lt;/strong&gt;: &lt;strong&gt;70.8%&lt;/strong&gt; accuracy on SWE-Bench Verified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a relatively small gap for such different philosophies. Claude edges ahead, but Grok is breathing down its neck while being significantly faster and cheaper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed: Where Grok Shines
&lt;/h3&gt;

&lt;p&gt;Here's where things get exciting. Grok Code Fast 1 processes at &lt;strong&gt;92 tokens per second&lt;/strong&gt; with a &lt;strong&gt;256,000 token context window&lt;/strong&gt;. Developers using tools like Cursor and Cline report something fascinating: responses come back so fast that they had to change their entire workflow.&lt;/p&gt;

&lt;p&gt;One developer on Reddit put it perfectly: &lt;em&gt;"It's not long enough for you to context switch to something else, but fast enough to keep you in flow state."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Claude Sonnet 4, while not slow, operates at a more measured pace – especially when using its extended thinking mode that can process up to &lt;strong&gt;64,000 tokens of internal reasoning&lt;/strong&gt;.&lt;/p&gt;
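&lt;p&gt;At 92 tokens per second you can ballpark turnaround times yourself. Here's a rough sketch (decode time only; real latency also includes queueing and prompt processing, which this ignores):&lt;/p&gt;

```python
TOKENS_PER_SECOND = 92  # Grok Code Fast 1 throughput cited above

def generation_time(output_tokens: int) -> float:
    """Seconds to stream a response of the given length (decode only)."""
    return output_tokens / TOKENS_PER_SECOND

# A ~500-token code suggestion streams back in roughly 5-6 seconds,
# short enough that you never context-switch away.
print(f"{generation_time(500):.1f}s")
```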

&lt;h2&gt;
  
  
  Round 2: Pricing - David vs Goliath
&lt;/h2&gt;

&lt;p&gt;This is where Grok Code Fast 1 delivers a knockout punch:&lt;/p&gt;

&lt;h3&gt;
  
  
  Grok Code Fast 1 Pricing:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input tokens&lt;/strong&gt;: $0.20 per million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens&lt;/strong&gt;: $1.50 per million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached tokens&lt;/strong&gt;: $0.02 per million tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Claude Sonnet 4 Pricing:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input tokens&lt;/strong&gt;: $3.00 per million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens&lt;/strong&gt;: $15.00 per million tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The math is brutal&lt;/strong&gt;: Grok is roughly &lt;strong&gt;90-93% cheaper&lt;/strong&gt; than Claude (93% on input tokens, 90% on output). For a typical development workflow, you could run Grok for weeks at the cost of a few days with Claude.&lt;/p&gt;
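&lt;p&gt;To make that concrete, here's a back-of-the-envelope comparison using the published per-million-token rates above (the 10M input / 2M output monthly volume is purely illustrative):&lt;/p&gt;

```python
# Published per-million-token rates (USD) from the pricing lists above.
PRICING = {
    "grok-code-fast-1": {"input": 0.20, "output": 1.50},
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend from raw token volumes."""
    rates = PRICING[model]
    return (input_tokens / 1e6) * rates["input"] + (output_tokens / 1e6) * rates["output"]

# Illustrative workload: 10M input + 2M output tokens per month.
grok = monthly_cost("grok-code-fast-1", 10_000_000, 2_000_000)   # $5.00
claude = monthly_cost("claude-sonnet-4", 10_000_000, 2_000_000)  # $60.00
print(f"Grok: ${grok:.2f}/mo vs Claude: ${claude:.2f}/mo")
```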

&lt;h2&gt;
  
  
  Round 3: Real-World Coding Capabilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complex Code Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Grok Code Fast 1&lt;/strong&gt; excels at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid prototyping&lt;/strong&gt; with its massive context window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete REST API generation&lt;/strong&gt; with proper error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time debugging&lt;/strong&gt; with visible reasoning traces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy code refactoring&lt;/strong&gt; into clean, modular functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One fascinating feature is Grok's &lt;strong&gt;visible reasoning traces&lt;/strong&gt; – you can actually see how it's thinking through problems, making it easier to guide and correct when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt; dominates in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex architectural planning&lt;/strong&gt; with extended thinking mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-feature application development&lt;/strong&gt; (reducing navigation errors from 20% to near zero)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-grade code quality&lt;/strong&gt; with exceptional instruction following&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sophisticated system design&lt;/strong&gt; requiring deep reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Extended Thinking Advantage
&lt;/h3&gt;

&lt;p&gt;Claude Sonnet 4's extended thinking mode is like having a senior developer who thinks out loud before coding. It can use up to 64,000 tokens of internal reasoning, working through problems step-by-step before delivering solutions. This makes it particularly powerful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architectural decisions&lt;/li&gt;
&lt;li&gt;Large-scale refactoring projects&lt;/li&gt;
&lt;li&gt;Mission-critical code that requires high reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Round 4: Developer Experience - The Human Factor
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Team Grok: Speed Addicts
&lt;/h3&gt;

&lt;p&gt;Developers using Grok Code Fast 1 report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Addictive interactive development&lt;/strong&gt; due to near-instant responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Excellent for "fast-draft" coding&lt;/strong&gt; where speed matters more than perfection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Great for pair programming sessions&lt;/strong&gt; and rapid iteration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Perfect for budget-constrained environments&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 314-billion parameter Mixture-of-Experts architecture means you get specialized routing for different coding tasks while maintaining that blazing speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team Claude: Quality Perfectionists
&lt;/h3&gt;

&lt;p&gt;Claude Sonnet 4 users consistently mention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Superior code quality&lt;/strong&gt; with fewer bugs on first attempt&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Excellent for "explain-and-refine" workflows&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better for enterprise environments&lt;/strong&gt; where reliability is paramount&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outstanding at following complex, multi-step instructions&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies like GitHub, Sourcegraph, and Cursor have specifically praised Claude Sonnet 4's performance in production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 5: The Surprise Weaknesses
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Grok's Achilles' Heel
&lt;/h3&gt;

&lt;p&gt;Despite its strengths, Grok Code Fast 1 showed surprising weakness in certain areas. In independent testing, it scored just &lt;strong&gt;1 out of 10&lt;/strong&gt; on Tailwind CSS v3 tasks – a typically easy challenge for top-tier models. This suggests potential gaps in training on specific frameworks or smaller model size limitations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude's Trade-offs
&lt;/h3&gt;

&lt;p&gt;Claude Sonnet 4's main weakness? &lt;strong&gt;Cost&lt;/strong&gt;. At 10x the price of Grok for output tokens (and 15x for input), it's simply not accessible for many developers, especially for high-volume use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: It's Not What You'd Expect
&lt;/h2&gt;

&lt;p&gt;After analyzing hundreds of data points, developer reviews, and real-world use cases, here's the surprising truth: &lt;strong&gt;there's no universal winner&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Grok Code Fast 1 if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You value &lt;strong&gt;speed and interactivity&lt;/strong&gt; above all else&lt;/li&gt;
&lt;li&gt;You're working on &lt;strong&gt;rapid prototyping&lt;/strong&gt; or iterative development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget constraints&lt;/strong&gt; are a major factor&lt;/li&gt;
&lt;li&gt;You prefer &lt;strong&gt;transparent reasoning&lt;/strong&gt; you can guide and adjust&lt;/li&gt;
&lt;li&gt;You're doing &lt;strong&gt;high-volume coding&lt;/strong&gt; where costs add up quickly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Claude Sonnet 4 if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need &lt;strong&gt;maximum accuracy&lt;/strong&gt; for complex, mission-critical projects&lt;/li&gt;
&lt;li&gt;You're working on &lt;strong&gt;large-scale architecture&lt;/strong&gt; or enterprise applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code quality and reliability&lt;/strong&gt; are more important than speed&lt;/li&gt;
&lt;li&gt;You can justify the &lt;strong&gt;premium pricing&lt;/strong&gt; for superior performance&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;extended reasoning&lt;/strong&gt; for sophisticated problem-solving&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Future is Multi-Model
&lt;/h2&gt;

&lt;p&gt;Here's a pro tip from the trenches: the smartest developers aren't picking sides – they're using both. Grok for rapid iteration and prototyping, Claude for architectural decisions and critical code reviews.&lt;/p&gt;

&lt;p&gt;Tools like Cursor are already supporting multiple models, and the trend toward &lt;strong&gt;model-agnostic development environments&lt;/strong&gt; is accelerating. Why limit yourself to one when you can have the best of both worlds?&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Development Workflow
&lt;/h2&gt;

&lt;p&gt;The emergence of these two distinct approaches signals a maturation in the AI coding space. We're moving beyond the "one-size-fits-all" mentality toward specialized tools for specific use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For individual developers&lt;/strong&gt;: Start with Grok Code Fast 1 for daily coding tasks and use Claude Sonnet 4 for complex problem-solving when accuracy matters most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For teams&lt;/strong&gt;: Consider hybrid approaches where different models serve different roles in your development pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For enterprises&lt;/strong&gt;: The cost-effectiveness of Grok makes it viable for organization-wide deployment, while Claude's reliability makes it perfect for critical systems.&lt;/p&gt;
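&lt;p&gt;One way to operationalize that hybrid setup is a thin routing layer that picks a model per task. Here's a minimal sketch (the task categories and the static routing table are illustrative assumptions, not an official API of either tool):&lt;/p&gt;

```python
# Hypothetical routing table: the fast, cheap model for iteration,
# the premium model where accuracy is paramount.
ROUTES = {
    "prototype": "grok-code-fast-1",
    "debug": "grok-code-fast-1",
    "architecture": "claude-sonnet-4",
    "code-review": "claude-sonnet-4",
}

def pick_model(task_type: str) -> str:
    """Return the model ID for a task, defaulting to the cheap option."""
    return ROUTES.get(task_type, "grok-code-fast-1")

print(pick_model("architecture"))  # claude-sonnet-4
```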

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The AI coding assistant revolution isn't slowing down – it's just getting started. Both Grok Code Fast 1 and Claude Sonnet 4 represent significant leaps forward, each optimized for different aspects of the development experience.&lt;/p&gt;

&lt;p&gt;The real winner? Developers. We now have powerful, accessible AI coding assistants that can dramatically boost productivity, whether you prioritize speed, accuracy, or cost-effectiveness.&lt;/p&gt;

&lt;p&gt;The future of coding is collaborative, intelligent, and more accessible than ever. The question isn't which model is better – it's how you'll use these tools to build the next generation of software.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want to stay updated on the latest AI developments and implementation strategies? Connect with me on &lt;a href="https://www.linkedin.com/in/yash-d-desai" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or check out my other technical deep-dives at &lt;a href="https://yashddesai.com" rel="noopener noreferrer"&gt;yashddesai.com&lt;/a&gt;. You can also follow my ongoing AI experiments and tutorials at &lt;a href="https://dev.to/yashddesai"&gt;dev.to/yashddesai&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: #ai #coding #grok #claude #artificial-intelligence #developer-tools #programming #software-development #machine-learning #productivity&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>llm</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>The Ultimate AI Coding Grok Code Fast 1 vs GPT-5 High vs Claude Sonnet 4 – Which One Is Actually Faster?</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Sat, 30 Aug 2025 10:03:05 +0000</pubDate>
      <link>https://dev.to/yashddesai/the-ultimate-ai-coding-grok-code-fast-1-vs-gpt-5-high-vs-claude-sonnet-4-which-one-is-actually-13fg</link>
      <guid>https://dev.to/yashddesai/the-ultimate-ai-coding-grok-code-fast-1-vs-gpt-5-high-vs-claude-sonnet-4-which-one-is-actually-13fg</guid>
      <description>&lt;p&gt;The AI coding assistant war has reached a fever pitch in 2025, and developers everywhere are asking the same question: &lt;strong&gt;which model should I bet my productivity on?&lt;/strong&gt; After diving deep into the latest releases from xAI, OpenAI, and Anthropic, I've got some surprising findings that might change how you think about AI-powered development.&lt;/p&gt;

&lt;p&gt;Let's be honest – we're not just looking for another chatbot that can write Hello World. We need AI that can keep up with our chaotic development workflows, understand our messy codebases, and actually help us ship features faster. The three contenders couldn't be more different in their approaches, and the results will surprise you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Speed Demon: Grok Code Fast 1 Changes Everything
&lt;/h2&gt;

&lt;p&gt;When xAI dropped Grok Code Fast 1 in August 2025, they weren't just releasing another coding model – they were making a statement about speed. This thing processes at &lt;strong&gt;92 tokens per second&lt;/strong&gt; and costs a jaw-dropping &lt;strong&gt;$0.20 per million input tokens&lt;/strong&gt;. To put that in perspective, that's 84% cheaper than GPT-5 High and 93% cheaper than Claude Sonnet 4.&lt;/p&gt;

&lt;p&gt;But here's what blew my mind: developers using Grok Code Fast 1 in tools like Cursor and Cline are reporting they had to &lt;strong&gt;change their entire workflow&lt;/strong&gt; because the model responds so fast. One developer on Hacker News put it perfectly: "It's not long enough for you to context switch to something else, but fast enough to keep you in flow state."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes Grok Code Fast 1 Special?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;314B parameter MoE architecture&lt;/strong&gt; built specifically for agentic coding workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;256K token context window&lt;/strong&gt; that can handle massive codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visible reasoning traces&lt;/strong&gt; – you can actually see how it's thinking through problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70.8% on SWE-Bench Verified&lt;/strong&gt; – solid performance on real-world coding tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache hit rates above 90%&lt;/strong&gt; in typical development workflows&lt;/li&gt;
&lt;/ul&gt;
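&lt;p&gt;Those cache hit rates translate directly into savings: xAI prices cached input tokens at $0.02 per million versus $0.20 for fresh ones, so a 90% hit rate cuts the effective input rate by roughly 80%. A quick sketch of the arithmetic:&lt;/p&gt;

```python
FRESH_RATE = 0.20   # USD per million fresh input tokens
CACHED_RATE = 0.02  # USD per million cache-hit input tokens

def effective_input_rate(cache_hit_rate: float) -> float:
    """Blended input cost per million tokens at a given cache hit rate."""
    return cache_hit_rate * CACHED_RATE + (1 - cache_hit_rate) * FRESH_RATE

print(f"${effective_input_rate(0.90):.3f}")  # $0.038 per million input tokens
```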

&lt;p&gt;The model was quietly released under the codename "Sonic" (how fitting!) and has been getting rave reviews from developers who value rapid iteration over perfect first attempts. It's not the smartest model in the lineup, but it's the one that might actually change how you work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reasoning Powerhouse: GPT-5 High Takes No Prisoners
&lt;/h2&gt;

&lt;p&gt;OpenAI's GPT-5 High is the crown jewel of coding models, achieving &lt;strong&gt;74.9% on SWE-Bench Verified&lt;/strong&gt; – the highest score in our comparison. With a massive &lt;strong&gt;400K token context window&lt;/strong&gt; and hybrid reasoning architecture, this model is built for the most complex coding challenges.&lt;/p&gt;

&lt;p&gt;But here's the catch that's been driving developers crazy: GPT-5's "thinking mode" can sometimes run for 15-30 minutes on complex problems, only to produce unusable output. One frustrated developer tweeted: "GPT-5 ran for 20 minutes and the output was completely bugged. I switched to Sonnet 4 and it fixed it in two prompts."&lt;/p&gt;

&lt;h3&gt;
  
  
  When GPT-5 High Shines:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex architectural decisions&lt;/strong&gt; requiring deep reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step problem solving&lt;/strong&gt; across large codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance optimization&lt;/strong&gt; and security analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal projects&lt;/strong&gt; involving code and visual elements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-level code quality&lt;/strong&gt; requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model excels when you need PhD-level reasoning, but it's overkill for everyday coding tasks. Think of it as the senior architect on your team – brilliant for complex challenges, but you wouldn't ask them to fix a simple CSS bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reliable Workhorse: Claude Sonnet 4 Strikes the Balance
&lt;/h2&gt;

&lt;p&gt;Anthropic's Claude Sonnet 4 has earned a reputation as the "Goldilocks" of coding models – not too fast, not too slow, but just right for most development workflows. Scoring &lt;strong&gt;72.7% on SWE-Bench Verified&lt;/strong&gt;, it consistently delivers reliable, production-ready code with fewer errors than its competitors.&lt;/p&gt;

&lt;p&gt;What sets Claude apart is its &lt;strong&gt;instruction-following precision&lt;/strong&gt;. Developers consistently report that Claude "gets it right on the first try" more often than other models, especially for complex requirements that span multiple files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Sonnet 4's Sweet Spots:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;200K context window&lt;/strong&gt; with extended thinking capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Superior error handling&lt;/strong&gt; and defensive coding practices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent performance&lt;/strong&gt; across long development sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise reliability&lt;/strong&gt; for production systems&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better at understanding complex file relationships&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One Visual Studio user shared their experience: "Claude Sonnet 4 consistently delivers faster responses and acts like a true coding agent, actually implementing fixes rather than just explaining what needs to be done."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real-World Performance Battle
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. The benchmark scores tell one story, but developer experiences reveal another:&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed vs Quality Trade-offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Grok Code Fast 1&lt;/strong&gt; is revolutionizing rapid prototyping. Developers report they can iterate on UI components and debug issues at unprecedented speed. The model's transparency through visible reasoning traces makes it excellent for learning and understanding code patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5 High&lt;/strong&gt; excels when you need that first attempt to be nearly perfect. For complex refactoring, architecture decisions, or tackling technical debt, its superior reasoning often saves time despite slower responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt; hits the productivity sweet spot. It's fast enough to maintain flow state but thorough enough to produce maintainable, bug-free code. It's the model you'd choose if you could only pick one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Reality Check
&lt;/h3&gt;

&lt;p&gt;The pricing differences create distinct value propositions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grok Code Fast 1&lt;/strong&gt;: $0.20/$1.50 per million tokens (input/output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5 High&lt;/strong&gt;: $1.25/$10.00 per million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt;: $3.00/$15.00 per million tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For high-volume development teams, Grok's pricing advantage compounds quickly. But for complex projects requiring minimal iterations, the premium models can actually be more cost-effective overall.&lt;/p&gt;
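&lt;p&gt;That last point is worth quantifying: a cheap model that needs several attempts can erase its own discount. Here's a rough sketch of the per-solution math (the token counts per attempt and the attempt counts are illustrative assumptions, not measured data):&lt;/p&gt;

```python
# Input/output rates (USD per million tokens) from the list above.
PRICING = {
    "grok-code-fast-1": (0.20, 1.50),
    "gpt-5-high": (1.25, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
}

def cost_per_solution(model: str, attempts: int,
                      in_tok: int = 50_000, out_tok: int = 10_000) -> float:
    """Total cost to land one accepted solution after `attempts` tries."""
    in_rate, out_rate = PRICING[model]
    per_attempt = (in_tok / 1e6) * in_rate + (out_tok / 1e6) * out_rate
    return attempts * per_attempt

# If Grok needs 4 tries where GPT-5 High nails it in 1, the gap narrows a lot:
print(f"Grok x4: ${cost_per_solution('grok-code-fast-1', 4):.3f}")  # $0.100
print(f"GPT-5 High x1: ${cost_per_solution('gpt-5-high', 1):.4f}")
```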

&lt;h2&gt;
  
  
  Which Model Fits Your Workflow?
&lt;/h2&gt;

&lt;p&gt;After extensive testing and community feedback, here's my honest recommendation:&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Grok Code Fast 1 if you:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Value speed and cost efficiency above all&lt;/li&gt;
&lt;li&gt;Work on rapid prototyping and experimentation&lt;/li&gt;
&lt;li&gt;Need transparent reasoning for learning&lt;/li&gt;
&lt;li&gt;Handle high-volume, repetitive coding tasks&lt;/li&gt;
&lt;li&gt;Want to maintain flow state during development&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pick GPT-5 High if you:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Need maximum accuracy for complex problems&lt;/li&gt;
&lt;li&gt;Work on enterprise-grade architectural decisions&lt;/li&gt;
&lt;li&gt;Handle multimodal development projects&lt;/li&gt;
&lt;li&gt;Require deep reasoning for performance optimization&lt;/li&gt;
&lt;li&gt;Can afford to wait for premium quality&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Go with Claude Sonnet 4 if you:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Want balanced performance across all metrics&lt;/li&gt;
&lt;li&gt;Need reliable, production-ready code&lt;/li&gt;
&lt;li&gt;Work on sustained development projects&lt;/li&gt;
&lt;li&gt;Value consistency over cutting-edge features&lt;/li&gt;
&lt;li&gt;Prefer methodical, systematic assistance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line: Context Matters More Than Benchmarks
&lt;/h2&gt;

&lt;p&gt;Here's what the benchmarks don't tell you: the "best" coding AI depends entirely on your specific context. A startup racing to MVP might thrive with Grok's speed and cost efficiency. An enterprise team maintaining critical systems might need Claude's reliability. A research team pushing technical boundaries might require GPT-5's reasoning depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to stay updated on the latest AI developments and implementation strategies?&lt;/strong&gt; Connect with me on &lt;a href="https://www.linkedin.com/in/yash-d-desai" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or check out my other technical deep-dives at &lt;a href="https://yashddesai.com" rel="noopener noreferrer"&gt;yashddesai.com&lt;/a&gt;. You can also follow my ongoing AI experiments and tutorials at &lt;a href="https://dev.to/yashddesai"&gt;dev.to/yashddesai&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>code</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>The AI Revolution Hits Warp Speed: August 2025's Game-Changing Breakthroughs That Are Reshaping Tech</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Sat, 30 Aug 2025 09:28:45 +0000</pubDate>
      <link>https://dev.to/yashddesai/the-ai-revolution-hits-warp-speed-august-2025s-game-changing-breakthroughs-that-are-reshaping-tech-101h</link>
      <guid>https://dev.to/yashddesai/the-ai-revolution-hits-warp-speed-august-2025s-game-changing-breakthroughs-that-are-reshaping-tech-101h</guid>
      <description>&lt;p&gt;&lt;em&gt;The first week of August 2025 will go down in history as the moment AI truly reached escape velocity. While most of us were planning summer vacations, the tech giants were busy rewriting the rules of artificial intelligence. What happened in those seven days wasn't just incremental progress—it was a seismic shift that's already changing how we think about AI's role in our lives.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As a fullstack developer working at the intersection of innovation and practical application, I've been closely tracking these developments, and frankly, the pace is breathtaking. Let me walk you through the breakthroughs that are making 2025 the year AI went from impressive to indispensable.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5: The Model That Changes Everything
&lt;/h2&gt;

&lt;p&gt;On August 7, 2025, OpenAI dropped GPT-5 like a digital bombshell, and the reverberations are still being felt across Silicon Valley and beyond. This isn't just another iterative update—it's a fundamental leap forward that's setting new benchmarks for what AI can achieve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Makes GPT-5 Revolutionary:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perfect Math Scores&lt;/strong&gt;: GPT-5 achieved a flawless 100% on competition math tests, while its closest competitor, Google's Gemini 2.5 DeepThink, scored 99.2%. That gap might seem small, but in AI terms, it's massive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System of Models Architecture&lt;/strong&gt;: Unlike previous single-model approaches, GPT-5 operates as a unified system with multiple specialized variants (GPT-5, Mini, Nano) that automatically route queries based on complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PhD-Level Performance&lt;/strong&gt;: The model demonstrates "thinking mode" capabilities that enable sophisticated multi-step reasoning, bringing AI closer to human-level problem-solving.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dramatic Hallucination Reduction&lt;/strong&gt;: Through a new "safe completions" training method, GPT-5 significantly reduces fabricated responses while maintaining helpfulness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's what really caught my attention as a developer: GPT-5's coding performance is setting new standards. It's not just writing code—it's architecting entire applications from simple prompts, debugging complex systems, and refactoring legacy codebases with an understanding that feels almost intuitive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Competition Heats Up: Claude Opus 4.1 Enters the Arena
&lt;/h2&gt;

&lt;p&gt;Not to be outdone, Anthropic released Claude Opus 4.1 on August 5, just days before GPT-5's launch. While positioned as a "drop-in replacement" for Opus 4, the improvements are anything but incremental:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Agentic Capabilities&lt;/strong&gt;: 74.5% performance on SWE-bench Verified, showcasing superior real-world coding abilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Safety Features&lt;/strong&gt;: Advanced safeguards including the ability to end abusive conversations to protect model "welfare"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision in Code Refactoring&lt;/strong&gt;: GitHub notes that Opus 4.1 excels at pinpointing exact corrections in large codebases without introducing bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's fascinating is Anthropic's approach to AI safety. They're implementing "model welfare" protections—not because they believe Claude is sentient, but as a precautionary measure for potential future scenarios. It's forward-thinking safety engineering at its finest.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rise of AI Agents: From Tools to Digital Colleagues
&lt;/h2&gt;

&lt;p&gt;Perhaps the most transformative trend emerging from August 2025 is the maturation of AI agents. These aren't just chatbots—they're autonomous digital entities capable of handling complex, multi-step workflows with minimal human intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Numbers Tell the Story:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AI agents market is projected to grow from $7.38 billion in 2025 to $47.1 billion by 2030&lt;/li&gt;
&lt;li&gt;One Australian business adopts AI every three minutes, according to AWS research&lt;/li&gt;
&lt;li&gt;21 AI-designed drugs have made it through Phase I trials with 80-90% success rates versus 50-70% for traditional drugs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Microsoft's Bold Move: MAI-Voice-1 and MAI-1 Preview
&lt;/h3&gt;

&lt;p&gt;Microsoft made a strategic play on August 29 with the release of two proprietary models—MAI-Voice-1 and MAI-1 Preview. This signals Microsoft's intent to reduce dependence on OpenAI and build its own AI stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAI-Voice-1 Capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates one minute of audio in under a second on a single GPU&lt;/li&gt;
&lt;li&gt;Powers Copilot Daily and Podcasts with human-like speech synthesis&lt;/li&gt;
&lt;li&gt;Enables natural conversational experiences across multiple scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This move is particularly significant for the enterprise AI market, as Microsoft's integration across its ecosystem could accelerate AI agent adoption in business workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Doubles Down on Enterprise AI Agents
&lt;/h3&gt;

&lt;p&gt;Amazon Web Services launched Amazon Bedrock AgentCore in July 2025, providing the infrastructure for enterprise-scale AI agent deployment. The platform addresses the "chasm of production readiness" that has prevented many organizations from scaling AI agents beyond proof-of-concept stages.&lt;/p&gt;

&lt;p&gt;Key components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore Runtime&lt;/strong&gt;: Low-latency serverless environments for agent execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore Observability&lt;/strong&gt;: Step-by-step visualization of agent workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore Identity&lt;/strong&gt;: Secure access controls for business system integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Healthcare: Where AI Is Saving Lives Right Now
&lt;/h2&gt;

&lt;p&gt;The medical field is experiencing perhaps the most dramatic AI transformation. August 2025 brought breakthrough after breakthrough in healthcare applications:&lt;/p&gt;

&lt;h3&gt;
  
  
  AI-Powered Drug Discovery Acceleration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stanford's Virtual Scientists&lt;/strong&gt;: AI teams that can design and validate nanobody strategies against viral variants with minimal human intervention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AlphaGenome&lt;/strong&gt;: Google DeepMind's genetics prediction tool that forecasts gene expression from DNA sequences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical Image Segmentation&lt;/strong&gt;: UC San Diego's AI reduces required training data by 20-fold while improving accuracy by 10-20%&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Diagnostic Revolution
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skin Cancer Detection&lt;/strong&gt;: Melbourne researchers developed AI systems that diagnose skin cancer in minutes with high accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cardiac Ultrasound&lt;/strong&gt;: Esaote's AI-enhanced systems provide real-time guidance for complex cardiac imaging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuberculosis Research&lt;/strong&gt;: Tufts University's AI creates "death portraits" showing how TB drugs affect bacteria at the cellular level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The impact is already measurable. Microsoft's MAI-DxO diagnostic platform achieved over 85% accuracy in complex medical cases, far surpassing average physician performance in controlled studies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Breakthrough: AlphaEvolve's Scientific Revolution
&lt;/h2&gt;

&lt;p&gt;Google DeepMind's AlphaEvolve, released in May 2025, represents a paradigm shift in AI-driven scientific discovery. This system combines large language model creativity with algorithmic rigor to solve previously intractable problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improved tensor processing unit designs for Google's AI infrastructure&lt;/li&gt;
&lt;li&gt;0.7% efficiency gains across Google's global computing resources&lt;/li&gt;
&lt;li&gt;Solutions to open mathematics problems that have puzzled researchers for years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mario Krenn from the Max Planck Institute called it "quite spectacular" and "the first successful demonstration of new discoveries based on general-purpose LLMs."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure Revolution: High-Speed, Low-Power AI
&lt;/h2&gt;

&lt;p&gt;Behind all these advances lies a crucial infrastructure breakthrough. The development of high-density optical interfaces (HDI/O) is enabling AI systems to process hundreds of terabytes per second across multi-bay clusters.&lt;/p&gt;

&lt;p&gt;This isn't just technical jargon—it's the foundation that makes real-time AI agents, instant voice synthesis, and complex reasoning possible at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Developers Like Us
&lt;/h2&gt;

&lt;p&gt;As someone building applications in this rapidly evolving landscape, here are the key implications I see:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;The API War Is Real&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With GPT-5, Claude Opus 4.1, and Microsoft's MAI models all competing, we're entering a golden age of AI capabilities. The competition is driving rapid improvements and, crucially, cost reductions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Agent-First Development&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The future isn't just AI-assisted development—it's AI agents as development partners. We need to start thinking about architecting applications that can work symbiotically with autonomous AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Multimodal by Default&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Text-only interfaces are becoming legacy. The integration of voice, vision, and reasoning in models like GPT-5 means our applications need to be designed for rich, multimodal interactions from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Safety and Ethics Are Table Stakes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With Anthropic implementing model welfare protections and AWS investing $100 million in agentic AI safety, responsible AI development isn't optional—it's essential for long-term success.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Investment Reality Check
&lt;/h2&gt;

&lt;p&gt;The numbers behind this AI revolution are staggering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft, Alphabet, Amazon, and Meta are investing $320 billion in AI infrastructure in 2025, up from $230 billion in 2024&lt;/li&gt;
&lt;li&gt;40% of CEOs believe their companies need to reinvent themselves to stay competitive in the AI era&lt;/li&gt;
&lt;li&gt;AI job mentions have surged 400% over the past two years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the reality check: while AI capabilities are exploding, practical implementation remains challenging. Most organizations aren't "agent-ready" yet, lacking the APIs, data infrastructure, and operational frameworks needed to deploy AI agents effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead: The September Sprint
&lt;/h2&gt;

&lt;p&gt;As I write this in late August 2025, the technology community is buzzing with anticipation for September releases. OpenAI has hinted at GPT-6 development with a focus on personalization, while Google is rumored to be preparing Gemini responses to GPT-5's market impact.&lt;/p&gt;

&lt;p&gt;The velocity of innovation is unprecedented, and for developers, the message is clear: the time for AI experimentation is over. The companies and individuals who learn to build with, deploy, and orchestrate AI agents will have significant advantages in the years ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts: Embracing the AI-First Future
&lt;/h2&gt;

&lt;p&gt;August 2025 proved that we're not just witnessing the evolution of AI—we're living through its revolution. The models released this month aren't just better versions of what came before; they're fundamentally different capabilities that enable entirely new categories of applications and experiences.&lt;/p&gt;

&lt;p&gt;As someone who's spent years building at the intersection of technology and human needs, I'm excited by what these breakthroughs enable. But I'm also mindful of the responsibility we have as builders to ensure these powerful tools serve humanity's best interests.&lt;/p&gt;

&lt;p&gt;The future is being written right now, one algorithm at a time. The question isn't whether AI will transform everything—it's how quickly we can adapt and what we'll build with these incredible new capabilities.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want to stay updated on the latest AI developments and their practical applications? Connect with me on &lt;a href="https://www.linkedin.com/in/yash-d-desai" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for regular insights, check out my latest projects at &lt;a href="https://yashddesai.com" rel="noopener noreferrer"&gt;yashddesai.com&lt;/a&gt;, or follow my technical deep-dives on &lt;a href="https://dev.to/yashddesai"&gt;Dev.to&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #AI #MachineLearning #GPT5 #Claude #TechTrends #ArtificialIntelligence #Innovation #FutureTech #AIAgents #SoftwareDevelopment #OpenAI #Anthropic #Microsoft #DeepLearning #TechNews #AIBreakthroughs #DigitalTransformation #EmergingTech #AIResearch #TechInnovation&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Kimi K2: The Game-Changing Open-Source AI That's Rewriting the Rules of Intelligent Development</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Tue, 26 Aug 2025 03:53:38 +0000</pubDate>
      <link>https://dev.to/yashddesai/kimi-k2-the-game-changing-open-source-ai-thats-rewriting-the-rules-of-intelligent-development-2jka</link>
      <guid>https://dev.to/yashddesai/kimi-k2-the-game-changing-open-source-ai-thats-rewriting-the-rules-of-intelligent-development-2jka</guid>
      <description>&lt;p&gt;The AI landscape just witnessed a seismic shift. On July 11, 2025, China's Moonshot AI dropped what many are calling "another DeepSeek moment" with the release of &lt;strong&gt;Kimi K2&lt;/strong&gt; – a revolutionary open-source AI model that's not just competing with industry giants like GPT-4 and Claude, but actually outperforming them in critical coding benchmarks while costing a fraction of the price.&lt;/p&gt;

&lt;p&gt;As developers, we've all been there – wrestling with complex codebases, debugging mysterious errors, or trying to orchestrate multi-step workflows that seem to require an army of tools. What if I told you there's now an AI that doesn't just understand your code but can actually execute, debug, and even automate entire development pipelines autonomously?&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Kimi K2 a Developer's Dream?
&lt;/h2&gt;

&lt;p&gt;Kimi K2 isn't your typical large language model. Built on a &lt;strong&gt;Mixture-of-Experts (MoE) architecture&lt;/strong&gt; with &lt;strong&gt;1 trillion total parameters&lt;/strong&gt; (but only 32 billion active at any time), it's been specifically engineered for what Moonshot calls "agentic intelligence" – the ability to not just respond but to &lt;strong&gt;act&lt;/strong&gt; independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Specifications That Matter
&lt;/h3&gt;

&lt;p&gt;The architecture itself is fascinating from an engineering perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;61 transformer layers&lt;/strong&gt; with &lt;strong&gt;384 experts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-head Latent Attention (MLA)&lt;/strong&gt; supporting &lt;strong&gt;128K token context window&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SwiGLU activation function&lt;/strong&gt; for enhanced reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;160K vocabulary size&lt;/strong&gt; for comprehensive language understanding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MuonClip optimizer&lt;/strong&gt; ensuring stable training at trillion-parameter scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's where it gets exciting for us developers – this isn't just about raw computational power. The MoE design means you're getting the reasoning capabilities of a trillion-parameter model while only paying for 32 billion parameters worth of computation. It's like having a Ferrari that runs on a motorcycle's fuel budget.&lt;/p&gt;
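&lt;p&gt;The routing idea behind a MoE layer, activating only a handful of experts per token, can be sketched in a few lines of Python. The sizes below are toy values chosen for illustration, not Kimi K2's actual configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Toy top-k Mixture-of-Experts routing: every token is sent to only
# k of the available experts, so compute scales with k rather than
# with the total expert count. (Toy sizes; Kimi K2 uses 384 experts.)
def moe_forward(x, expert_weights, gate_weights, k=2):
    scores = x @ gate_weights              # router logits, one per expert
    top_k = np.argsort(scores)[-k:]        # indices of the k best experts
    gates = np.exp(scores[top_k])
    gates /= gates.sum()                   # softmax over the selected experts
    # Only the k selected experts run; the rest cost nothing.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top_k))

rng = np.random.default_rng(0)
n_experts, d = 8, 16
x = rng.normal(size=d)
experts = rng.normal(size=(n_experts, d, d))
gate = rng.normal(size=(d, n_experts))
y = moe_forward(x, experts, gate)          # shape (16,)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key point: per-token compute scales with &lt;code&gt;k&lt;/code&gt;, the experts that actually run, not with the total expert pool, which is why a trillion-parameter MoE can price like a 32-billion-parameter dense model.&lt;/p&gt;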

&lt;h2&gt;
  
  
  Benchmark Performance: The Numbers Don't Lie
&lt;/h2&gt;

&lt;p&gt;Let's talk about the elephant in the room – how does Kimi K2 actually perform when the rubber meets the road? The results are genuinely impressive:&lt;/p&gt;

&lt;h3&gt;
  
  
  Coding Benchmarks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SWE-Bench Verified&lt;/strong&gt;: 65.8% (vs GPT-4.1's 54.6%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiveCodeBench v6&lt;/strong&gt;: 53.7% accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACEBench (En)&lt;/strong&gt;: 76.5%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SWE-Bench Multilingual&lt;/strong&gt;: 47.3%&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reasoning and Mathematics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AIME 2025&lt;/strong&gt;: 49.5%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPQA-Diamond&lt;/strong&gt;: 75.1%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OJBench&lt;/strong&gt;: 27.1%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just numbers on a spreadsheet – they represent real-world scenarios where Kimi K2 is solving complex software engineering problems, mathematical reasoning tasks, and multi-step coding challenges that mirror what we face in production environments daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agentic Advantage: Beyond Chat, Into Action
&lt;/h2&gt;

&lt;p&gt;What sets Kimi K2 apart isn't just its technical specs – it's the &lt;strong&gt;agentic capabilities&lt;/strong&gt; that make it feel less like a chatbot and more like an AI pair programmer with superpowers. Unlike traditional models that excel at generating responses, Kimi K2 has been trained to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execute tools and APIs&lt;/strong&gt; autonomously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write, run, and debug code&lt;/strong&gt; in real-time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Orchestrate complex multi-step workflows&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interact with external systems&lt;/strong&gt; and databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan and execute long-horizon tasks&lt;/strong&gt; without human intervention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Imagine asking Kimi K2 to "analyze our user engagement data, identify bottlenecks, and propose optimizations." Instead of just giving you suggestions, it can actually fetch the data, run the analysis, generate visualizations, and even draft implementation strategies – all in one seamless workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Performance: The Developer Experience
&lt;/h2&gt;

&lt;p&gt;Recent comparative studies reveal some compelling insights about Kimi K2's practical performance. In head-to-head testing against established models:&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Completion Rates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pointed file changes&lt;/strong&gt;: 100% success rate (4/4 tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bug detection and fixing&lt;/strong&gt;: 80% success rate (4/5 tasks) &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature implementation&lt;/strong&gt;: 100% success rate (4/4 tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend refactoring&lt;/strong&gt;: 100% success rate (2/2 tasks)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Speed and Efficiency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2.5x faster&lt;/strong&gt; average completion time compared to alternatives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;93% overall success rate&lt;/strong&gt; across diverse coding challenges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;89% clean compilation rate&lt;/strong&gt; for generated code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's particularly noteworthy is that Kimi K2 consistently maintained original test logic while fixing underlying issues, rather than taking shortcuts by modifying assertions or hardcoding values – a common pitfall with other models.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics of Intelligence: Cost vs. Performance
&lt;/h2&gt;

&lt;p&gt;Here's where Kimi K2 becomes genuinely disruptive. While maintaining competitive (and often superior) performance, the pricing is revolutionary:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kimi K2&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: $0.15 per million tokens&lt;/li&gt;
&lt;li&gt;Output: $2.50 per million tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Compare this to established alternatives&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Opus: $15/$75 per million tokens&lt;/li&gt;
&lt;li&gt;GPT-4: $3/$15 per million tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers working on large-scale applications or conducting extensive AI-assisted development, this represents potential cost savings of &lt;strong&gt;90% or more&lt;/strong&gt; while maintaining or improving output quality.&lt;/p&gt;
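&lt;p&gt;To make that arithmetic concrete, here is a quick back-of-the-envelope comparison in Python using the list prices above. The monthly token volumes are hypothetical, chosen purely for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative cost comparison using the per-million-token list prices above.
# The workload numbers are hypothetical.
def monthly_cost(input_tokens_m, output_tokens_m, price_in, price_out):
    """Cost in USD for a workload measured in millions of tokens."""
    return input_tokens_m * price_in + output_tokens_m * price_out

workload = (10, 2)  # 10M input tokens, 2M output tokens per month

kimi = monthly_cost(*workload, 0.15, 2.50)    # $6.50
gpt4 = monthly_cost(*workload, 3.00, 15.00)   # $60.00
opus = monthly_cost(*workload, 15.00, 75.00)  # $300.00

print(f"Savings vs GPT-4: {1 - kimi / gpt4:.0%}")  # roughly 89%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At this made-up volume, Kimi K2 comes in around 89% cheaper than GPT-4 and roughly 98% cheaper than Claude Opus, which is where the "90% or more" figure comes from.&lt;/p&gt;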

&lt;h2&gt;
  
  
  Open Source: The Developer's Paradise
&lt;/h2&gt;

&lt;p&gt;Perhaps the most exciting aspect of Kimi K2 is its &lt;strong&gt;open-source nature&lt;/strong&gt;. Released under a permissive Apache-style license, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full transparency&lt;/strong&gt;: Inspect and understand every parameter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom fine-tuning&lt;/strong&gt;: Adapt the model for specific domains or use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosting capabilities&lt;/strong&gt;: Deploy on your own infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community contributions&lt;/strong&gt;: Benefit from collective improvements and optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The licensing terms are remarkably developer-friendly – you only need to display "Kimi K2" attribution if your product exceeds 100 million monthly users or $20 million in revenue. For most developers and startups, this is essentially unrestricted usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Innovation: MuonClip Optimizer
&lt;/h2&gt;

&lt;p&gt;One of the most significant technical achievements behind Kimi K2 is the &lt;strong&gt;MuonClip optimizer&lt;/strong&gt;. Training trillion-parameter models has historically been plagued by instability, loss spikes, and training crashes. Moonshot's innovation lies in combining the &lt;strong&gt;Muon optimizer&lt;/strong&gt; with a novel &lt;strong&gt;QK-clip technique&lt;/strong&gt; that addresses attention logit runaway and maintains stable convergence.&lt;/p&gt;

&lt;p&gt;This isn't just academic – it enabled Kimi K2 to be pre-trained on &lt;strong&gt;15.5 trillion tokens with zero loss spikes&lt;/strong&gt;. For developers, this translates to a more reliable, consistent model behavior that won't suddenly generate nonsensical outputs or fail unexpectedly during complex reasoning tasks.&lt;/p&gt;
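&lt;p&gt;Moonshot hasn't published a reference implementation here, but the general idea of capping attention logits can be illustrated with a toy sketch. This is a conceptual example of logit clipping, not the actual QK-clip code; the cap value and scaling scheme are assumptions for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Toy illustration: if the largest query-key logit exceeds a threshold,
# scale Q and K down so the attention softmax stays numerically well-behaved.
# Conceptual only; NOT Moonshot's published QK-clip implementation.
def capped_attention_logits(q, k, cap=30.0):
    logits = q @ k.T / np.sqrt(q.shape[-1])
    peak = np.abs(logits).max()
    if peak &amp;gt; cap:
        scale = np.sqrt(cap / peak)  # split the correction between Q and K
        logits = (q * scale) @ (k * scale).T / np.sqrt(q.shape[-1])
    return logits

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)) * 50  # deliberately oversized activations
k = rng.normal(size=(4, 8)) * 50
print(np.abs(capped_attention_logits(q, k)).max())  # capped at 30.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;During training, keeping these logits bounded is what prevents the kind of loss spikes described above.&lt;/p&gt;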

&lt;h2&gt;
  
  
  Use Cases: Where Kimi K2 Shines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Large-Scale Legacy Codebase Analysis
&lt;/h3&gt;

&lt;p&gt;With its 128K token context window, Kimi K2 can ingest and reason about massive codebases in a single pass. It excels at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-module dependency analysis&lt;/li&gt;
&lt;li&gt;End-to-end refactoring suggestions&lt;/li&gt;
&lt;li&gt;Legacy system modernization planning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Autonomous Debugging and Testing
&lt;/h3&gt;

&lt;p&gt;The agentic capabilities really shine here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically generates regression tests&lt;/li&gt;
&lt;li&gt;Identifies edge cases before deployment&lt;/li&gt;
&lt;li&gt;Executes debug cycles without human intervention&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Full-Stack Development Workflows
&lt;/h3&gt;

&lt;p&gt;From database schema design to API implementation to frontend components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scaffolds complete project structures&lt;/li&gt;
&lt;li&gt;Generates CI/CD configurations&lt;/li&gt;
&lt;li&gt;Creates comprehensive documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Research and Prototyping
&lt;/h3&gt;

&lt;p&gt;The 128K-token context window makes it ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processing research papers and technical documentation&lt;/li&gt;
&lt;li&gt;Analyzing multiple files simultaneously (up to 50 at once)&lt;/li&gt;
&lt;li&gt;Real-time web search across 100+ websites for current information&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Global Context: A Strategic AI Move
&lt;/h2&gt;

&lt;p&gt;Kimi K2's release represents more than just a technical achievement – it's a strategic geopolitical statement in the global AI race. Backed by Alibaba with a &lt;strong&gt;$1 billion funding round&lt;/strong&gt; and valued at &lt;strong&gt;$2.5 billion&lt;/strong&gt;, Moonshot AI is positioning itself as a transparent alternative to Western closed-source models.&lt;/p&gt;

&lt;p&gt;This transparency extends beyond just open-sourcing the weights. The company has provided detailed technical documentation, training methodologies, and even the infrastructure optimizations that made this scale of training possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead: The Future of Agentic AI
&lt;/h2&gt;

&lt;p&gt;Kimi K2 represents what many experts believe is the future direction of AI development – models that don't just understand and generate, but actually &lt;strong&gt;execute&lt;/strong&gt; and &lt;strong&gt;orchestrate&lt;/strong&gt;. The implications for software development are profound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced development cycles&lt;/strong&gt; through intelligent automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced code quality&lt;/strong&gt; through AI-assisted review and testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Democratized access&lt;/strong&gt; to sophisticated development capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower barriers to entry&lt;/strong&gt; for complex software projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started: Your Next Steps
&lt;/h2&gt;

&lt;p&gt;Ready to explore what Kimi K2 can do for your development workflow? Here's how to get started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Try the web interface&lt;/strong&gt; at kimi.com for immediate access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore the API&lt;/strong&gt; through various providers like Groq, Fireworks, and others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download the weights&lt;/strong&gt; from the official repository for local deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment with agentic workflows&lt;/strong&gt; by connecting it to your existing tools and APIs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model is available in both &lt;strong&gt;Kimi-K2-Base&lt;/strong&gt; (for custom fine-tuning) and &lt;strong&gt;Kimi-K2-Instruct&lt;/strong&gt; (ready for production use) variants.&lt;/p&gt;
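&lt;p&gt;Most hosted providers expose Kimi K2 through an OpenAI-compatible chat-completions endpoint, so a first request can be sketched as below. The base URL, model identifier, and environment variable name are placeholders; substitute the values from your chosen provider's documentation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os
import urllib.request

# Placeholder endpoint and key; check your provider's docs for the real values.
BASE_URL = "https://api.example-provider.com/v1/chat/completions"
API_KEY = os.environ.get("PROVIDER_API_KEY", "")

payload = {
    "model": "kimi-k2-instruct",  # hypothetical identifier; naming varies by provider
    "messages": [
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Refactor this recursive function to be iterative."},
    ],
    "temperature": 0.2,
}

request = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(request) as response:  # uncomment with real credentials
#     print(json.load(response)["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;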

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Kimi K2 isn't just another AI model – it's a paradigm shift towards truly intelligent, autonomous development assistants. With its combination of superior performance, revolutionary pricing, open-source accessibility, and genuine agentic capabilities, it's positioning itself as the go-to choice for developers who want cutting-edge AI without vendor lock-in or prohibitive costs.&lt;/p&gt;

&lt;p&gt;Whether you're debugging complex systems, architecting new solutions, or pushing the boundaries of what's possible in software development, Kimi K2 offers a glimpse into a future where AI isn't just a tool but a true development partner.&lt;/p&gt;

&lt;p&gt;The age of agentic intelligence has arrived, and it's open source, affordable, and ready to transform how we build software. The question isn't whether you should explore Kimi K2 – it's how quickly you can integrate it into your development workflow to stay ahead of the curve.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Want to stay updated on the latest AI developments and implementation strategies?&lt;/strong&gt; Connect with me on &lt;a href="https://www.linkedin.com/in/yash-d-desai" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or check out my other technical deep-dives at &lt;a href="https://yashddesai.com" rel="noopener noreferrer"&gt;yashddesai.com&lt;/a&gt;. You can also follow my ongoing AI experiments and tutorials at &lt;a href="https://dev.to/yashddesai"&gt;dev.to/yashddesai&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #ai #opensource #llm #machinelearning #coding #development #mixtureofexperts #agentic #moonshot #kimik2 #deeplearning #softwareengineering&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>PydanticAI: A Comprehensive Guide to Building Production-Ready AI Applications</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Sun, 29 Dec 2024 11:42:47 +0000</pubDate>
      <link>https://dev.to/yashddesai/pydanticai-a-comprehensive-guide-to-building-production-ready-ai-applications-20me</link>
      <guid>https://dev.to/yashddesai/pydanticai-a-comprehensive-guide-to-building-production-ready-ai-applications-20me</guid>
      <description>&lt;p&gt;PydanticAI is a &lt;strong&gt;powerful Python framework&lt;/strong&gt; designed to streamline the development of production-grade applications using Generative AI. It is built by the same team behind Pydantic, a widely used data validation library, and aims to bring the innovative and ergonomic design of FastAPI to the field of AI application development. PydanticAI focuses on &lt;strong&gt;type safety, modularity, and seamless integration&lt;/strong&gt; with other Python tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;PydanticAI revolves around several key concepts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents
&lt;/h3&gt;

&lt;p&gt;Agents are the &lt;strong&gt;primary interface&lt;/strong&gt; for interacting with Large Language Models (LLMs). An agent acts as a container for various components, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;System prompts&lt;/strong&gt;: Instructions for the LLM, defined as static strings or dynamic functions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Function tools&lt;/strong&gt;: Functions that the LLM can call to get additional information or perform actions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Structured result types&lt;/strong&gt;: Data types that the LLM must return at the end of a run.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dependency types&lt;/strong&gt;: Data or services that system prompt functions, tools and result validators may use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;LLM models&lt;/strong&gt;: The LLM that the agent will use, which can be set at agent creation or at runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents are designed for reusability and are typically instantiated once and reused throughout an application.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Prompts
&lt;/h3&gt;

&lt;p&gt;System prompts are instructions provided to the LLM by the developer. They can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Static system prompts&lt;/strong&gt;: Defined when the agent is created, using the &lt;code&gt;system_prompt&lt;/code&gt; parameter of the &lt;code&gt;Agent&lt;/code&gt; constructor.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic system prompts&lt;/strong&gt;: Defined by functions decorated with &lt;code&gt;@agent.system_prompt&lt;/code&gt;. These can access runtime information, such as dependencies, via the &lt;code&gt;RunContext&lt;/code&gt; object.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single agent can use both static and dynamic system prompts, which are appended in the order they are defined at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use the customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name while replying to them.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.system_prompt&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_the_users_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@agent.system_prompt&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_the_date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The date is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the date?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Hello Frank, the date today is 2032-01-02.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Function Tools
&lt;/h3&gt;

&lt;p&gt;Function tools enable LLMs to access external information or perform actions not available within the system prompt itself. Tools can be registered in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;@agent.tool&lt;/code&gt; decorator: For tools that require access to the agent's context via &lt;code&gt;RunContext&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;@agent.tool_plain&lt;/code&gt; decorator: For tools that do not need access to the agent's context.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;tools&lt;/code&gt; keyword argument in &lt;code&gt;Agent&lt;/code&gt; constructor: Can take plain functions or instances of the &lt;code&gt;Tool&lt;/code&gt; class, giving more control over tool definitions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-flash&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a dice game, you should roll the die and see if the number &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;you get back matches the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s guess. If so, tell them they&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a winner. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use the player&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name in the response.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_die&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a six-sided die and return the result.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_player_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the player&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;

&lt;span class="n"&gt;dice_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;My guess is 4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anne&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dice_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Congratulations Anne, you guessed correctly! You're a winner!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tool parameters are extracted from the function signature and are used to build the tool's JSON schema. The docstrings of functions are used to generate the descriptions of the tool and the parameter descriptions within the schema.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependencies
&lt;/h3&gt;

&lt;p&gt;Dependencies provide data and services to the agent’s system prompts, tools, and result validators via a dependency injection system. Dependencies are accessed through the &lt;code&gt;RunContext&lt;/code&gt; object. They can be any Python type, but &lt;code&gt;dataclasses&lt;/code&gt; are a convenient way to manage multiple dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyDeps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;http_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AsyncClient&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MyDeps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.system_prompt&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MyDeps&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;http_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyDeps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;foobar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tell me a joke.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;#&amp;gt; Did you hear about the toothpaste scandal? They called it Colgate.
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Results are the final values returned from an agent run. They are wrapped in &lt;code&gt;RunResult&lt;/code&gt; (for synchronous and asynchronous runs) or &lt;code&gt;StreamedRunResult&lt;/code&gt; (for streamed runs), providing access to usage data and message history. Results can be plain text or structured data and are validated using Pydantic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CityLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-flash&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CityLocation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Where were the olympics held in 2012?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; city='London' country='United Kingdom'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result validators, added via the &lt;code&gt;@agent.result_validator&lt;/code&gt; decorator, provide a way to add further validation logic, particularly when the validation requires IO and is asynchronous.&lt;/p&gt;
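&lt;p&gt;PydanticAI drives this validate-and-retry loop for you: raising &lt;code&gt;ModelRetry&lt;/code&gt; from a validator feeds the error message back to the model for another attempt. As a rough, library-free sketch of the idea (the &lt;code&gt;RetryValidation&lt;/code&gt; exception, the &lt;code&gt;run_with_validation&lt;/code&gt; helper, and the toy model below are illustrative stand-ins, not PydanticAI's API):&lt;/p&gt;

```python
class RetryValidation(Exception):
    """Illustrative stand-in for pydantic_ai's ModelRetry."""

def run_with_validation(generate, validate, max_retries=2):
    """Call `generate`, re-prompting with validator feedback on failure."""
    feedback = None
    for _ in range(max_retries + 1):
        candidate = generate(feedback)
        try:
            return validate(candidate)
        except RetryValidation as exc:
            feedback = str(exc)  # becomes part of the next model call
    raise RuntimeError('validation failed after retries')

# Toy "model": returns an invalid answer until it sees feedback.
def fake_model(feedback):
    return 'rome' if feedback is None else 'Rome'

def must_be_capitalised(value):
    if not value[0].isupper():
        raise RetryValidation('City names must be capitalised.')
    return value

print(run_with_validation(fake_model, must_be_capitalised))
#> Rome
```

&lt;p&gt;The key point is that a failed validation is not fatal: the validator's message becomes feedback for the next model call.&lt;/p&gt;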

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;p&gt;PydanticAI boasts several key features that make it a compelling choice for AI application development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Model Agnostic&lt;/strong&gt;: PydanticAI supports a variety of LLMs, including OpenAI, Anthropic, Gemini, Ollama, Groq, and Mistral. It also provides a simple interface for implementing support for other models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Type Safety&lt;/strong&gt;: Designed to work seamlessly with static type checkers like mypy and pyright. It allows for type checking of dependencies and result types.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Python-Centric Design&lt;/strong&gt;: Leverages familiar Python control flow and agent composition to build AI projects, making it easy to apply standard Python practices.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Structured Responses&lt;/strong&gt;: Uses Pydantic to validate and structure model outputs, ensuring consistent responses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dependency Injection System&lt;/strong&gt;:  Offers a dependency injection system to provide data and services to an agent’s components, enhancing testability and iterative development.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Streamed Responses&lt;/strong&gt;: Supports streaming LLM outputs with immediate validation, allowing for rapid and accurate results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Working with Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running Agents
&lt;/h3&gt;

&lt;p&gt;Agents can be run in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;run_sync()&lt;/code&gt;: For synchronous execution.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;run()&lt;/code&gt;: For asynchronous execution.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;run_stream()&lt;/code&gt;: For streaming responses.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Synchronous run
&lt;/span&gt;&lt;span class="n"&gt;result_sync&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the capital of Italy?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Rome
&lt;/span&gt;
&lt;span class="c1"&gt;# Asynchronous run
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#&amp;gt; Paris
&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the capital of the UK?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="c1"&gt;#&amp;gt; London
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conversations
&lt;/h3&gt;

&lt;p&gt;An agent run might represent an entire conversation, but conversations can also be composed of multiple runs, especially when maintaining state between interactions. You can pass messages from previous runs using the &lt;code&gt;message_history&lt;/code&gt; argument to continue a conversation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Be a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tell me a joke.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Did you hear about the toothpaste scandal? They called it Colgate.
&lt;/span&gt;
&lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Explain?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; This is an excellent joke invent by Samuel Colvin, it needs no explanation.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Usage Limits
&lt;/h3&gt;

&lt;p&gt;PydanticAI provides a &lt;code&gt;settings.UsageLimits&lt;/code&gt; structure to cap the number of tokens and model requests consumed during a run. You can apply these limits via the &lt;code&gt;usage_limits&lt;/code&gt; argument to the &lt;code&gt;run&lt;/code&gt; functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.settings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UsageLimits&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.exceptions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UsageLimitExceeded&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-latest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result_sync&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the capital of Italy? Answer with a paragraph.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;usage_limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;UsageLimits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_tokens_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;UsageLimitExceeded&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#&amp;gt; Exceeded the response_tokens_limit of 10 (response_tokens=32)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Model Settings
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;settings.ModelSettings&lt;/code&gt; structure allows you to fine-tune model behaviour through parameters such as &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;max_tokens&lt;/code&gt;, and &lt;code&gt;timeout&lt;/code&gt;. You can apply these via the &lt;code&gt;model_settings&lt;/code&gt; argument in the &lt;code&gt;run&lt;/code&gt; functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result_sync&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the capital of Italy?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Rome
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Function Tools in Detail
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tool Registration
&lt;/h3&gt;

&lt;p&gt;Tools can be registered using the &lt;code&gt;@agent.tool&lt;/code&gt; decorator (for tools needing context), the &lt;code&gt;@agent.tool_plain&lt;/code&gt; decorator (for tools without context), or via the &lt;code&gt;tools&lt;/code&gt; argument in the &lt;code&gt;Agent&lt;/code&gt; constructor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;

&lt;span class="n"&gt;agent_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-flash&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tool Schema
&lt;/h3&gt;

&lt;p&gt;Parameter descriptions are extracted from docstrings and added to the tool’s JSON schema. If a tool has a single parameter that can be represented as an object in JSON schema, the schema is simplified to be just that object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.models.function&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FunctionModel&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foobar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get me foobar.
    Args:
        a: apple pie
        b: banana cake
        c: carrot smoothie
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
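&lt;p&gt;Conceptually, the schema is derived from the function's type hints and docstring. Here is a rough, stdlib-only sketch of that extraction (the &lt;code&gt;tool_schema&lt;/code&gt; helper and its type mapping are illustrative; PydanticAI builds the real schema with Pydantic and parses several docstring formats for parameter descriptions):&lt;/p&gt;

```python
import inspect
from typing import get_type_hints

# Very rough mapping from Python types to JSON schema types.
TYPE_MAP = {int: 'integer', str: 'string', float: 'number', bool: 'boolean'}

def tool_schema(func):
    """Build a minimal JSON-schema-like description of a function's parameters."""
    hints = get_type_hints(func)
    hints.pop('return', None)  # the return annotation is not part of the schema
    properties = {name: {'type': TYPE_MAP.get(tp, 'object')} for name, tp in hints.items()}
    return {
        'description': (inspect.getdoc(func) or '').split('\n')[0],
        'parameters': {
            'type': 'object',
            'properties': properties,
            'required': list(properties),
        },
    }

def foobar(a: int, b: str) -> str:
    """Get me foobar."""
    return f'{a} {b}'

print(tool_schema(foobar)['description'])
#> Get me foobar.
print(tool_schema(foobar)['parameters']['properties'])
#> {'a': {'type': 'integer'}, 'b': {'type': 'string'}}
```

&lt;p&gt;The real implementation goes further, merging docstring parameter descriptions into each property and collapsing single-object signatures as described above.&lt;/p&gt;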



&lt;h3&gt;
  
  
  Dynamic Tools
&lt;/h3&gt;

&lt;p&gt;Tools can be customised with a &lt;code&gt;prepare&lt;/code&gt; function, which is called at each step to modify the tool definition or omit the tool from that step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolDefinition&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;only_if_42&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tool_def&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolDefinition&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ToolDefinition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tool_def&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;only_if_42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hitchhiker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;testing...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; success (no tool calls)
&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;testing...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; {"hitchhiker":"42 a"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Messages and Chat History
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Accessing Messages
&lt;/h3&gt;

&lt;p&gt;Messages exchanged during an agent run can be accessed via the &lt;code&gt;all_messages()&lt;/code&gt; and &lt;code&gt;new_messages()&lt;/code&gt; methods on &lt;code&gt;RunResult&lt;/code&gt; and &lt;code&gt;StreamedRunResult&lt;/code&gt; objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Be a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tell me a joke.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#&amp;gt; Did you hear about the toothpaste scandal? They called it Colgate.
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Message Reuse
&lt;/h3&gt;

&lt;p&gt;Messages can be passed to the &lt;code&gt;message_history&lt;/code&gt; parameter to continue conversations across multiple agent runs. When a &lt;code&gt;message_history&lt;/code&gt; is set and not empty, a new system prompt is not generated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Message Format
&lt;/h3&gt;

&lt;p&gt;The message format is model-independent, allowing messages to be reused across different agents, or with the same agent running different models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging and Monitoring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pydantic Logfire
&lt;/h3&gt;

&lt;p&gt;PydanticAI integrates with &lt;strong&gt;Pydantic Logfire&lt;/strong&gt;, an observability platform that allows you to monitor and debug your entire application. Logfire can be used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Real-time debugging&lt;/strong&gt;: To see what's happening in your application in real-time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Monitoring application performance&lt;/strong&gt;: Using SQL queries and dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To use PydanticAI with Logfire, install with the &lt;code&gt;logfire&lt;/code&gt; optional group: &lt;code&gt;pip install 'pydantic-ai[logfire]'&lt;/code&gt;. You then need to configure a Logfire project and authenticate your environment.&lt;/p&gt;
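&lt;p&gt;As a minimal sketch of that setup (assuming you have already created a Logfire project and authenticated locally; the exact instrumentation hooks vary by version):&lt;/p&gt;

```python
import logfire

# Picks up your local Logfire credentials and project configuration;
# after this, PydanticAI can emit traces to your Logfire dashboard
# (newer versions may also require an explicit instrument call).
logfire.configure()
```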

&lt;h2&gt;
  
  
  Installation and Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;PydanticAI can be installed using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A slim install is also available if you only need support for specific models, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'pydantic-ai-slim[openai]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Logfire Integration
&lt;/h3&gt;

&lt;p&gt;To use PydanticAI with Logfire, install it with the &lt;code&gt;logfire&lt;/code&gt; optional group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'pydantic-ai[logfire]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Examples
&lt;/h3&gt;

&lt;p&gt;Examples are available as a separate package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'pydantic-ai[examples]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing and Evaluation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unit Tests
&lt;/h3&gt;

&lt;p&gt;Unit tests verify that your application code behaves as expected. For PydanticAI, follow these strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use &lt;code&gt;pytest&lt;/code&gt; as your test harness.&lt;/li&gt;
&lt;li&gt;  Use &lt;code&gt;TestModel&lt;/code&gt; or &lt;code&gt;FunctionModel&lt;/code&gt; in place of your actual model.&lt;/li&gt;
&lt;li&gt;  Use &lt;code&gt;Agent.override&lt;/code&gt; to replace your model inside your application logic.&lt;/li&gt;
&lt;li&gt;  Set &lt;code&gt;models.ALLOW_MODEL_REQUESTS = False&lt;/code&gt; globally to prevent accidental requests to real, non-test models.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anyio&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.models.test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TestModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserPromptPart&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolDefinition&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;


&lt;span class="nd"&gt;@pytest.mark.anyio&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_weather&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TestModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WeatherResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="n"&gt;weather_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;result_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WeatherResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Return a valid WeatherResult object.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@weather_agent.tool&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nd"&gt;@weather_agent.tool&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_location&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;london&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the weather?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;london&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserPromptPart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;get_current_time&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Evals
&lt;/h3&gt;

&lt;p&gt;Evals are used to measure the performance of the LLM and are more like benchmarks than unit tests. Evals focus on measuring how the LLM performs for a specific application. This can be done through end-to-end tests, synthetic self-contained tests, using LLMs to evaluate LLMs, or by measuring agent performance in production.&lt;/p&gt;
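&lt;p&gt;The idea can be sketched with a tiny, self-contained harness; &lt;code&gt;generate_sql&lt;/code&gt; and the exact-match scoring rule below are hypothetical stand-ins for a real agent call and a real metric:&lt;/p&gt;

```python
def generate_sql(question: str) -> str:
    # Hypothetical stand-in for an agent call such as agent.run_sync(question)
    canned = {
        'count users': 'SELECT COUNT(*) FROM users',
        'all orders': 'SELECT * FROM orders',
    }
    return canned.get(question, '')

# (question, expected SQL) pairs form the eval dataset
CASES = [
    ('count users', 'SELECT COUNT(*) FROM users'),
    ('all orders', 'SELECT * FROM orders'),
    ('latest login', 'SELECT MAX(login_at) FROM sessions'),
]

def run_evals() -> float:
    # Score = fraction of cases where the generated SQL matches exactly
    passed = sum(generate_sql(q) == expected for q, expected in CASES)
    return passed / len(CASES)

print(f'{run_evals():.0%} of cases passed')
```

Unlike a unit test, the score is a benchmark number to track over time, not a pass/fail gate.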

&lt;h2&gt;
  
  
  Example Use Cases
&lt;/h2&gt;

&lt;p&gt;PydanticAI can be used in a wide variety of use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Roulette Wheel&lt;/strong&gt;: Simulating a roulette wheel using an agent with an integer dependency and a boolean result.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chat Application&lt;/strong&gt;: Creating a chat application with multiple runs, passing previous messages using &lt;code&gt;message_history&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Bank Support Agent&lt;/strong&gt;: Building a support agent for a bank using tools, dependency injection, and structured responses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Weather Forecast&lt;/strong&gt;: Creating an application that returns a weather forecast based on location and date using function tools and dependencies.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;SQL Generation&lt;/strong&gt;: Generating SQL queries from user prompts, with validation using the result validator.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;PydanticAI offers a &lt;strong&gt;robust and flexible framework&lt;/strong&gt; for developing AI applications with a strong emphasis on type safety and modularity. The use of Pydantic for data validation and structuring, coupled with its dependency injection system, makes it an ideal tool for building &lt;strong&gt;reliable and maintainable AI applications&lt;/strong&gt;. With its broad LLM support and seamless integration with tools like Pydantic Logfire, PydanticAI enables developers to build powerful, production-ready AI-driven projects efficiently.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
    </item>
    <item>
      <title>Breaking the Cycle: How to Beat Procrastination as a Developer</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Sun, 29 Dec 2024 11:28:44 +0000</pubDate>
      <link>https://dev.to/yashddesai/breaking-the-cycle-how-to-beat-procrastination-as-a-developer-5fen</link>
      <guid>https://dev.to/yashddesai/breaking-the-cycle-how-to-beat-procrastination-as-a-developer-5fen</guid>
      <description>&lt;p&gt;We've all been there. You've got a big project, a tricky bug to fix, or a new feature to implement. You know what you need to do, but the motivation just isn't there. Instead, you find yourself endlessly scrolling through Reddit, reorganising your code files (again), or suddenly needing to learn a new Javascript framework. The guilt creeps in, you feel like you’re not living up to your potential, and another day is lost to the procrastination cycle. Sound familiar?&lt;/p&gt;

&lt;p&gt;The good news is that this isn’t a personal failing; it’s a common challenge, especially for those with ambitious goals. The source explains that this cycle is fuelled by &lt;strong&gt;inertia&lt;/strong&gt;, the tendency for objects at rest to stay at rest. In our case, it’s the mental resistance to starting a task, which often leads to distractions instead.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Root of the Problem: Inertia
&lt;/h4&gt;

&lt;p&gt;Think of it like this: in physics, an object at rest requires an external force to set it in motion. The same is true for starting tasks. We often make the initial push seem so monumental that we avoid the task altogether. We think, &lt;em&gt;"I need to build this whole feature today,"&lt;/em&gt; and the inertia seems insurmountable. Instead, we seek quick dopamine hits with easier activities rather than facing the complex, time-consuming work ahead.&lt;/p&gt;

&lt;p&gt;The standard advice – delete social media, remove distractions – only addresses the symptoms, not the core issue. We need a way to overcome this initial inertia by making that first push smaller and easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Simple Strategies to Break Free
&lt;/h3&gt;

&lt;p&gt;The source suggests two techniques to reduce inertia and overcome procrastination:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Reduce the Stakes:&lt;/strong&gt; Instead of aiming for the whole task, take the smallest possible step. If you need to write code for that new feature, don’t say, &lt;em&gt;"I'm going to finish this today."&lt;/em&gt; Instead, tell yourself, &lt;em&gt;"I'm going to write 10 lines of code"&lt;/em&gt;. If you have to read lengthy API documentation, instead of saying "I'm going to get through this", tell yourself "I'll read the first page". The idea is to lower the initial barrier, making the start far less daunting. This reduces the feeling of inertia, and you’ll likely do more than you had initially planned.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Two-Minute Rule:&lt;/strong&gt; If you are struggling to start, tell yourself you’ll work on the task for just two minutes. If you have a bug to fix, say you'll look at the code for two minutes. If you have an email to respond to, you'll write a few lines and then stop. The beauty of this rule is that, once you start, the momentum often carries you beyond the initial two minutes. It’s much like pushing a ball up a hill – once you get it over the crest, it rolls downhill on its own.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How These Strategies Apply to Development
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Code:&lt;/strong&gt; Instead of tackling a large feature all at once, start with writing the basic structure or a small function. Or, if you are having a hard time working on your current project, you can work on some other part of your codebase for a short while to gain momentum.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Debugging:&lt;/strong&gt; When faced with a tricky bug, focus on tracing the code for two minutes, and you might just find the solution during that time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Documentation:&lt;/strong&gt; Approach reading documentation by breaking it into smaller chunks, maybe just a few pages or even a section at a time using the same principle.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Learning:&lt;/strong&gt; Instead of trying to learn a whole new framework, dedicate two minutes to reading one article or a tutorial.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Refactoring:&lt;/strong&gt; Set a timer for two minutes and improve one piece of code; that might spark a desire to improve another piece of code.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Testing:&lt;/strong&gt; Instead of running all your tests, run a subset of them for just two minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The source highlights that the &lt;strong&gt;initial step is the hardest, so making it small and easy is crucial&lt;/strong&gt;. Once you have overcome that inertia, the momentum will naturally carry you forward. As Martin Luther King Jr. said, &lt;em&gt;"You don't have to see the whole staircase, just take the first step"&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;As developers, we often deal with complex tasks that can easily lead to procrastination. By understanding the power of inertia and using these simple techniques, we can break free from the cycle of avoidance and guilt. Start small, take that first step, and build momentum. You’ll be amazed at what you can achieve.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>motivation</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>PDF Q&amp;A Automation using LLaMA-3 Model via Groq API</title>
      <dc:creator>Yash Desai</dc:creator>
      <pubDate>Mon, 09 Dec 2024 17:49:12 +0000</pubDate>
      <link>https://dev.to/yashddesai/pdf-qa-automation-using-llama-3-model-via-groq-api-1lpk</link>
      <guid>https://dev.to/yashddesai/pdf-qa-automation-using-llama-3-model-via-groq-api-1lpk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Imagine having a vast library of PDF documents and needing to extract answers to specific questions from these files. Manual processing can be tedious and time-consuming. With the advancements in AI, particularly in natural language processing (NLP), we can automate this process. In this article, we'll explore how to use the LLaMA-3 model via the Groq API to create a Python script that automates Q&amp;amp;A from PDF files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;Before diving into the script, ensure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Groq API Key&lt;/strong&gt;: Obtain a valid API key from Groq.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python Environment&lt;/strong&gt;: Set up a Python environment with the necessary libraries. You'll need &lt;code&gt;requests&lt;/code&gt; for API calls and &lt;code&gt;PyPDF2&lt;/code&gt; for handling PDF files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install Libraries&lt;/strong&gt;: Run &lt;code&gt;pip install requests PyPDF2&lt;/code&gt; to install the required libraries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Creating the Python Script
&lt;/h2&gt;

&lt;p&gt;The script will involve the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Read PDF Content&lt;/strong&gt;: Extract text from the PDF file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Send Question to API&lt;/strong&gt;: Use the Groq API to send the question and the extracted text to the LLaMA-3 model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get Answer&lt;/strong&gt;: Receive the answer from the API and print it.&lt;/li&gt;
&lt;/ol&gt;
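&lt;p&gt;Before filling in each step, the overall flow can be wired together with stand-in functions (the names and stub bodies below are illustrative only):&lt;/p&gt;

```python
def extract_text_from_pdf(file_path: str) -> str:
    # Stub for Step 1: the real version reads the PDF with PyPDF2
    return 'Example PDF text.'

def ask_llama(question: str, context: str) -> str:
    # Stub for Step 2: the real version sends the question and
    # context to the LLaMA-3 model via the Groq API
    return f'Answer to {question!r} using {len(context)} characters of context.'

def answer_from_pdf(file_path: str, question: str) -> str:
    # Step 3: tie extraction and querying together, return the answer
    text = extract_text_from_pdf(file_path)
    return ask_llama(question, text)

print(answer_from_pdf('doc.pdf', 'What is this about?'))
```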

&lt;h3&gt;
  
  
  Step 1: Read PDF Content
&lt;/h3&gt;

&lt;p&gt;First, we'll write a function to extract text from a PDF file using &lt;code&gt;PyPDF2&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pdf_file_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pdf_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PdfFileReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_file_obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;num_pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numPages&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;page_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;pdf_file_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Send Question to API
&lt;/h3&gt;

&lt;p&gt;Next, we'll create a function to send the question and the PDF content to the Groq API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_question_to_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pdf_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groq_api_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;groq_api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer the following question based on the provided text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pdf_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
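&lt;p&gt;One practical caveat: the entire extracted text goes into a single prompt, so a very long PDF can exceed the model's context window. A minimal sketch of a guard you could apply before calling &lt;code&gt;send_question_to_api&lt;/code&gt; (the 12,000-character budget is an arbitrary placeholder, not a documented Groq limit):&lt;/p&gt;

```python
MAX_CHARS = 12_000  # arbitrary budget; tune it for your model's context window

def truncate_pdf_content(pdf_content, max_chars=MAX_CHARS):
    """Trim extracted text so the prompt stays within a rough size budget."""
    if len(pdf_content) <= max_chars:
        return pdf_content
    # Naive head truncation; for real documents you may prefer chunking
    # the text and asking the question per chunk instead.
    return pdf_content[:max_chars]
```

&lt;p&gt;For documents where the answer may sit anywhere in the file, chunking plus per-chunk questions (or a retrieval step) is the more robust option.&lt;/p&gt;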



&lt;h3&gt;
  
  
  Step 3: Get Answer
&lt;/h3&gt;

&lt;p&gt;Finally, we'll parse the API response to get the answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_answer_from_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to retrieve answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
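&lt;p&gt;To see why the function indexes &lt;code&gt;['choices'][0]['message']['content']&lt;/code&gt;, here is the general shape of a successful chat-completions response in the OpenAI-compatible format Groq uses (the values below are made up for illustration):&lt;/p&gt;

```python
# Illustrative response shape; the values are made up
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The report covers Q3 revenue."
            }
        }
    ]
}

# get_answer_from_response drills into choices[0].message.content:
answer = sample_response['choices'][0]['message']['content']
print(answer)  # The report covers Q3 revenue.
```

&lt;p&gt;If the request fails (bad key, rate limit), the response carries an &lt;code&gt;error&lt;/code&gt; object instead of &lt;code&gt;choices&lt;/code&gt;, which is exactly the case the &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;except&lt;/code&gt; above catches.&lt;/p&gt;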



&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Now, let's combine these functions into a single executable script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;groq_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;YOUR_GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;pdf_file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path_to_your_pdf_file.pdf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Your question here&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="n"&gt;pdf_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;send_question_to_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pdf_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groq_api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_answer_from_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
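&lt;p&gt;Hardcoding &lt;code&gt;YOUR_GROQ_API_KEY&lt;/code&gt; is fine for a quick test, but it is easy to commit by accident. A small helper you could call from &lt;code&gt;main()&lt;/code&gt; instead, assuming the key is exported as a &lt;code&gt;GROQ_API_KEY&lt;/code&gt; environment variable:&lt;/p&gt;

```python
import os

def load_groq_api_key():
    """Read the Groq API key from the environment instead of hardcoding it."""
    key = os.environ.get('GROQ_API_KEY')
    if not key:
        raise RuntimeError('Set the GROQ_API_KEY environment variable first')
    return key
```

&lt;p&gt;Then &lt;code&gt;groq_api_key = load_groq_api_key()&lt;/code&gt; replaces the hardcoded string, and the key stays out of your source tree.&lt;/p&gt;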



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: We've automated the process of extracting answers from PDF files using the LLaMA-3 model via the Groq API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: This script can be adapted for various PDF files and questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: The accuracy of the answers depends on the quality of the PDF content and the question asked.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we've demonstrated how to leverage the LLaMA-3 model via the Groq API to create a Python script for automating Q&amp;amp;A from PDF files. This approach not only saves time but also opens up possibilities for more complex document analysis tasks. As AI models continue to evolve, we can expect even more sophisticated automation capabilities in the future.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>rag</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
