<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sangmin Lee</title>
    <description>The latest articles on DEV Community by Sangmin Lee (@claudeguide).</description>
    <link>https://dev.to/claudeguide</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946361%2F45852601-611d-4e7b-a381-c122ca373b5a.jpg</url>
      <title>DEV Community: Sangmin Lee</title>
      <link>https://dev.to/claudeguide</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/claudeguide"/>
    <language>en</language>
    <item>
      <title>Testing and Evaluating Claude Agents: A Production Guide</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 01:31:35 +0000</pubDate>
      <link>https://dev.to/claudeguide/testing-and-evaluating-claude-agents-a-production-guide-1n13</link>
      <guid>https://dev.to/claudeguide/testing-and-evaluating-claude-agents-a-production-guide-1n13</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-testing-eval?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-testing-eval" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-testing-eval&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Testing and Evaluating Claude Agents: A Production Guide
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Most Claude agents ship without any automated tests — and most teams regret it after a prompt change silently breaks a production workflow. A complete agent testing strategy has three layers: unit tests for tool call logic, integration tests for multi-turn conversation flows, and an eval harness that measures output quality on a fixed dataset before every deploy in 2026.&lt;/strong&gt; This guide covers the full testing stack for production Claude agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agent Testing Is Different
&lt;/h2&gt;

&lt;p&gt;Standard software testing verifies deterministic behavior: input A always produces output B. Agent testing has a different challenge: LLM outputs are probabilistic. You can't assert exact string equality — you need to assert properties of the output.&lt;/p&gt;

&lt;p&gt;The testing hierarchy for agents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unit tests&lt;/strong&gt;: Test your tool implementations independently (deterministic, easy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration tests&lt;/strong&gt;: Test the full agent loop with mocked or real API calls (semi-deterministic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eval harness&lt;/strong&gt;: Measure output quality on a representative dataset (probabilistic, scored)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regression tests&lt;/strong&gt;: Run the eval before every deploy, alert on quality drops (ongoing)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Layer 1: Unit Testing Tool Implementations
&lt;/h2&gt;

&lt;p&gt;Tool implementations are regular functions — test them like any other code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unittest.mock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MagicMock&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;your_agent.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format_invoice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validate_input&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestSearchDatabaseTool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test the tool implementation independently of Claude.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_returns_results_for_valid_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.tools.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_returns_empty_list_for_no_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.tools.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mock_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nonexistent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_raises_on_invalid_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit must be positive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_sanitizes_sql_injection_attempt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Tool should handle malicious input gracefully.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="s"&gt;; DROP TABLE users; --&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Should not raise, should return empty or sanitized results
&lt;/span&gt;        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestFormatInvoiceTool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_formats_standard_invoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;invoice_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Acme Corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1500.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-28&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consulting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;150.0&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_invoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;invoice_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Acme Corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$1,500.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1500&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_handles_missing_optional_fields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;minimal_invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-28&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;# Should not raise
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_invoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimal_invoice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 2: Integration Tests for the Agent Loop
&lt;/h2&gt;

&lt;p&gt;Integration tests verify that the agent orchestrates tools correctly across a multi-turn conversation. Use recorded responses or a mock client to make tests deterministic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach A: Mock the Anthropic client
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unittest.mock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;your_agent.agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_agent&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build a mock anthropic.Message object.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stop_reason&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tool_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tool_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tool_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;
        &lt;span class="n"&gt;tool_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_abc123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tool_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool_block&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;text_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;text_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text_block&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestAgentOrchestration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.anthropic.Anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent_calls_search_tool_when_asked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent should call search_database tool for search requests.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;mock_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mock_client&lt;/span&gt;

        &lt;span class="c1"&gt;# First call: Claude decides to use search tool
&lt;/span&gt;        &lt;span class="c1"&gt;# Second call: Claude synthesizes the result
&lt;/span&gt;        &lt;span class="n"&gt;mock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;side_effect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I found 3 active users matching your query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mock_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Carol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find active users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Verify search was called
&lt;/span&gt;        &lt;span class="n"&gt;mock_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_called_once_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.anthropic.Anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent_handles_tool_error_gracefully&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent should recover when a tool raises an exception.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;mock_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mock_client&lt;/span&gt;

        &lt;span class="n"&gt;mock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;side_effect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I wasn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t able to search the database. Please try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mock_search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;side_effect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database connection failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search for test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Agent should respond gracefully, not crash
&lt;/span&gt;        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.anthropic.Anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent_stops_before_turn_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent should not loop indefinitely.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;mock_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mock_anthropic_class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mock_client&lt;/span&gt;

        &lt;span class="c1"&gt;# Return tool_use indefinitely
&lt;/span&gt;        &lt;span class="n"&gt;mock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_mock_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_agent.agent.search_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Keep searching&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Should stop at max_turns, not loop forever
&lt;/span&gt;        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;mock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Approach B: Record and replay real API responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
import json
from pathlib import Path


class RecordedAnthropicClient:
    """Client that records real API calls and can replay them."""

    def __init__(self, record_path: str, mode: str = "replay"):
        self.record_path = Path(record_path)
        self.mode = mode  # "record" or "replay"
        self._calls = []
        self._index = 0

        if mode == "replay" and self.record_path.exists():
            self._calls = json.loads(self.record_path.read_text())

    def messages_create(self, **kwargs):
        if self.mode == "record":
            import anthropic
            client = anthropic.Anthropic()
            response = client.messages.create(**kwargs)
            self._calls.append(response.model_dump())
            return response
        else:
            if self._index 

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-testing-eval)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>testing</category>
      <category>evaluation</category>
      <category>evals</category>
      <category>production</category>
    </item>
    <item>
      <title>Claude Agent SDK Guide: Build Automation Agents with Tool Use</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 01:30:51 +0000</pubDate>
      <link>https://dev.to/claudeguide/claude-agent-sdk-guide-build-automation-agents-with-tool-use-1odg</link>
      <guid>https://dev.to/claudeguide/claude-agent-sdk-guide-build-automation-agents-with-tool-use-1odg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-sdk-guide?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-sdk-guide" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-sdk-guide&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Claude Agent SDK Guide: Build Automation Agents with Tool Use
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The "Claude Agent SDK" is the Anthropic Python/TypeScript SDK combined with the tool use feature — it's not a separate package. An agent is just a loop: send a message, Claude calls a tool, you execute the tool, send the result back, repeat until done in 2026.&lt;/strong&gt; This guide covers the complete pattern for building reliable production agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Minimal agent with one tool
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Seoul?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# "tool_use" if Claude wants to call a tool
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Agentic Loop
&lt;/h2&gt;

&lt;p&gt;The core pattern: run until &lt;code&gt;stop_reason == "end_turn"&lt;/code&gt;.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
import anthropic
import json

client = anthropic.Anthropic()

def run_agent(tools: list, tool_executor: callable, initial_message: str) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-sdk-guide)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>agents</category>
      <category>python</category>
      <category>typescript</category>
      <category>automation</category>
    </item>
    <item>
      <title>Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 01:30:48 +0000</pubDate>
      <link>https://dev.to/claudeguide/deploying-claude-agents-to-production-flyio-vercel-and-lambda-4khp</link>
      <guid>https://dev.to/claudeguide/deploying-claude-agents-to-production-flyio-vercel-and-lambda-4khp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-production-deploy?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-production-deploy" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-production-deploy&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Claude agent deployments fail for three common reasons: timeouts (LLM calls take 5-30 seconds, most serverless platforms time out at 30s), cold starts (agents with heavy initialization are too slow for serverless), and missing environment variables in production.&lt;/strong&gt; Choosing the right deployment target for your agent type prevents all three. This guide covers the three main deployment patterns with complete configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Target Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent type&lt;/th&gt;
&lt;th&gt;Recommended platform&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Long-running (5+ minutes)&lt;/td&gt;
&lt;td&gt;Fly.io&lt;/td&gt;
&lt;td&gt;No timeout limits, persistent processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API endpoint (&amp;lt; 30s response)&lt;/td&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;td&gt;Zero-config, automatic scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event-driven (webhooks, queues)&lt;/td&gt;
&lt;td&gt;AWS Lambda&lt;/td&gt;
&lt;td&gt;Pay-per-invocation, natural event model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming responses&lt;/td&gt;
&lt;td&gt;Vercel Edge&lt;/td&gt;
&lt;td&gt;Low latency, streaming SSE support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-volume, cost-sensitive&lt;/td&gt;
&lt;td&gt;Fly.io + Redis queue&lt;/td&gt;
&lt;td&gt;Full control, no per-invocation billing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Fly.io: Long-Running Agents
&lt;/h2&gt;

&lt;p&gt;Best for: agents that run for minutes, background processing, agents that need to hold state in memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-agent/
├── Dockerfile
├── fly.toml
├── requirements.txt
└── agent/
    ├── __init__.py
    ├── main.py
    └── tools.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; agent/ ./agent/&lt;/span&gt;

&lt;span class="c"&gt;# Health check endpoint&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  FastAPI agent server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
# agent/main.py
import os
import asyncio
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# In-memory job tracker (use Redis in production for multi-instance)
jobs = {}


class AgentRequest(BaseModel):
    goal: str
    webhook_url: str | None = None


class JobStatus(BaseModel):
    job_id: str
    status: str  # "running" | "done" | "failed"
    result: str | None = None
    error: str | None = None


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/run")
async def run_agent(request: AgentRequest, background_tasks: BackgroundTasks):
    import uuid
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None, "error": None}

    background_tasks.add_task(execute_agent_job, job_id, request.goal, request.webhook_url)
    return {"job_id": job_id}


@app.get("/status/{job_id}")
async def get_status(job_id: str) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-production-deploy)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>deployment</category>
      <category>vercel</category>
      <category>production</category>
      <category>devops</category>
    </item>
    <item>
      <title>Claude Agent SDK + Playwright: Browser Automation Patterns</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 01:30:05 +0000</pubDate>
      <link>https://dev.to/claudeguide/claude-agent-sdk-playwright-browser-automation-patterns-177k</link>
      <guid>https://dev.to/claudeguide/claude-agent-sdk-playwright-browser-automation-patterns-177k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-playwright?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-playwright" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-playwright&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Claude Agent SDK + Playwright: Browser Automation Patterns
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Combining Claude Agent SDK with Playwright gives you browser automation that can reason — not just click a fixed sequence of selectors, but decide what to do next based on what it sees on the page. Claude reads the page, decides which actions to take, calls Playwright tools to execute them, and interprets the results in 2026.&lt;/strong&gt; This guide covers the integration patterns, the tool definitions that make it work reliably, and the error recovery that keeps it production-grade.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Claude acts as the reasoning layer. Playwright acts as the execution layer. You expose Playwright operations as Claude tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User goal → Claude (plan + reason) → Tool call → Playwright (execute) → Claude reads result → next step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key design decision: keep each Playwright tool narrow and reliable. Don't build one &lt;code&gt;do_everything_on_page&lt;/code&gt; tool — build &lt;code&gt;click_element&lt;/code&gt;, &lt;code&gt;fill_input&lt;/code&gt;, &lt;code&gt;get_page_text&lt;/code&gt;, &lt;code&gt;wait_for_element&lt;/code&gt;. Claude decides which combination to call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up the Integration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;anthropic playwright
playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Global browser state
&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_browser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize Playwright browser.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;
    &lt;span class="n"&gt;playwright&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;close_browser&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Defining Playwright Tools for Claude
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PLAYWRIGHT_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;navigate_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Navigate the browser to a URL. Use when you need to load a new page.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The full URL to navigate to, including https://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_page_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the visible text content of the current page. Use to understand what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s on screen before taking action.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_chars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum characters to return (default 5000)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;click_element&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Click an element on the page. Provide a CSS selector or text to find the element.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSS selector OR text content of the element to click. For text: use &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text=Submit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; format.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wait_after_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Milliseconds to wait after clicking (default 500)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fill_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fill a text input or textarea with a value.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSS selector for the input field&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text to type into the field&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clear_first&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boolean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clear existing content before typing (default true)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;take_screenshot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Take a screenshot of the current page state. Use when you need to see the current visual state to decide next action.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File path to save screenshot (e.g., /tmp/screenshot.png)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wait_for_selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wait for an element to appear on the page. Use after navigation or after clicking something that triggers loading.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSS selector to wait for&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum wait time in ms (default 5000)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_element_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the text content of a specific element. More precise than get_page_content when you need a specific value.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSS selector for the element&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract_table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract data from an HTML table as JSON. Use for scraping tabular data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSS selector for the table element&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_rows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum rows to extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Tool Execution Functions
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
async def execute_playwright_tool(tool_name: str, tool_input: dict) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-playwright)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Claude Agent Observability: Logging, Tracing, Debugging Agents</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 01:30:02 +0000</pubDate>
      <link>https://dev.to/claudeguide/claude-agent-observability-logging-tracing-debugging-agents-5c5f</link>
      <guid>https://dev.to/claudeguide/claude-agent-observability-logging-tracing-debugging-agents-5c5f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-observability?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-observability" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-observability&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Claude Agent Observability: Logging, Tracing, and Debugging Production Agents
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Production Claude agents need three observability layers: structured logging of every LLM call with token counts and latency, trace IDs that connect multi-turn conversations to individual requests, and a cost dashboard that shows per-user API spending before your bill arrives in 2026.&lt;/strong&gt; Without these, debugging agent failures is guesswork and cost surprises are inevitable. This guide covers the full observability stack for production Claude agents, from structured logging to cost alerts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agent Observability Is Different
&lt;/h2&gt;

&lt;p&gt;Standard web application observability tracks HTTP requests: status codes, latency, errors. This covers the surface of agent behavior but misses the most important signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What did the agent actually do?&lt;/strong&gt; (tool calls, reasoning steps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why did it give a bad answer?&lt;/strong&gt; (context, instructions, model version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much did each user cost?&lt;/strong&gt; (token usage by conversation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the agent looping?&lt;/strong&gt; (turn count anomalies)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Did prompt caching work?&lt;/strong&gt; (cache hit rate by conversation type)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need purpose-built agent observability on top of standard infrastructure monitoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: Structured Logging
&lt;/h2&gt;

&lt;p&gt;Every API call should emit a structured log event — not a print statement, a JSON record.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python logging setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
import logging
import json
import time
import uuid
from dataclasses import dataclass, asdict
from typing import Any
import anthropic

# Configure structured logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("claude_agent")


@dataclass
class LLMCallEvent:
    event_type: str = "llm_call"
    trace_id: str = ""
    session_id: str = ""
    user_id: str = ""
    model: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    latency_ms: float = 0.0
    stop_reason: str = ""
    tool_calls: list = None
    cost_usd: float = 0.0
    error: str = ""

    def __post_init__(self):
        if self.tool_calls is None:
            self.tool_calls = []


def calculate_cost(model: str, input_tokens: int, output_tokens: int,
                   cache_read_tokens: int = 0) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-observability)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>observability</category>
      <category>logging</category>
      <category>tracing</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Memory and State in Claude Agents: Patterns That Scale</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:31:36 +0000</pubDate>
      <link>https://dev.to/claudeguide/memory-and-state-in-claude-agents-patterns-that-scale-3f1f</link>
      <guid>https://dev.to/claudeguide/memory-and-state-in-claude-agents-patterns-that-scale-3f1f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-memory-patterns?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-memory-patterns" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-memory-patterns&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Memory and State in Claude Agents: Patterns That Scale
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Claude agents don't have persistent memory between API calls — each call starts fresh. Adding memory means deciding what to store, where to store it, and how much to bring back into context on the next call. The four patterns that cover 90% of production needs are: conversation history (in-context), summary compression (compressed context), external memory (vector search), and explicit state (structured data).&lt;/strong&gt; This guide covers when to use each and how to implement them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Problem
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Call 1
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Alex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="c1"&gt;# Claude: "Hi Alex!"
&lt;/span&gt;
&lt;span class="c1"&gt;# Call 2 — Claude has no memory of call 1
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s my name?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="c1"&gt;# Claude: "I don't know your name."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every conversation must carry its own context. The question is how much and in what form.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 1: Full Conversation History (In-Context)
&lt;/h2&gt;

&lt;p&gt;The simplest approach — append every turn to a running messages list.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
import anthropic

client = anthropic.Anthropic()


class ConversationAgent:
    """Agent that maintains full conversation history in context."""

    def __init__(self, system: str, max_turns: int = 50):
        self.system = system
        self.messages = []
        self.max_turns = max_turns

    def chat(self, user_message: str) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-memory-patterns)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>memory</category>
      <category>embeddings</category>
      <category>production</category>
    </item>
    <item>
      <title>How to Handle Errors and Retries in Claude Agent SDK</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:30:52 +0000</pubDate>
      <link>https://dev.to/claudeguide/how-to-handle-errors-and-retries-in-claude-agent-sdk-3phk</link>
      <guid>https://dev.to/claudeguide/how-to-handle-errors-and-retries-in-claude-agent-sdk-3phk</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-error-handling?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-error-handling" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-error-handling&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  How to Handle Errors and Retries in Claude Agent SDK
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Production Claude agents fail in predictable ways — rate limit errors (429), overload errors (529), network timeouts, tool call failures, and infinite loops. Each requires a different recovery strategy, and the difference between a production-grade agent and a fragile prototype is having all five handled correctly.&lt;/strong&gt; This guide covers every error type, the right retry strategy for each, and the circuit breaker pattern that prevents cascading failures.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Error Taxonomy
&lt;/h2&gt;

&lt;p&gt;Claude Agent SDK errors fall into five categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;HTTP Status&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Retry?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit&lt;/td&gt;
&lt;td&gt;429&lt;/td&gt;
&lt;td&gt;Too many requests&lt;/td&gt;
&lt;td&gt;Yes, with backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overloaded&lt;/td&gt;
&lt;td&gt;529&lt;/td&gt;
&lt;td&gt;API server busy&lt;/td&gt;
&lt;td&gt;Yes, with backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth error&lt;/td&gt;
&lt;td&gt;401&lt;/td&gt;
&lt;td&gt;Bad API key&lt;/td&gt;
&lt;td&gt;No — fix the key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Invalid request&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;Bad parameters&lt;/td&gt;
&lt;td&gt;No — fix the code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network failure&lt;/td&gt;
&lt;td&gt;No status&lt;/td&gt;
&lt;td&gt;Connection dropped&lt;/td&gt;
&lt;td&gt;Yes, immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool failure&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Your tool code crashed&lt;/td&gt;
&lt;td&gt;Depends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent loop&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Agent running forever&lt;/td&gt;
&lt;td&gt;Kill after max turns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Base Error Handling Setup
&lt;/h2&gt;

&lt;p&gt;Start with this error handling wrapper before building anything else:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
import anthropic
import time
import random
from typing import Callable, TypeVar

client = anthropic.Anthropic()
T = TypeVar("T")


def with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-error-handling)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>retries</category>
      <category>production</category>
      <category>resilience</category>
    </item>
    <item>
      <title>Claude Agent SDK for Data Pipelines: ETL, Validation, Transform</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:30:50 +0000</pubDate>
      <link>https://dev.to/claudeguide/claude-agent-sdk-for-data-pipelines-etl-validation-transform-3ic</link>
      <guid>https://dev.to/claudeguide/claude-agent-sdk-for-data-pipelines-etl-validation-transform-3ic</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/claude-agent-data-pipeline?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=claude-agent-data-pipeline" rel="noopener noreferrer"&gt;claudeguide.io/claude-agent-data-pipeline&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Claude Agent SDK for Data Pipelines: ETL, Validation, and Transformation Agents
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The Claude Agent SDK fits data pipelines when the logic is too variable for rigid rules: schema drift, inconsistent source formats, validation that requires judgment, and transformation logic that adapts to data shape in 2026.&lt;/strong&gt; This guide builds three pipeline agents: a schema validation agent that explains failures in plain English, an ETL orchestrator that routes records based on content, and a data quality agent that generates and runs its own checks.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Claude Agents Make Sense in Data Pipelines
&lt;/h2&gt;

&lt;p&gt;Use an agent when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source schema changes unpredictably&lt;/strong&gt; — the agent interprets what changed vs what broke&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation requires context&lt;/strong&gt; — "is this address valid?" is different from "does this field match a regex?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformation logic needs judgment&lt;/strong&gt; — merging records with conflicting fields&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need readable failure reports&lt;/strong&gt; — for non-engineers to act on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't use an agent when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema is stable and transforms are deterministic — use dbt, Airflow, pandas&lt;/li&gt;
&lt;li&gt;You need sub-second throughput — LLM calls add 0.5-2s per invocation&lt;/li&gt;
&lt;li&gt;Cost is a concern — at scale, LLM validation per row gets expensive fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sweet spot: &lt;strong&gt;batch validation and orchestration&lt;/strong&gt;, not row-level transformation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agent 1: Schema Validation Agent
&lt;/h2&gt;

&lt;p&gt;Validates incoming data against an expected schema, returns structured failures with plain-English explanations.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
VALIDATION_TOOLS = [
    {
        "name": "validate_field",
        "description": "Validate a single field value against its schema definition",
        "input_schema": {
            "type": "object",
            "properties": {
                "field_name": {"type": "string"},
                "value": {},
                "expected_type": {"type": "string"},
                "constraints": {
                    "type": "object",
                    "description": "e.g., {min: 0, max: 100} or {enum: ['A', 'B']} or {pattern: '...'}"
                }
            },
            "required": ["field_name", "value", "expected_type"]
        }
    },
    {
        "name": "report_validation_result",
        "description": "Report the final validation result for the record",
        "input_schema": {
            "type": "object",
            "properties": {
                "is_valid": {"type": "boolean"},
                "errors": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "field": {"type": "string"},
                            "issue": {"type": "string"},
                            "action": {"type": "string", "description": "Recommended fix"}
                        }
                    }
                },
                "warnings": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["is_valid", "errors"]
        }
    }
]


def execute_validation_tool(tool_name: str, tool_input: dict, record: dict) -

[→ Get the Agent SDK Cookbook — $49](https://shoutfirst.gumroad.com/l/ogxhmy?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude-agent-data-pipeline)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>etl</category>
      <category>validation</category>
      <category>transformation</category>
      <category>python</category>
    </item>
    <item>
      <title>한국에서 AI 부업으로 월 100만원 벌기 (2026 현실)</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:30:06 +0000</pubDate>
      <link>https://dev.to/claudeguide/hangugeseo-ai-bueobeuro-weol-100manweon-beolgi-2026-hyeonsil-3585</link>
      <guid>https://dev.to/claudeguide/hangugeseo-ai-bueobeuro-weol-100manweon-beolgi-2026-hyeonsil-3585</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/ai-side-income-korea-2026?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=ai-side-income-korea-2026" rel="noopener noreferrer"&gt;claudeguide.io/ai-side-income-korea-2026&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  한국에서 AI 부업으로 월 100만원 벌기 (2026 현실)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Claude API와 Claude Code를 활용해 월 100만원 부업 수익을 내는 경로는 세 가지다: (1) AI 디지털 제품(Gumroad/PDF), (2) AI 자동화 외주, (3) 소규모 SaaS.&lt;/strong&gt; 이 중 가장 빠른 경로는 디지털 제품이다 — Claude Code로 콘텐츠를 대량 생산하고, Gumroad로 판매하면 추가 노동 없이 수익이 나온다. 2026년 기준 현실적인 수익 타임라인과 구체적인 실행 방법을 정리한다.&lt;/p&gt;




&lt;h2&gt;
  
  
  왜 지금이 AI 부업의 적기인가
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2026년 AI 부업 환경
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: 개발자 생산성 5-10배 향상 → 혼자서도 SaaS 개발 가능&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;저렴해진 API 비용&lt;/strong&gt;: Claude Haiku 1M 토큰 $1.00 → 책 한 권 작성비 $0.50 이하&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gumroad/Lemon Squeezy&lt;/strong&gt;: 디지털 제품 글로벌 판매 플랫폼 (한국에서도 달러 수취 가능)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI 교육 수요 폭증&lt;/strong&gt;: 국내 AI 학습 시장 2026년 급성장 (Claude, ChatGPT 관련 유료 강좌 품귀)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  현실적인 기대치
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;경로&lt;/th&gt;
&lt;th&gt;시작 난이도&lt;/th&gt;
&lt;th&gt;첫 수익까지&lt;/th&gt;
&lt;th&gt;월 100만원까지&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;디지털 제품 (Gumroad)&lt;/td&gt;
&lt;td&gt;낮음&lt;/td&gt;
&lt;td&gt;2-4주&lt;/td&gt;
&lt;td&gt;3-6개월&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI 자동화 외주&lt;/td&gt;
&lt;td&gt;중간&lt;/td&gt;
&lt;td&gt;즉시 가능&lt;/td&gt;
&lt;td&gt;1-2개월&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI SaaS&lt;/td&gt;
&lt;td&gt;높음&lt;/td&gt;
&lt;td&gt;2-3개월&lt;/td&gt;
&lt;td&gt;6-12개월&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  경로 1: AI 디지털 제품 (가장 빠른 시작)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  무엇을 팔 수 있나
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PDF 가이드/플레이북&lt;/strong&gt;: "Claude Code 프롬프트 300선", "AI 코딩 패턴 모음"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;템플릿/스니펫&lt;/strong&gt;: Notion 템플릿, CLAUDE.md 템플릿 모음&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;코드 레포&lt;/strong&gt;: 재사용 가능한 Claude 에이전트 코드베이스&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;커리큘럼&lt;/strong&gt;: Claude API 입문 강의 자료 (PDF + 코드)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Claude Code로 디지털 제품 만들기
&lt;/h3&gt;

&lt;p&gt;제품 하나 만드는 데 Claude Code를 쓰면:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code에게 지시:
"Claude API 입문자를 위한 완전한 가이드를 작성해줘:
- 총 50-60페이지 분량의 PDF용 Markdown
- 챕터 10개, 각 챕터에 실습 코드 예제 포함
- 비용 절감 팁 10가지
- 실전 프로젝트 3개 (챗봇, 문서 요약기, 이메일 분류기)
한국어로 작성"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;예상 결과&lt;/strong&gt;: 1-2시간 내 초안 완성. Claude API 비용 $2-5 수준.&lt;/p&gt;

&lt;p&gt;이걸 PDF로 변환해 Gumroad에 올리면 하나의 수익 채널이 생긴다.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gumroad 설정 (한국에서)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;계정 생성&lt;/strong&gt;: gumroad.com — 이메일만 있으면 OK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;세금 설정&lt;/strong&gt;: 한국 거주자는 "Individual" 선택, W-8BEN 양식 작성&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;출금&lt;/strong&gt;: Stripe 연동 → USD로 받은 후 원화로 환전&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;가격 전략&lt;/strong&gt;: 영어권 시장 대상 $19-$49 (₩25,000-65,000) — 한국 가격보다 3-5배 높음&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;중요&lt;/strong&gt;: 한국어 제품은 한국 시장(크몽, 탈잉)도 있지만, 영어 제품을 Gumroad에서 팔면 시장이 100배 크다.&lt;/p&gt;

&lt;h3&gt;
  
  
  수익 모델 시뮬레이션
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;제품 가격&lt;/th&gt;
&lt;th&gt;월 판매 수&lt;/th&gt;
&lt;th&gt;월 매출&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;$29&lt;/td&gt;
&lt;td&gt;10개&lt;/td&gt;
&lt;td&gt;$290 (~40만원)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$29&lt;/td&gt;
&lt;td&gt;35개&lt;/td&gt;
&lt;td&gt;$1,015 (~135만원)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$49&lt;/td&gt;
&lt;td&gt;20개&lt;/td&gt;
&lt;td&gt;$980 (~130만원)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;월 35개 판매&lt;/strong&gt; = 하루 1.2개. 이를 달성하려면:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;관련 콘텐츠(블로그/트위터) 지속 발행&lt;/li&gt;
&lt;li&gt;Reddit r/ClaudeAI, r/MachineLearning 참여&lt;/li&gt;
&lt;li&gt;또는 SEO로 유기적 트래픽 확보&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  경로 2: AI 자동화 외주 (가장 빠른 현금)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  한국에서 AI 자동화 수요가 있는 곳
&lt;/h3&gt;

&lt;p&gt;2026년 기준 국내 AI 자동화 외주 수요가 높은 업무:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;콘텐츠 생성 파이프라인&lt;/strong&gt;: 블로그/뉴스 기사 자동 작성 + 검수&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CS 자동응답 봇&lt;/strong&gt;: Claude 기반 고객 문의 응답 자동화&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;데이터 분류/정제&lt;/strong&gt;: Excel 데이터를 Claude로 자동 분류&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;리포트 자동 생성&lt;/strong&gt;: 월간 분석 보고서 자동화&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;코드 마이그레이션&lt;/strong&gt;: 레거시 코드 → 현대적 스택 전환&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  어디서 일감을 구하나
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;국내 플랫폼:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;크몽&lt;/strong&gt; — AI 자동화, 챗봇 구축, 프롬프트 엔지니어링 카테고리 있음&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;탈잉&lt;/strong&gt; — AI 코딩 과외 (시간당 3-5만원)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;원티드/점핏&lt;/strong&gt; — 프리랜서 AI 프로젝트&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;해외 플랫폼:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upwork&lt;/strong&gt; — AI Automation, Claude/LangChain 전문가 시간당 $30-80&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toptal&lt;/strong&gt; — 고급 AI 개발자 (검증 어렵지만 단가 높음)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  실전 외주 프로젝트 예시
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;프로젝트&lt;/strong&gt;: 쇼핑몰 CS 자동응답 봇&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude API + 기존 FAQ 데이터&lt;/li&gt;
&lt;li&gt;예상 제작 시간: 2주&lt;/li&gt;
&lt;li&gt;견적: 150-300만원&lt;/li&gt;
&lt;li&gt;유지보수: 월 30-50만원&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code로 작업 속도를 5배 높이면 실제 노동 시간은 훨씬 적다.&lt;/p&gt;

&lt;h3&gt;
  
  
  클라이언트에게 Claude를 설명하는 방법
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"이 솔루션은 Anthropic의 Claude AI를 사용합니다. 
ChatGPT보다 안전하고 (Constitutional AI), 
API 안정성이 높으며, 기업 데이터 보안 정책을 
준수합니다."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  경로 3: AI SaaS (가장 높은 장기 수익)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  혼자 만들 수 있는 AI SaaS 아이디어
&lt;/h3&gt;

&lt;p&gt;Claude Code가 있으면 1인 개발자도 실용적인 SaaS를 만들 수 있다:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI 이력서 리뷰 서비스&lt;/strong&gt;: PDF 업로드 → Claude 분석 → 개선 피드백&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude API 비용 추적 대시보드&lt;/strong&gt;: 팀의 Claude 사용량과 비용 시각화&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;한국어 맞춤법+AI 문체 교정 도구&lt;/strong&gt;: 블로거/작가 대상&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI 코드 리뷰 봇 (GitHub App)&lt;/strong&gt;: PR 자동 리뷰&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;법률 문서 요약기&lt;/strong&gt;: 계약서, 약관 → 한국어 요약&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  기술 스택 (Claude Code로 빠르게 구축)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next.js 15 + TypeScript
└── Clerk (인증)
└── Neon PostgreSQL + Drizzle (DB)
└── Stripe or Toss Payments (결제)
└── Claude API (AI 기능 핵심)
└── Vercel (배포)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;예상 개발 기간&lt;/strong&gt;: Claude Code 사용 시 4-8주 (혼자 기준).&lt;/p&gt;

&lt;h3&gt;
  
  
  SaaS 가격 설정 (한국 시장)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;티어&lt;/th&gt;
&lt;th&gt;월 가격&lt;/th&gt;
&lt;th&gt;주요 기능&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;무료&lt;/td&gt;
&lt;td&gt;월 10회 사용&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Starter&lt;/td&gt;
&lt;td&gt;₩9,900&lt;/td&gt;
&lt;td&gt;월 100회&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;₩29,000&lt;/td&gt;
&lt;td&gt;무제한 + API 직접 연동&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;월 100명 Pro 구독자 = ₩2,900,000/월.&lt;/p&gt;




&lt;h2&gt;
  
  
  실전 로드맵: 0원 → 월 100만원
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1-2주차: 빠른 첫 수익
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Claude Code로 PDF 가이드 1개 제작 (Claude API 또는 Claude Code 관련)&lt;/li&gt;
&lt;li&gt;Gumroad 계정 생성 + 제품 등록&lt;/li&gt;
&lt;li&gt;Reddit r/ClaudeAI에 관련 도움 글 3-5개 게시 (제품 소개 없이 진짜 가치 제공)&lt;/li&gt;
&lt;li&gt;글 하단에 "더 자세한 내용이 필요하면 DM"&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3-4주차: 채널 확장
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Twitter/X에 Claude 팁 스레드 3개 (실용적인 내용)&lt;/li&gt;
&lt;li&gt;제품 홍보 → 첫 판매 목표&lt;/li&gt;
&lt;li&gt;판매된 제품 피드백 수집 → 개선&lt;/li&gt;
&lt;li&gt;두 번째 제품 기획&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2-3개월차: 수익 궤도
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;콘텐츠 제작 속도 올리기 (주 2-3개 블로그 글)&lt;/li&gt;
&lt;li&gt;SEO 최적화 (claudeguide.io처럼 전문 허브 구축)&lt;/li&gt;
&lt;li&gt;또는 외주 1-2건으로 100만원 달성&lt;/li&gt;
&lt;li&gt;SaaS 개발 시작 (장기 투자)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  실제로 얼마를 기대할 수 있나
&lt;/h2&gt;

&lt;h3&gt;
  
  
  현실적인 3개월 시나리오
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;낙관적 (운도 좋고 실행도 빠름):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1개월: 10-30만원 (첫 판매들)&lt;/li&gt;
&lt;li&gt;2개월: 50-80만원&lt;/li&gt;
&lt;li&gt;3개월: 100-150만원&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;현실적 (평범한 실행 속도):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1개월: 0-10만원 (셋업 및 첫 판매)&lt;/li&gt;
&lt;li&gt;2개월: 20-50만원&lt;/li&gt;
&lt;li&gt;3개월: 50-100만원&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;중요한 점&lt;/strong&gt;: 디지털 제품은 한번 만들면 계속 팔린다. 첫 달이 가장 힘들고, 이후 복리로 성장한다.&lt;/p&gt;

&lt;h3&gt;
  
  
  실패하는 흔한 이유
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;제품을 너무 오래 만든다&lt;/strong&gt;: 완벽하게 만들려다 출시 못 함. 가이드 10페이지짜리도 충분&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;홍보를 안 한다&lt;/strong&gt;: 만들기만 하고 커뮤니티에 알리지 않음&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;니치가 너무 넓다&lt;/strong&gt;: "AI 가이드"보다 "Claude Code 한국어 가이드"가 낫다&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;포기가 빠르다&lt;/strong&gt;: 3개월은 봐야 실제 결과가 나옴&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  한국에서의 세금 및 법적 고려사항
&lt;/h2&gt;

&lt;h3&gt;
  
  
  해외 수익 신고
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gumroad/Upwork 수익은 종합소득세 신고 대상&lt;/li&gt;
&lt;li&gt;연간 수익이 일정 수준을 넘으면 사업자 등록 고려&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;주의&lt;/strong&gt;: 부업 수익은 회사 취업규칙에 따라 제한될 수 있음 → 미리 확인&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  사업자 등록 (수익이 본격화될 때)
&lt;/h3&gt;

&lt;p&gt;개인사업자 등록 시:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;홈택스에서 사업자등록증 발급 (무료, 온라인)&lt;/li&gt;
&lt;li&gt;업종: 소프트웨어 개발 (코드 743902) 또는 정보서비스업&lt;/li&gt;
&lt;li&gt;Gumroad 등 해외 플랫폼 수익은 외화 수입으로 신고&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  자주 묻는 질문
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: 비개발자도 AI 부업이 가능한가요?&lt;/strong&gt;&lt;br&gt;
가능하다. Claude를 이용한 콘텐츠 제작(글쓰기, 번역, 교육 자료)은 코딩 없이도 할 수 있다. 다만 API를 활용하면 자동화 레버리지가 훨씬 크다.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: 초기 투자 비용이 얼마나 드나요?&lt;/strong&gt;&lt;br&gt;
Claude API: 월 $5-20 수준 (처음엔 매우 적음). Gumroad는 무료. 도메인/호스팅(Vercel): 월 $0-20. 총 초기 비용 $50 미만.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: 한국어 제품과 영어 제품 중 어느 것이 나을까요?&lt;/strong&gt;&lt;br&gt;
시장 크기는 영어 시장이 10-100배 크다. 경쟁도 더 치열하다. 한국어 시장은 경쟁이 적고 진입이 쉽다. 처음엔 한국어로 빠르게 시작해 자신감을 얻고, 영어 시장으로 확장하는 전략을 추천한다.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Claude 외에 다른 AI로도 할 수 있나요?&lt;/strong&gt;&lt;br&gt;
할 수 있다. GPT-4o나 Gemini도 동일한 방식으로 사용 가능하다. 다만 코딩 기반 에이전트 작업에는 Claude가 현재 가장 우수하다.&lt;/p&gt;




&lt;h2&gt;
  
  
  관련 가이드
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-code-korean-guide"&gt;Claude Code 완벽 가이드 (한국어)&lt;/a&gt; — 설치부터 고급 활용까지&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-vs-chatgpt-gemini-korean"&gt;Claude vs ChatGPT vs Gemini 한국어 비교&lt;/a&gt; — 어떤 AI를 써야 할지&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-api-cost-optimization-guide"&gt;Claude API 비용 최적화&lt;/a&gt; — 부업 비용 최소화 전략&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  부업 수익화 가이드
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shoutfirst.gumroad.com/l/sujwg?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-side-income-korea-2026" rel="noopener noreferrer"&gt;Solo AI Builder Stack — $19&lt;/a&gt;&lt;/strong&gt; — 1인 AI 개발자가 실제로 쓰는 도구 스택, 수익화 템플릿, Gumroad 설정부터 첫 판매까지.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shoutfirst.gumroad.com/l/sujwg?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-side-income-korea-2026" rel="noopener noreferrer"&gt;→ Solo AI Builder Stack 구매 — $19&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;30일 환불 보장. 즉시 다운로드.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gumroad</category>
      <category>saas</category>
    </item>
    <item>
      <title>AI Code Review vs Human Review: Strengths &amp; Weaknesses</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:30:03 +0000</pubDate>
      <link>https://dev.to/claudeguide/ai-code-review-vs-human-review-strengths-weaknesses-43nh</link>
      <guid>https://dev.to/claudeguide/ai-code-review-vs-human-review-strengths-weaknesses-43nh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/ai-code-review-vs-human?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=ai-code-review-vs-human" rel="noopener noreferrer"&gt;claudeguide.io/ai-code-review-vs-human&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  AI Code Review vs Human Review: What AI Does Better (and Where It Fails)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;AI code review in 2026 catches security vulnerabilities, consistency violations, and missing error handling faster and more thoroughly than most human reviewers — but it fails at evaluating business logic correctness, system-level architectural decisions, and the social dynamics of team code review.&lt;/strong&gt; The best teams use both: AI for the exhaustive mechanical checks, humans for judgment calls. This guide maps exactly what each does better, with concrete examples.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Asymmetry
&lt;/h2&gt;

&lt;p&gt;AI and human code reviewers are not competing at the same task. They excel at different things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI code review&lt;/strong&gt; is fast, tireless, and pattern-matching at scale. It checks every line against known anti-patterns, security rules, and style conventions without getting bored or distracted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human code review&lt;/strong&gt; is contextual, social, and business-aware. It catches "this feature wasn't what the product spec said" or "this approach will cause a painful migration in 6 months."&lt;/p&gt;

&lt;p&gt;Treating them as substitutes misses the point. The question is: which tool for which job?&lt;/p&gt;




&lt;h2&gt;
  
  
  What AI Does Better
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Security vulnerabilities — systematically
&lt;/h3&gt;

&lt;p&gt;AI reviewers check every input path for injection risks, every auth call for missing validation, and every DB query for multi-tenant safety — on every PR, without exception.&lt;/p&gt;

&lt;p&gt;Humans reviewing under time pressure often skim security-relevant code. AI doesn't skim.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# AI flags this immediately:
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM documents WHERE id = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Missing: WHERE user_id = '{user_id}' (authorization bypass)
&lt;/span&gt;    &lt;span class="c1"&gt;# SQL injection via f-string interpolation
&lt;/span&gt;
&lt;span class="c1"&gt;# Prompt that catches this:
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this function for: SQL injection, missing authorization checks,
 and missing input validation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Missing error handling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AI flags: no error boundary, unhandled promise rejection&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Throws if response is not JSON&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// No check for response.ok&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Humans often approve code with missing error handling when the happy path looks correct. AI checks the unhappy path systematically.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Convention violations
&lt;/h3&gt;

&lt;p&gt;If your codebase has established patterns — cursor pagination, specific error types, mandatory field names — AI checks every new contribution against them. Humans remember these rules inconsistently, especially on large teams.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Claude Code review prompt:&lt;/span&gt;
"Review this PR against our conventions in CLAUDE.md:
&lt;span class="p"&gt;-&lt;/span&gt; Does every DB query include organizationId filtering?
&lt;span class="p"&gt;-&lt;/span&gt; Are all monetary values stored as integers (cents)?  
&lt;span class="p"&gt;-&lt;/span&gt; Does error handling use our AppError class?
&lt;span class="p"&gt;-&lt;/span&gt; Are there any console.log statements?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Exhaustive test coverage gaps
&lt;/h3&gt;

&lt;p&gt;AI can enumerate the test cases that should exist and flag which are missing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"List every edge case that should be tested for this function,
then check which ones are covered by the existing tests."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A human reviewer might catch 60-70% of missing test cases. AI catches more.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Documentation accuracy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Check if the JSDoc comments for these functions accurately describe
what the functions actually do. Flag any where the documentation
is misleading or incomplete."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Documentation drift is almost never caught in human reviews because reviewers trust the docs and look at the code separately.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Humans Do Better
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Business logic correctness
&lt;/h3&gt;

&lt;p&gt;AI cannot verify "does this implementation match what the product spec or customer actually needs?" without explicit business context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A function that calculates discounts is technically correct TypeScript but applies the wrong business rule (20% vs the agreed 15% for annual plans). AI won't catch this without seeing the spec. A human who was in the product meeting will.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Architectural foresight
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;AI&lt;/span&gt; &lt;span class="nx"&gt;approves&lt;/span&gt; &lt;span class="k"&gt;this &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="nx"&gt;works&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CreateUserDTO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;updateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;UpdateUserDTO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getUserPermissions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getUserAuditLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getUserBillingStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// 20 more methods...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;Human&lt;/span&gt; &lt;span class="nx"&gt;sees&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;growing&lt;/span&gt; &lt;span class="nx"&gt;into&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;God&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;will&lt;/span&gt; 
&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;cause&lt;/span&gt; &lt;span class="nx"&gt;maintenance&lt;/span&gt; &lt;span class="nx"&gt;problems&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="nx"&gt;needs&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;be&lt;/span&gt; &lt;span class="nx"&gt;split&lt;/span&gt; &lt;span class="nx"&gt;by&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nx"&gt;before&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="nx"&gt;gets&lt;/span&gt; &lt;span class="nx"&gt;worse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Team knowledge transfer
&lt;/h3&gt;

&lt;p&gt;Human code review is where knowledge spreads. A senior developer's review comment — "here's why we do it this way" — teaches the codebase's implicit knowledge to the reviewer's colleagues. AI review doesn't have this social function.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Organizational context
&lt;/h3&gt;

&lt;p&gt;"This approach works, but it's similar to what we tried in Q3 and had to roll back because of how it interacted with the billing system." AI has no memory of your team's history.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Judgment calls on trade-offs
&lt;/h3&gt;

&lt;p&gt;"Is the added complexity of this optimization worth it for our current scale?" requires judgment about your specific system, team capabilities, and product roadmap. AI can present trade-offs but shouldn't make the call.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Combined Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tier 1: AI pre-review (before human review)
&lt;/h3&gt;

&lt;p&gt;Run AI review before requesting human review. This filters mechanical issues so human reviewers spend their time on judgment calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In Claude Code:&lt;/span&gt;
git diff main..HEAD | claude &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="s2"&gt;"
Review this diff for:
1. Security issues (injection, auth bypasses, missing validation)
2. Missing error handling (unhandled promises, no catch blocks)
3. Convention violations (check CLAUDE.md for our patterns)
4. Missing test coverage (what edge cases aren't tested?)
5. Documentation gaps

Format: numbered list, file:line for each issue, severity HIGH/MED/LOW.
Do NOT comment on style or formatting — that's handled by our linter.
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix all HIGH and MED issues before requesting human review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 2: Human review focuses on judgment
&lt;/h3&gt;

&lt;p&gt;The human reviewer, having been spared the mechanical issues, focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this match the product spec?&lt;/li&gt;
&lt;li&gt;Are the architectural choices right for our system?&lt;/li&gt;
&lt;li&gt;Knowledge transfer: is this understandable to a new team member?&lt;/li&gt;
&lt;li&gt;Business logic: is this what we actually agreed to build?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tier 3: PR template that incorporates both
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## PR Checklist&lt;/span&gt;

&lt;span class="gu"&gt;### AI Review&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Ran Claude Code review — all HIGH/MED issues resolved
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Security check passed (no injection, auth, validation issues)
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Error handling complete

&lt;span class="gu"&gt;### Human Review Needed For&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Business logic correctness (matches spec?)
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Architectural fit (consistent with system design?)
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Knowledge transfer (clear to team members?)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Effective AI Code Review Prompts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Security-focused review
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this code for security issues only.
Check: SQL/NoSQL injection, authentication bypass, authorization bypass,
input validation gaps, sensitive data exposure, hardcoded credentials.
For each issue: file and line, severity (CRITICAL/HIGH/MED), explanation,
and the correct fix.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Convention compliance review
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this diff against our project conventions:
[paste CLAUDE.md content]

List every violation with file:line and the rule it violates.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test coverage review
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For each function in this diff:
1. List the edge cases that should be tested
2. Check if those tests exist
3. Flag any missing tests as HIGH/MED/LOW priority
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance review
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review for performance issues:
- N+1 query patterns
- Missing database indexes (look for WHERE clause fields)
- Synchronous operations that should be async
- Unnecessary re-renders (React components)
- Memory leaks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can AI code review replace human code review?&lt;/strong&gt;&lt;br&gt;
Not entirely. AI code review excels at mechanical checks — security, conventions, error handling — but cannot evaluate business logic correctness, architectural appropriateness for your specific system, or perform the team knowledge-transfer function of human review. The best workflow uses both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How accurate is AI code review at finding security vulnerabilities?&lt;/strong&gt;&lt;br&gt;
For known vulnerability patterns (SQL injection, missing auth checks, insecure direct object references), AI review is highly accurate and often catches more than human reviewers who are reviewing under time pressure. For novel, context-dependent vulnerabilities, human security review remains necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does AI code review slow down the PR process?&lt;/strong&gt;&lt;br&gt;
No — AI pre-review actually speeds up the human review step. Human reviewers spend less time on mechanical issues and more time on the judgment calls they're uniquely equipped to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the best AI tool for code review in 2026?&lt;/strong&gt;&lt;br&gt;
Claude Code is the most capable for whole-diff review with project context. GitHub Copilot has PR summary features. For automated CI integration, tools like CodeRabbit or Qodo use AI APIs to post review comments automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should AI review comments be blocking or advisory?&lt;/strong&gt;&lt;br&gt;
HIGH severity (security, auth bypass) should be blocking — fix before merge. MED severity (missing error handling, convention violations) should be blocking. LOW severity (documentation gaps, minor optimizations) should be advisory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-code-team-setup"&gt;Claude Code for Teams: Best Practices&lt;/a&gt; — Team review workflows&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/context-engineering-claude"&gt;Context Engineering for Claude&lt;/a&gt; — Loading codebase context for review&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-code-complete-guide"&gt;Claude Code Complete Guide&lt;/a&gt; — Full Claude Code reference&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Go Deeper
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shoutfirst.gumroad.com/l/agfda?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-code-review-vs-human" rel="noopener noreferrer"&gt;Power Prompts 300 — $29&lt;/a&gt;&lt;/strong&gt; — Includes 30+ code review prompt templates: security audit, convention compliance, performance review, and test coverage review — each tuned for Claude Code's project-level context understanding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shoutfirst.gumroad.com/l/agfda?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-code-review-vs-human" rel="noopener noreferrer"&gt;→ Get Power Prompts 300 — $29&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;30-day money-back guarantee. Instant download.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI Pair Programming in 2026: The New Rules</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Fri, 12 Jun 2026 01:31:27 +0000</pubDate>
      <link>https://dev.to/claudeguide/ai-pair-programming-in-2026-the-new-rules-lgl</link>
      <guid>https://dev.to/claudeguide/ai-pair-programming-in-2026-the-new-rules-lgl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/ai-pair-programming-2026?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=ai-pair-programming-2026" rel="noopener noreferrer"&gt;claudeguide.io/ai-pair-programming-2026&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  AI Pair Programming in 2026: The New Rules
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;AI pair programming in 2026 is not traditional pair programming with a robot — it's a fundamentally different collaboration model where the developer acts as the architect and product owner while the AI acts as the implementer, with the developer reviewing and directing rather than typing.&lt;/strong&gt; Understanding this asymmetry is what separates developers who get 3-5x productivity gains from those who get frustrated and go back to coding alone. This guide covers the mental model, the new habits, and the specific patterns that make the collaboration work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old Mental Model (Wrong)
&lt;/h2&gt;

&lt;p&gt;Traditional pair programming: two developers, one keyboard. The driver types; the navigator reviews, thinks ahead, spots bugs. They switch roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The wrong way to think about AI pair programming&lt;/strong&gt;: AI is the co-pilot who helps me type faster.&lt;/p&gt;

&lt;p&gt;This leads to using AI for autocomplete, getting frustrated when it generates imperfect code, and spending more time fixing AI output than writing code yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Correct Mental Model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You are the product owner and architect. Claude Code is the senior engineer who implements your specifications.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You decide WHAT to build (features, behavior, architecture)&lt;/li&gt;
&lt;li&gt;Claude Code decides HOW to implement it (code structure, patterns, syntax)&lt;/li&gt;
&lt;li&gt;You review the output and redirect when needed&lt;/li&gt;
&lt;li&gt;You own the product; Claude owns the implementation details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shift changes how you interact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Wrong (driver-navigator model):
You type → AI suggests next line → you accept or reject
Result: marginal speedup, constant cognitive load

# Correct (architect-engineer model):
You specify feature → AI implements full feature → you review + approve/redirect
Result: 5-10x leverage on implementation work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Five New Rules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Rule 1: Specify, don't type
&lt;/h3&gt;

&lt;p&gt;Your job is to write precise specifications, not code. The quality of your spec determines the quality of the output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Low-spec (you'll spend 30 min fixing):
"Add search to the app"

# High-spec (gets it right in 1-2 iterations):
"Add full-text search to the /projects page.
- Search field: top of the list, debounced 300ms
- Searches: project name and description fields
- Results: filter the existing project cards in real-time (client-side, we have &amp;lt;500 projects)
- Empty state: 'No projects match your search' with clear button
- URL: update ?search= query param so searches are shareable
- Mobile: search field collapses to icon on &amp;lt;640px width"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second spec takes 2 minutes to write and 1 Claude iteration to implement correctly. The first takes 30+ minutes of back-and-forth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 2: Review for correctness, not style
&lt;/h3&gt;

&lt;p&gt;When reviewing AI-generated code, focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it do what the spec said?&lt;/li&gt;
&lt;li&gt;Are there security issues (auth, injection, validation)?&lt;/li&gt;
&lt;li&gt;Is error handling complete?&lt;/li&gt;
&lt;li&gt;Does it follow CLAUDE.md conventions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do NOT review for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether you would have written it differently&lt;/li&gt;
&lt;li&gt;Style preferences (unless it violates conventions)&lt;/li&gt;
&lt;li&gt;Variable naming choices that don't affect behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optimizing for personal style in AI-generated code is the biggest time waste in AI pair programming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 3: Redirect with context, not corrections
&lt;/h3&gt;

&lt;p&gt;When the output is wrong, don't manually fix the code — tell Claude what's wrong and why.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Inefficient: manually edit the generated code
# Efficient: redirect with context

"This implementation uses offset-based pagination.
Our DB has 2M rows — offset pagination is O(n) and will cause timeouts at scale.
Use cursor-based pagination instead. See lib/db/pagination.ts for our pattern."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Manual editing breaks the collaboration loop and means you're now the implementer again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 4: CLAUDE.md is your co-pilot config
&lt;/h3&gt;

&lt;p&gt;Every 30 minutes you spend maintaining CLAUDE.md saves hours of future redirects. CLAUDE.md is how you train your co-pilot on your codebase.&lt;/p&gt;

&lt;p&gt;When Claude makes a mistake for the second time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Note the pattern&lt;/li&gt;
&lt;li&gt;Add it to CLAUDE.md as an anti-pattern&lt;/li&gt;
&lt;li&gt;Never explain it manually again
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Anti-pattern added after second occurrence:&lt;/span&gt;
&lt;span class="gu"&gt;## Never Do This&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use Array.find() for DB lookups — use DB WHERE clauses
  (Array.find scans the client array, not the indexed DB)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rule 5: Match task size to the model
&lt;/h3&gt;

&lt;p&gt;Not all tasks benefit from full Sonnet capability. Route intelligently:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task type&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Writing a complex feature&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Needs context + reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formatting / simple refactor&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Mechanical, fast, cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writing tests for existing code&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Template-following, not creative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging a complex bug&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Needs analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generating documentation&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Templated output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architectural review&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Judgment call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Daily Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Morning: Load context
&lt;/h3&gt;

&lt;p&gt;Start each session by letting Claude re-orient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read CLAUDE.md and the last 5 git commits.
Tell me: what was being worked on, what's the current state,
and what would be the most logical next task?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This saves 15-20 minutes of re-explaining context.&lt;/p&gt;

&lt;h3&gt;
  
  
  During development: The spec-implement-review loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Write the spec (2-5 minutes)
2. Claude implements (2-5 minutes)
3. You review for correctness (2-3 minutes)
4. Redirect if needed (1-2 minutes) → back to step 2
5. Accept when correct → move to next spec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each cycle is 6-15 minutes for a complete feature unit. A 2-hour session completes 8-20 units.&lt;/p&gt;

&lt;h3&gt;
  
  
  End of session: Update CLAUDE.md
&lt;/h3&gt;

&lt;p&gt;Before closing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Based on our session today, what new patterns or anti-patterns should be added to CLAUDE.md?
List them and I'll decide which to add."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps your co-pilot continuously improving.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Take the Keyboard Back
&lt;/h2&gt;

&lt;p&gt;AI pair programming has a ceiling. These situations call for you to write code directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Domain logic you can't fully specify&lt;/strong&gt;&lt;br&gt;
If you can't write a clear spec, you don't understand the requirement well enough. Think it through first, then spec it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Performance-critical inner loops&lt;/strong&gt;&lt;br&gt;
When micro-optimizations matter (hot paths, real-time rendering), writing by hand with profiler feedback is faster than iterating with AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Exploratory prototyping&lt;/strong&gt;&lt;br&gt;
When you're not sure what you want yet, typing exploratory code is how you discover requirements. AI implementations of unclear specs create technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Debugging subtle concurrency issues&lt;/strong&gt;&lt;br&gt;
Race conditions, deadlocks, and timing-dependent bugs often require single-stepping through execution mentally. AI can help identify candidates but the human brain is better at the actual trace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Habits That Kill AI Pair Productivity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Habit: Accepting without reviewing
&lt;/h3&gt;

&lt;p&gt;Accepting every AI suggestion without review builds bugs into production. AI is confident and wrong more than it lets on. Always review for security and correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Habit: Writing vague specs
&lt;/h3&gt;

&lt;p&gt;"Make this better" produces unpredictable results. Every hour spent writing precise specs saves multiple hours of redirect loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Habit: Fixing instead of redirecting
&lt;/h3&gt;

&lt;p&gt;When you manually fix AI-generated code, you're doing the implementation work yourself. Only do this for trivial one-line fixes; otherwise redirect so Claude learns the correct approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Habit: Ignoring CLAUDE.md
&lt;/h3&gt;

&lt;p&gt;Developers who skip CLAUDE.md setup spend 40-60% more tokens on corrections and redirects. The ROI on CLAUDE.md maintenance is extremely high.&lt;/p&gt;

&lt;h3&gt;
  
  
  Habit: Context switching too frequently
&lt;/h3&gt;

&lt;p&gt;"One more thing while you're here" accumulates context and leads to longer sessions with degraded output. Finish the current task cleanly before starting a new one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is AI pair programming?&lt;/strong&gt;&lt;br&gt;
AI pair programming is a development workflow where a developer acts as architect and product owner, writing specifications and reviewing output, while an AI coding assistant like Claude Code implements the code. Unlike traditional pair programming, the AI handles implementation details while the human maintains strategic control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is AI pair programming faster than coding alone?&lt;/strong&gt;&lt;br&gt;
For implementation-heavy work — writing features, boilerplate, tests, migrations — AI pair programming is typically 3-5x faster than solo development once the workflow is established. For exploratory work, debugging subtle bugs, or highly domain-specific logic, the advantage is smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the biggest mistake developers make with AI pair programming?&lt;/strong&gt;&lt;br&gt;
Using AI as an autocomplete tool rather than a specification-to-implementation engine. Developers who specify features clearly and review output for correctness — rather than trying to steer line-by-line suggestions — get far better results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does it take to get productive with AI pair programming?&lt;/strong&gt;&lt;br&gt;
Most developers find the workflow intuitive within 1-2 weeks of daily use. The main adjustment is learning to write precise feature specifications rather than typing code directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does AI pair programming work for all types of development tasks?&lt;/strong&gt;&lt;br&gt;
No. It's most effective for implementation work where requirements are clear: writing API endpoints, UI components, DB queries, tests, and migrations. It's less effective for exploratory prototyping, novel algorithmic work, and debugging deeply context-dependent bugs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/claude-code-complete-guide"&gt;Claude Code Complete Guide&lt;/a&gt; — Full Claude Code feature reference&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/context-engineering-claude"&gt;Context Engineering for Claude&lt;/a&gt; — CLAUDE.md design for better collaboration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/copilot-to-claude-migration-2026"&gt;Why Dev Teams Are Moving From Copilot to Claude Code&lt;/a&gt; — Tool comparison&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Go Deeper
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://shoutfirst.gumroad.com/l/agfda?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-pair-programming-2026" rel="noopener noreferrer"&gt;Power Prompts 300 — $29&lt;/a&gt;&lt;/strong&gt; — 300 specification templates for AI pair programming: feature specs, refactoring instructions, debugging prompts, and review checklists — each written to get consistent, production-ready output in the first or second iteration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shoutfirst.gumroad.com/l/agfda?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-pair-programming-2026" rel="noopener noreferrer"&gt;→ Get Power Prompts 300 — $29&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;30-day money-back guarantee. Instant download.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>workflow</category>
      <category>2026</category>
    </item>
    <item>
      <title>Building an AI-First Startup in 2026: What Actually Works</title>
      <dc:creator>Sangmin Lee</dc:creator>
      <pubDate>Fri, 12 Jun 2026 01:31:24 +0000</pubDate>
      <link>https://dev.to/claudeguide/building-an-ai-first-startup-in-2026-what-actually-works-1oj2</link>
      <guid>https://dev.to/claudeguide/building-an-ai-first-startup-in-2026-what-actually-works-1oj2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://claudeguide.io/ai-first-startup-2026?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=ai-first-startup-2026" rel="noopener noreferrer"&gt;claudeguide.io/ai-first-startup-2026&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Building an AI-First Startup in 2026: What Actually Works
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;In 2026, the most successful AI-first startups are not the ones with the most sophisticated AI — they're the ones that found a narrow use case, made the AI output actually reliable for that use case, and built a product layer that justifies the LLM costs.&lt;/strong&gt; The failure mode is building a general-purpose AI tool in a crowded market. The success pattern is building a specialized tool with AI as one component, targeted at users who have a specific workflow pain point. This guide covers what works, what doesn't, and how to build it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Market in 2026
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's overcrowded
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;General writing assistants (buried under ChatGPT/Claude)&lt;/li&gt;
&lt;li&gt;Generic code review tools (GitHub Copilot covers this)&lt;/li&gt;
&lt;li&gt;"Chat with your documents" — too many competitors&lt;/li&gt;
&lt;li&gt;AI image generation wrappers — commoditized&lt;/li&gt;
&lt;li&gt;Generic customer service chatbots — Intercom, Zendesk already have AI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What's working
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertical-specific AI workflows&lt;/strong&gt;: AI for legal document review, medical billing codes, construction estimates, patent analysis — narrow and deep beats broad and shallow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow automation&lt;/strong&gt;: replacing specific human repetitive tasks in specific industries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI + proprietary data&lt;/strong&gt;: your training data or integration is the moat, not the AI itself&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Micro-SaaS with AI features&lt;/strong&gt;: traditional SaaS problems solved better with AI (e.g., AI-powered analytics, AI-assisted onboarding)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API-first tools&lt;/strong&gt;: developers building other AI products need infrastructure&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Solo Founder Equation Changed
&lt;/h2&gt;

&lt;p&gt;Before 2023, a solo founder building a SaaS was limited by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineering capacity (one person can only write so much code)&lt;/li&gt;
&lt;li&gt;Time-to-MVP (3-6 months minimum)&lt;/li&gt;
&lt;li&gt;Cost to maintain (infrastructure, customer support, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In 2026 with Claude Code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engineering velocity&lt;/strong&gt;: 3-5x faster implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-MVP&lt;/strong&gt;: 1-4 weeks for a working product&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support&lt;/strong&gt;: AI-assisted customer support drafts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt;: AI-generated documentation, onboarding copy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shifts the bottleneck from engineering to product discovery and distribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication&lt;/strong&gt;: In 2026, a solo founder can reasonably build and maintain what previously required a 3-5 person team. The limiting factor is finding a real problem and getting in front of people who have it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Product Patterns That Monetize
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: AI-Powered Report Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: User inputs data or connects a data source; AI generates a formatted, professional report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: Financial analysis reports, SEO audit reports, competitor research reports, property valuations, code review reports.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: The output has clear professional value. Users pay per report or subscribe for ongoing reports. Cost structure is manageable — one report = one API call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unit economics check&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per report: $0.01-0.10 (Sonnet, 2k-20k tokens)&lt;/li&gt;
&lt;li&gt;Value to user: $10-100+ (time saved vs writing manually)&lt;/li&gt;
&lt;li&gt;Charge: $1-5/report or $20-50/month subscription&lt;/li&gt;
&lt;li&gt;Margin: 20-50x on variable costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pattern 2: Specialized Data Extraction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: AI extracts structured data from unstructured documents — contracts, invoices, emails, PDFs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: Invoice parser, contract clause extractor, email CRM auto-fill, medical record coding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Businesses have lots of documents. Manual extraction is expensive and error-prone. AI extraction is accurate enough for most fields. Clear per-document pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unit economics check&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per document: $0.005-0.05&lt;/li&gt;
&lt;li&gt;Value: replacing 5-30 minutes of human work&lt;/li&gt;
&lt;li&gt;Charge: $0.50-5 per document&lt;/li&gt;
&lt;li&gt;Margin: 10-100x&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pattern 3: AI-Accelerated Workflow Tool
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: An existing workflow tool where AI eliminates the hard, slow steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: Job description writer, RFP response generator, grant application assistant, real estate listing writer, social media calendar generator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Users already have the workflow; you're making a specific pain point in it dramatically faster. Clear before/after value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4: API / Developer Tool
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: Infrastructure other developers use to build AI features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: Prompt management, AI evaluation frameworks, specialized embedding generation, domain-specific fine-tuned models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Developer tools have high willingness to pay. One customer = many end users. Technical founders build what they'd want to use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost Structure
&lt;/h2&gt;

&lt;p&gt;LLM costs are real and affect pricing strategy. Model before building:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Variables:
- API calls per user per day
- Average tokens per call (input + output)
- Model tier (Haiku/Sonnet/Opus)
- Cache hit rate (with prompt caching)

Example: Invoice processing SaaS
- 50 invoices/day per customer
- 1,500 input tokens + 300 output tokens per invoice
- Using Sonnet ($3/M input, $15/M output)
- No caching (each invoice different)

Cost per customer per day:
  Input: 50 × 1,500 × $3/M = $0.225
  Output: 50 × 300 × $15/M = $0.225
  Total: $0.45/customer/day = $13.50/customer/month

If you charge $49/month:
  Gross margin: ($49 - $13.50) / $49 = 72%
  (Plus infrastructure, support, etc. — real margin ~50-60%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight&lt;/strong&gt;: Model routing matters. If 60% of your calls can use Haiku instead of Sonnet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;60% on Haiku ($1.00/M input):
  Input cost: 50 × 1,500 × 0.6 × $1.00/M = $0.036
  vs Sonnet for same calls: $0.135
  Saving: $0.099/day = $3/customer/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 100 customers, that's $300/month saved just from model routing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Reliability Problem
&lt;/h2&gt;

&lt;p&gt;The single biggest technical challenge for AI-first products: AI output is probabilistic, not deterministic. Users expect consistent, reliable results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategies for reliability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Constrained output&lt;/strong&gt;: Use structured outputs / JSON mode to enforce consistent format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Force structured output
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You always respond in valid JSON matching this schema: {...}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output validation layer&lt;/strong&gt;: Validate AI output before showing to users.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
def process_invoice(invoice_text: str) -

[→ Get the Solo AI Builder Stack — $19](https://shoutfirst.gumroad.com/l/sujwg?utm_source=claudeguide&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-first-startup-2026)

*30-day money-back guarantee. Instant download.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
    </item>
  </channel>
</rss>
