<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Karthik Avinash</title>
    <description>The latest articles on DEV Community by Karthik Avinash (@karthikavinash).</description>
    <link>https://dev.to/karthikavinash</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2242024%2F1265ec00-1375-4e14-b958-fcd753314a8b.png</url>
      <title>DEV Community: Karthik Avinash</title>
      <link>https://dev.to/karthikavinash</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/karthikavinash"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Karthik Avinash</dc:creator>
      <pubDate>Thu, 13 Nov 2025 05:33:24 +0000</pubDate>
      <link>https://dev.to/karthikavinash/-47pe</link>
      <guid>https://dev.to/karthikavinash/-47pe</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/karthikavinash" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2242024%2F1265ec00-1375-4e14-b958-fcd753314a8b.png" alt="karthikavinash"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/karthikavinash/free-ebook-how-we-built-a-practical-framework-for-evaluating-ai-agents-in-production-22pf" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;[Free eBook] How We Built a Practical Framework for Evaluating AI Agents in Production&lt;/h2&gt;
      &lt;h3&gt;Karthik Avinash ・ Nov 12&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#evaluation&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agents&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>evaluation</category>
      <category>agents</category>
    </item>
    <item>
      <title>[Free eBook] How We Built a Practical Framework for Evaluating AI Agents in Production</title>
      <dc:creator>Karthik Avinash</dc:creator>
      <pubDate>Wed, 12 Nov 2025 05:31:36 +0000</pubDate>
      <link>https://dev.to/karthikavinash/free-ebook-how-we-built-a-practical-framework-for-evaluating-ai-agents-in-production-22pf</link>
      <guid>https://dev.to/karthikavinash/free-ebook-how-we-built-a-practical-framework-for-evaluating-ai-agents-in-production-22pf</guid>
      <description>&lt;p&gt;Hey everyone! I'm new to the dev.to community and wanted to share something our team built.&lt;/p&gt;

&lt;p&gt;We've all been there: your new AI agent works great in demos, but the moment you think about shipping to prod, you get nervous. How do you &lt;em&gt;actually&lt;/em&gt; know it's reliable?&lt;/p&gt;

&lt;p&gt;Traditional LLM evals that score only the final answer fall short for agents. An agent can give a perfect-sounding summary while, internally, its tool calls failed, it pulled stale data, and it completely forgot the user's original goal. Debugging this is a nightmare.&lt;/p&gt;

&lt;p&gt;Our team has been tackling this exact problem while building multimodal agents. We ended up creating our own evaluation playbook and decided to share it as a free e-book.&lt;/p&gt;

&lt;p&gt;It's not just theory; it's a practical guide on &lt;em&gt;how&lt;/em&gt; to build an eval system. It covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Diagnosing the real failure modes in &lt;code&gt;planning&lt;/code&gt;, &lt;code&gt;memory&lt;/code&gt;, and &lt;code&gt;tool_calls&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Instrumenting your agent to capture &lt;code&gt;span&lt;/code&gt;-level and &lt;code&gt;trace&lt;/code&gt;-level data (this is the most important part for debugging).&lt;/li&gt;
&lt;li&gt;Moving beyond "accuracy" to evals that cover safety and real business metrics.&lt;/li&gt;
&lt;li&gt;How we think about continuous monitoring and drift detection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can grab the free e-book here: &lt;a href="https://shorturl.at/4a1He" rel="noopener noreferrer"&gt;https://shorturl.at/4a1He&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hope it's helpful. Happy to answer any questions in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>evaluation</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
