<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Trilok Kanwar</title>
    <description>The latest articles on DEV Community by Trilok Kanwar (@trilok_kanwar).</description>
    <link>https://dev.to/trilok_kanwar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3754921%2F07514810-2e51-4d2f-be72-79f82e3c4ee1.png</url>
      <title>DEV Community: Trilok Kanwar</title>
      <link>https://dev.to/trilok_kanwar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trilok_kanwar"/>
    <language>en</language>
    <item>
      <title>How to Detect Agent Instability Before Production</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Tue, 03 Mar 2026 18:03:17 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/how-to-detect-agent-instability-before-production-58a6</link>
      <guid>https://dev.to/trilok_kanwar/how-to-detect-agent-instability-before-production-58a6</guid>
      <description>&lt;p&gt;When building conversational agents, I made a mistake early on.&lt;br&gt;
I validated prompts with single responses.&lt;/p&gt;

&lt;p&gt;Everything looked great until real conversations happened.&lt;/p&gt;

&lt;p&gt;By turn 3 or 4:&lt;br&gt;
-constraints softened&lt;br&gt;
-tone drifted&lt;br&gt;
-instructions faded&lt;/p&gt;

&lt;p&gt;The insight: users experience conversations, not outputs.&lt;/p&gt;

&lt;p&gt;So I changed the workflow. Every prompt edit now gets tested across multiple multi-turn conversations immediately. It exposed instability that single-response testing never revealed.&lt;/p&gt;

&lt;p&gt;That shift made iteration more structured and less reactive.&lt;/p&gt;

&lt;p&gt;If you're building chat or voice agents, consider validating trajectories, not just responses.&lt;/p&gt;

&lt;p&gt;I’ve documented the workflow here: &lt;a href="https://shorturl.at/r7sfP" rel="noopener noreferrer"&gt;https://shorturl.at/r7sfP&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>What we learned from 100+ production RAG deployments (free 118-page handbook)</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Tue, 17 Feb 2026 18:13:00 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/what-we-learned-from-100-production-rag-deployments-free-118-page-handbook-18o5</link>
      <guid>https://dev.to/trilok_kanwar/what-we-learned-from-100-production-rag-deployments-free-118-page-handbook-18o5</guid>
      <description>&lt;p&gt;We’ve been building RAG systems for a while and wanted to share a resource we just published. It’s a 118-page handbook covering the patterns that separate prototype RAG from production RAG.&lt;/p&gt;

&lt;p&gt;If you’re building RAG right now, here are the problems this covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Your vector search returns “close enough” results instead of exact matches. The handbook covers hybrid retrieval that runs semantic and keyword search in parallel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your chunking splits documents in weird places. It covers semantic chunking, code-aware chunking using ASTs, and parent-child structures that keep context intact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You have no idea if your retrieval is actually good. It covers evaluation frameworks that work without manually labeling test data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your costs keep growing and you can’t figure out why. It covers production observability that traces every step of your pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also has dedicated chapters on building RAG for specific domains: code generation, text-to-SQL, legal search, and medical knowledge retrieval. Each one has different failure modes that generic approaches miss.&lt;/p&gt;

&lt;p&gt;Free PDF - &lt;a href="https://shorturl.at/rRXXP" rel="noopener noreferrer"&gt;https://shorturl.at/rRXXP&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear what problems others are hitting with production RAG, always helps to know what to cover next.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Reimagining Synthetic Data Generation at Future AGI</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Fri, 13 Feb 2026 17:45:21 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/reimagining-synthetic-data-generation-at-future-agi-3jk7</link>
      <guid>https://dev.to/trilok_kanwar/reimagining-synthetic-data-generation-at-future-agi-3jk7</guid>
      <description>&lt;p&gt;We’ve been quietly upgrading synthetic data generation at Future AGI.&lt;/p&gt;

&lt;p&gt;Here’s what’s new:&lt;/p&gt;

&lt;p&gt;Grounded generation tied to uploaded knowledge bases (~90% coverage observed)&lt;/p&gt;

&lt;p&gt;-1.78× faster dataset creation&lt;/p&gt;

&lt;p&gt;-Non-linear scaling as dataset size increases&lt;/p&gt;

&lt;p&gt;-Mid-generation editing support&lt;/p&gt;

&lt;p&gt;-Improved diversity beyond 5,000 rows&lt;/p&gt;

&lt;p&gt;-SOP-driven scenario generation with edge cases&lt;/p&gt;

&lt;p&gt;-One-click variable generation for prompt testing&lt;/p&gt;

&lt;p&gt;For teams in regulated industries, voice AI, or LLM evaluation workflows, this reduces manual overhead significantly.&lt;/p&gt;

&lt;p&gt;There’s more in the changelog. We’ll break it down in a separate post.&lt;/p&gt;

&lt;p&gt;Synthetic Data Generation: &lt;a href="https://shorturl.at/Osgwr" rel="noopener noreferrer"&gt;https://shorturl.at/Osgwr&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Prompt Optimization, Not Prompt Guessing</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Thu, 12 Feb 2026 18:49:34 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/prompt-optimization-not-prompt-guessing-33a5</link>
      <guid>https://dev.to/trilok_kanwar/prompt-optimization-not-prompt-guessing-33a5</guid>
      <description>&lt;p&gt;In sales, support, and fintech workflows, teams rely on prompts to classify conversations, extract signals, and route decisions.&lt;/p&gt;

&lt;p&gt;A skilled prompt engineer can make 100 examples look perfect.&lt;/p&gt;

&lt;p&gt;That is exactly the problem.&lt;/p&gt;

&lt;p&gt;Here’s the contradiction nobody talks about:&lt;br&gt;
the more skilled you are at writing prompts, the more dangerous your process becomes.&lt;/p&gt;

&lt;p&gt;Because intuition works on small samples.&lt;br&gt;
It does not generalize to 10,000 inputs, multiple failure modes, and cost constraints you have not measured.&lt;/p&gt;

&lt;p&gt;Expert intuition produces prompts that feel right.&lt;br&gt;
But they cannot be reliably reproduced, versioned, or defended with metrics.&lt;/p&gt;

&lt;p&gt;The fix is not better intuition.&lt;/p&gt;

&lt;p&gt;It is replacing intuition with an objective function.&lt;/p&gt;

&lt;p&gt;Dataset → Evaluator → Optimizer → Ranked prompts.&lt;/p&gt;

&lt;p&gt;This is the same class of problem as hyperparameter tuning.&lt;br&gt;
We just forgot to treat it that way.&lt;/p&gt;

&lt;p&gt;Our team documented the full workflow in a cookbook.&lt;br&gt;
&lt;a href="https://shorturl.at/aI0zg" rel="noopener noreferrer"&gt;https://shorturl.at/aI0zg&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Keeping multimodal experimentation in one place</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Tue, 10 Feb 2026 20:10:48 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/keeping-multimodal-experimentation-in-one-place-37po</link>
      <guid>https://dev.to/trilok_kanwar/keeping-multimodal-experimentation-in-one-place-37po</guid>
      <description>&lt;p&gt;Teams building with image generation models or vision pipelines often hit the same problem. The model produces an image, but you cannot see it where the prompt lives.&lt;/p&gt;

&lt;p&gt;That makes reviewing quality manual, comparing runs messy, and iteration slower than it should be.&lt;/p&gt;

&lt;p&gt;We just shipped native image rendering inside Datasets and Prompt Workbench. Generated images now appear directly next to the prompts that created them.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster output review&lt;/li&gt;
&lt;li&gt;Easy visual comparison across runs&lt;/li&gt;
&lt;li&gt;Iteration without switching tools or losing context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompting, generating, reviewing, and experimenting now happen in one place. Multimodal workflows finally get tooling that matches how they actually work.&lt;/p&gt;

&lt;p&gt;Mutimodal - Image Generation in Datasets &amp;amp; Prompt: &lt;a href="https://shorturl.at/athOG" rel="noopener noreferrer"&gt;https://shorturl.at/athOG&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Your Agent Is Slow Because of Inference</title>
      <dc:creator>Trilok Kanwar</dc:creator>
      <pubDate>Fri, 06 Feb 2026 15:36:26 +0000</pubDate>
      <link>https://dev.to/trilok_kanwar/your-agent-is-slow-because-of-inference-l2a</link>
      <guid>https://dev.to/trilok_kanwar/your-agent-is-slow-because-of-inference-l2a</guid>
      <description>&lt;p&gt;When an agent feels sluggish, the instinct is to blame reasoning quality.&lt;/p&gt;

&lt;p&gt;But in agentic AI systems, reasoning is rarely the real problem.&lt;/p&gt;

&lt;p&gt;Inference today looks like:&lt;/p&gt;

&lt;p&gt;-planning a path forward&lt;br&gt;
-calling tools&lt;br&gt;
-waiting on external systems&lt;br&gt;
-re-planning based on outputs&lt;br&gt;
-generating a final response across long sessions&lt;/p&gt;

&lt;p&gt;That entire loop is inference.&lt;/p&gt;

&lt;p&gt;In a recent chat with Yunmo and Alex from FriendliAI, we explored why inference has quietly become the biggest bottleneck in agent performance and how teams are optimizing for it.&lt;/p&gt;

&lt;p&gt;The key shift:&lt;br&gt;
Latency, throughput, and cost aren’t infra trade-offs anymore. They’re product decisions.&lt;/p&gt;

&lt;p&gt;If you’re building agentic systems, this is worth rethinking.&lt;/p&gt;

&lt;p&gt;▶️ Full webinar link: &lt;a href="https://shorturl.at/moj3x" rel="noopener noreferrer"&gt;https://shorturl.at/moj3x&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>opensource</category>
      <category>inference</category>
    </item>
  </channel>
</rss>
