<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chetan Sehgal</title>
    <description>The latest articles on DEV Community by Chetan Sehgal (@chetan_e2dbf0aed91647397c).</description>
    <link>https://dev.to/chetan_e2dbf0aed91647397c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867437%2F6d6076d2-2176-43b6-96ba-fab7eda0309b.png</url>
      <title>DEV Community: Chetan Sehgal</title>
      <link>https://dev.to/chetan_e2dbf0aed91647397c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chetan_e2dbf0aed91647397c"/>
    <language>en</language>
    <item>
      <title>AI Agents That Learn on the Job: Why On-the-Fly Evolution Changes Everything About Agent Architecture</title>
      <dc:creator>Chetan Sehgal</dc:creator>
      <pubDate>Wed, 08 Apr 2026 17:09:35 +0000</pubDate>
      <link>https://dev.to/chetan_e2dbf0aed91647397c/ai-agents-that-learn-on-the-job-why-on-the-fly-evolution-changes-everything-about-agent-3koi</link>
      <guid>https://dev.to/chetan_e2dbf0aed91647397c/ai-agents-that-learn-on-the-job-why-on-the-fly-evolution-changes-everything-about-agent-3koi</guid>
      <description>&lt;p&gt;Most AI agents shipped today are frozen the moment they hit production. They execute. They respond. But they don't get better from doing the work.&lt;/p&gt;

&lt;p&gt;This is the dirty secret of the current agent boom: for all the hype about autonomous AI, the vast majority of deployed agents are static inference machines wrapped in clever prompt chains. When they fail at a task pattern, someone on your team manually re-prompts, retrains, or rewires the pipeline. The feedback loop between failure and improvement is measured in days or weeks — not the minutes it should take.&lt;/p&gt;

&lt;p&gt;That's starting to change, and the implications are significant.&lt;/p&gt;

&lt;h2&gt;ALTK-Evolve: On-the-Job Learning for Agents&lt;/h2&gt;

&lt;p&gt;Hugging Face and IBM Research recently introduced &lt;a href="https://huggingface.co/blog/ibm-research/altk-evolve" rel="noopener noreferrer"&gt;ALTK-Evolve&lt;/a&gt;, a framework that enables &lt;strong&gt;on-the-job learning for AI agents&lt;/strong&gt;. Instead of relying exclusively on offline fine-tuning or static prompt engineering, ALTK-Evolve lets agents evolve their behavior through real-world task execution.&lt;/p&gt;

&lt;p&gt;The core idea: an agent's own &lt;strong&gt;execution traces&lt;/strong&gt; — the sequence of actions it took, the tools it called, the results it observed — become training signal. The agent doesn't just complete a task and move on. It reflects on what worked, what didn't, and adjusts its strategy for the next iteration.&lt;/p&gt;
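&lt;p&gt;To make the idea concrete, here is a minimal sketch of what a structured execution trace might look like. The schema below is illustrative — the post does not show ALTK-Evolve's actual trace format — but it captures the three ingredients named above: actions taken, tools called, results observed.&lt;/p&gt;

```python
# Hypothetical structured execution-trace logging. Every action,
# observation, and outcome is captured in a machine-readable form
# that a learning loop can consume later.
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class TraceStep:
    action: str          # e.g. the tool the agent called
    arguments: dict      # what it called the tool with
    observation: str     # what came back
    timestamp: float = field(default_factory=time.time)

@dataclass
class ExecutionTrace:
    task_id: str
    steps: list = field(default_factory=list)
    outcome: str = "pending"   # becomes "success" or "failure" once known

    def log(self, action, arguments, observation):
        self.steps.append(TraceStep(action, arguments, observation))

    def to_json(self):
        # asdict recurses into nested dataclasses, so the whole
        # trace serializes in one call.
        return json.dumps(asdict(self))

trace = ExecutionTrace(task_id="ticket-123")
trace.log("search_docs", {"query": "refund policy"}, "3 results")
trace.outcome = "success"
```

&lt;p&gt;The point is not the schema itself but the discipline: once traces are structured rather than buried in free-form logs, "reflect and adjust" becomes a data pipeline instead of a manual post-mortem.&lt;/p&gt;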

&lt;p&gt;This isn't reinforcement learning in the traditional sense, where you need a carefully designed reward function and a simulation environment. This is learning from production behavior, in production, on real tasks. The feedback loop tightens from weeks to hours, potentially to minutes.&lt;/p&gt;

&lt;h2&gt;Why This Matters More Than Another Benchmark&lt;/h2&gt;

&lt;p&gt;The AI community is perpetually distracted by benchmark wars. Model X beats Model Y on HumanEval. A new architecture claims state-of-the-art on MMLU. These numbers matter, but they obscure a more fundamental question: &lt;strong&gt;what happens after deployment?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A model that scores 92% on a benchmark but can't improve from its own failures in production is less valuable than a model scoring 85% that compounds its experience over time. On-the-job learning introduces a &lt;strong&gt;compounding advantage&lt;/strong&gt; — agents that have been running longer perform better, not because they were retrained by a human, but because they evolved through use.&lt;/p&gt;

&lt;p&gt;Think about the economics of this. Two companies deploy competing AI agents for the same enterprise workflow. Company A's agent is static — every improvement requires an engineer to analyze failure cases, adjust prompts, and redeploy. Company B's agent learns from its own execution traces and adapts autonomously. After three months, the performance gap isn't linear. It's exponential. Company B's agent has been compounding improvements with every task it completes.&lt;/p&gt;

&lt;h2&gt;What This Demands From Agent Architectures&lt;/h2&gt;

&lt;p&gt;Here's the practical takeaway that most teams are going to miss: &lt;strong&gt;agent architectures need to be designed for mutability from day one&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most agent frameworks today are built around static components — fixed prompt templates, hardcoded tool chains, rigid orchestration logic. These architectures assume that the agent's behavior is defined at build time and frozen at deploy time. On-the-job learning breaks that assumption entirely.&lt;/p&gt;

&lt;p&gt;If you want agents that evolve, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execution trace logging as a first-class concern&lt;/strong&gt; — not just for debugging, but as training data. Every action, observation, and decision point needs to be captured in a structured format that can feed back into the agent's learning loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mutable strategy layers&lt;/strong&gt; — the agent's decision-making logic can't be a monolithic prompt. It needs modular components that can be updated independently as the agent learns new patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails on self-modification&lt;/strong&gt; — an agent that can change its own behavior is powerful but dangerous. You need validation gates that ensure evolved behaviors don't violate safety constraints or drift from the intended task scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation infrastructure that runs continuously&lt;/strong&gt; — not just pre-deployment benchmarks, but ongoing performance monitoring that can distinguish genuine improvement from harmful drift.&lt;/li&gt;
&lt;/ul&gt;
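&lt;p&gt;The guardrail requirement, in particular, is easy to state and easy to get wrong. One workable shape — sketched here with hypothetical names and a placeholder scoring function, not ALTK-Evolve's API — is a validation gate: an evolved strategy is accepted only if it stays inside the approved tool scope and does not regress on a held-out evaluation set.&lt;/p&gt;

```python
# Hypothetical validation gate for self-modifying agents: a candidate
# strategy must pass a scope check and a no-regression check before
# it replaces the current one.
from dataclasses import dataclass

@dataclass
class Strategy:
    prompt: str
    tools: set

ALLOWED_TOOLS = {"search_docs", "create_ticket", "send_reply"}

def eval_score(strategy, eval_set):
    # Placeholder: run the strategy on held-out tasks and return the
    # fraction solved. Real scoring depends on your eval harness.
    return sum(1 for task in eval_set if task["solvable"]) / len(eval_set)

def accept_evolution(current, candidate, eval_set):
    # Gate 1: no tool outside the approved scope.
    if not candidate.tools.issubset(ALLOWED_TOOLS):
        return False
    # Gate 2: no regression against the current strategy.
    new_score = eval_score(candidate, eval_set)
    old_score = eval_score(current, eval_set)
    return max(new_score, old_score) == new_score  # candidate is not worse

current = Strategy("v1 prompt", {"search_docs"})
candidate = Strategy("v2 prompt", {"search_docs", "send_reply"})
ok = accept_evolution(current, candidate, [{"solvable": True}])
```

&lt;p&gt;Notice that the gate runs the same continuous-evaluation infrastructure the last bullet calls for — the two requirements are really one system viewed from two sides.&lt;/p&gt;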

&lt;p&gt;Static prompt chains won't cut it when your competitor's agents are compounding their own experience.&lt;/p&gt;

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-the-job learning closes the feedback loop&lt;/strong&gt; between agent failure and improvement from weeks to hours, using execution traces as training signal rather than requiring manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compounding experience creates exponential advantages&lt;/strong&gt; — agents that learn from production use will increasingly outperform static agents, regardless of base model quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent architectures must be designed for mutability from day one&lt;/strong&gt; — static prompt chains and hardcoded tool orchestration are incompatible with continuous self-improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Question You Should Be Asking&lt;/h2&gt;

&lt;p&gt;If you're building agents today, the most important architectural question isn't which model to use or which framework to adopt. It's this: &lt;strong&gt;are you designing for deployment, or for continuous improvement after deployment?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That distinction is about to separate the serious agent builders from everyone else. The agents that win in production won't be the ones that launched best — they'll be the ones that learned fastest.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>On-Device AI Is Changing How We Build — With Cover Image Test</title>
      <dc:creator>Chetan Sehgal</dc:creator>
      <pubDate>Wed, 08 Apr 2026 09:38:18 +0000</pubDate>
      <link>https://dev.to/chetan_e2dbf0aed91647397c/on-device-ai-is-changing-how-we-build-with-cover-image-test-10he</link>
      <guid>https://dev.to/chetan_e2dbf0aed91647397c/on-device-ai-is-changing-how-we-build-with-cover-image-test-10he</guid>
      <description>&lt;h2&gt;
  
  
  The Shift Nobody Priced In
&lt;/h2&gt;

&lt;p&gt;For the past two years, building AI into products meant one thing: an API call to a cloud endpoint. That assumption just broke.&lt;/p&gt;

&lt;p&gt;Google's Gemma 4 is a multimodal model with frontier-level reasoning that runs locally — on a phone, a laptop, an edge device. Not behind a server. Not metered per token. On the device in your hand.&lt;/p&gt;

&lt;h2&gt;Why This Changes Your Architecture&lt;/h2&gt;

&lt;p&gt;When inference is local, three constraints flip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; drops from hundreds of milliseconds to single digits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; goes from per-call pricing to zero marginal cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; goes from "we send your data to the cloud" to "it never leaves the device"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't incremental improvements. They change which features are viable to build.&lt;/p&gt;

&lt;h2&gt;What Practitioners Should Do Now&lt;/h2&gt;

&lt;p&gt;If you're building AI features today, benchmark on-device models for your use case. The gap between cloud and local quality is closing faster than most roadmaps account for.&lt;/p&gt;

&lt;p&gt;Hybrid inference — local for latency-sensitive tasks, cloud for complex reasoning — is likely the architecture that wins.&lt;/p&gt;
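&lt;p&gt;At its simplest, a hybrid router is just a policy over task kinds. The sketch below is illustrative — the task categories and backend names are assumptions, not from any particular SDK — but it shows the shape: latency-sensitive work stays local, heavier reasoning goes to the cloud, and the local model doubles as the offline fallback.&lt;/p&gt;

```python
# Minimal hybrid-inference routing sketch. Task kinds and backend
# names are hypothetical; a production router would also weigh
# battery, connectivity quality, and input size.

LOCAL_KINDS = {"autocomplete", "classify", "redact_pii"}

def pick_backend(task_kind, online=True):
    # Latency-sensitive kinds always run on-device.
    if task_kind in LOCAL_KINDS:
        return "on-device"
    # Heavier reasoning prefers the cloud when reachable.
    if online:
        return "cloud"
    # Degrade gracefully: fall back to the local model when offline.
    return "on-device"
```

&lt;p&gt;The useful property of this design is that the routing policy is data, not architecture: as on-device quality improves, you grow the local set instead of rewriting the app.&lt;/p&gt;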

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;On-device AI is no longer a compromise — it's a viable first choice for many use cases&lt;/li&gt;
&lt;li&gt;Gemma 4 signals that frontier capability at the edge is arriving faster than expected&lt;/li&gt;
&lt;li&gt;Architects who figure out hybrid inference now will ship faster and cheaper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's the first feature in your product you'd move from cloud to on-device?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
