<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AI Bug Slayer 🐞</title>
    <description>The latest articles on DEV Community by AI Bug Slayer 🐞 (@aibughunter).</description>
    <link>https://dev.to/aibughunter</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2949105%2F47e6e745-7f5d-4bd8-bfa6-98f609f42c56.jpg</url>
      <title>DEV Community: AI Bug Slayer 🐞</title>
      <link>https://dev.to/aibughunter</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aibughunter"/>
    <language>en</language>
    <item>
      <title>The Hottest AI Framework Right Now Has a Fatal Flaw Nobody Mentions</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Fri, 26 Jun 2026 03:31:32 +0000</pubDate>
      <link>https://dev.to/aibughunter/the-hottest-ai-framework-right-now-has-a-fatal-flaw-nobody-mentions-2ing</link>
      <guid>https://dev.to/aibughunter/the-hottest-ai-framework-right-now-has-a-fatal-flaw-nobody-mentions-2ing</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; (VentureBeat AI). For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; (VentureBeat AI). Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; (VentureBeat AI). The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them get confused.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;
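
&lt;p&gt;To make plan-then-execute concrete, here is a minimal sketch in plain Python with no framework. Everything in it is an assumption for illustration: the names Step, make_plan, execute_plan and the stub tools are hypothetical, and the planner is hard-coded where a real system would call a model.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    tool: str      # name of the tool to call
    argument: str  # input passed to that tool


def make_plan(goal: str) -> list[Step]:
    """Planning stage. A real system would ask a model for this list;
    it is hard-coded here so the sketch runs without any API."""
    return [Step("search", goal), Step("summarize", "search results")]


def execute_plan(plan: list[Step], tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Execution stage. Follows the plan in order and never edits it."""
    results = []
    for step in plan:
        results.append(tools[step.tool](step.argument))
    return results


# Hypothetical tools, stubbed for illustration.
TOOLS = {
    "search": lambda q: f"found 3 documents about {q}",
    "summarize": lambda text: f"summary of {text}",
}

plan = make_plan("chunking strategies")
outputs = execute_plan(plan, TOOLS)
```

&lt;p&gt;The point of the split is that the plan is an inspectable value. You can log it, diff it, or reject it before a single tool runs.&lt;/p&gt;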




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the preceding paragraph gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;
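
&lt;p&gt;Of the chunking strategies above, overlapping windows is the simplest to show. This is a rough sketch, assuming the text is already split into words; the function name chunk_with_overlap and the size and overlap values are my own choices for illustration, not a recommendation.&lt;/p&gt;

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size` words, each sharing `overlap`
    words with the window before it."""
    if not (size > overlap >= 0):
        raise ValueError("need size > overlap >= 0")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = "a b c d e f g h".split()
chunks = chunk_with_overlap(words, size=4, overlap=2)
```

&lt;p&gt;Each window repeats the tail of the previous one, so a sentence that depends on what came just before it is less likely to be retrieved without that context.&lt;/p&gt;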




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Window to Build AI Expertise Is Closing Faster Than Anyone Expected</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 24 Jun 2026 03:36:26 +0000</pubDate>
      <link>https://dev.to/aibughunter/the-window-to-build-ai-expertise-is-closing-faster-than-anyone-expected-3a2c</link>
      <guid>https://dev.to/aibughunter/the-window-to-build-ai-expertise-is-closing-faster-than-anyone-expected-3a2c</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;
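
&lt;p&gt;The middle case, recovering from a failed tool call by trying a different approach, can be sketched in a few lines. This is an illustration under assumptions, not a production pattern: run_with_fallback, flaky_search and cached_lookup are hypothetical names, and the tools are stubs.&lt;/p&gt;

```python
from typing import Callable, Optional


def run_with_fallback(query: str, tools: list[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each tool in order; move on when one raises or returns nothing useful."""
    for tool in tools:
        try:
            result = tool(query)
        except Exception:
            continue  # a failed call is a signal to try another approach
        if result:
            return result
    return None  # every approach failed; the caller decides what happens next


# Hypothetical tools, stubbed for illustration.
def flaky_search(query: str) -> Optional[str]:
    raise TimeoutError("search backend unavailable")


def cached_lookup(query: str) -> Optional[str]:
    return f"cached answer for {query}"


answer = run_with_fallback("refund policy", [flaky_search, cached_lookup])
```

&lt;p&gt;Returning None when everything fails is deliberate here. It forces the caller to handle the "no tool worked" case instead of passing an empty result downstream.&lt;/p&gt;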




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; (VentureBeat AI). For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; (VentureBeat AI). Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; (VentureBeat AI). The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them get confused.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the preceding paragraph gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What I Learned After Running AI Agents in Production for a Year</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 24 Jun 2026 03:35:50 +0000</pubDate>
      <link>https://dev.to/aibughunter/what-i-learned-after-running-ai-agents-in-production-for-a-year-49n</link>
      <guid>https://dev.to/aibughunter/what-i-learned-after-running-ai-agents-in-production-for-a-year-49n</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; (VentureBeat AI). For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; (VentureBeat AI). Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; (VentureBeat AI). The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them get confused.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;
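
&lt;p&gt;For the explicit-handoff pattern, a structured and logged handoff might look roughly like this. It is a sketch using only the Python standard library; the Handoff fields, the hand_off helper and the agent names are hypothetical and chosen for illustration.&lt;/p&gt;

```python
import json
import logging
from dataclasses import asdict, dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handoff")


@dataclass
class Handoff:
    sender: str    # agent giving up the work
    receiver: str  # agent taking it over
    task: str      # what the receiver is expected to do
    context: dict = field(default_factory=dict)  # structured data, not prose


def hand_off(message: Handoff) -> str:
    """Serialize the handoff, log it, and return the payload for the receiver."""
    payload = json.dumps(asdict(message), sort_keys=True)
    log.info("handoff %s", payload)
    return payload


sent = Handoff("triage", "billing", "issue refund", {"ticket": 42})
payload = hand_off(sent)

# The receiving side rebuilds the same typed object from the payload.
received = Handoff(**json.loads(payload))
```

&lt;p&gt;Because the payload is typed and logged, you can replay a handoff later and see exactly what the receiving agent was given.&lt;/p&gt;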




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the preceding paragraph gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stop Fine-Tuning Your Model. Your Architecture Is the Problem.</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 24 Jun 2026 03:31:35 +0000</pubDate>
      <link>https://dev.to/aibughunter/stop-fine-tuning-your-model-your-architecture-is-the-problem-3kkg</link>
      <guid>https://dev.to/aibughunter/stop-fine-tuning-your-model-your-architecture-is-the-problem-3kkg</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. When a system conflates them, you cannot tell whether a bad answer came from bad context or bad reasoning.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
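
&lt;p&gt;As a rough illustration of an explicit handoff, the sketch below passes a typed record between agents and logs it as JSON. The &lt;code&gt;Handoff&lt;/code&gt; fields and the &lt;code&gt;hand_off&lt;/code&gt; helper are assumptions I made up for this example; your own schema will differ.&lt;/p&gt;

```python
import json
import logging
from dataclasses import asdict, dataclass, field

logger = logging.getLogger("handoffs")


@dataclass(frozen=True)
class Handoff:
    sender: str
    receiver: str
    task: str
    inputs: dict = field(default_factory=dict)
    done_when: str = ""  # hypothetical field: how the receiver knows it is finished


def hand_off(handoff):
    """Serialize the handoff, log it, and return the JSON payload for the receiver."""
    payload = json.dumps(asdict(handoff), sort_keys=True)
    logger.info("handoff %s to %s: %s", handoff.sender, handoff.receiver, payload)
    return payload
```

&lt;p&gt;Because the payload is structured, the receiving agent can validate it before acting, and the log line can be replayed later when you need to see what was actually passed along.&lt;/p&gt;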

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1684369176170-463e84248b70%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1684369176170-463e84248b70%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="1080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that depends on the one before it is retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;
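
&lt;p&gt;Overlapping windows are the simplest of the chunking strategies above to show in code. This is a minimal sketch that assumes the document is already split into paragraphs; &lt;code&gt;chunk_with_overlap&lt;/code&gt; and its default sizes are illustrative, not tuned values.&lt;/p&gt;

```python
def chunk_with_overlap(paragraphs, size=3, overlap=1):
    """Group paragraphs into windows of `size`, sharing `overlap` paragraphs between neighbours."""
    if overlap not in range(size):
        raise ValueError("size must be at least 1 and overlap must be from 0 to size - 1")
    step = size - overlap
    chunks = []
    for start in range(0, len(paragraphs), step):
        chunks.append(paragraphs[start:start + size])
        if not paragraphs[start + size:]:  # this window already reached the end
            break
    return chunks
```

&lt;p&gt;With five paragraphs and the defaults, the third paragraph lands in both chunks, so a paragraph that depends on its predecessor is more likely to be retrieved together with it.&lt;/p&gt;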

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" width="420" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Exact Stack I Use to Build Production AI Agents (No Fluff)</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 24 Jun 2026 03:31:17 +0000</pubDate>
      <link>https://dev.to/aibughunter/the-exact-stack-i-use-to-build-production-ai-agents-no-fluff-2lmp</link>
      <guid>https://dev.to/aibughunter/the-exact-stack-i-use-to-build-production-ai-agents-no-fluff-2lmp</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;
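
&lt;p&gt;The definition above maps onto a very small loop. The sketch below is only an illustration: &lt;code&gt;objective_met&lt;/code&gt; and &lt;code&gt;choose_action&lt;/code&gt; are hypothetical hooks supplied by the caller, and the state is assumed to be a plain dict.&lt;/p&gt;

```python
def run_agent(objective_met, choose_action, state, max_steps=5):
    """Loop until the objective is met or the step budget runs out.

    objective_met(state) returns a bool. choose_action(state) returns a
    function that takes the state and returns the next state.
    """
    for _ in range(max_steps):
        if objective_met(state):  # the agent knows when it is done
            return state, True
        action = choose_action(state)  # the agent decides what to do next
        try:
            state = action(state)
        except Exception:  # the agent handles failure: remember it, then choose again
            failed = state.get("failed", []) + [action.__name__]
            state = {**state, "failed": failed}
    return state, objective_met(state)
```

&lt;p&gt;Nothing in this loop needs a human to name the next step, which is the line I am drawing between an agent and a chat interface.&lt;/p&gt;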




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. When a system conflates them, you cannot tell whether a bad answer came from bad context or bad reasoning.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
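
&lt;p&gt;Plan-then-execute is easier to see in code than in prose. In this sketch the planner is hard-coded where a real system would make one model call, and the tool names (&lt;code&gt;fetch&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt;) and the ticket text are invented for illustration.&lt;/p&gt;

```python
def make_plan(goal):
    """Planning step. A real system would get these steps from one model call."""
    return [
        {"tool": "fetch", "arg": goal},
        {"tool": "summarize", "arg": None},  # None means: use the previous output
    ]


def execute(plan, tools):
    """Execution step. Follows the plan in order and never re-plans."""
    output = None
    for step in plan:
        arg = step["arg"] if step["arg"] is not None else output
        output = tools[step["tool"]](arg)
    return output


# Hypothetical tools standing in for real integrations.
TOOLS = {
    "fetch": lambda ticket_id: f"Ticket {ticket_id}: printer offline since Monday",
    "summarize": lambda text: text.split(": ", 1)[1],
}

summary = execute(make_plan("T-17"), TOOLS)
```

&lt;p&gt;Keeping &lt;code&gt;make_plan&lt;/code&gt; and &lt;code&gt;execute&lt;/code&gt; apart means the plan can be logged, reviewed, or rejected before any tool runs.&lt;/p&gt;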

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1684369176170-463e84248b70%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1684369176170-463e84248b70%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="1080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that depends on the one before it is retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;
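
&lt;p&gt;Parent-document retrieval, mentioned above, can be sketched without a vector store: match on small chunks, but hand the model the whole section they came from. To stay self-contained this example scores by plain word overlap instead of embeddings, which is a deliberate simplification, and the function names and sample sections are made up.&lt;/p&gt;

```python
def build_index(sections):
    """Map every sentence-sized child chunk to the id of its parent section."""
    index = []
    for section_id, text in sections.items():
        for sentence in text.split(". "):
            words = set(sentence.lower().strip(".").split())
            index.append((words, section_id))
    return index


def retrieve_parent(query, index, sections):
    """Match the query against child chunks, then return the full parent section."""
    query_words = set(query.lower().split())
    best = max(index, key=lambda entry: len(query_words.intersection(entry[0])))
    return sections[best[1]]


SECTIONS = {
    "refunds": "Refunds are issued within 14 days. This only applies to annual plans",
    "billing": "Invoices are sent monthly. Late payments incur a fee",
}
```

&lt;p&gt;A query that only matches the second sentence of the refunds section still returns both sentences, so the model sees what "this" refers to instead of guessing.&lt;/p&gt;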

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" width="420" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Hottest AI Framework Right Now Has a Fatal Flaw Nobody Mentions</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Mon, 22 Jun 2026 03:31:31 +0000</pubDate>
      <link>https://dev.to/aibughunter/the-hottest-ai-framework-right-now-has-a-fatal-flaw-nobody-mentions-3hd1</link>
      <guid>https://dev.to/aibughunter/the-hottest-ai-framework-right-now-has-a-fatal-flaw-nobody-mentions-3hd1</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;
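
&lt;p&gt;For the observability point, a decision log does not need to be elaborate. The &lt;code&gt;DecisionLog&lt;/code&gt; class below is a hypothetical sketch of the minimum I would want to record: what options the agent had, which one it chose, and the stated reason.&lt;/p&gt;

```python
import json
import time


class DecisionLog:
    """Append-only record of what the agent saw, what it chose, and why."""

    def __init__(self):
        self.entries = []

    def record(self, step, options, chosen, reason):
        self.entries.append(
            {"ts": time.time(), "step": step, "options": options, "chosen": chosen, "reason": reason}
        )

    def explain(self, step):
        """Return the recorded reasons for one step, oldest first."""
        return [entry["reason"] for entry in self.entries if entry["step"] == step]

    def dump(self):
        """One JSON object per line, ready to ship to whatever log store you use."""
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.entries)
```

&lt;p&gt;If you cannot answer "why did it pick that tool" from a record like this, you are debugging by rerunning the prompt and hoping.&lt;/p&gt;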

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. When a system conflates them, you cannot tell whether a bad answer came from bad context or bad reasoning.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1677442135703-1787eea5ce01%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1677442135703-1787eea5ce01%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the one before it gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;
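&lt;p&gt;To make the overlapping-window idea concrete, here is a rough sketch that chunks by words. The function name and the &lt;code&gt;size&lt;/code&gt; and &lt;code&gt;overlap&lt;/code&gt; values are assumptions for illustration; real pipelines usually count tokens and tune both numbers per corpus.&lt;/p&gt;

```python
def chunk_with_overlap(words, size=8, overlap=3):
    """Split a list of words into windows of `size` that share `overlap` words."""
    if overlap not in range(size):
        raise ValueError("overlap must be between 0 and size - 1")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if not words[start + size:]:
            break  # this window already reached the end of the text
    return chunks


text = ("The warranty covers parts for two years. "
        "It does not cover water damage or misuse.")
chunks = chunk_with_overlap(text.split(), size=8, overlap=3)
```

&lt;p&gt;Each chunk repeats the last few words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.&lt;/p&gt;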




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Agents Are Becoming Your Coworkers — Here's What's Actually Happening</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Fri, 19 Jun 2026 03:32:03 +0000</pubDate>
      <link>https://dev.to/aibughunter/ai-agents-are-becoming-your-coworkers-heres-whats-actually-happening-14f</link>
      <guid>https://dev.to/aibughunter/ai-agents-are-becoming-your-coworkers-heres-whats-actually-happening-14f</guid>
      <description>&lt;p&gt;The AI world moved fast this week, and it's worth paying attention.&lt;/p&gt;

&lt;p&gt;If you've been following AI for a while, you've heard the hype cycle: "ChatGPT is here," "LLMs will replace everyone," "the singularity is coming." The noise is real, but something actually &lt;em&gt;different&lt;/em&gt; is happening right now — and it's less dramatic than the headlines suggest.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shift: From Chat to Agents
&lt;/h2&gt;

&lt;p&gt;For the past year, AI has been mostly about conversation. You ask a question, an LLM answers. It's useful, but it's also... passive. You're still in control. You decide what to ask. You decide what to do with the answer.&lt;/p&gt;

&lt;p&gt;That's changing.&lt;/p&gt;

&lt;p&gt;This week, AWS announced a new class of AI agents that act &lt;em&gt;autonomously&lt;/em&gt;. These aren't just chatbots that sound smarter. They're systems that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fix security vulnerabilities in your code without you asking&lt;/li&gt;
&lt;li&gt;Triage your email inbox and prioritize messages&lt;/li&gt;
&lt;li&gt;Manage tasks across multiple systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google reported in &lt;em&gt;Nature&lt;/em&gt; that its AMIE system performed on par with primary care physicians in complex disease management. Adobe expanded Firefly's creative agents across Photoshop and Premiere. Medical AI is starting to do real diagnostic work.&lt;/p&gt;

&lt;p&gt;This is the inflection point. We're moving from "AI as a tool you control" to "AI as a coworker that acts on your behalf."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters (And Doesn't)
&lt;/h2&gt;

&lt;p&gt;The honest take: autonomy is both the opportunity and the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The opportunity:&lt;/strong&gt; Imagine delegating the boring, repetitive stuff — security patches, email management, data labeling — to something that doesn't get tired or distracted. That's real productivity gain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Imperfect autonomy is dangerous. Google literally just published research on "securing internal systems against increasingly capable and imperfectly aligned AI agents." Translation: we're building agents that are powerful enough to cause damage if they get it wrong.&lt;/p&gt;

&lt;p&gt;So we're in this uncomfortable middle ground right now. Agents are smart enough to do real work, but not smart enough to be fully trusted alone. The companies doing it well are threading the needle: agents handle specific, well-defined tasks with human oversight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure Play
&lt;/h2&gt;

&lt;p&gt;Behind all this, the infrastructure is shifting hard.&lt;/p&gt;

&lt;p&gt;Europe's waking up to the fact that it doesn't have its own frontier LLM. OVHcloud just announced plans to train Europe's second major LLM (after Mistral's models). Meanwhile, Alibaba updated its flagship model and released a new AI chip.&lt;/p&gt;

&lt;p&gt;The message: training and running LLMs costs serious money and compute. The countries and companies that control the infrastructure control the narrative.&lt;/p&gt;

&lt;p&gt;And at the application layer, LLM orchestration is becoming its own field — 22+ frameworks now exist just to manage how you run multiple models together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translation:&lt;/strong&gt; If you're building anything with AI, infrastructure and orchestration matter as much as the models themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Should Actually Do
&lt;/h2&gt;

&lt;p&gt;If you're a developer, here's the pragmatic take:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start small with agents&lt;/strong&gt; — don't wait for perfect autonomy. Pick one specific, bounded task (email triage, code review, data processing) and build an agent for it. Test it with real work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expect you'll need to refine&lt;/strong&gt; — autonomy that works 95% of the time is still pretty useful, but you need monitoring. Build alerting and human checkpoints into your agents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Watch the infrastructure trends&lt;/strong&gt; — if you're deploying at scale, which model you use (Claude vs GPT vs open-source) and which LLM orchestration framework you pick both matter more than you'd think.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security isn't an afterthought&lt;/strong&gt; — Google's research on agent misalignment is real. If your agent can act in your systems, assume it will eventually make a weird decision. Design around that.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
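&lt;p&gt;The human checkpoint from point 2 can be sketched in a few lines. Everything named here (&lt;code&gt;RISKY_ACTIONS&lt;/code&gt;, &lt;code&gt;run_action&lt;/code&gt;, the action names) is hypothetical, and the &lt;code&gt;approve&lt;/code&gt; callback stands in for whatever review step you actually use.&lt;/p&gt;

```python
RISKY_ACTIONS = {"delete_branch", "send_email", "apply_patch"}  # hypothetical names


def run_action(action, approve, execute):
    """Run low-risk actions directly; route risky ones through a human approver."""
    if action["name"] in RISKY_ACTIONS and not approve(action):
        return {"status": "blocked", "action": action["name"]}
    return {"status": "done", "result": execute(action)}


blocked = run_action(
    {"name": "apply_patch", "target": "auth.py"},
    approve=lambda action: False,  # stand-in for a real review step
    execute=lambda action: "patched " + action["target"],
)
allowed = run_action(
    {"name": "label_ticket", "target": "T-42"},
    approve=lambda action: False,  # never consulted for low-risk actions
    execute=lambda action: "labeled " + action["target"],
)
```

&lt;p&gt;The useful property is that the agent cannot skip the gate: the check lives in the code path that executes actions, not in the prompt.&lt;/p&gt;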

&lt;h2&gt;
  
  
  The Real Question
&lt;/h2&gt;

&lt;p&gt;The headline right now is "AI agents are here" — and that's true. But the real question isn't whether agents work. It's whether &lt;em&gt;we know how to use them yet&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;We have the capability. We don't yet have the wisdom.&lt;/p&gt;

&lt;p&gt;That's changing, and it's worth watching.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your take?&lt;/strong&gt; Are you building with AI agents? What's actually working and what's still hype? Drop a comment — I read them.&lt;/p&gt;

&lt;p&gt;If you're curious about AI trends, &lt;a href="https://dev.to/notifications/subscribe"&gt;subscribe&lt;/a&gt; to stay in the loop. New article every week.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>aiagents</category>
      <category>technology</category>
    </item>
    <item>
      <title>Context Windows Are Getting Huge. Here's Why That Changes Everything.</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Fri, 19 Jun 2026 03:31:32 +0000</pubDate>
      <link>https://dev.to/aibughunter/context-windows-are-getting-huge-heres-why-that-changes-everything-2jlh</link>
      <guid>https://dev.to/aibughunter/context-windows-are-getting-huge-heres-why-that-changes-everything-2jlh</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1677442135703-1787eea5ce01%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1677442135703-1787eea5ce01%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;
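&lt;p&gt;Here is one way to sketch that definition as a loop. It assumes a &lt;code&gt;choose_next&lt;/code&gt; callback that would normally be a model call; the rule-based &lt;code&gt;policy&lt;/code&gt; and both tools below are made-up stand-ins so the example runs on its own.&lt;/p&gt;

```python
def run_agent(objective, choose_next, tools, max_steps=5):
    """Loop until the policy says the objective is met or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        step = choose_next(objective, history)   # decides what to do next
        if step is None:                         # knows when it is done
            return {"done": True, "history": history}
        name, arg = step
        try:
            history.append((name, "ok", tools[name](arg)))
        except Exception as err:                 # handles failure and keeps going
            history.append((name, "error", str(err)))
    return {"done": False, "history": history}


def flaky_search(query):
    raise TimeoutError("search timed out")


def cached_lookup(query):
    return "cached answer for " + query


def policy(objective, history):
    """Rule-based stand-in for a model: try search, fall back to cache, then stop."""
    if not history:
        return ("search", objective)
    if history[-1][1] == "error":
        return ("cache", objective)
    return None


result = run_agent("refund policy", policy,
                   {"search": flaky_search, "cache": cached_lookup})
```

&lt;p&gt;The step budget matters as much as the loop: without &lt;code&gt;max_steps&lt;/code&gt;, a confused policy can retry forever.&lt;/p&gt;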




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;
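&lt;p&gt;A small sketch of how those three concerns can meet in one wrapper: a clean tool interface, an explicit status for empty results, and a trace of every call. The &lt;code&gt;traced_tool&lt;/code&gt; helper, the in-memory &lt;code&gt;TRACE&lt;/code&gt; list, and the &lt;code&gt;lookup_order&lt;/code&gt; tool are hypothetical.&lt;/p&gt;

```python
import time

TRACE = []  # in-memory trace; a real system would ship this to a log store


def traced_tool(name, fn):
    """Wrap a tool so every call records its input, outcome, and duration."""
    def wrapper(arg):
        started = time.perf_counter()
        try:
            result = fn(arg)
            status = "ok" if result else "empty"
        except Exception as err:
            result, status = None, "error: " + str(err)
        TRACE.append({
            "tool": name,
            "input": arg,
            "status": status,
            "seconds": round(time.perf_counter() - started, 4),
        })
        return result
    return wrapper


lookup_order = traced_tool(
    "lookup_order", lambda order_id: {"A1": "shipped"}.get(order_id)
)
found = lookup_order("A1")
missing = lookup_order("Z9")
```

&lt;p&gt;Marking "empty" separately from "error" lets the agent, and whoever debugs it later, tell a tool that failed from a tool that worked but had nothing to say.&lt;/p&gt;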

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" width="420" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them are hard to debug: you cannot tell whether a bad answer came from bad context or bad reasoning.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
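&lt;p&gt;Plan-then-execute, sketched without any framework. The plan is hard-coded here as an assumption; in a real system &lt;code&gt;make_plan&lt;/code&gt; would be a model call, and the step names are invented for the example.&lt;/p&gt;

```python
def make_plan(request):
    """Planning step: returns the full list of steps up front.

    Hard-coded here; in practice a model call would produce it.
    """
    return [("fetch", request), ("summarize", None), ("format", "bullet")]


STEPS = {
    "fetch": lambda state, arg: "raw text about " + arg,
    "summarize": lambda state, arg: state.upper(),
    "format": lambda state, arg: "- " + state if arg == "bullet" else state,
}


def execute(plan):
    """Execution step: follows the plan in order and never edits it."""
    state = None
    for name, arg in plan:
        state = STEPS[name](state, arg)
    return state


plan = make_plan("pricing")
output = execute(plan)
```

&lt;p&gt;Since the plan is plain data, you can log it, diff it between runs, or reject it before anything executes.&lt;/p&gt;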

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the one before it gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;
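&lt;p&gt;Parent-document retrieval is the easiest of those strategies to show in a few lines. This sketch is an illustration only: the two documents are invented, and a toy word-overlap score stands in for embedding similarity.&lt;/p&gt;

```python
PARENTS = {
    "doc1": "Returns are accepted within 30 days. This applies only to unopened items.",
    "doc2": "Shipping takes five business days. Express shipping is available.",
}

# Small chunks are what gets searched; each one points back to its parent.
CHUNKS = [
    {"parent": pid, "text": sentence.strip()}
    for pid, body in PARENTS.items()
    for sentence in body.split(".")
    if sentence.strip()
]


def score(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()).intersection(text.lower().split()))


def retrieve_parent(query):
    """Match against small chunks, but return the whole parent document."""
    best = max(CHUNKS, key=lambda chunk: score(query, chunk["text"]))
    return PARENTS[best["parent"]]


context = retrieve_parent("does this apply to unopened items")
```

&lt;p&gt;The query matches the second sentence, but the model receives both, so "this" still has something to refer to.&lt;/p&gt;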




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>When Context Windows Stop Mattering: The AI Stack That Actually Works</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 17 Jun 2026 03:32:02 +0000</pubDate>
      <link>https://dev.to/aibughunter/when-context-windows-stop-mattering-the-ai-stack-that-actually-works-26kk</link>
      <guid>https://dev.to/aibughunter/when-context-windows-stop-mattering-the-ai-stack-that-actually-works-26kk</guid>
      <description>&lt;h1&gt;
  
  
  When Context Windows Stop Mattering: The AI Stack That Actually Works
&lt;/h1&gt;

&lt;p&gt;The latest wave of AI news tells a story that's easy to miss if you're just scrolling headlines.&lt;/p&gt;

&lt;p&gt;This week, Z.ai dropped GLM-5.2 with a usable 1-million-token context window. Anthropic had to yank its latest models offline due to export controls. And the core problem everyone's wrestling with isn't raw capability anymore — it's operationalization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Bottleneck Isn't the Model
&lt;/h2&gt;

&lt;p&gt;Six months ago, the conversation was all about context windows. "If we can just fit more tokens, we solve everything." That narrative is dead.&lt;/p&gt;

&lt;p&gt;What's actually happening in production? Teams are discovering that the AI agent stack has six distinct layers between your LLM and something that actually works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tool design&lt;/strong&gt; — which APIs does the agent need? How do you abstract complexity?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — what's the agent actually doing? Where did it fail?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback patterns&lt;/strong&gt; — what happens when it confidently picks the wrong path?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State management&lt;/strong&gt; — how do you track context across multiple turns?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization&lt;/strong&gt; — not all tasks need a 1M-token model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop&lt;/strong&gt; — when does a human need to step in?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you nail layer 2 and botch layer 3, your agent is a liability. If you're obsessing over raw capability and ignoring layer 4, your multi-step workflows fail silently.&lt;/p&gt;
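&lt;p&gt;Layer 5 is the simplest one to sketch. The router below picks a model tier by prompt length; the tier names and word limits are placeholders I made up, not real model names or recommended thresholds.&lt;/p&gt;

```python
import bisect

# Hypothetical tiers: the word-count limits and model names are placeholders.
LIMITS = [50, 500]
TIERS = ["small-model", "mid-model", "long-context-model"]


def pick_model(prompt, needs_reasoning=False):
    """Pick the cheapest tier whose limit covers the prompt length."""
    tier = bisect.bisect_right(LIMITS, len(prompt.split()))
    if needs_reasoning:
        tier = max(tier, 1)  # never send reasoning tasks to the smallest tier
    return TIERS[tier]


short = pick_model("Classify this ticket as billing or technical")
long_doc = pick_model("word " * 600)
hard = pick_model("Plan a migration", needs_reasoning=True)
```

&lt;p&gt;Production routers often use a classifier instead of a length check, but even a crude rule like this keeps short classification calls off the expensive long-context tier.&lt;/p&gt;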

&lt;p&gt;The companies shipping working AI products aren't the ones chasing the biggest models. They're the ones building the best orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Week Tells Us
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GLM-5.2's 1M-token context is interesting, but it's table stakes now.&lt;/strong&gt; The real question is: do you &lt;em&gt;need&lt;/em&gt; it for your use case? For most production agents, the answer is no. You need better tool design instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's export controls highlight a real business shift.&lt;/strong&gt; The frontier labs are hitting geopolitical walls that commodity models don't. This pushes more teams toward open models and multi-model strategies. Your production system can't depend on one API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apple's Siri getting left behind isn't about smarts — it's about integration.&lt;/strong&gt; Better AI assistants aren't winning because they're 2% smarter. They're winning because they're woven into workflows. That's an ops problem, not a capability problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Patterns From Successful Teams
&lt;/h2&gt;

&lt;p&gt;After watching dozens of deployments, three things keep showing up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Narrow beats general.&lt;/strong&gt; The teams getting ROI are solving specific problems, not trying to make general-purpose reasoning engines. A specialized agent for customer support that handles 80% of cases without human intervention is worth 10x more than a general-purpose agent that handles 20%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Observability is the feature.&lt;/strong&gt; Production teams spend more time debugging agent behavior than they spend implementing new features. If you can't see what your agent is doing, you can't trust it. Good observability beats good inference every single time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Redundancy is cheaper than perfection.&lt;/strong&gt; Instead of building one agent that's 99% accurate, build three agents that handle different aspects and fall back to a human when they disagree. Humans are expensive. So is being wrong.&lt;/p&gt;
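&lt;p&gt;A rough sketch of the disagreement rule from pattern 3. The three "agents" are keyword checks standing in for real model calls, and &lt;code&gt;decide&lt;/code&gt; and &lt;code&gt;ask_human&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
def decide(ticket, agents, ask_human):
    """Accept the answer only when every agent agrees; otherwise escalate."""
    answers = [agent(ticket) for agent in agents]
    if len(set(answers)) == 1:
        return {"answer": answers[0], "source": "agents"}
    return {"answer": ask_human(ticket, answers), "source": "human"}


# Keyword checks standing in for three independently built agents.
agents = [
    lambda t: "refund" if "refund" in t else "other",
    lambda t: "refund" if "money back" in t or "refund" in t else "other",
    lambda t: "refund" if t.startswith("refund") else "other",
]

agreed = decide("refund for order 12", agents,
                ask_human=lambda t, answers: "unused")
split = decide("I want my money back", agents,
               ask_human=lambda t, answers: "refund")
```

&lt;p&gt;Requiring unanimity is the strictest choice. A majority vote would escalate less often, at the cost of letting one dissenting agent be silently overruled.&lt;/p&gt;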

&lt;h2&gt;
  
  
  What Happens Next
&lt;/h2&gt;

&lt;p&gt;The frontier labs will keep pushing capability. That's their job. But the real innovation in AI right now isn't in San Francisco research labs — it's in the boring infrastructure: better observability tools, smarter routing, cost optimization, and frameworks for building multi-model systems.&lt;/p&gt;

&lt;p&gt;If you're still waiting for the "right" model to build your AI product, you're thinking about this wrong. Start building with what exists today. The bottleneck isn't capability. It's execution.&lt;/p&gt;




&lt;p&gt;What are you building with AI right now? What's actually slowing you down — capability or execution? Drop a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Model Is Not the Product. Here's What Actually Is.</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Wed, 17 Jun 2026 03:31:30 +0000</pubDate>
      <link>https://dev.to/aibughunter/the-model-is-not-the-product-heres-what-actually-is-52b5</link>
      <guid>https://dev.to/aibughunter/the-model-is-not-the-product-heres-what-actually-is-52b5</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;
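
&lt;p&gt;To make that definition concrete, here is a minimal sketch of a loop that holds an objective, picks its next move, survives a failed tool call, and stops when it is done. The tool names (&lt;code&gt;flaky_search&lt;/code&gt;, &lt;code&gt;cached_lookup&lt;/code&gt;) and the &lt;code&gt;run_agent&lt;/code&gt; helper are made up for illustration, and no model is called; a real agent would choose the next tool with a model rather than a fixed order.&lt;/p&gt;

```python
# Hypothetical tools: names and behaviour are illustrative only.
def flaky_search(query):
    raise RuntimeError("search backend unavailable")

def cached_lookup(query):
    return "cached answer for " + query

def run_agent(objective, tools, max_steps=5):
    """Try tools in order until one satisfies the objective or the budget runs out."""
    log = []
    remaining = list(tools)
    for _ in range(max_steps):
        if not remaining:
            break  # nothing left to try: stop instead of looping forever
        tool = remaining.pop(0)  # decide what to do next
        try:
            result = tool(objective)
        except Exception as exc:
            log.append((tool.__name__, "failed: " + str(exc)))
            continue  # handle failure by moving to a different approach
        log.append((tool.__name__, "ok"))
        return {"done": True, "result": result, "log": log}
    return {"done": False, "result": None, "log": log}

outcome = run_agent("refund policy", [flaky_search, cached_lookup])
print(outcome["done"], outcome["result"])
```

&lt;p&gt;The part worth copying is the explicit "done" flag and the step budget: the loop reports whether it met the objective instead of returning whatever it had last.&lt;/p&gt;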




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;
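
&lt;p&gt;For the failure-handling point, one option is to make "nothing useful" a first-class outcome instead of an exception or a silent &lt;code&gt;None&lt;/code&gt;. The sketch below assumes a hypothetical &lt;code&gt;lookup_order&lt;/code&gt; tool and invented action names; it shows only the shape of the idea.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolResult:
    status: str                 # "ok", "empty", or "error"
    value: Any = None
    detail: Optional[str] = None

def safe_call(fn, *args):
    """Wrap a tool so the caller always gets a ToolResult, never a raw exception."""
    try:
        value = fn(*args)
    except Exception as exc:
        return ToolResult("error", detail=str(exc))
    if value in (None, "", [], {}):
        return ToolResult("empty", detail="tool returned nothing useful")
    return ToolResult("ok", value=value)

def lookup_order(order_id):  # hypothetical tool with canned data
    orders = {"A1": {"state": "shipped"}}
    return orders.get(order_id)

def next_action(result):
    """Map each outcome to an explicit next step."""
    if result.status == "ok":
        return "answer"
    if result.status == "empty":
        return "ask_user_for_order_id"
    return "retry_then_escalate"

print(next_action(safe_call(lookup_order, "A1")))
print(next_action(safe_call(lookup_order, "ZZ")))
```

&lt;p&gt;Separating "empty" from "error" matters because the right response differs: an empty result usually means the question was wrong, an error usually means the tool was.&lt;/p&gt;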

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them get confused.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
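
&lt;p&gt;A rough sketch of plan-then-execute, with no framework involved. The planner here is a hard-coded stand-in (an assumption, since a real one would call a model), and the action names are invented. What it illustrates is the separation: the plan is plain data, and the executor only follows it.&lt;/p&gt;

```python
def make_plan(goal):
    """Stand-in for the reasoning step; a real system would call a model here."""
    return [
        {"step": 1, "action": "fetch", "arg": goal},
        {"step": 2, "action": "summarize", "arg": None},
    ]

# Hypothetical actions; each receives its argument and the previous step's output.
ACTIONS = {
    "fetch": lambda arg, prev: "raw notes about " + arg,
    "summarize": lambda arg, prev: prev.upper(),
}

def execute(plan):
    """Follow the plan exactly; the executor never invents new steps."""
    prev = None
    for item in plan:
        prev = ACTIONS[item["action"]](item["arg"], prev)
    return prev

plan = make_plan("chunking")
print(execute(plan))
```

&lt;p&gt;Because the plan is an ordinary list, it can be logged, diffed, or reviewed before anything runs, which is hard to do when planning and execution share one prompt.&lt;/p&gt;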

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the one before it gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" width="420" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;
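
&lt;p&gt;As a small illustration of the overlapping-window idea, here is a word-based chunker. The function name and the default sizes are arbitrary choices for the example; production chunkers typically count tokens and respect sentence or section boundaries.&lt;/p&gt;

```python
def chunk_words(text, size=8, overlap=3):
    """Split text into word windows that share `overlap` words with the previous one."""
    if overlap not in range(size):
        raise ValueError("overlap must be between 0 and size - 1")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if not words[start + size:]:
            break  # this window already reached the end of the text
    return chunks

text = " ".join("w" + str(i) for i in range(12))
chunks = chunk_words(text)
for chunk in chunks:
    print(chunk)
```

&lt;p&gt;The last three words of each window reappear at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.&lt;/p&gt;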




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Two Years From Now, This Will Be the Only Skill That Matters in AI</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Mon, 15 Jun 2026 03:31:30 +0000</pubDate>
      <link>https://dev.to/aibughunter/two-years-from-now-this-will-be-the-only-skill-that-matters-in-ai-4a9f</link>
      <guid>https://dev.to/aibughunter/two-years-from-now-this-will-be-the-only-skill-that-matters-in-ai-4a9f</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1655720828018-edd2daec9349%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;
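
&lt;p&gt;On observability, a minimal version of "trace why the agent did that" is an append-only record of each decision with its stated reason. The class, field names, and run id below are hypothetical; real deployments usually send these events to a tracing or logging backend.&lt;/p&gt;

```python
import json

class DecisionTrace:
    """Append-only record of what the agent chose and why."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.events = []

    def record(self, step, choice, reason, inputs):
        self.events.append({
            "run_id": self.run_id,
            "step": step,
            "choice": choice,
            "reason": reason,
            "inputs": inputs,
        })

    def explain(self, step):
        """Return the recorded reason for a step, or None if it was never logged."""
        for event in self.events:
            if event["step"] == step:
                return event["reason"]
        return None

trace = DecisionTrace(run_id="run-001")  # hypothetical run id
trace.record(1, "call:search_docs", "question mentions a product name", {"query": "pricing"})
trace.record(2, "answer", "search returned one high-confidence match", {"matches": 1})

for event in trace.events:
    print(json.dumps(event))  # one JSON object per line, easy to grep later
```

&lt;p&gt;If a step has no recorded reason, that gap is itself a finding: it marks a decision nobody can audit.&lt;/p&gt;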

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.&lt;/strong&gt; from VentureBeat AI. For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Railway secures $100 million to challenge AWS with AI-native cloud infrastructure&lt;/strong&gt; from VentureBeat AI. Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/railway-secures-usd100-million-to-challenge-aws-with-ai-native-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Something I kept seeing pop up recently: &lt;strong&gt;Claude Code costs up to $200 a month. Goose does the same thing for free.&lt;/strong&gt; from VentureBeat AI. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code a...&lt;/p&gt;

&lt;p&gt;Worth reading if you have not yet: &lt;a href="https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free" rel="noopener noreferrer"&gt;https://venturebeat.com/infrastructure/claude-code-costs-up-to-usd200-a-month-goose-does-the-same-thing-for-free&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. Systems that conflate them get confused.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
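
&lt;p&gt;For explicit handoffs, one way to avoid "a string passed through a prompt" is a typed record that is validated and logged before the next agent sees it. The &lt;code&gt;Handoff&lt;/code&gt; fields and agent names below are assumptions made for the example.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    task: str
    context: dict = field(default_factory=dict)

HANDOFF_LOG = []  # in-memory log; a real system would persist this

def hand_off(handoff):
    """Validate, log, and return the serialized handoff."""
    if not handoff.task.strip():
        raise ValueError("handoff needs a non-empty task")
    payload = json.dumps(asdict(handoff), sort_keys=True)
    HANDOFF_LOG.append(payload)
    return payload

payload = hand_off(Handoff(
    from_agent="triage",
    to_agent="billing",
    task="issue refund",
    context={"ticket": "T-17", "amount": 20},
))
print(payload)
```

&lt;p&gt;The receiving agent gets named fields it can rely on, and the log shows exactly what was passed along when something goes wrong downstream.&lt;/p&gt;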

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about what pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the one before it gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help. Overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;
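
&lt;p&gt;Parent-document retrieval is easiest to see in a toy version: match on a small chunk, but hand the model the larger parent it came from. The sketch below swaps embeddings for plain word overlap to stay self-contained, and the documents and function names are invented for illustration.&lt;/p&gt;

```python
PARENTS = {
    "doc-1": "Refunds are handled by billing. They take five days. Exceptions need manager approval.",
    "doc-2": "Shipping is free over fifty dollars. Returns use the same label.",
}

def build_index(parents):
    """Index each sentence as a small child chunk that remembers its parent id."""
    index = []
    for parent_id, text in parents.items():
        for sentence in text.split(". "):
            index.append((parent_id, set(sentence.lower().strip(".").split())))
    return index

def retrieve_parent(query, index, parents):
    """Match on the small chunk, but return the whole parent for context."""
    terms = set(query.lower().split())
    best = max(index, key=lambda item: len(terms.intersection(item[1])), default=None)
    if best is None or not terms.intersection(best[1]):
        return None  # no chunk shares any word with the query
    return parents[best[0]]

index = build_index(PARENTS)
result = retrieve_parent("they take how many days", index, PARENTS)
print(result)
```

&lt;p&gt;The matching sentence is "They take five days", which is meaningless alone. Returning the parent brings back the sentence that says what "they" refers to.&lt;/p&gt;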




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejzs9d23g2ljzymtazmu.gif" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What Happens When You Run 10 AI Agents at Once in a Real Codebase</title>
      <dc:creator>AI Bug Slayer 🐞</dc:creator>
      <pubDate>Mon, 15 Jun 2026 03:31:21 +0000</pubDate>
      <link>https://dev.to/aibughunter/what-happens-when-you-run-10-ai-agents-at-once-in-a-real-codebase-26ii</link>
      <guid>https://dev.to/aibughunter/what-happens-when-you-run-10-ai-agents-at-once-in-a-real-codebase-26ii</guid>
      <description>&lt;p&gt;I spend a lot of time in the AI space -- reading papers, building things, talking to engineers who are actually shipping. And there is a gap between what the demos show and what production systems actually look like that nobody is being fully honest about.&lt;/p&gt;

&lt;p&gt;So here is my honest take on where things actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With How We Talk About AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D900%26q%3D80" alt="AI Engineering" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone is calling everything an "agent" right now. A function that calls a tool? Agent. A chatbot with memory? Agent. A script with a loop? Agent.&lt;/p&gt;

&lt;p&gt;This dilution is not just semantic. It is causing real engineering mistakes.&lt;/p&gt;

&lt;p&gt;When you do not have a precise definition for what you are building, you end up over-engineering simple pipelines and under-engineering genuinely complex ones. I have seen teams spend weeks adding "agentic" orchestration to workflows that would have been fine as a single well-structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xr7n87tdzwt2mrvxhgf.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the definition I keep coming back to: an agent is a system that has an objective, not just an instruction. It decides what to do next. It handles failure. It knows when it is done.&lt;/p&gt;

&lt;p&gt;Everything else is just a fancy function call.&lt;/p&gt;

&lt;p&gt;🟢 If your system needs a human to tell it each step, it is not an agent. It is a chat interface.&lt;/p&gt;

&lt;p&gt;🔵 If your system can recover from a failed tool call and try a different approach, you are getting somewhere.&lt;/p&gt;

&lt;p&gt;✅ If your system can decompose a goal into subtasks and delegate them, that is the real thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Happening in Production Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768494-4dee2763ff3f%3Fw%3D900%26q%3D80" alt="Production AI" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest picture from teams I follow and talk to:&lt;/p&gt;

&lt;p&gt;Most real agent deployments are narrow. They do one thing well. Customer support triage. Document extraction. Code review on a specific codebase. They are not general-purpose reasoning engines. They are purpose-built pipelines with some intelligence in the decision layer.&lt;/p&gt;

&lt;p&gt;The teams getting good results are not chasing the latest model release. They are obsessing over:&lt;/p&gt;

&lt;p&gt;☑️ Tool design -- what can the agent actually call, and how clean is the interface&lt;/p&gt;

&lt;p&gt;☑️ Failure handling -- what happens when a tool returns nothing useful&lt;/p&gt;

&lt;p&gt;☑️ Observability -- can you trace exactly why the agent made the decision it made&lt;/p&gt;
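
&lt;p&gt;On tool design, a clean interface can start with something as plain as declaring what each tool needs and rejecting bad calls before they run. This is a sketch under assumptions: the registry, the &lt;code&gt;get_weather&lt;/code&gt; tool, and its canned data are all hypothetical.&lt;/p&gt;

```python
TOOLS = {}

def register(name, description, required):
    """Decorator that records a tool with its description and required argument names."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "description": description, "required": set(required)}
        return fn
    return wrap

@register("get_weather", "Current weather for a city", required=["city"])
def get_weather(city, units="metric"):  # hypothetical tool with canned data
    return {"city": city, "units": units, "temp": 21}

def invoke(name, args):
    """Reject unknown tools and missing arguments before anything runs."""
    spec = TOOLS.get(name)
    if spec is None:
        return {"error": "unknown tool: " + name}
    missing = sorted(spec["required"] - set(args))
    if missing:
        return {"error": "missing arguments: " + ", ".join(missing)}
    return {"result": spec["fn"](**args)}

print(invoke("get_weather", {"city": "Oslo"}))
print(invoke("get_weather", {}))
```

&lt;p&gt;The error messages are written for the model as much as for the developer: a specific "missing arguments: city" gives the agent something it can act on in its next step.&lt;/p&gt;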

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtdimquik7wkfgqudotd.gif" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The teams getting bad results are the ones that swapped out GPT-4 for the latest frontier model and expected different behavior without changing anything else.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Framework Wars Are a Distraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1518770660439-4636190af475%3Fw%3D900%26q%3D80" alt="Tech Frameworks" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangChain. LangGraph. CrewAI. AutoGen. Semantic Kernel. Every month there is a new one and someone is writing a post about why the old one is dead.&lt;/p&gt;

&lt;p&gt;Here is what I actually think: the framework matters less than the patterns.&lt;/p&gt;

&lt;p&gt;The patterns that keep working regardless of what framework you use:&lt;/p&gt;

&lt;p&gt;✔️ Plan-then-execute. Have one reasoning step that produces a plan, and a separate execution step that follows it. Do not mix them.&lt;/p&gt;

&lt;p&gt;✔️ Separate retrieval from reasoning. Fetching context and using context are different jobs. When a system conflates them, you cannot tell whether a bad answer came from bad context or bad reasoning.&lt;/p&gt;

&lt;p&gt;✔️ Explicit handoffs. When one agent passes work to another, the handoff should be structured and logged. Not a string passed through a prompt.&lt;/p&gt;
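
&lt;p&gt;Here is a rough Python sketch of plan-then-execute with a structured, logged handoff. It is an illustration under assumptions, not a reference implementation: &lt;code&gt;Step&lt;/code&gt;, &lt;code&gt;Handoff&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;, and &lt;code&gt;execute&lt;/code&gt; are hypothetical names, and the planner is a hard-coded stand-in for what would normally be a model call.&lt;/p&gt;

```python
import json
import logging
from dataclasses import asdict, dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handoff")


@dataclass(frozen=True)
class Step:
    tool: str
    argument: str


@dataclass(frozen=True)
class Handoff:
    """Structured message passed from the planner to the executor."""
    goal: str
    steps: tuple[Step, ...]


def plan(goal: str) -> Handoff:
    # Stand-in for one reasoning step that produces a plan.
    # A real planner would parse and validate model output here.
    return Handoff(
        goal=goal,
        steps=(Step("search", goal), Step("summarize", "search results")),
    )


def execute(handoff: Handoff, tools: dict[str, Callable[[str], str]]) -> list[str]:
    # The executor follows the plan as written; it never re-plans.
    log.info("handoff received: %s", json.dumps(asdict(handoff)))
    outputs = []
    for step in handoff.steps:
        outputs.append(tools[step.tool](step.argument))
    return outputs


# Hypothetical tools that just echo what they were asked to do.
tools = {
    "search": lambda query: f"3 documents found for '{query}'",
    "summarize": lambda text: f"summary of {text}",
}
handoff = plan("refund policy")
results = execute(handoff, tools)
print(results)
```

&lt;p&gt;Because the handoff is a frozen dataclass and not a string buried in a prompt, it can be logged as JSON, diffed between runs, and validated before anything executes.&lt;/p&gt;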

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32l90ioevp981uk8bn7g.gif" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have rebuilt the same architecture in three different frameworks and the results were similar each time. The framework is scaffolding. The architecture is the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Retrieval Problem Nobody Has Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D900%26q%3D80" alt="Data Retrieval" width="900" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAG is standard now. Almost every production AI system that touches proprietary data uses some form of it. But there is a problem that the tutorials do not cover well.&lt;/p&gt;

&lt;p&gt;The chunk boundaries are wrong.&lt;/p&gt;

&lt;p&gt;When you split a document into chunks and embed them, you are making assumptions about which pieces of context belong together. Those assumptions are often wrong. A paragraph that only makes sense in light of the one before it gets retrieved in isolation, and the model hallucinates the missing context.&lt;/p&gt;

&lt;p&gt;🟢 Better chunking strategies help: overlapping windows, semantic chunking, parent-document retrieval.&lt;/p&gt;

&lt;p&gt;🔵 But the real fix is rethinking what you are storing. Sometimes the right thing to store is not the raw text but a structured representation of the information.&lt;/p&gt;
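
&lt;p&gt;As a small illustration of the overlapping-window idea, here is a Python sketch that chunks by paragraph and repeats the tail of each window at the start of the next. The function name &lt;code&gt;chunk_with_overlap&lt;/code&gt; and its defaults are my own assumptions; real pipelines often chunk by tokens or sentences instead.&lt;/p&gt;

```python
def chunk_with_overlap(
    paragraphs: list[str], size: int = 3, overlap: int = 1
) -> list[list[str]]:
    """Group paragraphs into windows of `size`, sharing `overlap` paragraphs."""
    if not (size > overlap >= 0):
        raise ValueError("need size > overlap >= 0")
    step = size - overlap
    chunks = []
    for start in range(0, len(paragraphs), step):
        chunks.append(paragraphs[start:start + size])
        if start + size >= len(paragraphs):
            break  # this window already reached the end of the document
    return chunks


doc = ["p1", "p2", "p3", "p4", "p5"]
chunks = chunk_with_overlap(doc, size=3, overlap=1)
print(chunks)
```

&lt;p&gt;With these settings, &lt;code&gt;p3&lt;/code&gt; appears in both windows, so a paragraph that depends on its predecessor is more likely to be retrieved alongside it. Overlap reduces the boundary problem; it does not remove it.&lt;/p&gt;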

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70uarudhlrwumofc6v2y.gif" width="400" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ If your RAG pipeline is returning technically correct but contextually useless results, the problem is almost certainly in the chunking or the metadata, not the embedding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Think This Is All Going
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531746790731-6c087fecd65a%3Fw%3D900%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531746790731-6c087fecd65a%3Fw%3D900%26q%3D80" alt="Future Technology" width="900" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The models are going to keep getting better. Context windows are going to keep expanding. The cost per token is going to keep dropping.&lt;/p&gt;

&lt;p&gt;None of that changes the fundamental engineering challenge: building systems you can trust to behave correctly when you are not watching.&lt;/p&gt;

&lt;p&gt;That is the problem worth solving. Governance, observability, and reliable tool use. Not chasing benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feacxowozsf51di9l80c9.gif" width="420" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineers who are going to matter in two years are the ones who can build AI systems that other engineers can maintain and trust. That is a different skill set than fine-tuning or prompt engineering.&lt;/p&gt;

&lt;p&gt;It is closer to systems design than it is to model research.&lt;/p&gt;




&lt;p&gt;If any of this resonates with what you are building, or if you have a completely different take, I want to hear it. Drop your experience in the comments. The interesting conversations in this space are not in the keynotes -- they are in the threads where people are actually honest about what works.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
