<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gilad Salinger</title>
    <description>The latest articles on DEV Community by Gilad Salinger (@giladsalinger).</description>
    <link>https://dev.to/giladsalinger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940324%2F7a5897aa-cf97-4161-bc51-a9b6bedacbd4.JPG</url>
      <title>DEV Community: Gilad Salinger</title>
      <link>https://dev.to/giladsalinger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/giladsalinger"/>
    <language>en</language>
    <item>
      <title>Why RAG Fails in Enterprise R&amp;D (And What Actually Works)</title>
      <dc:creator>Gilad Salinger</dc:creator>
      <pubDate>Tue, 19 May 2026 15:03:50 +0000</pubDate>
      <link>https://dev.to/giladsalinger/why-rag-fails-in-enterprise-rd-and-what-actually-works-5d7o</link>
      <guid>https://dev.to/giladsalinger/why-rag-fails-in-enterprise-rd-and-what-actually-works-5d7o</guid>
      <description>&lt;h1&gt;Why RAG Fails in Enterprise R&amp;amp;D (And What Actually Works)&lt;/h1&gt;

&lt;p&gt;RAG was a breakthrough. By embedding documents as vectors, retrieving the most similar chunks at query time, and feeding them to an LLM, it gave models a practical way to draw on external knowledge without retraining. For a customer support bot searching a knowledge base, it's genuinely effective.&lt;/p&gt;

&lt;p&gt;But when you deploy RAG into an enterprise R&amp;amp;D environment — with 1,000+ engineers, dozens of interconnected systems, and AI agents that need to &lt;em&gt;take action&lt;/em&gt;, not just answer questions — it falls apart in predictable ways.&lt;/p&gt;

&lt;p&gt;I'm Gilad Salinger, CEO of &lt;a href="https://www.naboo.ai" rel="noopener noreferrer"&gt;Naboo&lt;/a&gt;. We build the context layer that replaces RAG for enterprise AI agents. After deploying in production at companies like Global-E (NASDAQ: GLBE) and Melio, I want to share the five specific failure modes we kept seeing — and the architectural approach that fixes them.&lt;/p&gt;

&lt;h2&gt;The Setup: What Enterprise R&amp;amp;D Actually Looks Like&lt;/h2&gt;

&lt;p&gt;A typical enterprise engineering organization has context spread across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code repositories&lt;/strong&gt; (GitHub, GitLab, Bitbucket) — often 50+ repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project management&lt;/strong&gt; (Jira, Linear, Asana) — thousands of tickets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; (Confluence, Notion) — much of it outdated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt; (Slack, Teams) — where real decisions happen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; (Datadog, Splunk, PagerDuty) — production state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt; (Jenkins, GitHub Actions) — deployment history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a developer asks an AI coding agent "help me with this ticket," the agent needs to pull context from &lt;em&gt;all&lt;/em&gt; of these systems, understand the relationships between them, and filter by what the developer is allowed to see.&lt;/p&gt;

&lt;p&gt;RAG can't do this. Here's why.&lt;/p&gt;

&lt;h2&gt;Failure Mode 1: Context Fragmentation&lt;/h2&gt;

&lt;p&gt;RAG indexes each data source independently. You get a vector store for your code, another for your Confluence docs, another for Slack messages. But enterprise context is &lt;em&gt;relational&lt;/em&gt;. A Jira ticket is meaningless without the code it references, the PR that implements it, and the Slack thread where the team discussed why they chose that approach.&lt;/p&gt;

&lt;p&gt;When an agent retrieves the 10 most similar chunks from each of those three stores, it gets 30 disconnected fragments. The LLM has to guess how they relate. In our benchmarks, this guessing is where most of the accuracy loss occurs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works instead:&lt;/strong&gt; Build a cross-system understanding that maps dependencies, ownership, and decision trails across all sources. When the agent queries for context, it gets a coherent package — not scattered chunks.&lt;/p&gt;
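
&lt;p&gt;A minimal sketch of the idea, assuming each tool's artifacts have already been normalized into string IDs and linked by typed relations (ticket implemented by PR, PR modifies file, ticket discussed in thread). The &lt;code&gt;ContextGraph&lt;/code&gt; class, the relation names, and IDs such as &lt;code&gt;jira:PAY-123&lt;/code&gt; are hypothetical, not Naboo's implementation. Starting from the ticket, a breadth-first walk returns each related artifact together with the chain of relations that connects it, so the LLM no longer has to guess how the pieces fit.&lt;/p&gt;

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Hypothetical cross-system graph: nodes are artifact IDs, edges are typed relations."""

    # Maps each artifact ID to a list of (relation, neighbor ID) pairs.
    edges: dict = field(default_factory=dict)

    def link(self, src, relation, dst):
        # Store both directions so a walk can start from any artifact.
        self.edges.setdefault(src, []).append((relation, dst))
        self.edges.setdefault(dst, []).append(("inverse:" + relation, src))

    def assemble(self, start, max_hops=2):
        """Breadth-first walk from one artifact; maps each reachable artifact to its relation path."""
        package = {start: []}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for relation, neighbor in self.edges.get(node, []):
                if neighbor not in package:
                    package[neighbor] = package[node] + [relation]
                    queue.append((neighbor, hops + 1))
        return package


if __name__ == "__main__":
    graph = ContextGraph()
    graph.link("jira:PAY-123", "implemented_by", "github:pr/481")
    graph.link("github:pr/481", "modifies", "repo:payments/auth.py")
    graph.link("jira:PAY-123", "discussed_in", "slack:thread/987")
    for artifact, path in graph.assemble("jira:PAY-123").items():
        print(artifact, path)
```

&lt;p&gt;Keeping the relation path next to each artifact is the design choice that matters here: it tells the model &lt;em&gt;why&lt;/em&gt; a file is in the package (it was modified by the PR that implements the ticket), which a list of top-K similar chunks cannot express.&lt;/p&gt;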

&lt;h2&gt;Failure Mode 2: No Intent Understanding&lt;/h2&gt;

&lt;p&gt;RAG retrieves based on text similarity. "Fix the authentication bug" and "review the authentication module" would retrieve nearly identical chunks. But these tasks need completely different context.&lt;/p&gt;

&lt;p&gt;Fixing a bug requires the specific error, recent changes to the auth flow, the PR that introduced the regression, and the relevant test failures. Reviewing a module requires an architectural overview, code ownership, tech debt history, and related design documents.&lt;/p&gt;

&lt;p&gt;RAG treats both the same because it only understands text distance, not task semantics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works instead:&lt;/strong&gt; Calculate what context is needed based on the task type, current system state, and the user's role. Intent-aware retrieval, not similarity-based retrieval.&lt;/p&gt;
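
&lt;p&gt;One way to sketch intent-aware retrieval, under stated assumptions: classify the task first, then look up which &lt;em&gt;kinds&lt;/em&gt; of context that task needs, adjusted for the user's role. The recipes, intent labels, role names, and the keyword-based &lt;code&gt;classify_intent&lt;/code&gt; below are hypothetical placeholders; a production system would more likely use a trained classifier or an LLM call than regular expressions.&lt;/p&gt;

```python
import re

# Hypothetical recipes: the kinds of context each task intent needs.
CONTEXT_RECIPES = {
    "fix_bug": ["error_details", "recent_changes", "suspect_prs", "failing_tests"],
    "review_module": ["architecture_overview", "code_owners", "tech_debt_history", "design_docs"],
}

# Naive keyword rules standing in for a real intent classifier (checked in order).
INTENT_PATTERNS = [
    ("fix_bug", re.compile(r"\b(fix|bug|regression|broken|error)\b", re.IGNORECASE)),
    ("review_module", re.compile(r"\b(review|audit|overview|understand)\b", re.IGNORECASE)),
]


def classify_intent(query):
    """Return the first intent whose pattern matches the query."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(query):
            return intent
    return "review_module"  # assumed fallback when nothing matches


def plan_context(query, role):
    """Decide which kinds of context to fetch from the task's intent and the user's role."""
    intent = classify_intent(query)
    kinds = list(CONTEXT_RECIPES[intent])  # copy so the shared recipe is not mutated
    if role == "on_call" and intent == "fix_bug":
        kinds.append("production_alerts")  # example of a role-specific addition
    return intent, kinds


if __name__ == "__main__":
    for query in ("Fix the authentication bug", "Review the authentication module"):
        print(query, plan_context(query, "developer"))
```

&lt;p&gt;With this plan, "Fix the authentication bug" and "Review the authentication module" produce disjoint lists of context to fetch, even though their text is nearly identical.&lt;/p&gt;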

&lt;h2&gt;Failure Mode 3: Stale Context&lt;/h2&gt;

&lt;p&gt;Enterprise codebases change constantly. A PR merged 2 hours ago might change the correct approach to a task entirely. But most RAG systems re-index on a schedule — daily, sometimes weekly.&lt;/p&gt;

&lt;p&gt;We had a case where a developer asked an AI agent for help refactoring a module. The agent suggested an approach based on the old architecture because the RAG index hadn't caught the PR that changed the module's interface the previous day. The developer spent 3 hours on a dead-end approach before realizing the context was stale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works instead:&lt;/strong&gt; Continuous ingestion that updates the context model in real time as commits, messages, tickets, and deployments happen. No batch re-indexing delays.&lt;/p&gt;
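
&lt;p&gt;A minimal sketch of event-driven ingestion, assuming each tool delivers change events (for example via webhooks) that have been normalized into a hypothetical &lt;code&gt;ChangeEvent&lt;/code&gt; carrying a per-artifact version number. Each event is applied as soon as it arrives, and redelivered or out-of-order events are ignored, so a read never returns an older version than one already seen.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical normalized change event (commit, ticket update, message, deployment)."""

    source: str   # e.g. "github", "jira", "slack"
    key: str      # artifact ID within that source
    content: str  # latest content or summary of the artifact
    version: int  # increases with every change to the same artifact


class LiveContextStore:
    """Applies each change as it arrives instead of waiting for a scheduled re-index."""

    def __init__(self):
        self.documents = {}  # (source, key) mapped to the newest ChangeEvent seen

    def apply(self, event):
        slot = (event.source, event.key)
        current = self.documents.get(slot)
        # Webhooks can be redelivered or arrive out of order: keep only the newest version.
        if current is not None and current.version >= event.version:
            return False
        self.documents[slot] = event
        return True

    def read(self, source, key):
        event = self.documents.get((source, key))
        return event.content if event is not None else None


if __name__ == "__main__":
    store = LiveContextStore()
    store.apply(ChangeEvent("github", "payments/auth.py", "interface v1", 1))
    store.apply(ChangeEvent("github", "payments/auth.py", "interface v2", 2))
    store.apply(ChangeEvent("github", "payments/auth.py", "interface v1", 1))  # late redelivery, ignored
    print(store.read("github", "payments/auth.py"))
```

&lt;p&gt;Versioning per artifact rather than per source means a late event for one file is not discarded just because another file in the same repo changed more recently.&lt;/p&gt;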

&lt;h2&gt;Failure Mode 4: Security as an Afterthought&lt;/h2&gt;

&lt;p&gt;This one is critical for enterprises. Most RAG implementations index everything into a single vector store, then try to filter results after retrieval: "post-hoc RBAC."&lt;/p&gt;

&lt;p&gt;The problem: similarity search on its own has no notion of who is asking. If a junior developer's query is semantically similar to a document they shouldn't see, the vector DB returns it, and the filtering layer has to catch it. Filtering layers have gaps.&lt;/p&gt;

&lt;p&gt;In defense, financial services, and healthcare organizations — where a single data access violation can mean regulatory penalties — this architecture is a non-starter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works instead:&lt;/strong&gt; Native RBAC that enforces permissions at retrieval time, not post-retrieval. The context layer inherits permissions from your existing tools (GitHub org roles, Jira project permissions, Confluence space restrictions) and only surfaces context the user is authorized to see.&lt;/p&gt;
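
&lt;p&gt;The difference between post-hoc filtering and retrieval-time enforcement is about ordering. The sketch below assumes each chunk carries the groups allowed to read it (a hypothetical mapping from GitHub, Jira, or Confluence permissions) and removes chunks the user cannot see before anything is scored, so they can never appear in the results. The brute-force cosine ranking and the &lt;code&gt;Chunk&lt;/code&gt; and &lt;code&gt;retrieve&lt;/code&gt; names are illustrative; in a real deployment the same constraint would typically be expressed as a filter evaluated inside the vector search itself.&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    vector: tuple
    allowed_groups: frozenset  # assumed to be inherited from the source tool's permissions


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, chunks, user_groups, k=3):
    """Enforce permissions BEFORE ranking, so unauthorized chunks are never scored or returned."""
    visible = [c for c in chunks if not c.allowed_groups.isdisjoint(user_groups)]
    ranked = sorted(visible, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        Chunk("Auth design doc", (1.0, 0.0), frozenset({"engineering"})),
        Chunk("Security incident report", (0.9, 0.1), frozenset({"security"})),
        Chunk("Billing runbook", (0.0, 1.0), frozenset({"engineering"})),
    ]
    # A developer in "engineering" never sees the incident report, however similar it is.
    print([c.text for c in retrieve((1.0, 0.0), chunks, {"engineering"})])
```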

&lt;h2&gt;Failure Mode 5: Token Waste&lt;/h2&gt;

&lt;p&gt;RAG retrieves by similarity, which means many returned chunks are tangentially relevant at best. In our analysis of production RAG deployments, roughly 60-70% of retrieved chunks contributed nothing to the agent's output. They just consumed tokens.&lt;/p&gt;

&lt;p&gt;This matters for three reasons: cost (tokens aren't free at enterprise scale), latency (more tokens mean slower responses), and quality (noise in the context window degrades the model's output). The "lost in the middle" problem is well documented: LLMs tend to pay less attention to information placed in the middle of long contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works instead:&lt;/strong&gt; Deliver only the context the agent needs for the specific task. In our benchmarks, this means 90% fewer tokens with 97% higher accuracy. Less is more when the "less" is precisely targeted.&lt;/p&gt;
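
&lt;p&gt;Delivering "only the context the agent needs" can be framed as packing under a token budget. This sketch assumes relevance scores already come from an upstream, task-aware step and approximates tokens by counting words; a real system would use the model's tokenizer. It drops low-relevance candidates, then greedily adds the items with the highest relevance per token that still fit. Greedy packing is a heuristic rather than an optimal solution, and the names and the &lt;code&gt;min_relevance&lt;/code&gt; threshold are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    relevance: float  # assumed to come from an upstream task-aware scorer, 0..1


def approx_tokens(text):
    # Rough stand-in for a real tokenizer: one token per whitespace-separated word.
    return max(1, len(text.split()))


def pack_context(candidates, budget, min_relevance=0.5):
    """Drop low-relevance items, then add the densest ones until the token budget is spent."""
    useful = [c for c in candidates if c.relevance >= min_relevance]
    useful.sort(key=lambda c: c.relevance / approx_tokens(c.text), reverse=True)
    chosen, used = [], 0
    for c in useful:
        cost = approx_tokens(c.text)
        if used + cost > budget:
            continue  # skip items that would overflow; a smaller one may still fit
        chosen.append(c)
        used += cost
    return chosen, used


if __name__ == "__main__":
    candidates = [
        Candidate("stack trace from the failing auth test", 0.9),
        Candidate("2019 onboarding guide for new hires", 0.1),
        Candidate("diff of the PR that changed token refresh logic", 0.8),
    ]
    chosen, used = pack_context(candidates, budget=20)
    print([c.text for c in chosen], used)
```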

&lt;h2&gt;The Architecture That Works: Context Layers&lt;/h2&gt;

&lt;p&gt;The pattern we landed on — and that we're seeing the industry converge toward — is what we call a &lt;strong&gt;context layer&lt;/strong&gt;. It sits between your data sources and your LLM/agent framework.&lt;/p&gt;

&lt;p&gt;Instead of: &lt;code&gt;Query → Embed → Vector search → Top-K chunks → LLM&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It's: &lt;code&gt;Query → Intent calculation → Cross-system context assembly → RBAC filtering → Execution-ready context → LLM&lt;/code&gt;&lt;/p&gt;
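
&lt;p&gt;The second flow can be read as a chain of stages, each taking and returning the same request object. Here is a toy sketch of that shape; the in-memory &lt;code&gt;ARTIFACTS&lt;/code&gt; list, the stage functions, and the keyword-based intent check are hypothetical stand-ins for the real components, and the final LLM call is left out.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from functools import reduce

# Hypothetical artifacts, each tagged with the intents it serves and the groups allowed to see it.
ARTIFACTS = [
    {"id": "pr/481", "intents": {"fix_bug"}, "groups": {"payments"}},
    {"id": "design/auth.md", "intents": {"review_module"}, "groups": {"engineering"}},
    {"id": "incident/77", "intents": {"fix_bug"}, "groups": {"security"}},
]


@dataclass
class ContextRequest:
    query: str
    user_groups: set
    intent: str = ""
    context: list = field(default_factory=list)


def calculate_intent(req):
    # Naive stand-in for a real intent model.
    req.intent = "fix_bug" if "fix" in req.query.lower() else "review_module"
    return req


def assemble_context(req):
    # Pick the artifacts relevant to this intent across all sources.
    req.context = [a for a in ARTIFACTS if req.intent in a["intents"]]
    return req


def filter_by_rbac(req):
    # Keep only artifacts that share at least one group with the user.
    req.context = [a for a in req.context if not a["groups"].isdisjoint(req.user_groups)]
    return req


def to_execution_ready(req):
    # Final package handed to the agent: just the artifact IDs in this toy version.
    req.context = [a["id"] for a in req.context]
    return req


PIPELINE = [calculate_intent, assemble_context, filter_by_rbac, to_execution_ready]


def build_context(query, user_groups):
    start = ContextRequest(query, set(user_groups))
    return reduce(lambda req, stage: stage(req), PIPELINE, start)


if __name__ == "__main__":
    print(build_context("Fix the login bug", {"payments"}).context)
```

&lt;p&gt;Keeping each stage a plain function makes it easy to test stages in isolation or swap one out, for example replacing the keyword intent check with a model, without touching the rest of the chain.&lt;/p&gt;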

&lt;p&gt;The key differences:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cross-system understanding&lt;/strong&gt; instead of per-source indexing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent-aware retrieval&lt;/strong&gt; instead of similarity-based retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous ingestion&lt;/strong&gt; instead of batch re-indexing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native RBAC&lt;/strong&gt; instead of post-hoc filtering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precise context packages&lt;/strong&gt; instead of top-K similar chunks&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Benchmarks&lt;/h2&gt;

&lt;p&gt;We ran benchmarks using LLM-as-a-judge evaluation across production enterprise environments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;RAG (baseline)&lt;/th&gt;
&lt;th&gt;Context Layer (vs. baseline)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Response accuracy&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;+97%&lt;/td&gt;
&lt;td&gt;Scored with LLM-as-a-judge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token consumption&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;-90%&lt;/td&gt;
&lt;td&gt;10x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response latency&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;10x faster&lt;/td&gt;
&lt;td&gt;Faster context assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The accuracy improvement comes primarily from the intent-aware retrieval and cross-system relationship mapping. The token reduction comes from delivering only relevant context instead of similarity-matched chunks.&lt;/p&gt;

&lt;h2&gt;When RAG Is Still Fine&lt;/h2&gt;

&lt;p&gt;To be clear: RAG isn't wrong; it's scoped. If you're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A customer support bot searching a knowledge base&lt;/li&gt;
&lt;li&gt;A research assistant querying a document corpus&lt;/li&gt;
&lt;li&gt;A chatbot for a small team with a single repo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG works. The failure modes above only manifest at enterprise scale — multiple systems, complex permissions, AI agents that need to execute (not just answer), and accuracy requirements where "approximately right" isn't good enough.&lt;/p&gt;

&lt;h2&gt;What to Try&lt;/h2&gt;

&lt;p&gt;If you're hitting these failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audit your current RAG pipeline for the five failure modes above. In our experience, most enterprise teams are hitting at least three.&lt;/li&gt;
&lt;li&gt;Look at the context layer pattern — whether you build it yourself or use an existing implementation.&lt;/li&gt;
&lt;li&gt;Start measuring accuracy with LLM-as-judge, not vibes. The gap between RAG and intent-aware context is only visible when you measure properly.&lt;/li&gt;
&lt;/ol&gt;
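
&lt;p&gt;A minimal sketch of an evaluation loop in that spirit: run every test case through each system, have a judge score each answer against a reference, and compare mean scores. The &lt;code&gt;keyword_judge&lt;/code&gt; below is only a placeholder so the example runs offline; an actual LLM-as-judge would send the question, reference, and answer to a model along with a scoring rubric. This is not Naboo's benchmark methodology, and the test cases are made up.&lt;/p&gt;

```python
from statistics import mean


def evaluate(cases, answer_fn, judge_fn):
    """Mean judge score (0..1) for one system over all test cases."""
    scores = [
        judge_fn(case["question"], case["reference"], answer_fn(case["question"]))
        for case in cases
    ]
    return mean(scores)


def relative_change(baseline, candidate):
    """Relative improvement of candidate over baseline, e.g. 0.5 means +50%."""
    return (candidate - baseline) / baseline


def keyword_judge(question, reference, answer):
    # Placeholder judge: fraction of reference words that appear in the answer.
    # A real LLM-as-judge would prompt a model with the question, reference, answer, and a rubric.
    expected = set(reference.lower().split())
    if not expected:
        return 0.0
    found = set(answer.lower().split())
    return len(expected.intersection(found)) / len(expected)


if __name__ == "__main__":
    cases = [
        {"question": "q1", "reference": "use the new refresh endpoint"},
        {"question": "q2", "reference": "owner is payments team"},
    ]
    baseline_answers = {"q1": "use the old endpoint", "q2": "ask in slack"}
    candidate_answers = {"q1": "use the new refresh endpoint", "q2": "the owner is the payments team"}
    base = evaluate(cases, baseline_answers.get, keyword_judge)
    cand = evaluate(cases, candidate_answers.get, keyword_judge)
    print(f"baseline={base:.2f} candidate={cand:.2f} change={relative_change(base, cand):+.0%}")
```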

&lt;p&gt;We've open-sourced our benchmark methodology. If you want to run it against your own data, reach out at &lt;a href="https://www.naboo.ai" rel="noopener noreferrer"&gt;naboo.ai&lt;/a&gt; or &lt;a href="https://calendly.com/gilad-33" rel="noopener noreferrer"&gt;book a technical demo&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Gilad Salinger is CEO &amp;amp; Co-Founder of &lt;a href="https://www.naboo.ai" rel="noopener noreferrer"&gt;Naboo&lt;/a&gt;, the enterprise context layer for AI agents. Previously founded and scaled a developer tools company. Naboo is backed by Cardumen Capital and 91 Ventures, and is deployed in production at Global-E (NASDAQ: GLBE), Melio, and other enterprise R&amp;amp;D organizations.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>enterprise</category>
    </item>
  </channel>
</rss>
