<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: dnyandeo bharambe</title>
    <description>The latest articles on DEV Community by dnyandeo bharambe (@dnyandeo).</description>
    <link>https://dev.to/dnyandeo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3957330%2Fc2d5d5df-e458-4f24-b141-328cb2c3e1f0.png</url>
      <title>DEV Community: dnyandeo bharambe</title>
      <link>https://dev.to/dnyandeo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dnyandeo"/>
    <language>en</language>
    <item>
      <title>Why I chose MCP over RAG for live infrastructure auditing</title>
      <dc:creator>dnyandeo bharambe</dc:creator>
      <pubDate>Thu, 28 May 2026 22:41:53 +0000</pubDate>
      <link>https://dev.to/dnyandeo/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing-1ce8</link>
      <guid>https://dev.to/dnyandeo/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing-1ce8</guid>
      <description>&lt;p&gt;I've been working on a project to audit distributed hardware infrastructure: devices spread across multiple sites, each running firmware that needs to stay compliant with a central policy. Pretty standard enterprise ops problem.&lt;/p&gt;

&lt;p&gt;My first instinct was RAG. Everyone reaches for RAG. You embed your documents, stand up a vector store, and your agent can reason over your data. I've built RAG pipelines before and they work well, so I started there.&lt;/p&gt;

&lt;p&gt;Three days in, I switched direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The moment I realized RAG wasn't the right fit
&lt;/h2&gt;

&lt;p&gt;I was testing the agent against a scenario where a device had failed a firmware check at 2am. The agent reported it as compliant.&lt;/p&gt;

&lt;p&gt;The problem wasn't the model. The problem was that the agent was reasoning over an embedded snapshot I'd generated two days earlier. The device had drifted since then. The vector store didn't know, and it can't know: it's a snapshot by design.&lt;/p&gt;

&lt;p&gt;That works fine for a documentation assistant. For infrastructure audit it's a problem, because you need to know what's happening now, not what was true when you last ran the embedding pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I needed wasn't retrieval — it was access
&lt;/h2&gt;

&lt;p&gt;Here's the reframe that changed how I thought about this.&lt;/p&gt;

&lt;p&gt;RAG answers the question: what documents are relevant to this query? What I actually needed to answer was: what is the current state of device X right now?&lt;/p&gt;

&lt;p&gt;Those are different questions. One is a search problem. The other is a database query. I was using the wrong tool.&lt;/p&gt;

&lt;p&gt;The inventory (firmware versions, device health, site assignments) lives in a SQLite database. The compliance policy lives in a structured text file. Neither is a document in any meaningful sense. Chunking them and embedding them into a vector store was me forcing a square peg into a round hole because that's what I knew how to do.&lt;/p&gt;
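&lt;p&gt;To make the distinction concrete, here is a minimal sketch of "what is the state of device X right now?" answered as a plain query with Python's built-in sqlite3 module. The table name, columns, and device IDs are hypothetical, since the post doesn't show the real schema.&lt;/p&gt;

```python
import sqlite3
from typing import Optional

COLUMNS = ("device_id", "site", "firmware", "health")


def current_state(conn: sqlite3.Connection, device_id: str) -> Optional[dict]:
    """Answer "what is the state of device X right now?" with one parameterized SELECT.

    No embeddings and no similarity scores: the result is whatever the row
    holds at the moment the query runs.
    """
    row = conn.execute(
        "SELECT device_id, site, firmware, health FROM inventory WHERE device_id = ?",
        (device_id,),
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row is not None else None


if __name__ == "__main__":
    # Hypothetical schema and data; the post does not show the real table layout.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (device_id TEXT PRIMARY KEY, site TEXT, firmware TEXT, health TEXT)"
    )
    conn.execute("INSERT INTO inventory VALUES ('node-7', 'Bellevue', '2.4.1', 'ok')")
    print(current_state(conn, "node-7"))

    # Simulate drift after a snapshot would have been taken.
    conn.execute("UPDATE inventory SET firmware = '2.3.9' WHERE device_id = 'node-7'")
    print(current_state(conn, "node-7"))  # reflects the new firmware immediately
```

&lt;p&gt;The second call sees the updated firmware as soon as the row changes. There is no re-embedding step that could lag behind.&lt;/p&gt;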

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfmk9zlzvta2wcnty185.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfmk9zlzvta2wcnty185.png" alt="Figure 1 — RAG vs MCP: why retrieval falls short for live infrastructure data " width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I built an MCP server that sits over that data and exposes it as tools the agent can call:&lt;br&gt;
• get_inventory(): returns live device state, current to the second&lt;br&gt;
• query_policy(): reads the policy file and returns the requirements&lt;br&gt;
• flag_violation(): marks a device non-compliant with structured metadata&lt;/p&gt;

&lt;p&gt;The agent calls these the same way your application code calls an API. No embedding pipeline. No staleness problem. No guessing at similarity scores for what is fundamentally a structured query.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gateway nobody talks about
&lt;/h2&gt;

&lt;p&gt;One thing I'd push back on in most agent tutorials: they wire the LLM directly to the frontend and call it done. I put a FastAPI gateway in between, and I'd do it again every time.&lt;/p&gt;

&lt;p&gt;The practical reason: NVIDIA NIM credits aren't free. A misconfigured client or a runaway loop can drain your quota in minutes if there's nothing between the UI and the model. The gateway enforces per-IP rate limits before a single token is generated. That saved me actual money during development.&lt;/p&gt;

&lt;p&gt;The better reason: not every query needs the full audit agent. Simple questions like "how many nodes are in Bellevue?" don't need a multi-step LangGraph agent burning Gemini 2.5 tokens. The gateway classifies intent and routes accordingly: simple queries go to a lighter NIM worker, and full compliance audits go to the Gemini agent.&lt;/p&gt;

&lt;p&gt;It also centralizes auth and logging in one place, which matters when you need to show a security team exactly what the agent did and when.&lt;/p&gt;
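&lt;p&gt;Here is a framework-agnostic sketch of the two gateway duties described above. The first is a per-IP sliding-window rate limiter that rejects requests before any model call. The second is an intent router. The keyword classifier is a hypothetical stand-in, since the post doesn't say how intent is classified, and the route names are placeholders. In a FastAPI app, &lt;code&gt;handle()&lt;/code&gt; would sit in a route or middleware, and a &lt;code&gt;None&lt;/code&gt; result would become an HTTP 429 response.&lt;/p&gt;

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class PerIPRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected before any tokens are spent
        q.append(now)
        return True


# Hypothetical keyword heuristic standing in for the real intent classifier.
AUDIT_KEYWORDS = ("audit", "compliance", "compliant", "remediat", "violation")


def route(query: str) -> str:
    """Send full audits to the agent and everything else to the lighter worker."""
    text = query.lower()
    if any(keyword in text for keyword in AUDIT_KEYWORDS):
        return "gemini_audit_agent"
    return "nim_worker"


def handle(ip: str, query: str, limiter: PerIPRateLimiter) -> Optional[str]:
    """Gateway entry point: rate-limit first, then route. None means 'reject'."""
    if not limiter.allow(ip):
        return None
    return route(query)
```

&lt;p&gt;The limits live in process memory, which suits a single gateway instance. Several replicas would need shared state, such as an external store, to enforce one limit per IP across all of them.&lt;/p&gt;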

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazh8ak0xkgcnanbg6oto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazh8ak0xkgcnanbg6oto.png" alt="Figure 2 — Full system architecture: gateway, dual-model routing, MCP sensor layer, and LLM Judge" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Judge
&lt;/h2&gt;

&lt;p&gt;This is the piece I'm most glad I built, and the one I almost skipped.&lt;/p&gt;

&lt;p&gt;Every response, whether it came from the NIM worker or the Gemini agent, passes through a secondary LLM before it reaches the user. I call it the Judge. Its only job is to read the agent's output, check it independently against the policy file, and decide whether the reasoning holds up.&lt;/p&gt;

&lt;p&gt;During testing, the Judge caught something the main agent missed. The agent had correctly identified a non-compliant firmware version but applied a remediation rule that belonged to a different device category. The logic was sound; it just used the wrong rule. The Judge caught it because it reads the policy independently, without inheriting whatever context the main agent had accumulated during its reasoning loop.&lt;/p&gt;

&lt;p&gt;That independence is the point. If the Judge just re-reads the agent's own context, it's not really checking anything. You want it reading from the source, fresh.&lt;/p&gt;
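&lt;p&gt;The Judge itself is an LLM, but the independence principle fits in a few lines. This sketch builds the Judge's input from only the agent's final finding and the raw policy text, never the agent's message history. It also adds a deterministic rule-to-category check that would catch the specific failure described above. This check is a cheap rule-based complement, not a replacement for the LLM Judge. The policy line format (rule ID, category, description), the rule IDs, and all function names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentFinding:
    """The agent's final answer only; its reasoning trace is deliberately excluded."""
    device_id: str
    remediation_rule_id: str
    summary: str


def load_policy_rules(policy_text: str) -> dict:
    """Map rule ID to the device category it applies to, read from the policy source.

    Assumed line format: "R-101 router: upgrade firmware to 2.4.x"
    """
    rules = {}
    for line in policy_text.splitlines():
        if line.strip():
            rule_id, rest = line.strip().split(" ", 1)
            category, _description = rest.split(":", 1)
            rules[rule_id] = category.strip()
    return rules


def build_judge_prompt(finding: AgentFinding, policy_text: str) -> str:
    """Input for the LLM Judge: the policy, read fresh, plus the finding. Nothing else."""
    return (
        "You are auditing another agent's compliance finding.\n"
        f"POLICY:\n{policy_text}\n"
        f"FINDING: device={finding.device_id} rule={finding.remediation_rule_id} "
        f"summary={finding.summary}\n"
        "Does the cited rule apply to this device's category? Answer PASS or FAIL with a reason."
    )


def check_rule_category(finding: AgentFinding, policy_text: str, device_category: str) -> tuple:
    """Deterministic pre-check. device_category should come from the live inventory,
    not from the agent, so an upstream mistake cannot leak into the check."""
    rules = load_policy_rules(policy_text)
    rule_category = rules.get(finding.remediation_rule_id)
    if rule_category is None:
        return False, f"unknown rule {finding.remediation_rule_id}"
    if rule_category != device_category:
        return False, (
            f"rule {finding.remediation_rule_id} applies to {rule_category}, "
            f"not {device_category}"
        )
    return True, "rule matches device category"
```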

&lt;h2&gt;
  
  
  Humans stay in the loop
&lt;/h2&gt;

&lt;p&gt;The agent can suggest remediation (here's the CLI command to fix the firmware drift on node 7), but it cannot run it.&lt;/p&gt;

&lt;p&gt;There's a hard gate in the LangGraph state machine. "Suggest remediation" and "execute remediation" are separate nodes, and the only path between them runs through a human decision in the UI. An architect clicks Approve, and only then does the write operation touch the database.&lt;/p&gt;

&lt;p&gt;For infrastructure this felt like the right call. The cost of a false positive (a remediation that runs when it shouldn't) is much higher than the cost of an extra approval click.&lt;/p&gt;
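&lt;p&gt;The post implements this gate in LangGraph, which can pause a compiled graph before a given node. One way is the &lt;code&gt;interrupt_before&lt;/code&gt; argument to &lt;code&gt;compile()&lt;/code&gt;, used together with a checkpointer. The sketch below shows the same shape in plain Python, without the framework: the execute step refuses to run unless the state has passed through an explicit human &lt;code&gt;approve()&lt;/code&gt; call. The state fields and the fw-update command are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    SUGGEST = "suggest_remediation"
    AWAIT_APPROVAL = "await_approval"
    EXECUTE = "execute_remediation"
    DONE = "done"


@dataclass
class AuditState:
    device_id: str
    proposed_command: str = ""
    approved_by: str = ""
    step: Step = Step.SUGGEST
    executed: list = field(default_factory=list)  # stand-in for database writes


def suggest(state: AuditState) -> AuditState:
    """Agent node: proposes a command as text and stops at the gate."""
    state.proposed_command = f"fw-update --device {state.device_id}"  # hypothetical CLI
    state.step = Step.AWAIT_APPROVAL
    return state


def approve(state: AuditState, architect: str) -> AuditState:
    """Called from the UI when a human clicks Approve; the only way to reach EXECUTE."""
    if state.step is not Step.AWAIT_APPROVAL:
        raise RuntimeError("nothing is awaiting approval")
    state.approved_by = architect
    state.step = Step.EXECUTE
    return state


def execute(state: AuditState) -> AuditState:
    """Write node: refuses to run unless a human approved this proposal."""
    if state.step is not Step.EXECUTE or not state.approved_by:
        raise PermissionError("remediation requires human approval")
    state.executed.append(state.proposed_command)
    state.step = Step.DONE
    return state
```

&lt;p&gt;Putting the check inside the write step, rather than only in the UI, means a bug in routing can't skip the gate: an unapproved state fails loudly instead of touching the database.&lt;/p&gt;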

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;Two things.&lt;/p&gt;

&lt;p&gt;First, I'd instrument RAGAS metrics from day one. I ended up retrofitting evaluation onto the agent's audit outputs and found gaps I'd been manually poking at for weeks. Faithfulness and context relevancy scores would have surfaced those faster.&lt;/p&gt;

&lt;p&gt;Second, I'd write the red-team report in parallel, not after. I know what failure modes the Judge catches now, but I reconstructed most of that knowledge from memory rather than documenting it as I found it. A live failure log from the start would have made that report much sharper.&lt;/p&gt;
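&lt;p&gt;A live failure log doesn't need much infrastructure. Here is a minimal sketch, assuming an append-only JSON Lines file and a failure taxonomy you define yourself. The field names and categories below are hypothetical.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FailureRecord:
    caught_by: str   # e.g. "judge" or "manual"
    category: str    # your own taxonomy, e.g. "wrong_rule_category"
    device_id: str
    summary: str
    logged_at: str = ""


def log_failure(path: Path, record: FailureRecord) -> None:
    """Append one JSON object per line, stamped at the moment the failure is found."""
    if not record.logged_at:
        record.logged_at = datetime.now(timezone.utc).isoformat()
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def count_by_category(path: Path) -> dict:
    """Tally failure modes: the raw material for a red-team report."""
    counts = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        category = json.loads(line)["category"]
        counts[category] = counts.get(category, 0) + 1
    return counts
```

&lt;p&gt;Because each record is written when the failure is found, the tally at report time comes from data rather than memory.&lt;/p&gt;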

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;RAG is the right tool for knowledge retrieval over static content. It's a less natural fit when your agent needs to query live structured data and act on what it finds.&lt;/p&gt;

&lt;p&gt;MCP let me give the agent real database access through a typed tool interface: no embedding pipeline, no staleness, no similarity search on what is fundamentally a relational query. For infrastructure audit, that was the right call.&lt;/p&gt;

&lt;p&gt;Code is on GitHub if you want to dig into the architecture. Happy to go deeper on the LangGraph state machine or the Judge design in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>mcp</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
