<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mobasshir Khan</title>
    <description>The latest articles on DEV Community by Mobasshir Khan (@mobasshir_khan_eaf8ec5cf3).</description>
    <link>https://dev.to/mobasshir_khan_eaf8ec5cf3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976515%2F8af0e596-029b-458f-839b-3289150631c4.jpg</url>
      <title>DEV Community: Mobasshir Khan</title>
      <link>https://dev.to/mobasshir_khan_eaf8ec5cf3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mobasshir_khan_eaf8ec5cf3"/>
    <language>en</language>
    <item>
      <title>I Rebuilt My RAG Pipeline From Scratch. Here's What Actually Made It Better.</title>
      <dc:creator>Mobasshir Khan</dc:creator>
      <pubDate>Mon, 15 Jun 2026 10:40:11 +0000</pubDate>
      <link>https://dev.to/mobasshir_khan_eaf8ec5cf3/i-rebuilt-my-rag-pipeline-from-scratch-heres-what-actually-made-it-better-4gip</link>
      <guid>https://dev.to/mobasshir_khan_eaf8ec5cf3/i-rebuilt-my-rag-pipeline-from-scratch-heres-what-actually-made-it-better-4gip</guid>
      <description>&lt;h3&gt;
  
  
  Spoiler: it wasn't a bigger model, a better embedding, or a longer prompt.
&lt;/h3&gt;

&lt;p&gt;When I first built my RAG pipeline, it was about as simple as it gets.&lt;/p&gt;

&lt;p&gt;Take a topic, fetch some relevant text, feed it to the model, generate an answer. Classic RAG. The kind of thing every tutorial walks you through in twenty minutes.&lt;/p&gt;

&lt;p&gt;And honestly? It worked. For a while.&lt;/p&gt;

&lt;p&gt;But I wasn't building a generic Q&amp;amp;A bot. I was building a &lt;strong&gt;debate learning system&lt;/strong&gt;, and that's where the cracks started showing fast.&lt;/p&gt;

&lt;p&gt;The output felt generic. The sources didn't match what each part of the lesson actually needed. And the system had no idea that "background knowledge," "debate framing," "rebuttal material," and "vocabulary support" are not the same thing.&lt;/p&gt;

&lt;p&gt;It was retrieving text. It just wasn't &lt;em&gt;understanding&lt;/em&gt; what kind of help each piece of the pipeline needed.&lt;/p&gt;

&lt;p&gt;So I tore it apart and rebuilt it.&lt;/p&gt;

&lt;p&gt;What came out the other side wasn't just "better RAG." It was a layered retrieval architecture that plans its own queries, routes them by intent, ranks and packs context intelligently, remembers what worked before, and evaluates itself against real lessons.&lt;/p&gt;

&lt;p&gt;Here's how I got there, and what I'd tell anyone trying to do the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With "Embed → Retrieve → Pray"
&lt;/h2&gt;

&lt;p&gt;Most basic RAG systems follow the same formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embed the query → retrieve top chunks → pass them to the model → hope the answer improves
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For simple document Q&amp;amp;A, this is fine. It's even good.&lt;/p&gt;

&lt;p&gt;But for debate learning, it falls apart, because different parts of the output need fundamentally different kinds of evidence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-knowledge&lt;/strong&gt; needs definitions and background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debate building&lt;/strong&gt; needs mechanisms, clashes, and framing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary&lt;/strong&gt; needs words that actually fit the topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language support&lt;/strong&gt; needs words that reinforce the lesson itself&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coaching&lt;/strong&gt; needs structured, distilled explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single chunk-search pipeline has no concept of any of this. It doesn't know the difference between a strong debate example and a generic background article. The result is noisy retrieval, repetitive patterns, and weaker final output, no matter how good your embeddings are.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Shift Wasn't Technical. It Was Mental.
&lt;/h2&gt;

&lt;p&gt;Here's the thing that changed everything for me, and it had nothing to do with code at first.&lt;/p&gt;

&lt;p&gt;I stopped thinking of retrieval as &lt;strong&gt;"find text"&lt;/strong&gt; and started thinking of it as &lt;strong&gt;"make decisions about evidence."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That single reframe pushed me toward a completely different architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;topic → plan → route → preselect → retrieve → rerank → pack → teach → evaluate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suddenly retrieval wasn't a single step anymore. It was a pipeline of decisions, each one with a clear job.&lt;/p&gt;

&lt;p&gt;Here's the high-level flow I ended up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A["Topic / Node Intent"] --&amp;gt; B["Query Planner"]
    B --&amp;gt; C["Intent Router"]
    C --&amp;gt; D["Document Preselection"]
    D --&amp;gt; E["Chunk Retrieval"]
    E --&amp;gt; F["Reranking"]
    F --&amp;gt; G["Context Packing"]
    G --&amp;gt; H["Structured Evidence Lanes"]
    H --&amp;gt; I["Downstream Lesson Nodes"]
    I --&amp;gt; J["Trace Logging / Memory"]
    J --&amp;gt; K["Real-Trace Evaluation"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down what each of these pieces actually does, and why it mattered.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Query Planning, Per Node
&lt;/h2&gt;

&lt;p&gt;Not every node in the pipeline needs the same kind of evidence. A pre-knowledge node and an argument-generation node should never be searching the same way.&lt;/p&gt;

&lt;p&gt;So I added a query planner that expands the raw topic into something far more useful for retrieval. Instead of handing the system a bare topic like &lt;em&gt;"feminism"&lt;/em&gt; or &lt;em&gt;"international relations,"&lt;/em&gt; the planner turns it into structured search intent: expanded terms, subqueries, and source preferences tailored to the node that's asking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_query_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expanded_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subqueries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This alone improved recall noticeably. The system finally started asking the &lt;em&gt;right&lt;/em&gt; question before it even searched.&lt;/p&gt;
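
&lt;p&gt;As a rough illustration, here is one way the placeholder fields in the snippet above could be filled in. This is a minimal sketch, not the actual planner: the node names, query patterns, source labels, the &lt;code&gt;synonyms&lt;/code&gt; argument, and the &lt;code&gt;QueryPlan&lt;/code&gt; dataclass are all assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical per-node templates: the node names, query patterns, and
# source labels are illustrative assumptions, not the real configuration.
NODE_TEMPLATES = {
    "pre_knowledge": {
        "patterns": ["what is {topic}", "{topic} background"],
        "sources": ["encyclopedia", "explainer"],
    },
    "debate_building": {
        "patterns": ["arguments for {topic}", "arguments against {topic}"],
        "sources": ["debate_video", "opinion"],
    },
}


@dataclass
class QueryPlan:
    node: str
    topic: str
    expanded_terms: list = field(default_factory=list)
    subqueries: list = field(default_factory=list)
    source_preferences: list = field(default_factory=list)


def build_query_plan(node_name: str, topic: str, synonyms: dict) -> QueryPlan:
    """Expand a bare topic into node-specific search intent."""
    template = NODE_TEMPLATES[node_name]
    # Expanded terms: the topic itself plus any known related terms.
    expanded = [topic] + synonyms.get(topic, [])
    # Subqueries: fill each node-specific pattern with the topic.
    subqueries = [p.format(topic=topic) for p in template["patterns"]]
    return QueryPlan(node_name, topic, expanded, subqueries, list(template["sources"]))


plan = build_query_plan(
    "debate_building", "feminism", {"feminism": ["gender equality"]}
)
print(plan.subqueries)
```

&lt;p&gt;The same topic passed with &lt;code&gt;"pre_knowledge"&lt;/code&gt; would produce definition-style subqueries instead, which is the whole point of planning per node.&lt;/p&gt;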




&lt;h2&gt;
  
  
  2. Intent-Aware Routing
&lt;/h2&gt;

&lt;p&gt;Once the query is planned, the router decides what &lt;em&gt;type&lt;/em&gt; of evidence the node actually needs: definitions, mechanisms, examples, clash material, style cues, or vocabulary support.&lt;/p&gt;

&lt;p&gt;This sounds small, but it fixes one of the most common mistakes in basic RAG: treating every retrieval need as identical. They're not, and pretending otherwise is where a lot of quality gets lost.&lt;/p&gt;

&lt;p&gt;Routing also made the whole system more inspectable. Every retrieval path now has a clear, debuggable purpose, which made every later improvement easier to reason about.&lt;/p&gt;
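
&lt;p&gt;A router like this can be very small. The sketch below assumes a static lookup table; the evidence types come from the list above, while the node names and the node-to-type mapping are hypothetical.&lt;/p&gt;

```python
from enum import Enum


class EvidenceType(Enum):
    DEFINITION = "definition"
    MECHANISM = "mechanism"
    EXAMPLE = "example"
    CLASH = "clash"
    STYLE = "style"
    VOCABULARY = "vocabulary"


# Hypothetical routing table: node names and their mappings are assumptions.
ROUTES = {
    "pre_knowledge": [EvidenceType.DEFINITION, EvidenceType.EXAMPLE],
    "debate_building": [EvidenceType.MECHANISM, EvidenceType.CLASH],
    "vocabulary": [EvidenceType.VOCABULARY],
    "coaching": [EvidenceType.STYLE, EvidenceType.EXAMPLE],
}


def route(node_name: str) -> list:
    """Return the evidence types a node should retrieve, in priority order."""
    if node_name not in ROUTES:
        # Failing loudly keeps every retrieval path explicit and debuggable.
        raise KeyError(f"no route defined for node {node_name!r}")
    return ROUTES[node_name]


print([e.value for e in route("debate_building")])
```

&lt;p&gt;Keeping the table explicit is one way to get the inspectability described above: every retrieval path can be read off a single dictionary.&lt;/p&gt;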




&lt;h2&gt;
  
  
  3. Hierarchical Retrieval (Stop Searching Everything, Every Time)
&lt;/h2&gt;

&lt;p&gt;This was one of the biggest quality jumps in the whole rebuild.&lt;/p&gt;

&lt;p&gt;Instead of searching every chunk across the entire corpus for every query, the system now first identifies the most likely &lt;em&gt;documents&lt;/em&gt;, and only then searches inside them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;before:  topic → search all chunks
after:   topic → choose good documents → search chunks inside them
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is bigger than it sounds. Retrieval no longer wastes effort scanning irrelevant corners of the corpus, and chunk search starts from a much stronger position every single time.&lt;/p&gt;
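
&lt;p&gt;The two-stage idea can be sketched in a few lines. To stay self-contained, this example scores by shared tokens rather than embeddings, and the document schema (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;summary&lt;/code&gt;, &lt;code&gt;chunks&lt;/code&gt;) is an assumption; a real pipeline would swap in its own similarity function.&lt;/p&gt;

```python
def overlap(query: str, text: str) -> int:
    """Count shared lowercase tokens; a stand-in for embedding similarity."""
    return len(set(query.lower().split()).intersection(text.lower().split()))


def hierarchical_search(query, documents, top_docs=2, top_chunks=3):
    """Pick the best documents by summary first, then search only their chunks."""
    # Stage 1: rank whole documents using a cheap summary-level score.
    ranked_docs = sorted(
        documents, key=lambda d: overlap(query, d["summary"]), reverse=True
    )[:top_docs]
    # Stage 2: score chunks, but only inside the preselected documents.
    candidates = [
        (overlap(query, chunk), doc["id"], chunk)
        for doc in ranked_docs
        for chunk in doc["chunks"]
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(doc_id, chunk) for _, doc_id, chunk in candidates[:top_chunks]]


corpus = [
    {"id": "econ", "summary": "minimum wage and employment effects",
     "chunks": ["minimum wage raises pay", "employment effects are debated"]},
    {"id": "cooking", "summary": "pasta recipes",
     "chunks": ["boil water for the minimum time"]},
    {"id": "labor", "summary": "wage bargaining and unions",
     "chunks": ["unions negotiate wage floors"]},
]
results = hierarchical_search("minimum wage employment", corpus, top_docs=2, top_chunks=2)
print(results)
```

&lt;p&gt;Note that the cooking chunk contains the word "minimum" but is never scored, because its document was filtered out in stage one.&lt;/p&gt;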




&lt;h2&gt;
  
  
  4. Reranking and Context Packing
&lt;/h2&gt;

&lt;p&gt;Relevant isn't the same as useful.&lt;/p&gt;

&lt;p&gt;I added a reranking stage that reorders retrieved evidence based on usefulness, source class, and role, then a context packer that arranges the final selection in a way the model can actually reason with.&lt;/p&gt;

&lt;p&gt;This is easy to underestimate, but a good retrieval system isn't just about &lt;em&gt;finding&lt;/em&gt; the right information. It's about &lt;em&gt;presenting&lt;/em&gt; it in a shape the model can use well.&lt;/p&gt;
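
&lt;p&gt;Here is a minimal sketch of both stages, assuming each retrieved chunk already carries a relevance score, a source class, and a role. The weights, the role bonus, and the character budget are made-up values for illustration only.&lt;/p&gt;

```python
# Hypothetical trust weights per source class; the values are made up.
SOURCE_WEIGHT = {"debate_video": 1.0, "explainer": 0.8, "news": 0.5}


def rerank(chunks, wanted_role):
    """Order chunks by relevance, boosted by source class and role match."""
    def score(chunk):
        role_bonus = 0.3 if chunk["role"] == wanted_role else 0.0
        return chunk["relevance"] * SOURCE_WEIGHT.get(chunk["source"], 0.3) + role_bonus
    return sorted(chunks, key=score, reverse=True)


def pack(chunks, budget_chars):
    """Greedily keep ranked chunks that fit the budget, skipping exact repeats."""
    packed, used, seen = [], 0, set()
    for chunk in chunks:
        text = chunk["text"]
        if text in seen or used + len(text) > budget_chars:
            continue
        packed.append(chunk)
        seen.add(text)
        used += len(text)
    return packed


retrieved = [
    {"text": "Background on tariffs.", "relevance": 0.9,
     "source": "news", "role": "definition"},
    {"text": "Tariffs protect jobs vs. raise prices.", "relevance": 0.7,
     "source": "debate_video", "role": "clash"},
    {"text": "Tariffs protect jobs vs. raise prices.", "relevance": 0.6,
     "source": "explainer", "role": "clash"},
]
context = pack(rerank(retrieved, wanted_role="clash"), budget_chars=80)
print([c["source"] for c in context])
```

&lt;p&gt;The chunk with the highest raw relevance ends up last here, because for a clash-oriented node it is the least useful of the three. The duplicate is dropped during packing rather than spending budget twice.&lt;/p&gt;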




&lt;h2&gt;
  
  
  5. Structured Evidence Lanes
&lt;/h2&gt;

&lt;p&gt;Instead of dumping every retrieved chunk into one giant context block, the system now separates evidence into distinct lanes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Definitions&lt;/li&gt;
&lt;li&gt;Mechanisms&lt;/li&gt;
&lt;li&gt;Examples&lt;/li&gt;
&lt;li&gt;Debate framing&lt;/li&gt;
&lt;li&gt;Vocabulary&lt;/li&gt;
&lt;li&gt;Style and coaching notes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw evidence → structured evidence lanes → better lesson output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each downstream node reads only the lane it actually needs, instead of wading through everything else. The result is output that feels noticeably more coherent and "crafted," rather than a wall of loosely related text.&lt;/p&gt;
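
&lt;p&gt;One possible shape for this, assuming each chunk has already been labeled with a lane during routing. The lane names mirror the list above; the node-to-lane mapping is hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Lane names follow the list above; the node-to-lane mapping is an assumption.
LANES = ("definitions", "mechanisms", "examples",
         "debate_framing", "vocabulary", "coaching")
NODE_LANES = {
    "pre_knowledge": ["definitions", "examples"],
    "debate_building": ["mechanisms", "debate_framing"],
}


def build_lanes(chunks):
    """Group retrieved chunks by their lane label, dropping unknown labels."""
    lanes = defaultdict(list)
    for chunk in chunks:
        if chunk["lane"] in LANES:
            lanes[chunk["lane"]].append(chunk["text"])
    return dict(lanes)


def context_for(node_name, lanes):
    """Give a node only the lanes it is allowed to read."""
    return {lane: lanes.get(lane, []) for lane in NODE_LANES[node_name]}


lanes = build_lanes([
    {"lane": "definitions", "text": "A tariff is a tax on imports."},
    {"lane": "debate_framing", "text": "Protection vs. consumer cost."},
    {"lane": "vocabulary", "text": "protectionism"},
])
print(context_for("debate_building", lanes))
```

&lt;p&gt;An empty lane in the result is itself useful information: it tells you the retrieval plan for that node found nothing of the kind it needed.&lt;/p&gt;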




&lt;h2&gt;
  
  
  6. Smarter YouTube Ingestion
&lt;/h2&gt;

&lt;p&gt;A lot of the best debate and explanation content lives on YouTube, but blindly ingesting every transcript is a recipe for noise.&lt;/p&gt;

&lt;p&gt;So ingestion now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans channel inventory&lt;/li&gt;
&lt;li&gt;Caches metadata locally&lt;/li&gt;
&lt;li&gt;Scores relevance from title and description&lt;/li&gt;
&lt;li&gt;Uses thumbnail text as a ranking signal&lt;/li&gt;
&lt;li&gt;Fetches transcripts only for videos worth fetching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Channels often hide their best material in the title, thumbnail, or a short description rather than the transcript itself. Making ingestion selective meant the pipeline got both faster &lt;em&gt;and&lt;/em&gt; more useful at the same time, which doesn't happen often.&lt;/p&gt;
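
&lt;p&gt;The selection step might look something like the sketch below. It works only on cached metadata and makes no network calls; the field names, the weights, the threshold, and the assumption that thumbnail text has already been extracted into a string are all hypothetical.&lt;/p&gt;

```python
def metadata_score(video, topic_terms):
    """Score a video from cached metadata only; no transcript is fetched here."""
    # Hypothetical weights: title counts most, then thumbnail text, then description.
    weights = {"title": 3, "thumbnail_text": 2, "description": 1}
    score = 0
    for field_name, weight in weights.items():
        words = set(video.get(field_name, "").lower().split())
        score += weight * len(words.intersection(topic_terms))
    return score


def select_for_transcript(videos, topic_terms, min_score=3):
    """Return IDs of videos whose metadata score justifies a transcript fetch."""
    terms = {t.lower() for t in topic_terms}
    return [v["id"] for v in videos if metadata_score(v, terms) >= min_score]


cached_inventory = [
    {"id": "a1", "title": "Should we ban tariffs",
     "thumbnail_text": "TARIFFS DEBATE", "description": ""},
    {"id": "b2", "title": "Channel trailer",
     "thumbnail_text": "", "description": "we discuss tariffs sometimes"},
]
print(select_for_transcript(cached_inventory, ["tariffs", "debate"]))
```

&lt;p&gt;Only the first video clears the threshold, so only one transcript would be fetched.&lt;/p&gt;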




&lt;h2&gt;
  
  
  7. Retrieval Memory
&lt;/h2&gt;

&lt;p&gt;This is the upgrade that turned the system from a one-off lookup tool into something that actually improves over time.&lt;/p&gt;

&lt;p&gt;The pipeline now remembers which lessons, sources, and retrieval choices worked well in the past, and reuses those patterns instead of repeating the same weak sources again and again.&lt;/p&gt;

&lt;p&gt;Of everything I built, this is the idea I'd point to as the most quietly powerful.&lt;/p&gt;
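
&lt;p&gt;A minimal sketch of what such a memory could look like, assuming some downstream signal (a rating, an evaluation score) says whether a source was useful for a given node. The class name, the smoothing formula, and the in-memory storage are choices made for this example, not a description of the real system.&lt;/p&gt;

```python
from collections import defaultdict


class RetrievalMemory:
    """Track how often each source led to a lesson that was judged useful."""

    def __init__(self):
        # Maps (node, source) to [uses, successes].
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, node, source, was_useful):
        entry = self.stats[(node, source)]
        entry[0] += 1
        entry[1] += int(was_useful)

    def prior(self, node, source):
        """Smoothed success rate; unseen sources start at a neutral 0.5."""
        uses, successes = self.stats.get((node, source), (0, 0))
        return (successes + 1) / (uses + 2)

    def boost(self, node, candidates):
        """Reorder candidate sources so historically useful ones come first."""
        return sorted(candidates, key=lambda s: self.prior(node, s), reverse=True)


memory = RetrievalMemory()
memory.record("debate_building", "channel_x", True)
memory.record("debate_building", "channel_x", True)
memory.record("debate_building", "blog_y", False)
print(memory.boost("debate_building", ["blog_y", "new_source", "channel_x"]))
```

&lt;p&gt;The smoothing matters: a source that has never been tried ranks above one that has failed, so the memory does not lock the pipeline into whatever it happened to see first.&lt;/p&gt;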




&lt;h2&gt;
  
  
  8. Evaluation Against Real Lessons
&lt;/h2&gt;

&lt;p&gt;I didn't want to rely on "it feels more advanced now," so I added evaluation: both synthetic retrieval checks and real trace-based scoring from saved lessons.&lt;/p&gt;

&lt;p&gt;That gave me a way to actually measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the right evidence lanes are being populated&lt;/li&gt;
&lt;li&gt;Whether the retrieval plan makes sense for the node&lt;/li&gt;
&lt;li&gt;Whether real lessons are improving over time&lt;/li&gt;
&lt;li&gt;Whether source reuse is getting smarter or going stale
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Does the lesson contain the right evidence lanes?
- Are the best sources showing up consistently?
- Is the output more specific than before?
- Are we repeating weak material too often?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this step, it's incredibly easy to convince yourself a system is better just because it &lt;em&gt;looks&lt;/em&gt; more sophisticated. Evaluation is what turned "I think this is better" into "I can show you it's better."&lt;/p&gt;
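
&lt;p&gt;Two of those checks are easy to express as numbers. The sketch below assumes each saved trace records which lanes were filled and which sources were used; that trace schema and both metric names are hypothetical.&lt;/p&gt;

```python
from collections import Counter


def lane_coverage(trace, required_lanes):
    """Fraction of required lanes that contain at least one piece of evidence."""
    filled = [lane for lane in required_lanes if trace["lanes"].get(lane)]
    return len(filled) / len(required_lanes)


def source_repetition(traces):
    """Share of all source uses that go to the single most-used source."""
    counts = Counter(src for t in traces for src in t["sources"])
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


saved_traces = [
    {"lanes": {"definitions": ["d1"], "mechanisms": []}, "sources": ["s1", "s2"]},
    {"lanes": {"definitions": ["d2"], "mechanisms": ["m1"]}, "sources": ["s1", "s3"]},
]
required = ["definitions", "mechanisms"]
print([lane_coverage(t, required) for t in saved_traces])
print(source_repetition(saved_traces))
```

&lt;p&gt;Tracked across saved lessons, rising coverage and flat or falling repetition would be evidence that the pipeline is improving rather than just looking more elaborate.&lt;/p&gt;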




&lt;h2&gt;
  
  
  What Actually Changed
&lt;/h2&gt;

&lt;p&gt;After all of this, the difference wasn't subtle:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better relevance&lt;/strong&gt; — retrieval results now match each node's actual purpose, not a generic average.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better structure&lt;/strong&gt; — output is split into useful, labeled sections instead of one long blob of text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better efficiency&lt;/strong&gt; — far less effort wasted on low-value sources and weak chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better debate usefulness&lt;/strong&gt; — the system surfaces material that actually helps build arguments, not just background reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better consistency&lt;/strong&gt; — because the architecture is layered, I can tune one piece without breaking everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better learning value&lt;/strong&gt; — the final output now contains pre-knowledge, debate angles, vocabulary, and coaching as distinct, usable layers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture, in One Line
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;topic → planner → router → document selection → chunk retrieval → rerank → pack → evidence lanes → lesson output → trace evaluation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step in that line exists for a reason. Not because it looks advanced on a diagram, but because debate learning was never a single retrieval problem to begin with. It's a retrieval &lt;em&gt;orchestration&lt;/em&gt; problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Specifically for Debate Learning
&lt;/h2&gt;

&lt;p&gt;A debate learner doesn't just need articles. They need background, concepts, argument structure, rebuttal logic, examples, vocabulary, coaching, repetition control, and topic freshness, all at once, all tailored to the same lesson.&lt;/p&gt;

&lt;p&gt;That's a much richer problem than "answer the question." Advanced RAG gave me a way to support all of those layers without the output collapsing into chaos.&lt;/p&gt;

&lt;p&gt;It also opened the door to real personalization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vocabulary pulled from the lesson itself&lt;/li&gt;
&lt;li&gt;Debate framing tied directly to the chosen article&lt;/li&gt;
&lt;li&gt;Pre-knowledge sourced from curated material&lt;/li&gt;
&lt;li&gt;Coaching grounded in the evidence that was actually retrieved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's what makes the difference between a tool that &lt;em&gt;answers&lt;/em&gt; and a tool that actually &lt;em&gt;teaches&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Biggest Lesson
&lt;/h2&gt;

&lt;p&gt;If there's one thing I'd want someone to take from all of this, it's this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Advanced RAG isn't about adding more model calls. It's about making retrieval more deliberate.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's tempting to think RAG quality comes from bigger embeddings, bigger models, or just more text in the context window. Those things can help. They're not the main answer.&lt;/p&gt;

&lt;p&gt;The real gains came from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Planning better queries&lt;/li&gt;
&lt;li&gt;Routing by intent&lt;/li&gt;
&lt;li&gt;Selecting better documents first&lt;/li&gt;
&lt;li&gt;Reranking evidence&lt;/li&gt;
&lt;li&gt;Packing context intelligently&lt;/li&gt;
&lt;li&gt;Separating source roles&lt;/li&gt;
&lt;li&gt;Remembering what worked&lt;/li&gt;
&lt;li&gt;Evaluating on real traces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the list that actually made the system feel "advanced," not the model behind it.&lt;/p&gt;




&lt;h2&gt;
  
  
  If You're Building a Retrieval Pipeline
&lt;/h2&gt;

&lt;p&gt;Don't stop at chunk search. That's the beginning, not the destination.&lt;/p&gt;

&lt;p&gt;Once you start thinking in terms of intent, routing, document-level selection, evidence role separation, memory, and evaluation, RAG stops being a clever prompt trick and starts becoming a real system, one that can be debugged, tuned, and trusted.&lt;/p&gt;

&lt;p&gt;In my case, that shift was big enough to materially change both efficiency and output quality. If you're stuck wondering why your RAG pipeline "works but feels generic," there's a good chance the answer isn't your model.&lt;/p&gt;

&lt;p&gt;It's your architecture.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this was useful, I'd genuinely love to hear what you've tried in your own RAG pipelines, especially if you've found a different lever that moved the needle for you. Drop a comment, I read every one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I built an npm package that makes AI follow my own architecture docs</title>
      <dc:creator>Mobasshir Khan</dc:creator>
      <pubDate>Tue, 09 Jun 2026 19:50:19 +0000</pubDate>
      <link>https://dev.to/mobasshir_khan_eaf8ec5cf3/i-built-an-npm-package-that-makes-ai-follow-my-own-architecture-docs-1pj6</link>
      <guid>https://dev.to/mobasshir_khan_eaf8ec5cf3/i-built-an-npm-package-that-makes-ai-follow-my-own-architecture-docs-1pj6</guid>
      <description>&lt;p&gt;I keep writing architecture docs for my projects. I keep letting AI write code that ignores them. Then I commit that code anyway, because I'm in flow, and the diff looks fine on the surface.&lt;br&gt;
A week later something breaks in a way that would never have happened if I'd just followed my own rules.&lt;/p&gt;

&lt;p&gt;So I built DocGuard — a small npm package that reads your markdown docs, looks at your staged code, and flags anything that contradicts what you've written. It quotes the exact doc line it found the violation against. No SaaS, no telemetry, runs locally, free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the moment that convinced me to ship it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch&lt;/strong&gt;&lt;br&gt;
I'm building a small React app called Today's Price that fetches commodity, crypto, and metal prices. It uses a standard layered architecture: components → hooks → services → external APIs. These are the rules I'd written in docs/architecture.md:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All external API calls MUST go through a file in src/services/.&lt;/li&gt;
&lt;li&gt;Components and hooks must never call fetch or any HTTP client directly.&lt;/li&gt;
&lt;li&gt;Components are presentational only — they must never call services directly.&lt;/li&gt;
&lt;li&gt;A component that imports anything from src/services/ is a violation.&lt;/li&gt;
&lt;li&gt;Hard-coding any string that looks like an API key in source is forbidden.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I asked Claude to "add a news ticker widget for the homepage." It wrote a clean-looking NewsTicker.jsx that compiled, rendered, and worked:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;useEffect(() =&amp;gt; {
  async function load() {
    try {
      const res = await fetch(
        "https://newsapi.org/v2/top-headlines?category=business&amp;amp;apiKey=pk_live_a9b8c7d6e5f4g3h2"
      );
      const data = await res.json();
      setHeadlines(data.articles || []);
    } catch (err) {
      console.error("news fetch failed", err);
    }
  }
  load();
}, []);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Three rules violated in twenty lines: the component calls fetch directly, it bypasses the service layer, and it hard-codes an API key in source. On top of that, the try/catch sits in the wrong layer. The code worked. I would have committed it.&lt;/p&gt;

&lt;p&gt;The pre-commit hook fired during git commit:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0evpkyk34zn12wylucf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0evpkyk34zn12wylucf4.png" alt=" " width="799" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WARN  src/components/NewsTicker.jsx:14  [api-contract]
  Component makes a direct API call
  Cited: docs/architecture.md:6
    "All external API calls MUST go through a file in `src/services/`"

WARN  src/components/NewsTicker.jsx:14  [api-contract]
  Component makes a direct API call
  Cited: docs/architecture.md:12
    "Components are presentational only — they must never call services directly"

Summary: 0 error(s), 2 warning(s).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two warnings, each pointing at the exact doc line I had violated, with the rule quoted verbatim from my own file.&lt;/p&gt;

&lt;p&gt;The commit went through — I had api-contract set to warn, not block. But that's the point: the warnings caught my eye in a way that a passing build never would have. I deleted the file and re-prompted the AI: "use the existing service pattern." That second version was correct.&lt;/p&gt;

&lt;p&gt;If you want hard enforcement, one config tweak in .docguard.json turns warnings into blocks per category. I keep most categories at warn because trust is earned, not assumed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why existing tools don't catch this&lt;/strong&gt;&lt;br&gt;
I looked before I built. There's a lot in this space — none of it solves exactly this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ESLint, Prettier, lint-staged. Generic code quality. They don't know what my docs say.&lt;/li&gt;
&lt;li&gt;commitlint. Validates commit messages. Doesn't look at code.&lt;/li&gt;
&lt;li&gt;CodeRabbit, Greptile, Bito. AI code reviewers — but they review after the PR is open. By then the code is committed, pushed, and in someone's review queue. I want it stopped before it enters git history.&lt;/li&gt;
&lt;li&gt;Cursor rules / .cursorrules. IDE-side hints. The IDE uses them when generating; nothing enforces them at commit time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap: a thing that sits between the AI's output and git commit, knows my docs, and refuses to let through code that contradicts them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How DocGuard works&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pre-commit hook fires on every git commit.&lt;/li&gt;
&lt;li&gt;Reads the staged diff. Only the hunks you're about to commit. Binary files and lockfiles skipped.&lt;/li&gt;
&lt;li&gt;Loads your markdown docs. Configurable glob, defaults to ./docs/**/*.md.&lt;/li&gt;
&lt;li&gt;Chunks them by heading, ≤1500 chars per chunk, with line metadata.&lt;/li&gt;
&lt;li&gt;Embeds chunks locally with MiniLM via @xenova/transformers. Cached to .docguard/cache/. Model downloads once (~25MB), no per-commit API cost.&lt;/li&gt;
&lt;li&gt;Retrieves the relevant chunks for each changed file: cosine similarity plus a file-path boost.&lt;/li&gt;
&lt;li&gt;Sends only the top chunks + the diff to Groq (free tier, Llama 3.3 70B). Bring your own key, no SaaS.&lt;/li&gt;
&lt;li&gt;Validates the response. Either passes, warns, or blocks based on per-category severity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That validation step is what makes the tool not annoying.&lt;/p&gt;
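
&lt;p&gt;To make the chunking step concrete, here is a rough sketch of heading-based chunking with line metadata. It is written in Python for illustration only; DocGuard itself is an npm package and its real implementation may differ. The function names, the &lt;code&gt;chunk_id&lt;/code&gt; format, and the dictionary fields are assumptions.&lt;/p&gt;

```python
def make_chunk(path, index, start_line, lines):
    """Bundle chunk text with the file and line range it came from."""
    return {
        "chunk_id": f"{path}#{index}",  # hypothetical ID format
        "file": path,
        "start_line": start_line,
        "end_line": start_line + len(lines) - 1,
        "text": "\n".join(lines),
    }


def chunk_markdown(path, text, max_chars=1500):
    """Split markdown at headings, capping chunk size and recording line ranges.

    Simplification: any line starting with '#' counts as a heading, and a
    single line longer than max_chars still becomes its own oversized chunk.
    """
    chunks, current, start = [], [], 1
    for line_no, line in enumerate(text.splitlines(), start=1):
        size = sum(len(item) + 1 for item in current)
        # Start a new chunk at every heading, or when the cap would be exceeded.
        if current and (line.startswith("#") or size + len(line) > max_chars):
            chunks.append(make_chunk(path, len(chunks), start, current))
            current, start = [], line_no
        current.append(line)
    if current:
        chunks.append(make_chunk(path, len(chunks), start, current))
    return chunks


doc = (
    "# Architecture\n"
    "All external API calls MUST go through a file in `src/services/`.\n"
    "\n"
    "## Components\n"
    "Components are presentational only."
)
for chunk in chunk_markdown("docs/architecture.md", doc):
    print(chunk["chunk_id"], chunk["start_line"], chunk["end_line"])
```

&lt;p&gt;Keeping the start line with each chunk is what later allows a citation to be turned into a file:line reference without asking the model for it.&lt;/p&gt;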

&lt;p&gt;&lt;strong&gt;The thing nobody else does: cite-or-downgrade&lt;/strong&gt;&lt;br&gt;
The real problem with LLM-based commit checks is that the model will happily invent violations. It'll claim line 47 of your architecture doc says something it doesn't. If the tool trusts that and blocks the commit, one false positive is all it takes for a user to uninstall it forever.&lt;/p&gt;

&lt;p&gt;DocGuard's defense is a hard contract enforced in code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every violation must include a chunk_id that exists in the chunks the model was actually sent.&lt;/li&gt;
&lt;li&gt;The quote field must be a literal substring of that chunk's text.&lt;/li&gt;
&lt;li&gt;If either check fails (unknown chunk, or quote not actually in the chunk), the violation is automatically downgraded to a warning, regardless of configured severity.&lt;/li&gt;
&lt;li&gt;DocGuard computes the cited file:line itself from chunk metadata. The model never gets to invent line numbers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Look at the screenshot above. Each warning quotes my doc exactly. That's not a coincidence: it's the only way the citation guard lets a finding through at its configured severity. Anything paraphrased gets downgraded with a (downgraded: quote not found inside cited chunk) note.&lt;/p&gt;

&lt;p&gt;A commit is only ever blocked when there's a literal, quotable basis in your docs. Anything the guard can't verify is capped at a warning and labeled as downgraded.&lt;/p&gt;
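
&lt;p&gt;The contract above is simple enough to sketch in a few lines. This is an illustration in Python, not DocGuard's actual code (the package is written for Node). The &lt;code&gt;chunk_id&lt;/code&gt; and &lt;code&gt;quote&lt;/code&gt; fields and the "quote not found" note come from the description above; the function name, the chunk dictionary shape, the severity strings, and the "unknown chunk" note are assumptions.&lt;/p&gt;

```python
def apply_citation_guard(violation, sent_chunks, configured_severity):
    """Keep the configured severity only if the citation can be verified."""
    chunk = sent_chunks.get(violation.get("chunk_id"))
    if chunk is None:
        # The model cited a chunk it was never sent.
        return {**violation, "severity": "warn",
                "note": "downgraded: unknown chunk"}
    quote = violation.get("quote", "")
    if not quote or quote not in chunk["text"]:
        # The quote is missing or is not a literal substring of the chunk.
        return {**violation, "severity": "warn",
                "note": "downgraded: quote not found inside cited chunk"}
    # The line number is computed from chunk metadata, never taken from the model.
    offset = chunk["text"][: chunk["text"].index(quote)].count("\n")
    return {
        **violation,
        "severity": configured_severity,
        "cited": f"{chunk['file']}:{chunk['start_line'] + offset}",
    }


sent = {
    "c1": {
        "file": "docs/architecture.md",
        "start_line": 5,
        "text": "## API\nAll external API calls MUST go through a file in src/services/.",
    }
}
verified = apply_citation_guard(
    {"chunk_id": "c1", "quote": "MUST go through a file in src/services/"}, sent, "block"
)
paraphrased = apply_citation_guard(
    {"chunk_id": "c1", "quote": "Services must always be used"}, sent, "block"
)
print(verified["severity"], verified["cited"])
print(paraphrased["severity"], paraphrased["note"])
```

&lt;p&gt;The design choice worth noting is that the check is plain string matching in ordinary code. The model can be wrong about what a doc says, but it cannot talk its way past a substring test.&lt;/p&gt;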

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install --save-dev @mobasshirkhan/docguard
npx docguard init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy53k9v10ih9kapzi13a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy53k9v10ih9kapzi13a.png" alt=" " width="799" height="381"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Drop your Groq API key into .env:&lt;/em&gt;&lt;br&gt;
&lt;code&gt;GROQ_API_KEY=gsk_...&lt;/code&gt;&lt;br&gt;
DocGuard reads .env from the repo root automatically. Get a free key at console.groq.com.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37yttszt8y93g0khgz3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37yttszt8y93g0khgz3p.png" alt=" " width="800" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;init writes .docguard.json, installs the hook (Husky-aware), pre-warms the embedding model, and adds the cache paths to .gitignore. Next time you git commit, the hook runs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpitjuh9dcjio66aw8i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpitjuh9dcjio66aw8i7.png" alt=" " width="755" height="786"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The config is short. Categories are fixed (security, architecture, api-contract, naming, style) and severity is per-category, not per-rule. No DSL, no rule authoring — your docs are the rules.&lt;/p&gt;

&lt;p&gt;No key? Network down? DocGuard falls through to "no semantic check, commit allowed." It's built to never block on its own failures — only on real, cited violations.&lt;/p&gt;

&lt;p&gt;Uninstall is one command and leaves no residue:&lt;br&gt;
&lt;code&gt;npx docguard uninstall&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why I wrote this&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
I'm spending more and more time letting AI write code. The code is fine. The architecture isn't.&lt;/p&gt;

&lt;p&gt;The honest fix isn't "use AI less." It's: write your rules down where the AI can see them, and have something that checks the AI's output against those rules before it enters the repo.&lt;/p&gt;

&lt;p&gt;That's all DocGuard is. A small, opinionated, local thing that closes one specific gap that nothing else closes.&lt;/p&gt;

&lt;p&gt;It's open source, MIT, on npm:&lt;br&gt;
(&lt;a href="https://www.npmjs.com/package/@mobasshirkhan/docguard/v/0.1.1-beta.2" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@mobasshirkhan/docguard/v/0.1.1-beta.2&lt;/a&gt;)&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
