<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Khishamuddin Syed</title>
    <description>The latest articles on DEV Community by Khishamuddin Syed (@webkmsyed).</description>
    <link>https://dev.to/webkmsyed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1302847%2F6e688acd-3393-4f40-9c53-e9a24d453d22.jpg</url>
      <title>DEV Community: Khishamuddin Syed</title>
      <link>https://dev.to/webkmsyed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/webkmsyed"/>
    <language>en</language>
    <item>
      <title>RAG vs Fine-Tuning</title>
      <dc:creator>Khishamuddin Syed</dc:creator>
      <pubDate>Sun, 24 May 2026 09:41:00 +0000</pubDate>
      <link>https://dev.to/webkmsyed/rag-vs-fine-tuning-2k9h</link>
      <guid>https://dev.to/webkmsyed/rag-vs-fine-tuning-2k9h</guid>
      <description>&lt;p&gt;Everyone explains what RAG and fine-tuning are. Nobody tells you how to decide which one your project actually needs. Here's the honest breakdown.&lt;/p&gt;

&lt;p&gt;I've seen this question come up in every AI project discussion I've been part of recently: &lt;em&gt;"Should we use RAG or fine-tune the model?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And I've watched teams get it wrong in both directions. One team spent three months on a fine-tuning pipeline when a basic RAG setup would have solved their problem in a week. Another team built a full retrieval system for a use case where the model just needed to learn a consistent output format.&lt;/p&gt;

&lt;p&gt;The problem isn't that people don't understand what RAG and fine-tuning are. Most people have a rough idea. The problem is knowing &lt;em&gt;which one to actually reach for&lt;/em&gt; when you're staring at a real project with real constraints.&lt;/p&gt;

&lt;p&gt;That's what this article is about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick recap: what each one actually does
&lt;/h2&gt;

&lt;p&gt;Before getting into the decision framework, let me establish a baseline because these two things are genuinely different at a fundamental level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; changes what the model &lt;em&gt;sees&lt;/em&gt; at inference time. When a query comes in, a retrieval system searches your knowledge base, pulls the most relevant chunks, and injects them into the model's context window alongside the user's question. The model itself is untouched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query
    ↓
Search knowledge base
    ↓
Retrieve top N relevant chunks
    ↓
[System prompt] + [Retrieved chunks] + [User query] → LLM → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; changes how the model &lt;em&gt;behaves&lt;/em&gt; permanently. You take a pretrained model and train it further on your own dataset, updating its internal weights. After fine-tuning, every single response reflects what you taught it, without needing to retrieve anything.&lt;/p&gt;

&lt;p&gt;The one-line version: &lt;strong&gt;RAG changes what the model can see right now. Fine-tuning changes how the model tends to behave every time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to go deeper on how LLMs work under the hood before reading further, the full breakdown is at &lt;a href="https://blog.jargoniseasy.com/what-is-an-llm-how-do-large-language-models-work" rel="noopener noreferrer"&gt;What Is an LLM?&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real question nobody asks
&lt;/h2&gt;

&lt;p&gt;Most articles frame this as "RAG for knowledge, fine-tuning for behavior." That's true but incomplete. The question that actually matters in production is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does your intelligence need to live?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the model's weights (baked in permanently)&lt;/li&gt;
&lt;li&gt;In an external knowledge store (retrieved at runtime)&lt;/li&gt;
&lt;li&gt;In both&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you think about it this way, the decision usually becomes much clearer.&lt;/p&gt;




&lt;h2&gt;
  
  
  When RAG is the right call
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your data changes frequently
&lt;/h3&gt;

&lt;p&gt;If you're building on top of documentation that gets updated, a knowledge base that grows, product information that changes, or anything with a timestamp on it RAG is the obvious choice. You update your vector database. The model doesn't need to be retrained. Done.&lt;/p&gt;

&lt;p&gt;Fine-tuning for this use case is painful: every time your data changes, you need to retrain. That's expensive, slow, and operationally annoying.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need the model to cite sources
&lt;/h3&gt;

&lt;p&gt;RAG retrieves specific chunks from specific documents. You know exactly where the answer came from. This matters enormously in legal, medical, compliance, and customer support contexts where "the model said so" isn't enough justification.&lt;/p&gt;

&lt;p&gt;Fine-tuned models have absorbed knowledge into their weights. They can't point you back to a source because they don't "know" where they learned something from.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your knowledge base is large and diverse
&lt;/h3&gt;

&lt;p&gt;If you have thousands of documents covering wildly different topics, fine-tuning on all of it tends to produce a model that's mediocre across all of them. RAG lets you retrieve precisely what's relevant to each query you're not asking the model to remember everything, you're asking it to use what you give it.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need to reduce hallucination on factual questions
&lt;/h3&gt;

&lt;p&gt;When an LLM answers from its weights alone, it's working from memory. Memory is unreliable for specific facts, numbers, names, and dates. RAG grounds the response in actual retrieved text, which dramatically reduces hallucination on factual queries.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One thing worth knowing: if your entire knowledge base fits comfortably within the model's context window, you might not need RAG at all. For knowledge bases under roughly 200,000 tokens, full-context prompting (just stuffing everything in the prompt) can be faster and cheaper than building retrieval infrastructure. Always check the size before you architect anything.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  When fine-tuning is the right call
&lt;/h2&gt;

&lt;h3&gt;
  
  
  You need consistent output format or style
&lt;/h3&gt;

&lt;p&gt;If you want the model to always respond in a specific JSON structure, always use a particular tone, always follow a domain-specific template fine-tuning is much more reliable than prompting for this. You can instruct a model to follow a format in a system prompt, but it will occasionally deviate. A fine-tuned model that's been trained on hundreds of examples of the correct format almost never does.&lt;/p&gt;

&lt;h3&gt;
  
  
  You're working with specialized domain language
&lt;/h3&gt;

&lt;p&gt;Medical terminology, legal language, financial jargon, industry-specific acronyms if your domain has vocabulary and reasoning patterns that a general-purpose model handles poorly, fine-tuning on domain examples improves baseline performance significantly.&lt;/p&gt;

&lt;p&gt;This is different from giving the model domain knowledge (which RAG handles). It's about the model understanding how to &lt;em&gt;reason&lt;/em&gt; in a domain, not just what words are used.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your queries are consistent and repetitive
&lt;/h3&gt;

&lt;p&gt;Customer support bots that handle the same 50 questions in slightly different phrasings. Code completion tools for a specific internal framework. Translation models for a specific style guide. When the task is well-defined and repetitive, fine-tuning is efficient: the model internalizes the pattern and executes it reliably without retrieving anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need faster inference at scale
&lt;/h3&gt;

&lt;p&gt;Every RAG call involves a retrieval step before the model even starts generating. At low volume, this is negligible. At high volume with latency requirements, the retrieval overhead matters. A fine-tuned model that doesn't need to retrieve anything is faster per query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt size is a cost constraint
&lt;/h3&gt;

&lt;p&gt;RAG injects retrieved chunks into the context, which means longer prompts, which means more tokens per call, which means higher API costs. If you're running millions of queries per day, that adds up. A fine-tuned model handles this knowledge internally without bloating the prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;Fine-Tuning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Good for new/changing knowledge&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No, needs retraining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good for consistent format/style&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can cite sources&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reduces hallucination on facts&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upfront cost&lt;/td&gt;
&lt;td&gt;Low to medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ongoing maintenance&lt;/td&gt;
&lt;td&gt;Update the DB&lt;/td&gt;
&lt;td&gt;Retrain when data shifts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to production&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;td&gt;Weeks to months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk of degrading base model&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Real risk if data is poor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works with any base model&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Tied to the model you trained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The "just prompt it" option people forget
&lt;/h2&gt;

&lt;p&gt;Before committing to either approach, ask one question: can a good system prompt solve this?&lt;/p&gt;

&lt;p&gt;Seriously. I've watched teams spin up RAG pipelines for knowledge bases that had 10 documents totalling 8,000 words. Just put them in the system prompt. Done. No infrastructure, no embeddings, no vector database. Works fine.&lt;/p&gt;

&lt;p&gt;Similarly, before fine-tuning for a specific output format, test how far a detailed system prompt with examples gets you. A few good few-shot examples in the prompt often match what fine-tuning would give you, at zero additional cost.&lt;/p&gt;

&lt;p&gt;Prompt engineering is underrated as a first step. MDN-style documentation on how your model provider handles system prompts is worth reading before you build anything.&lt;/p&gt;

&lt;p&gt;For OpenAI specifically, the &lt;a href="https://platform.openai.com/docs/guides/prompt-engineering" rel="noopener noreferrer"&gt;Prompt Engineering Guide&lt;/a&gt; is worth reading front to back before you decide you need fine-tuning. It covers few-shot examples, JSON mode, and structured outputs all of which replace fine-tuning for a surprisingly large set of use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  A decision framework that actually works
&lt;/h2&gt;

&lt;p&gt;Here's the thinking process I go through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Does your data change frequently?
├── Yes → RAG
└── No → Continue

Do you need to cite sources?
├── Yes → RAG
└── No → Continue

Is the task about consistent behavior/style/format?
├── Yes → Fine-tuning
└── No → Continue

Does your domain have specialized reasoning patterns?
├── Yes → Fine-tuning (possibly + RAG)
└── No → Continue

Can a good system prompt solve this?
├── Yes → Just prompt it
└── No → Probably RAG for knowledge, fine-tuning for behavior
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  In production, it's usually both
&lt;/h2&gt;

&lt;p&gt;The "RAG vs fine-tuning" framing is a bit of a false choice. In 2026, most serious production systems use both. Fine-tune the model for domain reasoning patterns and consistent behavior, then add RAG on top for up-to-date factual grounding.&lt;/p&gt;

&lt;p&gt;The split that works well: &lt;strong&gt;volatile knowledge in retrieval, stable behavior in weights&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your product policies change every quarter RAG. Your model needs to always respond in a specific structured format fine-tuning. Your customer support knowledge base has 5,000 articles that get edited daily RAG. Your model needs to understand your company's internal code conventions fine-tuning.&lt;/p&gt;

&lt;p&gt;These aren't in conflict. They're solving different parts of the problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What nobody tells you about fine-tuning failures
&lt;/h2&gt;

&lt;p&gt;Fine-tuning has a failure mode that's subtle and annoying: &lt;strong&gt;catastrophic forgetting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you train a model on your domain-specific dataset, you can inadvertently degrade its general capabilities. Fine-tune too aggressively on a narrow dataset and you get a model that's great at your specific task and noticeably worse at everything else.&lt;/p&gt;

&lt;p&gt;The mitigation is data diversity: make sure your fine-tuning dataset isn't so narrow that the model loses general reasoning ability. Mix in general examples alongside your domain-specific ones. And always eval your fine-tuned model against a broad benchmark, not just your target task.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use RAG&lt;/strong&gt; when your knowledge changes, when you need sources, when your knowledge base is large and dynamic, or when you need to reduce factual hallucinations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use fine-tuning&lt;/strong&gt; when you need consistent behavior, domain-specific reasoning, faster inference, or lower token costs at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try prompting first&lt;/strong&gt; it solves more than people think and costs nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In production, use both&lt;/strong&gt; RAG for volatile knowledge, fine-tuning for stable behavior&lt;/li&gt;
&lt;li&gt;The question isn't which one is better. It's where your intelligence needs to live&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're new to how LLMs work and some of the terminology here felt unfamiliar, start with &lt;a href="https://blog.jargoniseasy.com/what-is-an-llm-how-do-large-language-models-work" rel="noopener noreferrer"&gt;How Large Language Models Actually Work&lt;/a&gt; it covers tokenization, context windows, training, and hallucination in plain English before you go deeper into architecture decisions like this one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built something with RAG or fine-tuning recently? Drop what you used and why in the comments. Real production decisions are always more interesting than the theory.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>Day 5: Building VS Code Extension in Public</title>
      <dc:creator>Khishamuddin Syed</dc:creator>
      <pubDate>Mon, 11 May 2026 03:25:55 +0000</pubDate>
      <link>https://dev.to/webkmsyed/day-5-building-vs-code-extension-in-public-2d3g</link>
      <guid>https://dev.to/webkmsyed/day-5-building-vs-code-extension-in-public-2d3g</guid>
      <description>&lt;h2&gt;
  
  
  Day 5: The bugs I didn't see coming
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Building DevFlow Suite in public a 7-day series documenting the full journey from idea to VS Code Marketplace.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I thought I was done. The features were working, the UI looked right, and the extension was loading without errors.&lt;/p&gt;

&lt;p&gt;Then I started actually using it.&lt;/p&gt;

&lt;p&gt;Here's what broke and more importantly, why it broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  The comment scanner race condition
&lt;/h3&gt;

&lt;p&gt;DevFlow Suite automatically scans your workspace for inline &lt;code&gt;//&lt;/code&gt; comments on every file save. The problem: when you deleted a comment and saved, the scanner would sometimes re-surface it.&lt;/p&gt;

&lt;p&gt;The delete event and the scan were both triggered by the same save. The delete hadn't flushed to state before the scanner ran.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; queue the scan behind the delete. Simple. Finding it: 3 hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  The line-number shift bug
&lt;/h3&gt;

&lt;p&gt;When you delete a comment, every comment below it shifts up by one line. The extension was storing line references as static numbers.&lt;/p&gt;

&lt;p&gt;Delete line 42. The comment that was on line 43 is now on 42 but the stored reference still says 43. It renders as missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; switch to text-based matching instead of line-number matching for identity. One line of logic. Two days of confusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows path handling
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;fs.copyFileSync&lt;/code&gt; and &lt;code&gt;vscode.diff&lt;/code&gt; both behaved differently on Windows when mixing absolute and relative paths. The diff view would silently fail.&lt;/p&gt;

&lt;p&gt;I don't use Windows as my primary machine. I caught this only because someone else tested it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; always test path operations on Windows if your extension touches the filesystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  The recycle bin ghost problem
&lt;/h3&gt;

&lt;p&gt;Items restored from the recycle bin would sometimes re-appear in trash on the next render. The &lt;code&gt;isInTrash&lt;/code&gt; check was using stale line numbers same root cause as the shift bug, different surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; switch &lt;code&gt;isInTrash&lt;/code&gt; to text-based matching. Same pattern, different location.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I learned from all of this
&lt;/h3&gt;

&lt;p&gt;VS Code extensions run in a strange environment. File events, webview persistence, command registration none of it behaves exactly like standard Node.js.&lt;/p&gt;

&lt;p&gt;The gap between "it works when I test it" and "it works reliably" is where all these bugs lived.&lt;/p&gt;

&lt;p&gt;The only real way to find them was to use the extension as if I hadn't built it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;PRD DOC and Past Updates:&lt;/strong&gt; &lt;a href="https://devflow-suite.notion.site/" rel="noopener noreferrer"&gt;https://devflow-suite.notion.site/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgiri59lpbjugkqnzn9n5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgiri59lpbjugkqnzn9n5.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>vscode</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
