<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: G V NIKITHA</title>
    <description>The latest articles on DEV Community by G V NIKITHA (@gv_nikitha).</description>
    <link>https://dev.to/gv_nikitha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3939285%2F89358741-a7d5-4727-afb4-ff0bc1f8ab49.png</url>
      <title>DEV Community: G V NIKITHA</title>
      <link>https://dev.to/gv_nikitha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gv_nikitha"/>
    <language>en</language>
    <item>
      <title>What is RAG? A Beginner's Guide to Retrieval-Augmented Generation (For Engineers Who Actually Build It)</title>
      <dc:creator>G V NIKITHA</dc:creator>
      <pubDate>Tue, 26 May 2026 10:30:20 +0000</pubDate>
      <link>https://dev.to/gv_nikitha/what-is-rag-a-beginners-guide-to-retrieval-augmented-generation-for-engineers-who-actually-build-2cg0</link>
      <guid>https://dev.to/gv_nikitha/what-is-rag-a-beginners-guide-to-retrieval-augmented-generation-for-engineers-who-actually-build-2cg0</guid>
      <description>&lt;p&gt;RAG sounds complicated.&lt;/p&gt;

&lt;p&gt;It's not.&lt;/p&gt;

&lt;p&gt;But a lot of introductions to RAG make it sound more mysterious than it actually is. They use terms like "semantic search" and "vector embeddings" and "retrieval pipeline" before explaining what the actual problem is.&lt;/p&gt;

&lt;p&gt;So let me start differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem RAG Solves
&lt;/h2&gt;

&lt;p&gt;Your AI model has a knowledge cutoff.&lt;/p&gt;

&lt;p&gt;If you're using Claude, GPT-4, or any modern LLM, it was trained on mostly public data up to a specific date. It has never seen your company's policies. It hasn't read your latest documentation. It doesn't understand your internal APIs.&lt;/p&gt;

&lt;p&gt;So when you ask it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"How do our authorization rules work?"&lt;/li&gt;
&lt;li&gt;"What's the return policy?"&lt;/li&gt;
&lt;li&gt;"What database schema do we use?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Makes something up (hallucination)&lt;/li&gt;
&lt;li&gt;Says it doesn't know&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both are bad in production.&lt;/p&gt;

&lt;p&gt;That's where RAG comes in.&lt;/p&gt;

&lt;p&gt;RAG doesn't retrain your model.&lt;br&gt;
RAG doesn't fine-tune anything.&lt;br&gt;
RAG doesn't give the model "new knowledge" in the traditional sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG does something simpler: it gives the model the right context before answering.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How RAG Actually Works
&lt;/h2&gt;

&lt;p&gt;Here's the flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Question
    ↓
Search Your Documents
    ↓
Get Relevant Excerpts
    ↓
Add Context to Prompt
    ↓
LLM Answers Based on Context
    ↓
Response to User
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;Let me break it down with a real example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: Customer Support Bot
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without RAG:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What's your return policy?"
LLM: "I don't have specific information about your company's return policy."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With RAG:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What's your return policy?"

[System retrieves from docs]:
"Returns are accepted within 30 days. Items must be unopened. 
Refunds processed in 5-7 business days..."

LLM: "Our return policy allows returns within 30 days for unopened items.
Refunds take 5-7 business days to process."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Parts of RAG
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Documents (Your Knowledge Base)
&lt;/h3&gt;

&lt;p&gt;This is everything you want the AI to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product documentation&lt;/li&gt;
&lt;li&gt;Internal policies&lt;/li&gt;
&lt;li&gt;API specifications&lt;/li&gt;
&lt;li&gt;Code repositories&lt;/li&gt;
&lt;li&gt;FAQs&lt;/li&gt;
&lt;li&gt;Previous conversations&lt;/li&gt;
&lt;li&gt;Business rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; These don't need to be in the LLM. They live in a database.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Retriever (Finding Relevant Info)
&lt;/h3&gt;

&lt;p&gt;When a user asks a question, you need to find the relevant documents quickly.&lt;/p&gt;

&lt;p&gt;This happens in two steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step A: Convert to Embeddings&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User question → numerical vector&lt;/li&gt;
&lt;li&gt;Your documents → numerical vectors (computed ahead of time, when you index them)&lt;/li&gt;
&lt;li&gt;These vectors live in a vector database (Pinecone, Weaviate, Milvus, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step B: Find Similarity&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare question vector to document vectors&lt;/li&gt;
&lt;li&gt;Return the most similar documents&lt;/li&gt;
&lt;li&gt;(Similarity is usually measured with cosine similarity or a distance metric such as Euclidean distance)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real talk:&lt;/strong&gt; You don't need to understand the math. You just need to know that vectors let you find "similar" documents really fast.&lt;/p&gt;
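&lt;p&gt;If you're curious what "similar" means in practice, here is a minimal sketch of Step B in plain Python. The three-dimensional vectors and document ids are made up for illustration. Real embeddings have hundreds or thousands of dimensions, and a vector database typically uses an approximate index instead of comparing against every document.&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def top_k(question_vector, doc_vectors, k=2):
    """Return the ids of the k documents most similar to the question."""
    ranked = sorted(
        doc_vectors,
        key=lambda doc_id: cosine_similarity(question_vector, doc_vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hand-made "embeddings" (hypothetical values, for illustration only).
DOC_VECTORS = {
    "return-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.6, 0.6, 0.1],
    "api-auth": [0.0, 0.2, 0.9],
}
QUESTION_VECTOR = [1.0, 0.2, 0.0]

print(top_k(QUESTION_VECTOR, DOC_VECTORS))
```

&lt;p&gt;The question vector points in nearly the same direction as the return-policy vector, so that document ranks first.&lt;/p&gt;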

&lt;h3&gt;
  
  
  3. The LLM (Answering with Context)
&lt;/h3&gt;

&lt;p&gt;Once you have the relevant documents, you add them to your prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a helpful customer support assistant.
Use the following context to answer questions:

[RETRIEVED DOCUMENTS GO HERE]

User Question: What's your return policy?

Answer:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM then answers based on the provided context.&lt;/p&gt;
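&lt;p&gt;As a rough sketch, filling in that template could look like this. The function name &lt;code&gt;build_prompt&lt;/code&gt; and the bracketed numbering are my own choices for illustration, not part of any library.&lt;/p&gt;

```python
def build_prompt(question, retrieved_chunks):
    """Assemble a prompt that places retrieved text ahead of the question."""
    # Number each chunk so the model (and you) can refer back to it.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "You are a helpful customer support assistant.\n"
        "Use the following context to answer questions:\n\n"
        f"{context}\n\n"
        f"User Question: {question}\n\n"
        "Answer:"
    )

chunks = [
    "Returns are accepted within 30 days.",
    "Refunds are processed in 5-7 business days.",
]
print(build_prompt("What's your return policy?", chunks))
```

&lt;p&gt;Numbering the chunks is optional, but it makes it easier to ask the model to cite which excerpt it used.&lt;/p&gt;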

&lt;h2&gt;
  
  
  Why RAG &amp;gt; Other Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  RAG vs. Fine-Tuning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train the model on your data&lt;/li&gt;
&lt;li&gt;Model learns your patterns permanently&lt;/li&gt;
&lt;li&gt;Slow to update (new information means another training run)&lt;/li&gt;
&lt;li&gt;Expensive&lt;/li&gt;
&lt;li&gt;Requires technical expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add documents to a database&lt;/li&gt;
&lt;li&gt;Updates as soon as new documents are indexed&lt;/li&gt;
&lt;li&gt;Cheap&lt;/li&gt;
&lt;li&gt;Simple to implement&lt;/li&gt;
&lt;li&gt;Works with any LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; For most projects, RAG is better. Fine-tuning is only better if you need the model to learn a specific writing style or very niche patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG vs. Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt Engineering:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"You're a helpful support bot. Here are all our policies... [paste 10,000 words]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wastes tokens (you're sending all the context every time)&lt;/li&gt;
&lt;li&gt;Context window limit&lt;/li&gt;
&lt;li&gt;Not all context is relevant to every question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send only relevant context&lt;/li&gt;
&lt;li&gt;Cheaper token usage&lt;/li&gt;
&lt;li&gt;Scales better&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; RAG is the better fit once your knowledge base no longer fits comfortably in a single prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Common Beginner Mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake #1: Dumping Everything Into the Vector DB
&lt;/h3&gt;

&lt;p&gt;Don't do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;documents = [
    "The quick brown fox jumped over the lazy dog. The dog was sleeping. The fox was fast.",
    "Our company was founded in 1995. We have 500 employees. We're based in San Francisco.",
    # ... one giant document per topic
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dilutes retrieval quality: a single embedding has to represent many unrelated facts, so it matches no specific question well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Break documents into chunks (usually 200-500 tokens per chunk; the one-sentence chunks below are shortened for illustration).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chunks = [
    "The quick brown fox jumped over the lazy dog.",
    "The dog was sleeping.",
    "The fox was fast.",
    "Our company was founded in 1995.",
    "We have 500 employees.",
    "We're based in San Francisco.",
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
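&lt;p&gt;Here is one possible way to produce chunks like these automatically. It is a simplified sketch: it counts words as a stand-in for tokens and overlaps neighbouring chunks so a sentence split across a boundary still appears whole in one of them. The function name and default sizes are assumptions, and a real pipeline would more likely use a proper tokenizer and split on sentence or section boundaries.&lt;/p&gt;

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping chunks, using words as a rough proxy for tokens."""
    step = max_words - overlap
    # Valid only when 0 to max_words - 1 words are shared between neighbours.
    if step not in range(1, max_words + 1):
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if not words[start + max_words:]:
            break
    return chunks

policy = (
    "Returns are accepted within 30 days. Items must be unopened. "
    "Refunds are processed in 5-7 business days."
)
for chunk in chunk_text(policy, max_words=8, overlap=2):
    print(chunk)
```

&lt;p&gt;The overlap costs a little extra storage, but it reduces the chance that the one sentence a user needs is cut in half.&lt;/p&gt;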



&lt;h3&gt;
  
  
  Mistake #2: Ignoring Retrieval Quality
&lt;/h3&gt;

&lt;p&gt;The best LLM won't help if you retrieve the wrong documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test your retrieval:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does searching for "return policy" actually return return policy docs?&lt;/li&gt;
&lt;li&gt;Does searching for "API authentication" return auth docs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If not, fix retrieval before blaming the LLM.&lt;/p&gt;
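&lt;p&gt;A small test harness makes this check repeatable. The sketch below measures how often the expected document shows up in the top results. The &lt;code&gt;keyword_retriever&lt;/code&gt; is a toy stand-in I wrote so the example runs on its own; in a real system you would pass in a function that queries your vector database. The file names and test cases are hypothetical.&lt;/p&gt;

```python
def keyword_retriever(query, documents, k=1):
    """Toy stand-in for a vector search: rank documents by shared words."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc_id: len(
            query_words.intersection(documents[doc_id].lower().split())
        ),
        reverse=True,
    )
    return ranked[:k]

def hit_rate(test_cases, retrieve, k=1):
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(1 for query, expected in test_cases if expected in retrieve(query, k))
    return hits / len(test_cases)

DOCUMENTS = {
    "returns.md": "return policy returns are accepted within 30 days",
    "auth.md": "api authentication uses bearer tokens",
}
TEST_CASES = [
    ("return policy", "returns.md"),
    ("API authentication", "auth.md"),
]

score = hit_rate(TEST_CASES, lambda q, k: keyword_retriever(q, DOCUMENTS, k))
print(f"hit rate: {score:.2f}")
```

&lt;p&gt;Run it again whenever you change chunk sizes or embedding models, so you can see whether retrieval got better or worse.&lt;/p&gt;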

&lt;h3&gt;
  
  
  Mistake #3: Fixed Chunk Sizes for Everything
&lt;/h3&gt;

&lt;p&gt;Not all documents need the same chunk size.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code files: larger chunks (keep context)&lt;/li&gt;
&lt;li&gt;FAQs: smaller chunks (specific answers)&lt;/li&gt;
&lt;li&gt;Documentation: medium chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiment.&lt;/p&gt;
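&lt;p&gt;One simple way to set this up is a lookup table keyed by document type. The sizes and the extension mapping below are hypothetical starting points, not recommendations from any library; tune them against your own retrieval tests.&lt;/p&gt;

```python
from pathlib import PurePath

# Hypothetical starting points in tokens; tune them against your own retrieval tests.
CHUNK_SIZES = {"code": 800, "docs": 400, "faq": 150}

# Hypothetical mapping from file extension to document type.
TYPE_BY_SUFFIX = {".py": "code", ".ts": "code", ".md": "docs", ".faq": "faq"}

def chunk_size_for(filename, default_type="docs"):
    """Pick a chunk size from the file extension, falling back to the docs size."""
    doc_type = TYPE_BY_SUFFIX.get(PurePath(filename).suffix.lower(), default_type)
    return CHUNK_SIZES[doc_type]

for name in ["billing/service.py", "guides/setup.md", "support/returns.faq"]:
    print(name, chunk_size_for(name))
```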

&lt;h3&gt;
  
  
  Mistake #4: Trusting Retrieval Without Verification
&lt;/h3&gt;

&lt;p&gt;Always keep the retrieved documents visible, both in your prompt and alongside the response, so:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The LLM can cite sources&lt;/li&gt;
&lt;li&gt;You can debug if answers are wrong&lt;/li&gt;
&lt;li&gt;Users know where info came from&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A Simple RAG System in Code
&lt;/h2&gt;

&lt;p&gt;Here's what basic RAG looks like with FastAPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Convert question to vector
&lt;/span&gt;    &lt;span class="n"&gt;question_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Search vector database
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Extract retrieved documents
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Create prompt with context
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer the question based on this context:

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Answer:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Get LLM response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. That's RAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Customer Support
&lt;/h3&gt;

&lt;p&gt;Retrieve FAQs and policies → answer customer questions&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Retrieve docs → answer employee questions&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Assistant
&lt;/h3&gt;

&lt;p&gt;Retrieve codebase → help developers understand patterns&lt;/p&gt;

&lt;h3&gt;
  
  
  Product Recommendations
&lt;/h3&gt;

&lt;p&gt;Retrieve product info → personalized suggestions&lt;/p&gt;

&lt;h3&gt;
  
  
  Content Generation
&lt;/h3&gt;

&lt;p&gt;Retrieve research → generate informed articles&lt;/p&gt;

&lt;h2&gt;
  
  
  When RAG Might Not Be Enough
&lt;/h2&gt;

&lt;p&gt;RAG works great for &lt;strong&gt;retrieval-based problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Tell me about X"&lt;/li&gt;
&lt;li&gt;"How do I do X?"&lt;/li&gt;
&lt;li&gt;"What's our policy on X?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG struggles with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning&lt;/strong&gt; across many documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculations&lt;/strong&gt; on structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time data&lt;/strong&gt; that changes constantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, you might need agents, tools, or specialized architectures.&lt;/p&gt;

&lt;p&gt;But that's a different post.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;RAG is not magic.&lt;/p&gt;

&lt;p&gt;It's just:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store documents in a way that's searchable&lt;/li&gt;
&lt;li&gt;Retrieve relevant documents&lt;/li&gt;
&lt;li&gt;Add them to the prompt&lt;/li&gt;
&lt;li&gt;Let the LLM answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple. Practical. Effective.&lt;/p&gt;

&lt;p&gt;And honestly, it's a big reason AI assistants can now work with your real data.&lt;/p&gt;

&lt;p&gt;Start simple. Add complexity later.&lt;/p&gt;

&lt;p&gt;That's how RAG actually works in production.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Stop Repeating Yourself to Your AI IDE — Use Rules Files Instead</title>
      <dc:creator>G V NIKITHA</dc:creator>
      <pubDate>Tue, 19 May 2026 16:20:35 +0000</pubDate>
      <link>https://dev.to/gv_nikitha/stop-repeating-yourself-to-your-ai-ide-use-rules-files-instead-7im</link>
      <guid>https://dev.to/gv_nikitha/stop-repeating-yourself-to-your-ai-ide-use-rules-files-instead-7im</guid>
      <description>&lt;p&gt;When I first started using AI coding tools seriously, I thought the biggest productivity boost would come from writing better prompts.&lt;/p&gt;

&lt;p&gt;So every session started the same way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use TypeScript&lt;/li&gt;
&lt;li&gt;Follow clean architecture&lt;/li&gt;
&lt;li&gt;Use TailwindCSS&lt;/li&gt;
&lt;li&gt;Add validation&lt;/li&gt;
&lt;li&gt;Keep components modular&lt;/li&gt;
&lt;li&gt;Avoid large functions&lt;/li&gt;
&lt;li&gt;Use async/await consistently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the next session would start…&lt;/p&gt;

&lt;p&gt;…and I’d type everything again.&lt;/p&gt;

&lt;p&gt;After a while, I realized something:&lt;/p&gt;

&lt;p&gt;The problem wasn’t my prompts anymore.&lt;/p&gt;

&lt;p&gt;The real problem was that the AI had no long-term understanding of my project.&lt;/p&gt;

&lt;p&gt;Every new chat felt like onboarding a new developer from scratch.&lt;/p&gt;

&lt;p&gt;That’s when I started exploring how tools like Cursor, Windsurf, Copilot, and Claude handle persistent context, memory, and project-level instructions.&lt;/p&gt;

&lt;p&gt;And honestly, this is where AI-assisted development starts becoming genuinely useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Most AI Workflows Reset Too Often
&lt;/h2&gt;

&lt;p&gt;A lot of developers still use AI tools like temporary conversations.&lt;/p&gt;

&lt;p&gt;You explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your tech stack&lt;/li&gt;
&lt;li&gt;your architecture&lt;/li&gt;
&lt;li&gt;your coding style&lt;/li&gt;
&lt;li&gt;your folder structure&lt;/li&gt;
&lt;li&gt;your naming conventions&lt;/li&gt;
&lt;li&gt;your preferred patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the session ends.&lt;/p&gt;

&lt;p&gt;The next session forgets everything.&lt;/p&gt;

&lt;p&gt;That’s why AI-generated code often feels inconsistent.&lt;/p&gt;

&lt;p&gt;One component follows your architecture perfectly.&lt;/p&gt;

&lt;p&gt;Another completely ignores it.&lt;/p&gt;

&lt;p&gt;One API includes proper validation.&lt;/p&gt;

&lt;p&gt;Another skips error handling entirely.&lt;/p&gt;

&lt;p&gt;One feature matches your project structure.&lt;/p&gt;

&lt;p&gt;Another creates an entirely new pattern.&lt;/p&gt;

&lt;p&gt;The AI itself is usually capable.&lt;/p&gt;

&lt;p&gt;What’s missing is persistent project context.&lt;/p&gt;

&lt;p&gt;Without that context, the AI generates code that works in isolation but doesn’t always fit the rest of the system.&lt;/p&gt;

&lt;p&gt;And in real-world projects, consistency matters a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cursor: Rules-Based AI Development
&lt;/h2&gt;

&lt;p&gt;Cursor handles this using rules files.&lt;/p&gt;

&lt;p&gt;Inside your project, you can define persistent instructions using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.cursor/rules/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can split rules into focused files like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;frontend.mdc
backend.mdc
architecture.mdc
security.mdc
testing.mdc
api-patterns.mdc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These files are not just prompts.&lt;/p&gt;

&lt;p&gt;They behave more like engineering standards for the AI.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Backend Standards&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Use FastAPI with async routes
&lt;span class="p"&gt;-&lt;/span&gt; Validate request bodies with Pydantic
&lt;span class="p"&gt;-&lt;/span&gt; Keep business logic outside route handlers
&lt;span class="p"&gt;-&lt;/span&gt; Use service/repository architecture
&lt;span class="p"&gt;-&lt;/span&gt; Return structured JSON responses
&lt;span class="p"&gt;-&lt;/span&gt; Add proper exception handling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when Cursor generates backend code, it already understands how your project is structured.&lt;/p&gt;

&lt;p&gt;That changes the development experience completely.&lt;/p&gt;

&lt;p&gt;Instead of repeatedly fixing architecture mistakes, you spend more time reviewing actual implementation logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Windsurf: Memory and Workspace Awareness
&lt;/h2&gt;

&lt;p&gt;Windsurf takes a slightly different approach.&lt;/p&gt;

&lt;p&gt;Instead of relying heavily on rules files, Windsurf focuses more on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;workspace memory&lt;/li&gt;
&lt;li&gt;conversational continuity&lt;/li&gt;
&lt;li&gt;contextual understanding&lt;/li&gt;
&lt;li&gt;project awareness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time, Windsurf starts recognizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your coding patterns&lt;/li&gt;
&lt;li&gt;preferred libraries&lt;/li&gt;
&lt;li&gt;folder structure&lt;/li&gt;
&lt;li&gt;naming styles&lt;/li&gt;
&lt;li&gt;repeated architectural decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of manually repeating:&lt;br&gt;
“Use TypeScript and modular architecture”&lt;/p&gt;

&lt;p&gt;…the AI gradually adapts to your workflow through repeated interaction and project context.&lt;/p&gt;

&lt;p&gt;That’s what makes Windsurf feel different.&lt;/p&gt;

&lt;p&gt;The experience becomes less like prompting a chatbot and more like working inside a development environment that slowly learns your habits.&lt;/p&gt;


&lt;h2&gt;
  
  
  GitHub Copilot: More Than Just Autocomplete
&lt;/h2&gt;

&lt;p&gt;A lot of developers still think of GitHub Copilot as smart autocomplete.&lt;/p&gt;

&lt;p&gt;But repository-level guidance is becoming increasingly important.&lt;/p&gt;

&lt;p&gt;Teams now combine Copilot with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repository instructions&lt;/li&gt;
&lt;li&gt;project documentation&lt;/li&gt;
&lt;li&gt;reusable prompts&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;editor configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because autocomplete alone does not guarantee consistency.&lt;/p&gt;

&lt;p&gt;Without context, Copilot might generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inconsistent API structures&lt;/li&gt;
&lt;li&gt;duplicated utility functions&lt;/li&gt;
&lt;li&gt;different validation styles&lt;/li&gt;
&lt;li&gt;conflicting architectural patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once project standards are introduced, the generated code becomes much more aligned with the rest of the application.&lt;/p&gt;


&lt;h2&gt;
  
  
  Claude Projects and Long-Term Context
&lt;/h2&gt;

&lt;p&gt;Claude Projects introduced another interesting idea:&lt;/p&gt;

&lt;p&gt;Persistent project context.&lt;/p&gt;

&lt;p&gt;Instead of starting every conversation from zero, you can attach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding standards&lt;/li&gt;
&lt;li&gt;architecture documentation&lt;/li&gt;
&lt;li&gt;technical references&lt;/li&gt;
&lt;li&gt;workflow notes&lt;/li&gt;
&lt;li&gt;project instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives the AI more continuity across longer development cycles.&lt;/p&gt;

&lt;p&gt;And honestly, continuity is one of the biggest missing pieces in AI-assisted engineering right now.&lt;/p&gt;

&lt;p&gt;Because real software development is not isolated code generation.&lt;/p&gt;

&lt;p&gt;It’s maintaining consistency across an evolving system.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Biggest Shift Is Happening at the Workflow Level
&lt;/h2&gt;

&lt;p&gt;I think this is the part many developers still underestimate.&lt;/p&gt;

&lt;p&gt;Most AI discussions focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better prompts&lt;/li&gt;
&lt;li&gt;prompt engineering&lt;/li&gt;
&lt;li&gt;prompt tricks&lt;/li&gt;
&lt;li&gt;prompt frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the bigger shift is actually happening at the workflow level.&lt;/p&gt;

&lt;p&gt;The developers getting the best results are building systems where the AI already understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project architecture&lt;/li&gt;
&lt;li&gt;engineering standards&lt;/li&gt;
&lt;li&gt;reusable patterns&lt;/li&gt;
&lt;li&gt;technical constraints&lt;/li&gt;
&lt;li&gt;coding conventions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changes the role of the developer.&lt;/p&gt;

&lt;p&gt;You stop micromanaging every single output.&lt;/p&gt;

&lt;p&gt;You start designing systems that guide AI behavior consistently.&lt;/p&gt;

&lt;p&gt;And that’s a much more scalable workflow.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Changed in My Own Workflow
&lt;/h2&gt;

&lt;p&gt;After moving toward persistent AI workflows, I noticed improvements almost immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generated code became more consistent&lt;/li&gt;
&lt;li&gt;Folder structures stopped drifting&lt;/li&gt;
&lt;li&gt;Validation patterns became predictable&lt;/li&gt;
&lt;li&gt;Refactoring became easier&lt;/li&gt;
&lt;li&gt;Repeated corrections dropped significantly&lt;/li&gt;
&lt;li&gt;Feature development became faster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the biggest improvement was mental.&lt;/p&gt;

&lt;p&gt;The AI stopped feeling random.&lt;/p&gt;

&lt;p&gt;It started feeling like an assistant that actually understood the project context.&lt;/p&gt;

&lt;p&gt;Not perfectly.&lt;/p&gt;

&lt;p&gt;But well enough to remove a huge amount of repetitive setup work.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Biggest Mistake Developers Make
&lt;/h2&gt;

&lt;p&gt;One common mistake is writing vague instructions.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write clean code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds useful, but it’s too abstract.&lt;/p&gt;

&lt;p&gt;AI tools work much better with specific operational guidance.&lt;/p&gt;

&lt;p&gt;Something like this is far more effective:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Use TypeScript everywhere
- Keep functions under 30 lines
- Add validation for all API inputs
- Avoid business logic inside UI components
- Extract reusable hooks for shared logic
- Use async/await consistently
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Specific instructions produce more predictable outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I still use prompts constantly.&lt;/p&gt;

&lt;p&gt;But I no longer think prompts alone are the foundation of good AI-assisted development.&lt;/p&gt;

&lt;p&gt;Persistent systems are.&lt;/p&gt;

&lt;p&gt;Rules files.&lt;br&gt;
Workspace memory.&lt;br&gt;
Project instructions.&lt;br&gt;
Architecture context.&lt;br&gt;
Reusable engineering standards.&lt;/p&gt;

&lt;p&gt;All of these reduce the need to repeatedly teach the AI the same things every session.&lt;/p&gt;

&lt;p&gt;And honestly, once you experience that workflow, traditional prompt-only development starts feeling surprisingly inefficient.&lt;/p&gt;

&lt;p&gt;The future of AI coding probably won’t belong to developers who write the longest prompts.&lt;/p&gt;

&lt;p&gt;It will belong to developers who build the best systems around the AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>windsurf</category>
      <category>antigravity</category>
    </item>
  </channel>
</rss>
