<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Miscanuk</title>
    <description>The latest articles on DEV Community by Michael Miscanuk (@michaelmiscanuk).</description>
    <link>https://dev.to/michaelmiscanuk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F593160%2F5429b010-0572-45b0-b902-db5d512e9dc8.png</url>
      <title>DEV Community: Michael Miscanuk</title>
      <link>https://dev.to/michaelmiscanuk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michaelmiscanuk"/>
    <language>en</language>
    <item>
      <title>Most RAG Problems Are Retrieval Problems. Here Are 8 Fixes That Worked for Me</title>
      <dc:creator>Michael Miscanuk</dc:creator>
      <pubDate>Sun, 14 Jun 2026 06:06:27 +0000</pubDate>
      <link>https://dev.to/michaelmiscanuk/most-rag-problems-are-retrieval-problems-here-are-8-fixes-that-worked-for-me-bg4</link>
      <guid>https://dev.to/michaelmiscanuk/most-rag-problems-are-retrieval-problems-here-are-8-fixes-that-worked-for-me-bg4</guid>
      <description>&lt;p&gt;The first few times a RAG system gave me a bad answer, I did what I think everyone does: I went and fiddled with the prompt. Made it stricter. Added an "only answer from the context" line. It barely moved the needle.&lt;/p&gt;

&lt;p&gt;What finally fixed things was looking one step earlier. Nine times out of ten the model wasn't the problem at all — the right passage just never showed up in the context window, so there was nothing to ground the answer on. You can't prompt your way out of missing evidence.&lt;/p&gt;

&lt;p&gt;So here's what I now reach for when retrieval is the weak link, roughly in the order I'd try them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Why it helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Get chunking right first&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chunk size sets the ceiling on everything downstream. Too big and the answer drowns in noise; too small and it loses the surrounding context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Add some overlap&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;chunk_overlap=200&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Carries a bit of the previous chunk's tail forward, so an answer that straddles a boundary doesn't get sliced in half.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contextual chunking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;chunk = f"CONTEXT: {llm(doc, chunk)}\n{chunk}"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sticks a short, model-written "here's where this fits" note on each chunk before you embed it. Makes otherwise-vague chunks findable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid search (BM25 + dense)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;collection.query.hybrid(query, alpha=0.5)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vectors are good at meaning, BM25 is good at literal strings — error codes, product names, that one weird acronym. Use both.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;co.rerank(model="rerank-v3.5", query=q, documents=texts, top_n=10)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A cross-encoder reads each query/doc pair properly and reorders them. Cheap next to the LLM call, big jump in precision.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parent-document retriever&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ParentDocumentRetriever(child_splitter=small)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search over tiny chunks so you hit the right spot, then hand the model the bigger surrounding chunk so it has room to reason.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rewrite the query&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;MultiQueryRetriever.from_llm(retriever, llm)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;People don't phrase questions the way docs are written. Generate a few variants, a fake answer to search with (HyDE), or a broader version of the question.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Filter on metadata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;where={"source": "handbook"}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cut the candidate set down by source, date, section, whatever — before the vector search runs. Faster and more accurate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fuse with RRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;RRF(d) = Σ 1/(k + rank_i(d))&lt;/code&gt;, &lt;code&gt;k=60&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Merges several ranked lists without needing their scores to line up.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;De-dupe / MMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SimilarityPostprocessor&lt;/code&gt; + dedup&lt;/td&gt;
&lt;td&gt;Stops you from handing the model three chunks that all say the same thing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
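
&lt;p&gt;If it helps to see what overlap does without a library in the way, here's a minimal sketch of a fixed-size character splitter. It is not how &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; works internally (that one also tries to break on separators such as paragraphs and sentences). The function name &lt;code&gt;chunk_text&lt;/code&gt; and the sample text are my own, purely for illustration.&lt;/p&gt;

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap          # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):    # this window reached the end
            break
    return chunks


# Illustrative input: 2,500 characters of repeating digits.
text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])     # the tail is carried forward
```

&lt;p&gt;Each window advances by &lt;code&gt;chunk_size - chunk_overlap&lt;/code&gt; characters, so the last 200 characters of one chunk reappear at the start of the next.&lt;/p&gt;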

&lt;h2&gt;
  
  
  The three I'd reach for first
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hybrid search, so you stop losing exact terms
&lt;/h3&gt;

&lt;p&gt;This one bit me directly. We had a support bot that could happily explain &lt;em&gt;what&lt;/em&gt; a connection error was but couldn't find the doc for &lt;code&gt;ERR_CONN_REFUSED&lt;/code&gt; specifically, because dense embeddings smear that exact token into "something about connections." BM25 finds it instantly. The fix is to run both retrievers and merge the two ranked lists with Reciprocal Rank Fusion — no score calibration needed, which is the nice part:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dense_hits&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sparse_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dense_hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse_hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;top_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# doc IDs, best first
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
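
&lt;p&gt;To watch the fusion step on its own, here's a small self-contained sketch that runs on plain lists of doc IDs. The helper name &lt;code&gt;rrf_fuse&lt;/code&gt; and the IDs are made up for the example. In a real pipeline the two lists would come from your dense and BM25 retrievers.&lt;/p&gt;

```python
def rrf_fuse(ranked_lists, k=60, top_n=10):
    """Fuse ranked lists of doc IDs (best first) with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up IDs: "err-conn" is second in the dense list and first in the sparse one.
dense_ids = ["net-overview", "err-conn", "tcp-basics"]
sparse_ids = ["err-conn", "faq", "net-overview"]
fused = rrf_fuse([dense_ids, sparse_ids])
print(fused)
```

&lt;p&gt;Only ranks go in, never raw scores, so it doesn't matter that BM25 and cosine similarity live on different scales. A doc that both retrievers rank well ends up ahead of one that only a single retriever liked.&lt;/p&gt;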



&lt;h3&gt;
  
  
  Reranking, so the best chunk is actually on top
&lt;/h3&gt;

&lt;p&gt;First-pass retrieval optimizes for recall, which means it casts a wide net and the genuinely best passage often sits at rank 7, not rank 1. A reranker takes that shortlist and scores each query/passage pair more carefully. It costs a bit more than the initial search but it's pocket change compared to the generation call, so there's not much reason to skip it. Pull a wide shortlist, rerank, keep the top few:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# grab 20–50, recall first
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cross_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Contextual chunking, so a chunk knows what it's about
&lt;/h3&gt;

&lt;p&gt;Picture a chunk that just says &lt;em&gt;"the limit is 5,000 requests per minute."&lt;/em&gt; Great — for which API? Which plan? On its own it's nearly unretrievable. The trick is to have an LLM write one line of context for each chunk and prepend it before embedding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Briefly situate this chunk within the document:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;CHUNK:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# index the contextualized version
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, it costs an extra LLM call per chunk at index time. You pay it once, and it's the single change that moved our numbers the most.&lt;/p&gt;
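
&lt;p&gt;Here's roughly what that index-time loop could look like, as a sketch rather than a finished pipeline. &lt;code&gt;contextualize_chunks&lt;/code&gt; is a name I made up, &lt;code&gt;llm&lt;/code&gt; is assumed to be any callable that takes a prompt string and returns a string, and &lt;code&gt;fake_llm&lt;/code&gt; with its canned reply is only there so the example runs offline.&lt;/p&gt;

```python
def contextualize_chunks(doc, chunks, llm):
    """Return each chunk prefixed with a short, model-written context note.

    `llm` is any callable that takes a prompt string and returns a string.
    """
    contextualized = []
    for chunk in chunks:                       # one LLM call per chunk
        prompt = (
            "Briefly situate this chunk within the document:\n"
            f"{doc}\n\nCHUNK:\n{chunk}"
        )
        context = llm(prompt).strip()
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized


# Stand-in for a real model call (hypothetical reply, made-up document).
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    return "From the Pro plan section of the Billing API docs."

doc = "Billing API guide. Pro plan: the limit is 5,000 requests per minute."
chunks = ["the limit is 5,000 requests per minute."]
out = contextualize_chunks(doc, chunks, fake_llm)
print(out[0])
print(len(calls))                              # number of LLM calls made
```

&lt;p&gt;The strings this returns are what you'd pass to your embedding step. The call count equals the chunk count, which is the cost mentioned above.&lt;/p&gt;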

&lt;h2&gt;
  
  
  Things I'd tell my past self not to do
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't start with the prompt.&lt;/strong&gt; If the evidence isn't in the context window, the prompt is irrelevant. Log what retrieval actually returned before you change anything else — half the time the bug is obvious the moment you look.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't set chunk size once and forget it.&lt;/strong&gt; It's the biggest lever you've got. Try 256, 512, 1024 on your own data and measure; the "right" default depends entirely on your docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't go dense-only.&lt;/strong&gt; Pure vector search will keep missing codes, IDs, and rare names. Bolting on BM25 is the cheapest real win after chunking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't skip reranking and just stuff 20 chunks in.&lt;/strong&gt; More context isn't better context — it's slower, pricier, and the model gets distracted. Retrieve wide, rerank, send five.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't crank &lt;code&gt;top_k&lt;/code&gt; to paper over bad ranking.&lt;/strong&gt; Past a handful of good chunks you're mostly adding noise. Fix the ordering, then trim.&lt;/li&gt;
&lt;/ul&gt;
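
&lt;p&gt;For the "log it and measure" advice above, a tiny harness is usually enough to get started. This is a sketch under my own assumptions: &lt;code&gt;hit_rate_at_k&lt;/code&gt; is a made-up helper, the eval set is a handful of hand-labelled (query, expected chunk ID) pairs, and the canned retriever stands in for whatever index you actually query.&lt;/p&gt;

```python
def hit_rate_at_k(eval_set, retrieve, k=5):
    """Fraction of queries whose expected chunk ID appears in the top k results.

    eval_set: list of (query, expected_id) pairs you label by hand.
    retrieve: callable (query, k) returning a ranked list of chunk IDs.
    """
    if not eval_set:
        return 0.0
    hits = 0
    for query, expected_id in eval_set:
        returned = retrieve(query, k)[:k]
        print(f"{query!r}: {returned}")        # log what retrieval returned
        if expected_id in returned:
            hits += 1
    return hits / len(eval_set)


# Toy retriever with canned results, standing in for a real index.
canned = {
    "What does ERR_CONN_REFUSED mean?": ["net-overview", "err-conn", "faq"],
    "What is the rate limit?": ["pricing", "faq", "changelog"],
}
eval_set = [
    ("What does ERR_CONN_REFUSED mean?", "err-conn"),
    ("What is the rate limit?", "rate-limits"),
]
score = hit_rate_at_k(eval_set, lambda q, k: canned[q], k=3)
print(score)
```

&lt;p&gt;Rebuild the index at each chunk size you want to compare, run the same eval set against each one, and keep the setting with the best hit rate. The printed lines double as the retrieval log.&lt;/p&gt;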

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Retrieval is where most of your RAG quality lives or dies, and the nice thing is these stack — cleaner chunks make the reranker's job easier, hybrid search feeds it better candidates, and so on. If you want the full version of this (embeddings, vector DBs, the query-rewriting tricks, agentic patterns, evaluation — way more than fits here), I keep a sorted reference over at &lt;a href="https://cheatgrid.com/generative-ai/0156-rag-retrieval-augmented-generation-cheat-sheet?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=rag-retrieval" rel="noopener noreferrer"&gt;CheatGrid's RAG cheat sheet&lt;/a&gt;. It's free and there's no signup.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
