<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Isaac Natarajan</title>
    <description>The latest articles on DEV Community by Isaac Natarajan (@isaacnatarajan).</description>
    <link>https://dev.to/isaacnatarajan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4002846%2F166a1ec3-97b5-4c53-9945-9377b06fd7fe.jpeg</url>
      <title>DEV Community: Isaac Natarajan</title>
      <link>https://dev.to/isaacnatarajan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/isaacnatarajan"/>
    <language>en</language>
    <item>
      <title>Building an Advanced RAG System with Multiple Chunking Strategies — A Practical Guide</title>
      <dc:creator>Isaac Natarajan</dc:creator>
      <pubDate>Thu, 25 Jun 2026 19:32:15 +0000</pubDate>
      <link>https://dev.to/isaacnatarajan/building-an-advanced-rag-system-with-multiple-chunking-strategies-a-practical-guide-36hj</link>
      <guid>https://dev.to/isaacnatarajan/building-an-advanced-rag-system-with-multiple-chunking-strategies-a-practical-guide-36hj</guid>
      <description>&lt;p&gt;I built an Advanced RAG system that compares 4 chunking strategies (fixed-size, recursive, semantic, hierarchical) on Apple's 10-K filings using NVIDIA NIM models, Qdrant, and custom evaluation metrics. Semantic chunking won with an overall score of 0.86. Here's everything I learned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Feo7zsyctwmu9otpwffof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Feo7zsyctwmu9otpwffof.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
Retrieval-Augmented Generation (RAG) is one of the most practical applications of LLMs today. Instead of relying solely on a model's training data, RAG retrieves relevant information from your own documents and uses it to generate accurate, grounded answers.&lt;/p&gt;

&lt;p&gt;But here's what most RAG tutorials skip: how you chunk your documents matters enormously. The same pipeline with different chunking strategies can produce wildly different results. I wanted to test this properly, so I built a system that runs 4 chunking strategies side by side on the same corpus and evaluates them with real metrics.&lt;/p&gt;

&lt;p&gt;In this post I'll walk through everything — data ingestion, chunking, vector storage, retrieval, generation, evaluation, and a Streamlit chatbot UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Tool / Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding Model&lt;/td&gt;
&lt;td&gt;NVIDIA llama-nemotron-embed-1b-v2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;NVIDIA llama-3.3-nemotron-super-49b-v1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Database&lt;/td&gt;
&lt;td&gt;Qdrant (local via Docker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline&lt;/td&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracing&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;Custom LLM-as-judge metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Streamlit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All NVIDIA models are accessed via &lt;a href="https://integrate.api.nvidia.com/v1" rel="noopener noreferrer"&gt;https://integrate.api.nvidia.com/v1&lt;/a&gt;, which is OpenAI-compatible, making integration straightforward.&lt;/p&gt;

&lt;p&gt;One important quirk with llama-3.3-nemotron-super-49b-v1.5: it has a thinking mode that must be explicitly disabled, and it needs a high max_tokens (8192+). Otherwise the model spends all its tokens on internal reasoning and returns None as content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.chat.completions.create(
    model=LLM_MODEL,
    messages=[...],
    max_tokens=8192,
    extra_body={"chat_template_kwargs": {"thinking": False}}
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similarly, the embedding model is asymmetric: queries and document chunks are embedded differently, so each call needs an input_type parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# For document chunks
client.embeddings.create(model=EMBED_MODEL, input=text, extra_body={"input_type": "passage"})

# For queries
client.embeddings.create(model=EMBED_MODEL, input=query, extra_body={"input_type": "query"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;&lt;br&gt;
I used Apple's 10-K annual reports for 2022 and 2023, downloaded directly from Apple's investor relations page as PDFs. Financial documents are ideal for this kind of project because they have mixed content — dense paragraphs, tables, numbered sections, and boilerplate — which makes chunking strategy comparison genuinely meaningful.&lt;/p&gt;

&lt;p&gt;Extraction with pdfplumber:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

import pdfplumber

def load_pdfs():
    """Extract and clean the text of every PDF in data/pdfs."""
    documents = []
    for filename in os.listdir("data/pdfs"):
        if filename.endswith(".pdf"):
            with pdfplumber.open(f"data/pdfs/{filename}") as pdf:
                full_text = ""
                for page in pdf.pages:
                    text = page.extract_text()
                    if text:  # skip pages with no extractable text
                        full_text += text + "\n"
            # clean_text() is the project's own cleaning helper
            documents.append({"filename": filename, "text": clean_text(full_text)})
    return documents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After cleaning, I ended up with ~221k characters from the 2022 report and ~207k from 2023.&lt;/p&gt;
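
&lt;p&gt;The clean_text helper is not shown above. A minimal sketch of what such a cleaner might do, assuming the goal is only to undo PDF line-wrapping artifacts; the rules below are illustrative assumptions, not the project's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
import re


def clean_text(raw):
    """Hypothetical cleaner: normalize whitespace in extracted PDF text."""
    # Re-join words hyphenated across a line break ("reve-\nnue" becomes "revenue").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Cap consecutive blank lines at one.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Re-joining hyphenated words can also merge legitimate compounds that happen to break at a line end (for example year-end), so it is worth spot-checking the output.&lt;/p&gt;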

&lt;p&gt;&lt;strong&gt;The 4 Chunking Strategies&lt;/strong&gt;&lt;br&gt;
This is the heart of the project. Each strategy produces a different number and quality of chunks from the same documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Fixed-size Chunking&lt;/strong&gt;&lt;br&gt;
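The simplest approach: cut the text into pieces of roughly N characters with some overlap, ignoring sentence and paragraph boundaries. With separator="\n", CharacterTextSplitter splits on line breaks and packs lines into chunks of up to 512 characters.&lt;br&gt;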
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=50, separator="\n")
chunks = splitter.split_text(text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: 951 chunks&lt;br&gt;
Pros: Fast, simple, predictable&lt;br&gt;
Cons: Cuts across sentences and paragraphs, losing context&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Recursive Character Splitting&lt;/strong&gt;&lt;br&gt;
LangChain's recommended general-purpose splitter. It tries to split on paragraph breaks first, then line breaks, sentences, and words, preserving semantic units as much as possible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " ", ""]
)
chunks = splitter.split_text(text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: 954 chunks&lt;br&gt;
Pros: Smarter splits, respects natural language boundaries&lt;br&gt;
Cons: Still fixed-size, just more intelligent about where to cut&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Semantic Chunking&lt;/strong&gt;&lt;br&gt;
Instead of splitting by size, this approach embeds every sentence and splits where the semantic similarity between adjacent sentences drops below a threshold. Topics stay together, and topic boundaries become chunk boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def semantic_chunking(documents, threshold=0.6, min_chunk_size=200):
    # Embed every sentence
    # Split where cosine similarity drops below threshold
    # Merge tiny chunks to ensure minimum size
    ...  # outline only; the body is omitted here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key insight: The threshold matters enormously. A higher threshold splits more often, because more adjacent-sentence pairs fall below it. At 0.8 I got 3281 tiny chunks that couldn't answer questions. Lowering to 0.6 produced 1123 meaningful chunks that performed much better.&lt;/p&gt;

&lt;p&gt;Result: 1123 chunks (after tuning)&lt;br&gt;
Pros: Topically coherent chunks, great for complex documents&lt;br&gt;
Cons: Slow (embeds every sentence), sensitive to threshold choice&lt;/p&gt;
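
&lt;p&gt;The outline above can be sketched end to end as follows. This is a simplified, hypothetical version: toy_embed is a bag-of-words stand-in for the real embedding call, the regex sentence splitter is naive (it will break on decimals such as 0.6), and instead of merging tiny chunks afterwards it simply refuses to split while the current chunk is shorter than min_chunk_size.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_embed(sentence):
    """Stand-in for the real embedding call: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def semantic_chunks(text, embed=toy_embed, threshold=0.6, min_chunk_size=200):
    """Start a new chunk where adjacent-sentence similarity falls below threshold."""
    # Naive sentence split: a run of non-terminators plus its closing punctuation.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text) if s.strip()]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        similar = cosine(prev_vec, vec) >= threshold
        too_small = min_chunk_size > len(chunks[-1])
        if similar or too_small:
            chunks[-1] += " " + sentence  # same topic, or chunk still too small
        else:
            chunks.append(sentence)       # similarity dropped: topic boundary
    return chunks


text = "Apple sells the iPhone. Apple sells the iPad. Interest rates rose sharply."
loose = semantic_chunks(text, threshold=0.6, min_chunk_size=0)   # 2 chunks
strict = semantic_chunks(text, threshold=0.8, min_chunk_size=0)  # 3 chunks
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first two toy sentences have a similarity of 0.75, so they stay together at a threshold of 0.6 and split apart at 0.8. That is the same effect, in miniature, as the 1123 vs 3281 chunk counts above.&lt;/p&gt;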

&lt;p&gt;&lt;strong&gt;4. Hierarchical Chunking&lt;/strong&gt;&lt;br&gt;
Store small chunks for precise retrieval, but return their larger parent chunk to the LLM for rich context. Best of both worlds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
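&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_text_splitters import RecursiveCharacterTextSplitter

# Large parents give the LLM context; small children give precise retrieval.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=50)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)

chunks = []
for parent in parent_splitter.split_text(text):
    for child in child_splitter.split_text(parent):
        # Each child carries its parent's text (other payload fields elided).
        chunks.append({"text": child, "parent_text": parent})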
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During retrieval, the child chunk is used to find the right section, but the parent text is returned to the LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if strategy == "hierarchical" and "parent_text" in result.payload:
    text = result.payload["parent_text"]  # Return richer context
else:
    text = result.payload["text"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: 1070 chunks&lt;br&gt;
Pros: Precise retrieval + rich context, perfect faithfulness scores&lt;br&gt;
Cons: More storage, context recall can suffer if parent chunks are too broad&lt;/p&gt;
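
&lt;p&gt;Because several child chunks can point at the same parent, returning parent text for every hit can send the same passage to the LLM more than once. A small sketch of one way to handle that, assuming each hit's payload is a dict shaped like the chunks above (parent_contexts and top_k are hypothetical names, and whether the original pipeline deduplicates is not stated):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
def parent_contexts(payloads, top_k=3):
    """Map child hits to parent texts, dropping repeated parents but keeping rank order."""
    seen = set()
    contexts = []
    for payload in payloads:
        # Fall back to the child's own text when no parent was stored.
        parent = payload.get("parent_text", payload["text"])
        if parent not in seen:
            seen.add(parent)
            contexts.append(parent)
    return contexts[:top_k]
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;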

&lt;p&gt;&lt;strong&gt;Advanced RAG Techniques&lt;/strong&gt;&lt;br&gt;
Beyond chunking, I added three techniques to improve retrieval quality:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Rewriting&lt;/strong&gt;&lt;br&gt;
Before searching, the LLM generates 3 variations of the user's query to capture different aspects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Original: "What was Apple's total revenue in 2023?"
# Rewritten:
# 1. "What was Apple Inc.'s total revenue for fiscal year ending September 2023?"
# 2. "How much revenue did Apple generate during its 2023 fiscal period?"
# 3. "What is Apple's consolidated revenue for the twelve months ending 2023?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each variation searches the vector store independently; the results are then deduplicated and ranked by score.&lt;/p&gt;
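
&lt;p&gt;The merge step is not shown here; one plausible sketch follows. It assumes each search returns (chunk_id, score) pairs and that a chunk found by several variations keeps its highest score. merge_results and top_k are illustrative names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
def merge_results(result_lists, top_k=5):
    """Deduplicate hits from several query variants, keeping each chunk's best score."""
    best = {}
    for hits in result_lists:
        for chunk_id, score in hits:
            best[chunk_id] = max(score, best.get(chunk_id, score))
    # Highest score first, then keep only the top_k chunks.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;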

&lt;p&gt;&lt;strong&gt;Hybrid Search (Dense + BM25)&lt;/strong&gt;&lt;br&gt;
Combines dense vector search (semantic meaning) with BM25 keyword search (exact term matching). Financial documents have specific numbers and terminology where exact matching helps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Weighted sum: 70% dense (semantic) score, 30% BM25 (keyword) score
combined_score = dense_score * 0.7 + bm25_score * 0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
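
&lt;p&gt;One caveat when combining the two: BM25 scores are unbounded while cosine similarities are not, so a weighted sum is usually applied after putting both on a common scale. A minimal sketch, assuming min-max normalization per query (an assumption; the normalization used in the project is not stated) and scores held in dicts keyed by chunk id:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
def min_max(scores):
    """Rescale a dict of scores to the 0..1 range (all-equal scores map to 1.0)."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def hybrid_scores(dense, bm25, dense_weight=0.7, bm25_weight=0.3):
    """Combine normalized dense and BM25 scores, best chunk first."""
    dense_n, bm25_n = min_max(dense), min_max(bm25)
    combined = {}
    for chunk_id in set(dense) | set(bm25):
        # A chunk missing from one retriever contributes 0.0 for that side.
        combined[chunk_id] = (dense_weight * dense_n.get(chunk_id, 0.0)
                              + bm25_weight * bm25_n.get(chunk_id, 0.0))
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;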



&lt;p&gt;&lt;strong&gt;Contextual Compression&lt;/strong&gt;&lt;br&gt;
Before passing chunks to the LLM, extract only the sentences relevant to the query. Reduces noise and token usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# From a 500-word chunk about Apple's products and revenue,
# extract only the 2 sentences about revenue figures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
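
&lt;p&gt;As a rough illustration of the idea, the sketch below keeps only sentences that share keywords with the query. Keyword overlap is a deliberately simple stand-in for whatever extractor the pipeline really uses (typically an LLM or embedding-based filter); compress, keywords, and the stopword list are all hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
import re

STOPWORDS = {"a", "an", "and", "did", "in", "is", "of", "the", "was", "what"}


def keywords(text):
    """Lowercased words with a tiny stopword list removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def compress(chunk, query, min_overlap=1):
    """Keep only sentences sharing at least min_overlap keywords with the query."""
    query_terms = keywords(query)
    # Naive sentence split; it will also break on decimal points.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", chunk) if s.strip()]
    kept = [s for s in sentences
            if len(keywords(s).intersection(query_terms)) >= min_overlap]
    return " ".join(kept)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;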



&lt;p&gt;&lt;strong&gt;Vector Storage with Qdrant&lt;/strong&gt;&lt;br&gt;
I chose Qdrant over ChromaDB for its better performance, built-in hybrid search support, and production-readiness. Running locally via Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -p 6333:6333 qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chunking strategy gets its own collection (2048-dimensional vectors from the NVIDIA embedding model):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COLLECTION_NAMES = {
    "fixed_size": "fixed_size_collection",
    "recursive": "recursive_collection",
    "semantic": "semantic_collection",
    "hierarchical": "hierarchical_collection"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One lesson learned: upsert in batches of 100, not all at once. Sending 1000+ points in a single request causes connection timeouts.&lt;/p&gt;
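
&lt;p&gt;A minimal batching sketch. The upsert argument is any callable that sends one batch; with qdrant-client it could wrap client.upsert, as in the comment below. The helper names are illustrative rather than taken from the project:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
def batched(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_in_batches(upsert, points, batch_size=100):
    """Call upsert(batch) once per batch and return how many calls were made."""
    calls = 0
    for batch in batched(points, batch_size):
        upsert(batch)
        calls += 1
    return calls


# With qdrant-client, the callable could look like this (assumed wiring):
# upsert = lambda batch: client.upsert(collection_name="semantic_collection", points=batch)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;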

&lt;p&gt;&lt;strong&gt;Evaluation Framework&lt;/strong&gt;&lt;br&gt;
I originally planned to use RAGAS but ran into dependency conflicts with the latest version. Instead of spending hours fighting package versions, I built custom LLM-as-judge metrics, which actually give more control and transparency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 4 Metrics&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Faithfulness —&lt;/strong&gt; Does the answer stick to the retrieved context, or does the model hallucinate?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer Relevance —&lt;/strong&gt; Does the response actually address the question asked?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Precision —&lt;/strong&gt; Of what was retrieved, how much was actually relevant?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Recall —&lt;/strong&gt; Does the context contain enough information to answer the question?&lt;/p&gt;

&lt;p&gt;Each metric prompts the LLM to return a score between 0.0 and 1.0:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def faithfulness(answer, contexts):
    context = "\n\n".join([c[:300] for c in contexts])
    prompt = f"""Given this context: {context}
And this answer: {answer}
Is the answer fully supported by the context? Reply with just a number: 1.0 for yes, 0.5 for partially, 0.0 for no."""
    return llm_score(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
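
&lt;p&gt;The llm_score helper is not shown. One piece of it that is easy to get wrong is turning the judge's free-text reply into a number, so here is a hedged sketch of just that parsing step. parse_score is a hypothetical name, and defaulting to 0.0 on an empty or unparseable reply is an assumption; raising an error is an equally reasonable choice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
import re


def parse_score(reply, default=0.0):
    """Extract the first number from a judge reply and clamp it to 0.0..1.0."""
    if not reply:  # None or empty string, e.g. when the token budget ran out
        return default
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return default
    return min(1.0, max(0.0, float(match.group())))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;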



&lt;p&gt;&lt;strong&gt;LangSmith Tracing&lt;/strong&gt;&lt;br&gt;
Every pipeline run — query, strategy, response, contexts, and all 4 metric scores — is logged to LangSmith automatically. This runs silently in the background and gives a full audit trail of every evaluation run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;br&gt;
After evaluating all 4 strategies on 5 financial questions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Faithfulness&lt;/th&gt;
&lt;th&gt;Ans. Relevance&lt;/th&gt;
&lt;th&gt;Ctx. Precision&lt;/th&gt;
&lt;th&gt;Ctx. Recall&lt;/th&gt;
&lt;th&gt;Overall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed-size&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.62&lt;/td&gt;
&lt;td&gt;0.60&lt;/td&gt;
&lt;td&gt;0.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recursive&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.89&lt;/td&gt;
&lt;td&gt;0.60&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.76&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;td&gt;0.86 🏆&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hierarchical&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.69&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.81&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
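
&lt;p&gt;How the Overall column is derived is not stated, but the numbers are consistent, to within rounding, with an unweighted mean of the four metrics. A quick check under that assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
```python
from statistics import mean

# Per-metric scores copied from the results table, in column order:
# faithfulness, answer relevance, context precision, context recall.
SCORES = {
    "Fixed-size":   [0.70, 1.00, 0.62, 0.60],
    "Recursive":    [0.70, 1.00, 0.89, 0.60],
    "Semantic":     [0.90, 1.00, 0.76, 0.80],
    "Hierarchical": [1.00, 1.00, 0.69, 0.54],
}

# Assumed aggregation: plain average of the four metrics.
overall = {name: mean(values) for name, values in SCORES.items()}
best = max(overall, key=overall.get)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This gives 0.73, 0.7975, 0.865, and 0.8075, in line with the table's 0.73, 0.80, 0.86, and 0.81.&lt;/p&gt;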

&lt;p&gt;&lt;strong&gt;Key Findings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic chunking wins overall (0.86):&lt;/strong&gt; After tuning the threshold from 0.8 to 0.6, semantic chunking produced the best context recall (0.80) and the second-best faithfulness (0.90, behind hierarchical's 1.00). Topically coherent chunks mean the LLM gets focused, relevant context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical has perfect faithfulness (1.00):&lt;/strong&gt; Returning parent text to the LLM gives it richer, more complete context to work with, and the judge flagged no hallucination on these 5 questions. It also has the lowest context recall (0.54).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recursive has the best context precision (0.89):&lt;/strong&gt; Smart splitting means retrieved chunks are highly relevant to the query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed-size is weakest (0.73) but simplest:&lt;/strong&gt; It works fine as a baseline but leaves performance on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streamlit Chatbot&lt;/strong&gt;&lt;br&gt;
To make the project interactive, I built a Streamlit UI that lets you switch between chunking strategies in real time and see retrieved contexts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strategy = st.selectbox("Chunking Strategy", 
    ["fixed_size", "recursive", "semantic", "hierarchical"])

if prompt := st.chat_input("Ask about Apple's financials..."):
    result = rag_pipeline(prompt, strategy, use_rewriting=True, use_compression=True)
    st.markdown(result["response"])

    with st.expander("Retrieved Contexts"):
        for ctx in result["contexts"]:
            st.markdown(ctx)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with streamlit run app.py. Try asking comparison questions like "How did iPhone revenue change between 2022 and 2023?" to see how different strategies handle multi-document retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. The NVIDIA Nemotron model needs special handling&lt;/strong&gt;&lt;br&gt;
The model has a built-in thinking mode. Always set max_tokens=8192 and chat_template_kwargs: {"thinking": False} or you'll get None responses as the model exhausts its token budget on internal reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Semantic chunking threshold is critical&lt;/strong&gt;&lt;br&gt;
Threshold of 0.8 → 3281 tiny, useless chunks. Threshold of 0.6 → 1123 meaningful chunks. Always add a minimum chunk size as a guard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Hierarchical chunking needs parent text for generation&lt;/strong&gt;&lt;br&gt;
If you retrieve child chunks but pass child text to the LLM, context recall suffers. Always return the parent text to the LLM while using the child for retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Batch your Qdrant upserts&lt;/strong&gt;&lt;br&gt;
Sending all vectors at once causes connection timeouts. Batch in groups of 100.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Build custom eval metrics when RAGAS doesn't cooperate&lt;/strong&gt;&lt;br&gt;
Dependency conflicts are real. Custom LLM-as-judge metrics are transparent, flexible, and work with any model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Evaluation reveals what tuning hides&lt;/strong&gt;&lt;br&gt;
Without evaluation, I would never have caught that semantic was producing tiny useless chunks, or that hierarchical was ignoring parent text. Run eval early and often.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full source code available at: &lt;a href="https://github.com/IsaacNatarajan/Advanced-RAG/" rel="noopener noreferrer"&gt;https://github.com/IsaacNatarajan/Advanced-RAG/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built with NVIDIA NIM, Qdrant, LangChain, LangSmith, and Streamlit. If you found this useful, drop a ❤️ and feel free to ask questions in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
