<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: surajrkhonde</title>
    <description>The latest articles on DEV Community by surajrkhonde (@surajrkhonde).</description>
    <link>https://dev.to/surajrkhonde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1105683%2F1391e193-ba7a-4e01-8e5e-4e607fd467db.png</url>
      <title>DEV Community: surajrkhonde</title>
      <link>https://dev.to/surajrkhonde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/surajrkhonde"/>
    <language>en</language>
    <item>
      <title>Phase 2: Embeddings &amp; Semantic Search</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Tue, 23 Jun 2026 08:51:10 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/phase-2-embeddings-semantic-search-3lco</link>
      <guid>https://dev.to/surajrkhonde/phase-2-embeddings-semantic-search-3lco</guid>
      <description>&lt;h2&gt;
  
  
  From Text to Vectors: The Complete Story
&lt;/h2&gt;




&lt;h2&gt;
  
  
  The Story Starts: Why Can't We Just Search for Words?
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle! Phase 1 was done. Now we have clean chunks. Can't we just search for keywords?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me ask you something. Someone applies for a job and says:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resume says:&lt;/strong&gt; "I managed a team of 10 people for 3 years"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Job Description asks:&lt;/strong&gt; "Do you have leadership experience?"&lt;/p&gt;

&lt;p&gt;Are these related?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Obviously yes! But the words are different. Team management and leadership.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. A computer doing simple keyword search would say NO.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keyword Search:
Resume: "team management"
Question: "leadership"
Match? NO (different words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the problem Phase 2 solves.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; By understanding &lt;strong&gt;meaning&lt;/strong&gt;, not just words.&lt;/p&gt;




&lt;h2&gt;
  
  
  PHASE 2 OVERVIEW: Three Key Concepts
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Document Ingestion ✅
    ↓
Phase 2: Embeddings &amp;amp; Semantic Search ← WE ARE HERE
    │
    ├─ Step 1: Tokenization
    │         (Text → Token IDs)
    │
    ├─ Step 2: Embedding Layer
    │         (Token IDs → Vectors)
    │
    └─ Step 3: Semantic Search
              (Find Similar Vectors)
    ↓
Phase 3: Safety &amp;amp; Production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  STEP 1: TOKENIZATION
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is Tokenization?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Computers don't understand words like "React" or "JavaScript". They only understand numbers.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;tokenizer&lt;/strong&gt; converts text into numbers called &lt;strong&gt;token IDs&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text:           "React is awesome"
                    ↓
Tokenizer converts to:
Token IDs:      [4521, 318, 9876]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But why not just one number per word?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good question. Look at this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Word:&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="s2"&gt;"JavaScriptDeveloper"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Tokenizer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;might&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;see:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Script"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Developer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;←&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tokens&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;or&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"JavaScript"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Developer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;←&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tokens&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Depends&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tokenizer's&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;vocabulary.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Is This Important?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because the embedding model doesn't receive text. It receives token IDs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text Input:
"Employees must submit resignation 60 days before"
           ↓
Tokenizer (internal to embedding model):
[1001, 245, 782, 4521, 99, 1234, ...]
           ↓
Embedding Layer looks up each token:
1001 → [0.12, -0.45, 0.78, ...]
245  → [0.88, -0.12, 0.34, ...]
           ↓
Combines them into final vector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real Example: Your Name
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Your name is Suraj Khonde. How does the tokenizer handle it?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If the tokenizer has seen "Suraj" often:
"Suraj" → Single Token → 5234

If not:
"Suraj" → Breaks into: ["Su", "raj"]
                      → [5234, 9876]

Same with "Khonde":
If common: → Single token
If rare:   → Multiple tokens

Result might be:
"Suraj Khonde" → [5234, 9876, 7621, 3344]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So less common words use more tokens?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! This is why we count tokens, not words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word count:&lt;/strong&gt; 2 words&lt;br&gt;
&lt;strong&gt;Token count:&lt;/strong&gt; 4 tokens&lt;/p&gt;

&lt;p&gt;For billing and cost, we count tokens!&lt;/p&gt;


&lt;h2&gt;
  
  
  STEP 2: EMBEDDING LAYER
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The Bridge: From Numbers to Meaning
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Now we have token IDs. But they're meaningless.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token ID: 4521

This is just an identifier. Like an employee ID.
Employee #4521 is just a number.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embedding layer transforms this into meaningful vectors.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think of a giant lookup table inside the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token ID    Vector (Meaning)
─────────────────────────────────────────
4521   →  [0.13, -0.46, 0.79, ... 1536 values]
318    →  [0.91, -0.21, 0.12, ... 1536 values]
9876   →  [0.44, 0.77, -0.33, ... 1536 values]

When model sees token 4521, it looks up:
4521 → [0.13, -0.46, 0.79, ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What are those numbers? [0.13, -0.46, 0.79]?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Those numbers &lt;strong&gt;encode meaning&lt;/strong&gt;. They describe what the token represents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who Created These Numbers?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; When the embedding model was first created, these numbers were mostly &lt;strong&gt;random&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But then something magical happened: &lt;strong&gt;training&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;During training on billions of documents, the model learned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React appears with JavaScript many times
↓
So React and JavaScript vectors move closer

Node.js appears with Express many times
↓
So Node.js and Express vectors move closer

Dog appears with Puppy many times
↓
So Dog and Puppy vectors move closer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After training on billions of examples, the vectors ended up in meaningful positions.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So React and JavaScript have similar vectors?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! That's why semantic search works.&lt;/p&gt;




&lt;h2&gt;
  
  
  KEY CONCEPT: Why 1536 Dimensions?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem with Too Few Dimensions
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Suppose the embedding only used 1 number:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React   → 5
Python  → 7
JavaScript → 6

Now what? React and JavaScript are close (5 vs 6), 
but so are React and Python (5 vs 7).
All three look similar with one number!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;❌ Not enough information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why So Many Numbers?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Language is incredibly complex. The vector needs to capture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dimension 1: Frontend vs Backend
Dimension 2: Web vs Systems Programming
Dimension 3: Dynamic vs Compiled
Dimension 4: Syntax Style
Dimension 5: Ecosystem Size
...
Dimension 1536: [something the model learned]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each dimension captures a different aspect of meaning.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So with 1536 dimensions, React can be described as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React: [0.9 (very frontend), 0.2 (mostly web), 0.95 (dynamic), ...]
Node.js: [0.3 (somewhat frontend), 0.8 (backend), 0.95 (dynamic), ...]
Python: [0.1 (rarely frontend), 0.9 (backend), 0.5 (compiled), ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! And notice:&lt;/p&gt;

&lt;p&gt;React and Node.js both have 0.95 (dynamic) → similar there&lt;br&gt;
React is 0.9 frontend, Node is 0.3 → different there&lt;br&gt;
Python is 0.5 (middle) → different from both&lt;/p&gt;

&lt;p&gt;This richness requires &lt;strong&gt;many dimensions&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Mental Model: Google Maps
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think of Google Maps. Bangalore has coordinates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Latitude: 12.9716
Longitude: 77.5946

These 2 numbers uniquely identify Bangalore.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Embeddings work similarly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React has 1536 coordinates in "semantic space":
[0.13, -0.46, 0.79, ..., X1536]

This position uniquely identifies React's meaning.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  STEP 3: SEMANTIC SEARCH
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Keyword Search vs Semantic Search
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So now we have vectors for all chunks. How do we search?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me show you the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keyword Search:
─────────────────
User asks: "What is notice period?"
System searches for exact words: "notice" + "period"

Document contains: "resignation requirements"
Match? NO ❌

Semantic Search:
────────────────
User asks: "What is notice period?"
System converts to vector: [0.21, -0.44, 0.88, ...]

Document: "resignation requirements"
System converts to vector: [0.22, -0.40, 0.85, ...]

Compare vectors: Very similar! ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cosine Similarity: How Similar Are Two Vectors?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; We need a way to measure "how similar" two vectors are.&lt;/p&gt;

&lt;p&gt;Think of vectors as &lt;strong&gt;arrows pointing in a direction&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Vector A: ↗️ (pointing up-right)
Vector B: ↗️ (pointing up-right)
Similarity: HIGH (same direction)

Vector A: ↗️
Vector B: →️ (pointing right)
Similarity: MEDIUM (somewhat same)

Vector A: ↗️
Vector B: ↙️ (pointing down-left)
Similarity: LOW (opposite directions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So cosine similarity measures "angle between vectors"?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! In mathematics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cosine Similarity Score Range:
─────────────────────────────
1.0  = Vectors point in exactly same direction (identical)
0.5  = 60° angle between them (medium similarity)
0.0  = 90° angle (unrelated)
-0.5 = 120° angle (somewhat opposite)
-1.0 = Opposite directions (contradictory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real Example
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Suppose vector DB contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question Vector: 
"What is notice period?" 
→ [0.21, -0.44, 0.88, ...]

Document Chunks:
─────────────────────────────────────
Chunk 1: "Employees must serve 30 days notice period"
→ [0.22, -0.40, 0.85, ...]
Cosine Similarity: 0.95 ✅ (Very Similar)

Chunk 2: "Resignation must be submitted in writing"
→ [0.19, -0.42, 0.87, ...]
Cosine Similarity: 0.92 ✅ (Similar)

Chunk 3: "Office timings are 9 AM to 5 PM"
→ [0.05, 0.88, -0.12, ...]
Cosine Similarity: 0.15 ❌ (Not Related)

Chunk 4: "Coffee is available in cafeteria"
→ [-0.45, 0.12, 0.33, ...]
Cosine Similarity: -0.05 ❌ (Unrelated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system returns the top 3 chunks with highest similarity.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why top 3 and not all?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; - Why send irrelevant chunks to LLM?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window&lt;/strong&gt; - LLM has limited capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; - More tokens = more money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt; - Less noise = better answers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is called &lt;strong&gt;Top-K Retrieval&lt;/strong&gt; (usually K=3-5).&lt;/p&gt;




&lt;h2&gt;
  
  
  Complete Flow: Query Time
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Now let's trace a complete query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER ASKS:
"What is our notice period?"

STEP 1: Tokenization (inside embedding model)
─────────────────────────────────────
"What is our notice period?"
→ Tokenizer
→ [789, 44, 123, 555, 321]

STEP 2: Embedding (inside embedding model)
─────────────────────────────────────
Token IDs: [789, 44, 123, 555, 321]
→ Embedding Layer looks up each
→ Combines into one vector
Query Vector: [0.21, -0.44, 0.88, ... 1536 values]

STEP 3: Vector Database Search
─────────────────────────────────────
Query Vector: [0.21, -0.44, 0.88, ...]
↓
Compare against ALL stored chunk vectors
↓
Compute cosine similarity for each
↓
Sort by score (highest first)
↓
Return Top-K (usually 5)

STEP 4: Pass to LLM
─────────────────────────────────────
Retrieved chunks:
1. "Employees receive 30 days notice period"
2. "Manager approval required for resignation"
3. "Notice period starts from submission date"

Prompt:
Context:
[chunk 1]
[chunk 2]
[chunk 3]

Question: What is our notice period?

STEP 5: Generate Answer
─────────────────────────────────────
LLM reads context and answers:
"The notice period is 30 days from submission."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Question Everyone Asks: "Where's the Magic?"
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So the magic is... the vectors?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes and no. The magic is in &lt;strong&gt;how vectors are created&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;During training on billions of documents, the model learned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Notice period
↔
Resignation requirements
↔
Notice duration

These concepts appear together many times.
So their vectors end up close to each other.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's not magic. It's &lt;strong&gt;pattern recognition at scale&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Different Models Give Different Vectors
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; If I use OpenAI embeddings, then switch to Cohere, what happens?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Different model = different vector space.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI text-embedding-3:
"React" → [0.13, -0.46, 0.79, ...]

Cohere Embed:
"React" → [0.57, 0.12, -0.34, ...]

Different numbers!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Can I compare them?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; NO! They're in different "semantic spaces".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;It's like:
Celsius: 25 degrees
Fahrenheit: 77 degrees

Different scales. Can't compare directly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you change embedding models, you must &lt;strong&gt;re-embed everything&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Concern: Model Migrations
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where RAG gets expensive.&lt;/p&gt;

&lt;p&gt;Your company has:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100,000 documents
After chunking: 5 million chunks
After embedding: 5 million vectors stored
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New, better embedding model appears:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Old Model: 92% accuracy
New Model: 95% accuracy

But cost of migration: $50,000 (re-embedding 5M vectors)

CTO's question:
"Is 3% improvement worth $50,000?"

Usually NO. So we don't upgrade.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So companies are stuck with old models?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes, until the improvement is significant enough. This is why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Many enterprises still use &lt;strong&gt;old APIs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Many use &lt;strong&gt;old databases&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Migration is &lt;strong&gt;expensive and risky&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The best technology doesn't always win. The best &lt;strong&gt;ROI wins&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Companies: Gradual Migration
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Smart companies do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Old documents (stored 2 years ago)
→ Use old embedding model

New documents (uploaded today)
→ Use new embedding model

Gradually migrate in background:
- When storage capacity allows
- When costs are low
- When nobody notices the performance dip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why Store Original Chunk Text?
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, we stored the vector. Why also store the chunk text?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Great question! It's for the future.&lt;/p&gt;

&lt;p&gt;Scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Today:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Using&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;V&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Store:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"chunk_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Employees receive 30 days notice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"embedding_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text-embedding-3-small"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Tomorrow:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;V&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;available&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(better!)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;we&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;do:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Old&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Vector&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;New&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Vector?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;NO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;❌&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;But&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;we&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;do:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Old&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Text&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;New&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;New&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Vector&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;YES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;✅&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;why&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;chunk_text&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;precious!&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without chunk_text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You'd have to:
1. Find original PDF
2. Parse it again
3. Clean it again
4. Chunk it again
5. Embed it with new model

= Huge pain and cost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Vector Database: Making Search Fast
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: 5 Million Vectors!
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; If we have 5 million vectors, don't we compare with all of them?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query Vector
↓
Compare with 5 million vectors = 5 million calculations
↓
Takes 30 seconds

User waits 30 seconds. Bad experience.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where &lt;strong&gt;Vector Databases&lt;/strong&gt; come in (Pinecone, Qdrant, pgvector).&lt;/p&gt;

&lt;h3&gt;
  
  
  How Vector Databases Work Fast
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; They use &lt;strong&gt;indexing&lt;/strong&gt;. Like Google Maps.&lt;/p&gt;

&lt;p&gt;Google doesn't search the entire world to find Bangalore. It uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Regions → Countries → States → Cities → Exact Location
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similarly, vector databases use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Approximate Nearest Neighbor (ANN)
↓
Divides the semantic space into regions
↓
When you search, quickly finds which region
↓
Only searches that region
↓
Returns nearest neighbors in 100ms instead of 30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Interview-Level Answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q1: What is Tokenization?
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Here's your answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Tokenization is the process of breaking down text into smaller units called tokens and converting them into token IDs. Since embedding models work with numbers, not text, a tokenizer converts words or subwords into numeric identifiers. For example, 'React is awesome' might become [4521, 318, 9876] where each number represents a token. The granularity depends on the tokenizer's vocabulary - common words might be single tokens, while rare words could be split into multiple tokens."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q2: What is an Embedding?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"An embedding is a vector representation of text that captures semantic meaning. When a token ID is passed to the embedding layer, the model looks it up in a learned embedding matrix, returning 1536 numbers (dimensions) that collectively represent the token's meaning. These numbers are learned during the model's training process - similar concepts end up with vectors that point in similar directions in this high-dimensional space."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q3: Why 1536 Dimensions?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"One number cannot capture the complexity of language. The embedding model needs to represent many relationships simultaneously: whether a word relates to programming, frontend vs backend, syntax style, ecosystem size, etc. With 1536 dimensions, each dimension can capture a different aspect of meaning. This richness allows the model to distinguish between concepts that would look similar with fewer dimensions."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q4: How Does Semantic Search Work?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Semantic search works by converting both queries and documents into embeddings (vectors), then using cosine similarity to find documents whose vectors point in similar directions. If a user asks 'notice period' and a document discusses 'resignation requirements', both will have vectors that point in similar directions despite using different words, because they were learned to be related during the model's training on billions of documents."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q5: What Happens When You Change Embedding Models?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"When you switch embedding models, you must re-embed all stored chunks because different models produce vectors in different vector spaces. You can't compare vectors from Model A with query vectors from Model B - it's like mixing Celsius and Fahrenheit. Companies must evaluate the accuracy improvement against the cost of re-indexing. For example, improving from 92% to 95% accuracy might not justify a $50,000 re-embedding cost."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Complete Architecture So Far
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PHASE 1: DOCUMENT INGESTION ✅
─────────────────────────────
PDF Upload
  ↓
File Hash Check
  ↓
Parse &amp;amp; Clean
  ↓
Chunking
  ↓
Deduplication
  ↓
Store Chunk Text

PHASE 2: EMBEDDINGS ← YOU ARE HERE
─────────────────────────────
Chunk Text
  ↓
Tokenization (internal to embedding model)
  ↓
Token IDs
  ↓
Embedding Layer
  ↓
Vector
  ↓
Store in Vector Database

PHASE 3: QUERY TIME
─────────────────────────────
User Question
  ↓
Tokenization + Embedding
  ↓
Query Vector
  ↓
Vector Database Search (ANN)
  ↓
Top-K Similar Chunks
  ↓
Send to LLM
  ↓
Generate Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary: What Phase 2 Solves
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Phase 1&lt;/th&gt;
&lt;th&gt;Phase 2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large Documents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Chunking&lt;/td&gt;
&lt;td&gt;✅ Handled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exact Keyword Matching&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Works&lt;/td&gt;
&lt;td&gt;❌ Fails for synonyms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Similar Concepts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Keyword search misses&lt;/td&gt;
&lt;td&gt;✅ Vector similarity finds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale (Millions of chunks)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Search is slow&lt;/td&gt;
&lt;td&gt;✅ Vector DB indexes fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meaning Understanding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Can't understand meaning&lt;/td&gt;
&lt;td&gt;✅ Embeddings capture it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization&lt;/strong&gt; = Text → Numbers (done by embedding model internally)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt; = Numbers → Meaningful vectors (captures semantic relationships)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Search&lt;/strong&gt; = Find vectors pointing in similar direction (cosine similarity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Databases&lt;/strong&gt; = Index for fast nearest-neighbor search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-embedding Cost&lt;/strong&gt; = Why companies don't switch models often&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Text Storage&lt;/strong&gt; = Insurance for future model upgrades&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Next: Phase 2 Node.js Code
&lt;/h2&gt;

&lt;p&gt;Now that you understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How tokenization works&lt;/li&gt;
&lt;li&gt;How embeddings are created&lt;/li&gt;
&lt;li&gt;How semantic search finds similar documents&lt;/li&gt;
&lt;li&gt;How to measure similarity&lt;/li&gt;
&lt;li&gt;Production considerations for migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll implement Phase 2 in Node.js:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create embeddings for chunks&lt;/li&gt;
&lt;li&gt;Store vectors in pgvector&lt;/li&gt;
&lt;li&gt;Implement semantic search&lt;/li&gt;
&lt;li&gt;Handle model versioning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready?&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Remember: Less noise, more action. Phase 2 is where the actual intelligence happens.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>Phase 1: Document Ingestion - The Hidden Complexity Before Embeddings</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:46:44 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/phase-1-document-ingestion-the-hidden-complexity-before-embeddings-4d32</link>
      <guid>https://dev.to/surajrkhonde/phase-1-document-ingestion-the-hidden-complexity-before-embeddings-4d32</guid>
      <description>&lt;h2&gt;
  
  
  The Complete Story: Why Most RAG Systems Fail Before They Start
&lt;/h2&gt;




&lt;h2&gt;
  
  
  The Story Begins: Why Your Upload Button Is Just The Beginning
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle! I finally built my RAG system. User uploads a PDF, system finds answers. Simple, right?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; (smiles knowingly) You uploaded a PDF and got an answer?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Yes! It works!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Did you get the &lt;em&gt;right&lt;/em&gt; answer?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Well... sometimes. Why?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because between "user uploads PDF" and "system creates embeddings", there are 15 critical steps. Skip even one, and your system fails silently. You get wrong answers and don't know why.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; 15 steps? I just embedded the text!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. That's the problem. Come, let me show you what production engineers actually do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 15 Steps of Phase 1: Document Ingestion
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think of it like cooking biryani. You don't just dump rice and meat, right?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; No, you prepare everything first!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. Before you cook, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Wash rice&lt;/li&gt;
&lt;li&gt;Soak rice&lt;/li&gt;
&lt;li&gt;Prepare meat&lt;/li&gt;
&lt;li&gt;Marinate meat&lt;/li&gt;
&lt;li&gt;Chop onions&lt;/li&gt;
&lt;li&gt;... and many more steps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Only THEN you cook.&lt;/p&gt;

&lt;p&gt;The same with documents. Before embeddings, you must:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Document Upload (user action)
2. File Hashing (did we see this before?)
3. PDF Parsing (extract text)
4. Text Extraction (convert PDF → text)
5. Text Cleaning (remove junk)
6. Metadata Extraction (add context)
7. Chunking (split smartly)
8. Chunk Boundaries (don't break meaning)
9. Chunk Size (1000 tokens, not 100)
10. Overlap (context continuity)
11. Chunk Hashing (detect changes)
12. Deduplication (prevent duplicates)
13. Versioning (handle updates)
14. Incremental Ingestion (avoid re-embedding)
15. Cost Optimization (save money)

ONLY THEN:
   ↓
   Embeddings
   ↓
   Vector DB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's a lot! Where do I start?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; With step 1. Let's go slow. The foundation must be solid.&lt;/p&gt;




&lt;h2&gt;
  
  
  PHASE 1, STEP 1-2: Document Upload &amp;amp; File Hashing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Does File Hashing Matter?
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why hash a file? Why not just upload it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because humans are lazy. The HR person uploads the same PDF three times. Your system processes it three times. Three embeddings created. Three times the cost.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So hash prevents duplicates?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. But here's the trick: don't hash the filename.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why not?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because someone could change the content and keep the same name. Look:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;File: HR_Policy.pdf (Version 1)
Content: "30 days notice required"
Filename Hash: HR_Policy.pdf

File: HR_Policy.pdf (Version 2)
Content: "7 days notice required"
Filename Hash: HR_Policy.pdf (SAME!)

System thinks: Same file!
Reality: Different policies!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a disaster.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So we hash the content?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. The actual binary data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HR_Policy.pdf (Version 1)
↓
[PDF binary bytes: 0xAA 0xBB 0xCC ...]
↓
SHA256
↓
A7B82C1F9D3E...

HR_Policy.pdf (Version 2)
↓
[PDF binary bytes: 0xAA 0xBB 0xDD ...]  ← Different!
↓
SHA256
↓
X9Z47M3Q2K1L...  ← Different hash!

System detects: New file, process it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1-2: Node.js Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/ingestion/fileHasher.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;path&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;FileHashResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;alreadyExists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;fileSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Step 1-2: Upload file and check if already processed
 * 
 * ⚠️ CRITICAL POINTS:
 * 1. Hash FILE CONTENT, not filename
 * 2. Same content → same hash (deterministic)
 * 3. Different content → different hash (even if same filename)
 * 4. Check database before processing
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleFileUpload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;FileHashResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;statSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 1: Read file and create hash from content&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;File upload started&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileSize&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;File hashed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;contentBytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Check if this exact file was already processed&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existingFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;oneOrNone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`
      SELECT id, created_at 
      FROM documents 
      WHERE file_hash = $1 AND tenant_id = $2
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existingFile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Duplicate file detected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;firstUploadedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;existingFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;created_at&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;alreadyExists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;fileSize&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 3: File is new, save metadata&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;documentRecord&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`
      INSERT INTO documents (tenant_id, file_name, file_hash, file_size, status)
      VALUES ($1, $2, $3, $4, $5)
      RETURNING id, created_at
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;uploaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;New file registered&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;documentRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;alreadyExists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileSize&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;File upload error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Verify file integrity (optional but recommended)
 * If file is corrupted, don't process it
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifyFileIntegrity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expectedHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;actualHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;actualHash&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;expectedHash&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Database schema for documents table&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;documentsTableSQL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
CREATE TABLE IF NOT EXISTS documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  file_name VARCHAR(500) NOT NULL,
  file_hash VARCHAR(64) NOT NULL,  -- SHA256 produces 64 char hex
  file_size BIGINT NOT NULL,
  status VARCHAR(50) DEFAULT 'uploaded',  -- uploaded, parsing, parsed, chunking, chunked, embedding, complete
  error_message TEXT,
  uploaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  -- Unique constraint: same file can't be uploaded twice for same tenant
  CONSTRAINT unique_file_per_tenant UNIQUE(tenant_id, file_hash),

  INDEX idx_file_hash (file_hash),
  INDEX idx_tenant_status (tenant_id, status)
);
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So if someone uploads the same PDF twice, we detect it and skip?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. And we save embedding cost. Which is the most expensive step.&lt;/p&gt;




&lt;h2&gt;
  
  
  PHASE 1, STEP 3: PDF Parsing - Choosing the Right Tool
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Now we have the PDF. How do we extract text?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where the real decision happens. There are five major tools. Each has tradeoffs.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Five?! Which one should I use?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Depends on your documents. Let me show you.&lt;/p&gt;

&lt;h3&gt;
  
  
  The PDF Parsing Landscape
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple Text PDFs          →  pdf-parse (cheap, simple)
                          ↓
Mixed content (text + tables) →  PDFPlumber (better)
                          ↓
Complex documents        →  Unstructured (production-grade)
                          ↓
Advanced documents       →  LlamaParse (state-of-art)
(tables, images, OCR)
                          ↓
Enterprise documents     →  Azure Document Intelligence
(forms, invoices, scans)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tool Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Table Support&lt;/th&gt;
&lt;th&gt;OCR&lt;/th&gt;
&lt;th&gt;Metadata&lt;/th&gt;
&lt;th&gt;Production Ready&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pdf-parse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple text&lt;/td&gt;
&lt;td&gt;₹0 (free)&lt;/td&gt;
&lt;td&gt;⚡⚡⚡ Fast&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;⚠️ Hobby&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDFPlumber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text + tables&lt;/td&gt;
&lt;td&gt;₹0&lt;/td&gt;
&lt;td&gt;⚡⚡ Medium&lt;/td&gt;
&lt;td&gt;✓ Basic&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;⚠️ Limited&lt;/td&gt;
&lt;td&gt;⚠️ Small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unstructured&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Normal docs&lt;/td&gt;
&lt;td&gt;₹50-200/mo&lt;/td&gt;
&lt;td&gt;⚡ Slow&lt;/td&gt;
&lt;td&gt;✓ Good&lt;/td&gt;
&lt;td&gt;✓ Basic&lt;/td&gt;
&lt;td&gt;✓ Good&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LlamaParse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex docs&lt;/td&gt;
&lt;td&gt;₹100-500/mo&lt;/td&gt;
&lt;td&gt;⚡ Slow&lt;/td&gt;
&lt;td&gt;✓ Excellent&lt;/td&gt;
&lt;td&gt;✓ Advanced&lt;/td&gt;
&lt;td&gt;✓ Excellent&lt;/td&gt;
&lt;td&gt;✓ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure Doc Int.&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;₹500-2000/mo&lt;/td&gt;
&lt;td&gt;⚡ Medium&lt;/td&gt;
&lt;td&gt;✓ Perfect&lt;/td&gt;
&lt;td&gt;✓ Perfect&lt;/td&gt;
&lt;td&gt;✓ Perfect&lt;/td&gt;
&lt;td&gt;✓ Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me explain each.&lt;/p&gt;




&lt;h3&gt;
  
  
  Tool 1: pdf-parse (Free, Simple)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simple approach - good for learning, bad for production&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pdf-parse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractTextFromPDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dataBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dataBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Raw text&lt;/span&gt;
  &lt;span class="c1"&gt;// Output: "Company Policy Leave Policy Employees receive..."&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Notice what we lost?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original PDF:
═══════════════════════════════════════
COMPANY POLICY
───────────────────────────────────────
Leave Policy (HEADING)

Employees are entitled to 24 paid leaves annually.
(PARAGRAPH)

Leave Types:
- Annual Leave
- Casual Leave
(LIST)
═══════════════════════════════════════

pdf-parse Output:
"COMPANY POLICY Leave Policy Employees are entitled to 24 paid leaves annually. Leave Types: Annual Leave Casual Leave"

LOST:
✗ Heading level
✗ List structure
✗ Paragraph breaks
✗ Section organization
✗ Tables (if any)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So we just get text soup?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. And when you embed soup, you get soup answers.&lt;/p&gt;




&lt;h3&gt;
  
  
  Tool 2: PDFPlumber (Better, Still Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# PDFPlumber - extracts tables better
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pdfplumber&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pdfplumber&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract text
&lt;/span&gt;        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Extract tables (if any)
&lt;/span&gt;        &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_tables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Page &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tables: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Better, but still loses structure. And it's Python, not Node.js.&lt;/p&gt;




&lt;h3&gt;
  
  
  Tool 3: Unstructured (Production Choice)
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is what most companies use. It preserves structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Using Unstructured via API (Node.js friendly)&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Step 3: Parse PDF using Unstructured
 * 
 * IMPORTANT: Unstructured preserves document structure
 * Returns: Array of structured elements
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parsePDFWithUnstructured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base64Content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.unstructuredapp.io/general/v0/general&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64Content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hi_res&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// High resolution parsing&lt;/span&gt;
        &lt;span class="na"&gt;coordinates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;     &lt;span class="c1"&gt;// Preserve coordinates (useful for tables)&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;UNSTRUCTURED_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Output: Structured elements&lt;/span&gt;
    &lt;span class="c1"&gt;// [&lt;/span&gt;
    &lt;span class="c1"&gt;//   { type: "Title", text: "Leave Policy", metadata: {...} },&lt;/span&gt;
    &lt;span class="c1"&gt;//   { type: "Heading", text: "Annual Leave", metadata: {...} },&lt;/span&gt;
    &lt;span class="c1"&gt;//   { type: "Paragraph", text: "Employees are entitled...", metadata: {...} },&lt;/span&gt;
    &lt;span class="c1"&gt;//   { type: "ListItem", text: "Manager approval required", metadata: {...} }&lt;/span&gt;
    &lt;span class="c1"&gt;// ]&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PDF parsed with structure preserved&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;elementCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unstructured parsing failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So it preserves structure?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. Look at the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Unstructured&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Output:&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="err"&gt;type:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;text:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Leave Policy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;metadata:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;page_number:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;section:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Policies"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="err"&gt;type:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Paragraph"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;text:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Employees receive 24 leaves"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;metadata:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;page_number:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="err"&gt;type:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"List"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;text:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Annual Leave, Casual Leave"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;metadata:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;page_number:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;list_items:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Now&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;we&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;know:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;What&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;title&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;What&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;body&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;text&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;What&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;list&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;numbers&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Sections&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Tool 4: LlamaParse (State of the Art)
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; For really complex documents, use LlamaParse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LlamaParse - best for complex PDFs&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;FormData&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;form-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Parse PDF with LlamaParse (best for:
 * - Multi-column layouts
 * - Tables with merged cells
 * - Images with text
 * - Scanned documents (OCR)
 * - Footnotes and annotations
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parsePDFWithLlamaParse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Step 1: Upload file&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FormData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createReadStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;parsing_instruction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`
      Extract all content including:
      - Tables with proper structure
      - Column layouts
      - Images with OCR
      - Headings and sections
      - Metadata like page numbers
    `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;uploadResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.llamaindex.ai/api/parsing/upload_file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLAMAPARSE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getHeaders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uploadResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Poll for results&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;statusResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`https://api.llamaindex.ai/api/parsing/job/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/result/markdown`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLAMAPARSE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;statusResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;statusResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;// Wait 2 seconds before retry&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LlamaParse completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;parsedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LlamaParse failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; When do I use LlamaParse vs Unstructured?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Simple rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document type?

  ├─ Simple text policies
  │  └─→ pdf-parse (free)
  │
  ├─ Text + basic tables
  │  └─→ Unstructured (cheap, good)
  │
  ├─ Complex tables, multi-column
  │  └─→ LlamaParse (excellent)
  │
  └─ Enterprise documents, forms, invoices
     └─→ Azure Document Intelligence (best)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Tool 5: Azure Document Intelligence (Enterprise)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Azure Document Intelligence - for enterprise documents&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DocumentAnalysisClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AzureKeyCredential&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@azure/ai-form-recognizer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Parse with Azure (best for:
 * - Invoices
 * - Forms
 * - Bank documents
 * - Scanned PDFs with OCR
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseWithAzureDocumentIntelligence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DocumentAnalysisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AZURE_FORM_RECOGNIZER_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AzureKeyCredential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AZURE_FORM_RECOGNIZER_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Choose model based on document type&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;poller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;beginAnalyzeDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prebuilt-document&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Or: prebuilt-invoice, prebuilt-receipt, etc.&lt;/span&gt;
      &lt;span class="nx"&gt;fileContent&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;poller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pollUntilDone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract structure&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rowCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;columnCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;cells&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cells&lt;/span&gt;
      &lt;span class="p"&gt;})),&lt;/span&gt;
      &lt;span class="na"&gt;forms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// For form data&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Azure parsing complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Azure parsing failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; These all cost money?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes, except pdf-parse. And here's the secret: you should still use pdf-parse for simple documents!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Cost. If 80% of your documents are simple policies, use pdf-parse. Use expensive parsers only for 20% that need them.&lt;/p&gt;




&lt;h2&gt;
  
  
  PHASE 1, STEP 4-5: Text Extraction &amp;amp; Cleaning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Garbage Problem
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; After parsing, you get garbage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw Extracted Text:

═════════════════════════════════════════════════════════════
Company Name                          Page 1
═════════════════════════════════════════════════════════════
[CONFIDENTIAL]

LEAVE POLICY

[Document Version: 2.3]

Employees are entitled to 24 paid leaves annually.
These leaves must be approved by:
1. Direct Manager
2. HR Department

[CONFIDENTIAL]
═════════════════════════════════════════════════════════════
Company Name                          Page 2
═════════════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's wrong with this?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; When you embed "[CONFIDENTIAL]", your system learns that every policy is confidential. When someone asks "Is this policy confidential?", the answer is always YES!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 4-5: Clean the text&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Step 4-5: Extract and Clean text
 * 
 * Remove:
 * - Page headers/footers
 * - Repeated watermarks
 * - Extra whitespace
 * - Page numbers
 * - Confidential markings (if generic)
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;cleanExtractedText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove page headers (appears at top of every page)&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sr"&gt;/^Company Name&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+Page &lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*$/gm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove confidential markings (if generic)&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="sr"&gt;CONFIDENTIAL&lt;/span&gt;&lt;span class="se"&gt;\]\n?&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove document version&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="sr"&gt;Document Version: &lt;/span&gt;&lt;span class="se"&gt;[\d&lt;/span&gt;&lt;span class="sr"&gt;.&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove page numbers alone on a line&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*Page &lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*$/gm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Normalize whitespace (multiple spaces/newlines → single space)&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove leading/trailing whitespace&lt;/span&gt;
  &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Text cleaned&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;originalLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;cleanedLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;removed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Example&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
═════════════════════
Company Name    Page 1
═════════════════════
[CONFIDENTIAL]

LEAVE POLICY

Employees get 24 leaves.

[CONFIDENTIAL]
═════════════════════
Company Name    Page 2
═════════════════════
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cleanExtractedText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Output: "LEAVE POLICY Employees get 24 leaves."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PHASE 1, STEP 6: Metadata Extraction
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Metadata is information ABOUT the chunk, not IN the chunk.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Like what?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document: HR_Policy.pdf
Department: Human Resources
Section: Leave Policy
Page: 3
Version: 2.3
Effective Date: 2025-01-01
Author: HR Manager
Source: Internal Wiki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This metadata helps retrieval.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Suppose someone asks: "What's the HR policy for leaves?"&lt;/p&gt;

&lt;p&gt;Without metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System searches all documents&lt;/li&gt;
&lt;li&gt;Returns leaves policies from EVERY department&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System knows to search only HR documents&lt;/li&gt;
&lt;li&gt;Returns only HR leave policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much better!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 6: Extract metadata&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;DocumentMetadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;department&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;effectiveDate&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Extract metadata from document structure
 * 
 * Sources:
 * 1. Folder structure: /HR/Policies/Leave_Policy.pdf
 * 2. Filename: leave_policy_v2.pdf
 * 3. Parser output: Unstructured gives headings as metadata
 * 4. LLM classification: Ask AI to classify
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractMetadataFromStructure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parserElements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;DocumentMetadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Source 1: Folder structure&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pathParts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pathParts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pathParts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Human Resources&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Source 2: Filename&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pathParts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pathParts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;versionMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/v&lt;/span&gt;&lt;span class="se"&gt;(\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;versionMatch&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;versionMatch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Source 3: Parser output (if using Unstructured)&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;section&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parserElements&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;parserElements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;heading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parserElements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Heading&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;heading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;section&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;heading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DocumentMetadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;version&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Metadata extracted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Optional: Use LLM to classify (more powerful but costs money)
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;classifyWithLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Classify this document. Return JSON:
{
  "department": "HR|Finance|IT|Legal",
  "topic": "Leave|Salary|Security|Contracts",
  "confidentiality": "public|internal|confidential"
}

Document: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PHASE 1, STEP 7-10: Chunking - The Heart of Preparation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Chunking Matters
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is the most important step. Everything after depends on it.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; More important than embeddings?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; YES. Because bad chunks → bad embeddings → wrong answers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Chunking Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 7: Smart chunking with boundaries&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Chunking strategies:
 * 
 * 1. Fixed size (simple, dumb)
 *    ├─ Pros: Easy to understand
 *    └─ Cons: Breaks meaning
 * 
 * 2. Sentence boundaries (better)
 *    ├─ Pros: Doesn't break sentences
 *    └─ Cons: Variable sizes
 * 
 * 3. Semantic (best, most complex)
 *    ├─ Pros: Keeps meaning together
 *    └─ Cons: Expensive, slow
 * 
 * 4. Recursive (good hybrid)
 *    ├─ Pros: Smart, flexible
 *    └─ Cons: Complex implementation
 */&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;slidingWindowChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;windowTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;overlapTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1 token ≈ 4 characters (rough estimate for English)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;windowTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;overlapTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Find sentence boundary (don't break mid-sentence)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;periodIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newlineIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;periodIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newlineIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Example&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
Employees receive 24 paid leaves annually. 
These leaves can be taken as:
1. Annual Leave (20 days)
2. Casual Leave (4 days)

Manager approval is required for:
- Leaves longer than 5 days
- Leave during peak season
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;slidingWindowChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// [&lt;/span&gt;
&lt;span class="c1"&gt;//   "Employees receive 24 paid leaves annually...",&lt;/span&gt;
&lt;span class="c1"&gt;//   "These leaves can be taken as: 1. Annual Leave..."&lt;/span&gt;
&lt;span class="c1"&gt;// ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8-10: Chunk Boundaries, Size, and Overlap
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why does size matter?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Look at three scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario 1: Too Small (100 tokens)

Chunk 1: "Employees receive"
Chunk 2: "24 paid leaves"
Chunk 3: "annually"

Result: Lost context. AI confused.

Question: "How many leaves do employees get?"
Chunk 1 alone: "Employees receive" (incomplete)
Chunk 2 alone: "24 paid leaves" (where?)
Chunk 3 alone: "annually" (what annually?)

Wrong answer!

─────────────────────────────────────

Scenario 2: Right Size (1000 tokens)

Chunk: "Employees receive 24 paid leaves annually.
         These leaves can be taken as annual or casual.
         Manager approval required for leaves &amp;gt;5 days."

Result: Full context. Clear meaning.

Question: "How many leaves do employees get?"
Answer: "24 paid leaves"

Correct!

─────────────────────────────────────

Scenario 3: Too Large (5000 tokens)

Chunk: [Entire company handbook]

Result: Too much noise.

Question: "How many leaves?"
System reads 5000 tokens to find answer.
Slow. Confused by other policies.

Wrong section highlighted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Sweet spot: 1000-1500 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Complete chunking service&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;startChar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;endChar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Steps 7-10: Complete chunking strategy
 * 
 * - Sliding window with overlap
 * - Respects sentence boundaries  
 * - Includes overlap for context
 * - Validates chunk size
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createSmartChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;windowTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;overlapTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;windowTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;overlapTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;chunkIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate not empty&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Remaining text too small, merge with previous&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nx"&gt;lastChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Find sentence boundary&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;periodIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;periodIndex&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;periodIndex&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Estimate tokens (rough)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate: chunk should be reasonable size&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Chunk too small&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;startChar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;endChar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Chunking complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;textLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;avgChunkTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PHASE 1, STEP 11-13: Deduplication, Hashing &amp;amp; Versioning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 11: Chunk Hashing
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; After chunking, we hash each chunk.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why hash chunks? We already hashed the file!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because files change. When they do, we need to know WHICH chunks changed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HR_Policy_v1.pdf (Old)
├─ Chunk 1: "Employees get 24 leaves"
├─ Chunk 2: "Manager approval required"
└─ Chunk 3: "Casual leave guidelines"

HR_Policy_v2.pdf (New)
├─ Chunk 1: "Employees get 30 leaves"  ← CHANGED!
├─ Chunk 2: "Manager approval required" ← SAME
└─ Chunk 3: "Casual leave guidelines"   ← SAME

Old system: Re-embed all 3 chunks
Smart system: Re-embed only chunk 1

Cost saving: 66%!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 11: Hash each chunk&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Key insight: Hash is deterministic
 * 
 * Same text:
 * hashChunk("Employees get 24 leaves")
 * Always returns: A7B82C1F...
 * 
 * Different text:
 * hashChunk("Employees get 30 leaves")
 * Always returns: X9Z47M3Q...
 * 
 * Different hash = Document changed
 */&lt;/span&gt;

&lt;span class="c1"&gt;// Example&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Employees receive 24 paid leaves annually&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk1Hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// A7B82C1F9D3E4K2...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Employees receive 24 paid leaves annually&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk2Hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// A7B82C1F9D3E4K2... (SAME!)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Employees receive 30 paid leaves annually&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk3Hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// X9Z47M3Q2K1L9... (DIFFERENT!)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 12: Deduplication
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 12: Detect and skip duplicates&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Deduplication at multiple levels:
 * 
 * Level 1: Document level
 *          Have we processed this exact file?
 *          (Already done with file hash)
 * 
 * Level 2: Chunk level
 *          Have we processed this exact chunk?
 * 
 * Level 3: Semantic level
 *          Is this chunk similar to existing chunk?
 *          (Advanced, uses embeddings)
 */&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;deduplicateChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="nl"&gt;skipped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;skipped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Check if chunk exists&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existingChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;oneOrNone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`
      SELECT id FROM chunks 
      WHERE chunk_hash = $1 AND tenant_id = $2
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;chunkHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existingChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Duplicate chunk skipped&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;chunkHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunkHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="nx"&gt;skipped&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Deduplication complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;new&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;skipped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;skipped&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;skipped&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What if dedup changes because we change chunk size?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good question! That's versioning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 13: Versioning
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 13: Handle document versions&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleDocumentVersion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;newFileHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Check if this document (by name) was processed before&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldVersion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`
    SELECT id, version FROM documents 
    WHERE file_name = $1 AND tenant_id = $2
    ORDER BY version DESC LIMIT 1
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldVersion&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Old version exists&lt;/span&gt;
    &lt;span class="c1"&gt;// Mark it as inactive&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;none&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`
      UPDATE documents 
      SET active = false 
      WHERE id = $1
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;oldVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oldVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Create new version&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newDoc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`
    INSERT INTO documents 
    (tenant_id, file_name, file_hash, version, active)
    VALUES ($1, $2, $3, $4, true)
    RETURNING id
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newFileHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Document versioned&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;active&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newDoc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;isNewVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;oldVersion&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Database schema
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;versioningSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  file_name VARCHAR(500),
  file_hash VARCHAR(64),
  version INTEGER DEFAULT 1,
  active BOOLEAN DEFAULT true,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  INDEX idx_active (tenant_id, active)
);
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Important: During retrieval, we search ONLY active documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Search only from active documents&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PHASE 1, STEP 14-15: Incremental Ingestion &amp;amp; Cost Optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Big Picture: Before vs After
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me show you the difference between naive and smart systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAIVE SYSTEM
───────────────────────────────────────

File v1 uploaded
  ├─ Parse
  ├─ Chunk (1000 chunks)
  ├─ Embed ALL 1000  ← Cost: ₹50 per 1000 = ₹50
  └─ Store

File v2 uploaded (only 1 chunk changed)
  ├─ Parse
  ├─ Chunk (1000 chunks)
  ├─ Embed ALL 1000  ← Cost: ₹50 again!
  └─ Store

Total cost: ₹100
Total waste: ₹50 (50% wasted)

───────────────────────────────────────

SMART SYSTEM
───────────────────────────────────────

File v1 uploaded
  ├─ Parse
  ├─ Chunk (1000 chunks)
  ├─ Embed ALL 1000  ← Cost: ₹50
  └─ Store with chunk hashes

File v2 uploaded (only 1 chunk changed)
  ├─ Parse
  ├─ Chunk (1000 chunks)
  ├─ Compare hashes → 999 same, 1 changed
  ├─ Embed ONLY 1 chunk  ← Cost: ₹0.05 only!
  └─ Store (update index for chunk 1)

Total cost: ₹50.05
Total waste: ₹0 (99% efficient!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 14: Incremental Ingestion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 14: Incremental ingestion (smart re-embedding)&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * The key insight:
 * 
 * Don't re-embed unchanged chunks.
 * Reuse old embeddings.
 * 
 * This is how production systems save money.
 */&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;incrementalIngestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;documentVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Get old chunks from previous version&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldDoc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`
    SELECT id FROM documents 
    WHERE file_name = (SELECT file_name FROM documents WHERE id = $1)
      AND version = $2 - 1
      AND tenant_id = $3
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentVersion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;embeddingsNeeded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;embeddingsReused&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldDoc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Get old chunks&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldChunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manyOrNone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`
      SELECT id, chunk_hash, embedding FROM chunks 
      WHERE document_id = $1
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;oldDoc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Compare: which chunks are new/changed?&lt;/span&gt;
    &lt;span class="nx"&gt;embeddingsNeeded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newChunk&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;oldChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oldChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunk_hash&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;newHash&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;oldChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Chunk unchanged: reuse embedding&lt;/span&gt;
        &lt;span class="nx"&gt;embeddingsReused&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// Copy old embedding to new chunk&lt;/span&gt;
        &lt;span class="nx"&gt;newChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oldChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Don't need to re-embed&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Need to embed&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Incremental ingestion&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;totalChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddingsNeeded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reused&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddingsReused&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;efficiency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;embeddingsReused&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;chunksToEmbed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddingsNeeded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunksCached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddingsReused&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;costSaving&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddingsReused&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.00001&lt;/span&gt;  &lt;span class="c1"&gt;// ₹ saved&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 15: Cost Optimization Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 15: Track costs and optimize&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;optimizeIngestionCosts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Strategy 1: Batch embeddings&lt;/span&gt;
  &lt;span class="c1"&gt;// Don't embed one by one. Batch them.&lt;/span&gt;
  &lt;span class="c1"&gt;// Cost: 1000 embeddings in 1 API call = cheaper than 1000 calls&lt;/span&gt;

  &lt;span class="c1"&gt;// Strategy 2: Use cheaper models for simple chunks&lt;/span&gt;
  &lt;span class="c1"&gt;// Use Gemini for obvious policy text&lt;/span&gt;
  &lt;span class="c1"&gt;// Use Claude only for complex analysis&lt;/span&gt;

  &lt;span class="c1"&gt;// Strategy 3: Cache embeddings aggressively&lt;/span&gt;
  &lt;span class="c1"&gt;// If same chunk text appears in document 1 and 2&lt;/span&gt;
  &lt;span class="c1"&gt;// Don't embed twice. Reuse.&lt;/span&gt;

  &lt;span class="c1"&gt;// Strategy 4: Incremental updates&lt;/span&gt;
  &lt;span class="c1"&gt;// Don't re-embed when document slightly changes&lt;/span&gt;
  &lt;span class="c1"&gt;// Only embed changed chunks&lt;/span&gt;

  &lt;span class="cm"&gt;/**
   * Real example: HR uploads 100 policies
   * 
   * Policy 1: 100 chunks
   * Policy 2: 100 chunks
   * Policy 3: 100 chunks (99 similar to Policy 1)
   * 
   * Naive: Embed 300 chunks = ₹15
   * Smart: Embed 200 chunks = ₹10 (saved ₹5)
   * 
   * At scale (1000 policies):
   * Naive: ₹500/month
   * Smart: ₹100/month (saved ₹400!)
   */&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costBreakdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;parsing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// ₹0.50 per document&lt;/span&gt;
    &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// ₹50 per 1000 chunks&lt;/span&gt;
    &lt;span class="na"&gt;vectorStorage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// ₹10 per GB/month&lt;/span&gt;
    &lt;span class="na"&gt;llmClassification&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// ₹1 per document (optional)&lt;/span&gt;
    &lt;span class="na"&gt;monthlyInfrastructure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cost breakdown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;costBreakdown&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;costBreakdown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Complete Production Ingestion Flow
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me show you the complete flow as code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Complete ingestion pipeline&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Phase 1: Complete Document Ingestion Pipeline
 * 
 * This is what production systems do before embeddings.
 * 
 * Cost tracking included.
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;completeIngestionPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;chunksCreated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;chunksEmbedded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;costEstimate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunksCreated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunksEmbedded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;costEstimate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// STEP 1-2: Upload &amp;amp; File Hashing&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 1-2: File upload and hashing...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleFileUpload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alreadyExists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;File already processed. Skipping.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 3: Choose parser&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 3: Parsing PDF...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parserType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;selectParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// pdf-parse, unstructured, etc.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsedElements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;parsePDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parserType&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 4-5: Extract &amp;amp; Clean&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 4-5: Extracting and cleaning text...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cleanText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsedElements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cleanText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cleanExtractedText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleanText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 6: Metadata&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 6: Extracting metadata...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractMetadataFromStructure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parsedElements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 7-10: Chunking&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 7-10: Creating chunks...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createSmartChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleanText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunksCreated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 11-12: Deduplication&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 11-12: Deduplication...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;skipped&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deduplicateChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileHash&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 13: Versioning&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 13: Handling versions...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;versionInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleDocumentVersion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fileHash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileHash&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;documentId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;versionInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// STEP 14-15: Incremental &amp;amp; Optimize&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Step 14-15: Incremental ingestion...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;chunksToEmbed&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;incrementalIngestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;versionInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;versionInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunksEmbedded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunksToEmbed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Calculate cost&lt;/span&gt;
    &lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;costEstimate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunksToEmbed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// ₹50 per 1000&lt;/span&gt;

    &lt;span class="c1"&gt;// Save chunks to database&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;newChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hashChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;none&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`
        INSERT INTO chunks 
        (document_id, tenant_id, chunk_text, chunk_index, chunk_hash, metadata)
        VALUES ($1, $2, $3, $4, $5, $6)
        `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="nx"&gt;versionInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;chunkHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;✓ Phase 1 complete!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Return ready for Phase 2: Embeddings&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Ingestion pipeline failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Helper: Choose the right parser
 */&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;selectParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()?.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extension&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pdf&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Use document size to decide&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;statSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// &amp;lt; 1MB: simple PDF&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pdf-parse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// 1-10MB: normal PDF&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unstructured&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// &amp;gt; 10MB: complex PDF&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llamaparse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extension&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docx-parse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pdf-parse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// default&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Complete Journey: From Upload to Ready-for-Embedding
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me show you the entire flow visually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER UPLOADS PDF
       ↓
   [STEP 1-2]
   File Hash Check
       ↓
   Already exists? ──YES──→ STOP (save cost)
       │
       NO
       ↓
   [STEP 3]
   Parse PDF ──→ Choose parser
                 ├─ pdf-parse (free)
                 ├─ Unstructured (cheap)
                 └─ LlamaParse (expensive)
       ↓
   [STEP 4-5]
   Extract &amp;amp; Clean ──→ Remove garbage
                       (headers, footers, etc.)
       ↓
   [STEP 6]
   Extract Metadata ──→ Add context
                        (department, section, etc.)
       ↓
   [STEP 7-10]
   Smart Chunking ──→ 1000-1500 tokens
                      with overlap &amp;amp; boundaries
       ↓
   [STEP 11-12]
   Chunk Hashing ──→ Detect duplicates
       ↓
   Skip duplicates? ──YES──→ Don't embed
       │
       NO
       ↓
   [STEP 13]
   Versioning ──→ Mark old version inactive
                  Keep new version active
       ↓
   [STEP 14-15]
   Incremental ──→ Only embed changed chunks
   Ingestion       Reuse cached embeddings
       ↓
   ✓ READY FOR EMBEDDINGS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Interview-Level Summary
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; If someone asks me in an interview about RAG ingestion?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Say this:&lt;/p&gt;

&lt;p&gt;"Phase 1 Document Ingestion has 15 critical steps before embeddings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;File Upload &amp;amp; Hashing&lt;/strong&gt; - Hash file content (not filename) to detect duplicates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF Parsing&lt;/strong&gt; - Choose tool based on document type (pdf-parse for simple, Unstructured for normal, LlamaParse for complex)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Extraction&lt;/strong&gt; - Get raw text from parser output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Cleaning&lt;/strong&gt; - Remove headers, footers, watermarks, junk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Extraction&lt;/strong&gt; - Extract department, section, version from structure or LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking&lt;/strong&gt; - Split into 1000-1500 tokens with smart boundaries (sentence/paragraph)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Size&lt;/strong&gt; - Balance between context (large) and precision (small)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Boundaries&lt;/strong&gt; - Don't break mid-sentence; respect document structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overlap&lt;/strong&gt; - 200 tokens overlap to maintain context continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Hashing&lt;/strong&gt; - Hash each chunk to detect changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication&lt;/strong&gt; - Skip duplicate chunks at semantic level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning&lt;/strong&gt; - Handle document updates (v1 inactive, v2 active)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Ingestion&lt;/strong&gt; - Only re-embed changed chunks, reuse old embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Optimization&lt;/strong&gt; - Track costs, batch requests, use cheaper models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Update&lt;/strong&gt; - Store chunks and prepare for vector embedding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: Avoid embedding unnecessary content. At enterprise scale, avoiding 100K unnecessary embeddings saves thousands per month."&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Checklist: Before You Deploy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pre-Deployment Questions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;preDeploymentChecklist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Parsing&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Have you tested parsing with your actual documents?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you handle tables, images, multi-column layouts?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Did you choose the right parser for your documents?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Deduplication&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Are you hashing file contents, not filenames?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you check for existing files before parsing?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you check for duplicate chunks before embedding?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Chunking&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Is your chunk size 800-1500 tokens?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do chunks respect sentence/paragraph boundaries?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you have 100-200 token overlap?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Metadata&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you extract metadata from document structure?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Is metadata used during retrieval?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Versioning&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you handle document updates (v1, v2, v3)?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do old versions get marked inactive?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Cost&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Are you tracking embedding costs per document?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you reuse embeddings for unchanged chunks?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Are you batching embedding requests?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Production&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Do you have error handling and retries?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Are you logging every step?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Can you rollback a bad ingestion?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Before embeddings, 15 steps matter&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skip even one, quality suffers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The right parser saves time&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pdf-parse: Free, simple&lt;/li&gt;
&lt;li&gt;Unstructured: Balanced&lt;/li&gt;
&lt;li&gt;LlamaParse: Best quality&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Chunking is everything&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Size: 1000-1500 tokens&lt;/li&gt;
&lt;li&gt;Overlap: 200 tokens&lt;/li&gt;
&lt;li&gt;Boundaries: Sentence/paragraph&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deduplication saves money&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash file content, not filename&lt;/li&gt;
&lt;li&gt;Hash chunks, reuse embeddings&lt;/li&gt;
&lt;li&gt;50%+ cost savings possible&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Metadata enables better retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract from structure&lt;/li&gt;
&lt;li&gt;Use during search&lt;/li&gt;
&lt;li&gt;Improve accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Production systems are incremental&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't re-embed everything&lt;/li&gt;
&lt;li&gt;Reuse cached embeddings&lt;/li&gt;
&lt;li&gt;Only process what changed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So this is what "production-grade RAG" really means?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. It's not just about embeddings and vector search. It's about doing the hard work BEFORE you search. That's what separates hobby projects from systems that work at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;You now understand Phase 1: Document Ingestion.&lt;/p&gt;

&lt;p&gt;Next comes Phase 2: Retrieval &amp;amp; Ranking (vector search, reranking, query expansion)&lt;/p&gt;

&lt;p&gt;Then Phase 3: Safety &amp;amp; Production (hallucination prevention, monitoring, cost tracking)&lt;/p&gt;

&lt;p&gt;Ready to learn Phase 2?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, definitely! But first... can you show me a working example?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's next. We'll build a real ingestion pipeline together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reference Code Repository
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Complete implementation files:&lt;/span&gt;

1. fileHasher.ts        - File hashing, duplicate detection
2. pdfParsers.ts        - pdf-parse, Unstructured, LlamaParse
3. textCleaning.ts      - Remove junk, normalize whitespace
4. metadataExtractor.ts - Extract from structure, LLM classify
5. chunking.ts          - Sliding window, smart boundaries
6. deduplication.ts     - Chunk hashing, duplicate detection
7. versioning.ts        - Handle updates, version tracking
8. incrementalIngestion.ts - Reuse embeddings, smart updates
9. ingestionPipeline.ts - Complete flow orchestration
10. costOptimizer.ts    - Track costs, identify savings

Total: ~2000 lines of production-ready Node.js code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Remember: Less noise, more action. Phase 1 is the foundation. &lt;br&gt;
Build it right.&lt;/strong&gt; &lt;br&gt;
&lt;strong&gt;written by&lt;/strong&gt; -&lt;strong&gt;SurajK&lt;/strong&gt; &lt;/p&gt;

</description>
      <category>ai</category>
      <category>dataengineering</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>RAG Pipeline: Complete Node.js Implementation Guide</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Sat, 20 Jun 2026 10:27:48 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/rag-pipeline-complete-nodejs-implementation-guide-1n54</link>
      <guid>https://dev.to/surajrkhonde/rag-pipeline-complete-nodejs-implementation-guide-1n54</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Build production RAG systems in Node.js - Know where it breaks, why it works, and when to use it&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Introduction: Why Node.js for RAG?
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, why would I build RAG in Node.js? I thought this was AI stuff?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good question. Node.js is perfect for RAG because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fast I/O (API calls to Claude, PostgreSQL queries)&lt;/li&gt;
&lt;li&gt;Real-time capable (WebSockets for streaming)&lt;/li&gt;
&lt;li&gt;Easy deployment (Vercel, Railway, Render)&lt;/li&gt;
&lt;li&gt;Async/await makes complex flows clean&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus, you probably already have Node.js running your backend. Why add Python?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So I can build the whole thing in JavaScript?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. Frontend, backend, RAG - all JavaScript. That's the beauty.&lt;/p&gt;

&lt;p&gt;But we need to be honest about limitations. Let's talk about that too.&lt;/p&gt;




&lt;h2&gt;
  
  
  PART 1: PROJECT SETUP
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Initialize Your Project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create project&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;rag-system
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-system

&lt;span class="c"&gt;# Initialize Node.js&lt;/span&gt;
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;express dotenv @anthropic-ai/sdk pg pg-promise cors body-parser
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; nodemon typescript @types/node

&lt;span class="c"&gt;# Optional but recommended for production&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;winston helmet compression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rag-system/
├── src/
│   ├── config/
│   │   ├── database.ts         # PostgreSQL + pgvector setup
│   │   └── embedding.ts         # Claude embeddings
│   ├── services/
│   │   ├── retrieval.ts         # Vector search logic
│   │   ├── reranking.ts         # Two-stage ranking
│   │   ├── queryProcessing.ts   # Query expansion
│   │   └── safety.ts            # Hallucination prevention
│   ├── routes/
│   │   └── rag.ts               # API endpoints
│   ├── utils/
│   │   ├── logger.ts            # Logging (critical for debugging)
│   │   └── metrics.ts           # Track recall, precision
│   └── index.ts                 # Main server
├── .env                          # Secrets
├── package.json
└── tsconfig.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  TypeScript Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compilerOptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ES2020"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"module"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"commonjs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lib"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ES2020"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"outDir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rootDir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./src"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"strict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"esModuleInterop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"skipLibCheck"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"forceConsistentCasingInFileNames"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 2: DATABASE SETUP (PostgreSQL + pgvector)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create PostgreSQL Database with pgvector
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is your foundation. Get it wrong, everything breaks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Connect to PostgreSQL&lt;/span&gt;
&lt;span class="n"&gt;psql&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;U&lt;/span&gt; &lt;span class="n"&gt;postgres&lt;/span&gt;

&lt;span class="c1"&gt;-- Create database&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Connect to the database&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;

&lt;span class="c1"&gt;-- Install pgvector extension&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create resumes table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;resumes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;candidate_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;-- CRITICAL: tenant isolation&lt;/span&gt;
  &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create chunks table (where vectors live)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;resumes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_index&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;-- The vector: 1536 dimensions for Claude embeddings&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;-- Metadata for debugging&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;-- CRITICAL: Always check tenant&lt;/span&gt;
  &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation_chunks&lt;/span&gt; 
    &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create indexes&lt;/span&gt;
&lt;span class="c1"&gt;-- 1. Vector index for fast search (MOST IMPORTANT)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_resume_chunks_embedding&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- 2. Tenant index (security)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_resume_chunks_tenant&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- 3. Text search index (keyword search)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_resume_chunks_text&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;-- Create tenants table (multi-tenancy)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create query logs (for metrics)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;query_logs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;latency_ms&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nb"&gt;precision&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;cost_cents&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create index on query logs for analytics&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_query_logs_tenant&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;query_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Database Connection Service
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where your first failure point lives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/config/database.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;pgPromise&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pg-promise&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;dotenv&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dotenv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initOptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Detailed error info (critical for debugging)&lt;/span&gt;
  &lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database Error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="c1"&gt;// Connection events&lt;/span&gt;
  &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database connected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database disconnected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pgp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pgPromise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;initOptions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pgp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_HOST&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_PORT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;5432&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_NAME&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rag_system&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_USER&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Connection pooling&lt;/span&gt;
  &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Timeout after 5 seconds&lt;/span&gt;
  &lt;span class="na"&gt;connectionTimeoutMillis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Idle timeout&lt;/span&gt;
  &lt;span class="na"&gt;idleTimeoutMillis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Test connection on startup&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;initializeDatabase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT 1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;✓ Database connection verified&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;✗ Database connection failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Environment Variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;DB_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost
&lt;span class="nv"&gt;DB_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;span class="nv"&gt;DB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rag_system
&lt;span class="nv"&gt;DB_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres
&lt;span class="nv"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_secure_password

&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
&lt;span class="nv"&gt;ANTHROPIC_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-3-5-sonnet-20241022

&lt;span class="nv"&gt;NODE_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3000

&lt;span class="c"&gt;# Security&lt;/span&gt;
&lt;span class="nv"&gt;ADMIN_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;super-secret-key-change-this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 3: EMBEDDING SERVICE
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create Embeddings with Claude
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where the first real cost happens. Know what can fail here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/config/embedding.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;EmbeddingResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Get embeddings for text chunks.
 * 
 * ⚠️ FAILURE POINTS:
 * 1. API rate limit (429) - implements exponential backoff
 * 2. Token too long (4096 tokens max) - chunks pre-validated
 * 3. Network timeout - retry logic built in
 * 4. Cost tracking - logs cost per embedding
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;EmbeddingResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// VALIDATION: Prevent token overrun&lt;/span&gt;
    &lt;span class="c1"&gt;// Claude's text embeddings: ~1 token = 4 chars average&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validTexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// ~4000 tokens&lt;/span&gt;
        &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Text truncated for embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
          &lt;span class="na"&gt;originalLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;truncatedTo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Call Claude API for embeddings&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Generate embeddings for the following texts. Return ONLY valid JSON array with "embeddings" key containing array of number arrays.

Texts:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;validTexts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;

Return format: {"embeddings": [[...], [...], ...]}`&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Parse response&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; 
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[][];&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parseError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to parse embeddings response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid embeddings response format&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate embeddings&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;validTexts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`Embedding count mismatch: got &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, expected &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;validTexts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Calculate cost (Claude 3.5 Sonnet: $0.003 per 1M input tokens)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costCents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputTokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.003&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Embeddings generated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;costCents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;costCents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;validTexts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Embedding API error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Retry logic for rate limits&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Rate limited. Waiting before retry...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;getEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Exponential backoff in real system&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Embed a single text (convenience function)
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;embedText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getEmbeddings&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chunking Function
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember: 1000-1500 tokens, 200-token overlap.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/utils/chunking.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Break text into chunks with sliding window overlap.
 * 
 * ⚠️ FAILURE POINTS:
 * 1. Overlap larger than chunk size
 * 2. Single chunk can't hold meaningful text
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;windowTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;overlapTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Simple tokenization (1 token ≈ 4 chars for English)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;windowTokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Text is smaller than chunk size&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Text smaller than chunk window&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;windowTokens&lt;/span&gt; 
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt;
    &lt;span class="p"&gt;}];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Calculate character window (1 token ≈ 4 chars)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;windowTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;overlapTokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;charOverlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Chunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;charWindow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Find sentence boundary to avoid splitting mid-sentence&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;periodIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newlineIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;periodIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newlineIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;charWindow&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Found good boundary&lt;/span&gt;
        &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;boundaryIndex&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Stop if we've reached the end&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Text chunked&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;originalLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;avgChunkTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 4: RETRIEVAL SERVICE
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector Search with Hybrid (Vector + Keyword)
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is the heart. Where everything lives or dies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/services/retrieval.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;RetrievalResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;vectorDistance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;keywordScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;combinedScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Retrieve relevant chunks using hybrid search.
 * 
 * ⚠️ FAILURE POINTS:
 * 1. Missing tenant_id check → DATA BREACH
 * 2. Vector index not built → Slow queries (10s+ instead of 100ms)
 * 3. Query too long → API error
 * 4. No results → Need to handle gracefully
 * 5. Typos in query → Keyword search might fail
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;hybridSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RetrievalResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Validate inputs&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tenant_id and resume_id are required&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query cannot be empty&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query truncated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;originalLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 1: Get query embedding&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Embedding query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embedText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Vector search (fast)&lt;/span&gt;
    &lt;span class="c1"&gt;// Convert embedding to PostgreSQL format: [0.1, 0.2, ...]&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddingString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manyOrNone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      SELECT 
        chunk_text,
        chunk_index,
        embedding &amp;lt;=&amp;gt; $1::vector AS vector_distance
      FROM resume_chunks
      WHERE 
        tenant_id = $2
        AND resume_id = $3
      ORDER BY vector_distance ASC
      LIMIT $4
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;embeddingString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// Get 2x to filter&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectorResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No vector results found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 3: Keyword filter (precision)&lt;/span&gt;
    &lt;span class="c1"&gt;// Only keep chunks that also match the query&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keywordResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manyOrNone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      SELECT 
        chunk_text,
        chunk_index,
        ts_rank(
          to_tsvector('english', chunk_text), 
          plainto_tsquery('english', $1)
        ) AS keyword_score
      FROM resume_chunks
      WHERE 
        tenant_id = $2
        AND resume_id = $3
        AND to_tsvector('english', chunk_text) @@ 
            plainto_tsquery('english', $1)
      ORDER BY keyword_score DESC
      LIMIT $4
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 4: Combine results&lt;/span&gt;
    &lt;span class="c1"&gt;// Chunks that appear in both vector AND keyword search are best&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectorResults&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vr&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;keywordResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunk_text&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;vr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;vr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;keywordScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;kr&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;kr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;keyword_score&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="c1"&gt;// Weighted score: 60% vector, 40% keyword&lt;/span&gt;
          &lt;span class="na"&gt;combinedScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;vr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kr&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;keyword_score&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hybrid search complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;resultsCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;vectorResultsCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vectorResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;keywordResultsCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;keywordResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Log for metrics&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;none&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
        INSERT INTO query_logs (tenant_id, query, latency_ms)
        VALUES ($1, $2, $3)
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;combined&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;RetrievalResult&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Retrieval error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;tenantId&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Multi-query retrieval - search with multiple variations.
 * 
 * Better recall, but slower and more expensive.
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;multiQueryRetrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RetrievalResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RetrievalResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;hybridSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Deduplicate by chunk text, keep highest score&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unique&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;allResults&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RetrievalResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;unique&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;combinedScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Multi-query retrieval error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 5: RERANKING SERVICE
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Two-stage is where quality happens. First stage is fast, second is accurate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/services/reranking.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;RerankedResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Rerank chunks using Claude (more accurate but slower).
 * 
 * ⚠️ FAILURE POINTS:
 * 1. Claude API timeout (fix with timeout wrapper)
 * 2. Chunks too long (truncate before sending)
 * 3. Response parsing fails
 * 4. Cost explosion (reranking costs money - track it)
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RerankedResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 30 second timeout&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Truncate chunks to prevent token overflow&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;truncatedChunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; 
      &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Build reranking prompt&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunksFormatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;truncatedChunks&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s1"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a search relevance expert. Rank the following chunks by relevance to the query.

Query: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"

Chunks to rank:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunksFormatted&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

Return ONLY valid JSON with this format:
{
  "rankings": [
    {"index": 0, "relevance_score": 0.95},
    {"index": 1, "relevance_score": 0.72}
  ]
}

Relevance score: 0.0 (irrelevant) to 1.0 (highly relevant)
Sort by relevance_score descending.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Parse response&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;rankings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Extract JSON from response (might be wrapped in markdown)&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\{[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\}&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jsonStr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;rankings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rankings&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parseError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to parse reranking response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="c1"&gt;// Fallback: return original order&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
      &lt;span class="p"&gt;}));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Convert to results&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rankings&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relevance_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
      &lt;span class="p"&gt;}))&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Reranking complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;inputCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;outputCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;topScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Reranking error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;chunksCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Fallback: return original order&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;}));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 6: QUERY PROCESSING
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Expand the query so you find more relevant chunks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/services/queryProcessing.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Expand a query into related search terms.
 * 
 * ⚠️ FAILURE POINTS:
 * 1. LLM generates irrelevant expansions
 * 2. Original query lost in expansion
 * 3. Too many expansions → slow retrieval
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;expandQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Given this query about a job candidate, generate 2-3 alternative phrasings or related concepts that would help find relevant information.

Original query: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"

Return ONLY a JSON array of strings:
["alternative1", "alternative2", "alternative3"]

These should help find the same information using different keywords.`&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;expansions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;expansions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to parse query expansion&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Always include original query&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allQueries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;expansions&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query expanded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;original&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;expansions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allQueries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;allQueries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query expansion error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;originalQuery&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// Fallback&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Normalize query (remove typos, standardize terms).
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizeQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;// Remove extra spaces&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;// Remove special characters (keep alphanumeric and spaces)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[^\w\s]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 7: SAFETY &amp;amp; HALLUCINATION PREVENTION
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where you prevent the AI from lying. Critical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/services/safety.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;SafeAnswer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nl"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Get answer from AI with multiple safety layers.
 * 
 * ⚠️ FAILURE POINTS:
 * 1. AI answer not in JSON format
 * 2. Confidence calculation wrong
 * 3. Evidence doesn't exist in chunks
 * 4. Excessive cost for failed attempts
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;safeAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;confidenceThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SafeAnswer&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No relevant information found.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="na"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No source chunks provided&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Layer 1: Retrieval boundaries&lt;/span&gt;
    &lt;span class="c1"&gt;// Show ONLY the chunks, nothing from training&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunksText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`[Chunk &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s1"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are evaluating a candidate resume based on specific chunks.

INSTRUCTIONS:
1. Answer ONLY based on the provided chunks
2. Do NOT use any knowledge from training data
3. If information is not in chunks, say "Unknown"
4. Always cite which chunk supports your answer
5. Return valid JSON ONLY - no other text

Query: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"

Chunks provided:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;chunksText&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

Return JSON in this exact format:
{
  "answer": "your answer here",
  "confidence": 0.0 to 1.0,
  "evidence_chunks": [0, 1, 2],
  "explanation": "why you're confident"
}`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Layer 2: Structured output&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Parse response&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\{[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\}&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;jsonMatch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jsonStr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to parse safety response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error processing answer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="na"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid response format&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Layer 3: Validation&lt;/span&gt;
    &lt;span class="c1"&gt;// Check evidence chunks actually exist&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validEvidenceIndices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;evidence_chunks&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validEvidenceIndices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No valid evidence for answer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;requestedIndices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;evidence_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;chunksCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;evidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;validEvidenceIndices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Layer 4: Confidence gating&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isSafe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;confidenceThreshold&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Low confidence answer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;confidenceThreshold&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Safe answer generated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;evidenceCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;isSafe&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Confident&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Low confidence&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Safety check error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
      &lt;span class="na"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Validate that answer is faithful to evidence.
 * Post-check: does answer match the chunks?
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;validateFaithfulness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;isFaithful&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Simple check: are key terms from answer in evidence?&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;answerTerms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;evidenceText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchedTerms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;answerTerms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;term&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; 
      &lt;span class="nx"&gt;term&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;evidenceText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;term&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;answerTerms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; 
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;matchedTerms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;answerTerms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; 
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;isFaithful&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;score&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Faithfulness validation error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;isFaithful&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 8: API ROUTES &amp;amp; ENDPOINTS
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is what the client calls. Make it robust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/routes/rag.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;hybridSearch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;multiQueryRetrieval&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../services/retrieval&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;rerank&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../services/reranking&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;expandQuery&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../services/queryProcessing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;safeAnswer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validateFaithfulness&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../services/safety&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;chunkText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/chunking&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getEmbeddings&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../config/embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Middleware: Check authentication&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing API key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// In production, validate against database&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ADMIN_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid API key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Upload and process a resume.
 * POST /rag/upload
 */&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/upload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;candidateName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;candidateName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing required fields: tenantId, candidateName, resumeText&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 1: Save resume&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resumeResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO resumes (tenant_id, candidate_name, raw_text)
      VALUES ($1, $2, $3)
      RETURNING id
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;candidateName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;resumeResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Chunk the resume&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Resume chunked&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 3: Get embeddings for all chunks&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkTexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddingResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunkTexts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 4: Save chunks with embeddings&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;embeddingResults&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddingArray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;none&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
        INSERT INTO resume_chunks 
        (resume_id, tenant_id, chunk_text, chunk_index, embedding)
        VALUES ($1, $2, $3, $4, $5::vector)
      `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embeddingArray&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Resume uploaded successfully&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Resume for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;candidateName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; processed successfully`&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Upload error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Query a resume.
 * POST /rag/query
 */&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useExpansion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;resumeId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Missing required fields: tenantId, resumeId, question&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 1: Expand query if requested&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;useExpansion&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expandQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query expanded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: Retrieve chunks (multi-query if expanded)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useExpansion&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;multiQueryRetrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;hybridSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resumeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No relevant information found in resume.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="na"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 3: Rerank for accuracy&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topChunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 4: Get safe answer with evidence&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;safeAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topChunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 5: Validate faithfulness (optional)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;faithfulness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;validateFaithfulness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;safeAns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isSafe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;faithfulness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;faithfulness&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;chunksRetrieved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;chunksReranked&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Query error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Get metrics for a tenant.
 * GET /rag/metrics/:tenantId
 */&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/metrics/:tenantId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tenantId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Query logs aggregation&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      SELECT 
        COUNT(*) as query_count,
        AVG(latency_ms) as avg_latency,
        MAX(latency_ms) as max_latency,
        MIN(latency_ms) as min_latency,
        AVG(recall) as avg_recall,
        AVG(precision) as avg_precision,
        SUM(cost_cents) / 100.0 as total_cost_dollars
      FROM query_logs
      WHERE tenant_id = $1
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;metrics&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Metrics error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 9: MAIN SERVER
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/index.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cors&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;compression&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compression&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;helmet&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;helmet&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;dotenv&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dotenv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;ragRoutes&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./routes/rag&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;initializeDatabase&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./config/database&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./utils/logger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Middleware&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;helmet&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// Security headers&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;compression&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// Compress responses&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;cors&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// Health check&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/health&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// RAG routes&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/rag&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ragRoutes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Error handler&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NextFunction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unhandled error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Internal server error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Start server&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Initialize database&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;initializeDatabase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Server running on port &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to start server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Logging Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/utils/logger.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;winston&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createLogger&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LOG_LEVEL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;// Console in development&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorize&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; [&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;level&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;
            &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;// File for production&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
      &lt;span class="na"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs/error.log&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;winston&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
      &lt;span class="na"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs/combined.log&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 10: FAILURE POINTS &amp;amp; SOLUTIONS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Summary Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Point&lt;/th&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Missing tenant_id&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data leak between companies&lt;/td&gt;
&lt;td&gt;No isolation check&lt;/td&gt;
&lt;td&gt;Add WHERE tenant_id = X to EVERY query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector index missing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Queries take 10+ seconds&lt;/td&gt;
&lt;td&gt;Sequential scan of 500K vectors&lt;/td&gt;
&lt;td&gt;Create IVFFLAT index on embedding column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query too long&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API error 4096 tokens exceeded&lt;/td&gt;
&lt;td&gt;Question &amp;gt;16000 chars&lt;/td&gt;
&lt;td&gt;Truncate queries to 500 chars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No results&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Empty array returned&lt;/td&gt;
&lt;td&gt;Chunks don't exist or embeddings wrong&lt;/td&gt;
&lt;td&gt;Check if chunks were saved, verify vector distance threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hallucination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI invents information&lt;/td&gt;
&lt;td&gt;No retrieval boundaries in prompt&lt;/td&gt;
&lt;td&gt;Use safety layers (5 layers as described)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate limit (429)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API call fails&lt;/td&gt;
&lt;td&gt;Too many requests to Claude&lt;/td&gt;
&lt;td&gt;Implement exponential backoff, queue requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database connection lost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Cannot connect to server"&lt;/td&gt;
&lt;td&gt;Network issue, DB down, wrong credentials&lt;/td&gt;
&lt;td&gt;Add retry logic, connection pooling, health checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding dimension mismatch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Vector dimension 1536 != 768"&lt;/td&gt;
&lt;td&gt;Using different embedding model&lt;/td&gt;
&lt;td&gt;Ensure consistent model (claude-3-5-sonnet)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory overload&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js crashes&lt;/td&gt;
&lt;td&gt;Trying to embed entire 100MB file&lt;/td&gt;
&lt;td&gt;Chunk before embedding, process in batches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost explosion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unexpected $10k bill&lt;/td&gt;
&lt;td&gt;Each embedding/rerank/answer costs money&lt;/td&gt;
&lt;td&gt;Track costs, log them, set spending limits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  PART 11: PROS AND CONS OF RAG
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PROS ✓
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Accuracy&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✓ AI answers from evidence, not training data&lt;/li&gt;
&lt;li&gt;✓ No hallucinations (with proper safety layers)&lt;/li&gt;
&lt;li&gt;✓ Can trace every answer to source
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without RAG: "Does John know Docker?" → Guess → "maybe, looks like it"
With RAG: "Does John know Docker?" → Evidence → "Yes. His resume says: 'Docker, Kubernetes, 4 years'"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. &lt;strong&gt;Up-to-Date&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✓ Works with information from last month&lt;/li&gt;
&lt;li&gt;✓ No retraining needed&lt;/li&gt;
&lt;li&gt;✓ Can add 1000 new resumes per day&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Explainability&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✓ Users see &lt;em&gt;why&lt;/em&gt; the system answered&lt;/li&gt;
&lt;li&gt;✓ Can audit decisions&lt;/li&gt;
&lt;li&gt;✓ Legally defensible&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Cost&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✓ Cheaper than retraining models&lt;/li&gt;
&lt;li&gt;✓ Only pay for what you use&lt;/li&gt;
&lt;li&gt;✓ PostgreSQL is free (open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;Flexibility&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✓ Can use with ANY language&lt;/li&gt;
&lt;li&gt;✓ Works with specialized domains&lt;/li&gt;
&lt;li&gt;✓ Easy to update documents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CONS ✗
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Complexity&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ More moving parts: embeddings, database, retrieval, reranking, safety&lt;/li&gt;
&lt;li&gt;✗ More things to break&lt;/li&gt;
&lt;li&gt;✗ More things to debug
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Single LLM: Simple
RAG: Embeddings → Vector DB → Retrieval → Reranking → Safety checks
Each stage can fail independently
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. &lt;strong&gt;Retrieval Quality&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ If wrong chunks retrieved, answer is wrong&lt;/li&gt;
&lt;li&gt;✗ Embeddings are fuzzy (not perfect matches)&lt;/li&gt;
&lt;li&gt;✗ Rare topics might not be in training data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resume: "Worked with distributed ledger technology"
Search: "blockchain"
Result: Miss (DLT ≠ blockchain in embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. &lt;strong&gt;Cost of Operations&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ Each embedding: $0.00001&lt;/li&gt;
&lt;li&gt;✗ Each rerank: $0.0001&lt;/li&gt;
&lt;li&gt;✗ Each LLM answer: $0.001&lt;/li&gt;
&lt;li&gt;✗ Adds up fast with scale
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100K queries/month = $100/month just for operations
(Not including infrastructure, salaries, etc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. &lt;strong&gt;Latency&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ Vector search: 100ms&lt;/li&gt;
&lt;li&gt;✗ Reranking: 300-500ms&lt;/li&gt;
&lt;li&gt;✗ Safety check: 200ms&lt;/li&gt;
&lt;li&gt;✗ Total: 600-800ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;User expects &amp;lt;200ms. RAG adds delay.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;Data Quality Matters&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ If source documents are bad, answers are bad&lt;/li&gt;
&lt;li&gt;✗ Garbage in = garbage out&lt;/li&gt;
&lt;li&gt;✗ No amount of engineering fixes bad data
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resume full of typos: "React" → "Rreact" → Embeddings confused
→ System can't find React skills
→ Answer is wrong
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6. &lt;strong&gt;Scaling is Hard&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;✗ Vector index maintenance complex&lt;/li&gt;
&lt;li&gt;✗ Multi-tenancy requires isolation at every level&lt;/li&gt;
&lt;li&gt;✗ Database queries become slow with 10M+ chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use RAG
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Use RAG?&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customer support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;Up-to-date, explainable, no hallucinations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Medical diagnosis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;Safety-critical, needs evidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resume screening&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;Domain-specific, needs accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General chatbot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;Training data sufficient, latency matters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quick facts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;Simple lookup is faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Creative writing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;Hallucinations are features, not bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MAYBE&lt;/td&gt;
&lt;td&gt;Depends on code freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legal documents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;Must cite sources, no mistakes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When NOT to Use RAG
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Simple lookup → Use database
2. Conversational → Use base LLM
3. Speed critical → Too slow (600ms+)
4. Data quality poor → Garbage in/out
5. Training data sufficient → No value-add
6. Cost-sensitive → Each query costs money
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PART 12: COST ANALYSIS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Break Down of Costs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Per-Query Costs (approximate):

1. Embedding query:
   - 50 tokens @ $0.000003/token = $0.00015

2. Vector search + keyword filter:
   - Database operation ≈ $0 (hosted: ~$0.00001)

3. Reranking (Claude):
   - 500 tokens input + 100 output @ $0.003/$0.015 = $0.0018

4. Final answer:
   - 500 tokens input + 200 output @ $0.003/$0.015 = $0.004

Total per query: ≈ $0.0075 (~0.75 cents)

At scale:
- 1K queries/month = $7.50
- 100K queries/month = $750
- 1M queries/month = $7,500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to Reduce Costs
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cache results&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same question asked twice? Use Redis cache&lt;/li&gt;
&lt;li&gt;Saves: 80% of cost&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Batch processing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't embed one by one&lt;/li&gt;
&lt;li&gt;Embed 100 at a time&lt;/li&gt;
&lt;li&gt;Saves: 20% (batch discounts)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Smart reranking&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only rerank if needed&lt;/li&gt;
&lt;li&gt;Don't rerank for obvious matches&lt;/li&gt;
&lt;li&gt;Saves: 30-50%&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cheaper models&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Haiku for simple tasks&lt;/li&gt;
&lt;li&gt;Claude Opus only for complex reasoning&lt;/li&gt;
&lt;li&gt;Saves: 80%&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  PART 13: DEPLOYMENT &amp;amp; PRODUCTION
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Docker Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:18-alpine&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Docker Compose (Local Dev)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;

&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:15-alpine&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rag_system&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_data:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432:5432"&lt;/span&gt;

  &lt;span class="na"&gt;pgvector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.pgvector&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;

  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DB_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;DB_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;DB_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rag_system&lt;/span&gt;
      &lt;span class="na"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${ANTHROPIC_API_KEY}&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deployment Checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Environment variables set correctly&lt;/li&gt;
&lt;li&gt;[ ] Database indexes created&lt;/li&gt;
&lt;li&gt;[ ] API key securely stored (use AWS Secrets Manager)&lt;/li&gt;
&lt;li&gt;[ ] Logging configured (CloudWatch or similar)&lt;/li&gt;
&lt;li&gt;[ ] Monitoring set up (CPU, memory, latency)&lt;/li&gt;
&lt;li&gt;[ ] Rate limiting enabled&lt;/li&gt;
&lt;li&gt;[ ] Database backups enabled&lt;/li&gt;
&lt;li&gt;[ ] Error alerts configured&lt;/li&gt;
&lt;li&gt;[ ] Cost monitoring enabled&lt;/li&gt;
&lt;li&gt;[ ] Security scan passed&lt;/li&gt;
&lt;li&gt;[ ] Load test passed (1000 req/sec)&lt;/li&gt;
&lt;li&gt;[ ] Graceful shutdown implemented&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  PART 14: EXAMPLE QUERIES
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complete End-to-End Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Upload a resume&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/rag/upload &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: super-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "tenantId": "company-1",
    "candidateName": "John Doe",
    "resumeText": "John has 5 years React experience, built e-commerce platforms with Node.js..."
  }'&lt;/span&gt;

&lt;span class="c"&gt;# Response:&lt;/span&gt;
&lt;span class="c"&gt;# {&lt;/span&gt;
&lt;span class="c"&gt;#   "success": true,&lt;/span&gt;
&lt;span class="c"&gt;#   "resumeId": "uuid-123",&lt;/span&gt;
&lt;span class="c"&gt;#   "chunkCount": 8&lt;/span&gt;
&lt;span class="c"&gt;# }&lt;/span&gt;

&lt;span class="c"&gt;# 2. Query the resume&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/rag/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: super-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "tenantId": "company-1",
    "resumeId": "uuid-123",
    "question": "Does John have React experience?",
    "useExpansion": true
  }'&lt;/span&gt;

&lt;span class="c"&gt;# Response:&lt;/span&gt;
&lt;span class="c"&gt;# {&lt;/span&gt;
&lt;span class="c"&gt;#   "success": true,&lt;/span&gt;
&lt;span class="c"&gt;#   "answer": "Yes, John has 5 years of React experience.",&lt;/span&gt;
&lt;span class="c"&gt;#   "confidence": 0.95,&lt;/span&gt;
&lt;span class="c"&gt;#   "evidence": ["John has 5 years React experience, built e-commerce..."],&lt;/span&gt;
&lt;span class="c"&gt;#   "isSafe": true,&lt;/span&gt;
&lt;span class="c"&gt;#   "faithfulness": 0.92,&lt;/span&gt;
&lt;span class="c"&gt;#   "latency": 720,&lt;/span&gt;
&lt;span class="c"&gt;#   "chunksRetrieved": 10,&lt;/span&gt;
&lt;span class="c"&gt;#   "chunksReranked": 5&lt;/span&gt;
&lt;span class="c"&gt;# }&lt;/span&gt;

&lt;span class="c"&gt;# 3. Get metrics&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET http://localhost:3000/rag/metrics/company-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: super-secret-key"&lt;/span&gt;

&lt;span class="c"&gt;# Response:&lt;/span&gt;
&lt;span class="c"&gt;# {&lt;/span&gt;
&lt;span class="c"&gt;#   "success": true,&lt;/span&gt;
&lt;span class="c"&gt;#   "metrics": {&lt;/span&gt;
&lt;span class="c"&gt;#     "query_count": 42,&lt;/span&gt;
&lt;span class="c"&gt;#     "avg_latency": 680,&lt;/span&gt;
&lt;span class="c"&gt;#     "max_latency": 1200,&lt;/span&gt;
&lt;span class="c"&gt;#     "avg_recall": 0.91,&lt;/span&gt;
&lt;span class="c"&gt;#     "avg_precision": 0.88,&lt;/span&gt;
&lt;span class="c"&gt;#     "total_cost_dollars": 0.32&lt;/span&gt;
&lt;span class="c"&gt;#   }&lt;/span&gt;
&lt;span class="c"&gt;# }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  FINAL CHECKLIST: BEFORE YOU SHIP
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Functionality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Embeddings working (call Claude API successfully)&lt;/li&gt;
&lt;li&gt;[ ] PostgreSQL storing vectors correctly&lt;/li&gt;
&lt;li&gt;[ ] Hybrid search returning results&lt;/li&gt;
&lt;li&gt;[ ] Reranking improving quality&lt;/li&gt;
&lt;li&gt;[ ] Safety layers preventing hallucinations&lt;/li&gt;
&lt;li&gt;[ ] Multi-tenancy isolated (tenant1 can't see tenant2)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Vector search &amp;lt;200ms&lt;/li&gt;
&lt;li&gt;[ ] Full query &amp;lt;1000ms&lt;/li&gt;
&lt;li&gt;[ ] No N+1 queries&lt;/li&gt;
&lt;li&gt;[ ] Database queries indexed&lt;/li&gt;
&lt;li&gt;[ ] Caching working&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] API keys not logged&lt;/li&gt;
&lt;li&gt;[ ] Tenant isolation verified (manual test)&lt;/li&gt;
&lt;li&gt;[ ] SQL injection prevented (parameterized queries)&lt;/li&gt;
&lt;li&gt;[ ] Rate limiting enabled&lt;/li&gt;
&lt;li&gt;[ ] HTTPS enforced&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Logging configured&lt;/li&gt;
&lt;li&gt;[ ] Error alerts set up&lt;/li&gt;
&lt;li&gt;[ ] Cost tracking enabled&lt;/li&gt;
&lt;li&gt;[ ] Database backups tested&lt;/li&gt;
&lt;li&gt;[ ] Graceful error handling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Query latency tracked&lt;/li&gt;
&lt;li&gt;[ ] Recall/precision measured&lt;/li&gt;
&lt;li&gt;[ ] Cost per query logged&lt;/li&gt;
&lt;li&gt;[ ] Error rate tracked&lt;/li&gt;
&lt;li&gt;[ ] Uptime monitored&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Reference: Common Issues &amp;amp; Fixes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue: Queries Take 10+ Seconds
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Missing vector index&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; 
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: Same Query Returns Different Results
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Stale embeddings or non-deterministic reranking&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Always embed with same model, use fixed random seed for reranking&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue: AI Answers with Wrong Information
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Insufficient safety layers&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Add confidence thresholding, require citations, validate faithfulness&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue: "Tenant1 sees Tenant2 data"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Missing tenant_id check in WHERE clause&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Add WHERE tenant_id = $X to EVERY query&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue: API calls get rate limited (429)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Too many rapid requests to Claude&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Implement exponential backoff, queue requests, batch operations&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue: Cost explodes unexpectedly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; No cost tracking, inefficient queries, excessive reranking&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Log cost per operation, implement budgets, use cheaper models for easy tasks&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle's Final Word:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAG is powerful but complex. Every layer serves a purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings&lt;/strong&gt; = understand meaning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector DB&lt;/strong&gt; = store and search fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; = find relevant information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking&lt;/strong&gt; = improve quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt; = prevent lies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; = track costs and metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't need all of this on day one. Start simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1:&lt;/strong&gt; PostgreSQL + embeddings + basic search&lt;br&gt;
&lt;strong&gt;Week 1:&lt;/strong&gt; Add reranking&lt;br&gt;
&lt;strong&gt;Month 1:&lt;/strong&gt; Add safety layers&lt;br&gt;
&lt;strong&gt;Month 3:&lt;/strong&gt; Add monitoring and optimization&lt;/p&gt;

&lt;p&gt;Each layer buys you something. Know what you're buying.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; When should I NOT use RAG?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; When:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is stable and doesn't change&lt;/li&gt;
&lt;li&gt;You need sub-200ms latency&lt;/li&gt;
&lt;li&gt;You're paying $10/month and can't afford $100+&lt;/li&gt;
&lt;li&gt;You're building a simple chatbot&lt;/li&gt;
&lt;li&gt;Your source data has low quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Otherwise? RAG is the way.&lt;/p&gt;

&lt;p&gt;Now go build. Start simple. Measure everything. Ship fast.&lt;/p&gt;

&lt;p&gt;Good luck. ---SurajK&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>programming</category>
      <category>code</category>
    </item>
    <item>
      <title>RAG Pipeline: The Uncle-Nephew Complete Learning Guide</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Sat, 20 Jun 2026 09:44:17 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/rag-pipeline-the-uncle-nephew-complete-learning-guide-7h4</link>
      <guid>https://dev.to/surajrkhonde/rag-pipeline-the-uncle-nephew-complete-learning-guide-7h4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to Build Systems That Actually Know Your Data (Not Hallucinate About It)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Introduction: The Story Begins
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I keep hearing "RAG this, RAG that" in tech interviews. When I ask what it means, people throw around words like "Retrieval-Augmented Generation" and I just nod like I understand. But honestly? I'm lost.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; (laughing) That's the best honest question I've heard all week. Let me ask you something first. If I gave you a question right now - "What year did India win the World Cup?" - how would you answer?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Well... I'd pull up Google, search for it, read the answer, then tell you.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. You don't answer from memory alone. You go &lt;em&gt;fetch the information&lt;/em&gt; first, then &lt;em&gt;answer based on what you found&lt;/em&gt;. That's RAG in real life. And that simple idea - fetch first, answer after - fixes almost every problem we face with AI today.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But uncle, AI can remember things from its training. Why does it need to fetch?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Ah! That's where we land in trouble. Come, sit...&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 1: RAG FUNDAMENTALS - The Core Concept
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem We're Actually Solving
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine you're hiring for a tech company. You receive 500 resumes for a Senior React Developer role. Now tell me - how would you actually process them?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I'd... probably make a spreadsheet? List all the candidates with key skills?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Right. But here's the catch - you can't read all 500 resumes deeply. So what do you really do?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Skim for keywords like "React", "JavaScript", "5 years"?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. You skim and hope you don't miss anyone good. Now, here's the problem: what if a candidate wrote "React.js" instead of "React"? Your eyes might still catch it. But a dumb computer doing exact string matching? It says "no match".&lt;/p&gt;

&lt;p&gt;What if someone wrote "Built real-time user interfaces with the React framework"? The candidate clearly knows React, but the word "React" appears nowhere in that sentence. The computer misses them.&lt;/p&gt;

&lt;p&gt;This is exactly what happens with AI. When you ask an AI a question, it tries to answer purely from what it memorized during training. And it makes mistakes - sometimes big ones.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So RAG stops these mistakes?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Precisely. RAG is simple: instead of asking the AI to guess, you hand it the document first. You say, "Here's the resume, here's the job description - now answer my question."&lt;/p&gt;

&lt;p&gt;The AI stops guessing. It reads the evidence. It answers correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is RAG Really?
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay, so RAG = give the AI documents, then ask questions?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes, but we need to be precise about &lt;em&gt;how&lt;/em&gt; we give it documents. RAG has three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; - Find the right documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augmentation&lt;/strong&gt; - Add those documents to the AI's prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt; - Let AI answer based on what it read&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. R-A-G.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But if it's that simple, why is everyone talking about it like it's complicated?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because "finding the right documents" is the hard part. You have 500 resumes. When someone asks "Does John have Docker experience?", you can't search all 500 linearly. That's slow.&lt;/p&gt;

&lt;p&gt;And you can't use simple keyword search either - what if John wrote "containerization" instead of "Docker"? What if he wrote "I work with Kubernetes" - which means he knows Docker too?&lt;/p&gt;

&lt;p&gt;So the real question becomes: How do you find the &lt;em&gt;right&lt;/em&gt; documents fast, and how do you understand when two different words mean the same thing?&lt;/p&gt;

&lt;p&gt;This is where everything else - embeddings, vector databases, chunking - comes in. They're all in service of solving that one problem.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay, I think I get the big picture. But uncle, why can't we just make the AI smarter?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Two reasons. First - you train an AI once. After that, it doesn't learn new information. If your company has 100 internal policies created last month, the AI knows nothing about them. It can't learn them instantly.&lt;/p&gt;

&lt;p&gt;Second - even if you could retrain it, hallucinations would still happen. AIs are pattern-matching machines. They're brilliant at patterns. But they sometimes see patterns that aren't there. RAG forces the AI to cite its sources, to point at evidence.&lt;/p&gt;

&lt;p&gt;It's the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I think John knows Docker" (guess)&lt;/li&gt;
&lt;li&gt;"John knows Docker because his resume clearly says 'Docker, Kubernetes, 4 years'" (evidence)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second one is what you want. That's RAG.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 2: EMBEDDINGS - Understanding Meaning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Fundamental Problem with Words
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I have a question. How does a computer know that "React" and "React.js" are the same thing?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's the right question. Let me explain with a story. &lt;/p&gt;

&lt;p&gt;I go to a fruit market. I see apples, mangoes, oranges. How do I know which is which? I look at them - color, shape, smell. My brain recognizes patterns and says "that's a mango".&lt;/p&gt;

&lt;p&gt;Now, how does a computer do that? It doesn't have eyes. All it has are numbers.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Numbers?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. Here's the trick: every word - "React", "React.js", "JavaScript", "Python" - is just a number to a computer. Or actually, a list of numbers. We call this list an "embedding".&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Like a list of... what kind of numbers?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine I'm describing a person to you. I might say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Height: 180 cm&lt;/li&gt;
&lt;li&gt;Age: 30&lt;/li&gt;
&lt;li&gt;Skin tone: medium&lt;/li&gt;
&lt;li&gt;Hair color: black&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 4 numbers describing 1 person. Now imagine I use 1536 numbers instead. Each number describes a different quality - not just physical things, but hidden things like "how technical is this word", "is this related to web development", "how often is this used", and so on.&lt;/p&gt;

&lt;p&gt;So "React" becomes a point in a 1536-dimensional space. "React.js" becomes another point in that same space. And because they mean the same thing, those two points are &lt;em&gt;very close to each other&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But "React" and "Python"? Those points are far apart.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay, so closeness = similarity?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. And here's the beautiful part: you don't have to manually create these embeddings. An AI model does it for you. You feed it the word "React" and it says, "This word should be represented as [0.12, -0.45, 0.78, ..., 1536 numbers]".&lt;/p&gt;

&lt;p&gt;Different models might represent it differently, but the same model always represents similar concepts closely. That's what matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Is Genius
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So embeddings let computers understand meaning?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; They let computers represent meaning &lt;em&gt;as a position in space&lt;/em&gt;. And when you represent things as positions, you can do math with them.&lt;/p&gt;

&lt;p&gt;For example, if I tell you: "Engineer" - "Java" + "React" = ?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's... weird math?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Right? But with embeddings, you can actually do this. And you know what the answer is?&lt;/p&gt;

&lt;p&gt;The vector closest to that result is usually "web developer" or something similar. The math captures meaning.&lt;/p&gt;

&lt;p&gt;Now here's why this matters for RAG:&lt;/p&gt;

&lt;p&gt;When someone asks "Does John have Docker experience?", we convert that question to an embedding - a point in 1536D space. Then we search for resume chunks that are close to that point. The chunks mentioning Docker - whether they say "Docker", "containerization", "container orchestration", or "I deploy with Docker" - are all close to the question's embedding.&lt;/p&gt;

&lt;p&gt;So we find them all. And the AI reads them. And the AI answers correctly.&lt;/p&gt;

&lt;p&gt;Without embeddings, simple keyword search would miss half of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 3: VECTOR DATABASES - Storing at Scale
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Storage Problem Nobody Talks About
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay, so we have embeddings. But uncle, if I have a company with 10,000 resumes, and each resume has 50 chunks, that's 500,000 embeddings. Each embedding is 1536 numbers. How do I store and search that?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where most people choose poorly. They think: "I'll use Pinecone! I'll use Milvus! I'll use Weaviate!"&lt;/p&gt;

&lt;p&gt;They add three systems. One for storage, one for search, one for cache. Now they have 3 things that could break. 3 things to debug. 3 bills to pay.&lt;/p&gt;

&lt;p&gt;There's a better way.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; PostgreSQL.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; The database? Just... PostgreSQL?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; With one addition: the pgvector extension. PostgreSQL is already reliable, it's already running your data, it's already scaling. We just add vector support.&lt;/p&gt;

&lt;p&gt;Think about it this way: 500,000 vectors stored in PostgreSQL as data. When you have a question, you convert it to an embedding, and you say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;question_embedding&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator means "find the 5 vectors closest to my question vector". PostgreSQL finds them in milliseconds. You're done.&lt;/p&gt;

&lt;p&gt;No new infrastructure. No vendor lock-in. Just one database doing one job well.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But doesn't the &amp;lt;=&amp;gt; operator need an index to be fast?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good catch. Yes. PostgreSQL's pgvector creates an IVFFLAT index by default. Think of it like this:&lt;/p&gt;

&lt;p&gt;When you have 500,000 resumes and need to find the 5 closest to a question, searching all 500,000 linearly is slow. The index divides the 1536D space into regions (like neighborhoods). When you search, it says "The question's embedding is closest to this neighborhood" and only searches that neighborhood. Much faster.&lt;/p&gt;

&lt;p&gt;A linear search of 500K vectors? 5 seconds. With the index? 50-100ms.&lt;/p&gt;

&lt;p&gt;That's the magic of the right index.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting It Up
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I actually create this?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Simple. PostgreSQL + pgvector. Here's what you do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create extension&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create table with vector column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create index for fast search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have a table. You insert chunks with their embeddings. You query with the &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator. Done.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's it. No API calls to external services. No paying per query. Just a single, reliable database.&lt;/p&gt;

&lt;p&gt;This is the beauty of choosing the right tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 4: BASIC RETRIEVAL - Finding What You Need
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How Retrieval Works
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, we have embeddings, we have a database. How does actual retrieval happen?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me walk you through a real scenario. A hiring manager asks: "Does Priya have blockchain experience?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Understand the question&lt;/strong&gt;&lt;br&gt;
We need to find chunks about blockchain skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Convert question to embedding&lt;/strong&gt;&lt;br&gt;
"blockchain experience" becomes a vector: [0.34, -0.12, ..., 1536 numbers]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Search the database&lt;/strong&gt;&lt;br&gt;
We query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'priya-uuid'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;question_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Get results&lt;/strong&gt; (might be):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Ethereum smart contracts development"&lt;/li&gt;
&lt;li&gt;"Bitcoin protocol understanding"&lt;/li&gt;
&lt;li&gt;"Web3 and DeFi platform expertise"&lt;/li&gt;
&lt;li&gt;"Solidity programming language"&lt;/li&gt;
&lt;li&gt;"Cryptocurrency platform design"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Send to AI&lt;/strong&gt;&lt;br&gt;
We put these 5 chunks in the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on the following resume excerpts:
[chunk 1]
[chunk 2]
[chunk 3]
[chunk 4]
[chunk 5]

Question: Does Priya have blockchain experience?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 6: AI answers&lt;/strong&gt;&lt;br&gt;
"Yes, Priya has strong blockchain experience. She has worked with Ethereum smart contracts, understands Bitcoin protocol, and has built on Web3/DeFi platforms using Solidity."&lt;/p&gt;

&lt;p&gt;With evidence.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Basic Retrieval Fails Sometimes
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; This seems perfect. What's the problem?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; The problem is subtle. Resume has this line:&lt;/p&gt;

&lt;p&gt;"I worked on distributed ledger technology and consensus protocols"&lt;/p&gt;

&lt;p&gt;This person knows blockchain. But the word "blockchain" is not there.&lt;/p&gt;

&lt;p&gt;You search for "blockchain". Embeddings are smart, but not magic. "Distributed ledger technology" and "blockchain" are close but not identical.&lt;/p&gt;

&lt;p&gt;Sometimes the vector search returns this chunk. Sometimes it doesn't.&lt;/p&gt;

&lt;p&gt;Basic retrieval works maybe 70% of the time. For production, you need 95%+.&lt;/p&gt;

&lt;p&gt;That's why we need advanced techniques.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So what makes it better?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Multiple techniques working together. But we'll get there. First, let's talk about preparation.&lt;/p&gt;


&lt;h2&gt;
  
  
  SECTION 5: CHUNKING STRATEGIES - Smart Breaking
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The Chunking Dilemma
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, let me ask something. Why do we break resumes into chunks at all? Why not just embed the whole resume?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good question. Two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt; - An embedding costs money (usually). A 50-page resume broken into chunks might cost $0.50 instead of $5 if you embed it whole.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt; - If you embed the entire resume as one chunk, and you search for "Docker experience", the system returns the whole 50-page resume. The AI has to find the Docker mention itself. But if you break it into chunks, the system returns only the chunks about Docker. The AI reads less noise.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But here's the problem: how small should chunks be?&lt;/p&gt;
&lt;h3&gt;
  
  
  Size Matters More Than You Think
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Can't I just pick a number? Like 500 tokens per chunk?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let's see what happens with different sizes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: Too small (100 tokens)&lt;/strong&gt;&lt;br&gt;
Chunk 1: "John has"&lt;br&gt;
Chunk 2: "5 years of"&lt;br&gt;
Chunk 3: "React experience"&lt;/p&gt;

&lt;p&gt;Problem: When we read Chunk 3, we've lost context. 5 years of &lt;em&gt;what&lt;/em&gt;? The AI is confused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: Right size (1000 tokens)&lt;/strong&gt;&lt;br&gt;
"John has 5 years of React experience, built e-commerce platforms, Redux for state management, and mentored junior developers."&lt;/p&gt;

&lt;p&gt;Perfect. Context is clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: Too large (5000 tokens)&lt;/strong&gt;&lt;br&gt;
Entire work history section, with all jobs, all skills, everything mixed together.&lt;/p&gt;

&lt;p&gt;Problem: Too much noise. When the AI searches for "React", it gets 5000 tokens to read, but only 2 sentences mention React. It's slow and confusing.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; The sweet spot is usually 1000-1500 tokens per chunk. Not less, not more.&lt;/p&gt;
&lt;h3&gt;
  
  
  Overlap is Important
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's overlap? Overlapping chunks?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. Imagine this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without overlap:&lt;/strong&gt;&lt;br&gt;
Chunk 1 ends: "...John used React..."&lt;br&gt;
Chunk 2 starts: "...for building e-commerce..."&lt;/p&gt;

&lt;p&gt;When you read Chunk 2 alone, you don't know John used React for this. Context is broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With 200-token overlap:&lt;/strong&gt;&lt;br&gt;
Chunk 1: "...John has React experience. He built e-commerce..."&lt;br&gt;
Chunk 2: "...He built e-commerce with Redux and WebSockets..."&lt;/p&gt;

&lt;p&gt;Now each chunk contains enough context. You can read them individually and still understand.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So overlap = context continuity?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. It's like reading a book. Each page overlaps the previous one slightly. You always have context.&lt;/p&gt;
&lt;h3&gt;
  
  
  Chunking Strategies
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Are there different ways to chunk?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. Here are the main ones:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Chunk Size&lt;/th&gt;
&lt;th&gt;Overlap&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive/Fixed&lt;/td&gt;
&lt;td&gt;512 tokens&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Simple code&lt;/td&gt;
&lt;td&gt;Loses context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sliding Window&lt;/td&gt;
&lt;td&gt;1000-1500&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;Resumes, documents&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Technical docs&lt;/td&gt;
&lt;td&gt;Complex to implement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recursive&lt;/td&gt;
&lt;td&gt;1000 first, then smaller&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;Large documents&lt;/td&gt;
&lt;td&gt;More processing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For resumes? Sliding window, 1000 tokens, 200-token overlap. It's what works best.&lt;/p&gt;

&lt;p&gt;Here's example code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sliding_window_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Break text into chunks with overlap.
    window_size = tokens per chunk
    overlap = tokens of overlap between chunks
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Simplified tokenization
&lt;/span&gt;    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John has 5 years React... [long text]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sliding_window_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# chunks[0] = "John has 5 years React... [1000 tokens]"
# chunks[1] = "[last 200 tokens of chunk 0] ... [next 800 tokens]"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple. Effective. This is what production systems use.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 6: HYBRID SEARCH - Vector + Keywords
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Limitation of Pure Vector Search
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, embeddings are smart. But are they perfect?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; No. Here's a real problem:&lt;/p&gt;

&lt;p&gt;Resume has: "Kubernetes 1.26"&lt;br&gt;
You search for: "Kubernetes 1.26"&lt;/p&gt;

&lt;p&gt;Vector search thinks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Kubernetes 1.26" is a version&lt;/li&gt;
&lt;li&gt;"Kubernetes 1.25" is also a version&lt;/li&gt;
&lt;li&gt;They're similar technologies&lt;/li&gt;
&lt;li&gt;Distance: 0.85 (similar!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FALSE MATCH! The versions are different, but vectors say they're similar.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So vectors miss exact matches?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Not miss - they're fuzzy. They blur differences. Sometimes that's good (finding "React" when you search "React.js"). Sometimes it's bad (confusing version numbers).&lt;/p&gt;

&lt;p&gt;This is why you need both: vectors AND keywords.&lt;/p&gt;
&lt;h3&gt;
  
  
  Keyword Search: The Simple Part
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How does keyword search work?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Simple. Exact string matching.&lt;/p&gt;

&lt;p&gt;Resume: "Kubernetes 1.26"&lt;br&gt;
Search: "1.26"&lt;br&gt;
Result: EXACT MATCH. Yes.&lt;/p&gt;

&lt;p&gt;Resume: "Kubernetes 1.25"&lt;br&gt;
Search: "1.26"&lt;br&gt;
Result: NO MATCH. No.&lt;/p&gt;

&lt;p&gt;It's binary. No fuzzy. Just right or wrong.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hybrid = Best of Both
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So we use both?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. Here's how:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search "Kubernetes 1.26"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: Vector search on all chunks&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returns top 20 chunks that mention versions/container orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 2: Keyword filter&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On those 20 chunks, filter by exact keyword match: "1.26"&lt;/li&gt;
&lt;li&gt;Returns maybe 3 chunks with exact version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 3: Combine and rerank&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Return the 3 high-precision results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: Perfect accuracy. Both semantic understanding AND exact precision.&lt;/p&gt;

&lt;p&gt;PostgreSQL can do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john-uuid'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Kubernetes 1.26'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;question_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;question_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find chunks where text contains "Kubernetes" AND "1.26" (keyword search)&lt;/li&gt;
&lt;li&gt;AND the embedding is close to the question (vector search)&lt;/li&gt;
&lt;li&gt;Order by vector similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both working together.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Each Excels
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; When do I use pure vector?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; When you're searching for concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Tell me about this candidate's experience with modern development"&lt;/li&gt;
&lt;li&gt;"What's their background in cloud systems?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vectors are great here. They find related concepts even with different words.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; And pure keyword?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; For exact facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What certifications does John have?"&lt;/li&gt;
&lt;li&gt;"What's their GPA?"&lt;/li&gt;
&lt;li&gt;"When did they graduate?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keywords are perfect. You don't need fuzzy matching.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; And hybrid?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; For most real scenarios. Maximum accuracy. That's what production uses.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 7: RERANKING - Ranking Results Better
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Quality Problem
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, we retrieve 5 chunks. But are they the &lt;em&gt;best&lt;/em&gt; 5?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's a great question. Let me show you a problem:&lt;/p&gt;

&lt;p&gt;Search: "Docker experience"&lt;/p&gt;

&lt;p&gt;Vector search returns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"I have 5 years Docker production experience" (similarity: 0.92)&lt;/li&gt;
&lt;li&gt;"Container orchestration with Docker expertise" (similarity: 0.91)&lt;/li&gt;
&lt;li&gt;"My Docker projects all failed miserably" (similarity: 0.90) ← PROBLEM!&lt;/li&gt;
&lt;li&gt;"Docker mentioned in course material" (similarity: 0.85)&lt;/li&gt;
&lt;li&gt;"Web dev includes Docker basics" (similarity: 0.82)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Items 3-5 shouldn't be in the top 5! Item 3 is negative, items 4-5 are weak.&lt;/p&gt;

&lt;p&gt;Basic vector search ranks by similarity score. But similarity doesn't capture quality.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So we need a better ranker?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. And here's the trick: we use two models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two-Stage Retrieval
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Two models? That sounds expensive.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. So we use them smartly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Fast retrieval (milliseconds)&lt;/strong&gt;&lt;br&gt;
Vector search returns 20 candidates. Speed matters here, so we use a fast model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time: 100ms&lt;/li&gt;
&lt;li&gt;Quality: 70%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Accurate ranking (milliseconds)&lt;/strong&gt;&lt;br&gt;
We rerank those 20 with a better, slower model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time: 300-500ms&lt;/li&gt;
&lt;li&gt;Quality: 95%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total time: 500ms. Still fast. But now the top 5 are &lt;em&gt;good&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The key insight: We don't apply the accurate model to all 500,000 chunks. That's too slow. We apply it only to 20. That's feasible.&lt;/p&gt;
&lt;h3&gt;
  
  
  Available Rerankers
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What models should I use?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; A few options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reranker&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BGE-Reranker-Base&lt;/td&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Very Good&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cohere Rerank&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;$$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude/GPT-4&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Best&lt;/td&gt;
&lt;td&gt;$$$&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For production? Use BGE. It's free, open-source, and performs almost as well as expensive options.&lt;/p&gt;

&lt;p&gt;Here's how:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrossEncoder&lt;/span&gt;

&lt;span class="c1"&gt;# Load open-source reranker
&lt;/span&gt;&lt;span class="n"&gt;reranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-reranker-base&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# We have top 20 chunks from vector search
&lt;/span&gt;&lt;span class="n"&gt;top_20_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;

&lt;span class="c1"&gt;# Rerank them
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_20_chunks&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Sort by score
&lt;/span&gt;&lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_20_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Return top 5
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Takes 300ms. Makes huge quality difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 8: QUERY PROCESSING - Smart Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Queries Are Ambiguous
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, when someone asks "Does John have frontend skills?", does the AI understand that means React, Vue, JavaScript, HTML?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Not automatically. Here's the problem:&lt;/p&gt;

&lt;p&gt;Resume has: "React, JavaScript, HTML, CSS expertise"&lt;br&gt;
Search for: "frontend skills"&lt;/p&gt;

&lt;p&gt;Vector search looks for the exact embedding of "frontend skills". But the resume doesn't have those words. It has the components.&lt;/p&gt;

&lt;p&gt;A human reads "React, JavaScript, HTML, CSS" and immediately thinks "frontend". A computer doesn't make that leap.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So how do we fix it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Query rewriting. You expand the query.&lt;/p&gt;
&lt;h3&gt;
  
  
  Query Expansion
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Expand it how?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Instead of searching for "frontend skills", you search for:&lt;/p&gt;

&lt;p&gt;(frontend OR React OR Vue OR Angular OR JavaScript OR HTML OR CSS)&lt;/p&gt;

&lt;p&gt;Now you find any mention of these. Much better.&lt;/p&gt;

&lt;p&gt;Here's code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;expand_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Expand a query into related terms.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simple approach using a dict
&lt;/span&gt;    &lt;span class="n"&gt;expansions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;frontend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Vue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Angular&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;JavaScript&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HTML&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CSS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Java&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;databases&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;APIs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;devops&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Docker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Kubernetes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CI/CD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deployment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synonyms&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expansions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; OR &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; OR &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;synonyms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frontend skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;expand_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Result: "frontend skills OR React OR Vue OR Angular OR JavaScript OR HTML OR CSS"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Query Retrieval
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Is there a smarter way?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! Instead of manually expanding, let an LLM generate variations.&lt;/p&gt;

&lt;p&gt;User asks: "Real-time system experience?"&lt;/p&gt;

&lt;p&gt;LLM thinks: "That means WebSockets, Socket.IO, real-time updates, pub/sub, message queues"&lt;/p&gt;

&lt;p&gt;LLM generates multiple queries:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"WebSockets and real-time communication"&lt;/li&gt;
&lt;li&gt;"Socket.IO implementation"&lt;/li&gt;
&lt;li&gt;"Real-time data streaming"&lt;/li&gt;
&lt;li&gt;"Redis pub/sub usage"&lt;/li&gt;
&lt;li&gt;"Message queue systems"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You search all 5 queries. You get chunks from all angles. You combine results.&lt;/p&gt;

&lt;p&gt;Coverage: 10x better.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generate multiple search queries from one.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Given this question, generate 3-5 related search queries 
that would find relevant information. Return only the queries, 
one per line.

Original question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;original_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Queries:
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Real-time system experience?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Returns:
# ["WebSockets real-time", "Socket.IO", "Redis pub/sub", "Message queues", "Real-time streaming"]
&lt;/span&gt;
&lt;span class="c1"&gt;# Now search each
&lt;/span&gt;&lt;span class="n"&gt;all_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vector_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;all_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Deduplicate and return top 5
&lt;/span&gt;&lt;span class="n"&gt;unique_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_results&lt;/span&gt;&lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is elegant. The LLM understands what "real-time" means and generates good search queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 9: EVALUATION &amp;amp; METRICS - Measuring Success
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Measurement Problem
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I built a RAG system. It works. But how do I know if it's &lt;em&gt;good&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This is where most teams fail. They build, ship, and hope. No metrics.&lt;/p&gt;

&lt;p&gt;But you can measure. Here are the important metrics:&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Metrics
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What should I measure?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Four main things:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Measures&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Of all relevant docs, how many found?&lt;/td&gt;
&lt;td&gt;&amp;gt;90%&lt;/td&gt;
&lt;td&gt;Missing information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Precision&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Of what found, how relevant?&lt;/td&gt;
&lt;td&gt;&amp;gt;85%&lt;/td&gt;
&lt;td&gt;Wrong context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Response time&lt;/td&gt;
&lt;td&gt;&amp;lt;2 seconds&lt;/td&gt;
&lt;td&gt;User experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Faithfulness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI stays factual&lt;/td&gt;
&lt;td&gt;&amp;gt;95%&lt;/td&gt;
&lt;td&gt;Hallucinations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let me explain each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recall vs Precision
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's the difference?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me give you an example. Say there are 10 Docker-related chunks in John's resume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt;: Of those 10, how many did you find?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You find 9? Recall = 9/10 = 90%&lt;/li&gt;
&lt;li&gt;You find 6? Recall = 6/10 = 60%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;High recall means: "I found most of the relevant information"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt;: Of what you returned, how many were relevant?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You return 15 chunks. 10 are about Docker. Precision = 10/15 = 67%&lt;/li&gt;
&lt;li&gt;You return 9 chunks. 8 are about Docker. Precision = 8/9 = 89%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;High precision means: "Most of what I returned was useful"&lt;/p&gt;

&lt;p&gt;Usually there's a tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Return everything (high recall, low precision)&lt;/li&gt;
&lt;li&gt;Return only the most confident (low recall, high precision)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You want both high.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measuring
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I actually measure this?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; You need labeled data. A test set.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take 10 real questions&lt;/li&gt;
&lt;li&gt;For each, have a human list all relevant chunks&lt;/li&gt;
&lt;li&gt;Run your system&lt;/li&gt;
&lt;li&gt;Compare: did it find all relevant chunks? (Recall)&lt;/li&gt;
&lt;li&gt;Were all returned chunks relevant? (Precision)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_recall_precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    retrieved: chunks your system found
    relevant: chunks a human labeled as relevant
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relevant_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Recall: of all relevant, how many found?
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;recall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_ids&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;relevant_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Precision: of what found, how many relevant?
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_ids&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;relevant_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Docker experience&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Kubernetes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Machine learning&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;relevant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Docker experience&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Kubernetes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Container orchestration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_recall_precision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Recall = 2/3 = 67% (found 2 of 3 relevant)
# Precision = 2/3 = 67% (2 of 3 returned were relevant)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Latency
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I measure speed?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Does John have Docker experience?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to ms
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Latency: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Target: &amp;lt;2000ms (2 seconds). Anything faster is bonus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What about cost?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Important for production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cost per query
# Embedding: $0.00001 per 1K tokens
# Reranking: $0.0001 per query (if using paid service)
# LLM answer: $0.001 per query (if using Claude)
# Total: ~$0.0011 per query
&lt;/span&gt;
&lt;span class="c1"&gt;# If you do 1M queries/month:
&lt;/span&gt;&lt;span class="n"&gt;monthly_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.0011&lt;/span&gt;  &lt;span class="c1"&gt;# $1100/month
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what you need to know for business decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 10: HALLUCINATION PREVENTION - Keeping AI Honest
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hallucinations: The Biggest Problem
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, what's a hallucination?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Here's a real example:&lt;/p&gt;

&lt;p&gt;Resume says: "Worked with React for 2 years"&lt;br&gt;
Hiring manager asks: "Experience with Kubernetes?"&lt;br&gt;
AI responds: "Yes, John has Kubernetes with orchestration."&lt;/p&gt;

&lt;p&gt;WRONG! The resume never mentions Kubernetes.&lt;/p&gt;

&lt;p&gt;The AI invented an answer. It saw "Docker" or "containers" (maybe), thought "Kubernetes is related to containers", and hallucinated.&lt;/p&gt;

&lt;p&gt;This is a hallucination. The AI is lying, but confidently.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Hallucinations Happen
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But why does this happen?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; AIs are pattern-matching machines. They learned during training: "Docker" often appears near "Kubernetes". So the AI's brain has connected them.&lt;/p&gt;

&lt;p&gt;Even if you show the AI only a resume mentioning Docker, its training whispers: "And Kubernetes is usually near Docker!"&lt;/p&gt;

&lt;p&gt;So the AI thinks: "Probably Kubernetes too."&lt;/p&gt;

&lt;p&gt;It's an artifact of how AIs learn. Not a bug. A feature used wrongly.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution: Force the AI to Use Only Given Information
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do you stop it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Five layers of protection:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Retrieval Boundaries&lt;/strong&gt;&lt;br&gt;
Show the AI ONLY the retrieved chunks. Nothing from training.&lt;/p&gt;

&lt;p&gt;In your prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based ONLY on the following information:

[Retrieved chunks]

Answer: Does John have Kubernetes experience?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The word "ONLY" is critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Structured Output&lt;/strong&gt;&lt;br&gt;
Force JSON format. Constrain the answer shape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You MUST respond in JSON format:
{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 0.0 to 1.0,
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]
}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces the AI to cite evidence or it can't answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Validation&lt;/strong&gt;&lt;br&gt;
Check the output against the original chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check: does evidence exist in chunks?
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;evidence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;evidence&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Hallucination detected!
&lt;/span&gt;        &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Reject
&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 4: Confidence Gating&lt;/strong&gt;&lt;br&gt;
Low confidence? Escalate to human:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Hand off to human
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Unknown - escalated to human&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Low confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 5: Required Citations&lt;/strong&gt;&lt;br&gt;
Every claim must point to a chunk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: Does John know Kubernetes?
Retrieved chunks:
[1] "Docker and container experience"
[2] "Kubernetes cluster management"

Answer: Yes, John knows Kubernetes (from chunk 2).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI must cite. No citation? No answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Together
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Do you use all five?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; In production? All five. At different stages.&lt;/p&gt;

&lt;p&gt;Retrieval boundaries + structured output stop most hallucinations.&lt;/p&gt;

&lt;p&gt;Validation + confidence gating catch the rest.&lt;/p&gt;

&lt;p&gt;Citations let users verify.&lt;/p&gt;

&lt;p&gt;Together? Hallucinations become nearly impossible.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 11: PRODUCTION ARCHITECTURE - Building for Real Users
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prototype vs Production
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, my RAG system works on my laptop. But production is different?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Completely different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prototype version:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works for me&lt;/li&gt;
&lt;li&gt;Might crash&lt;/li&gt;
&lt;li&gt;Takes 5 seconds to respond&lt;/li&gt;
&lt;li&gt;Processes 1 user at a time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Production version:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works for everyone&lt;/li&gt;
&lt;li&gt;99.9% uptime&lt;/li&gt;
&lt;li&gt;Responds in &amp;lt;2 seconds&lt;/li&gt;
&lt;li&gt;Handles 1000 concurrent users&lt;/li&gt;
&lt;li&gt;Secure (no data leaks)&lt;/li&gt;
&lt;li&gt;Monitored (alerts when something breaks)&lt;/li&gt;
&lt;li&gt;Compliant (legal requirements)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not the same system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Architecture for Production
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I design it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Here's a proven architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Users
                      ↓
              Load Balancer (Nginx)
             /          |          \
         API-1      API-2      API-3  (scale horizontally)
             \          |          /
                      ↓
                 Cache Layer (Redis)
                      ↓
           PostgreSQL + pgvector + pgvector Index
                      ↓
              External APIs (Claude, etc.)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Load Balancer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributes user traffic&lt;/li&gt;
&lt;li&gt;No single API gets overwhelmed&lt;/li&gt;
&lt;li&gt;One dies? Traffic routes to others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multiple APIs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handle load together&lt;/li&gt;
&lt;li&gt;Fault tolerant&lt;/li&gt;
&lt;li&gt;Can update one without stopping others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cache&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis stores recent results&lt;/li&gt;
&lt;li&gt;Same question asked twice? Answer from cache&lt;/li&gt;
&lt;li&gt;90% faster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single source of truth&lt;/li&gt;
&lt;li&gt;Persistent&lt;/li&gt;
&lt;li&gt;Indexed for speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;External APIs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude for answers&lt;/li&gt;
&lt;li&gt;Anthropic handles scaling there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer has a job. Each layer is replaceable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tenant Isolation - Critical
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Wait, what if I'm hosting this for multiple companies?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Then you have a SERIOUS security issue if you don't handle isolation.&lt;/p&gt;

&lt;p&gt;Company A's data must NEVER be visible to Company B.&lt;/p&gt;

&lt;p&gt;Every single query must include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john-123'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'company-a'&lt;/span&gt;  &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="n"&gt;THIS&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="n"&gt;CRITICAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the &lt;code&gt;tenant_id&lt;/code&gt; check, a hacker or bug could leak data.&lt;/p&gt;

&lt;p&gt;This is not optional. This is "you'll be sued" level of important.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 12: ATS DEEP DIVE - Resume Scoring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is ATS (Applicant Tracking System)?
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I've heard of ATS. What is it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Old ATS systems just searched for keywords. "Does the resume have the word React?" Yes/No.&lt;/p&gt;

&lt;p&gt;Modern ATS - the kind you're building with RAG - is intelligent. It understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does the candidate have the skill?&lt;/li&gt;
&lt;li&gt;How many years?&lt;/li&gt;
&lt;li&gt;At what level?&lt;/li&gt;
&lt;li&gt;Do they have relevant projects?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resume gets a score. Top scorers get interviewed. Bad scorers get rejected.&lt;/p&gt;

&lt;p&gt;The difference between old and new? Old ATS rejects good candidates because they phrased things differently. New ATS finds them anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Scoring Algorithm
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do you score a resume?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; With components. Job requirements × candidate abilities.&lt;/p&gt;

&lt;p&gt;Job description requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React (5+ years)&lt;/li&gt;
&lt;li&gt;JavaScript (required)&lt;/li&gt;
&lt;li&gt;Node.js (preferred)&lt;/li&gt;
&lt;li&gt;AWS (preferred)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's say this resume has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React (6 years) ← exceeds requirement&lt;/li&gt;
&lt;li&gt;JavaScript (7 years) ← exceeds requirement&lt;/li&gt;
&lt;li&gt;Node.js (4 years) ← has it&lt;/li&gt;
&lt;li&gt;AWS (not mentioned)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the scoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Score a resume against job requirements.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# Required skills: 60% weight
&lt;/span&gt;    &lt;span class="n"&gt;required_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;required_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;years_required&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;required_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;candidate_years&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;candidate_years&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;years_required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;required_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;  &lt;span class="c1"&gt;# Full points
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;candidate_years&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Partial credit
&lt;/span&gt;            &lt;span class="n"&gt;required_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_years&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;years_required&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
        &lt;span class="c1"&gt;# else: 0 points
&lt;/span&gt;
    &lt;span class="n"&gt;required_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;required_score&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;required_count&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;required_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="c1"&gt;# Preferred skills: 25% weight
&lt;/span&gt;    &lt;span class="n"&gt;preferred_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;preferred_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;preferred&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;preferred_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;preferred_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Half credit for preferred
&lt;/span&gt;
    &lt;span class="n"&gt;preferred_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;preferred_score&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;preferred_count&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;preferred_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="c1"&gt;# Projects: 15% weight
&lt;/span&gt;    &lt;span class="n"&gt;project_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;projects&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
    &lt;span class="n"&gt;project_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Cap at 100
&lt;/span&gt;
    &lt;span class="c1"&gt;# Final score
&lt;/span&gt;    &lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;required_avg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.60&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;preferred_avg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="n"&gt;project_avg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;JavaScript&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;projects&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;E-commerce app&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Real-time chat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;requirements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;JavaScript&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;preferred&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score_resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# React: (6 &amp;gt;= 5) = 100 pts
# JavaScript: (7 &amp;gt;= 3) = 100 pts
# Required avg: 100
# Node.js: has it = 50 pts
# Preferred avg: 25
# Projects: 2 × 25 = 50 pts
# Final: (100 × 0.60) + (25 × 0.25) + (50 × 0.15) = 60 + 6.25 + 7.5 = 73.75/100
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's scoring. But wait - how do you extract the data?&lt;/p&gt;

&lt;h3&gt;
  
  
  Skill Normalization
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Resume might say "React" but another might say "React.js". How do you normalize?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Build a dictionary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skill_aliases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;React.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ReactJS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;react&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nodejs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Docker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Docker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;docker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;containers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mentioned_skill&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find the canonical skill name.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;mentioned_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mentioned_skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aliases&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;skill_aliases&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mentioned_lower&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;aliases&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;canonical&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mentioned_skill&lt;/span&gt;  &lt;span class="c1"&gt;# Unknown skill
&lt;/span&gt;
&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;normalize_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;React.js&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Output: React
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;normalize_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodejs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;    &lt;span class="c1"&gt;# Output: Node.js
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now "React" and "React.js" are treated as the same skill.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Unknown Skills: Use Embeddings
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; But what if the resume mentions a skill not in your dictionary?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Use embeddings to find similar skills.&lt;/p&gt;

&lt;p&gt;Resume mentions: "Built trading platform with real-time updates"&lt;br&gt;
Job requires: "WebSockets experience"&lt;/p&gt;

&lt;p&gt;These aren't in your dictionary. But you can check similarity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_skill_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mentioned_skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_skill&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Use embeddings to find matches for unknown skills.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Get embeddings
&lt;/span&gt;    &lt;span class="n"&gt;mentioned_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mentioned_skill&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;required_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;required_skill&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate distance
&lt;/span&gt;    &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mentioned_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# If similar enough, count as a match
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Threshold
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;find_skill_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;real-time data updates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WebSockets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Candidate probably has WebSockets experience
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is powerful. It catches variations and similar technologies.&lt;/p&gt;




&lt;h2&gt;
  
  
  SECTION 13: ADVANCED PATTERNS - Agents &amp;amp; Future
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Happens After Basic RAG
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, we have RAG working. What comes next?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Basic RAG answers questions. Advanced RAG (agents) &lt;em&gt;decides&lt;/em&gt; what to do.&lt;/p&gt;

&lt;p&gt;Difference:&lt;/p&gt;

&lt;p&gt;Basic RAG: "Does John know React?" → Find resume → Answer&lt;/p&gt;

&lt;p&gt;Agent RAG: "Is John qualified for Senior Developer role?" → &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find resume&lt;/li&gt;
&lt;li&gt;Find job description
&lt;/li&gt;
&lt;li&gt;Compare skills&lt;/li&gt;
&lt;li&gt;Check experience level&lt;/li&gt;
&lt;li&gt;Look for projects&lt;/li&gt;
&lt;li&gt;Verify certifications&lt;/li&gt;
&lt;li&gt;Reason through all evidence&lt;/li&gt;
&lt;li&gt;Answer with reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent is intelligent about what to search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agentic RAG with Claude
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I build an agent?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Using Claude's tool use (function calling):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search candidate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s resume.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Your RAG search logic
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_job_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search job description.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;job_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_candidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Use Claude as an agent to analyze candidate fit.
    Claude decides what to search, what to compare.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_resume&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search a candidate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s resume for information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resume_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resume_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search job requirements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze if candidate &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resume_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is qualified for job &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

Use the available tools to:
1. Find required skills in the job description
2. Check if the candidate has those skills
3. Compare years of experience
4. Look for relevant projects
5. Make a recommendation

Provide detailed reasoning.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="c1"&gt;# Agent loop
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if Claude wants to use tools
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Process tool calls
&lt;/span&gt;            &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_resume&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resume_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                        &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# search_job
&lt;/span&gt;                        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_job_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                        &lt;span class="p"&gt;)&lt;/span&gt;

                    &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;

            &lt;span class="c1"&gt;# Add Claude's response and tool results to messages
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Claude is done - extract final answer
&lt;/span&gt;            &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;

&lt;span class="c1"&gt;# Use it
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_candidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;john-resume&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;senior-dev-job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is elegant. Claude is the brain. It decides what to search, in what order, and how to reason about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future of RAG
&lt;/h3&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's next? After agents?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; A few directions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Adaptive RAG&lt;/strong&gt;&lt;br&gt;
System learns what works. If a question fails, it tries different search strategies automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Multi-modal RAG&lt;/strong&gt;&lt;br&gt;
Right now we handle text. Soon: images, tables, videos. A resume with a photo of their projects. A video walkthrough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Real-time RAG&lt;/strong&gt;&lt;br&gt;
System continuously updates knowledge. Candidate updates LinkedIn → RAG system knows instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Collaborative RAG&lt;/strong&gt;&lt;br&gt;
Multiple agents reasoning together. One searches, one evaluates, one questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Explainable RAG&lt;/strong&gt;&lt;br&gt;
System shows its work. "I rejected John because he doesn't have 5 years React, only 3 years, and React 5+ was required."&lt;/p&gt;

&lt;p&gt;These are coming. RAG is only beginning.&lt;/p&gt;


&lt;h2&gt;
  
  
  CONCLUSION: You're a RAG Architect Now
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me summarize what you've learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RAG Fundamentals&lt;/strong&gt; - Fetch documents, then answer. Don't guess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings&lt;/strong&gt; - Represent meaning as positions in space.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Databases&lt;/strong&gt; - Store embeddings fast. PostgreSQL + pgvector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; - Find relevant documents using vector search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking&lt;/strong&gt; - Break documents smartly (1000-1500 tokens, 200 overlap).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Search&lt;/strong&gt; - Combine vectors and keywords for accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking&lt;/strong&gt; - Use two models: fast retrieval, accurate ranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Processing&lt;/strong&gt; - Expand queries to find more relevant docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; - Measure recall, precision, latency, faithfulness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination Prevention&lt;/strong&gt; - Five layers: boundaries, structure, validation, confidence, citations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt; - Load balancing, caching, security, monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ATS Scoring&lt;/strong&gt; - Intelligent resume evaluation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; - AI decides what to search and how.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's a lot, uncle. But I think I understand. RAG isn't magic - it's just engineering.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. RAG is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval (fast lookup)&lt;/li&gt;
&lt;li&gt;+ Augmentation (add context)&lt;/li&gt;
&lt;li&gt;+ Generation (let AI answer)&lt;/li&gt;
&lt;li&gt;= Reliable AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every layer serves one purpose: make AI useful for real people.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; And production?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember: Reliable for 10 users &amp;gt;&amp;gt; Perfect for nobody.&lt;/p&gt;

&lt;p&gt;Start simple. Build on PostgreSQL. Measure everything. Iterate fast.&lt;/p&gt;

&lt;p&gt;You don't need fancy systems. You need good engineering.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, one last question.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Go ahead.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; How do I actually start? Like, code?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's next lesson, beta. First, you understand the system. Then you build it.&lt;/p&gt;

&lt;p&gt;But I'll give you one gift: a simple starter stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Next.js + React&lt;/span&gt;
&lt;span class="na"&gt;Backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Node.js + Express&lt;/span&gt;
&lt;span class="na"&gt;Database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQL + pgvector&lt;/span&gt;
&lt;span class="na"&gt;Embedding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Claude API (embeddings endpoint)&lt;/span&gt;
&lt;span class="na"&gt;LLM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Claude API (messages endpoint)&lt;/span&gt;
&lt;span class="na"&gt;Hosting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS Lightsail or Render&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple. Proven. Scales.&lt;/p&gt;

&lt;p&gt;Now go build something amazing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference: Key Code Snippets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Embedding and Storage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get embeddings from Claude.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an embedding service. Return ONLY JSON.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get embeddings for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PostgreSQL Hybrid Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
       &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
               &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'search_term'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;keyword_rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;resume_chunks&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;resume_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'candidate-uuid'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'search_term'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword_rank&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scoring Algorithm
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_candidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_skills&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_skills&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simple scoring: % of required skills matched.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidate_skills&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required_skills&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;required_skills&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hallucination Prevention
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer only if confident.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You MUST:
1. Answer ONLY using the provided chunks
2. Cite evidence
3. Return JSON with answer, confidence, evidence
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Chunks: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uncertain - escalated to human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;RAG isn't magic. It's engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear data structures&lt;/li&gt;
&lt;li&gt;Proven patterns&lt;/li&gt;
&lt;li&gt;Careful measurement&lt;/li&gt;
&lt;li&gt;Honest assessment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You've learned the foundations. Now use them.&lt;/p&gt;

&lt;p&gt;Build something reliable.&lt;br&gt;
Build something honest.&lt;br&gt;
Build something useful.&lt;/p&gt;

&lt;p&gt;You've got this.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Created for developers who want to understand how RAG actually works, not just use it as a black box.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remember: Less noise, more action.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Useful Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tools We Discussed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL with pgvector&lt;/strong&gt; - Local vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude API&lt;/strong&gt; - Embeddings and LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt; - Implementation examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; - Caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nginx&lt;/strong&gt; - Load balancing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Testing Checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Embedding quality (check cosine distances)&lt;/li&gt;
&lt;li&gt;[ ] Retrieval recall &amp;gt;90%&lt;/li&gt;
&lt;li&gt;[ ] Retrieval precision &amp;gt;85%&lt;/li&gt;
&lt;li&gt;[ ] Latency &amp;lt;2 seconds&lt;/li&gt;
&lt;li&gt;[ ] Hallucination rate &amp;lt;5%&lt;/li&gt;
&lt;li&gt;[ ] Cost tracking in place&lt;/li&gt;
&lt;li&gt;[ ] Monitoring alerts configured&lt;/li&gt;
&lt;li&gt;[ ] Data isolation verified (multi-tenant)&lt;/li&gt;
&lt;li&gt;[ ] Cache hit rate &amp;gt;50%&lt;/li&gt;
&lt;li&gt;[ ] User feedback collected&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Load balancer in front&lt;/li&gt;
&lt;li&gt;[ ] Multiple API instances&lt;/li&gt;
&lt;li&gt;[ ] Redis caching layer&lt;/li&gt;
&lt;li&gt;[ ] PostgreSQL with indexes&lt;/li&gt;
&lt;li&gt;[ ] Tenant isolation on every query&lt;/li&gt;
&lt;li&gt;[ ] Error logging&lt;/li&gt;
&lt;li&gt;[ ] Uptime monitoring&lt;/li&gt;
&lt;li&gt;[ ] Graceful degradation&lt;/li&gt;
&lt;li&gt;[ ] Security scanning&lt;/li&gt;
&lt;li&gt;[ ] Backup strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go build. Good luck. 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Event-Driven Architecture: Uncle Explains Like You're Five 👦👨‍🦳</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Sun, 14 Jun 2026 09:21:11 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/event-driven-architecture-uncle-explains-like-youre-five-4aaj</link>
      <guid>https://dev.to/surajrkhonde/event-driven-architecture-uncle-explains-like-youre-five-4aaj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A conversation between Uncle (a backend architect) and Nephew (a curious developer) about events, publishers, subscribers, Redis Pub/Sub, RabbitMQ, and Kafka&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Beginning: What is an Event?
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I see words like "Redis Pub/Sub", "RabbitMQ", "Kafka" everywhere in job descriptions. What do they all do? Are they the same thing?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; (smiles) No, they're different. But before I confuse you with names, let me ask you something. Have you ever watched the news on TV?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Yes uncle, every morning!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Perfect! So when the news channel says "Breaking News: India won the cricket match" - what happened?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Something important occurred... and they announced it!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! That "something important occurred" is called an &lt;strong&gt;Event&lt;/strong&gt;. In software, when something happens - like a user placing an order, a payment succeeding, or a file uploading - that's an event.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So event = something that happened?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! Think of it as news. When you place an order at Zomato, that's an event. When you get a payment notification from Google Pay, that's an event. When someone follows you on Instagram, that's an event.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay, I get it. But uncle, why is this "event" concept important? I can just write code directly, right?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Ah! That's where the real story begins...&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Spaghetti Code
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine you own a food delivery company. A customer places an order. Now, what all needs to happen?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Umm... save the order, send email confirmation, alert the restaurant?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good! Let me write the code without events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;placeOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;saveOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;sendEmailConfirmation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;sendSMSNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;notifyRestaurant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;updateAnalyticsDashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;addLoyaltyPoints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;updateInventory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, one day your boss says "Also send WhatsApp notification". What do you do?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Add another function call in the same code?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes, but now the function becomes bigger. Tomorrow boss says "Also notify admin", you add more. Your Order Service becomes this giant blob of code that does 10 different things. It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hard to maintain&lt;/strong&gt; - Change one thing, something else breaks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tightly coupled&lt;/strong&gt; - Every service knows about every other service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard to scale&lt;/strong&gt; - One service dying affects everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is called &lt;strong&gt;spaghetti code&lt;/strong&gt;. Everyone is tangled together.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That sounds awful! So what's the solution?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; What if the Order Service just announces "Order Created" and doesn't care who's listening?&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter: Publisher &amp;amp; Subscriber
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember the news channel? The channel broadcasts news. The viewers listen if they want. The channel doesn't know who's watching - could be 1 person or 1 million.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Yes, like YouTube!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Perfect! YouTube is the &lt;strong&gt;Publisher&lt;/strong&gt; - they create content. You are the &lt;strong&gt;Subscriber&lt;/strong&gt; - you listen. The video upload is the &lt;strong&gt;Event&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now in our food delivery system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Order Service&lt;/strong&gt; = Publisher (creates the event "Order Created")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email Service, SMS Service, Analytics&lt;/strong&gt; = Subscribers (listen to that event)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Order Service just announces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📢 "Order Created - Order #123"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And doesn't care who listens!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So if tomorrow boss wants WhatsApp notification, I just add another subscriber?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! Order Service doesn't change. You just create a WhatsApp Service that subscribes to "Order Created" events. Beautiful decoupling!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Amazing! But uncle, how do they communicate? There must be something in the middle passing messages around?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! That's where Redis Pub/Sub comes in...&lt;/p&gt;




&lt;h2&gt;
  
  
  Redis Pub/Sub: The Simple Messenger
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think of it like a WhatsApp group. You have a group called "Order Events". When someone posts a message, everyone in the group sees it.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Oh! So Order Service is like one person posting in the group, and others are reading?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! Here's how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Order Service (Publisher)
        |
        | Publishes to channel: "order-created"
        v
      Redis
        |
   -----+-----
   |    |    |
   v    v    v
Email  SMS  Analytics
Service Service Service
(Subscribers)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service subscribes to a "channel" - like joining a WhatsApp group.&lt;/p&gt;

&lt;p&gt;Order Service publishes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order-created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Email Service listens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order-created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So Redis is like the WhatsApp server?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! Redis is the messenger sitting in the middle, broadcasting messages.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; This seems perfect! Why would we need anything else?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Ah, great question! Here comes the problem...&lt;/p&gt;




&lt;h2&gt;
  
  
  The Crash: Redis Pub/Sub's Dirty Secret
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine it's 10:00 AM. Order Service publishes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Order Created - Order #123"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But at that exact moment, Email Service has crashed (network issue). Redis broadcasts the message. Email Service isn't listening.&lt;/p&gt;

&lt;p&gt;Message is gone. &lt;strong&gt;Forever&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Oh no! So the customer never gets the confirmation email?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! At 10:01 AM, Email Service comes back online. Can it ask Redis "Hey, what messages did I miss?"&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Can't it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; No! Redis says "Sorry, that message was 1 minute ago. I already sent it. I don't save messages."&lt;/p&gt;

&lt;p&gt;Think of it like FM Radio:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Radio broadcasts live music&lt;/li&gt;
&lt;li&gt;If you're listening: ✅ You hear it&lt;/li&gt;
&lt;li&gt;If you're not listening: ❌ You miss it&lt;/li&gt;
&lt;li&gt;Radio doesn't replay old songs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's a problem for important stuff like payment notifications!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! What if the event is "Payment Success"? We CANNOT lose that message. A customer should get confirmation even if they restart their phone.&lt;/p&gt;

&lt;p&gt;For such critical events, we need something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stores messages&lt;/strong&gt; (doesn't lose them)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries&lt;/strong&gt; (if service is down, deliver later)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirms&lt;/strong&gt; (acknowledges delivery)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis Pub/Sub can't do any of this. And this pain gave birth to the next hero...&lt;/p&gt;




&lt;h2&gt;
  
  
  RabbitMQ: The Reliable Messenger
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What can RabbitMQ do that Redis can't?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me draw you a scenario. Same 10:00 AM order.&lt;/p&gt;

&lt;p&gt;Order Service sends to RabbitMQ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Order #123 Created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RabbitMQ says "Got it! I'm storing this message in a Queue."&lt;/p&gt;

&lt;p&gt;Now, even if Email Service is down, the message &lt;strong&gt;stays in the queue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When Email Service comes back online at 10:01 AM, RabbitMQ says:&lt;/p&gt;

&lt;p&gt;"Hey! You have 1 message. Here it is."&lt;/p&gt;

&lt;p&gt;Email Service reads and processes it.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So RabbitMQ is like a post office?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Perfect analogy! The post office stores your letters even if the recipient isn't home. When they come back, they get the letter.&lt;/p&gt;

&lt;p&gt;But there's a twist...&lt;/p&gt;




&lt;h2&gt;
  
  
  Queues &amp;amp; Workers
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; In RabbitMQ, we don't have "subscribers listening". Instead, we have &lt;strong&gt;workers pulling messages&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Imagine a pizza shop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kitchen has a &lt;strong&gt;Queue&lt;/strong&gt; of pizza orders written on paper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker 1&lt;/strong&gt; pulls an order: "Margherita Pizza"&lt;/li&gt;
&lt;li&gt;Worker 1 makes it and says "ACK" (acknowledgement) - "I got it!"&lt;/li&gt;
&lt;li&gt;Order is removed from queue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker 2&lt;/strong&gt; pulls next order: "Pepperoni Pizza"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No order is lost&lt;/li&gt;
&lt;li&gt;No order is made twice&lt;/li&gt;
&lt;li&gt;If Worker 1 crashes mid-making, the order goes back to queue for another worker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What does ACK mean?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; ACK = "Acknowledgement". It means "I processed this message, you can delete it from queue now."&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Email Service pulling messages&lt;/span&gt;
&lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;order-created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orderId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Process&lt;/span&gt;
    &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ✅ Success, delete from queue&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ❌ Failed, put back in queue&lt;/span&gt;
    &lt;span class="c1"&gt;// Another worker will retry&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Oh! So if Email Service crashes before sending email, the message goes back to queue?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! Another Email Service instance picks it up. The message isn't lost.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So RabbitMQ is perfect now?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Not quite... It has its own limitations...&lt;/p&gt;




&lt;h2&gt;
  
  
  RabbitMQ's Limitation: Queue is Consumed
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Here's the thing. In RabbitMQ, when a message is consumed and ACK'd, it's &lt;strong&gt;deleted forever&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Imagine you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email Service&lt;/li&gt;
&lt;li&gt;SMS Service&lt;/li&gt;
&lt;li&gt;Analytics Service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All processing the same event "Order Created".&lt;/p&gt;

&lt;p&gt;Email Service: ✅ Processed, ACK'd&lt;br&gt;
SMS Service: ✅ Processed, ACK'd&lt;br&gt;
Analytics Service: Hmm, I crashed. I never got to process it!&lt;/p&gt;

&lt;p&gt;In RabbitMQ, the message is gone. Analytics loses the data.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So we can't have multiple independent consumers reading the same message?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Not really. Each message is meant for ONE consumer.&lt;/p&gt;

&lt;p&gt;RabbitMQ is good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tasks: "Send this email", "Generate this PDF", "Resize this image"&lt;/li&gt;
&lt;li&gt;Once done, forget about it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it's bad for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Events/Facts: "User Purchased", "Order Placed"&lt;/li&gt;
&lt;li&gt;Where history matters&lt;/li&gt;
&lt;li&gt;Where multiple services need to analyze&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So what do we do?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Here comes the king... &lt;strong&gt;Kafka&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Kafka: The Event Store
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine instead of a trash can (like RabbitMQ), events are stored in a &lt;strong&gt;notebook&lt;/strong&gt; that never gets thrown away.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic: orders (a notebook)

Order 101 ← Event 1 (stored forever)
Order 102 ← Event 2 (stored forever)
Order 103 ← Event 3 (stored forever)
Order 104 ← Event 4 (stored forever)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email Service reads page 1, 2, 3&lt;/li&gt;
&lt;li&gt;SMS Service reads page 1, 2, 3 (same pages!)&lt;/li&gt;
&lt;li&gt;Analytics reads page 1, 2, 3 (same pages!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The notebook never gets destroyed. Everyone can read whenever they want!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Can they reread the same page again?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! That's the magic of Kafka. Let me show you...&lt;/p&gt;




&lt;h2&gt;
  
  
  The Magic of Offsets &amp;amp; Replay
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Kafka numbers every event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Offset 0 → Order 101
Offset 1 → Order 102
Offset 2 → Order 103
Offset 3 → Order 104
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Analytics Service reads Offsets 0, 1, 2. Then it crashes.&lt;/p&gt;

&lt;p&gt;It remembers: "I've read up to Offset 2."&lt;/p&gt;

&lt;p&gt;When it comes back online, Kafka says: "Where did you leave off?"&lt;/p&gt;

&lt;p&gt;Analytics: "Offset 2"&lt;/p&gt;

&lt;p&gt;Kafka: "Here's Offset 3, 4, 5..." (only new ones)&lt;/p&gt;

&lt;p&gt;No duplicates! No missed messages!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; And the notebook never got destroyed?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Never! It stays. This enables something magical called &lt;strong&gt;Replay&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;New team arrives: "Recommendation Team". They need to analyze last 30 days of purchases.&lt;/p&gt;

&lt;p&gt;In RabbitMQ: "Sorry, we already processed and deleted those messages."&lt;/p&gt;

&lt;p&gt;In Kafka: "No problem! Start reading from Offset 0."&lt;/p&gt;

&lt;p&gt;The Recommendation Team reads all 30 days of historical data. Done!&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; That's amazing! So Kafka is always better than RabbitMQ?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; It's different, not better. RabbitMQ is simpler if you just need task queues. Kafka is complex but powerful if you need event history.&lt;/p&gt;

&lt;p&gt;But wait, there's more to Kafka...&lt;/p&gt;




&lt;h2&gt;
  
  
  The Scaling Problem: Partitions
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, what if Amazon generates 10 million events per second? Can one notebook handle it?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; No! That's why Kafka invented &lt;strong&gt;Partitions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 notebook (one notebook is slow)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It creates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 notebooks (10 notebooks in parallel)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic: orders

Partition 0         Partition 1         Partition 2
------          ------          ------
Order 101       Order 102       Order 103
Order 105       Order 106       Order 104
Order 108       Order 109       Order 107
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Events are distributed across partitions. Each partition can be on a different machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Partition 0 → Machine A (writing 3M events/sec)
Partition 1 → Machine B (writing 3M events/sec)
Partition 2 → Machine C (writing 3M events/sec)

Total: 9M events/sec (in parallel!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So how does Kafka decide which event goes to which partition?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; It uses a &lt;strong&gt;Key&lt;/strong&gt;. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;orders&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// All events for same user go to same partition&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All events for User 101 go to the same partition. This ensures ordering.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Ordering? Why does that matter?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Imagine User 101:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Logs in&lt;/li&gt;
&lt;li&gt;Purchases iPhone&lt;/li&gt;
&lt;li&gt;Logs out&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If these go to different partitions, they might be processed out of order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Logs out" (done first)&lt;/li&gt;
&lt;li&gt;"Logs in" (second)&lt;/li&gt;
&lt;li&gt;"Purchases" (last)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's wrong! The user hasn't even logged in yet when they purchase!&lt;/p&gt;

&lt;p&gt;By sending all User 101 events to the same partition, they stay ordered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Partition 2
  |
  v
Login (User 101)
Purchase (User 101)
Logout (User 101)

Always in order!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Brilliant! And what about multiple Email Services reading the same partition?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Ah! Here's where &lt;strong&gt;Consumer Groups&lt;/strong&gt; come in...&lt;/p&gt;




&lt;h2&gt;
  
  
  Consumer Groups: The Team
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Kafka says: "One partition can be read by only ONE consumer at a time, but multiple consumers can form a &lt;strong&gt;Consumer Group&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;Think team of workers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic: orders (3 partitions)

Consumer Group: Email Service
  - Worker A
  - Worker B
  - Worker C

Kafka assigns:
  - Worker A reads Partition 0
  - Worker B reads Partition 1
  - Worker C reads Partition 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No duplicate processing (each partition owned by one worker)&lt;/li&gt;
&lt;li&gt;Parallel processing (3 workers working simultaneously)&lt;/li&gt;
&lt;li&gt;Load balanced (if one worker dies, another takes over)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What if Worker B crashes?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Kafka detects the crash and &lt;strong&gt;rebalances&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before:
  - Worker A → Partition 0
  - Worker B → Partition 1 (💥 CRASHED)
  - Worker C → Partition 2

After:
  - Worker A → Partition 0, 1 (rebalanced)
  - Worker C → Partition 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Processing continues!&lt;/p&gt;




&lt;h2&gt;
  
  
  Redis vs RabbitMQ vs Kafka: The Comparison
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, now I'm confused. When should I use what?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Let me create a mental model for each:&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis Pub/Sub
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think &lt;strong&gt;FM Radio&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Radio Station broadcasts → Listeners hear (if tuned in)
                       → If not listening, message lost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live notifications ("User is typing...")&lt;/li&gt;
&lt;li&gt;Live updates ("Stock price changed")&lt;/li&gt;
&lt;li&gt;Realtime dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable message delivery&lt;/li&gt;
&lt;li&gt;Message history&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RabbitMQ
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think &lt;strong&gt;Delivery Service&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sender → Package → Delivery Boy → Receiver
         (stored temporarily in warehouse)

Once delivered and signed, package is gone.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task queues ("Send Email", "Generate PDF")&lt;/li&gt;
&lt;li&gt;Work distribution&lt;/li&gt;
&lt;li&gt;Simple message processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event history&lt;/li&gt;
&lt;li&gt;Multiple independent consumers&lt;/li&gt;
&lt;li&gt;Replay capability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kafka
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Think &lt;strong&gt;Library Book&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Author writes → Book stored → Many readers can read
              (stays on shelf forever)

Anyone can read anytime. History preserved.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event history ("User purchased", "Payment completed")&lt;/li&gt;
&lt;li&gt;Multiple independent services analyzing same events&lt;/li&gt;
&lt;li&gt;Replay and timeline analysis&lt;/li&gt;
&lt;li&gt;High-throughput event streaming&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple task queues (too complex)&lt;/li&gt;
&lt;li&gt;Fire-and-forget scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Perfect! So they're for different use cases?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! Choose based on your need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Real-time, lose-if-missed? → Redis
Reliable task queue? → RabbitMQ
Event history &amp;amp; multiple consumers? → Kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Real World Example: E-Commerce Order
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Can you give me a complete real-world example?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Sure! Amazon customer purchases iPhone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event:&lt;/strong&gt; "UserPurchased"&lt;/p&gt;

&lt;p&gt;Using Kafka, multiple services independently process this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Order&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;publishes:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Topic:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user-events&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Partition:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(because&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;events&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;go&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;here)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;userId:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;productId:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;iPhone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;amount:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

        &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Kafka&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stores&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;forever&lt;/span&gt;&lt;span class="w"&gt;

        &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Now&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;multiple&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;services&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read:&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Email&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Send&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;confirmation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;email&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Analytics&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Update&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;dashboard,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;revenue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;metrics&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Fraud&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Check&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;suspicious&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;purchase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;pattern&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Recommendation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Suggest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;related&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;products&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Inventory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Update&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;count&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Loyalty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Service:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Offset:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;reward&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;points&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All reading the &lt;strong&gt;same event&lt;/strong&gt; independently. All storing their own offsets.&lt;/p&gt;

&lt;p&gt;If Analytics crashes and comes back after 2 hours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It reads from its last offset&lt;/li&gt;
&lt;li&gt;Gets all events from the last 2 hours&lt;/li&gt;
&lt;li&gt;Processes them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The event still exists on Kafka's Topic, unchanged.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; This is like a perfect audit trail!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly! Netflix keeps Kafka events for 30 days. Any new team can analyze historical data. That's the power!&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways: What to Remember
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, so I should just remember...?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember these mental models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event&lt;/strong&gt; = Something happened (like news)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Publisher&lt;/strong&gt; = One who creates events (like news channel)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Subscriber&lt;/strong&gt; = One who listens to events (like viewer)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Pub/Sub&lt;/strong&gt; = FM Radio (instant, no storage)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RabbitMQ&lt;/strong&gt; = Delivery Service (reliable, consumed/deleted)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka&lt;/strong&gt; = Library Books (permanent, many can read)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partition&lt;/strong&gt; = Multiple books handling millions of events in parallel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consumer Group&lt;/strong&gt; = Team of workers reading partitions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Offset&lt;/strong&gt; = Page number, remember where you left off&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Replay&lt;/strong&gt; = Read from beginning again (only possible in Kafka)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; This makes so much sense now!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's the goal! Once you understand the "why" of events, the tools become simple. They all solve the same problem (decoupling services) with different tradeoffs.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use What: A Decision Tree
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Any quick tips for choosing?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Sure!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Do you need message history?
├─ YES → Use Kafka
├─ NO → Are multiple independent services reading the same message?
         ├─ YES → Use Kafka (trust me)
         ├─ NO → Do you need reliable delivery?
                  ├─ YES → Use RabbitMQ
                  ├─ NO → Use Redis Pub/Sub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Quick examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat app (new message notification) → Redis Pub/Sub&lt;/li&gt;
&lt;li&gt;Job queue (send 1000 emails) → RabbitMQ&lt;/li&gt;
&lt;li&gt;Order processing (email, analytics, fraud, inventory) → Kafka&lt;/li&gt;
&lt;li&gt;Live dashboard updates → Redis Pub/Sub&lt;/li&gt;
&lt;li&gt;Banking (payment settlement) → RabbitMQ or Kafka&lt;/li&gt;
&lt;li&gt;User analytics pipeline → Kafka&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Word
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Remember, it all starts with understanding the problem. None of these tools matter if you don't know:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What events matter to your business?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who needs to know about them?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Can we afford to lose a message?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do we need history?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How many events per second?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you answer these, the choice becomes obvious.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Thank you uncle! This was super clear!&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; You're welcome! Now go build something with events! Start simple with Redis Pub/Sub, then graduate to RabbitMQ, then maybe Kafka when you need it.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Will do, uncle! Less noise, more action! 😄&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; That's my boy! 👏&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Happy event-driven coding!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>eventdriven</category>
      <category>kafka</category>
      <category>backend</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Dawa Saathi: I finally finished the part of my medicine bot that actually mattered</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Fri, 05 Jun 2026 12:47:51 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/dawa-saathi-i-finally-finished-the-part-of-my-medicine-bot-that-actually-mattered-l99</link>
      <guid>https://dev.to/surajrkhonde/dawa-saathi-i-finally-finished-the-part-of-my-medicine-bot-that-actually-mattered-l99</guid>
      <description>&lt;p&gt;AI medicine-awareness bot worked — but only in English, the one language most of the people I built it for can't read. Here's the problem, the one-day sprint to fix it, and how GitHub Copilot got me through it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub Finish-Up-A-Thon submission.&lt;/strong&gt; I shipped this project months ago, but it was never truly &lt;em&gt;finished&lt;/em&gt; — it was missing the one feature that decided whether a real person in my own community could use it. This challenge was the push that made me sit down and close that gap in a single night.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The problem we face
&lt;/h2&gt;

&lt;p&gt;In India, you can walk into most pharmacies and buy prescription drugs without a prescription. We've quietly normalized something genuinely dangerous, and the research is blunt about it.&lt;/p&gt;

&lt;p&gt;A simulated-patient study in Bengaluru sent two researchers into &lt;strong&gt;261 pharmacies&lt;/strong&gt; with fake symptoms. Antibiotics were handed over &lt;strong&gt;without any prescription at roughly two-thirds (66.7%) of them.&lt;/strong&gt; Not a single pharmacy warned about side effects. Only about one in five even mentioned that a doctor's prescription was needed. The "guidance" most people walked out with was &lt;em&gt;"take it twice a day"&lt;/em&gt; — and nothing else. (&lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676185/" rel="noopener noreferrer"&gt;study&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Here is the part that kept me up at night: it's not because people are careless. The shopkeeper isn't a doctor. The label is printed in tiny English. A proper consultation costs time and money many families simply don't have. So people take what they're handed — and hope it's fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's important
&lt;/h2&gt;

&lt;p&gt;This isn't a small inconvenience. Taking the wrong medicine, or the right medicine the wrong way, is one of the biggest drivers of &lt;strong&gt;antibiotic resistance&lt;/strong&gt; — the slow disaster where medicines stop working.&lt;/p&gt;

&lt;p&gt;The landmark 2019 Lancet GRAM study estimated &lt;strong&gt;1.27 million deaths worldwide were directly attributable&lt;/strong&gt; to drug-resistant infections in a single year. In India alone, an estimated &lt;strong&gt;297,000 deaths were directly attributable to AMR, and about 1.04 million were associated with it.&lt;/strong&gt; (&lt;a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)02724-0/fulltext" rel="noopener noreferrer"&gt;Lancet GRAM 2019&lt;/a&gt;, &lt;a href="https://journals.lww.com/pmrr/fulltext/2024/01040/antimicrobial_resistance__an_emerging_global.1.aspx" rel="noopener noreferrer"&gt;India figures&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;I'm not a doctor and I can't change how medicines are sold. But I'm a developer, and I kept circling one small thing I might actually be able to build:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What if a person could at least understand what they're holding — before they take it?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I built &lt;strong&gt;Dawa Saathi&lt;/strong&gt;. You send a photo of a medicine label, and it reads it with AI and explains, in plain language: what the medicine is commonly used for, its common and serious side effects, and the key warnings (pregnancy, liver, kidney, alcohol, children, the elderly). It does &lt;strong&gt;not&lt;/strong&gt; diagnose. It does &lt;strong&gt;not&lt;/strong&gt; give a dose. It never says a medicine is "safe to take." Because with medicine, a confident wrong answer is more dangerous than no answer at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap I shipped with (the "before")
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth about the first version: &lt;strong&gt;it only spoke English.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built Dawa Saathi for people in rural and small-town India — the elderly, people with little formal schooling, people who grew up reading Kannada, Hindi, or Marathi, not English. I come from that world myself; I went to a Marathi-medium government school in a village before I ever wrote a line of code. And yet the tool I built &lt;em&gt;for&lt;/em&gt; those people answered every single question in fluent English.&lt;/p&gt;

&lt;p&gt;It &lt;em&gt;worked&lt;/em&gt;. The AI read labels accurately, the safety guardrails held, caching and rate limits were solid. But it was "finished" the way a bridge is finished when it stops halfway across the river. The one feature that decided whether a real person in my community could actually use it — language — was the thing I'd left undone.&lt;/p&gt;

&lt;p&gt;The Finish-Up-A-Thon prompt ("revive a project you started but never completed") landed on exactly that nerve.&lt;/p&gt;

&lt;h2&gt;
  
  
  One day, under pressure
&lt;/h2&gt;

&lt;p&gt;I gave myself the length of the challenge to fix it, and it came down to a single overnight session — about &lt;strong&gt;six hours straight&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That kind of pressure cuts both ways. On one hand, a deadline is the best forcing function I know; the project had sat "good enough" for months precisely because nothing forced me to finish it. On the other hand, language support is not a cosmetic change — it touches the welcome flow, every button, every error message, the medicine response itself, and the storage layer. At 2 a.m., with a medical app where a sloppy mistake isn't just a bug, that's a lot of surface area to get right while tired.&lt;/p&gt;

&lt;p&gt;There were stretches where I went down the wrong path — I first started building a translation dictionary, realized it would silently drop medical words, and had to back it out and rethink the whole approach. Hours like that are where solo projects usually die.&lt;/p&gt;

&lt;h2&gt;
  
  
  How GitHub Copilot helped me through it
&lt;/h2&gt;

&lt;p&gt;I enabled GitHub Copilot specifically for this sprint, and I'll be honest about exactly what it did — good and bad — because that's the point of this challenge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it genuinely sped me up:&lt;/strong&gt; the repetitive scaffolding. The new i18n layer has dozens of string entries across two languages; the Telegram handler needed every reply rewired to read from it; the callback router needed a whole new branch for language selection. From a comment and the surrounding code, Copilot autocompleted big chunks of that pattern-heavy work. Tasks that would have been an hour of careful typing became a few minutes of accepting and fixing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it helped me think under fatigue:&lt;/strong&gt; when the bot stubbornly kept replying in English even after I'd changed the code, Copilot Chat helped me reason through the deploy pipeline and spot that I was running an old compiled build (&lt;code&gt;npm start&lt;/code&gt; runs &lt;code&gt;dist/&lt;/code&gt;, not &lt;code&gt;src/&lt;/code&gt; — I'd forgotten to rebuild). At 3 a.m., that saved me from chasing a ghost for an hour. It also caught a sloppy callback branch where my "language" and "purpose" buttons could collide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where I had to stay in charge:&lt;/strong&gt; Copilot is confidently wrong sometimes. It suggested code that &lt;em&gt;looked&lt;/em&gt; right but quietly dropped a safety guardrail, and once invented an API shape that didn't exist. In a normal app that's an annoyance; in a &lt;strong&gt;medical&lt;/strong&gt; app, an unreviewed AI suggestion is exactly the "confident wrong answer" my whole project exists to prevent. So I read every line it gave me. The speed was real — but the judgment stayed mine.&lt;/p&gt;

&lt;p&gt;Honestly, Copilot didn't replace that six-hour night. It made the difference between finishing it and giving up around hour three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The major update: what actually changed
&lt;/h2&gt;

&lt;p&gt;The first thing the bot now does on &lt;code&gt;/start&lt;/code&gt; is ask which language you're comfortable in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🌐 Please choose your language
ದಯವಿಟ್ಟು ನಿಮ್ಮ ಭಾಷೆಯನ್ನು ಆಯ್ಕೆಮಾಡಿ
[ English ]   [ ಕನ್ನಡ ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tap a button, and &lt;strong&gt;everything afterward&lt;/strong&gt; — welcome, buttons, questions, and the full medicine explanation — comes back in that language. The choice is remembered, and you can change it anytime with &lt;code&gt;/language&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, I split the text into two kinds and treated them very differently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fixed UI text&lt;/strong&gt; (buttons, labels, errors, and the all-important medical disclaimer) lives in a small &lt;strong&gt;i18n table&lt;/strong&gt;. It's instant, costs zero tokens, and — critical for health content — the disclaimer in each language can be reviewed word-for-word instead of machine-guessed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The medicine explanation is generated by the LLM directly in the user's language.&lt;/strong&gt; I deliberately did &lt;em&gt;not&lt;/em&gt; translate the English afterward. The awareness model writes its answer in Kannada from the start, while keeping JSON keys in English (so the parser never changes) and drug names in Latin script (so "Tranexamic Acid 500mg" is never mangled). Every safety rule — no dosage, no diagnosis, no "safe to take," always "consult a doctor" — is repeated for the target language.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That choice means &lt;strong&gt;one model call instead of two&lt;/strong&gt;, no brittle dictionary dropping medical words, and a reply that reads like it was written &lt;em&gt;for&lt;/em&gt; a Kannada speaker rather than awkwardly converted into Kannada. Each language is cached separately, so the next person who scans the same strip gets an instant answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for people in Karnataka
&lt;/h2&gt;

&lt;p&gt;This is the part I actually care about.&lt;/p&gt;

&lt;p&gt;Kannada is the first language of around &lt;strong&gt;44 million people&lt;/strong&gt;, and roughly &lt;strong&gt;two-thirds of Karnataka's ~68 million residents&lt;/strong&gt; speak it natively (&lt;a href="https://en.wikipedia.org/wiki/Kannada" rel="noopener noreferrer"&gt;2011 census / overview&lt;/a&gt;). Meanwhile, by census-based estimates only about &lt;strong&gt;one in ten Indians&lt;/strong&gt; use English at all — and a far smaller share read it comfortably enough to parse a medicine label or a list of side effects.&lt;/p&gt;

&lt;p&gt;Put those two facts together and the old version's flaw is stark: an English-only health tool quietly excluded the majority of the exact people it was meant to help. The grandmother in a Belagavi village holding a strip of tablets, the daily-wage worker in Kalaburagi who left school early — they could open the bot, but they couldn't &lt;em&gt;read&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;With this update, that same grandmother can now ask in Kannada and get an answer she can actually understand — what the medicine is for, what to watch out for, and a clear, repeated nudge to see a doctor. It doesn't make the bot a doctor. It makes the bot &lt;em&gt;legible&lt;/em&gt; to the people who need it most. For a medicine-awareness tool, legibility in the user's own language isn't a feature — it's the whole job.&lt;/p&gt;

&lt;h2&gt;
  
  
  The before → after, at a glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Languages&lt;/td&gt;
&lt;td&gt;English only&lt;/td&gt;
&lt;td&gt;English + ಕನ್ನಡ (more in ~2 lines)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First interaction&lt;/td&gt;
&lt;td&gt;Dumped an English welcome&lt;/td&gt;
&lt;td&gt;Asks your language, then speaks it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who could actually use it&lt;/td&gt;
&lt;td&gt;English readers (~1 in 10)&lt;/td&gt;
&lt;td&gt;Tens of millions of Kannada speakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation approach&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;LLM generates in-language; i18n for fixed UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reachable by my own family&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Future plan
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hindi and Marathi next&lt;/strong&gt; — the architecture already supports them; each new language is roughly a two-line change plus a reviewed disclaimer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice output&lt;/strong&gt;, for users who can't read at all — arguably the people who need this most.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewed translations&lt;/strong&gt; of the fixed disclaimer with help from native-speaker volunteers, so the safety text is never just my best guess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pharmacy and NGO partnerships&lt;/strong&gt; — the long-term home for something like this is alongside health workers and CSR programs, not as a lone bot.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The honest impact
&lt;/h2&gt;

&lt;p&gt;I want to be careful not to overclaim. This won't fix the bigger problem. That needs doctors, pharmacists, and real regulation. One Telegram bot does not move a national health statistic.&lt;/p&gt;

&lt;p&gt;But the gap it closes is real and specific: a person can now understand, in their own language, what they're about to put in their body — before they take it. If that makes even a few people pause and think, or take one question to a doctor they'd otherwise have skipped, it was worth the night.&lt;/p&gt;

&lt;p&gt;If you work in public health or medicine, I'd genuinely value your guidance on doing this responsibly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it / code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bot:&lt;/strong&gt; TELEGRAM_BOT_LINK — &lt;a href="https://t.me/dawasaathi_bot" rel="noopener noreferrer"&gt;https://t.me/dawasaathi_bot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/SurajKhonde/Dawa_saathi_mediAssist" rel="noopener noreferrer"&gt;https://github.com/SurajKhonde/Dawa_saathi_mediAssist&lt;/a&gt; (MIT licensed)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Demo:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before_challenge&lt;/strong&gt;:-&lt;a href="https://drive.google.com/file/d/1UW5HsSfBZKWx93P-flJ9sUKVrg03JflS/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1UW5HsSfBZKWx93P-flJ9sUKVrg03JflS/view?usp=sharing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After_challenge&lt;/strong&gt; :- &lt;a href="https://drive.google.com/file/d/1pDaeMIelnb9ymHF9UygBmR4DY6lT7Vh1/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1pDaeMIelnb9ymHF9UygBmR4DY6lT7Vh1/view?usp=sharing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built solo. Not a substitute for a doctor — by design.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Non-prescription antimicrobial dispensing in Bangalore pharmacies (261-pharmacy simulated-client study): &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676185/" rel="noopener noreferrer"&gt;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676185/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Global burden of bacterial AMR, 2019 (Lancet, GRAM): &lt;a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)02724-0/fulltext" rel="noopener noreferrer"&gt;https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)02724-0/fulltext&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;India-specific AMR mortality (297,000 attributable / 1,042,500 associated, 2019): &lt;a href="https://journals.lww.com/pmrr/fulltext/2024/01040/antimicrobial_resistance__an_emerging_global.1.aspx" rel="noopener noreferrer"&gt;https://journals.lww.com/pmrr/fulltext/2024/01040/antimicrobial_resistance__an_emerging_global.1.aspx&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kannada speakers and Karnataka share (2011 census overview): &lt;a href="https://en.wikipedia.org/wiki/Kannada" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Kannada&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
    </item>
    <item>
      <title>Uncle, I burned ₹1000 in 4 runs — what did I do wrong?</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Fri, 05 Jun 2026 07:17:23 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/uncle-i-burned-1000-in-4-runs-what-did-i-do-wrong-59il</link>
      <guid>https://dev.to/surajrkhonde/uncle-i-burned-1000-in-4-runs-what-did-i-do-wrong-59il</guid>
      <description>&lt;p&gt;"An uncle-nephew conversation about the new line item on every developer's bill: LLM tokens."&lt;/p&gt;




&lt;h1&gt;
  
  
  Uncle, I burned ₹1000 in 4 runs
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;An uncle-nephew conversation on why some AI calls cost paise, and others cost thousands.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Uncle, I have a problem. I put ₹1000 of my own money into the AI API this month and it finished in &lt;em&gt;four runs&lt;/em&gt;. I don't even know what happened.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Sit down, beta. Drink chai first. Then tell me everything you built this month, from the beginning.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Okay. I built four things. Three of them were fine. One of them ate the whole month.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Good. We'll go one by one. Start with the easy one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project one: image text extraction
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; First one was simple. User uploads a photo — invoice, prescription, screenshot, whatever. I send it to the vision model. It returns the text. Done.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; How many calls per user?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; One. Maybe two if the image is bad and they retry.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; And the cost?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Honestly, kaka, almost nothing. A few paise per image. I scanned around 200 images during testing and barely touched ₹50.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Now stop and think. Why was it so cheap?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Aha… because it's just one image in, some text out. One round trip.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Exactly. &lt;strong&gt;One input, one output, one transaction.&lt;/strong&gt; Like sending one SMS. You know what you're paying before you press send. Keep this model in your head. It's the cheapest pattern that exists.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project two: a customer support bot
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Second one was a support bot for a client. User types a question. Bot replies.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Where does the bot get its answers from?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I wrote a system prompt — around 800 words — explaining the product, common questions, tone of voice. Every user message goes in with that prompt. Bot replies.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; And the cost?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; A bit higher than OCR. Each conversation was maybe ₹2 to ₹3. Total for the month, around ₹200 across hundreds of conversations.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Tell me — why is this one costlier than the image project?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; (thinking) …because the system prompt is going every time?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. &lt;strong&gt;You're paying for the same 800 words of instructions on every single message.&lt;/strong&gt; The user pays for one message, you pay for 800 + their message. But conversations are short, so it doesn't get out of hand.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; And the bot replies are also small — 100, 150 words.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Small input, small output, slightly bigger envelope than OCR. Still safe. Now the next project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project three: document Q&amp;amp;A with RAG
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Third one was a document Q&amp;amp;A. User uploads a 50-page PDF, asks questions, gets answers from inside the document.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Smart. How did you handle the document?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I chunked the PDF, embedded each chunk, stored them in a vector database. When the user asks, I pull the top 5 relevant chunks and send them with the question to the LLM.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Beta, that's actually a sensible design. Cost?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; ₹3 to ₹4 per question. The project as a whole cost me around ₹300 across some testing and a few real users.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Why higher than the support bot?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Because each question carries 5 chunks of document context. Maybe 3000 words of input on every call.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Correct. &lt;strong&gt;Bigger input on every call.&lt;/strong&gt; But notice what you didn't do — you didn't send the entire 50-page PDF every time. You sent only the 5 most relevant pieces. That's a smart RAG.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's a dumb one?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; A dumb RAG sends the whole 50-page document every time the user asks anything. Some people actually build this. They pay for 50 pages of input when they could have paid for 5. Don't be that person.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Noted.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project four: the disaster
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Now tell me about the project that ate the rest of your money.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; (sighs) Job search agent. Idea was — user uploads their resume, my agent searches the web for matching jobs, returns the top ones with reasoning. Like a personal recruiter.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; How does the agent search?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; It has a web search tool. Searches job boards, company career pages — whatever comes up. Clicks into postings, reads them, decides if they match the resume, then either suggests them or searches again with refined keywords.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; How many of these search-and-read cycles per user?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I didn't think about it carefully. The agent does around 5 searches. Each search returns 10 results. Then it fetches the most promising 3 or 4 pages in full. Reasons over them. Maybe searches again.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Cost per user?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; …₹200 to ₹250 per user, kaka.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; (silence)&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I tested it four times to debug. ₹1000 gone. And it didn't even get anyone a job. It just &lt;em&gt;suggested&lt;/em&gt; some.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the money actually went
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Beta, listen carefully now. I want you to understand what really happened, because it will change how you think about every AI project from now on.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Tell me.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Look at the four projects we discussed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OCR:                [image] → [text]                          one call
Support bot:        [prompt + query] → [answer]               one call
RAG:                [prompt + chunks + query] → [answer]      one call
Web search agent:   [prompt + query] → [search results]
                  → [prompt + results] → [decide to fetch]
                  → [prompt + results + FULL PAGE] → [decide next]
                  → [prompt + results + page + ANOTHER PAGE] → ...
                  → and on, and on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Oh… the last one is a loop.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes. And here is the part that breaks beginners. &lt;strong&gt;Every call inside that loop includes everything that came before it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Wait, what?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; When the agent makes its second decision, it must see what the first search returned. So the second call carries the first search's results. When it makes the third decision, it must see the first two searches plus whatever pages were fetched in between. By the fifth step, the input on each call has grown to thousands and thousands of words.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So the context keeps stacking.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; And web pages are &lt;em&gt;huge&lt;/em&gt;. One stripped-down job posting page is easily 2000 to 5000 words. Multiply by 3 or 4 pages, your agent is reasoning over 15,000 to 20,000 words on every step. Five steps of looping — you're at 80,000+ words of input billed for &lt;strong&gt;one user session.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Compared to OCR…&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Compared to OCR, your web search agent cost you roughly &lt;strong&gt;400 times more per user&lt;/strong&gt;. For the same kind of API.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; (quietly) I see.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one rule to remember
&lt;/h2&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Pick up your pen. Write this down. Stick it above your laptop.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The cost of an LLM application is not the cost of one call.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's the cost of one call × how many times you call × how big the input has grown by then.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Three things multiply together. Most beginner cost disasters come from forgetting one of them.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Which one bit me?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; All three at once. &lt;strong&gt;Many calls&lt;/strong&gt; because of the agent loop. &lt;strong&gt;Growing input&lt;/strong&gt; because of accumulated page content. And &lt;strong&gt;each call seemed cheap when you tested in isolation&lt;/strong&gt;, so you never saw the multiplication coming.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Kaka, this is like electricity. Each appliance seems small. The bill at the end of the month tells the truth.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Yes! Exactly that. AI tokens are electricity, not water connection. Variable, not fixed. Some operations are fans. Some are air conditioners running all day.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to build the same agent without going broke
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So how do I build a job search agent that doesn't bankrupt me?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; I'll give you five rules. Learn them properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Match the model to the job
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Not every step needs your most expensive model. There's a hierarchy.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Like what?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Extracting text from an image? Use a smaller, cheaper vision model. The task is mechanical. Classifying whether a job posting matches a resume? A small model is more than enough. Writing the final personalized recommendation letter to the user? &lt;em&gt;Now&lt;/em&gt; use your best model. That's the part where quality is visible.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I was using the smartest, most expensive model for &lt;em&gt;every&lt;/em&gt; step.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; I know. Most people do. It's like using a JCB to plant a tulsi plant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 2: Don't pass full pages — pass extracts
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; If your agent fetches a 5000-word page, do not shove all 5000 words into the next reasoning step.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Then what?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Run a &lt;em&gt;cheap&lt;/em&gt; model first to pull only the relevant 200 words. Pass those 200 words to your expensive reasoning model. Two-stage processing. You can cut 80–90% of cost this way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bad:    [agent — best model] reads 5000-word page → decides
Good:   [extractor — cheap model] pulls 200 relevant words
     → [agent — best model] reads 200 words → decides
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rule 3: Cap the loop
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Set a maximum number of iterations on your agent. Three searches. Four. Not "keep going until satisfied."&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Why?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Because an agent with no cap will happily run forever if you let it. Polite, hard-working, expensive servants need supervision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 4: Cache like your wallet depends on it
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; If two users have similar resumes, you're searching for the same jobs and reading the same pages. Cache the search results. Cache the extracted summaries. Cache the model's interpretations.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Disk is cheaper than tokens?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Disk is cheaper than tokens by a factor of a thousand. Always.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 5: Use specialized APIs, not generic crawling
&lt;/h3&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; This was the biggest mistake in your job project.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Tell me.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; You were doing general web search and crawling job board HTML. That is the most expensive path that exists. Job boards have APIs. LinkedIn, Naukri, Indeed, Glassdoor — many of them have official APIs or structured data feeds.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Even paid?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Even paid. Pay ₹1000 once for an API subscription instead of ₹250 per user for crawling. The math turns positive at the fourth user.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; I thought building my own crawler was the clever, free way.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; It was the &lt;em&gt;expensive&lt;/em&gt; way wearing a free costume.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bigger lesson
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Kaka, I think I've been treating the AI API like a database. Like a fixed cost.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; And what is it actually?&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; More like electricity. The more you do, the more it draws. Some operations draw a lot more than others.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Now you're thinking like an engineer, not just a developer.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What's the difference?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; A developer asks, "does this work?" An engineer asks, "does this work, and what does it cost to run for ten thousand users?" Both questions matter. But only one of them will tell you whether your product survives past the third month.&lt;/p&gt;




&lt;h2&gt;
  
  
  The math before you build
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; So before I build my next AI project, what do I do?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Estimate four numbers for your worst-case user.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Tell me.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;How many model calls per session.&lt;/li&gt;
&lt;li&gt;How big the input is on each call (in words — roughly 1 word ≈ 1.3 tokens).&lt;/li&gt;
&lt;li&gt;How big the output is on each call.&lt;/li&gt;
&lt;li&gt;What you're paying per million input and per million output tokens for your chosen model.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Multiply them out. If the per-user cost is more than what the user pays you — or more than you can absorb as a free tier — you don't have a product. You have a charity that will close in two months.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Or worse, a bill that goes to my father.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; (laughs) Don't make me explain this conversation to him.&lt;/p&gt;




&lt;h2&gt;
  
  
  A short summary for the next nephew
&lt;/h2&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Kaka, can you give me a small table I can keep in my notes?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Take this.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project type&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Cost behaviour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image OCR&lt;/td&gt;
&lt;td&gt;One image → one output&lt;/td&gt;
&lt;td&gt;Cheap, predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support bot&lt;/td&gt;
&lt;td&gt;Prompt + question → answer&lt;/td&gt;
&lt;td&gt;Cheap, mildly accumulating&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document RAG&lt;/td&gt;
&lt;td&gt;Retrieved chunks + question&lt;/td&gt;
&lt;td&gt;Moderate, scales with chunk size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent + web search&lt;/td&gt;
&lt;td&gt;Many loops with growing context&lt;/td&gt;
&lt;td&gt;Expensive, scales explosively&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; The hierarchy is not about technology being "advanced" or "basic." It is about &lt;strong&gt;how many tokens flow through the model per user, per session.&lt;/strong&gt; Some patterns naturally produce a small flow. Others produce a flood.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; Know which one I'm building. Do the math first. Pick the right model for each step. Cache aggressively.&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; And one last thing.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; What?&lt;/p&gt;

&lt;p&gt;👨‍🦳 &lt;strong&gt;Uncle:&lt;/strong&gt; Next time, build a small calculator before you build the agent. One spreadsheet cell would have saved you a thousand rupees.&lt;/p&gt;

&lt;p&gt;👦 &lt;strong&gt;Nephew:&lt;/strong&gt; (laughs) Noted, kaka.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful, I write about backend engineering, system design, and these kinds of practical lessons from real projects. Less noise, more action.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— until the next nephew project goes sideways.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>softwareengineering</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Long ChatGPT chats kept breaking my copy-paste. So I built an exporter.</title>
      <dc:creator>surajrkhonde</dc:creator>
      <pubDate>Fri, 05 Jun 2026 06:01:08 +0000</pubDate>
      <link>https://dev.to/surajrkhonde/long-chatgpt-chats-kept-breaking-my-copy-paste-so-i-built-an-exporter-3bej</link>
      <guid>https://dev.to/surajrkhonde/long-chatgpt-chats-kept-breaking-my-copy-paste-so-i-built-an-exporter-3bej</guid>
      <description>&lt;p&gt;A free Chrome extension I built to save full ChatGPT conversations as Markdown in one click — because the existing ways were genuinely painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  tags: chrome, productivity, opensource, chatgpt
&lt;/h2&gt;

&lt;p&gt;If you've ever had one of those long ChatGPT conversations — the kind where you spent half a day going deep into system design, or debugging something tricky, or learning a new concept end to end — you've probably run into this moment:&lt;/p&gt;

&lt;p&gt;You finish. You realize the chat is gold. You want to save it.&lt;/p&gt;

&lt;p&gt;You hit &lt;code&gt;Ctrl + A&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your browser freezes for ten seconds.&lt;/p&gt;

&lt;p&gt;You finally manage to copy. You paste it into a doc. Half the formatting is gone. Code blocks are mangled. The structure that made the conversation useful is just… not there anymore.&lt;/p&gt;

&lt;p&gt;That's where this story starts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem (and why it's worse than it sounds)
&lt;/h2&gt;

&lt;p&gt;I learn a lot from ChatGPT. Not in a passive "ask one question, move on" way — I have &lt;strong&gt;multi-day conversations&lt;/strong&gt; going on system design, RabbitMQ internals, architecture decisions, debugging sessions. Some of my chats have &lt;strong&gt;800+ messages&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These conversations are genuinely valuable. They're notes I keep coming back to. They're the work.&lt;/p&gt;

&lt;p&gt;But getting them &lt;em&gt;out&lt;/em&gt; of ChatGPT is shockingly bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copy-paste&lt;/strong&gt;: freezes the browser on long chats, mangles formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screenshot&lt;/strong&gt;: useless if you ever need to search or re-read it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT's own export feature&lt;/strong&gt;: emails you a giant ZIP… sometimes days later… of &lt;em&gt;everything you've ever written&lt;/em&gt;. No per-chat option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual cleanup&lt;/strong&gt;: spending an hour formatting one conversation into Markdown is not what I want to be doing at 11 PM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I checked if a tool already existed. Some did — most were either broken on long chats, badly designed, full of trackers, or only saved the messages currently visible on screen (which is a &lt;em&gt;fraction&lt;/em&gt; of a long conversation, because ChatGPT only keeps the visible messages loaded in memory).&lt;/p&gt;

&lt;p&gt;So I built my own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it actually does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT Exporter&lt;/strong&gt; is a Chrome extension that does one thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Save your entire ChatGPT conversation as a clean Markdown file in one click.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. No account, no signup, no upload, no cloud. You click a button, your file downloads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoofodr882o9be4mtkfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoofodr882o9be4mtkfw.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;br&gt;
It has three modes depending on the situation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Export Full Chat&lt;/strong&gt; — the main one. Works on chats of any length, including ones with thousands of messages. One click, full conversation, done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Export Visible&lt;/strong&gt; — when you only want to save the part you're looking at right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Scroll Capture&lt;/strong&gt; — a fallback that captures messages as you scroll through the chat. Useful if the first mode ever hiccups.&lt;/p&gt;

&lt;p&gt;The output is clean Markdown with proper headings, code blocks (with the right language), bold, italics, links, lists — everything that made the conversation structured stays structured.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;It started as a tool for me, but it turns out a lot of people have this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code learners&lt;/strong&gt; taking notes from long debugging or learning sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writers and researchers&lt;/strong&gt; preserving drafts and idea threads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; archiving architectural discussions they want to reference later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone&lt;/strong&gt; who realized too late that an important conversation got buried in their chat history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever thought "I should save that conversation somewhere I can actually find it" and then didn't, because the friction was too high — this is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Chrome Web Store&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://chromewebstore.google.com/detail/pmkhjmgnkohjdgninjfkdcegfkmnkfca" rel="noopener noreferrer"&gt;https://chromewebstore.google.com/detail/pmkhjmgnkohjdgninjfkdcegfkmnkfca&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📦 &lt;strong&gt;Source code&lt;/strong&gt;: &lt;a href="https://github.com/SurajKhonde/chatgpt-exporter" rel="noopener noreferrer"&gt;https://github.com/SurajKhonde/chatgpt-exporter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's open source under MIT. Free forever. No ads, no tracking, no account.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned building it
&lt;/h2&gt;

&lt;p&gt;A few things I picked up that might be useful to anyone building browser extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manifest V3 is much stricter than V2.&lt;/strong&gt; Permissions get audited automatically — request only what you actually use, or your submission gets rejected. (Ask me how I know.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOM virtualization is a real problem.&lt;/strong&gt; ChatGPT only mounts the messages currently near the viewport, so naively scraping the page misses 90% of a long conversation. Solving this properly took more iteration than I expected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stable IDs matter.&lt;/strong&gt; Every message in ChatGPT has a unique ID — using that as the dedup key (instead of position) is the difference between a working exporter and a broken one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shipping a Chrome extension is a process.&lt;/strong&gt; Icons, screenshots, promo tiles, privacy practices, justifications for every permission… expect to spend almost as much time on the store listing as the code.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;A few things I'd like to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF export (Markdown → PDF in one click)&lt;/li&gt;
&lt;li&gt;JSON export (raw conversation tree, for people who want to do their own processing)&lt;/li&gt;
&lt;li&gt;Batch export — saving multiple conversations at once&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have ideas, PRs welcome on GitHub.&lt;/p&gt;




&lt;h2&gt;
  
  
  One last thing
&lt;/h2&gt;

&lt;p&gt;If you've also been losing important conversations in your ChatGPT history, give this a try. And if it saves you time, drop a ⭐ on the repo or share this with someone who needs it.&lt;/p&gt;

&lt;p&gt;Built out of a real problem I face daily. Hope it helps you too.&lt;/p&gt;

&lt;p&gt;— Suraj&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
