<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ksirailway Base</title>
    <description>The latest articles on DEV Community by Ksirailway Base (@ksirailway_base).</description>
    <link>https://dev.to/ksirailway_base</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838216%2F0576ac78-c109-4eb0-8bb3-d3c28911237a.jpg</url>
      <title>DEV Community: Ksirailway Base</title>
      <link>https://dev.to/ksirailway_base</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ksirailway_base"/>
    <language>en</language>
    <item>
      <title>To Embed or Not to Embed? That Is the Question.</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Tue, 24 Mar 2026 20:16:07 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/to-embed-or-not-to-embed-that-is-the-question-24eg</link>
      <guid>https://dev.to/ksirailway_base/to-embed-or-not-to-embed-that-is-the-question-24eg</guid>
      <description>&lt;p&gt;&lt;strong&gt;To Embed or Not to Embed? That Is the Question&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the next post in my series about my grammar RAG assistant &lt;a href="https://github.com/Ksirailway-base/BookMind" rel="noopener noreferrer"&gt;BookMind&lt;/a&gt;, and this time it pissed me off again.&lt;/p&gt;

&lt;p&gt;Student asked: “&lt;strong&gt;Explain the Past Simple tense.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;The system gave a decent explanation.&lt;/p&gt;

&lt;p&gt;Then the student said: “&lt;strong&gt;Give me an exercise on this topic.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;Instead of pulling an exercise from the same unit, the model brought something from a completely different section. The conversation broke.&lt;/p&gt;

&lt;p&gt;That was the moment I finally added a proper reranker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changed in the pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stage 1: Hybrid retrieval (25 candidates)
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 2: Cross-Encoder reranking
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 3: Only the best 5 go to the LLM
&lt;/span&gt;&lt;span class="n"&gt;final_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
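&lt;p&gt;Stage 1 is collapsed to one line above. Under the hood, “hybrid retrieval” fuses a keyword ranking (BM25) with a vector ranking. The fusion code itself isn’t shown in this post, so here is a minimal reciprocal-rank-fusion sketch in plain Python (all names illustrative, not BookMind’s actual code):&lt;/p&gt;

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60, top_n=25):
    """Fuse two ranked lists of doc ids via reciprocal rank fusion.

    A doc ranked highly by either retriever bubbles up; a doc ranked
    highly by both bubbles up the most.
    """
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

&lt;p&gt;In LangChain the same idea is what &lt;code&gt;EnsembleRetriever&lt;/code&gt; does when you combine a &lt;code&gt;BM25Retriever&lt;/code&gt; with a vector-store retriever.&lt;/p&gt;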



&lt;p&gt;&lt;strong&gt;Real conversation after adding the reranker&lt;/strong&gt;&lt;br&gt;
Student asks for the rule → system correctly pulls the right section (Past Simple)&lt;br&gt;
Student asks for a task on the same topic → system now pulls the correct exercise (p.378)&lt;br&gt;
Student submits wrong answers (“goed”, “boughten”) → system gives precise feedback and points to the exact unit (Unit 68 &amp;gt; 68.3)&lt;/p&gt;

&lt;p&gt;The whole conversation stayed coherent. No more jumping between unrelated parts of the book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurable improvement&lt;/strong&gt;&lt;br&gt;
Before reranker: Top-1 Accuracy ≈ 40%&lt;br&gt;
After reranker: Top-1 Accuracy ≈ 95%&lt;br&gt;
Reranking 24–25 candidates takes ~1.51 seconds&lt;/p&gt;
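&lt;p&gt;The post doesn’t show how the Top-1 numbers were measured; one simple way is a tiny harness over a labeled question set. This helper is hypothetical, not BookMind’s evaluation code:&lt;/p&gt;

```python
def top1_accuracy(labeled_questions, retrieve):
    """labeled_questions: list of (question, expected_section_id) pairs.

    retrieve(question) returns section ids, best first; a hit means the
    top-ranked section is the one a human labeled as correct.
    """
    hits = sum(
        1 for question, expected in labeled_questions
        if retrieve(question)[0] == expected
    )
    return hits / len(labeled_questions)
```

&lt;p&gt;Run it once with the plain hybrid retriever and once with the reranked pipeline, on the same questions, and you get a before/after pair like the one above.&lt;/p&gt;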

&lt;p&gt;&lt;strong&gt;So?&lt;/strong&gt;&lt;br&gt;
Embeddings + hybrid search are good at finding something. Cross-encoder reranking is what makes the system actually understand what is relevant for the current question.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The extra 1.5 seconds is worth every millisecond.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Have you tried cross-encoder reranking in your projects? How many candidates do you usually pass to it?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your RAG pipeline is only as good as the shit you put in your vector database</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:15:59 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/your-rag-pipeline-is-only-as-good-as-the-shit-you-put-in-your-vector-database-133g</link>
      <guid>https://dev.to/ksirailway_base/your-rag-pipeline-is-only-as-good-as-the-shit-you-put-in-your-vector-database-133g</guid>
      <description>&lt;p&gt;I’m continuing my series of posts about my RAG assistant for textbook grammar. The first version worked. Technically. You ask a question -&amp;gt; you get an answer.&lt;br&gt;
And then I started testing it like a regular student… and I was blown away.&lt;br&gt;
Instead of helping me learn, the model was just solving the exercises. It was spitting out ready-made answers. At first I thought, “Well, the prompt is bad.” I was wrong.&lt;br&gt;
The problem was how I fed the book into the model.&lt;br&gt;
How I did it at first (shame on me)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pdf → chunks по N токенов → Chroma
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The "fill in the blanks" exercise and the rule associated with it looked almost identical when embedded. When a student asks for help with the exercise, the system pulls the answer key from another page. The model happily solves the problem for the student.&lt;br&gt;
I was seriously cringing when I realized how stupid that was.&lt;br&gt;
&lt;strong&gt;What I did instead&lt;/strong&gt;&lt;br&gt;
I wrote a parser that understands the textbook's structure before anything gets sent to Chroma.&lt;/p&gt;

&lt;p&gt;Here's what the main classification looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_classify_content_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_EXERCISE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exercise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_REFERENCE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_VOCAB_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vocabulary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_EXAMPLE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_GRAMMAR_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then each chunk is assigned rich metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;book&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;book_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hierarchy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fill_blank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grammar_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;present perfect, passive voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;related_rule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit 5 &amp;gt; 5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
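&lt;p&gt;This metadata is what makes retrieval steerable: a request like “give me an exercise” can be restricted &lt;em&gt;before&lt;/em&gt; any similarity ranking. In Chroma that’s a &lt;code&gt;where&lt;/code&gt; filter (e.g. &lt;code&gt;where={"content_type": "exercise"}&lt;/code&gt;); here is the same effect as plain Python over chunk dicts, so you can see exactly what it buys you (illustrative helper, not BookMind’s code):&lt;/p&gt;

```python
def only(chunks, **conditions):
    """Keep chunks whose metadata matches every key=value condition."""
    return [
        chunk for chunk in chunks
        if all(chunk["metadata"].get(k) == v for k, v in conditions.items())
    ]
```

&lt;p&gt;So “an exercise from the unit we’re already in” becomes &lt;code&gt;only(candidates, content_type="exercise", section="Unit 68")&lt;/code&gt; instead of hoping the embedding space sorts it out.&lt;/p&gt;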



&lt;p&gt;The main lesson that really hit home for me:&lt;br&gt;
Everyone goes on and on about embeddings, chunk size, hybrid search, rerankers, and which prompt is best. That’s important.&lt;br&gt;
But if your pipeline can’t tell a rule from an exercise, then even feeding it Claude 4 Opus will still produce crap. The model will start building a structure that you didn’t give it.&lt;br&gt;
&lt;a href="https://github.com/Ksirailway-base/BookMind" rel="noopener noreferrer"&gt;GitHub link&lt;/a&gt;&lt;br&gt;
&lt;a href="https://huggingface.co/spaces/Ksirailway/BookMind" rel="noopener noreferrer"&gt;HuggingFace Demo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How I Built an Offline AI Tutor That Actually Understands Textbooks (LFM2 + RAG)</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Sun, 22 Mar 2026 10:46:59 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/how-i-built-an-offline-ai-tutor-that-actually-understands-textbooks-lfm2-rag-1mpm</link>
      <guid>https://dev.to/ksirailway_base/how-i-built-an-offline-ai-tutor-that-actually-understands-textbooks-lfm2-rag-1mpm</guid>
      <description>&lt;p&gt;I was studying English from Murphy's Grammar in Use and kept running into the same problem: every AI I tried wanted to explain grammar like it was giving a TED talk. Long preambles, theatrical examples, confident hallucinations about rules that aren't in my book.&lt;/p&gt;

&lt;p&gt;I wanted something simpler. Open the book. Give me exercise 47. Check my answer against the actual rule on that page. No internet. No subscription. No GPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem I actually had to solve&lt;/strong&gt;&lt;br&gt;
Standard RAG pipelines chunk blindly. A fill-in-the-blank exercise that spans two pages becomes nonsense after splitting. I wrote a regex parser that extracts exercises directly from the PDF before they touch the LLM — the task is copied from the book, not generated. No hallucinations possible on the exercise itself.&lt;/p&gt;
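&lt;p&gt;The parser itself isn’t listed in this post. A minimal sketch of the idea, with a hypothetical header pattern for Murphy-style exercise numbering (“68.3 …”):&lt;/p&gt;

```python
import re

# Hypothetical header pattern; the real parser's regexes aren't shown here.
EXERCISE_HEADER = re.compile(r"^\d+\.\d+ ", re.MULTILINE)

def split_exercises(page_text):
    """Cut a page at exercise headers so no task is ever split mid-exercise."""
    matches = list(EXERCISE_HEADER.finditer(page_text))
    bounds = [m.start() for m in matches] + [len(page_text)]
    return [
        (m.group(0).strip(), page_text[start:end].strip())
        for m, start, end in zip(matches, bounds, bounds[1:])
    ]
```

&lt;p&gt;Cutting at structural boundaries instead of token counts is the whole trick: each extracted task stays intact, so it can be shown to the student verbatim.&lt;/p&gt;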

&lt;p&gt;&lt;strong&gt;For retrieval:&lt;/strong&gt; ChromaDB + BM25 hybrid search. &lt;br&gt;
&lt;strong&gt;For inference:&lt;/strong&gt; LFM2-2.6B via llama.cpp. I chose LFM2 specifically because I wanted to test it — see how it handles factual constraints, whether it stays inside the textbook or wanders off. First time using it in a real pipeline. Turns out it's well-behaved on RAM-only hardware.&lt;br&gt;
Conversational memory covers 24 messages (12 per side) — enough to do a full exercise session with follow-up questions without losing context.&lt;/p&gt;
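&lt;p&gt;The 24-message window is simple to sketch (illustrative, not BookMind’s exact code): before each LLM call, keep only the most recent messages so the prompt size stays bounded no matter how long the session runs:&lt;/p&gt;

```python
MAX_MESSAGES = 24  # 12 per side, as described above

def trim_history(history):
    """Drop the oldest messages once the window is full."""
    return history[-MAX_MESSAGES:]
```

&lt;p&gt;A fixed window is a deliberate trade-off: long enough for a full exercise session with follow-ups, short enough to fit comfortably in a small model’s context on CPU.&lt;/p&gt;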

&lt;p&gt;&lt;strong&gt;What I don't know yet&lt;/strong&gt;&lt;br&gt;
The parser works for Murphy-style grammar books. &lt;em&gt;I have no idea what happens with math textbooks or scientific papers&lt;/em&gt; — that's next on the list. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you try it on something weird, open an issue and tell me what breaks. There is also a live demo on &lt;a href="https://huggingface.co/spaces/Ksirailway/BookMind" rel="noopener noreferrer"&gt;HuggingFace Spaces&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
