<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ksirailway Base</title>
    <description>The latest articles on DEV Community by Ksirailway Base (@ksirailway_base).</description>
    <link>https://dev.to/ksirailway_base</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838216%2F0576ac78-c109-4eb0-8bb3-d3c28911237a.jpg</url>
      <title>DEV Community: Ksirailway Base</title>
      <link>https://dev.to/ksirailway_base</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ksirailway_base"/>
    <language>en</language>
    <item>
      <title>To Embed or Not to Embed? That Is the Question.</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Tue, 24 Mar 2026 20:16:07 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/to-embed-or-not-to-embed-that-is-the-question-24eg</link>
      <guid>https://dev.to/ksirailway_base/to-embed-or-not-to-embed-that-is-the-question-24eg</guid>
      <description>&lt;p&gt;&lt;strong&gt;To Embed or Not to Embed? That Is the Question&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the next post in my series about my grammar RAG assistant &lt;a href="https://github.com/Ksirailway-base/BookMind" rel="noopener noreferrer"&gt;BookMind&lt;/a&gt;, and this time it pissed me off again.&lt;/p&gt;

&lt;p&gt;Student asked: “&lt;strong&gt;Explain the Past Simple tense.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;The system gave a decent explanation.&lt;/p&gt;

&lt;p&gt;Then the student said: “&lt;strong&gt;Give me an exercise on this topic.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;Instead of pulling an exercise from the same unit, the model brought something from a completely different section. The conversation broke.&lt;/p&gt;

&lt;p&gt;That was the moment I finally added a proper reranker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changed in the pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stage 1: Hybrid retrieval (25 candidates)
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 2: Cross-Encoder reranking
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 3: Only the best 5 go to the LLM
&lt;/span&gt;&lt;span class="n"&gt;final_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
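&lt;p&gt;Stage 1 is collapsed to one line above. Under the hood, “hybrid retrieval” fuses a keyword ranking (BM25) with a vector ranking. The fusion code itself isn’t shown in this post, so here is a minimal reciprocal-rank-fusion sketch in plain Python (all names illustrative, not BookMind’s actual code):&lt;/p&gt;

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60, top_n=25):
    """Fuse two ranked lists of doc ids via reciprocal rank fusion.

    A doc ranked highly by either retriever bubbles up; a doc ranked
    highly by both bubbles up the most.
    """
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

&lt;p&gt;In LangChain the same idea is what &lt;code&gt;EnsembleRetriever&lt;/code&gt; does when you combine a &lt;code&gt;BM25Retriever&lt;/code&gt; with a vector-store retriever.&lt;/p&gt;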



&lt;p&gt;&lt;strong&gt;Real conversation after adding the reranker&lt;/strong&gt;&lt;br&gt;
Student asks for the rule → system correctly pulls the right section (Past Simple)&lt;br&gt;
Student asks for a task on the same topic → system now pulls the correct exercise (p.378)&lt;br&gt;
Student submits wrong answers (“goed”, “boughten”) → system gives precise feedback and points to the exact unit (Unit 68 &amp;gt; 68.3)&lt;/p&gt;

&lt;p&gt;The whole conversation stayed coherent. No more jumping between unrelated parts of the book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurable improvement&lt;/strong&gt;&lt;br&gt;
Before reranker: Top-1 Accuracy ≈ 40%&lt;br&gt;
After reranker: Top-1 Accuracy ≈ 95%&lt;br&gt;
Reranking 24–25 candidates takes ~1.51 seconds&lt;/p&gt;
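&lt;p&gt;The post doesn’t show how the Top-1 numbers were measured; one simple way is a tiny harness over a labeled question set. This helper is hypothetical, not BookMind’s evaluation code:&lt;/p&gt;

```python
def top1_accuracy(labeled_questions, retrieve):
    """labeled_questions: list of (question, expected_section_id) pairs.

    retrieve(question) returns section ids, best first; a hit means the
    top-ranked section is the one a human labeled as correct.
    """
    hits = sum(
        1 for question, expected in labeled_questions
        if retrieve(question)[0] == expected
    )
    return hits / len(labeled_questions)
```

&lt;p&gt;Run it once with the plain hybrid retriever and once with the reranked pipeline, on the same questions, and you get a before/after pair like the one above.&lt;/p&gt;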

&lt;p&gt;&lt;strong&gt;So?&lt;/strong&gt;&lt;br&gt;
Embeddings + hybrid search are good at finding something. Cross-encoder reranking is what makes the system actually understand what is relevant for the current question.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The extra 1.5 seconds is worth every millisecond.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Have you tried cross-encoder reranking in your projects? How many candidates do you usually pass to it?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your RAG pipeline is only as good as the shit you put in your vector database</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:15:59 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/your-rag-pipeline-is-only-as-good-as-the-shit-you-put-in-your-vector-database-133g</link>
      <guid>https://dev.to/ksirailway_base/your-rag-pipeline-is-only-as-good-as-the-shit-you-put-in-your-vector-database-133g</guid>
      <description>&lt;p&gt;I’m continuing my series of posts about my RAG assistant for textbook grammar. The first version worked. Technically. You ask a question -&amp;gt; you get an answer.&lt;br&gt;
And then I started testing it like a regular student… and I was blown away.&lt;br&gt;
Instead of helping me learn, the model was just solving the exercises. It was spitting out ready-made answers. At first I thought, “Well, the prompt is bad.” I was wrong.&lt;br&gt;
The problem was how I fed the book into the model.&lt;br&gt;
How I did it at first (shame on me)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pdf → chunks по N токенов → Chroma
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The "fill in the blanks" exercise and the rule associated with it looked almost identical when embedded. When a student asks for help with the exercise, the system pulls the answer key from another page. The model happily solves the problem for the student.&lt;br&gt;
I was seriously cringing when I realized how stupid that was.&lt;br&gt;
&lt;strong&gt;What I did instead&lt;/strong&gt;&lt;br&gt;
I wrote a parser that understands the textbook's structure before anything gets sent to Chroma.&lt;/p&gt;

&lt;p&gt;Here's what the main classification looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_classify_content_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_EXERCISE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exercise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_REFERENCE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_VOCAB_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vocabulary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_EXAMPLE_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_GRAMMAR_KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then each chunk is assigned rich metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;book&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;book_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hierarchy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fill_blank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grammar_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;present perfect, passive voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;related_rule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit 5 &amp;gt; 5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
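&lt;p&gt;This metadata is what makes retrieval steerable: a request like “give me an exercise” can be restricted &lt;em&gt;before&lt;/em&gt; any similarity ranking. In Chroma that’s a &lt;code&gt;where&lt;/code&gt; filter (e.g. &lt;code&gt;where={"content_type": "exercise"}&lt;/code&gt;); here is the same effect as plain Python over chunk dicts, so you can see exactly what it buys you (illustrative helper, not BookMind’s code):&lt;/p&gt;

```python
def only(chunks, **conditions):
    """Keep chunks whose metadata matches every key=value condition."""
    return [
        chunk for chunk in chunks
        if all(chunk["metadata"].get(k) == v for k, v in conditions.items())
    ]
```

&lt;p&gt;So “an exercise from the unit we’re already in” becomes &lt;code&gt;only(candidates, content_type="exercise", section="Unit 68")&lt;/code&gt; instead of hoping the embedding space sorts it out.&lt;/p&gt;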



&lt;p&gt;The main lesson that really hit home for me:&lt;br&gt;
Everyone goes on and on about embeddings, chunk size, hybrid search, rerankers, and which prompt is best. That’s important.&lt;br&gt;
But if your pipeline can’t tell a rule from an exercise, then even feeding it Claude 4 Opus will still produce crap. The model will start building a structure that you didn’t give it.&lt;br&gt;
&lt;a href="https://github.com/Ksirailway-base/BookMind" rel="noopener noreferrer"&gt;GitHub link&lt;/a&gt;&lt;br&gt;
&lt;a href="https://huggingface.co/spaces/Ksirailway/BookMind" rel="noopener noreferrer"&gt;HuggingFace Demo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How I Built an Offline AI Tutor That Actually Understands Textbooks (LFM2 + RAG)</title>
      <dc:creator>Ksirailway Base</dc:creator>
      <pubDate>Sun, 22 Mar 2026 10:46:59 +0000</pubDate>
      <link>https://dev.to/ksirailway_base/how-i-built-an-offline-ai-tutor-that-actually-understands-textbooks-lfm2-rag-1mpm</link>
      <guid>https://dev.to/ksirailway_base/how-i-built-an-offline-ai-tutor-that-actually-understands-textbooks-lfm2-rag-1mpm</guid>
      <description>&lt;p&gt;I was studying English from Murphy's Grammar in Use and kept running into the same problem: every AI I tried wanted to explain grammar like it was giving a TED talk. Long preambles, theatrical examples, confident hallucinations about rules that aren't in my book.&lt;/p&gt;

&lt;p&gt;I wanted something simpler. Open the book. Give me exercise 47. Check my answer against the actual rule on that page. No internet. No subscription. No GPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem I actually had to solve&lt;/strong&gt;&lt;br&gt;
Standard RAG pipelines chunk blindly. A fill-in-the-blank exercise that spans two pages becomes nonsense after splitting. I wrote a regex parser that extracts exercises directly from the PDF before they touch the LLM — the task is copied from the book, not generated. No hallucinations possible on the exercise itself.&lt;/p&gt;
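&lt;p&gt;The parser itself isn’t listed in this post. A minimal sketch of the idea, with a hypothetical header pattern for Murphy-style exercise numbering (“68.3 …”):&lt;/p&gt;

```python
import re

# Hypothetical header pattern; the real parser's regexes aren't shown here.
EXERCISE_HEADER = re.compile(r"^\d+\.\d+ ", re.MULTILINE)

def split_exercises(page_text):
    """Cut a page at exercise headers so no task is ever split mid-exercise."""
    matches = list(EXERCISE_HEADER.finditer(page_text))
    bounds = [m.start() for m in matches] + [len(page_text)]
    return [
        (m.group(0).strip(), page_text[start:end].strip())
        for m, start, end in zip(matches, bounds, bounds[1:])
    ]
```

&lt;p&gt;Cutting at structural boundaries instead of token counts is the whole trick: each extracted task stays intact, so it can be shown to the student verbatim.&lt;/p&gt;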

&lt;p&gt;&lt;strong&gt;For retrieval:&lt;/strong&gt; ChromaDB + BM25 hybrid search. &lt;br&gt;
&lt;strong&gt;For inference:&lt;/strong&gt; LFM2-2.6B via llama.cpp. I chose LFM2 specifically because I wanted to test it — see how it handles factual constraints, whether it stays inside the textbook or wanders off. First time using it in a real pipeline. Turns out it's well-behaved on RAM-only hardware.&lt;br&gt;
Conversational memory covers 24 messages (12 per side) — enough to do a full exercise session with follow-up questions without losing context.&lt;/p&gt;
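&lt;p&gt;The 24-message window is simple to sketch (illustrative, not BookMind’s exact code): before each LLM call, keep only the most recent messages so the prompt size stays bounded no matter how long the session runs:&lt;/p&gt;

```python
MAX_MESSAGES = 24  # 12 per side, as described above

def trim_history(history):
    """Drop the oldest messages once the window is full."""
    return history[-MAX_MESSAGES:]
```

&lt;p&gt;A fixed window is a deliberate trade-off: long enough for a full exercise session with follow-ups, short enough to fit comfortably in a small model’s context on CPU.&lt;/p&gt;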

&lt;p&gt;&lt;strong&gt;What I don't know yet&lt;/strong&gt;&lt;br&gt;
The parser works for Murphy-style grammar books. &lt;em&gt;I have no idea what happens with math textbooks or scientific papers&lt;/em&gt; — that's next on the list. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you try it on something weird, open an issue and tell me what breaks. There is also a live demo on &lt;a href="https://huggingface.co/spaces/Ksirailway/BookMind" rel="noopener noreferrer"&gt;HuggingFace Spaces&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
