NEBULA DATA
Specialized Chatbot using RAG — Part III

Alright — this is where things finally get interesting. In Part II, we already prepared everything:

  • Our documents are processed

  • Chunks are created

  • Embeddings are stored inside ChromaDB

So technically… our chatbot already has knowledge. But here’s the problem: it still doesn’t know how to use it. Right now, our system is just a “smart storage”. Not a “smart chatbot”. In this part, we’re going to fix that.

From Storage → Intelligence

Let’s quickly recall how RAG actually works. Instead of answering directly, the system:

  1. Converts the user question into an embedding

  2. Searches the vector database

  3. Retrieves relevant chunks

  4. Sends them to the LLM as context

  5. Generates an answer

This is what transforms a normal chatbot into a domain-specific assistant.


Step 1 — Converting the User Query into an Embedding

When a user asks something, we don’t send it directly to the model. We first convert it into a vector.

# `embed` is the embedding helper we built in Part II
query = "What is the cash balance (Kas) for 2024?"
query_embedding = embed([query])[0]

Why? Because our database doesn’t understand raw text — it understands vectors.
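To make "understands vectors" concrete, here's a toy illustration in pure Python of cosine similarity, a common way to measure how close two embeddings are. The 3-dimensional vectors are made up for the example (real embedding models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Measures how close two vectors point in the same direction
    # (1.0 = identical direction, ~0.0 = unrelated)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for illustration only
query_vec = [0.9, 0.1, 0.0]
chunk_about_cash = [0.8, 0.2, 0.1]  # semantically close to the query
chunk_about_hr = [0.0, 0.1, 0.9]    # unrelated topic

print(cosine_similarity(query_vec, chunk_about_cash))  # high (close to 1.0)
print(cosine_similarity(query_vec, chunk_about_hr))    # low (close to 0.0)
```

This is why the query must become a vector first: once both the question and the stored chunks live in the same vector space, "relevant" simply means "nearby".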


Step 2 — Searching the Vector Database

Now we use that embedding to find the most relevant chunks.

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=3
)

This will return the top 3 most relevant pieces of text.
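It helps to know what that return value looks like: ChromaDB's `query` returns a dict of parallel lists, with one inner list per query embedding you passed in. A rough sketch of its shape (the values here are made up):

```python
# Illustrative shape of a ChromaDB query result — values are invented.
# Each top-level list holds one entry per query embedding passed in.
results = {
    "ids": [["chunk-12", "chunk-7", "chunk-31"]],
    "documents": [[
        "Kas (cash balances) for 2024 totaled ...",
        "The balance sheet reports current assets ...",
        "Notes to the financial statements ...",
    ]],
    "distances": [[0.21, 0.34, 0.48]],
}

# The retrieved text for our single query lives at results["documents"][0]
top_chunks = results["documents"][0]
print(len(top_chunks))  # 3
```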


Step 3 — Preparing Context

Combine those chunks into a single context to provide to the LLM.

# `collection.query` returns a dict of lists (one inner list per query),
# so the retrieved text lives under results["documents"][0]
context = "\n\n".join(results["documents"][0])

Step 4 — Sending Context to the LLM

Pass the combined context plus the original user question to the LLM so it can generate an informed, domain-specific answer.

# `llm` stands in for whichever model client you use — the generate()
# call here is a placeholder for your provider's API
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
response = llm.generate(prompt)

Step 5 — Generating the Answer

The LLM uses the retrieved context to produce a response grounded in your documents. This is the step that turns a plain vector store into an intelligent, domain-aware chatbot.
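Putting the five steps together, the whole flow fits in one small function. This is a sketch that assumes the `embed` helper from Part II and a generic `llm` client with a `generate` method — swap in your own:

```python
def answer(query, collection, embed, llm, n_results=3):
    # Step 1: turn the question into a vector
    query_embedding = embed([query])[0]

    # Step 2: find the most relevant chunks in ChromaDB
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )

    # Step 3: combine the retrieved chunks into one context string
    context = "\n\n".join(results["documents"][0])

    # Steps 4–5: send context + question to the LLM and return its answer
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)
```

Everything the chatbot "knows" flows through that `context` string — if retrieval brings back the wrong chunks, no amount of prompting will save the answer, which is exactly what we'll tune in the next part.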


That’s the core RAG flow — convert query → retrieve relevant chunks → provide context → generate answer. In the next part we’ll look at optimizing retrieval quality, prompt engineering, and handling long contexts.

If you want to try it out yourself, check out Nebula Lab here: https://ai-nebula.com/.
