Alright — this is where things finally get interesting. In Part II, we already prepared everything:
Our documents are processed
Chunks are created
Embeddings are stored inside ChromaDB
So technically… our chatbot already has knowledge. But here's the problem: it still doesn't know how to use it. Right now, our system is just "smart storage", not a "smart chatbot". In this part, we're going to fix that.
From Storage → Intelligence
Let’s quickly recall how RAG actually works. Instead of answering directly, the system:
Converts the user question into an embedding
Searches the vector database
Retrieves relevant chunks
Sends them to the LLM as context
Generates an answer
This is what transforms a normal chatbot into a domain-specific assistant.
Step 1 — Converting the User Query into an Embedding
When a user asks something, we don’t send it directly to the model. We first convert it into a vector.
# `embed` is the embedding helper we set up in Part II
query = "How much is cash balances (Kas) for 2024?"
query_embedding = embed([query])[0]
Why? Because our database doesn’t understand raw text — it understands vectors.
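To build intuition for what "understanding vectors" means, here is a tiny stdlib-only sketch of nearest-neighbor search by cosine similarity. The 3-dimensional vectors are toy stand-ins, not real embeddings; real models produce hundreds of dimensions, but the search idea is the same:

```python
import math

def cosine(a, b):
    # cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# toy vectors standing in for real chunk embeddings
chunks = {
    "cash balances 2024": [0.9, 0.1, 0.0],
    "employee handbook":  [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.05]

# pick the chunk whose vector points in the most similar direction
best = max(chunks, key=lambda name: cosine(query_vec, chunks[name]))
print(best)
```

A vector database does exactly this comparison, just at scale and with clever indexing so it doesn't have to scan every chunk.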
Step 2 — Searching the Vector Database
Now we use that embedding to find the most relevant chunks.
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=3
)
This will return the top 3 most relevant pieces of text.
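One detail that trips people up: ChromaDB's `query` returns a dict of lists-of-lists, with one inner list per query embedding. Here is a sketch of that shape (the values are mocked for illustration) and how to unpack it:

```python
# mocked ChromaDB query result, showing the typical shape
results = {
    "ids": [["chunk-12", "chunk-7", "chunk-31"]],
    "documents": [["Kas 2024: ...", "Neraca: ...", "Catatan: ..."]],
    "distances": [[0.21, 0.35, 0.48]],  # lower distance = more similar
}

# index [0] selects the answer set for our single query
top_chunks = results["documents"][0]
for doc_id, text, dist in zip(results["ids"][0], top_chunks, results["distances"][0]):
    print(f"{doc_id} (distance {dist}): {text}")
```

Because we sent one query embedding, everything we need sits at index `[0]` of each list.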
Step 3 — Preparing Context
Combine those chunks into a single context string for the LLM. Note that `collection.query` returns a dict of lists (one inner list per query), so the retrieved texts live under results["documents"][0] — joining `results` directly would join the dict's keys instead.
context = "\n\n".join(results["documents"][0])
Step 4 — Sending Context to the LLM
Pass the combined context plus the original user question to the LLM so it can generate an informed, domain-specific answer.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
response = llm.generate(prompt)
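In practice it also helps to tell the model to stay grounded in the retrieved context rather than falling back on its general knowledge. A minimal prompt-builder sketch (the exact wording is just an example, not a fixed recipe):

```python
def build_prompt(context: str, question: str) -> str:
    # instruct the model to answer ONLY from the retrieved context,
    # which reduces hallucinated answers on out-of-scope questions
    return (
        "You are a helpful assistant. Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt("<retrieved chunks>", "<user question>")
```

This small instruction is often the difference between a grounded answer and a confident guess.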
Step 5 — Generating the Answer
The LLM uses the retrieved context to produce a response grounded in your documents. This is the step that turns a plain vector store into an intelligent, domain-aware chatbot.
That’s the core RAG flow — convert query → retrieve relevant chunks → provide context → generate answer. In the next part we’ll look at optimizing retrieval quality, prompt engineering, and handling long contexts.
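The whole flow above can be sketched as one function. The embedding helper, ChromaDB collection, and LLM client are passed in as parameters here so the sketch stays self-contained; the names `embed`, `collection.query`, and `llm.generate` match the snippets above, but your actual objects may differ:

```python
def rag_answer(query, embed, collection, llm, n_results=3):
    # Step 1: turn the user question into a vector
    query_embedding = embed([query])[0]
    # Steps 2-3: retrieve the most relevant chunks and build the context
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    context = "\n\n".join(results["documents"][0])
    # Steps 4-5: ask the LLM to answer from the retrieved context
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)
```

Keeping the pipeline in one small function like this also makes it easy to unit-test each stage with stubs before wiring in the real models.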
If you want to try it out yourself, check out Nebula Lab here: https://ai-nebula.com/.