Conversational AI hasn’t just evolved in the past few years — it’s quietly staged a revolution.
We’ve gone from basic intent classification and rigid, rule‑based scripts to Large Language Models (LLMs) that can hold conversations with fluency.
And yet — in real‑world production environments where accuracy, trustworthiness, and scalability matter — LLMs alone aren’t enough. The real magic happens when you blend their creativity with structured, reliable systems.
In this post, we’ll explore how Rasa, LLMs, and RAG (Retrieval‑Augmented Generation) can work together to build chatbots that are natural, reliable, and grounded.
Bridging the Gap: Why Mix Rasa, LLMs, and RAG?
LLMs are brilliant at generating smooth, coherent replies, but they can "hallucinate" when they don't know the answer, especially in regulated or domain-specific contexts. No single component fixes this on its own; each layer of the stack covers a different weakness:
Rasa ⟶ gives you predictable control: intent classification, slot filling, API calls, and workflow logic.
RAG (Retrieval‑Augmented Generation) ⟶ injects facts by retrieving trusted answers from your documentation, FAQs, or knowledge base.
LLM ⟶ uses both user input and retrieved context to generate natural, grounded responses.
The Architecture at a Glance
You can think of this trio as a human‑like conversation team inside your chatbot:
- Rasa is the executive planner — listening to the user, interpreting their intent, and deciding which pathway to follow.
- RAG plays the research assistant — diving into the knowledge base and coming back with the most relevant facts.
- LLM is the storyteller — presenting those facts in a friendly, human‑like tone.
A typical workflow looks like this:
- The user asks a question.
- Rasa analyses the intent and determines whether this is a standard flow (e.g., “Book me a meeting”) or a factual query (“What’s your refund policy?”).
- For factual queries, Rasa triggers the RAG pipeline.
- The query is converted into an embedding vector, and similar content is fetched from a FAISS index or other vector store.
- Retrieved data is inserted into an LLM prompt so the answer is grounded in the right context.
- The LLM generates a polished response, and Rasa sends it back to the user.
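The grounding step in this workflow can be sketched as follows. The `build_grounded_prompt` helper and its template are illustrative assumptions, not part of Rasa or any particular LLM SDK; the idea is simply that retrieved passages are pinned into the prompt before the LLM sees it:

```python
# Sketch: assemble a grounded prompt from retrieved passages.
# `retrieved_passages` would come from the RAG retrieval step; the
# helper name and prompt wording are illustrative assumptions.
from typing import List


def build_grounded_prompt(user_query: str, retrieved_passages: List[str]) -> str:
    """Combine the user's question with trusted context for the LLM."""
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "What's your refund policy?",
    ["Refunds are issued within 14 days of purchase."],
)
```

Because the instructions tell the model to stay inside the supplied context, off-topic or unknown questions degrade to "I don't know" rather than a hallucinated answer.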
Production Benefits
By combining these components, you create a balanced conversational system: Rasa provides structured, predictable dialogue control; RAG contributes accurate, context‑aware information; and the LLM adds natural, human‑like interaction. This blend minimizes misleading responses, handles complex queries with context retention, scales gracefully as your knowledge base grows, and allows new features to be introduced without disrupting existing behavior.
Minimal Working Example (Rasa + FAISS RAG Custom Action)
# Rasa custom action: answer factual queries via FAISS retrieval
from typing import Any, Dict, List, Text

import faiss
import numpy as np
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
faiss_index = faiss.read_index("kb.index")  # built from the company knowledge base
with open("kb_answers.txt") as f:
    kb_answers = f.read().splitlines()  # row i matches index vector i

class ActionRAGLookup(Action):
    def name(self) -> Text:
        return "action_rag_lookup"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        user_query = tracker.latest_message.get("text", "")
        query_vector = embedding_model.encode([user_query]).astype(np.float32)
        distances, indices = faiss_index.search(query_vector, 1)  # top-1 match
        best_answer = kb_answers[indices[0][0]]
        dispatcher.utter_message(text=best_answer)
        return []
Step‑by‑Step Implementation
1. Set up Rasa
pip install rasa
rasa init
This creates a default bot with intents, domain, and sample actions.
2. Prepare your Knowledge Base
- Gather Q&A pairs from your docs.
- Generate embeddings (all-MiniLM-L6-v2).
- Build and save the FAISS index (kb.index) and answers file.
3. Create the Custom RAG Action
- Add retrieval logic like in ActionRAGLookup.
- Encode the query, search the index, return the top match.
4. Update domain.yml
- Add action_rag_lookup under actions.
- Declare the needed intents (e.g., ask_question).
- Optional: add utter_fallback for low‑confidence cases.
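Taken together, the domain additions for this step might look like the fragment below. The intent and action names follow this article's examples; the fallback response text is an illustrative assumption:

```yaml
# domain.yml — additions for the RAG action
intents:
  - ask_question

actions:
  - action_rag_lookup

responses:
  utter_fallback:
    - text: "Sorry, I'm not sure about that. Could you rephrase?"
```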
5. Train and Run
rasa train
rasa run # Terminal 1 — starts bot server
rasa run actions # Terminal 2 — starts action server
6. Test
rasa shell
Type a query → Rasa processes it → RAG retrieves matching answer → LLM formats and sends it back.
Domain — The Blueprint of Your Bot
File: domain.yml
Think of the domain as your chatbot’s ID card + skill list.
It tells Rasa:
- Who you are (list of possible responses the bot knows)
- What you can do (custom actions, forms)
- What you understand (intents & entities you’re trained to recognize)
- What you remember (slots that hold values through a conversation)
A typical structure:
intents:
  - greet
  - ask_price

entities:
  - product_name

slots:
  product_name:
    type: text

responses:
  utter_greet:
    - text: "Hi there! How can I help you?"
  utter_ask_price:
    - text: "How much does this product cost?"

actions:
  - action_rag_lookup
The domain is like the API surface of your bot. If your bot’s ability or response isn’t declared here, Rasa won’t touch it — even if it’s coded elsewhere.
Stories — Example Conversations Rasa Learns From
File: stories.yml
Stories are like training scripts for actors — they’re examples of how a conversation might go from start to finish.
- They teach Rasa’s dialogue management how to react over multiple turns.
- Based on these examples, Rasa learns patterns for fluid, non‑rigid conversations.
Example:
stories:
  - story: user asks a factual question
    steps:
      - intent: ask_question
      - action: action_rag_lookup
Here, we’re saying:
“If the user intent is ask_question, the next step in the conversation is to run our RAG retrieval action.”
The more varied but plausible your stories, the smarter Rasa becomes at guessing what to do when a new but similar situation happens.
Rules — Fixed Conversation Logic
File: rules.yml
Rules are if‑this‑then‑that triggers, for when you want absolute determinism.
Unlike stories, rules don’t let Rasa “improvise” — they always fire exactly as defined.
Example:
rules:
  - rule: Handle fallback
    steps:
      - intent: nlu_fallback
      - action: utter_fallback
Meaning: "Every time my NLU confidence is low, trigger utter_fallback."
You use rules for:
- Mandatory flows (always confirm a user’s email before proceeding)
- Hard fallbacks (don’t let the bot guess)
- Compliance scripts (legal disclaimers, medical advice warnings)
What happens when you run rasa shell:
- It launches an interactive CLI where you type messages as a user.
- Rasa processes them through your NLU pipeline, then, if your logic routes to the RAG action, it will:
- Embed your text
- Retrieve the top match from FAISS
- Send that as the bot’s reply.