I'm Kirill Strelnikov, a freelance AI/Django developer in Barcelona. I've built RAG chatbots that automated 70% of customer support for e-commerce clients. This is a practical guide to building a production-ready RAG chatbot with Django, pgvector, and OpenAI — not a toy demo, but the actual architecture I use in client projects.
## What is RAG and Why It Matters
RAG (Retrieval-Augmented Generation) = vector search + LLM. Instead of hoping the LLM "knows" your business data, you:
- Store your documents as vector embeddings
- When a user asks a question, find the most relevant documents
- Feed those documents to the LLM as context
- The LLM generates an answer based on YOUR data
Why RAG beats fine-tuning for business chatbots:
- No retraining when data changes (just re-embed)
- Works with any LLM (swap GPT-4 for Claude without rebuilding)
- Answers are grounded in real documents (reduces hallucination)
- You can show sources ("Based on: Return Policy, Section 3")
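The retrieval half of this pipeline boils down to one operation: ranking documents by the similarity between their embedding vectors and the query's. Here's a dependency-free sketch of that ranking with made-up 3-dimensional "embeddings" (real ones from `text-embedding-3-small` have 1,536 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for three documents
docs = {
    "return_policy": [0.9, 0.1, 0.0],
    "shipping_info": [0.1, 0.9, 0.1],
    "product_specs": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend this embeds "how do I return an item?"

ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                reverse=True)
print(ranked[0])  # return_policy ranks first
```

pgvector does exactly this ranking in SQL, over indexed columns, which is why no separate vector database is needed.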
## Step 1: Set Up pgvector in Django
pgvector is a PostgreSQL extension for vector similarity search. No separate vector database needed — your embeddings live alongside your regular data.
```bash
# Install pgvector on PostgreSQL
# Ubuntu/Debian:
sudo apt install postgresql-16-pgvector
# Or via Docker:
# Use image: pgvector/pgvector:pg16

# The pgvector Python package ships the Django integration (pgvector.django)
pip install pgvector openai
```
```python
# models.py
from django.db import models
from pgvector.django import VectorField


class Document(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    source = models.CharField(max_length=255)  # "faq", "product", "policy"
    embedding = VectorField(dimensions=1536)  # matches text-embedding-3-small
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            models.Index(fields=["source"]),
        ]

    def __str__(self):
        return self.title
```
```python
# migration: enable the pgvector extension
# Note: the extension must exist before the migration that creates the
# VectorField runs, or PostgreSQL will reject the "vector" column type.
# If 0001_initial creates Document, run this RunSQL first (e.g. inside
# 0001_initial, before the CreateModel operation).
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [("chatbot", "0001_initial")]

    operations = [
        migrations.RunSQL(
            "CREATE EXTENSION IF NOT EXISTS vector;",
            reverse_sql="DROP EXTENSION IF EXISTS vector;",
        ),
    ]
```
## Step 2: Embed Your Documents
```python
# embeddings.py
from openai import OpenAI

from .models import Document

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_embedding(text: str) -> list[float]:
    """Get the embedding vector for a text string."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks by word count (approximates tokens)."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk.strip():
            chunks.append(chunk)
    return chunks


def embed_document(title: str, content: str, source: str) -> int:
    """Chunk a document and store each chunk with its embedding."""
    chunks = chunk_text(content)
    documents = []
    for i, chunk in enumerate(chunks):
        documents.append(
            Document(
                title=f"{title} (part {i + 1})",
                content=chunk,
                source=source,
                embedding=get_embedding(chunk),
            )
        )
    Document.objects.bulk_create(documents)
    return len(documents)
```
Chunking matters. I use 300-token chunks with 50-token overlap. Too small = lost context. Too large = diluted relevance. This size works well for FAQ and product data. For longer documents (legal, technical docs), I increase to 500 tokens.
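To see what the overlap buys you, here's `chunk_text` from above run on a toy input with small numbers (`chunk_size=10`, `overlap=3`): each chunk repeats the last three words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Same function as above, repeated here so the demo is self-contained."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk.strip():
            chunks.append(chunk)
    return chunks

text = " ".join(f"w{i}" for i in range(20))  # "w0 w1 ... w19"
chunks = chunk_text(text, chunk_size=10, overlap=3)
# Chunk 1: w0..w9, chunk 2: w7..w16, chunk 3: w14..w19
for c in chunks:
    print(c)
```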
## Step 3: Vector Search
```python
# search.py
from pgvector.django import CosineDistance

from .embeddings import get_embedding
from .models import Document


def search_documents(query: str, top_k: int = 5, source: str | None = None):
    """Find the most relevant document chunks for a query."""
    query_embedding = get_embedding(query)
    qs = Document.objects.annotate(
        distance=CosineDistance("embedding", query_embedding)
    )
    if source:
        qs = qs.filter(source=source)
    return qs.order_by("distance")[:top_k]
```
pgvector's cosine distance search is fast enough for most business chatbots (sub-100ms for 100K documents). For larger datasets, add an IVFFlat or HNSW index:
```python
# migration: HNSW index (faster approximate search for large datasets)
migrations.RunSQL(
    "CREATE INDEX chatbot_document_embedding_hnsw ON chatbot_document "
    "USING hnsw (embedding vector_cosine_ops);",
    reverse_sql="DROP INDEX IF EXISTS chatbot_document_embedding_hnsw;",
)
```
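IVFFlat is the other index type pgvector supports: it builds faster and uses less memory than HNSW, at some cost in recall, and takes a `lists` parameter (pgvector's docs suggest roughly `rows / 1000` as a starting point). A sketch of the equivalent migration, assuming the same table name:

```python
# migration: IVFFlat alternative. Build it AFTER the table has data,
# because the index clusters the existing rows into `lists` partitions.
migrations.RunSQL(
    "CREATE INDEX chatbot_document_embedding_ivfflat ON chatbot_document "
    "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);",
    reverse_sql="DROP INDEX IF EXISTS chatbot_document_embedding_ivfflat;",
)
```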
## Step 4: Generate Answers
```python
# chatbot.py
from openai import OpenAI

from .search import search_documents

client = OpenAI()

SYSTEM_PROMPT = """You are a helpful customer support assistant.
Answer questions using ONLY the provided context.
If the context doesn't contain the answer, say "I don't have information about that. Let me connect you with a human agent."
Always be concise and helpful. Cite the source document when relevant."""


def get_chatbot_response(user_message: str, conversation_history: list | None = None):
    """Generate a chatbot response using RAG."""
    # 1. Retrieve relevant document chunks (evaluate the queryset once)
    relevant_docs = list(search_documents(user_message, top_k=5))

    # 2. Build the context string
    context_parts = []
    sources = []
    for doc in relevant_docs:
        context_parts.append(f"[{doc.title}]: {doc.content}")
        sources.append(doc.title)
    context = "\n\n".join(context_parts)

    # 3. Build the message list
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT + f"\n\nContext:\n{context}"}
    ]
    # Add conversation history for multi-turn
    if conversation_history:
        messages.extend(conversation_history[-6:])  # last 3 exchanges
    messages.append({"role": "user", "content": user_message})

    # 4. Generate the response
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        temperature=0.3,  # low temperature = more factual
        max_tokens=500,
    )
    answer = response.choices[0].message.content

    # 5. Confidence check: cosine distance of the best match
    min_distance = relevant_docs[0].distance if relevant_docs else 1.0
    needs_escalation = min_distance > 0.4  # threshold tuned per project

    return {
        "answer": answer,
        "sources": sources,
        "needs_escalation": needs_escalation,
        "confidence": round(1 - min_distance, 2),
    }
```
## Step 5: Django REST API
```python
# views.py
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .chatbot import get_chatbot_response
# get_session_history, save_to_history and notify_agent are app-specific
# helpers (session storage and agent notification) not shown here.


@api_view(["POST"])
def chat(request):
    message = request.data.get("message", "").strip()
    session_id = request.data.get("session_id")
    if not message:
        return Response({"error": "Message required"}, status=400)

    # Get conversation history from the session
    history = get_session_history(session_id)

    # Generate the response
    result = get_chatbot_response(message, history)

    # Save to history
    save_to_history(session_id, message, result["answer"])

    # If confidence is low, notify a human agent
    if result["needs_escalation"]:
        notify_agent(session_id, message, result)

    return Response({
        "answer": result["answer"],
        "sources": result["sources"],
        "confidence": result["confidence"],
    })
```
## Step 6: Keep Embeddings Fresh
```python
# tasks.py (Celery)
from celery import shared_task

from .embeddings import embed_document
from .models import Document


@shared_task
def refresh_product_embeddings():
    """Re-embed products that changed since the last sync."""
    from shop.models import Product

    # last_sync_time() is an app-specific helper (e.g. stored in cache/DB)
    for product in Product.objects.filter(updated_at__gte=last_sync_time()):
        content = f"{product.name}. {product.description}. Price: {product.price} EUR."

        # Delete the old chunks for this product
        Document.objects.filter(
            source="product",
            title__startswith=product.name,
        ).delete()

        # Re-chunk and re-embed
        embed_document(product.name, content, source="product")
```
Schedule this with Celery Beat to run hourly or on product updates.
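Assuming a standard Celery Beat setup, the hourly schedule is one entry in settings. The task path below matches the app layout used in this article and is an assumption about your project structure; `crontab` comes from Celery itself:

```python
# settings.py -- run the refresh at minute 0 of every hour
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "refresh-product-embeddings": {
        "task": "chatbot.tasks.refresh_product_embeddings",
        "schedule": crontab(minute=0),
    },
}
```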
## Production Results
From my e-commerce client project:
| Metric | Value |
|---|---|
| Documents embedded | ~2,000 chunks |
| Avg search latency | 45ms |
| Answer accuracy | ~92% (human-evaluated) |
| Support automation rate | 70% |
| Conversion rate increase | +35% |
| Monthly API cost | EUR 50-80 |
The 70% automation rate means 7 out of 10 customer questions are answered correctly without human intervention. The remaining 30% get escalated with full context, so the human agent can resolve them faster too.
## Common Pitfalls
**Embedding stale data.** If your product catalog changes, your chatbot's answers are wrong. Automate re-embedding.
**No confidence threshold.** Without escalation logic, the chatbot will confidently hallucinate. Always add a distance threshold.
**Ignoring conversation context.** A user asking "what about the blue one?" after asking about a dress needs multi-turn context. Pass conversation history.
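The history window from step 4 (`conversation_history[-6:]`) is what makes that follow-up resolvable: the model sees the dress question alongside "what about the blue one?". A minimal sketch of the rolling window:

```python
def trim_history(history: list[dict], max_messages: int = 6) -> list[dict]:
    """Keep only the last N messages, i.e. the last N/2 exchanges."""
    return history[-max_messages:]

history = []
for turn in range(5):  # 5 exchanges = 10 messages
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

window = trim_history(history)
print(len(window))           # 6
print(window[0]["content"])  # question 2
```

A fixed window keeps token costs bounded; for very long sessions you'd summarize older turns instead of dropping them.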
**Using a separate vector DB for small datasets.** pgvector handles 100K+ documents easily. You don't need Pinecone or Weaviate until you hit millions.
## Cost to Build
| Scope | Timeline | Cost |
|---|---|---|
| Basic RAG chatbot (FAQ only) | 1-2 weeks | EUR 800-1,500 |
| RAG + product catalog + CRM | 2-4 weeks | EUR 1,500-3,000 |
| Multi-channel (web + Telegram + WhatsApp) | 4-6 weeks | EUR 3,000-5,000 |
Detailed pricing: AI Chatbot Development Cost
I'm Kirill Strelnikov — I build production RAG chatbots, SaaS platforms, and Telegram bots as a freelance developer in Barcelona, Spain. 15+ projects delivered, EU-based, GDPR-compliant.
- Website: kirweb.site
- Telegram: @KirBcn
- More AI case studies: kirweb.site/ai-chatbots