We Built a Production-Ready Auto-Reply Chatbot (FastAPI + OpenAI + Hybrid Retrieval)
Most "chatbot tutorials" stop at:
- A single app.py with ~50 lines of OpenAI calls
- No logging
- No retrieval
- No evaluation
- No production thinking
That's not how real systems work.
So we built a production-style auto-reply chatbot using:
- FastAPI
- OpenAI Chat Completions
- OpenAI Embeddings
- Hybrid retrieval (vector + keyword ready)
- Clean service architecture
- Separation of LLM / Retrieval / API layers
Full open-source repo: auto-reply-chatbot (FastAPI + OpenAI + Retrieval)
If you find it useful, consider starring the repo ⭐
What Problem This Solves
If you're building:
- Customer support auto-reply
- Ticket answering system
- Live chat AI
- Internal knowledge assistant
- RAG-based chatbot
You don't need another toy example.
You need:
- Structured backend
- Clear LLM gateway
- Retrieval service
- Embedding pipeline
- Production-ready folder layout
That's what this project demonstrates.
Architecture Overview
High-level flow:
API (FastAPI)
↓
AnswerService
↓
RetrievalService → Embeddings → Vector Search
↓
LLM Gateway → OpenAI Chat Completion
↓
Final Answer
This separation makes it:
- Testable
- Replaceable (swap LLM provider easily)
- Scalable
- Production-friendly
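The layering above can be sketched in a few classes. This is an illustrative outline, not the repo's actual code: the class names mirror the architecture diagram, and both the retrieval step and the LLM call are stubbed so the sketch stays self-contained.

```python
# Hypothetical sketch of the layered flow: API → AnswerService →
# RetrievalService → LLM Gateway. Stubs stand in for real embedding
# search and the real OpenAI call.
from dataclasses import dataclass


@dataclass
class RetrievalService:
    docs: list[str]

    def search(self, query: str, k: int = 3) -> list[str]:
        # Stub: the real implementation embeds the query and runs vector search.
        words = query.lower().split()
        return [d for d in self.docs if any(w in d.lower() for w in words)][:k]


@dataclass
class LLMGateway:
    model: str = "gpt-4o-mini"

    def chat(self, system: str, user: str) -> str:
        # Stub: the real implementation calls the provider behind one interface.
        return f"[{self.model}] answer grounded in: {system[:40]}..."


@dataclass
class AnswerService:
    retrieval: RetrievalService
    llm: LLMGateway

    def answer(self, question: str) -> str:
        # Orchestrate: fetch evidence, then ask the LLM with that context.
        evidence = self.retrieval.search(question)
        context = "\n".join(evidence)
        return self.llm.chat(system=f"Use this context:\n{context}", user=question)
```

Because `AnswerService` only depends on the interfaces of its two collaborators, each layer can be unit-tested or swapped independently.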
Project Structure
app/
├── api/
│   └── routes/
│       └── conversations.py
├── services/
│   ├── answer_service.py
│   ├── retrieval.py
│   ├── ingestion.py
│   └── llm_gateway.py
├── search/
│   └── embeddings.py
└── main.py
Why does this matter?
Most examples mix everything in one file.
This project separates:
- API layer
- Business logic
- Retrieval logic
- LLM provider abstraction
- Embedding layer
That's how real systems are built.
LLM Layer (Gateway Pattern)
Instead of calling OpenAI directly everywhere:
openai.chat.completions.create(...)
We wrap it in:
llm_gateway.chat(...)
Why?
Because:
- You may change models
- You may change providers
- You may add logging
- You may add retry policies
- You may measure token cost
This pattern prevents vendor lock-in chaos.
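A minimal version of that gateway might look like this. The retry policy, logging fields, and `_call_provider` helper are assumptions for illustration; the real `llm_gateway.chat(...)` in the repo may differ.

```python
# Illustrative gateway wrapper: one entry point for model choice,
# retries, and latency logging. `_call_provider` is a stand-in so the
# sketch runs without a live OpenAI client.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_gateway")


def _call_provider(model: str, messages: list[dict]) -> str:
    # Swap in the real call here, e.g.:
    # openai.chat.completions.create(model=model, messages=messages)
    return "stub response"


def chat(messages: list[dict], model: str = "gpt-4o-mini",
         max_retries: int = 3, backoff: float = 0.5) -> str:
    """Single choke point for every LLM call the app makes."""
    for attempt in range(1, max_retries + 1):
        try:
            start = time.perf_counter()
            reply = _call_provider(model, messages)
            log.info("model=%s latency=%.3fs", model, time.perf_counter() - start)
            return reply
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff * 2 ** (attempt - 1))  # exponential backoff
```

Changing providers, adding token-cost tracking, or tightening the retry policy now means editing one function instead of every call site.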
Retrieval + Embeddings
The system uses:
- text-embedding-3-small embeddings
- A vector search flow
- A document ingestion pipeline
Two flows exist:
| Flow | Description |
|---|---|
| Ingestion | Document → Chunk → Embed → Store |
| Retrieval | User Query → Embed → Vector Search → Evidence → LLM |
This creates a clean RAG-ready foundation.
Even if you're not using a full vector DB yet, the structure is ready for:
- pgvector
- Weaviate
- Pinecone
- Milvus
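The two flows from the table can be sketched in-memory before any vector DB is wired in. Everything here is illustrative: `fake_embed` is a deterministic stand-in for text-embedding-3-small (its similarities carry no semantic meaning), and a real deployment would delegate `retrieve` to pgvector, Weaviate, Pinecone, or Milvus.

```python
# In-memory sketch of Ingestion (Document → Chunk → Embed → Store) and
# Retrieval (Query → Embed → Vector Search → Evidence). fake_embed is a
# placeholder; swap in the real embeddings API for production.
import hashlib
import math


def fake_embed(text: str, dim: int = 8) -> list[float]:
    # Deterministic pseudo-embedding for illustration only.
    digest = hashlib.sha256(text.lower().encode()).digest()
    return [b / 255 for b in digest[:dim]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def ingest(doc: str, store: list, chunk_size: int = 50) -> None:
    # Document → Chunk → Embed → Store
    chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]
    store.extend((chunk, fake_embed(chunk)) for chunk in chunks)


def retrieve(query: str, store: list, k: int = 2) -> list[str]:
    # Query → Embed → Vector Search → Evidence
    qv = fake_embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Swapping the list-of-tuples `store` for a real vector index changes only `ingest` and `retrieve`, which is exactly the point of keeping the flows behind their own functions.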
Why This Repo Is Different
Most repos show:
| ❌ Typical repos | ✅ This repo |
|---|---|
| "Hello world" chatbot | Clear service boundaries |
| No architecture | Retrieval-first mindset |
| No layering | LLM abstraction |
| No production thinking | Ready for RAG |
| | FastAPI production pattern |
🛠 Use Cases
You can extend this into:
- SaaS auto-reply platform
- AI support desk
- AI ticket triage
- Enterprise RAG assistant
- Multi-tenant AI backend
It's a backend-first design — you can plug any frontend later.
🧪 What You Can Experiment With
- Swap GPT-4o → GPT-4o-mini
- Add hybrid retrieval (BM25 + vector)
- Add eval loop
- Add grounding verification
- Add cost tracking
- Add retry logic and latency control
This repo gives you the skeleton.
You build the muscle.
🚀 Why We Open-Sourced This
Because most AI tutorials skip the hard parts:
- Architecture
- Reliability
- Separation of concerns
- Scaling thinking
If you're serious about building AI systems — not just demos — this repo will help.
⭐ GitHub Repository
👉 https://github.com/OptyxStack/rag-knowledge-base-chatbot
If this project helps you:
- ⭐ Star the repo
- 🍴 Fork it
- 🛠 Contribute improvements
- 🔁 Share it
💡 Future Improvements Planned
- Hybrid retrieval implementation
- Evaluation pipeline
- Cost monitoring
- Latency optimization
- Tool-calling support
- Multi-tenant design