
Stop sending your sensitive datasheets to the cloud. Here is how I deployed a private, enterprise-grade RAG system.
As a Senior Automation Engineer, I deal with hundreds of technical documents every month — datasheets, schematics, internal protocols, and legacy codebases.
We all know the power of LLMs like GPT-4. Being able to ask, “What is the maximum voltage for the RS485 module on page 42?” and getting an instant answer is a game-changer.
But there is a problem: Privacy.
I cannot paste proprietary schematics or NDA-protected specs into ChatGPT. The risk of data leakage is simply too high.
So, I set out to build a solution. I wanted a “Second Brain” that was:
100% Offline: No data leaves my local network.
Free to run: No monthly API subscriptions (bye-bye, OpenAI bills).
Dockerized: Easy to deploy without “dependency hell.”
Here is the architecture I built using Llama 3, Ollama, and Docker.
The Architecture: Why this Tech Stack?
Building a RAG (Retrieval-Augmented Generation) system locally used to be a nightmare of Python dependencies and CUDA driver issues. To solve this, I designed a containerized microservices architecture.
The Brain: Ollama + Llama 3
I chose Ollama as the inference engine because it’s lightweight and efficient. For the model, Meta’s Llama 3 (8B) is the current sweet spot — it’s surprisingly capable of reasoning through technical documentation and runs smoothly on consumer GPUs (like an RTX 3060).
The Memory: ChromaDB
For the vector database, I used ChromaDB. It runs locally, requires zero setup, and handles vector retrieval incredibly fast.
The Glue: Python & Streamlit
The backend is written in Python, handling the “Ingestion Pipeline” (a condensed sketch follows this list):
Parsing: Extracting text from PDFs.
Chunking: Breaking text into manageable pieces.
Embedding: Converting text into vectors using the mxbai-embed-large model.
UI: A clean Streamlit interface for chatting with the data.
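To make those steps concrete, here is a minimal sketch of the ingestion pipeline. It assumes the pypdf, chromadb, and ollama Python packages; the hostnames, the collection name, and the chunk size are illustrative (the hostnames match the Compose service names below), and the real pipeline adds error handling and table-aware parsing:

```python
from pathlib import Path

import chromadb
import ollama
from pypdf import PdfReader

# Hostnames match the Compose service names, not localhost.
chroma = chromadb.HttpClient(host="chromadb", port=8000)
llm = ollama.Client(host="http://ollama:11434")
collection = chroma.get_or_create_collection("knowledge_base")

def ingest_pdf(path: Path, chunk_size: int = 1000) -> None:
    # 1. Parsing: extract raw text from every page of the PDF.
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

    # 2. Chunking: naive fixed-size split (the sliding-window variant
    #    discussed later handles boundary-straddling sentences better).
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # 3. Embedding + storage: one vector per chunk via the local Ollama server.
    for idx, chunk in enumerate(chunks):
        vector = llm.embeddings(model="mxbai-embed-large", prompt=chunk)["embedding"]
        collection.add(
            ids=[f"{path.stem}-{idx}"],
            embeddings=[vector],
            documents=[chunk],
            metadatas=[{"source": path.name}],
        )

for pdf in Path("knowledge_base").glob("*.pdf"):
    ingest_pdf(pdf)
```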
How It Works (The “Happy Path”)
The beauty of this system is the Docker implementation. Instead of installing Python libraries manually, the entire system spins up with a single command.
The docker-compose.yml orchestrates the communication between the AI engine, the database, and the UI.
```yaml
# Simplified concept of the setup
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  chromadb:
    image: chromadb/chroma:latest  # vector store; referenced by the backend's depends_on
  backend:
    build: ./app
    depends_on:
      - ollama
      - chromadb
```
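With that file in place, bringing the whole stack up is a single `docker compose up -d --build`: Compose builds the backend image, pulls the Ollama and ChromaDB images, and puts all three services on a shared network so they can reach each other by service name.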
Once running, the workflow is simple:
Drop your PDF files into the knowledge_base folder.
Click “Update Knowledge Base” in the UI.
Start chatting.
The system automatically vectorizes your documents. When you ask a question, it retrieves the most relevant paragraphs and feeds them to Llama 3 as context.
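For reference, here is roughly what that retrieval step looks like. This is a minimal sketch rather than the exact backend: the prompt template and top_k value are illustrative, and it reuses the clients, embedding model, and collection name from the ingestion sketch above:

```python
import chromadb
import ollama

chroma = chromadb.HttpClient(host="chromadb", port=8000)
llm = ollama.Client(host="http://ollama:11434")
collection = chroma.get_or_create_collection("knowledge_base")

def ask(question: str, top_k: int = 4) -> str:
    # Embed the question with the same model used at ingestion time.
    vector = llm.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]

    # Retrieve the most relevant chunks from the vector store.
    hits = collection.query(query_embeddings=[vector], n_results=top_k)
    context = "\n\n".join(hits["documents"][0])

    # Feed the retrieved paragraphs to Llama 3 as grounding context.
    response = llm.chat(
        model="llama3",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]

print(ask("What is the maximum voltage for the RS485 module?"))
```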
The Challenge: It’s Not Just About “Running” the Model
While the concept sounds simple, getting it to production-grade stability took me weeks of debugging.
Here is what most “Hello World” tutorials don’t tell you:
PDF Parsing is messy: Tables in engineering datasheets often break standard parsers.
Context window limits: Llama 3 (8B) only sees about 8K tokens at a time. You need a smart “Sliding Window” strategy for chunking large documents (see the sketch after this list).
Docker Networking: Getting the Python container to talk to the Ollama container on the host GPU requires specific networking configurations.
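As an example of the sliding-window idea: overlapping fixed-size windows, so a sentence cut at one chunk boundary still appears intact in the neighboring chunk. The sizes here are illustrative, and production code would count tokens rather than characters:

```python
def sliding_window_chunks(text: str, window: int = 1000, overlap: int = 200) -> list[str]:
    # Each chunk starts `window - overlap` characters after the previous one,
    # so every boundary region appears in two consecutive chunks.
    step = window - overlap
    return [text[i:i + window] for i in range(0, max(len(text) - overlap, 1), step)]
```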
I spent countless nights fixing connection timeouts, optimizing embedding models, and ensuring the UI doesn’t freeze during large file ingestions.
Want to Build Your Own?
If you are an engineer or developer who wants to own your data, I highly recommend building a local RAG system. It’s a great way to learn about GenAI architecture.
However, if you value your time and want to skip the configuration headaches, I have packaged my entire setup into a ready-to-deploy solution.
It includes:
✅ The Complete Source Code (Python/Streamlit).
✅ Production-Ready Docker Compose file.
✅ Optimized Ingestion Logic for technical docs.
✅ Setup Guide for Windows/Linux.
You can download the full package and view the detailed documentation on my GitHub.
👉 View the Project & Download Source Code on GitHub [https://github.com/PhilYeh1212/Local-AI-Knowledge-Base-Docker-Llama3/blob/main/README.md]
By Phil Yeh, Senior Automation Engineer specializing in Industrial IoT and Local AI solutions.