If you've been watching the local-AI space lately, you've probably seen OpenClaw land 100k GitHub stars on the back of autonomous agents that build their own tools, their own social networks, and — if you're not careful — their own threat models.
AutoBot takes a different approach: you stay in control. Your data never leaves your machine. Your AI runs on your hardware. And the knowledge base — the thing that makes your local AI actually useful — is something you can read, extend, and contribute to.
This post is for Python developers who want to understand exactly how that knowledge base works, how to feed it your own codebase, and where to plug in if you want to help build it.
## The Stack at a Glance
AutoBot's RAG pipeline is built on three components:
| Layer | Technology | Role |
|---|---|---|
| Embedding model | Ollama (configurable) | Text → vectors |
| Vector store | ChromaDB | Similarity search |
| Retrieval + generation | LlamaIndex | Query → answer |
All of it runs locally. No API calls. No data leaving your machine.
The main module lives at `autobot-backend/knowledge/`. The legacy `knowledge_base.py` at the backend root is a thin re-export shim — all real logic is in the `knowledge/` package.
## End-to-End Pipeline Walk-Through

### 1. Document Ingestion

Entry point: `knowledge/documents.py` — `DocumentsMixin.add_document()`
```python
# knowledge/documents.py
async def add_document(
    self,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
    doc_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Add a document to the knowledge base with async processing."""
```
"""Add a document to the knowledge base with async processing."""
When you drop a file into AutoBot, this is what happens:
- Content arrives — plain text, Markdown, or PDF.
- Chunking — the document is split into overlapping chunks so context is preserved at retrieval time (see the sketch after this list).
- Embedding — each chunk is converted to a 768-dimensional float vector by the configured Ollama model.
- Storage — vectors + original text land in ChromaDB, keyed by a stable document ID.
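To make the chunking step concrete, here is a minimal sketch of the overlapping-window idea. The function name and sizes are illustrative, not AutoBot's actual implementation:

```python
# Illustrative only: AutoBot's real chunker lives in the knowledge/
# package, and its chunk size and overlap are configurable.
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    step = size - overlap  # each chunk re-covers the last `overlap` chars of the previous one
    return [text[start:start + size] for start in range(0, len(text), step)]
```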
The embedding call goes through `knowledge/embedding_cache.py` (`EmbeddingCache`), which deduplicates repeated content and tracks usage via `api/analytics_embedding_patterns.py` (Issue #285). Cache hits skip the Ollama round-trip entirely — useful when you re-index after editing a doc.
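The caching idea reduces to keying on a hash of the chunk text and only calling the model on a miss. A rough sketch, not the actual `EmbeddingCache` API:

```python
import hashlib

class NaiveEmbeddingCache:
    """Illustrative stand-in for knowledge/embedding_cache.py."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn                  # e.g. a call into Ollama
        self._store: dict[str, list[float]] = {}

    def get_embedding(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:                 # miss: one Ollama round-trip
            self._store[key] = self._embed_fn(text)
        return self._store[key]                    # hit: no model call
```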
### 2. Index Configuration

Entry point: `knowledge/index.py` — `IndexMixin`
ChromaDB uses HNSW (Hierarchical Navigable Small World) for approximate nearest-neighbour search. AutoBot exposes the tuning parameters directly:
```python
# knowledge/index.py
def _get_hnsw_metadata(self) -> Dict[str, Any]:
    return {
        "hnsw:space": self.hnsw_space,  # distance metric (cosine by default)
        "hnsw:construction_ef": self.hnsw_construction_ef,
        "hnsw:search_ef": self.hnsw_search_ef,
        "hnsw:M": self.hnsw_m,
    }
```
The current defaults are tuned for collections with 545k+ vectors (Issue #72). If you're running on modest hardware with a small KB, you can lower `hnsw:M` to reduce memory pressure.
All ChromaDB calls are wrapped with `asyncio.to_thread()` (Issue #369) to keep the FastAPI event loop unblocked — something to be aware of if you're adding new index operations.
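If you do add one, the pattern looks roughly like this (the collection object and query vector are placeholders):

```python
import asyncio

async def query_collection(collection, query_vector: list[float], top_k: int = 5):
    # ChromaDB's client is synchronous; running it in a worker thread
    # keeps the FastAPI event loop free to handle other requests.
    return await asyncio.to_thread(
        collection.query,
        query_embeddings=[query_vector],
        n_results=top_k,
    )
```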
### 3. Query → Answer

Entry point: `knowledge/base.py` — `KnowledgeBaseCore`
On query:
- The question is embedded with the same Ollama model used at ingestion (same vector space = valid similarity).
- HNSW search finds the top-k most similar chunks.
- The chunks are passed to LlamaIndex as context alongside the query.
- LlamaIndex sends the augmented prompt to the local Ollama LLM.
- The answer references your documents, not generic training data.
```python
# knowledge/base.py — core wiring
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore
```
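Continuing from those imports, the wiring looks roughly like this. The model names, collection name, and path are assumptions for illustration, not AutoBot's exact configuration:

```python
import chromadb

# Local models via Ollama (both model names are illustrative).
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3", request_timeout=120.0)

# Point LlamaIndex at the existing ChromaDB collection.
client = chromadb.PersistentClient(path="./data/chromadb")
collection = client.get_or_create_collection("autobot_knowledge")
vector_store = ChromaVectorStore(chroma_collection=collection)

index = VectorStoreIndex.from_vector_store(vector_store)
answer = index.as_query_engine(similarity_top_k=5).query(
    "How does document ingestion work?"
)
```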
### 4. Advanced Retrieval

Entry point: `autobot-backend/advanced_rag_optimizer.py`
For complex queries, AutoBot can upgrade from plain vector search to a hybrid pipeline:
- Hybrid scoring — blends semantic similarity (HNSW cosine) with BM25 keyword score via `knowledge/search_components/reranking.py`.
- Query expansion — reformulates the question to improve recall on technical vocabulary mismatches.
- MAP-Elites diversification — ensures results span multiple knowledge categories rather than returning near-duplicate chunks.
- GPU acceleration — `utils/semantic_chunker_gpu.py` uses RTX 4070 / OpenVINO where available.
The `SearchResult` dataclass in `advanced_rag_optimizer.py` carries both the raw content and all four score dimensions (`semantic_score`, `keyword_score`, `hybrid_score`, `rerank_score`) — useful if you want to instrument retrieval quality.
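Conceptually, the hybrid blend is a weighted sum over those dimensions. A hedged sketch; the weight and the normalization assumption are illustrative, not the actual `compute_blended_score()` logic:

```python
def blend_scores(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    # Assumes both inputs are already normalized to [0, 1];
    # raw BM25 scores would need min-max scaling first.
    return alpha * semantic + (1 - alpha) * keyword
```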
### 5. Background Vectorization

Entry point: `autobot-backend/background_vectorization.py` — `BackgroundVectorizer`
When you add new facts or documents while AutoBot is running, BackgroundVectorizer picks them up asynchronously via FastAPI background tasks. You don't have to trigger a full re-index — the KB stays live.
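On the FastAPI side, that pattern is small. A sketch in which the endpoint path and the two helpers are hypothetical:

```python
from fastapi import APIRouter, BackgroundTasks

router = APIRouter()

@router.post("/knowledge/documents")
async def add_document_endpoint(payload: dict, background_tasks: BackgroundTasks):
    doc_id = store_raw_document(payload)                   # hypothetical helper
    # Runs after the response is sent, so ingestion never blocks the request.
    background_tasks.add_task(vectorize_document, doc_id)  # hypothetical helper
    return {"doc_id": doc_id, "status": "queued"}
```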
## Feeding Your Codebase to the Knowledge Base
AutoBot has a dedicated `CodeEmbeddingGenerator` (`autobot-backend/code_embedding_generator.py`) that uses CodeBERT instead of a generic text embedding model. Code has different semantics than prose — function names, types, and structure matter — and CodeBERT is trained on code.
```python
# code_embedding_generator.py
@dataclass
class CodeEmbeddingResult:
    embedding: np.ndarray       # 768-dim CodeBERT vector
    device_used: str            # 'npu', 'cuda', or 'cpu'
    processing_time_ms: float
    model_name: str
    cache_hit: bool
```
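To see what producing one of those vectors looks like outside AutoBot, here is a bare-bones version using Hugging Face transformers. Mean pooling is one common choice; AutoBot's generator additionally handles device selection and caching:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed_code(snippet: str) -> torch.Tensor:
    inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool to a 768-dim vector

vec = embed_code("def add(a, b):\n    return a + b")
```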
To index your codebase:
### Option 1 — Via the chat UI
```
You: Index the ./src directory into the knowledge base
AutoBot: ✓ Scanning ./src...
         Indexed 847 functions across 63 files
         Embedding device: NPU (OpenVINO)
         Ready for semantic code search
```
### Option 2 — Via the connector system

The `knowledge/connectors/` directory has a registry (`registry.py`) and a scheduler (`scheduler.py`). You can register a file-server connector pointing at your repo root and let AutoBot watch for changes:
```python
# knowledge/connectors/file_server.py
# Register your source directory as a watched connector
connector = FileServerConnector(
    root_path="/path/to/your/repo",
    watch=True,
    file_extensions=[".py", ".md", ".yaml"],
)
```
### Option 3 — Notion, web, database

Connectors also exist for Notion (`notion.py`), web crawl (`web_crawler.py`), audio (`audio_connector.py`), and database (`database.py`). The base class is `knowledge/connectors/base.py` — implement `fetch()` and register via `registry.py`.
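A new connector is mostly a `fetch()` implementation. A hypothetical skeleton; check `base.py` for the actual `BaseConnector` signature before building on this:

```python
from knowledge.connectors.base import BaseConnector  # actual signature may differ

class GitHubIssuesConnector(BaseConnector):
    """Hypothetical example: pull a repo's issues into the KB."""

    def __init__(self, repo: str, token: str):
        self.repo = repo
        self.token = token

    async def fetch(self) -> list[dict]:
        # Call the GitHub API, then return documents shaped for ingestion,
        # e.g. {"content": issue_body, "metadata": {"source": self.repo}}.
        ...
```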
## Where to Plug In: Contributing to the KB Engine
Here are the cleanest entry points for first contributions:
### `knowledge/documents.py` — `DocumentsMixin`

**Good for:** adding new file format support (EPUB, HTML, DOCX), improving chunking strategy.

The `add_document()` and related methods are well-isolated. A chunking improvement here applies to every ingestion path.
### `knowledge/connectors/` — Connector Registry

**Good for:** adding new data sources (GitHub issues, Jira, Slack export).

Implement the `BaseConnector` interface and register in `registry.py`. Look at `notion.py` for a reference implementation with authentication handling.
### `advanced_rag_optimizer.py` — Hybrid Search

**Good for:** retrieval quality improvements, new reranking strategies, better query expansion.

The `SearchResult` + `QueryContext` dataclasses are clean — adding a new scoring dimension means extending the dataclass and wiring it into `compute_blended_score()` in `knowledge/search_components/reranking.py`.
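For instance, a recency dimension would look something like this (the field name and default are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    content: str
    semantic_score: float
    keyword_score: float
    hybrid_score: float
    rerank_score: float
    recency_score: float = 0.0  # hypothetical new dimension; fold it into
                                # compute_blended_score() to take effect
```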
### `knowledge/index.py` — HNSW Tuning

**Good for:** performance work on large vector collections, memory footprint reduction.
The HNSW parameter exposure is deliberately simple. There's room for adaptive tuning based on collection size and hardware profile.
### `background_vectorization.py` — `BackgroundVectorizer`

**Good for:** incremental sync improvements, smarter deduplication, conflict resolution when a connector and a manual upload touch the same document.
## Running the KB Locally
```bash
# Clone and start the full stack
git clone https://github.com/mrveiss/AutoBot-AI
cd AutoBot-AI
docker compose up -d

# Or use the installer script
curl -fsSL https://raw.githubusercontent.com/mrveiss/AutoBot-AI/Dev_new_gui/install.sh | bash
```
The knowledge base stores vectors in `./data/chromadb/` by default. It persists across container restarts.
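You can sanity-check the store from a Python shell (assuming the default path; the collection names are whatever AutoBot created):

```python
import chromadb

client = chromadb.PersistentClient(path="./data/chromadb")
# Depending on your chromadb version, this returns Collection objects
# or plain collection names.
print(client.list_collections())
```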
To run just the backend in dev mode:
```bash
cd autobot-backend
pip install -r requirements.txt
uvicorn app_factory:create_app --factory --reload --port 8000
```
## Where to Go Next
If you want to contribute to the Python side:
- Good first issues (Python label): github.com/mrveiss/AutoBot-AI/labels/python
- All good first issues: github.com/mrveiss/AutoBot-AI/labels/good%20first%20issue
- Contributing guide: CONTRIBUTING.md
- GitHub Discussions: github.com/mrveiss/AutoBot-AI/discussions
If this article saved you an hour of reading source code, you can buy me a coffee on Ko-fi — it goes directly toward hardware time for the project.
AutoBot is free, open source, and runs entirely on your hardware. The RAG pipeline is the core of what makes a local AI assistant actually useful — and it's a great place to dig in.
Your data. Your AI.