You want to build a RAG pipeline with Haystack. The first thing the tutorial tells you is to spin up a Docker container for your vector store.
What if you could skip that entirely?
VelesDB now ships a first-party Haystack 2.x DocumentStore, contributed by @CrepuscularIRIS in PR #672. Two packages, one pip install, zero infrastructure, and your Haystack pipeline has a vector backend that runs in-process.
pip install haystack-ai haystack-velesdb
That is the entire setup. No Docker. No server. No config file.
What the integration provides
The `haystack-velesdb` package gives you a `VelesDBDocumentStore` that implements the full Haystack DocumentStore protocol:
- `write_documents()` with duplicate policies (`SKIP`, `FAIL`, `OVERWRITE`)
- `filter_documents()` with Haystack's filter syntax
- `embedding_retrieval()` for vector similarity search
- `count_documents()` and `delete_documents()`
- `to_dict()` / `from_dict()` for pipeline serialization

It translates Haystack's filter operators (`==`, `!=`, `>`, `<`, `in`, `not in`, `AND`, `OR`, `NOT`) into VelesDB's native filter format automatically.
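For example, a compound filter in Haystack's standard syntax passes straight through. A quick sketch, using the `store` instance created in Step 1 below (field values are illustrative):

# Standard Haystack 2.x filter syntax; the store translates the
# operators into VelesDB's native filter format internally.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.topic", "operator": "==", "value": "rag"},
        {"field": "meta.source", "operator": "in", "value": ["blog", "docs"]},
    ],
}
matching = store.filter_documents(filters=filters)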
Step 1: Create a document store
from haystack_velesdb import VelesDBDocumentStore
store = VelesDBDocumentStore(
path="./my_knowledge_base",
collection_name="documents",
embedding_dim=384,
metric="cosine",
)
| Parameter | Default | What it does |
|---|---|---|
| `path` | `"./velesdb_haystack"` | Where data lives on disk |
| `collection_name` | `"haystack_documents"` | Collection identifier |
| `embedding_dim` | `768` | Must match your embedding model |
| `metric` | `"cosine"` | Also supports `euclidean` and `dot` |
The database is created lazily on first use. No connection string, no authentication.
Step 2: Index documents
from haystack.dataclasses import Document
documents = [
Document(
content="Transformers use self-attention to process sequences in parallel.",
meta={"source": "textbook", "topic": "architecture"},
),
Document(
content="HNSW is a graph-based algorithm for approximate nearest neighbor search.",
meta={"source": "paper", "topic": "indexing"},
),
Document(
content="RAG combines retrieval with generation to ground LLM responses in facts.",
meta={"source": "blog", "topic": "rag"},
),
Document(
content="Vector databases store high-dimensional embeddings for similarity search.",
meta={"source": "docs", "topic": "databases"},
),
Document(
content="Local-first software works offline and syncs when connectivity returns.",
meta={"source": "blog", "topic": "architecture"},
),
]
Now embed and write them. Haystack handles the embedding step as a pipeline component:
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
index_pipeline = Pipeline()
index_pipeline.add_component(
"embedder",
SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
index_pipeline.add_component("writer", DocumentWriter(document_store=store))
index_pipeline.connect("embedder", "writer")
index_pipeline.run({"embedder": {"documents": documents}})
print(f"Indexed {store.count_documents()} documents")
Indexed 5 documents
Step 3: Build a retrieval pipeline
Haystack's built-in retrievers are bound to `InMemoryDocumentStore`. For any custom store, you need a thin wrapper component. This is the canonical Haystack 2.x pattern:
from typing import List

from haystack import component
from haystack.dataclasses import Document
@component
class VelesRetriever:
def __init__(self, document_store: VelesDBDocumentStore, top_k: int = 5):
self._store = document_store
self._top_k = top_k
@component.output_types(documents=List[Document])
def run(self, query_embedding: List[float]) -> dict:
docs = self._store.embedding_retrieval(
query_embedding, top_k=self._top_k
)
return {"documents": docs}
Now wire it into a query pipeline:
from haystack.components.embedders import SentenceTransformersTextEmbedder
query_pipeline = Pipeline()
query_pipeline.add_component(
"embedder",
SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
query_pipeline.add_component("retriever", VelesRetriever(store, top_k=3))
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
Step 4: Search
result = query_pipeline.run({"embedder": {"text": "How does similarity search work?"}})
for doc in result["retriever"]["documents"]:
print(f"[{doc.score:.3f}] {doc.content}")
[0.911] Vector databases store high-dimensional embeddings for similarity search.
[0.856] HNSW is a graph-based algorithm for approximate nearest neighbor search.
[0.820] Transformers use self-attention to process sequences in parallel.
Scores are cosine similarity normalized to [0, 1] by default (`scale_score=True`).
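Assuming the scaling follows the usual convention for cosine scores (worth confirming against the `haystack-velesdb` source), it is just a linear map:

def scale_cosine(score: float) -> float:
    # Assumed rescaling from cosine's native [-1, 1] range to [0, 1],
    # matching the behavior described for scale_score=True.
    return (score + 1.0) / 2.0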
Handling duplicates
The integration supports all three Haystack duplicate policies:
from haystack.document_stores.types import DuplicatePolicy
from haystack.document_stores.errors import DuplicateDocumentError
# OVERWRITE (default): upsert, last write wins
store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
# SKIP: keep existing, ignore duplicates
store.write_documents(documents, policy=DuplicatePolicy.SKIP)
# FAIL: raise DuplicateDocumentError if any ID exists
try:
store.write_documents(documents, policy=DuplicatePolicy.FAIL)
except DuplicateDocumentError as e:
print(f"Duplicate detected: {e}")
This matters for incremental indexing. If your pipeline re-processes documents, `SKIP` avoids redundant writes while `FAIL` catches unexpected duplicates early.
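In a pipeline, the policy goes on the writer component itself, so re-runs of the indexing pipeline skip documents that are already stored:

from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

# Same indexing pipeline as in Step 2, but re-running it now skips
# documents whose IDs already exist instead of rewriting them.
writer = DocumentWriter(document_store=store, policy=DuplicatePolicy.SKIP)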
Pipeline serialization
Haystack pipelines can be saved to YAML and loaded later. The VelesDB store supports this out of the box:
config = store.to_dict()
print(config["type"])
# haystack_velesdb.document_store.VelesDBDocumentStore
restored = VelesDBDocumentStore.from_dict(config)
This means your pipeline definitions are portable. Save them to version control, share them across environments.
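Whole pipelines serialize the same way. A minimal sketch using Haystack's standard YAML dump/load (the file name is arbitrary):

from haystack import Pipeline

# Write the query pipeline, including the embedded VelesDB store
# configuration, to a YAML file you can commit to git...
with open("query_pipeline.yaml", "w") as f:
    query_pipeline.dump(f)

# ...and reconstruct it later, in another process or environment.
with open("query_pipeline.yaml", "r") as f:
    restored_pipeline = Pipeline.load(f)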
Why combine them?
VelesDB and Haystack solve different halves of the RAG problem. Neither replaces the other.
What VelesDB brings to a Haystack pipeline:
- Persistence without infrastructure. Haystack's default `InMemoryDocumentStore` loses everything when the process exits. VelesDB writes to disk, survives restarts, and needs no running server.
- Hybrid search out of the box. HNSW vector search + BM25 full-text search + Reciprocal Rank Fusion (sketched after this list), all built into the engine. No need to chain a separate `TextSearchRetriever` or add a reranker.
- A graph engine for GraphRAG. VelesDB includes a native graph store (`create_graph_collection`, `traverse_bfs`, `get_outgoing`). No other Haystack-compatible DocumentStore ships with this.
- A 6 MB footprint. Where Qdrant, Milvus, or Weaviate need Docker containers and hundreds of megabytes, VelesDB is a single pip install.
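For reference, Reciprocal Rank Fusion is a simple rank-based merge. A generic sketch of the standard formula, not VelesDB's internal code:

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    # Standard RRF: score(d) = sum over result lists of 1 / (k + rank of d).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Example: fuse a vector ranking with a BM25 ranking for the same query.
fused = reciprocal_rank_fusion([["d2", "d1", "d3"], ["d1", "d3", "d2"]])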
What Haystack brings to VelesDB users:
- Pipeline orchestration. Chain preprocessors, embedders, retrievers, prompt builders, and LLMs into a single graph. Swap any component without rewriting the rest.
- Document preprocessing. `DocumentSplitter`, `DocumentCleaner`, and converters for PDF, HTML, DOCX. VelesDB stores vectors, but getting clean chunks from raw files is Haystack's job.
- Model abstraction. Switch from `all-MiniLM-L6-v2` to `nomic-embed-text` by changing one line. The rest of the pipeline stays the same.
- Serialization and reproducibility. Export a full pipeline to YAML, check it into git, deploy it elsewhere. VelesDB alone has no concept of a pipeline definition.
In short: VelesDB is the storage and retrieval engine. Haystack is the orchestration layer that connects it to everything else. Together, they give you a full local RAG stack with no Docker, no API keys, and no cloud dependency.
What is different from the raw VelesDB API?
If you have used VelesDB directly (`collection.search()`, `collection.upsert()`), the Haystack integration adds:

- String-to-integer ID mapping: Haystack uses string document IDs. VelesDB uses integers. The store handles the SHA-256 mapping transparently (see the sketch after this list).
- Filter translation: Haystack's filter DSL (`{"operator": "==", "field": "meta.topic", "value": "rag"}`) is converted to VelesDB's native filter format.
- Score normalization: Cosine similarity scores are scaled from [-1, 1] to [0, 1] for consistency with other Haystack stores.
- Duplicate detection: Pre-scan checks before writes when using the `FAIL` policy.
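A sketch of what that ID mapping could look like (an illustrative helper, not the integration's actual code):

import hashlib

def string_id_to_int(doc_id: str) -> int:
    # Hypothetical helper: hash the Haystack string ID and take the first
    # 8 bytes as a stable unsigned 64-bit integer key for VelesDB.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big")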
You do not lose anything. The same HNSW index, the same Rust engine, the same sub-millisecond latency. The Haystack layer just makes it composable with the rest of the Haystack ecosystem (preprocessors, generators, routers).
The bigger picture: three frameworks, one engine
VelesDB now has first-party connectors for the three major Python RAG frameworks:
| Framework | Package | Pattern |
|---|---|---|
| Haystack 2.x | `haystack-velesdb` | `DocumentStore` |
| LangChain | `langchain-velesdb` | `VectorStore` |
| LlamaIndex | `llama-index-vector-stores-velesdb` | `VectorStoreIndex` |
All three use the same underlying Rust engine. Pick the framework that matches your team's workflow. The data format is the same, so you can even switch frameworks without re-indexing.
Getting started
pip install haystack-ai haystack-velesdb
- VelesDB on GitHub - VelesDB Core License 1.0 (based on ELv2). The `haystack-velesdb` connector itself is MIT-licensed.
- Haystack integration - README and examples
- All VelesDB integrations (Haystack, LangChain, LlamaIndex)
The project is still young. A star on GitHub helps other developers find it, and we are always looking for partners and contributors. Details on velesdb.com.
Thanks to @CrepuscularIRIS for building the initial Haystack integration.
What is your current RAG stack? Are you running Haystack with a Docker-based vector store, or have you gone the embedded route? Drop a comment below.