You want to build a RAG pipeline with Haystack. The first thing the tutorial tells you is to spin up a Docker container for your vector store.
What if you could skip that entirely?
VelesDB now ships a first-party Haystack 2.x DocumentStore, contributed by @CrepuscularIRIS in PR #672. Two packages, one pip install, zero infrastructure, and your Haystack pipeline has a vector backend that runs in-process.
pip install haystack-ai haystack-velesdb
That is the entire setup. No Docker. No server. No config file.
What the integration provides
The `haystack-velesdb` package gives you a `VelesDBDocumentStore` that implements the full Haystack DocumentStore protocol:
- `write_documents()` with duplicate policies (`SKIP`, `FAIL`, `OVERWRITE`)
- `filter_documents()` with Haystack's filter syntax
- `embedding_retrieval()` for vector similarity search
- `count_documents()` and `delete_documents()`
- `to_dict()` / `from_dict()` for pipeline serialization

It translates Haystack's filter operators (`==`, `!=`, `>`, `<`, `in`, `not in`, `AND`, `OR`, `NOT`) into VelesDB's native filter format automatically.
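For example, a compound filter in Haystack's standard syntax passes straight through. A quick sketch, using the `store` instance created in Step 1 below (field values are illustrative):

# Standard Haystack 2.x filter syntax; the store translates the
# operators into VelesDB's native filter format internally.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.topic", "operator": "==", "value": "rag"},
        {"field": "meta.source", "operator": "in", "value": ["blog", "docs"]},
    ],
}
matching = store.filter_documents(filters=filters)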
Step 1: Create a document store
from haystack_velesdb import VelesDBDocumentStore
store = VelesDBDocumentStore(
path="./my_knowledge_base",
collection_name="documents",
embedding_dim=384,
metric="cosine",
)
| Parameter | Default | What it does |
|---|---|---|
| `path` | `"./velesdb_haystack"` | Where data lives on disk |
| `collection_name` | `"haystack_documents"` | Collection identifier |
| `embedding_dim` | `768` | Must match your embedding model |
| `metric` | `"cosine"` | Also supports `euclidean` and `dot` |
The database is created lazily on first use. No connection string, no authentication.
Step 2: Index documents
from haystack.dataclasses import Document
documents = [
Document(
content="Transformers use self-attention to process sequences in parallel.",
meta={"source": "textbook", "topic": "architecture"},
),
Document(
content="HNSW is a graph-based algorithm for approximate nearest neighbor search.",
meta={"source": "paper", "topic": "indexing"},
),
Document(
content="RAG combines retrieval with generation to ground LLM responses in facts.",
meta={"source": "blog", "topic": "rag"},
),
Document(
content="Vector databases store high-dimensional embeddings for similarity search.",
meta={"source": "docs", "topic": "databases"},
),
Document(
content="Local-first software works offline and syncs when connectivity returns.",
meta={"source": "blog", "topic": "architecture"},
),
]
Now embed and write them. Haystack handles the embedding step as a pipeline component:
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
index_pipeline = Pipeline()
index_pipeline.add_component(
"embedder",
SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
index_pipeline.add_component("writer", DocumentWriter(document_store=store))
index_pipeline.connect("embedder", "writer")
index_pipeline.run({"embedder": {"documents": documents}})
print(f"Indexed {store.count_documents()} documents")
Indexed 5 documents
Step 3: Build a retrieval pipeline
Haystack's built-in retrievers are bound to `InMemoryDocumentStore`. For any custom store, you need a thin wrapper component. This is the canonical Haystack 2.x pattern:
from typing import List

from haystack import component
from haystack.dataclasses import Document
@component
class VelesRetriever:
def __init__(self, document_store: VelesDBDocumentStore, top_k: int = 5):
self._store = document_store
self._top_k = top_k
@component.output_types(documents=List[Document])
def run(self, query_embedding: List[float]) -> dict:
docs = self._store.embedding_retrieval(
query_embedding, top_k=self._top_k
)
return {"documents": docs}
Now wire it into a query pipeline:
from haystack.components.embedders import SentenceTransformersTextEmbedder
query_pipeline = Pipeline()
query_pipeline.add_component(
"embedder",
SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
query_pipeline.add_component("retriever", VelesRetriever(store, top_k=3))
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
Step 4: Search
result = query_pipeline.run({"embedder": {"text": "How does similarity search work?"}})
for doc in result["retriever"]["documents"]:
print(f"[{doc.score:.3f}] {doc.content}")
[0.911] Vector databases store high-dimensional embeddings for similarity search.
[0.856] HNSW is a graph-based algorithm for approximate nearest neighbor search.
[0.820] Transformers use self-attention to process sequences in parallel.
Scores are cosine similarity normalized to [0, 1] by default (`scale_score=True`).
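Assuming the scaling follows the usual convention for cosine scores (worth confirming against the `haystack-velesdb` source), it is just a linear map:

def scale_cosine(score: float) -> float:
    # Assumed rescaling from cosine's native [-1, 1] range to [0, 1],
    # matching the behavior described for scale_score=True.
    return (score + 1.0) / 2.0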
Handling duplicates
The integration supports all three Haystack duplicate policies:
from haystack.document_stores.types import DuplicatePolicy
from haystack.document_stores.errors import DuplicateDocumentError
# OVERWRITE (default): upsert, last write wins
store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
# SKIP: keep existing, ignore duplicates
store.write_documents(documents, policy=DuplicatePolicy.SKIP)
# FAIL: raise DuplicateDocumentError if any ID exists
try:
store.write_documents(documents, policy=DuplicatePolicy.FAIL)
except DuplicateDocumentError as e:
print(f"Duplicate detected: {e}")
This matters for incremental indexing. If your pipeline re-processes documents, `SKIP` avoids redundant writes while `FAIL` catches unexpected duplicates early.
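In a pipeline, the policy goes on the writer component itself, so re-runs of the indexing pipeline skip documents that are already stored:

from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

# Same indexing pipeline as in Step 2, but re-running it now skips
# documents whose IDs already exist instead of rewriting them.
writer = DocumentWriter(document_store=store, policy=DuplicatePolicy.SKIP)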
Pipeline serialization
Haystack pipelines can be saved to YAML and loaded later. The VelesDB store supports this out of the box:
config = store.to_dict()
print(config["type"])
# haystack_velesdb.document_store.VelesDBDocumentStore
restored = VelesDBDocumentStore.from_dict(config)
This means your pipeline definitions are portable. Save them to version control, share them across environments.
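Whole pipelines serialize the same way. A minimal sketch using Haystack's standard YAML dump/load (the file name is arbitrary):

from haystack import Pipeline

# Write the query pipeline, including the embedded VelesDB store
# configuration, to a YAML file you can commit to git...
with open("query_pipeline.yaml", "w") as f:
    query_pipeline.dump(f)

# ...and reconstruct it later, in another process or environment.
with open("query_pipeline.yaml", "r") as f:
    restored_pipeline = Pipeline.load(f)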
Why combine them?
VelesDB and Haystack solve different halves of the RAG problem. Neither replaces the other.
What VelesDB brings to a Haystack pipeline:
- Persistence without infrastructure. Haystack's default `InMemoryDocumentStore` loses everything when the process exits. VelesDB writes to disk, survives restarts, and needs no running server.
- Hybrid search out of the box. HNSW vector search + BM25 full-text search + Reciprocal Rank Fusion (sketched after this list), all built into the engine. No need to chain a separate `TextSearchRetriever` or add a reranker.
- A graph engine for GraphRAG. VelesDB includes a native graph store (`create_graph_collection`, `traverse_bfs`, `get_outgoing`). No other Haystack-compatible DocumentStore ships with this.
- A 6 MB footprint. Where Qdrant, Milvus, or Weaviate need Docker containers and hundreds of megabytes, VelesDB is a single pip install.
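For reference, Reciprocal Rank Fusion is a simple rank-based merge. A generic sketch of the standard formula, not VelesDB's internal code:

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    # Standard RRF: score(d) = sum over result lists of 1 / (k + rank of d).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Example: fuse a vector ranking with a BM25 ranking for the same query.
fused = reciprocal_rank_fusion([["d2", "d1", "d3"], ["d1", "d3", "d2"]])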
What Haystack brings to VelesDB users:
- Pipeline orchestration. Chain preprocessors, embedders, retrievers, prompt builders, and LLMs into a single graph. Swap any component without rewriting the rest.
- Document preprocessing. `DocumentSplitter`, `DocumentCleaner`, and converters for PDF, HTML, DOCX. VelesDB stores vectors, but getting clean chunks from raw files is Haystack's job.
- Model abstraction. Switch from `all-MiniLM-L6-v2` to `nomic-embed-text` by changing one line. The rest of the pipeline stays the same.
- Serialization and reproducibility. Export a full pipeline to YAML, check it into git, deploy it elsewhere. VelesDB alone has no concept of a pipeline definition.
In short: VelesDB is the storage and retrieval engine. Haystack is the orchestration layer that connects it to everything else. Together, they give you a full local RAG stack with no Docker, no API keys, and no cloud dependency.
What is different from the raw VelesDB API?
If you have used VelesDB directly (`collection.search()`, `collection.upsert()`), the Haystack integration adds:

- String-to-integer ID mapping: Haystack uses string document IDs. VelesDB uses integers. The store handles the SHA-256 mapping transparently (see the sketch after this list).
- Filter translation: Haystack's filter DSL (`{"operator": "==", "field": "meta.topic", "value": "rag"}`) is converted to VelesDB's native filter format.
- Score normalization: Cosine similarity scores are scaled from [-1, 1] to [0, 1] for consistency with other Haystack stores.
- Duplicate detection: Pre-scan checks before writes when using the `FAIL` policy.
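A sketch of what that ID mapping could look like (an illustrative helper, not the integration's actual code):

import hashlib

def string_id_to_int(doc_id: str) -> int:
    # Hypothetical helper: hash the Haystack string ID and take the first
    # 8 bytes as a stable unsigned 64-bit integer key for VelesDB.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big")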
You do not lose anything. The same HNSW index, the same Rust engine, the same sub-millisecond latency. The Haystack layer just makes it composable with the rest of the Haystack ecosystem (preprocessors, generators, routers).
The bigger picture: three frameworks, one engine
VelesDB now has first-party connectors for the three major Python RAG frameworks:
| Framework | Package | Pattern |
|---|---|---|
| Haystack 2.x | `haystack-velesdb` | `DocumentStore` |
| LangChain | `langchain-velesdb` | `VectorStore` |
| LlamaIndex | `llama-index-vector-stores-velesdb` | `VectorStoreIndex` |
All three use the same underlying Rust engine. Pick the framework that matches your team's workflow. The data format is the same, so you can even switch frameworks without re-indexing.
Getting started
pip install haystack-ai haystack-velesdb
- VelesDB on GitHub - VelesDB Core License 1.0 (based on ELv2). The `haystack-velesdb` connector itself is MIT-licensed.
- Haystack integration - README and examples
- All VelesDB integrations (Haystack, LangChain, LlamaIndex)
The project is still young. A star on GitHub helps other developers find it, and we are always looking for partners and contributors. Details on velesdb.com.
Thanks to @CrepuscularIRIS for building the initial Haystack integration.
What is your current RAG stack? Are you running Haystack with a Docker-based vector store, or have you gone the embedded route? Drop a comment below.