Most vector database tutorials start the same way:
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
That's 500MB+ of Docker image, a running server process, a REST API to talk to, and a container to babysit in production. For what? Storing a few thousand embeddings and doing similarity search.
I've been building AI features for a project where everything runs locally: no cloud, no Docker, no external dependencies. I needed a vector store that I could pip install and forget about. So I built VelesDB, an embedded database written in Rust.
Here's what it looks like in practice.
Setup: one line
pip install velesdb
That's it. No Docker. No config files. No server to start. The entire engine is a ~3MB native binary that ships inside the Python wheel.
Create a database and index documents
import velesdb
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2") # 384 dimensions
db = velesdb.Database("./my_vectors")
collection = db.create_collection("documents", dimension=384, metric="cosine")
texts = [
    "Transformers use self-attention to process sequences in parallel.",
    "HNSW is a graph-based algorithm for approximate nearest neighbor search.",
    "RAG combines retrieval with generation to ground LLM responses in facts.",
    "Vector databases store high-dimensional embeddings for similarity search.",
    "Knowledge graphs represent relationships between entities as edges.",
    "Local-first software works offline and syncs when connectivity returns.",
    "Embedding models convert text into dense vector representations.",
]
vectors = model.encode(texts).tolist()
collection.upsert([
    {"id": i, "vector": v, "payload": {"text": t}}
    for i, (v, t) in enumerate(zip(vectors, texts))
])
That's a working vector database. On disk, it's just a regular directory with a few files: vectors, indexes, and a WAL for crash recovery. No server PID files, no config sprawl.
Search
query = "How does similarity search work?"
query_vec = model.encode(query).tolist()
results = collection.search(vector=query_vec, top_k=3)
for r in results:
    print(f"score={r['score']:.4f} → {r['payload']['text']}")
score=0.82 → Vector databases store high-dimensional embeddings for similarity search.
score=0.71 → HNSW is a graph-based algorithm for approximate nearest neighbor search.
score=0.64 → Embedding models convert text into dense vector representations.
Standard cosine similarity search. Under the hood, VelesDB uses HNSW (Hierarchical Navigable Small World), the same indexing algorithm most production vector databases rely on.
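If you want to sanity-check scores like the 0.82 above, cosine similarity is just the dot product of the two vectors divided by their L2 norms. A minimal pure-Python version (an illustration, not VelesDB's Rust internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [1.0, 2.0]))  # ≈ 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Identical directions score 1.0, orthogonal vectors 0.0, which is why near-duplicates of the query text float to the top.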
Where it gets interesting: hybrid search
Most embedded vector stores stop at basic similarity search. VelesDB also includes a BM25 full-text index, so you can combine keyword matching with semantic search:
# Pure keyword search (BM25)
results = collection.text_search("vector database embeddings", top_k=3)
# Hybrid: 70% semantic similarity + 30% keyword matching
results = collection.hybrid_search(
    vector=model.encode("fast nearest neighbor algorithms").tolist(),
    query="HNSW vector search algorithm",
    top_k=3,
    vector_weight=0.7
)
This is the same hybrid search pattern that Pinecone and Weaviate charge for. Here it's built into the engine.
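Conceptually, `vector_weight` blends two signals: a cosine score already in [0, 1] and a BM25 score that first has to be squashed into the same range. The sketch below only illustrates the weighting; the engine's actual normalization is internal, and `bm25_cap` is a made-up constant for this example:

```python
def hybrid_score(vector_score, bm25_score, vector_weight=0.7, bm25_cap=10.0):
    """Blend a cosine score in [0, 1] with a BM25 score squashed into [0, 1].
    bm25_cap is a hypothetical normalization constant for illustration only."""
    bm25_norm = min(bm25_score / bm25_cap, 1.0)
    return vector_weight * vector_score + (1.0 - vector_weight) * bm25_norm

# A document that matches both semantically and lexically scores highest:
# 0.7 * 0.8 + 0.3 * 0.5 ≈ 0.71
print(hybrid_score(0.8, 5.0))
```

The point of the blend: pure semantic search misses exact keyword matches (product codes, acronyms), pure BM25 misses paraphrases; a weighted sum catches both.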
Batch and multi-query search
If you're building a RAG pipeline, you often need to run multiple queries at once (maybe one per reformulated question). VelesDB handles this natively:
# Parallel batch search
batch_results = collection.batch_search([
    {"vector": model.encode("machine learning models").tolist(), "top_k": 3},
    {"vector": model.encode("graph databases").tolist(), "top_k": 3},
])
# Multi-query with result fusion (Reciprocal Rank Fusion)
fused = collection.multi_query_search(
    vectors=[
        model.encode("vector similarity search").tolist(),
        model.encode("nearest neighbor algorithms").tolist(),
        model.encode("embedding databases").tolist(),
    ],
    top_k=5,
    fusion=velesdb.FusionStrategy.rrf(k=60)
)
RRF fusion is the same technique used by Elasticsearch and Cohere's Rerank. It combines rankings from multiple queries into a single, more robust result set.
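RRF itself is only a few lines: each document earns 1 / (k + rank) from every ranking it appears in, and the summed scores decide the final order. A minimal standalone sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """RRF: each doc scores sum of 1 / (k + rank) across rankings (rank is
    1-based). Docs ranked well in several lists beat docs that rank well
    in only one, which is what makes the fused result more robust."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" tops both rankings; "c" appears in both; "b" and "d" in one each.
print(reciprocal_rank_fusion([["a", "b", "c"], ["a", "c", "d"]]))
# → ['a', 'c', 'b', 'd']
```

The constant k=60 dampens the influence of top ranks so a single first-place vote can't dominate documents that rank consistently well everywhere.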
The feature nobody else has: built-in knowledge graph
This is why I built VelesDB instead of using Chroma or LanceDB. It has a native graph engine alongside the vector store.
graph = db.create_graph_collection("knowledge", dimension=384)
# Store node metadata
graph.store_node_payload(1, {"name": "Python", "type": "language"})
graph.store_node_payload(2, {"name": "Guido van Rossum", "type": "person"})
graph.store_node_payload(3, {"name": "Rust", "type": "language"})
graph.store_node_payload(4, {"name": "VelesDB", "type": "database"})
# Create edges
graph.add_edge({"id": 1, "source": 1, "target": 2, "label": "CREATED_BY",
                "properties": {"year": 1991}})
graph.add_edge({"id": 2, "source": 4, "target": 3, "label": "WRITTEN_IN",
                "properties": {"year": 2024}})
graph.add_edge({"id": 3, "source": 4, "target": 1, "label": "HAS_SDK",
                "properties": {"version": "1.7.2"}})
# Traverse the graph
outgoing = graph.get_outgoing(4) # What is VelesDB connected to?
for edge in outgoing:
    print(f"VelesDB →[{edge['label']}]→ node {edge['target']}")
# BFS traversal
reachable = graph.traverse_bfs(source_id=4, max_depth=2, limit=10)
Why does this matter? Because GraphRAG (combining vector search with graph traversal) is how you get AI agents that understand relationships, not just similarity. Vector search finds documents that look like your query. Graph traversal finds documents that are connected to your results.
With Qdrant or Pinecone, you'd need to bolt on Neo4j or a separate graph database. Here it's one engine, one pip install.
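Stripped to its core, the GraphRAG pattern is: take the vector-search hits, then pull in their graph neighbors. Here is a toy one-hop expansion with a plain adjacency dict standing in for `graph.get_outgoing` (the dict and node ids are illustrative, not VelesDB's on-disk format):

```python
def graph_expand(hit_ids, edges):
    """One-hop GraphRAG expansion: keep the original vector-search hits,
    then append each hit's outgoing neighbors, skipping duplicates."""
    expanded = list(hit_ids)
    seen = set(hit_ids)
    for node in hit_ids:
        for target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                expanded.append(target)
    return expanded

# Node 4 (VelesDB) points at 3 (Rust) and 1 (Python), mirroring the edges above.
print(graph_expand([4], {4: [3, 1], 1: [2]}))  # → [4, 3, 1]
```

In a real pipeline you would feed the expanded id set back into payload lookups, so the LLM sees both the documents that matched the query and the documents connected to them.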
Real benchmarks
I ran these on the actual VelesDB engine (v1.7.2), not synthetic numbers.
Test config: Intel Core i9-14900KF, 64 GB RAM, Windows 11, Python 3.11 (2026-03-26)
| Operation | 10K vectors (384D) | 50K vectors (384D) |
|---|---|---|
| Bulk insert | ~9,000 vectors/sec | ~5,400 vectors/sec |
| Search (top-10, avg) | ~438 µs | ~1,463 µs |
| Search (top-10, p50) | ~409 µs | ~1,117 µs |
| Search (top-10, p99) | ~1,058 µs | ~3,017 µs |
| Database size on disk | 31 MB | 162 MB |
Sub-millisecond search at 10K vectors, ~1ms at 50K. Zero infrastructure, zero network calls.
For comparison, a Qdrant Docker container at rest uses ~200MB of RAM and requires a running gRPC server. VelesDB uses exactly as much memory as your vectors need, and the process exits when your script does.
When to use this (and when not to)
Use VelesDB when:
- Your dataset fits on a single machine (up to a few hundred thousand vectors)
- You need offline/local-first capability
- You can't send data to the cloud (GDPR, healthcare, finance)
- You want zero infrastructure to manage
- You're building GraphRAG or need relationship traversal
Use Qdrant/Pinecone/Weaviate when:
- You need distributed scaling across machines
- You have millions of vectors with multi-tenant isolation
- You want a managed service with built-in monitoring
Getting started
pip install velesdb
import velesdb
db = velesdb.Database("./my_data")
collection = db.create_collection("docs", dimension=384, metric="cosine")
collection.upsert([{"id": 1, "vector": [...], "payload": {"text": "hello"}}])
results = collection.search(vector=[...], top_k=5)
Full docs: velesdb.com/en
GitHub: github.com/cyberlife-coder/VelesDB
If you try it, I'd like to hear what you think, especially if you're coming from a Docker-based vector store. What's your current setup, and what made you choose it?