A few months ago I started building vecdb — a vector database that
runs entirely on your own machine. No cloud, no API keys, no subscription.
The problem
Many vector databases make you choose: semantic search OR keyword search.
Semantic search matches on meaning but can miss exact keywords. Keyword
search finds exact matches but misses meaning.
vecdb combines both in a two-stage pipeline, then fuses the two scores:
- HNSW dense index retrieves candidates by meaning
- BM25 sparse index re-scores by keyword relevance
- A fusion function combines both scores
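To make the pipeline above concrete, here is a minimal sketch of what a fusion step could look like. It is not vecdb's actual implementation: the post does not say which fusion function is used, so the min-max normalization, the weighted sum, and the names `min_max_normalize`, `fuse`, and `alpha` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical fusion: weighted sum of normalized dense and sparse scores.

    `dense` holds similarity scores for the candidates returned by the dense
    (HNSW) stage; `sparse` holds BM25 scores for those same candidates.
    Only dense candidates are ranked, since BM25 acts as a re-scorer here.
    """
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        doc_id: alpha * dense_n[doc_id] + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in dense_n
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Example: "b" is slightly less similar than "a" but matches the keywords best.
dense_scores = {"a": 0.9, "b": 0.8, "c": 0.5}
sparse_scores = {"a": 0.0, "b": 4.0, "c": 2.0}
ranking = fuse(dense_scores, sparse_scores)
```

With equal weights, a document that is strong on both signals outranks one that is strong on only one. Raising `alpha` shifts the ranking toward meaning, lowering it toward exact keywords.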
What it can do
- Hybrid HNSW + BM25 retrieval
- SQL-like query language with a VECTOR_SIM predicate
- Python and TypeScript SDKs
- Single binary, Docker support
- 187 tests
- MIT license
Example query
SELECT * FROM documents
WHERE VECTOR_SIM(vec, [0.1, 0.2, 0.3]) > 0.75
AND payload->>'region' = 'US'
LIMIT 10;
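For readers who think in code rather than SQL, the query above can be read as roughly the following brute-force filter. This is only a sketch of the semantics, under two assumptions the post does not confirm: that VECTOR_SIM is cosine similarity, and that rows look like dictionaries with `vec` and `payload` keys. The function names are made up for the example, and a real engine would answer this from its index instead of scanning every row.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def run_query(
    rows: List[Dict[str, Any]],
    query_vec: Sequence[float],
    threshold: float = 0.75,
    region: str = "US",
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Linear-scan equivalent of the example query (assumed semantics)."""
    matches = [
        row
        for row in rows
        if cosine_similarity(row["vec"], query_vec) > threshold
        and row["payload"].get("region") == region
    ]
    return matches[:limit]


documents = [
    {"id": 1, "vec": [0.1, 0.2, 0.3], "payload": {"region": "US"}},
    {"id": 2, "vec": [0.3, -0.2, -0.1], "payload": {"region": "US"}},
    {"id": 3, "vec": [0.2, 0.4, 0.6], "payload": {"region": "EU"}},
]
results = run_query(documents, [0.1, 0.2, 0.3])
```

Here only document 1 passes both conditions: document 2 points in a different direction, and document 3 is similar but sits in another region.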
Try it
GitHub: https://github.com/zaydmulani09/vecdb
Would love feedback from the community — especially on the
architecture and what to tackle in v0.2.0.