DEV Community

Mihir Ahuja

Swap Vector Databases by Changing a URL: Meet vectorwrap

tl;dr vectorwrap lets you migrate from an in-memory SQLite prototype to a production pgvector cluster by changing only the connection URL – all your insert & search code stays the same.

The Problem

Every vector database ships its own client, table schema, and API quirks.

A side-project that starts on DuckDB or SQLite turns into a week-long rewrite when you graduate to Postgres or MySQL HeatWave.

Wouldn't it be nicer if the only thing you had to edit was the connection string?

Enter vectorwrap.

Why bother? (The migration tax)

I built a toy semantic-search feature for a Flask app on in-memory SQLite.

When the project graduated to staging, we needed:

  • Durable storage ➜ PostgreSQL + pgvector
  • Production-grade indexing ➜ HNSW
  • Simple CI tests ➜ DuckDB (fast + no server)

Three back-ends meant three client libraries, three table DDLs, and dozens of if db == "postgres": … branches. That's when vectorwrap was born.


30-second Quick Start

pip install "vectorwrap[all]"     # pgvector, HeatWave, SQLite-VSS, DuckDB-VSS
from vectorwrap import VectorDB

def embed(txt): return [0.1] * 768      # plug your own embeddings here

db = VectorDB("sqlite:///:memory:")      # 1️⃣ prototype
db.create_collection("docs", 768)
db.upsert("docs", 1, embed("hello world"), {"lang": "en"})
print(db.query("docs", embed("hello"), top_k=1))

db = VectorDB("postgresql://u:p@localhost/vectors")   # 2️⃣ prod swap
print(db.query("docs", embed("hello"), top_k=1))      # same call, new backend (assumes "docs" exists there)

No new imports, no table DDL, just the URL.

VectorDB is a thin façade that parses the URL scheme and dispatches to a backend driver:

VectorDB(url) ─┬─ postgresql:// ─> PgVectorDriver
               ├─ mysql://      ─> MySQLVectorDriver
               ├─ sqlite://     ─> SQLiteVSSDriver
               └─ duckdb://     ─> DuckDBVSSDriver
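Under the hood, that dispatch can be as simple as a scheme lookup. A minimal sketch of the idea (the driver classes and the `driver_for` helper here are illustrative stand-ins, not vectorwrap's actual source):

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for vectorwrap's backend driver classes.
class PgVectorDriver: pass
class MySQLVectorDriver: pass
class SQLiteVSSDriver: pass
class DuckDBVSSDriver: pass

_DRIVERS = {
    "postgresql": PgVectorDriver,
    "mysql": MySQLVectorDriver,
    "sqlite": SQLiteVSSDriver,
    "duckdb": DuckDBVSSDriver,
}

def driver_for(url: str):
    """Pick a driver class from the URL scheme, mirroring the facade's dispatch."""
    scheme = urlparse(url).scheme
    if scheme not in _DRIVERS:
        raise ValueError(f"unsupported backend: {scheme!r}")
    return _DRIVERS[scheme]()
```

Because the scheme is the only thing inspected, swapping backends really does reduce to editing the string.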

Features

Unified API

Each driver implements four primitives:

create_collection(name, dim)
upsert(name, id, vector, metadata)
query(name, query_vector, top_k=5, filter=None)
close()

Metadata filter translator

Filters are expressed as pure Python dicts:

{"category": "phone", "price": {"$lt": 1000}}

The translator turns that into:

| Backend | Generated SQL / predicate |
| --- | --- |
| Postgres | `WHERE metadata->>'category'='phone' AND (metadata->>'price')::float < 1000` |
| SQLite-VSS | fetch N × top_k then Python-filter (no JSON query) |
| DuckDB | `WHERE metadata->>'category'='phone' AND CAST(metadata->>'price' AS DOUBLE) < 1000` |
| MySQL (HeatWave) | `WHERE JSON_EXTRACT(metadata,'$.category')='phone' AND JSON_EXTRACT(metadata,'$.price') < 1000` |
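For the Postgres dialect, the translation can be sketched as a tiny compiler from the filter dict to a WHERE clause. This is a hedged illustration, not vectorwrap's actual translator; in particular, real code should bind values as query parameters rather than interpolating strings:

```python
def to_pg_where(filter_: dict) -> str:
    """Compile a metadata filter dict into a Postgres-style WHERE clause.

    Illustration only: production code must use bound parameters, not
    string interpolation, to avoid SQL injection.
    """
    clauses = []
    for key, cond in filter_.items():
        if isinstance(cond, dict) and "$lt" in cond:
            # numeric comparison: cast the JSON text value to float
            clauses.append(f"(metadata->>'{key}')::float < {cond['$lt']}")
        else:
            # plain equality on the JSON text value
            clauses.append(f"metadata->>'{key}' = '{cond}'")
    return "WHERE " + " AND ".join(clauses)
```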

Smart defaults

  • First insert ➜ auto-create an HNSW index (where supported).
  • top_k > 100 falls back to brute force, because HNSW recall degrades at large k.
  • DuckDB/SQLite run synchronously by default; the Postgres/MySQL drivers also expose awaitable methods via VectorDBAsync.
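The brute-force fallback amounts to an exact similarity scan over every stored vector. A self-contained sketch of what that looks like (a hypothetical helper using cosine similarity, not vectorwrap's internals):

```python
import math

def brute_force_query(rows, query_vec, top_k=5):
    """Exact scan: rows is an iterable of (id, vector); returns the
    top_k (id, score) pairs ranked by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = [(id_, cosine(vec, query_vec)) for id_, vec in rows]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Unlike HNSW, this is O(n) per query but has perfect recall, which is why it is the sensible default once top_k gets large.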

Zero-config tests

VectorDB("sqlite:///:memory:") + pytest gives you unit tests that start in milliseconds and run in CI without Docker.

Extensibility hook

Every driver inherits BaseVectorDriver; adding Qdrant is a 200-line PR (ping me!).
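The interface a new adapter has to satisfy looks roughly like this. The ABC below is a stand-in for vectorwrap's actual BaseVectorDriver, and the in-memory driver only shows the shape a Qdrant PR would fill in (its query does no real similarity search):

```python
from abc import ABC, abstractmethod

class BaseVectorDriver(ABC):
    """Stand-in for vectorwrap's driver interface: the four primitives."""
    @abstractmethod
    def create_collection(self, name, dim): ...
    @abstractmethod
    def upsert(self, name, id, vector, metadata): ...
    @abstractmethod
    def query(self, name, query_vector, top_k=5, filter=None): ...
    @abstractmethod
    def close(self): ...

class InMemoryDriver(BaseVectorDriver):
    """Toy adapter: stores vectors in a dict, returns ids in insertion order."""
    def __init__(self):
        self._data = {}
    def create_collection(self, name, dim):
        self._data[name] = {}
    def upsert(self, name, id, vector, metadata):
        self._data[name][id] = (vector, metadata)
    def query(self, name, query_vector, top_k=5, filter=None):
        # No similarity math here: just the first top_k ids, to show the shape.
        return list(self._data[name])[:top_k]
    def close(self):
        self._data.clear()
```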

Benchmark (single CPU, 5k × 768-dim vectors, HNSW M=16, ef=100)

| Backend | QPS ↑ | p50 latency | Notes |
| --- | --- | --- | --- |
| PostgreSQL 16 + pgvector | 210 | 4.7 ms | Unix socket on local SSD |
| DuckDB 0.10.2 | 197 | 5.0 ms | In-proc; no network hop |
| MySQL 8.2 HeatWave | 153 | 6.3 ms | TCP localhost |
| SQLite-VSS | 138 | 6.9 ms | HNSW in shared library |

Full Jupyter notebook + raw CSV in /bench.

Advanced usage

Async in FastAPI

from fastapi import FastAPI
from vectorwrap.asyncio import VectorDBAsync

app = FastAPI()
db = VectorDBAsync("postgresql://u:p@host/db")   # `asyncpg` under the hood

@app.post("/search")
async def search(q: str):
    results = await db.query("docs", embed(q), top_k=10)
    return {"hits": results}

Running LangChain on top

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.vectorwrap import VectorWrapStore

store = VectorWrapStore(VectorDB("duckdb:///demo.duckdb"), embeddings=OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"filter": {"lang": "en"}})

Migrating data between DBs

src  = VectorDB("sqlite:///legacy.db")
dest = VectorDB("postgresql://u:p@host/newdb")

for id_, vec, meta in src.scan("docs"):
    dest.upsert("docs", id_, vec, meta)

Under 10 lines, and no external ETL tool needed.

Future roadmap

  • Redis & Elasticsearch adapters
  • Batch upserts (upsert_many)
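Until a native upsert_many lands, batching can be emulated client-side. This hypothetical helper works against any object exposing the single-row upsert primitive from the unified API:

```python
def upsert_many(db, name, rows, batch_size=500):
    """Upsert (id, vector, metadata) rows in fixed-size batches.

    Client-side sketch only: `db` is any driver exposing the single-row
    upsert(name, id, vector, metadata) primitive.  Returns rows written.
    """
    count = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            for id_, vec, meta in batch:
                db.upsert(name, id_, vec, meta)
            count += len(batch)
            batch = []
    for id_, vec, meta in batch:   # flush the final partial batch
        db.upsert(name, id_, vec, meta)
    return count + len(batch)
```

A native implementation could instead build one multi-row INSERT per batch, which is where the real speedup would come from.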

Contributing

git clone https://github.com/mihirahuja1/vectorwrap
poetry install
pytest -q

Good-first-issue labels → metadata filter edge cases, docstrings

Code style → Black, ruff, mypy CI-enforced

Discussion → GitHub Discussions or #vectorwrap on pgvector Slack

Try it

If vectorwrap saves you a migration headache, feel free to leave a comment!
