DEV Community

Mihir Ahuja

Swap Vector Databases by Changing a URL: Meet vectorwrap

tl;dr vectorwrap lets you migrate from an in-memory SQLite prototype to a production pgvector cluster by changing only the connection URL – all your insert & search code stays the same.

The Problem

Every vector database ships its own client, table schema, and API quirks.

A side-project that starts on DuckDB or SQLite turns into a week-long rewrite when you graduate to Postgres or MySQL HeatWave.

Wouldn't it be nicer if the only thing you had to edit was the connection string?

Enter vectorwrap.

Why bother? (The migration tax)

I built a toy semantic-search feature for a Flask app on in-memory SQLite.

When the project graduated to staging, we needed:

  • Durable storage ➜ PostgreSQL + pgvector
  • Production-grade indexing ➜ HNSW
  • Simple CI tests ➜ DuckDB (fast + no server)

Three back-ends meant three client libraries, three table DDLs, and dozens of if db == "postgres": … branches. That's when vectorwrap was born.


30-second Quick Start

pip install "vectorwrap[all]"     # pgvector, HeatWave, SQLite-VSS, DuckDB-VSS
from vectorwrap import VectorDB

def embed(txt): return [0.1] * 768      # plug your own embeddings here

db = VectorDB("sqlite:///:memory:")      # 1️⃣ prototype
db.create_collection("docs", 768)
db.upsert("docs", 1, embed("hello world"), {"lang": "en"})
print(db.query("docs", embed("hello"), top_k=1))

db = VectorDB("postgresql://u:p@localhost/vectors")   # 2️⃣ prod swap
print(db.query("docs", embed("hello"), top_k=1))      # same call, new backend (assumes "docs" exists there)

No new imports, no table DDL, just the URL.

VectorDB is a thin façade that parses the URL scheme and dispatches to a backend driver:

VectorDB(url) ─┬─ postgresql:// ─> PgVectorDriver
               ├─ mysql://      ─> MySQLVectorDriver
               ├─ sqlite://     ─> SQLiteVSSDriver
               └─ duckdb://     ─> DuckDBVSSDriver
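Under the hood, that dispatch can be as simple as a scheme lookup. A minimal sketch of the idea (the driver classes and the `driver_for` helper here are illustrative stand-ins, not vectorwrap's actual source):

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for vectorwrap's backend driver classes.
class PgVectorDriver: pass
class MySQLVectorDriver: pass
class SQLiteVSSDriver: pass
class DuckDBVSSDriver: pass

_DRIVERS = {
    "postgresql": PgVectorDriver,
    "mysql": MySQLVectorDriver,
    "sqlite": SQLiteVSSDriver,
    "duckdb": DuckDBVSSDriver,
}

def driver_for(url: str):
    """Pick a driver class from the URL scheme, mirroring the facade's dispatch."""
    scheme = urlparse(url).scheme
    if scheme not in _DRIVERS:
        raise ValueError(f"unsupported backend: {scheme!r}")
    return _DRIVERS[scheme]()
```

Because the scheme is the only thing inspected, swapping backends really does reduce to editing the string.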

Features

Unified API

Each driver implements four primitives:

create_collection(name, dim)
upsert(name, id, vector, metadata)
query(name, query_vector, top_k=5, filter=None)
close()

Metadata filter translator

Filters are expressed as pure Python dicts:

{"category": "phone", "price": {"$lt": 1000}}

The translator turns that into:

| Backend | Generated SQL / predicate |
| --- | --- |
| Postgres | `WHERE metadata->>'category'='phone' AND (metadata->>'price')::float < 1000` |
| SQLite-VSS | fetch N × top_k then Python-filter (no JSON query) |
| DuckDB | `WHERE metadata->>'category'='phone' AND CAST(metadata->>'price' AS DOUBLE) < 1000` |
| MySQL (HeatWave) | `WHERE JSON_EXTRACT(metadata,'$.category')='phone' AND JSON_EXTRACT(metadata,'$.price') < 1000` |
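For the Postgres dialect, the translation can be sketched as a tiny compiler from the filter dict to a WHERE clause. This is a hedged illustration, not vectorwrap's actual translator; in particular, real code should bind values as query parameters rather than interpolating strings:

```python
def to_pg_where(filter_: dict) -> str:
    """Compile a metadata filter dict into a Postgres-style WHERE clause.

    Illustration only: production code must use bound parameters, not
    string interpolation, to avoid SQL injection.
    """
    clauses = []
    for key, cond in filter_.items():
        if isinstance(cond, dict) and "$lt" in cond:
            # numeric comparison: cast the JSON text value to float
            clauses.append(f"(metadata->>'{key}')::float < {cond['$lt']}")
        else:
            # plain equality on the JSON text value
            clauses.append(f"metadata->>'{key}' = '{cond}'")
    return "WHERE " + " AND ".join(clauses)
```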

Smart defaults

  • First insert ➜ auto-create an HNSW index (where supported).
  • top_k > 100 falls back to brute force, because HNSW recall degrades at large k.
  • DuckDB/SQLite run synchronously by default; the Postgres/MySQL drivers also expose awaitable methods via VectorDBAsync.
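The brute-force fallback amounts to an exact similarity scan over every stored vector. A self-contained sketch of what that looks like (a hypothetical helper using cosine similarity, not vectorwrap's internals):

```python
import math

def brute_force_query(rows, query_vec, top_k=5):
    """Exact scan: rows is an iterable of (id, vector); returns the
    top_k (id, score) pairs ranked by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = [(id_, cosine(vec, query_vec)) for id_, vec in rows]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Unlike HNSW, this is O(n) per query but has perfect recall, which is why it is the sensible default once top_k gets large.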

Zero-config tests

VectorDB("sqlite:///:memory:") + pytest gives you unit tests that start in milliseconds and run in CI without Docker.

Extensibility hook

Every driver inherits BaseVectorDriver; adding Qdrant is a 200-line PR (ping me!).
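The interface a new adapter has to satisfy looks roughly like this. The ABC below is a stand-in for vectorwrap's actual BaseVectorDriver, and the in-memory driver only shows the shape a Qdrant PR would fill in (its query does no real similarity search):

```python
from abc import ABC, abstractmethod

class BaseVectorDriver(ABC):
    """Stand-in for vectorwrap's driver interface: the four primitives."""
    @abstractmethod
    def create_collection(self, name, dim): ...
    @abstractmethod
    def upsert(self, name, id, vector, metadata): ...
    @abstractmethod
    def query(self, name, query_vector, top_k=5, filter=None): ...
    @abstractmethod
    def close(self): ...

class InMemoryDriver(BaseVectorDriver):
    """Toy adapter: stores vectors in a dict, returns ids in insertion order."""
    def __init__(self):
        self._data = {}
    def create_collection(self, name, dim):
        self._data[name] = {}
    def upsert(self, name, id, vector, metadata):
        self._data[name][id] = (vector, metadata)
    def query(self, name, query_vector, top_k=5, filter=None):
        # No similarity math here: just the first top_k ids, to show the shape.
        return list(self._data[name])[:top_k]
    def close(self):
        self._data.clear()
```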

Benchmark (single CPU, 5k × 768-dim vectors, HNSW M=16, ef=100)

| Backend | QPS ↑ | p50 latency | Notes |
| --- | --- | --- | --- |
| PostgreSQL 16 + pgvector | 210 | 4.7 ms | Unix socket on local SSD |
| DuckDB 0.10.2 | 197 | 5.0 ms | In-proc; no network hop |
| MySQL 8.2 HeatWave | 153 | 6.3 ms | TCP localhost |
| SQLite-VSS | 138 | 6.9 ms | HNSW in shared library |

Full Jupyter notebook + raw CSV in /bench.

Advanced usage

Async in FastAPI

from fastapi import FastAPI
from vectorwrap.asyncio import VectorDBAsync

app = FastAPI()
db = VectorDBAsync("postgresql://u:p@host/db")   # `asyncpg` under the hood

@app.post("/search")
async def search(q: str):
    results = await db.query("docs", embed(q), top_k=10)
    return {"hits": results}

Running LangChain on top

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.vectorwrap import VectorWrapStore

store = VectorWrapStore(VectorDB("duckdb:///demo.duckdb"), embeddings=OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"filter": {"lang": "en"}})

Migrating data between DBs

src  = VectorDB("sqlite:///legacy.db")
dest = VectorDB("postgresql://u:p@host/newdb")

for id_, vec, meta in src.scan("docs"):
    dest.upsert("docs", id_, vec, meta)

Under 10 lines, and no external ETL tool needed.

Future roadmap

  • Redis & Elasticsearch adapters
  • Batch upserts (upsert_many)
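Until a native upsert_many lands, batching can be emulated client-side. This hypothetical helper works against any object exposing the single-row upsert primitive from the unified API:

```python
def upsert_many(db, name, rows, batch_size=500):
    """Upsert (id, vector, metadata) rows in fixed-size batches.

    Client-side sketch only: `db` is any driver exposing the single-row
    upsert(name, id, vector, metadata) primitive.  Returns rows written.
    """
    count = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            for id_, vec, meta in batch:
                db.upsert(name, id_, vec, meta)
            count += len(batch)
            batch = []
    for id_, vec, meta in batch:   # flush the final partial batch
        db.upsert(name, id_, vec, meta)
    return count + len(batch)
```

A native implementation could instead build one multi-row INSERT per batch, which is where the real speedup would come from.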

Contributing

git clone https://github.com/mihirahuja1/vectorwrap
poetry install
pytest -q

Good-first-issue labels → metadata filter edge cases, docstrings

Code style → Black, ruff, mypy CI-enforced

Discussion → GitHub Discussions or #vectorwrap on pgvector Slack

Try it

If vectorwrap saves you a migration headache, feel free to leave a comment!
