TL;DR: Embex is a universal ORM for vector databases that lets you switch between Qdrant, Pinecone, Chroma, LanceDB, Milvus, Weaviate, and PgVector without changing a single line of code. Built on a Rust core with SIMD acceleration, it's 4x faster than pure Python/JS implementations.
The Problem I Was Trying to Solve
If you've worked with vector databases, you know the pain:
- Every database has a different API: Qdrant uses one format, Pinecone another, Chroma yet another
- Vendor lock-in: switching providers means rewriting your entire codebase
- Performance trade-offs: most clients are pure Python/JS, leaving performance on the table
- Setup complexity: getting started requires Docker, API keys, cloud accounts...
I wanted a solution that:
- ✅ Works with any vector database
- ✅ Lets you start with zero setup
- ✅ Delivers production-grade performance
- ✅ Scales from prototype to production
So I built Embex.
What Makes Embex Different?
1. Universal API - Switch Providers Instantly
```python
# Development: LanceDB (embedded, zero setup)
client = await EmbexClient.new_async("lancedb://./data")

# Production: switch to Qdrant (just change the connection string!)
client = await EmbexClient.new_async("qdrant://https://your-cluster.com", api_key="...")

# Everything else stays the same!
await client.insert("products", points)
results = await client.search(collection_name="products", vector=vector, top_k=10)
```
Same code. Different backend. Zero rewrites.
Note: Embex uses async initialization (`new_async()`) to properly handle both local (LanceDB) and cloud providers (Qdrant, Pinecone) with the same API. This ensures consistent behavior across all providers.
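In practice you can make the switch a pure configuration change by reading the connection string from the environment. A minimal sketch (the `EMBEX_URL` / `EMBEX_API_KEY` variable names and the `get_client()` helper are illustrative, not part of Embex):

```python
import os

from embex import EmbexClient

async def get_client() -> EmbexClient:
    # Dev default: embedded LanceDB. Override in staging/prod, e.g.
    #   EMBEX_URL="qdrant://https://your-cluster.com"
    url = os.environ.get("EMBEX_URL", "lancedb://./data")
    api_key = os.environ.get("EMBEX_API_KEY")  # only set for cloud providers
    if api_key:
        return await EmbexClient.new_async(url, api_key=api_key)
    return await EmbexClient.new_async(url)
```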
2. Rust Core with SIMD Acceleration
Embex isn't a wrapper—it's built on a shared Rust core with SIMD intrinsics:
- 4x faster vector operations (dot product, cosine similarity)
- Zero-copy data passing between languages
- < 5% overhead vs native clients
```python
# This runs on optimized Rust code with AVX2/NEON
results = await client.search(collection_name="products", vector=query_vector, top_k=10)
```
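For reference, here is the operation the SIMD path accelerates, written out in plain Python (illustrative only; Embex performs this in the Rust core with AVX2/NEON intrinsics):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|): the hot loop Embex vectorizes with SIMD
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

A scalar loop like this is exactly where SIMD speedups come from: AVX2/NEON process multiple float lanes per instruction instead of one.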
3. Start Simple, Scale Later
Day 1: Learning

```python
# LanceDB - embedded, no Docker, no API keys
client = await EmbexClient.new_async("lancedb://./data")
```

Week 2: Staging

```python
# Qdrant - managed cloud, connection pooling
client = await EmbexClient.new_async("qdrant://https://...", api_key="...")
```

Month 1: Scale

```python
# Milvus - billion-scale vectors, distributed
client = await EmbexClient.new_async("milvus://https://...")
```
Quick Start: Semantic Search in 5 Minutes
```python
import asyncio

from embex import EmbexClient, Point
from sentence_transformers import SentenceTransformer

async def main():
    # 1. Setup (LanceDB - zero setup!)
    # Use new_async() for proper initialization across all providers
    client = await EmbexClient.new_async("lancedb://./data")
    model = SentenceTransformer('all-MiniLM-L6-v2')

    # 2. Create collection
    await client.create_collection("products", dimension=384, distance="cosine")

    # 3. Insert documents
    docs = [
        "Apple iPhone 15 Pro Max",
        "Samsung Galaxy S24 Ultra",
        "Fresh Organic Bananas"
    ]
    points = [
        Point(
            id=str(i),
            vector=model.encode(doc).tolist(),
            metadata={"text": doc}
        )
        for i, doc in enumerate(docs)
    ]
    await client.insert("products", points)

    # 4. Search
    query = "smartphone"
    query_vector = model.encode(query).tolist()
    results = await client.search(collection_name="products", vector=query_vector, top_k=2)

    for result in results.results:
        print(f"{result.metadata['text']} (score: {result.score:.3f})")

asyncio.run(main())
```
That's it! No Docker, no API keys, no cloud setup. Just `pip install embex lancedb sentence-transformers` and you're running.
Performance Benchmarks
Based on our benchmarks:
- SIMD Acceleration: 3.6x - 4.0x faster dot product/cosine similarity
- Minimal Overhead: < 5% vs native clients
- Package Size: 8-15MB (optimized from 65MB+)
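Numbers like these vary by hardware and dataset, so it's worth measuring on your own setup. A rough latency harness you could adapt (assumes the `products` collection from the quick start already exists; the harness itself is not part of Embex):

```python
import asyncio
import time

from embex import EmbexClient

async def bench(n: int = 100) -> None:
    client = await EmbexClient.new_async("lancedb://./data")
    query_vector = [0.1] * 384  # matches the quick-start collection dimension

    start = time.perf_counter()
    for _ in range(n):
        await client.search(collection_name="products", vector=query_vector, top_k=10)
    elapsed = time.perf_counter() - start
    print(f"{n} searches in {elapsed:.3f}s ({1000 * elapsed / n:.2f} ms/query)")

asyncio.run(bench())
```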
Production Features
Embex comes with everything you need for production:
- ✅ Connection Pooling: High-concurrency connection management
- ✅ Migrations: Git-like version control for schema
- ✅ Observability: OpenTelemetry metrics and tracing
- ✅ Type Safety: Full TypeScript definitions and Python type hints (see the sketch below)
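To illustrate the type-safety point, the whole insert path can be annotated end to end. A small sketch reusing `Point` from the quick start (the `index_texts` helper is my own, not an Embex API):

```python
from embex import EmbexClient, Point

async def index_texts(
    client: EmbexClient,
    collection: str,
    texts: list[str],
    vectors: list[list[float]],
) -> None:
    # With full annotations, mypy/pyright catch shape and payload
    # mismatches before anything hits the database
    points = [
        Point(id=str(i), vector=vec, metadata={"text": text})
        for i, (text, vec) in enumerate(zip(texts, vectors))
    ]
    await client.insert(collection, points)
```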
Supported Providers
- LanceDB - Embedded, zero setup
- Qdrant - Production & local
- Pinecone - Serverless
- Chroma - AI-native
- PgVector - PostgreSQL extension
- Milvus - Billion-scale
- Weaviate - Modular
Try It Now
```bash
# Python
pip install embex lancedb sentence-transformers

# Node.js
npm install @bridgerust/embex lancedb @xenova/transformers
```
Documentation: bridgerust.dev/embex
GitHub: github.com/bridgerust/bridgerust
What's Next?
I'm actively developing Embex and would love your feedback:
- What features are missing?
- What use cases should I prioritize?
- Performance issues you've encountered?
- Provider support requests?
Questions? Feedback? Issues? Drop a comment below or open an issue on GitHub!