Why Do We Need Specialized Vector Databases?
In the first five articles, we figured out how to chunk documents and generate embeddings. Now where do these vectors live, and how are they efficiently retrieved?
You might wonder: "Can't I just store vectors in Redis or PostgreSQL?"
Not out of the box. Traditional databases are designed for exact queries (e.g., `WHERE id = 123`), while vector retrieval is Approximate Nearest Neighbor (ANN) search: given a query vector, quickly find the Top-K most similar vectors among hundreds of millions of document vectors. Traditional database indexes (B-trees, hash tables) are powerless against this kind of "similarity query"; it takes a purpose-built index, which is exactly what extensions like pgvector (covered below) bolt onto PostgreSQL.
Example:
- Traditional query: find the user with `id = 42` → O(1) or O(log n)
- Vector query: find the 10 users most similar to user A → requires comparing against every vector; brute-force O(n) is too slow at scale
Vector databases use specialized ANN indexes (HNSW, IVF, etc.) to cut search complexity from O(n) to roughly O(log n), enabling millisecond-level similarity search even at billion-vector scale.
Three Core Capabilities of Vector Databases
1. Vector Storage
Store massive numbers of vectors (millions to billions), with each vector attached to raw text and metadata. Support incremental writes and deletions.
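For a concrete feel, here is what incremental writes and deletes look like through LangChain's vector-store interface (a minimal sketch; it assumes the Chroma store created later in this article, and `new_chunks` is a hypothetical list of `Document` objects):

```python
# Minimal sketch: incremental write and delete via the LangChain interface.
# Assumes `vectorstore` is a Chroma instance (see the Chroma section below)
# and `new_chunks` is a hypothetical list of Document objects.
ids = vectorstore.add_documents(new_chunks)  # incremental write; returns the new ids
vectorstore.delete(ids=ids[:1])              # delete a single vector by id
```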
2. ANN (Approximate Nearest Neighbor) Search
Core algorithms:
| Algorithm | Principle | Pros | Cons |
|---|---|---|---|
| HNSW | Hierarchical Navigable Small World graph, multi-layer structure, search from coarse to fine | Fast, accurate, supports dynamic updates | Higher memory usage |
| IVF | Inverted File: partition the vector space into clusters, locate the nearest clusters first, then search within them | Memory-efficient, good for static data | Cluster centroids go stale as data changes, requiring periodic retraining |
| Flat | Brute-force full comparison | 100% accurate | Extremely slow, only for small datasets |
Recommendation: Use Flat for development (simplicity), HNSW for production (best performance).
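To make the Flat-vs-HNSW trade-off concrete, here is a hedged sketch: brute force with plain NumPy versus a graph index built with the `hnswlib` library (assumed installed via `pip install hnswlib`; all parameters are illustrative):

```python
import numpy as np
import hnswlib

dim, n = 128, 100_000
vectors = np.random.rand(n, dim).astype("float32")
query = np.random.rand(dim).astype("float32")

# Flat: compare the query against every stored vector -- exact, but O(n)
scores = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
flat_top10 = np.argsort(-scores)[:10]

# HNSW: build the graph once, then answer queries in roughly O(log n) -- approximate
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors)
index.set_ef(50)  # higher ef = better recall, slower queries
hnsw_top10, distances = index.knn_query(query, k=10)
```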
3. Metadata Filtering
This is the key capability that distinguishes vector databases from pure vector libraries (like FAISS). You can do two things simultaneously:
- Find semantically relevant content using vector similarity
- Apply exact filtering with metadata conditions (e.g., `time > 2024-01-01 AND category = "technical"`)
```python
# Example: only retrieve technical documents from 2024 onward.
# Filter syntax varies by vector database; the dict style below follows
# Chroma's conventions, where multiple conditions need an explicit $and.
results = vectorstore.similarity_search(
    query="microservices monitoring",
    k=5,
    filter={"$and": [
        {"category": "technical"},
        {"year": {"$gte": 2024}},
    ]},
)
```
Mainstream Vector Database Comparison
Five Databases at a Glance
| Database | Positioning | Deployment | Index | Metadata Filter | Best For |
|---|---|---|---|---|---|
| Chroma | Dev/Prototyping | Local/Embedded | HNSW | ✅ | Local quick validation, small projects |
| Qdrant | Production self-hosted | Docker/K8s | HNSW | ✅ | Enterprise choice, strong performance & filtering |
| Weaviate | Hybrid search | Docker/Managed | HNSW | ✅ | When BM25 + vector hybrid retrieval is needed |
| pgvector | PG extension | PostgreSQL plugin | HNSW/IVF | ✅ | Existing PG environment, avoid new databases |
| Pinecone | Managed cloud | Fully managed | Auto | ✅ | Zero ops, quick launch |
Detailed Analysis
Chroma — The Developer's Best Friend
```python
from langchain_chroma import Chroma

# Embedded operation, zero configuration
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
```
- ✅ Zero config, works after pip install
- ✅ Supports persistence to local disk
- ❌ Single-machine performance limited, not for high concurrency
- ❌ Weak distributed capabilities
Qdrant — The Enterprise Choice for Production
```python
from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
vectorstore = Qdrant(
    client=client,
    collection_name="docs",
    embeddings=embeddings,
)
```
- ✅ Written in Rust, extremely high performance
- ✅ Very powerful metadata filtering expressions
- ✅ Supports distributed clustering
- ✅ Cloud-hosted version available
- ❌ Requires additional service deployment
Weaviate — The Hybrid Search Specialist
```python
from langchain_weaviate import WeaviateVectorStore
import weaviate

client = weaviate.connect_to_local()
vectorstore = WeaviateVectorStore(
    client=client,
    index_name="Docs",
    text_key="text",
    embedding=embeddings,
)
```
- ✅ Native BM25 + vector hybrid retrieval support
- ✅ Built-in vectorization module (optional)
- ❌ Higher resource consumption
- ❌ Steep learning curve
pgvector — A Blessing for PostgreSQL Users
```sql
-- Install the extension in PostgreSQL
CREATE EXTENSION vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1024),
    category VARCHAR(50)
);

-- Create an HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
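For completeness, a similarity query in raw SQL looks like this (a sketch; `<=>` is pgvector's cosine-distance operator, and the 1024-dimensional query embedding is bound as a parameter):

```sql
-- Top-10 most similar documents; lower cosine distance = more similar.
-- $1 is the query embedding (a 1024-dim vector) bound as a parameter.
SELECT id, content, embedding <=> $1 AS distance
FROM documents
WHERE category = 'technical'          -- metadata filtering is just SQL
ORDER BY embedding <=> $1
LIMIT 10;
```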
```python
from langchain_community.vectorstores import PGVector

vectorstore = PGVector(
    connection_string="postgresql://user:pass@localhost/db",
    embedding_function=embeddings,
    collection_name="docs",
)
```
- ✅ Seamless integration with existing PG databases
- ✅ Full SQL expressiveness
- ✅ Transaction support (ACID)
- ❌ Vector retrieval performance lower than dedicated databases
- ❌ PostgreSQL itself becomes bottleneck at large scale
Pinecone — For Those Who Want Zero Ops
```python
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("docs")
vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```
- ✅ Fully managed, zero operations
- ✅ Auto-scaling
- ✅ Good metadata filtering support
- ❌ Higher price
- ❌ Data lock-in (high migration cost)
Practical: Chroma (Development) vs Qdrant (Production)
Scenario
Let's build RAG for a technical blog system:
- Development: Use Chroma for quick validation, local execution
- Production: Migrate to Qdrant, supporting multi-tenancy and metadata filtering
Development Phase — Chroma
```python
import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# BGE embeddings served through SiliconFlow's OpenAI-compatible endpoint
embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-zh-v1.5",
    api_key=os.getenv("SILICONFLOW_API_KEY"),
    base_url="https://api.siliconflow.cn/v1",
    chunk_size=32,
)

# Create the local vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_metadata={"hnsw:space": "cosine"},
)

# Search
results = vectorstore.similarity_search("microservices decomposition principles", k=3)
for doc in results:
    print(f"Source: {doc.metadata['source']}")
    print(f"Content: {doc.page_content[:100]}...")
    print()

# Persistence (Chroma auto-saves)
# Next time, load with:
# vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
```
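While tuning chunking and embeddings, it also helps to inspect the raw scores; a short sketch:

```python
# similarity_search_with_score returns (Document, score) pairs; with the
# cosine space configured above, Chroma reports a distance, so lower = more similar
for doc, score in vectorstore.similarity_search_with_score(
    "microservices decomposition principles", k=3
):
    print(f"{score:.4f}  {doc.metadata['source']}")
```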
Production Phase — Qdrant
```python
from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient, models

# Connect to the Qdrant service
client = QdrantClient(url="http://localhost:6333")

# Create a collection (roughly analogous to a database table;
# this errors if the collection already exists)
client.create_collection(
    collection_name="blog_docs",
    vectors_config=models.VectorParams(
        size=1024,  # BGE-large-zh dimensions
        distance=models.Distance.COSINE,
    ),
)

# Write data
vectorstore = Qdrant(
    client=client,
    collection_name="blog_docs",
    embeddings=embeddings,
)
vectorstore.add_documents(documents=chunks)

# Search with metadata filtering. Note: the LangChain integration stores
# document metadata under the "metadata" payload key, so filter keys are
# prefixed with "metadata."
results = vectorstore.similarity_search(
    query="microservices monitoring",
    k=5,
    filter=models.Filter(
        must=[
            models.FieldCondition(
                key="metadata.category",
                match=models.MatchValue(value="microservices"),
            ),
            models.FieldCondition(
                key="metadata.year",
                range=models.Range(gte=2024),
            ),
        ]
    ),
)
```
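In an actual RAG chain you would usually wrap the store as a retriever rather than calling `similarity_search` directly; a minimal sketch:

```python
# Wrap the store as a LangChain retriever so it plugs into a RAG chain;
# search_kwargs can also carry a Qdrant filter like the one above
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke("microservices monitoring")
```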
Migrating from Chroma to Qdrant
```python
from langchain_core.documents import Document

# 1. Export from Chroma
chroma_store = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
all_docs = chroma_store.get()

# 2. Point at the Qdrant collection created above (reusing the same client)
qdrant_store = Qdrant(
    client=client,
    collection_name="blog_docs",
    embeddings=embeddings,
)

# 3. Rebuild Documents and batch-write (Qdrant supports efficient batch import)
docs = [
    Document(page_content=text, metadata=meta)
    for text, meta in zip(all_docs["documents"], all_docs["metadatas"])
]
qdrant_store.add_documents(docs)
```
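A quick sanity check after migrating (a sketch using the Qdrant client's count API):

```python
# Document counts should match after migration
migrated = client.count(collection_name="blog_docs", exact=True).count
assert migrated == len(all_docs["documents"]), "migration dropped documents"
```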
How to Choose a Similarity Algorithm?
When comparing two vectors, the database needs a "distance metric." Three common ones:
Cosine Similarity
cosine(A, B) = (A · B) / (||A|| × ||B||)
- Measures: Cosine of the angle between two vectors
- Characteristics: Only cares about direction, not magnitude
- Best for: Text semantic similarity (most common choice)
- Range: -1 (opposite) to 1 (identical); as a rough rule of thumb, scores above ~0.7 indicate similarity, though the threshold varies by embedding model
Dot Product
dot(A, B) = A · B = Σ(Ai × Bi)
- Measures: Sum of element-wise products
- Characteristics: Considers both direction and magnitude
- Best for: Recommendation systems (preference intensity matters)
- Note: If vectors aren't normalized, dot product is affected by vector length
Euclidean Distance
euclidean(A, B) = √(Σ(Ai - Bi)²)
- Measures: Straight-line distance between two points
- Characteristics: Absolute distance, sensitive to numerical differences
- Best for: Image retrieval, numerical feature scenarios
- Range: 0 (identical) to ∞, smaller is more similar
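The difference between the three metrics is easy to see on toy vectors; a small NumPy sketch (values are illustrative):

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([2.0, 4.0, 6.0])  # same direction as A, twice the magnitude

cosine = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))  # 1.0  -- direction only
dot = A @ B                                               # 28.0 -- magnitude matters
euclid = np.linalg.norm(A - B)                            # ~3.74 -- absolute distance

# After L2 normalization, dot product and cosine similarity agree:
An, Bn = A / np.linalg.norm(A), B / np.linalg.norm(B)
assert np.isclose(An @ Bn, cosine)
```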
Selection Guide
| Scenario | Recommended Algorithm | Reasoning |
|---|---|---|
| Text semantic retrieval | Cosine Similarity | Standard choice, insensitive to vector length |
| Recommendation systems | Dot Product | Considers user interest intensity |
| Image retrieval | Euclidean Distance | Pixel-level differences are more intuitive |
| Uncertain | Cosine Similarity | Safest choice |
⚠️ Important: The embedding model and similarity algorithm must match! BGE models recommend cosine similarity, and OpenAI text-embedding-3 series also recommend cosine similarity.
Metadata Filtering: From "Needle in a Haystack" to "Precise Targeting"
Why Metadata Filtering?
Suppose your knowledge base has 100,000 documents covering tech, product, operations, and sales. A user asks: "What are this year's sales targets?"
Pure vector retrieval might return:
- ✅ Sales department 2024 target document
- ❌ Operations document mentioning "sales" workflows
- ❌ Technical document about "sales system architecture"
Adding a metadata filter like `{"department": "sales", "year": 2024}` precisely scopes the search.
Metadata Filtering in LangChain
```python
from langchain_chroma import Chroma
from langchain_core.documents import Document

# Write with metadata
docs = [
    Document(
        page_content="2024 sales target: 30% revenue growth...",
        metadata={"department": "sales", "year": 2024, "type": "target"},
    ),
    Document(
        page_content="Sales system uses Redis caching...",
        metadata={"department": "tech", "year": 2024, "type": "architecture"},
    ),
]
vectorstore = Chroma.from_documents(docs, embeddings)

# Search with a filter (Chroma requires $and to combine multiple conditions)
results = vectorstore.similarity_search(
    "sales targets",
    k=3,
    filter={"$and": [{"department": "sales"}, {"year": 2024}]},
)
```
Qdrant Advanced Filtering Expressions
```python
from qdrant_client import models

# Named search_filter to avoid shadowing Python's built-in filter().
# (When querying through LangChain, prefix keys with "metadata." as shown earlier.)
search_filter = models.Filter(
    must=[  # AND conditions
        models.FieldCondition(key="department", match=models.MatchValue(value="sales")),
        models.FieldCondition(key="year", range=models.Range(gte=2024)),
    ],
    should=[  # OR conditions: at least one should match
        models.FieldCondition(key="type", match=models.MatchValue(value="target")),
        models.FieldCondition(key="type", match=models.MatchValue(value="summary")),
    ],
    must_not=[  # NOT conditions
        models.FieldCondition(key="status", match=models.MatchValue(value="draft")),
    ],
)
```
Selection Summary
By Scenario
| Scenario | Recommended Database | Reasoning |
|---|---|---|
| Local dev / quick prototype | Chroma | Zero config, pip install and go |
| Production self-hosted | Qdrant | Best performance, most flexible filtering, stable Rust |
| Need hybrid search (BM25 + vector) | Weaviate | Native support for both retrieval types |
| Existing PostgreSQL | pgvector | No new components, full SQL expressiveness |
| Zero ops desired | Pinecone | Fully managed, auto-scales |
| Ultra-large scale (billion vectors) | Milvus | Designed for massive vector scale (not covered in detail) |
Migration Path from Dev to Production
```
Phase 1: Development Validation
└── Chroma (local embedded, zero config)
        ↓
Phase 2: Testing Environment
└── Qdrant Docker (single node, validate functionality)
        ↓
Phase 3: Production Launch
└── Qdrant Cluster / Pinecone Managed (high availability)
```
Summary
This article covered the core knowledge of vector databases:
- Why vector databases are needed — ANN retrieval is something traditional databases can't do
- Three core capabilities — storage, ANN retrieval, metadata filtering
- Five database comparison — Chroma, Qdrant, Weaviate, pgvector, Pinecone
- Practical code — complete examples for Chroma development and Qdrant production
- Similarity algorithms — choosing among cosine, dot product, and Euclidean distance
- Metadata filtering — from "needle in a haystack" to "precise targeting"
Key Insight: The best vector database isn't the most expensive one — it's the one that fits your scenario. Use Chroma for development, Qdrant for production, pgvector if you have PostgreSQL, Pinecone if you want zero ops. No silver bullet, only the right fit.