DEV Community

WonderLab
RAG Series (6): Vector Databases — Storage and Retrieval Infrastructure

Why Do We Need Specialized Vector Databases?

In the first five articles, we figured out how to chunk documents and generate embeddings. Now where do these vectors live, and how are they efficiently retrieved?

You might wonder: "Can't I just store vectors in Redis or PostgreSQL?"

No — traditional databases are designed for exact queries (e.g., WHERE id = 123), while vector retrieval is Approximate Nearest Neighbor (ANN) search: given a query vector, quickly find the Top-K most similar vectors among hundreds of millions of document vectors. Traditional database indexes (B-trees, hash tables) are powerless against this type of "similarity query."

Example:

  • Traditional query: Find user with id=42 → O(1) or O(log n)
  • Vector query: Find 10 people most similar to user A → requires comparing against all vectors, brute-force O(n) is too slow

Vector databases use specialized ANN indexes (HNSW, IVF, etc.) to cut the search cost from O(n) to roughly O(log n), returning similarity results in milliseconds even across billions of vectors.
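
To make the O(n) cost concrete, here is the brute-force Top-K search that a Flat index performs and that ANN indexes exist to avoid — a minimal NumPy sketch with synthetic vectors, not tied to any particular database:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 64))                  # 10k "document" vectors, dim=64
docs /= np.linalg.norm(docs, axis=1, keepdims=True)   # unit-normalize for cosine

query = docs[42] + 0.01 * rng.normal(size=64)         # a query very close to doc 42
query /= np.linalg.norm(query)

# Brute force: one dot product per stored vector -> O(n * dim) work per query
scores = docs @ query                  # cosine similarity (vectors are unit-length)
top_k = np.argsort(-scores)[:10]       # indices of the 10 most similar vectors

print(top_k[0])  # doc 42 should rank first
```

At 10k vectors this finishes instantly; at hundreds of millions, that per-query full scan is exactly what becomes unaffordable.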


Three Core Capabilities of Vector Databases

1. Vector Storage

Store massive numbers of vectors (millions to billions), with each vector attached to raw text and metadata. Support incremental writes and deletions.

2. ANN Approximate Nearest Neighbor Search

Core algorithms:

| Algorithm | Principle | Pros | Cons |
|---|---|---|---|
| HNSW | Hierarchical Navigable Small World graph; multi-layer structure searched from coarse to fine | Fast, accurate, supports dynamic updates | Higher memory usage |
| IVF | Inverted File; partitions the vector space into clusters, finds the nearest clusters first, then searches within them | Memory-efficient, good for static data | Adding/removing vectors requires an index rebuild |
| Flat | Brute-force comparison against every vector | 100% accurate | Extremely slow; only for small datasets |

Recommendation: Use Flat for development (simplicity), HNSW for production (best performance).
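
The IVF idea from the table fits in a few lines: partition the vectors into clusters once, then at query time scan only the few clusters nearest the query instead of the whole collection. A toy NumPy sketch (illustrative only — real IVF implementations such as FAISS use proper k-means training and add quantization; here centroids are just randomly sampled vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
docs = rng.normal(size=(5_000, 32)).astype(np.float32)

# "Training": pick cluster centroids (random docs here; real IVF runs k-means)
n_clusters = 50
centroids = docs[rng.choice(len(docs), n_clusters, replace=False)]

# Build the inverted file: assign every vector to its nearest centroid
assign = np.argmin(((docs[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

def ivf_search(query, k=5, n_probe=3):
    """Scan only the n_probe clusters nearest to the query."""
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    cand = np.where(np.isin(assign, order))[0]   # candidate vector ids
    d = ((docs[cand] - query) ** 2).sum(-1)      # exact distances on a small set
    return cand[np.argsort(d)[:k]]

hits = ivf_search(docs[7])   # probes ~3/50 of the data instead of all of it
```

The speed/recall trade-off is the `n_probe` parameter: probing more clusters raises recall at the cost of scanning more candidates.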

3. Metadata Filtering

This is the key capability that distinguishes vector databases from pure vector libraries (like FAISS). You can do two things simultaneously:

  • Find semantically relevant content using vector similarity
  • Apply exact filtering with metadata conditions (e.g., time > 2024-01-01 AND category = "technical")
# Example: only retrieve technical documents from 2024 onward
# (Mongo-style filter syntax as used by e.g. Pinecone; exact syntax varies by store)
results = vectorstore.similarity_search(
    query="microservices monitoring",
    k=5,
    filter={
        "category": "technical",
        "year": {"$gte": 2024}
    }
)

Mainstream Vector Database Comparison

Five Databases at a Glance

| Database | Positioning | Deployment | Index | Metadata Filtering | Best For |
|---|---|---|---|---|---|
| Chroma | Dev/prototyping | Local/embedded | HNSW | Basic | Quick local validation, small projects |
| Qdrant | Production self-hosted | Docker/K8s | HNSW | Very strong | Enterprise choice; strong performance and filtering |
| Weaviate | Hybrid search | Docker/managed | HNSW | Strong | When BM25 + vector hybrid retrieval is needed |
| pgvector | PG extension | PostgreSQL plugin | HNSW/IVF | Full SQL | Existing PG environments; avoids adding a new database |
| Pinecone | Managed cloud | Fully managed | Auto | Good | Zero ops, quick launch |

Detailed Analysis

Chroma — The Developer's Best Friend

from langchain_chroma import Chroma

# Embedded operation, zero configuration
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
  • ✅ Zero config, works after pip install
  • ✅ Supports persistence to local disk
  • ❌ Single-machine performance limited, not for high concurrency
  • ❌ Weak distributed capabilities

Qdrant — The Enterprise Choice for Production

from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
vectorstore = Qdrant(
    client=client,
    collection_name="docs",
    embeddings=embeddings,
)
  • ✅ Written in Rust, extremely high performance
  • ✅ Very powerful metadata filtering expressions
  • ✅ Supports distributed clustering
  • ✅ Cloud-hosted version available
  • ❌ Requires additional service deployment

Weaviate — The Hybrid Search Specialist

from langchain_weaviate import WeaviateVectorStore
import weaviate

client = weaviate.connect_to_local()
vectorstore = WeaviateVectorStore(
    client=client,
    index_name="Docs",
    text_key="text",
    embedding=embeddings,
)
  • ✅ Native BM25 + vector hybrid retrieval support
  • ✅ Built-in vectorization module (optional)
  • ❌ Higher resource consumption
  • ❌ Steep learning curve

pgvector — A Blessing for PostgreSQL Users

-- Install extension in PostgreSQL
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1024),
    category VARCHAR(50)
);

-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
from langchain_community.vectorstores import PGVector

vectorstore = PGVector(
    connection_string="postgresql://user:pass@localhost/db",
    embedding_function=embeddings,
    collection_name="docs",
)
  • ✅ Seamless integration with existing PG databases
  • ✅ Full SQL expressiveness
  • ✅ Transaction support (ACID)
  • ❌ Vector retrieval performance lower than dedicated databases
  • ❌ PostgreSQL itself becomes bottleneck at large scale

Pinecone — For Those Who Want Zero Ops

from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("docs")
vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
  • ✅ Fully managed, zero operations
  • ✅ Auto-scaling
  • ✅ Good metadata filtering support
  • ❌ Higher price
  • ❌ Data lock-in (high migration cost)

Practical: Chroma (Development) vs Qdrant (Production)

Scenario

Let's build RAG for a technical blog system:

  • Development: Use Chroma for quick validation, local execution
  • Production: Migrate to Qdrant, supporting multi-tenancy and metadata filtering

Development Phase — Chroma

import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-zh-v1.5",
    api_key=os.getenv("SILICONFLOW_API_KEY"),
    base_url="https://api.siliconflow.cn/v1",
    chunk_size=32,
)

# Create local vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_metadata={"hnsw:space": "cosine"}
)

# Search
results = vectorstore.similarity_search("microservices decomposition principles", k=3)
for doc in results:
    print(f"Source: {doc.metadata['source']}")
    print(f"Content: {doc.page_content[:100]}...")
    print()

# Persistence (Chroma auto-saves)
# Next time load with:
# vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

Production Phase — Qdrant

from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient, models

# Connect to Qdrant service
client = QdrantClient(url="http://localhost:6333")

# Create the collection if it doesn't exist yet (like a database table)
if not client.collection_exists("blog_docs"):
    client.create_collection(
        collection_name="blog_docs",
        vectors_config=models.VectorParams(
            size=1024,  # BGE-large-zh embedding dimensions
            distance=models.Distance.COSINE,
        ),
    )

# Write data
vectorstore = Qdrant(
    client=client,
    collection_name="blog_docs",
    embeddings=embeddings,
)

vectorstore.add_documents(documents=chunks)

# Search with metadata filtering
# (LangChain's Qdrant wrapper stores document metadata under the "metadata"
#  payload key, so filter conditions use the "metadata." prefix)
results = vectorstore.similarity_search(
    query="microservices monitoring",
    k=5,
    filter=models.Filter(
        must=[
            models.FieldCondition(
                key="metadata.category",
                match=models.MatchValue(value="microservices")
            ),
            models.FieldCondition(
                key="metadata.year",
                range=models.Range(gte=2024)
            ),
        ]
    )
)

Migrating from Chroma to Qdrant

# 1. Export from Chroma (get() returns documents and metadatas by default)
chroma_store = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
all_docs = chroma_store.get()

# 2. Import to Qdrant (assumes the QdrantClient and collection from the previous section)
qdrant_store = Qdrant(
    client=qdrant_client,
    collection_name="blog_docs",
    embeddings=embeddings,
)

# 3. Batch write (Qdrant supports efficient batch import)
from langchain_core.documents import Document

docs = [
    Document(page_content=text, metadata=meta)
    for text, meta in zip(all_docs["documents"], all_docs["metadatas"])
]
qdrant_store.add_documents(docs)

How to Choose a Similarity Algorithm?

When comparing two vectors, the database needs a "distance metric." Three common ones:

Cosine Similarity

cosine(A, B) = (A · B) / (||A|| × ||B||)
  • Measures: Cosine of the angle between two vectors
  • Characteristics: Only cares about direction, not magnitude
  • Best for: Text semantic similarity (most common choice)
  • Range: -1 (opposite) to 1 (identical); as a rule of thumb, scores above 0.7 usually indicate similarity, though the exact threshold depends on the embedding model

Dot Product

dot(A, B) = A · B = Σ(Ai × Bi)
  • Measures: Sum of element-wise products
  • Characteristics: Considers both direction and magnitude
  • Best for: Recommendation systems (preference intensity matters)
  • Note: If vectors aren't normalized, dot product is affected by vector length

Euclidean Distance

euclidean(A, B) = √(Σ(Ai - Bi)²)
  • Measures: Straight-line distance between two points
  • Characteristics: Absolute distance, sensitive to numerical differences
  • Best for: Image retrieval, numerical feature scenarios
  • Range: 0 (identical) to ∞, smaller is more similar
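
For unit-normalized vectors the three metrics agree on ranking: cosine equals dot product, and squared Euclidean distance equals 2 − 2·cosine. A quick NumPy check with two hand-picked unit vectors (illustrative):

```python
import numpy as np

a = np.array([0.6, 0.8])   # already unit length
b = np.array([1.0, 0.0])   # unit length

cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine similarity
dot = a @ b                                             # dot product
euc = np.linalg.norm(a - b)                             # Euclidean distance

print(round(cos, 4))        # 0.6
print(round(dot, 4))        # 0.6  (equals cosine: both vectors are unit-length)
print(round(euc ** 2, 4))   # 0.8  = 2 - 2 * 0.6
```

This is why the metric choice matters most for unnormalized embeddings; once vectors are normalized, cosine is the safe default.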

Selection Guide

| Scenario | Recommended Algorithm | Reasoning |
|---|---|---|
| Text semantic retrieval | Cosine similarity | Standard choice; insensitive to vector length |
| Recommendation systems | Dot product | Captures user interest intensity |
| Image retrieval | Euclidean distance | Pixel-level differences are more intuitive |
| Uncertain | Cosine similarity | Safest default |

⚠️ Important: The embedding model and similarity algorithm must match! BGE models recommend cosine similarity, and OpenAI text-embedding-3 series also recommend cosine similarity.


Metadata Filtering: From "Needle in a Haystack" to "Precise Targeting"

Why Metadata Filtering?

Suppose your knowledge base has 100,000 documents covering tech, product, operations, and sales. A user asks: "What are this year's sales targets?"

Pure vector retrieval might return:

  • ✅ Sales department 2024 target document
  • ❌ Operations document mentioning "sales" workflows
  • ❌ Technical document about "sales system architecture"

Adding metadata filter {"department": "sales", "year": 2024} precisely scopes the search.

Metadata Filtering in LangChain

from langchain_chroma import Chroma
from langchain_core.documents import Document

# Write with metadata
docs = [
    Document(
        page_content="2024 sales target: 30% revenue growth...",
        metadata={"department": "sales", "year": 2024, "type": "target"}
    ),
    Document(
        page_content="Sales system uses Redis caching...",
        metadata={"department": "tech", "year": 2024, "type": "architecture"}
    ),
]

vectorstore = Chroma.from_documents(docs, embeddings)

# Search with filter (Chroma requires $and to combine multiple conditions)
results = vectorstore.similarity_search(
    "sales targets",
    k=3,
    filter={"$and": [{"department": "sales"}, {"year": 2024}]}
)

Qdrant Advanced Filtering Expressions

from qdrant_client import models

filter = models.Filter(
    must=[  # AND conditions
        models.FieldCondition(key="department", match=models.MatchValue(value="sales")),
        models.FieldCondition(key="year", range=models.Range(gte=2024)),
    ],
    should=[  # OR conditions
        models.FieldCondition(key="type", match=models.MatchValue(value="target")),
        models.FieldCondition(key="type", match=models.MatchValue(value="summary")),
    ],
    must_not=[  # NOT conditions
        models.FieldCondition(key="status", match=models.MatchValue(value="draft")),
    ]
)

Selection Summary

By Scenario

| Scenario | Recommended Database | Reasoning |
|---|---|---|
| Local dev / quick prototyping | Chroma | Zero config; pip install and go |
| Self-hosted production | Qdrant | Best performance, most flexible filtering, stable Rust core |
| Hybrid search (BM25 + vector) | Weaviate | Native support for both retrieval types |
| Existing PostgreSQL stack | pgvector | No new components; full SQL expressiveness |
| Zero ops desired | Pinecone | Fully managed, auto-scales |
| Ultra-large scale (billions of vectors) | Milvus | Designed for massive vector scale (not covered in detail here) |

Migration Path from Dev to Production

Phase 1: Development Validation
    └── Chroma (local embedded, zero config)
            ↓
Phase 2: Testing Environment
    └── Qdrant Docker (single node, validate functionality)
            ↓
Phase 3: Production Launch
    └── Qdrant Cluster / Pinecone Managed (high availability)

Summary

This article covered the core knowledge of vector databases:

  1. Why vector databases are needed — ANN retrieval is something traditional databases can't do
  2. Three core capabilities — storage, ANN retrieval, metadata filtering
  3. Five database comparison — Chroma, Qdrant, Weaviate, pgvector, Pinecone
  4. Practical code — complete examples for Chroma development and Qdrant production
  5. Similarity algorithms — choosing among cosine, dot product, and Euclidean distance
  6. Metadata filtering — from "needle in a haystack" to "precise targeting"

Key Insight: The best vector database isn't the most expensive one — it's the one that fits your scenario. Use Chroma for development, Qdrant for production, pgvector if you have PostgreSQL, Pinecone if you want zero ops. No silver bullet, only the right fit.

