丁久

Posted on • Originally published at dingjiu1989-hue.github.io

Semantic Search Implementation Guide: Embeddings, Vector Databases, and Reranking

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Beyond Keyword Search

Keyword search (TF-IDF, BM25) matches exact words — great when users type the right keywords, terrible when they don't. Semantic search understands meaning: "how to deploy a Next.js app" matches "deploy a React application" even without shared keywords. In 2026, implementing semantic search is practical with open-source tools and embedding APIs. Here's how to actually build it.

The Architecture

```
┌──────────┐    ┌──────────────┐    ┌─────────────────┐    ┌──────────┐
│  Query   │───▶│  Embedding   │───▶│  Vector Search  │───▶│ Results  │
│  String  │    │  Model/API   │    │ (pgvector, etc) │    │ (ranked) │
└──────────┘    └──────────────┘    └─────────────────┘    └──────────┘
                       │                     │
                       ▼                     ▼
                float32[] array       cosine similarity
              (1536 dims typical)    or approximate (ANN)
```




Embedding Model Comparison

| Model | Dimensions | Max Tokens | Cost | MTEB Score (Retrieval) | Self-Hostable |
| --- | --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-small | 512/1536 | 8,191 | $0.02/1M tokens | 62.3 (1536d) | No |
| OpenAI text-embedding-3-large | 256/1024/3072 | 8,191 | $0.13/1M tokens | 64.6 (3072d) | No |
| Cohere Embed v4 | 1024/2048 | 8,192 | $0.10/1M tokens | 63.8 | No |
| BGE-M3 (BAAI) | 1024 | 8,192 | Free (self-host) | 62.0 | Yes (MIT) |
| jina-embeddings-v3 | 1024 | 8,192 | $0.02/1M tokens | 62.5 | No (API only) |
| gte-Qwen2-7B-instruct | 3584 | 32,768 | Free (self-host) | 66.3 (leading) | Yes (Apache 2.0) |

MTEB = Massive Text Embedding Benchmark. Higher is better. Scores from MTEB leaderboard as of early 2026.

Vector Database Options

| Database | Type | Index Types | Filtering | Best For | Pricing |
| --- | --- | --- | --- | --- | --- |
| pgvector (PostgreSQL) | Postgres extension | IVFFlat, HNSW | Full SQL WHERE + joins | Apps already on Postgres, metadata-rich filtering | Free (OSS, Postgres license) |
| Qdrant | Dedicated vector DB | HNSW, quantization (binary, scalar, product) | Payload filtering | High performance, advanced quantization, filtering | Free (OSS) / Cloud from $25/mo |
| Pinecone | Managed vector DB | Proprietary (serverless) | Metadata filtering | Zero-ops, serverless scaling, no tuning needed | Free tier (2GB) → $0.33/GB/mo |
| Weaviate | Vector + hybrid DB | HNSW, flat, dynamic | GraphQL filtering, BM25 + vector hybrid | Hybrid search (keyword + semantic), built-in modules | Free (OSS) / Cloud from $25/mo |
| Milvus | Distributed vector DB | 12+ index types | Scalar filtering, boolean expressions | Billion-scale vectors, distributed, GPU acceleration | Free (OSS) / Cloud from $0.55/hr |

Implementation Steps

Step 1: Chunk your documents. The quality of your chunks determines the quality of your search. Common strategies:

- Fixed-size: simple; 256-512 tokens with overlap.
- Sentence-based: split on sentence boundaries.
- Recursive character splitting: LangChain's default; tries separators in order (`\n\n`, `\n`, `.`, space) until chunks fit.
- Semantic chunking: use a smaller model to detect topic boundaries; most accurate, most expensive.

For most applications, recursive splitting with 512-token chunks and 50-token overlap works well. For code search, split on function/class boundaries.
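A minimal sketch of the recursive splitting idea, written from scratch rather than taken from LangChain. It counts characters instead of tokens and omits chunk overlap for brevity; in practice you would count tokens (e.g. with a tokenizer like tiktoken) and carry overlap between chunks:

```python
def recursive_split(text, max_len=512, separators=("\n\n", "\n", ". ", " ")):
    """Split text on the coarsest separator that yields pieces no longer
    than max_len characters, recursing to finer separators as needed.
    Separators at chunk boundaries are consumed; overlap is omitted."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) <= 1:
            continue  # separator absent; try a finer one
        chunks, current = [], ""
        for part in parts:
            candidate = current + sep + part if current else part
            if len(candidate) <= max_len:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                if len(part) > max_len:
                    # The piece is still too big: recurse with finer separators
                    chunks.extend(recursive_split(part, max_len, separators))
                    current = ""
                else:
                    current = part
        if current:
            chunks.append(current)
        return chunks
    # No separator present at all: hard split by length
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```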

Step 2: Generate and store embeddings. For a collection of 10,000 documents with 512-token chunks: ~15,000 chunks × 1,536 dimensions × 4 bytes = ~92 MB of vectors. This fits easily in pgvector on a small Postgres instance. Batch embedding generation: 15,000 chunks via OpenAI text-embedding-3-small = ~$0.20. Cache embeddings — don't re-embed unchanged documents.
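The "don't re-embed unchanged documents" advice amounts to keying embeddings by a content hash. A sketch of that cache, where `embed_batch` is a hypothetical stand-in for your real embedding API call (it returns a fake deterministic vector here so the example is self-contained):

```python
import hashlib

def embed_batch(texts):
    """Stand-in for a real embedding API (e.g. OpenAI text-embedding-3-small).
    Returns a fake deterministic 8-dim vector derived from the text."""
    return [[float(b) for b in hashlib.sha256(t.encode()).digest()[:8]]
            for t in texts]

class EmbeddingCache:
    """Cache embeddings by content hash so unchanged chunks are never
    re-embedded (and never re-billed)."""

    def __init__(self):
        self._store = {}  # sha256 hex digest -> vector

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_embeddings(self, texts):
        # Only the texts missing from the cache hit the API
        missing = [t for t in texts if self._key(t) not in self._store]
        if missing:
            for text, vec in zip(missing, embed_batch(missing)):
                self._store[self._key(text)] = vec
        return [self._store[self._key(t)] for t in texts]
```

Persisting `_store` (e.g. as a table alongside your vectors) is what makes incremental re-indexing cheap.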

Step 3: Implement search with reranking. Two-stage retrieval is the standard architecture for production: Stage 1 — Vector search returns top 20-50 candidates (fast, approximate). Stage 2 — Reranker model (Cross-encoder like Cohere Rerank v3 or BGE-Reranker-v2) scores the candidates more precisely and returns top 5-10. Stage 2 adds ~50ms latency but dramatically improves relevance. Without reranking, vector search alone returns "in the ballpark" results; with reranking, you get precisely relevant results.
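The two-stage shape can be sketched in a few lines. Here stage 1 is a brute-force cosine scan (a real system would use an ANN index such as HNSW) and `rerank_fn` is a placeholder for a cross-encoder call like Cohere Rerank; both names are illustrative, not a real API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_text, query_vec, corpus, rerank_fn,
           k_candidates=50, k_final=10):
    """corpus: list of (doc_text, vector) pairs.
    rerank_fn(query_text, doc_text) -> relevance score (higher = better)."""
    # Stage 1: fast, approximate shortlist by vector similarity
    shortlist = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                       reverse=True)[:k_candidates]
    # Stage 2: slower, precise rescoring of just the shortlist
    rescored = sorted(shortlist,
                      key=lambda item: rerank_fn(query_text, item[0]),
                      reverse=True)
    return [doc for doc, _ in rescored[:k_final]]
```

The key property: the expensive reranker only ever sees `k_candidates` documents, so its cost is independent of corpus size.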

Hybrid Search: The Best of Both Worlds

Pure semantic search fails for exact match queries (searching for "error code ERR_SSL_PROTOCOL" should match the exact string, not semantically similar concepts). Pure keyword search fails for conceptual queries ("how to deploy" won't match "deployment guide"). Hybrid search combines both: run BM25 + vector search in parallel, merge results via Reciprocal Rank Fusion (RRF). Weaviate and Elasticsearch have hybrid search built in; with pgvector, you implement the combination yourself.
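If you are on pgvector and need to merge the two result lists yourself, RRF is short enough to write by hand. A sketch, using the conventional constant k=60 from the original RRF paper:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs.
    score(d) = sum over lists of 1 / (k + rank(d)), ranks starting at 1.
    Documents appearing high in multiple lists float to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc3", "doc1", "doc7"]   # keyword ranking
vector_results = ["doc1", "doc5", "doc3"] # semantic ranking
fused = reciprocal_rank_fusion([bm25_results, vector_results])
# doc1 and doc3 appear in both lists, so they outrank doc5 and doc7
```

Because RRF only uses ranks, not raw scores, you never have to normalize BM25 scores against cosine similarities, which is why it is the default fusion method in most hybrid setups.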

When Semantic Search Is Worth It

| Use Case | Semantic Search? | Why |
| --- | --- | --- |
| Documentation search (user-facing) | Yes | Users don't know your terminology; they describe problems |
| Internal knowledge base | Yes | Employees search differently; semantic bridges the gap |

E-comm

