My AI system had been extracting knowledge from emails for weeks. Thousands of facts, entities, patterns, all sitting in PostgreSQL. The problem was finding any of it. The brain was using hardcoded SQL filters like WHERE category = 'infrastructure' to pull context before making decisions. If a fact about hosting costs was categorized under "billing," the brain would never see it when reasoning about infrastructure.
I needed to search by meaning, not by label. But I also needed to search by exact match. An invoice number like "TDS-2026-003" has no semantic meaning. You can't find it with vector search. You need both approaches working together, in one query.
That's what led me to Elasticsearch, and that's what this post is about.
The Problem With Vector-Only Search
ChromaDB had been in my Docker Compose file since the early days. The plan was a full RAG pipeline: embed documents, store vectors, retrieve relevant chunks. In practice, ChromaDB sat mostly idle while the knowledge base grew faster than expected. By the time I circled back to search, the requirements had changed.
The knowledge base isn't just natural language. It contains invoice reference codes, domain names, specific dates, and contract numbers. All things that have no semantic meaning. When you vector-search for "TDS-2026-003," the embedding model encodes it into a 768-dimensional space and looks for nearby vectors. But an arbitrary reference code has no meaningful position in that space. The vector is vaguely related to "invoices" but won't reliably surface the one document containing that exact string.
This isn't a ChromaDB problem. It's the nature of pure vector search. Any system that only does similarity matching will struggle with exact lookups.
The requirement that forced the decision: the brain needs to find "TDS-2026-003" (exact keyword match) AND "hosting expenses" (semantic similarity) through the same search system. Ideally in the same query.
Why Not pgvector
I was already running PostgreSQL with 95+ tables. Adding pgvector would mean no extra container, no extra service to maintain. It was the obvious first choice.
The problem is combining results. pgvector gives you vector similarity. PostgreSQL's built-in tsvector gives you full-text search. But merging the two result sets into a single ranked list, where a document that scores well on both keyword match and semantic similarity ranks higher than one that only matches on one, requires building your own scoring logic. You're essentially building a search engine inside your database.
Elasticsearch has a built-in algorithm for this called Reciprocal Rank Fusion (RRF). It takes the ranking from BM25 (keyword matching) and the ranking from kNN (vector similarity) and combines them mathematically. A document that appears near the top of both lists gets a higher combined score than one that only appears in one. No custom scoring logic, no manual result merging.
If you only need vector similarity, or only keyword search, and don't need the two combined with principled ranking, pgvector is the better choice. Less infrastructure, less complexity, and it lives inside a database you're already running.
The Migration
Setting up Elasticsearch itself was straightforward. Single node, no cluster, security disabled (behind Tailscale, no public access):
elasticsearch:
  environment:
    - discovery.type=single-node
    - xpack.security.enabled=false
    - ES_JAVA_OPTS=-Xms4g -Xmx4g
    - xpack.ml.enabled=false
Disabling the ML features saved about 500MB of heap. I don't use Elasticsearch's built-in ML since all inference runs through Ollama.
The index design followed a pattern I'd already established in PostgreSQL: separate data domains. Facts, entities, patterns, emails, transactions, and contracts each got their own index. Each domain has different fields and different query needs. Cramming everything into one giant index would mean sparse fields everywhere and confusing relevance scoring.
Every index follows the same hybrid field pattern. The primary searchable text is stored twice: once as a text field for BM25 keyword matching, and once as a dense_vector field for kNN similarity:
# Example: the facts index
"fact": {"type": "text", "analyzer": "standard"},
"fact_vector": {"type": "dense_vector", "dims": 768,
                "index": True, "similarity": "cosine"},
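Since every index repeats this pattern, the mapping can be generated rather than hand-written. Here's a sketch of the shape; `build_mapping` is a hypothetical helper for illustration, not code from my pipeline:

```python
def build_mapping(text_field: str, dims: int = 768) -> dict:
    """Hypothetical helper: pair a BM25 text field with its kNN vector field."""
    return {
        "properties": {
            text_field: {"type": "text", "analyzer": "standard"},
            f"{text_field}_vector": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,          # required for kNN search
                "similarity": "cosine",
            },
        }
    }

facts_mapping = build_mapping("fact")
# With the official Python client, applied as something like:
# es.indices.create(index="facts", mappings=facts_mapping)
```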
The embedding model is nomic-embed-text running on the Mac mini through Ollama. 768 dimensions, decent multilingual support, which matters for a business that receives emails in Dutch, English, and Arabic. Each text gets truncated to 500 characters before embedding, which keeps throughput consistent and captures most of the semantic signal.
One thing I learned during indexing: enriching the text before embedding makes a big difference. "Hetzner" alone produces a generic vector. "Hetzner (company) cloud hosting hetzner.com" produces a much more useful one that captures what the entity actually is. Same approach for emails (subject + body snippet) and transactions (counterparty + description).
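As a sketch, the enrichment step looks something like this; `enrich_entity` is a hypothetical helper, and the real pipeline builds the string from PostgreSQL columns:

```python
def enrich_entity(name, entity_type="", description="", url=""):
    """Hypothetical helper: build a richer string to embed than the bare name."""
    parts = [name]
    if entity_type:
        parts.append(f"({entity_type})")
    if description:
        parts.append(description)
    if url:
        parts.append(url)
    # Truncate to 500 chars to match the embedding pipeline
    return " ".join(parts)[:500]

enrich_entity("Hetzner", "company", "cloud hosting", "hetzner.com")
# -> "Hetzner (company) cloud hosting hetzner.com"
```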
What Hybrid Search Looks Like
The core function combines BM25 and kNN in a single Elasticsearch query:
async def hybrid_search(index, query, k=10, filters=None):
    text_field, vector_field = INDEX_FIELDS[index]
    query_vector = await embed_text(query)
    bool_query = {"should": [{"match": {text_field: query}}]}
    if filters:
        # e.g. [{"term": {"category": "infrastructure"}}]
        bool_query["filter"] = filters
    es_query = {
        "size": k,
        "query": {"bool": bool_query},
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(k * 5, 50),
        },
        "rank": {"rrf": {"window_size": 50, "rank_constant": 20}},
        "_source": {"excludes": [vector_field]},
    }
    # ... execute and return hits
The rank.rrf clause does the heavy lifting. For each document, Elasticsearch computes a score from its position in both the keyword ranking and the vector ranking; a document that ranks high in both lists beats one that appears in only one. The rank_constant of 20 controls how much weight goes to top-ranked results versus the rest.
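The formula itself is simple: each document scores the sum of 1 / (rank_constant + rank) across the result lists it appears in. A minimal illustration with made-up document IDs (not the production code):

```python
def rrf_scores(rankings, rank_constant=20):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of
    1 / (rank_constant + rank(d)), with rank starting at 1."""
    scores = {}
    for ranked_list in rankings:
        for rank, doc_id in enumerate(ranked_list, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Return doc IDs ordered by combined score, best first
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc_exact", "doc_b"]    # keyword ranking
knn = ["doc_a", "doc_exact"]     # vector ranking
rrf_scores([bm25, knn])
# doc_exact appears in both lists, so it outranks docs found by only one method
```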
In practice, this means:
Searching for "TDS-2026-003": BM25 finds the exact document. The kNN results are vaguely invoice-related noise. RRF correctly puts the exact match at #1.
Searching for "hosting expenses": BM25 might find documents with those literal words. kNN finds "server rental charges," "VPS monthly payment," "cloud infrastructure billing." Conceptually identical, zero shared keywords. RRF combines both, giving you broader and more useful results than either approach alone.
Searching for "Hetzner invoice": BM25 catches the exact name "Hetzner." kNN catches hosting-invoice-related concepts. Documents specifically about Hetzner invoices appear in both result sets and rank highest.
Keeping It In Sync
New knowledge gets extracted from emails continuously. Those new facts need to end up in Elasticsearch. Rather than syncing inline, which would add latency to the extraction pipeline, I added an es_indexed boolean column to each table with a trigger:
CREATE OR REPLACE FUNCTION fn_mark_es_unindexed()
RETURNS TRIGGER AS $$
BEGIN
    NEW.es_indexed := FALSE;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fires on INSERT or content UPDATE, not metadata changes
CREATE TRIGGER trg_facts_es_sync
    BEFORE INSERT OR UPDATE OF fact, category, confidence
    ON memory_facts
    FOR EACH ROW EXECUTE FUNCTION fn_mark_es_unindexed();
A workflow polls every few minutes for rows where es_indexed = FALSE, embeds them, indexes them to Elasticsearch, and flips the flag. Partial indexes on the boolean column make the "find unindexed rows" query fast since the index only contains rows that actually need syncing.
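To make the flow concrete, here's a minimal sketch of that polling loop with in-memory stand-ins for PostgreSQL and Elasticsearch; every name here is illustrative, not the actual workflow code:

```python
# Rows where es_indexed = FALSE, e.g. fetched with:
#   SELECT id, fact FROM memory_facts WHERE es_indexed = FALSE LIMIT 100
pending = [{"id": 1, "fact": "Hetzner invoice TDS-2026-003, cloud hosting"}]
es_index = {}  # stand-in for the Elasticsearch facts index

def embed_text(text):
    # Stand-in for the nomic-embed-text call through Ollama
    return [0.0] * 768

def sync_batch(rows):
    for row in rows:
        doc = {"fact": row["fact"],
               "fact_vector": embed_text(row["fact"][:500])}
        es_index[row["id"]] = doc  # index into Elasticsearch
        # Then flip the flag: UPDATE memory_facts SET es_indexed = TRUE WHERE id = ...
        row["es_indexed"] = True
    return len(rows)

sync_batch(pending)
```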
It's not real-time, but for my use case a few minutes of delay between extraction and searchability is fine.
The Trade-Offs
Search actually works now. The brain can find relevant context regardless of how facts were categorized. Exact matches and semantic matches in one query. Structured filtering by category, confidence score, or date range combined with free-text search.
The cost: 4.5GB of RAM on a machine already running 20 containers. ChromaDB used 53MB. That's roughly 85x more memory. Elasticsearch is a JVM application and it shows. The heap allocation alone is 4GB, and you can't really go lower for production use.
The query DSL is also significantly more verbose than ChromaDB's Python API. ChromaDB is collection.query(query_texts=["hosting"], n_results=10). The Elasticsearch equivalent is the nested JSON you saw above. I wrapped it in helper functions so the rest of the codebase doesn't have to deal with it, but the learning curve is real.
Would I do it again? Yes, because hybrid search solved a real problem I couldn't solve any other way. But I wouldn't recommend Elasticsearch for every project that needs search. If you only need vector similarity, ChromaDB or pgvector are simpler and lighter. Elasticsearch earns its 4GB when you need keyword matching and semantic search working together in the same query.
This is Part 3 of "One Developer, 22 Containers." The series covers building an AI office management system on consumer hardware, the choices, the trade-offs, and the things that broke along the way.
Find me on GitHub.