Using Elasticsearch as a unified vector store + event bus for a 7-agent AI manufacturing platform — architecture breakdown
I want to share a detailed write-up of how I used Elasticsearch as the core vector database in FactoryOS, a multi-agent AI platform I built for my final year project. This isn't an "I used pgvector" post — I want to get into the actual index design, retrieval strategy, and some non-obvious architectural choices.
The Setup
7 autonomous agents, each handling a distinct manufacturing lifecycle stage:
- Procurement Agent — supplier selection, PO generation
- Model Analysis Agent — product spec comparison
- Digital Twin Agent — real-time factory floor state
- Incoming Orders Agent — delivery timeline prediction
- Invoice Management Agent — duplicate/anomaly detection
- Treasury Agent — autonomous inventory reordering
- Defect Analysis Agent — RAG-based root cause analysis
All agents share a single Elasticsearch cluster on Elastic Cloud. No agent has a private vector store. Elasticsearch is their collective long-term memory.
Why Elasticsearch over Pinecone / Weaviate / Qdrant?
The honest answer: manufacturing data doesn't fit the pure-vector-DB model well.
You're dealing with two fundamentally different query patterns simultaneously:
Semantic queries: "Find suppliers that have delivered corrosion-resistant fasteners for marine environments" — when the matching document only says "stainless M8 bolt, ISO 9227 salt-spray certified." Pure kNN handles this vocabulary gap.
Exact / structured queries: SKU lookups, batch ID filters, date range queries on invoice archives, threshold checks on inventory levels. Dedicated vector DBs are awkward here — you end up bolting on a separate DB or doing metadata filtering that degrades recall.
Elasticsearch's hybrid search via Reciprocal Rank Fusion (RRF) solved both in a single query. BM25 handles the structured/keyword side, kNN handles the semantic side, and RRF fuses the ranked lists without requiring you to manually tune alpha weights. In practice this significantly outperformed both pure kNN and pure BM25 on our eval set of supplier-matching queries.
Index Design
Each agent owns one or more indices. All use the same embedding model (all-MiniLM-L6-v2, 384 dims) so cross-index semantic queries are coherent.
Procurement index mapping (abbreviated):
"embedding": dense_vector, dims=384, similarity=cosine, indexed=true
"product_category": text, analyzer=english
"invoice_summary": text
"supplier_name": keyword
"reliability_score": float
"avg_lead_time_days": float
Defect index mapping:
"embedding": dense_vector, dims=384, similarity=cosine, indexed=true
"defect_description": text
"batch_id": keyword
"root_cause": text
"severity": keyword (enum: low/medium/high/critical)
"corrective_action": text
"timestamp": date
Inventory index (used by Treasury Agent):
"sku": keyword
"current_stock": integer
"safety_threshold": integer
"unit_cost": float
"last_updated": date
"embedding": dense_vector, dims=384 (for semantic reorder suggestions)
Hybrid Search Query (Procurement Agent)
This is the actual retriever structure used when the Procurement Agent needs to find best-fit suppliers for a new order:
{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"multi_match": {
"query": "<order description>",
"fields": ["product_category", "invoice_summary"]
}
}
}
},
{
"knn": {
"field": "embedding",
"query_vector": [...],
"num_candidates": 50,
"k": 10
}
}
],
"rank_window_size": 20,
"rank_constant": 60
}
}
}
rank_constant: 60 is the standard RRF default and worked well without tuning. We experimented with lower values (20–40) but saw marginal gains that didn't justify the complexity.
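Wiring the JSON above into code, a small builder function keeps the retriever structure in one place (function name is mine; the search call is commented out since it needs a cluster):

```python
def supplier_rrf_query(order_text: str, query_vector: list[float]) -> dict:
    """Build the RRF retriever body shown above: BM25 multi_match fused with kNN."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {"standard": {"query": {"multi_match": {
                        "query": order_text,
                        "fields": ["product_category", "invoice_summary"],
                    }}}},
                    {"knn": {
                        "field": "embedding",
                        "query_vector": query_vector,
                        "num_candidates": 50,
                        "k": 10,
                    }},
                ],
                "rank_window_size": 20,
                "rank_constant": 60,  # RRF default; lower values sharpen top ranks
            }
        }
    }

# es.search(index="factoryos-procurement",
#           body=supplier_rrf_query("corrosion-resistant M8 fasteners", vec))
```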
RAG Pipeline — Defect Analysis Agent
This is the most interesting retrieval use case in the project. When a new defect report comes in:
- Embed the defect description using the same sentence-transformer model
- kNN search against the defect index with k=5, num_candidates=50
- Retrieve defect_description, root_cause, corrective_action, batch_id for each hit
- Construct a prompt: system context + top-5 historical defect docs + new defect
- LLM (GPT-4o-mini) generates a root cause hypothesis + recommended corrective action
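The retrieval and prompt-assembly steps above can be sketched as follows (prompt wording and helper names are my own, not from the project):

```python
def defect_knn_query(vector: list[float]) -> dict:
    """kNN body for the defect index (k=5, num_candidates=50, as above)."""
    return {
        "knn": {"field": "embedding", "query_vector": vector,
                "k": 5, "num_candidates": 50},
        "_source": ["defect_description", "root_cause",
                    "corrective_action", "batch_id"],
    }

def build_defect_prompt(new_defect: str, hits: list[dict]) -> str:
    """Assemble the RAG prompt: system context + historical cases + new report."""
    context = "\n\n".join(
        f"Defect: {h['defect_description']}\n"
        f"Root cause: {h['root_cause']}\n"
        f"Corrective action: {h['corrective_action']} (batch {h['batch_id']})"
        for h in hits
    )
    return (
        "You are a manufacturing defect analyst. Using the historical cases "
        "below, propose a root cause hypothesis and a corrective action.\n\n"
        f"--- Historical cases ---\n{context}\n\n"
        f"--- New defect report ---\n{new_defect}"
    )

# hits = [h["_source"] for h in
#         es.search(index="factoryos-defects", body=defect_knn_query(vec))["hits"]["hits"]]
# prompt = build_defect_prompt(report_text, hits)  # then send to GPT-4o-mini
```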
The quality of retrieval here was highly sensitive to embedding model choice. A generic model caused semantic drift on technical terminology — "flux contamination" and "welding residue" weren't being retrieved together. Fine-tuning on a small corpus of manufacturing maintenance docs (scraped from public CMMS datasets) cut false negatives by ~40%.
Non-obvious Choice: Elasticsearch as the Agent Message Bus
Instead of Kafka or a task queue, agents communicate through a factoryos-events index. Events are timestamped documents:
{
"event_type": "reorder_triggered",
"sku": "M8-SS-BOLT",
"quantity_needed": 5000,
"handled": false,
"triggered_by": "treasury_agent",
"timestamp": "2025-11-15T09:32:00Z",
"embedding": [...]
}
Agents poll with bool queries filtering on event_type + handled: false. On pickup, they update handled: true with a partial update.
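A minimal sketch of the poll-and-claim loop (index/field names from the event document above; the claim call is commented out because it needs a cluster). One detail worth adding: guarding the partial update with the hit's `_seq_no`/`_primary_term` prevents two agents from claiming the same event, which a plain partial update would race on.

```python
def pending_events_query(event_type: str) -> dict:
    """Bool filter for unhandled events of one type, oldest first."""
    return {
        "query": {"bool": {"filter": [
            {"term": {"event_type": event_type}},
            {"term": {"handled": False}},
        ]}},
        "sort": [{"timestamp": "asc"}],
    }

# Poll, then claim the oldest event with optimistic concurrency control:
# hits = es.search(index="factoryos-events",
#                  body=pending_events_query("reorder_triggered"))["hits"]["hits"]
# if hits:
#     h = hits[0]
#     es.update(index="factoryos-events", id=h["_id"],
#               doc={"handled": True},
#               if_seq_no=h["_seq_no"], if_primary_term=h["_primary_term"])
```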
Why this worked better than expected:
- Full audit trail of every inter-agent action, queryable in Kibana
- Replay: re-run any agent's decision by replaying unhandled events from a timestamp
- Cross-event semantic search: "find all events semantically related to flux contamination issues" actually works because events are embedded
- Zero additional infrastructure
The downside: polling latency (we ran polls every 5s) and no push-based triggering. For a real-time production system you'd add a watcher or use Elasticsearch's percolate API to trigger agents on index writes.
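For reference, the percolator approach inverts the lookup: agents register their trigger conditions as stored queries, and each new event document is matched against them. A minimal sketch, with index and field names assumed (not from the project):

```python
def percolator_mapping() -> dict:
    """Trigger-index mapping: stored queries plus the event fields they reference."""
    return {"properties": {
        "query": {"type": "percolator"},
        "event_type": {"type": "keyword"},
        "sku": {"type": "keyword"},
    }}

def percolate_match_query(event_doc: dict) -> dict:
    """Find which registered agent queries match an incoming event document."""
    return {"query": {"percolate": {"field": "query", "document": event_doc}}}

# Registration: index {"query": {"term": {"event_type": "reorder_triggered"}},
#                      "agent": "procurement_agent"} into the trigger index.
# On each event write, run percolate_match_query(event) against that index
# and activate whichever agents' queries matched.
```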
Treasury Agent — Autonomous Reordering Logic
Script query to find items below threshold:
{
"query": {
"script": {
"script": {
"source": "doc['current_stock'].value < doc['safety_threshold'].value"
}
}
}
}
For each result, the agent:
- Runs a hybrid search on the procurement index to rank suppliers by semantic fit + reliability score
- Filters by avg_lead_time_days < required_lead_time using a post-filter
- Generates a PO document and indexes it to factoryos-orders
- Publishes a purchase_order_created event to factoryos-events
The Procurement Agent picks up the event, verifies supplier availability via an external API call, and either confirms or triggers a fallback supplier search.
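Two of the steps above can be sketched as small helpers (names and the exact post-filter shape are my assumptions; required_lead_time would come from the order):

```python
from datetime import datetime, timezone

def lead_time_post_filter(required_lead_time: float) -> dict:
    """post_filter clause: drop suppliers whose average lead time is too long.
    Applied after ranking, so it doesn't distort the hybrid relevance scores."""
    return {"range": {"avg_lead_time_days": {"lt": required_lead_time}}}

def purchase_order_event(sku: str, qty: int) -> dict:
    """Event document the Treasury Agent publishes to factoryos-events."""
    return {
        "event_type": "purchase_order_created",
        "sku": sku,
        "quantity_needed": qty,
        "handled": False,
        "triggered_by": "treasury_agent",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Combine with the hybrid retriever body:
#   es.search(index="factoryos-procurement",
#             body={**rrf_body, "post_filter": lead_time_post_filter(7.0)})
# then index the PO and the event:
#   es.index(index="factoryos-events", document=purchase_order_event("M8-SS-BOLT", 5000))
```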
What I'd Do Differently
- ELSER instead of sentence-transformers: Elastic's learned sparse encoder is better suited for domain-specific industrial text without requiring fine-tuning. I didn't use it because I wanted full local control over embeddings, but for a production system ELSER would reduce the embedding infrastructure overhead significantly.
- Percolate API for event-driven triggers: Polling every 5s works but is inelegant. Percolate queries registered per agent type would allow true push-based agent activation.
- ILM from day one: I set up Index Lifecycle Management policies late in the project. The events and defect indices grew fast. Should have been day-one config.
Happy to go deep on any specific part — the hybrid search tuning, the embedding model choices, or the event bus design.
Stack: Node.js, Elasticsearch 8.x (Elastic Cloud), sentence-transformers, GPT-4o-mini, FastAPI

