Using Elasticsearch as a unified vector store + event bus for a 7-agent AI manufacturing platform — architecture breakdown
I want to share a detailed write-up of how I used Elasticsearch as the core vector database in FactoryOS, a multi-agent AI platform I built for my final year project. This isn't an "I used pgvector" post — I want to get into the actual index design, retrieval strategy, and some non-obvious architectural choices.
The Setup
7 autonomous agents, each handling a distinct manufacturing lifecycle stage:
- Procurement Agent — supplier selection, PO generation
- Model Analysis Agent — product spec comparison
- Digital Twin Agent — real-time factory floor state
- Incoming Orders Agent — delivery timeline prediction
- Invoice Management Agent — duplicate/anomaly detection
- Treasury Agent — autonomous inventory reordering
- Defect Analysis Agent — RAG-based root cause analysis
All agents share a single Elasticsearch cluster on Elastic Cloud. No agent has a private vector store. Elasticsearch is their collective long-term memory.
Why Elasticsearch over Pinecone / Weaviate / Qdrant?
The honest answer: manufacturing data doesn't fit the pure-vector-DB model well.
You're dealing with two fundamentally different query patterns simultaneously:
Semantic queries: "Find suppliers that have delivered corrosion-resistant fasteners for marine environments" — when the matching document only says "stainless M8 bolt, ISO 9227 salt-spray certified." Pure kNN handles this vocabulary gap.
Exact / structured queries: SKU lookups, batch ID filters, date range queries on invoice archives, threshold checks on inventory levels. Dedicated vector DBs are awkward here — you end up bolting on a separate DB or doing metadata filtering that degrades recall.
Elasticsearch's hybrid search via Reciprocal Rank Fusion (RRF) solved both in a single query. BM25 handles the structured/keyword side, kNN handles the semantic side, and RRF fuses the ranked lists without requiring you to manually tune alpha weights. In practice this significantly outperformed both pure kNN and pure BM25 on our eval set of supplier-matching queries.
Index Design
Each agent owns one or more indices. All use the same embedding model (all-MiniLM-L6-v2, 384 dims) so cross-index semantic queries are coherent.
Procurement index mapping (abbreviated):
"embedding": dense_vector, dims=384, similarity=cosine, indexed=true
"product_category": text, analyzer=english
"invoice_summary": text
"supplier_name": keyword
"reliability_score": float
"avg_lead_time_days": float
Defect index mapping:
"embedding": dense_vector, dims=384, similarity=cosine, indexed=true
"defect_description": text
"batch_id": keyword
"root_cause": text
"severity": keyword (enum: low/medium/high/critical)
"corrective_action": text
"timestamp": date
Inventory index (used by Treasury Agent):
"sku": keyword
"current_stock": integer
"safety_threshold": integer
"unit_cost": float
"last_updated": date
"embedding": dense_vector, dims=384 (for semantic reorder suggestions)
Hybrid Search Query (Procurement Agent)
This is the actual retriever structure used when the Procurement Agent needs to find best-fit suppliers for a new order:
{
"retriever": {
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"multi_match": {
"query": "<order description>",
"fields": ["product_category", "invoice_summary"]
}
}
}
},
{
"knn": {
"field": "embedding",
"query_vector": [...],
"num_candidates": 50,
"k": 10
}
}
],
"rank_window_size": 20,
"rank_constant": 60
}
}
}
rank_constant: 60 is the standard RRF default and worked well without tuning. We experimented with lower values (20–40) but saw marginal gains that didn't justify the complexity.
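Wiring the JSON above into code, a small builder function keeps the retriever structure in one place (function name is mine; the search call is commented out since it needs a cluster):

```python
def supplier_rrf_query(order_text: str, query_vector: list[float]) -> dict:
    """Build the RRF retriever body shown above: BM25 multi_match fused with kNN."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {"standard": {"query": {"multi_match": {
                        "query": order_text,
                        "fields": ["product_category", "invoice_summary"],
                    }}}},
                    {"knn": {
                        "field": "embedding",
                        "query_vector": query_vector,
                        "num_candidates": 50,
                        "k": 10,
                    }},
                ],
                "rank_window_size": 20,
                "rank_constant": 60,  # RRF default; lower values sharpen top ranks
            }
        }
    }

# es.search(index="factoryos-procurement",
#           body=supplier_rrf_query("corrosion-resistant M8 fasteners", vec))
```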
RAG Pipeline — Defect Analysis Agent
This is the most interesting retrieval use case in the project. When a new defect report comes in:
- Embed the defect description using the same sentence-transformer model
- kNN search against the defect index with k=5, num_candidates=50
- Retrieve defect_description, root_cause, corrective_action, batch_id for each hit
- Construct a prompt: system context + top-5 historical defect docs + new defect
- LLM (GPT-4o-mini) generates a root cause hypothesis + recommended corrective action
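The retrieval and prompt-assembly steps above can be sketched as follows (prompt wording and helper names are my own, not from the project):

```python
def defect_knn_query(vector: list[float]) -> dict:
    """kNN body for the defect index (k=5, num_candidates=50, as above)."""
    return {
        "knn": {"field": "embedding", "query_vector": vector,
                "k": 5, "num_candidates": 50},
        "_source": ["defect_description", "root_cause",
                    "corrective_action", "batch_id"],
    }

def build_defect_prompt(new_defect: str, hits: list[dict]) -> str:
    """Assemble the RAG prompt: system context + historical cases + new report."""
    context = "\n\n".join(
        f"Defect: {h['defect_description']}\n"
        f"Root cause: {h['root_cause']}\n"
        f"Corrective action: {h['corrective_action']} (batch {h['batch_id']})"
        for h in hits
    )
    return (
        "You are a manufacturing defect analyst. Using the historical cases "
        "below, propose a root cause hypothesis and a corrective action.\n\n"
        f"--- Historical cases ---\n{context}\n\n"
        f"--- New defect report ---\n{new_defect}"
    )

# hits = [h["_source"] for h in
#         es.search(index="factoryos-defects", body=defect_knn_query(vec))["hits"]["hits"]]
# prompt = build_defect_prompt(report_text, hits)  # then send to GPT-4o-mini
```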
The quality of retrieval here was highly sensitive to embedding model choice. A generic model caused semantic drift on technical terminology — "flux contamination" and "welding residue" weren't being retrieved together. Fine-tuning on a small corpus of manufacturing maintenance docs (scraped from public CMMS datasets) cut false negatives by ~40%.
Non-obvious Choice: Elasticsearch as the Agent Message Bus
Instead of Kafka or a task queue, agents communicate through a factoryos-events index. Events are timestamped documents:
{
"event_type": "reorder_triggered",
"sku": "M8-SS-BOLT",
"quantity_needed": 5000,
"handled": false,
"triggered_by": "treasury_agent",
"timestamp": "2025-11-15T09:32:00Z",
"embedding": [...]
}
Agents poll with bool queries filtering on event_type + handled: false. On pickup, they update handled: true with a partial update.
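A minimal sketch of the poll-and-claim loop (index/field names from the event document above; the claim call is commented out because it needs a cluster). One detail worth adding: guarding the partial update with the hit's `_seq_no`/`_primary_term` prevents two agents from claiming the same event, which a plain partial update would race on.

```python
def pending_events_query(event_type: str) -> dict:
    """Bool filter for unhandled events of one type, oldest first."""
    return {
        "query": {"bool": {"filter": [
            {"term": {"event_type": event_type}},
            {"term": {"handled": False}},
        ]}},
        "sort": [{"timestamp": "asc"}],
    }

# Poll, then claim the oldest event with optimistic concurrency control:
# hits = es.search(index="factoryos-events",
#                  body=pending_events_query("reorder_triggered"))["hits"]["hits"]
# if hits:
#     h = hits[0]
#     es.update(index="factoryos-events", id=h["_id"],
#               doc={"handled": True},
#               if_seq_no=h["_seq_no"], if_primary_term=h["_primary_term"])
```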
Why this worked better than expected:
- Full audit trail of every inter-agent action, queryable in Kibana
- Replay: re-run any agent's decision by replaying unhandled events from a timestamp
- Cross-event semantic search: "find all events semantically related to flux contamination issues" actually works because events are embedded
- Zero additional infrastructure
The downside: polling latency (we ran polls every 5s) and no push-based triggering. For a real-time production system you'd add a watcher or use Elasticsearch's percolate API to trigger agents on index writes.
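For reference, the percolator approach inverts the lookup: agents register their trigger conditions as stored queries, and each new event document is matched against them. A minimal sketch, with index and field names assumed (not from the project):

```python
def percolator_mapping() -> dict:
    """Trigger-index mapping: stored queries plus the event fields they reference."""
    return {"properties": {
        "query": {"type": "percolator"},
        "event_type": {"type": "keyword"},
        "sku": {"type": "keyword"},
    }}

def percolate_match_query(event_doc: dict) -> dict:
    """Find which registered agent queries match an incoming event document."""
    return {"query": {"percolate": {"field": "query", "document": event_doc}}}

# Registration: index {"query": {"term": {"event_type": "reorder_triggered"}},
#                      "agent": "procurement_agent"} into the trigger index.
# On each event write, run percolate_match_query(event) against that index
# and activate whichever agents' queries matched.
```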
Treasury Agent — Autonomous Reordering Logic
Script query to find items below threshold:
{
"query": {
"script": {
"script": {
"source": "doc['current_stock'].value < doc['safety_threshold'].value"
}
}
}
}
For each result, the agent:
- Runs a hybrid search on the procurement index to rank suppliers by semantic fit + reliability score
- Filters by avg_lead_time_days < required_lead_time using a post-filter
- Generates a PO document and indexes it to factoryos-orders
- Publishes a purchase_order_created event to factoryos-events
The Procurement Agent picks up the event, verifies supplier availability via an external API call, and either confirms or triggers a fallback supplier search.
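Two of the steps above can be sketched as small helpers (names and the exact post-filter shape are my assumptions; required_lead_time would come from the order):

```python
from datetime import datetime, timezone

def lead_time_post_filter(required_lead_time: float) -> dict:
    """post_filter clause: drop suppliers whose average lead time is too long.
    Applied after ranking, so it doesn't distort the hybrid relevance scores."""
    return {"range": {"avg_lead_time_days": {"lt": required_lead_time}}}

def purchase_order_event(sku: str, qty: int) -> dict:
    """Event document the Treasury Agent publishes to factoryos-events."""
    return {
        "event_type": "purchase_order_created",
        "sku": sku,
        "quantity_needed": qty,
        "handled": False,
        "triggered_by": "treasury_agent",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Combine with the hybrid retriever body:
#   es.search(index="factoryos-procurement",
#             body={**rrf_body, "post_filter": lead_time_post_filter(7.0)})
# then index the PO and the event:
#   es.index(index="factoryos-events", document=purchase_order_event("M8-SS-BOLT", 5000))
```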
What I'd Do Differently
- ELSER instead of sentence-transformers: Elastic's learned sparse encoder is better suited for domain-specific industrial text without requiring fine-tuning. I didn't use it because I wanted full local control over embeddings, but for a production system ELSER would reduce the embedding infrastructure overhead significantly.
- Percolate API for event-driven triggers: Polling every 5s works but is inelegant. Percolate queries registered per agent type would allow true push-based agent activation.
- ILM from day one: I set up Index Lifecycle Management policies late in the project. The events and defect indices grew fast. Should have been day-one config.
Happy to go deep on any specific part — the hybrid search tuning, the embedding model choices, or the event bus design.
Stack: Node.js, Elasticsearch 8.x (Elastic Cloud), sentence-transformers, GPT-4o-mini, FastAPI

