<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anbu Taco</title>
    <description>The latest articles on DEV Community by Anbu Taco (@anbu_taco_8fcb10bbb30d7ed).</description>
    <link>https://dev.to/anbu_taco_8fcb10bbb30d7ed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3648618%2Fd0c6860e-a420-4750-a8ba-bc6d4be92a93.jpg</url>
      <title>DEV Community: Anbu Taco</title>
      <link>https://dev.to/anbu_taco_8fcb10bbb30d7ed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anbu_taco_8fcb10bbb30d7ed"/>
    <language>en</language>
    <item>
      <title>How to use Elasticsearch as the Neural Backbone of a Multi-Agent AI Manufacturing and Monitoring Platform</title>
      <dc:creator>Anbu Taco</dc:creator>
      <pubDate>Wed, 04 Mar 2026 17:54:11 +0000</pubDate>
      <link>https://dev.to/anbu_taco_8fcb10bbb30d7ed/how-to-use-elasticsearch-as-the-neural-backbone-of-a-multi-agent-ai-manufacturing-and-monitoring-1lop</link>
      <guid>https://dev.to/anbu_taco_8fcb10bbb30d7ed/how-to-use-elasticsearch-as-the-neural-backbone-of-a-multi-agent-ai-manufacturing-and-monitoring-1lop</guid>
      <description>&lt;p&gt;&lt;strong&gt;Using Elasticsearch as a unified vector store + event bus for a 7-agent AI manufacturing platform — architecture breakdown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I want to share a detailed write-up of how I used Elasticsearch as the core vector database in FactoryOS, a multi-agent AI platform I built for my final year project. This isn't an "I used pgvector" post — I want to get into the actual index design, retrieval strategy, and some non-obvious architectural choices.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;7 autonomous agents, each handling a distinct manufacturing lifecycle stage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Procurement Agent — supplier selection, PO generation&lt;/li&gt;
&lt;li&gt;Model Analysis Agent — product spec comparison&lt;/li&gt;
&lt;li&gt;Digital Twin Agent — real-time factory floor state&lt;/li&gt;
&lt;li&gt;Incoming Orders Agent — delivery timeline prediction&lt;/li&gt;
&lt;li&gt;Invoice Management Agent — duplicate/anomaly detection&lt;/li&gt;
&lt;li&gt;Treasury Agent — autonomous inventory reordering&lt;/li&gt;
&lt;li&gt;Defect Analysis Agent — RAG-based root cause analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All agents share a single Elasticsearch cluster on Elastic Cloud. No agent has a private vector store. Elasticsearch is their collective long-term memory.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Why Elasticsearch over Pinecone / Weaviate / Qdrant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The honest answer: manufacturing data doesn't fit the pure-vector-DB model well.&lt;/p&gt;

&lt;p&gt;You're dealing with two fundamentally different query patterns simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic queries&lt;/strong&gt;: "Find suppliers that have delivered corrosion-resistant fasteners for marine environments" — the document says "stainless M8 bolt, ISO 9227 salt-spray certified." Pure kNN handles this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Exact / structured queries&lt;/strong&gt;: SKU lookups, batch ID filters, date range queries on invoice archives, threshold checks on inventory levels. Dedicated vector DBs are awkward here — you end up bolting on a separate DB or doing metadata filtering that degrades recall.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Elasticsearch's hybrid search via &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; solved both in a single query. BM25 handles the structured/keyword side, kNN handles the semantic side, and RRF fuses the ranked lists without requiring you to manually tune alpha weights. In practice this outperformed both pure kNN and pure BM25 significantly on our eval set of supplier matching queries.&lt;/p&gt;
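
&lt;p&gt;RRF itself is simple enough to state in a few lines. Here is a standalone illustration of the scoring formula (not Elasticsearch's internal implementation; the supplier IDs are made up):&lt;/p&gt;

```python
# Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (rank_constant + rank_d).
# rank_constant=60 matches Elasticsearch's default.
def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of doc IDs into one RRF-ordered list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["supplier_A", "supplier_B", "supplier_C"]   # keyword leg
knn_hits = ["supplier_B", "supplier_D", "supplier_A"]    # semantic leg
fused = rrf_fuse([bm25_hits, knn_hits])
# supplier_B ranks first: it appears near the top of both lists.
```

&lt;p&gt;Because only ranks are fused, neither leg's raw score scale matters, which is exactly why no alpha weight needs tuning.&lt;/p&gt;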




&lt;p&gt;&lt;strong&gt;Index Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each agent owns one or more indices. All use the same embedding model (&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;, 384 dims) so cross-index semantic queries are coherent.&lt;/p&gt;

&lt;p&gt;Procurement index mapping (abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"embedding":        dense_vector, dims=384, similarity=cosine, indexed=true
"product_category": text, analyzer=english
"invoice_summary":  text
"supplier_name":    keyword
"reliability_score": float
"avg_lead_time_days": float
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
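
&lt;p&gt;For concreteness, the abbreviated mapping above expands into a create-index body like the following (a sketch using standard Elasticsearch mapping syntax; the index name in the comment is illustrative, not from the project):&lt;/p&gt;

```python
# Concrete Elasticsearch mapping body for the procurement index,
# expanded from the abbreviated field list above.
procurement_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 384,            # all-MiniLM-L6-v2 output size
                "index": True,
                "similarity": "cosine",
            },
            "product_category": {"type": "text", "analyzer": "english"},
            "invoice_summary": {"type": "text"},
            "supplier_name": {"type": "keyword"},
            "reliability_score": {"type": "float"},
            "avg_lead_time_days": {"type": "float"},
        }
    }
}

# With the official Python client this would be applied as e.g.:
# es.indices.create(index="factoryos-procurement", body=procurement_mapping)
```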



&lt;p&gt;Defect index mapping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"embedding":           dense_vector, dims=384, similarity=cosine, indexed=true
"defect_description":  text
"batch_id":            keyword
"root_cause":          text
"severity":            keyword (enum: low/medium/high/critical)
"corrective_action":   text
"timestamp":           date
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inventory index (used by Treasury Agent):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"sku":               keyword
"current_stock":     integer
"safety_threshold":  integer
"unit_cost":         float
"last_updated":      date
"embedding":         dense_vector, dims=384 (for semantic reorder suggestions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Hybrid Search Query (Procurement Agent)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the actual retriever structure used when the Procurement Agent needs to find best-fit suppliers for a new order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retriever"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rrf"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"retrievers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"standard"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"multi_match"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;order description&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"product_category"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invoice_summary"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"knn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"query_vector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"num_candidates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"k"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rank_window_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rank_constant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;rank_constant: 60&lt;/code&gt; is the standard RRF default and worked well without tuning. We experimented with lower values (20–40) but saw marginal gains that didn't justify the complexity.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;RAG Pipeline — Defect Analysis Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most interesting retrieval use case in the project. When a new defect report comes in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embed the defect description using the same sentence-transformer model&lt;/li&gt;
&lt;li&gt;kNN search against the defect index, &lt;code&gt;k=5&lt;/code&gt;, &lt;code&gt;num_candidates=50&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Retrieve &lt;code&gt;defect_description&lt;/code&gt;, &lt;code&gt;root_cause&lt;/code&gt;, &lt;code&gt;corrective_action&lt;/code&gt;, &lt;code&gt;batch_id&lt;/code&gt; for each hit&lt;/li&gt;
&lt;li&gt;Construct a prompt: system context + top-5 historical defect docs + new defect&lt;/li&gt;
&lt;li&gt;LLM (GPT-4o-mini) generates a root cause hypothesis + recommended corrective action&lt;/li&gt;
&lt;/ol&gt;
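
&lt;p&gt;A condensed sketch of the five steps (the embedding, search, and LLM calls are left as comments; the prompt wording and the index name are illustrative, not the exact ones used):&lt;/p&gt;

```python
# Sketch of the Defect Analysis Agent's RAG flow. The builders below are
# pure functions; the external calls appear as comments at the bottom.

def build_defect_knn(query_vector):
    # Step 2: kNN search against the defect index, k=5, num_candidates=50.
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": 5,
            "num_candidates": 50,
        },
        "_source": ["defect_description", "root_cause", "corrective_action", "batch_id"],
    }

def build_prompt(new_defect, historical_hits):
    # Steps 3-4: fold the retrieved historical defects into the LLM context.
    context = "\n\n".join(
        f"[batch {h['batch_id']}] {h['defect_description']}\n"
        f"Root cause: {h['root_cause']}\nCorrective action: {h['corrective_action']}"
        for h in historical_hits
    )
    return (
        "You are a manufacturing defect analyst. Historical defects:\n\n"
        f"{context}\n\nNew defect:\n{new_defect}\n\n"
        "Propose a root cause hypothesis and a recommended corrective action."
    )

# Step 1: vector = sentence_model.encode(new_defect)       (all-MiniLM-L6-v2)
# Step 2: hits = es.search(index="factoryos-defects", body=build_defect_knn(vector))
# Step 5: answer = llm.chat(build_prompt(new_defect, hits))  (GPT-4o-mini)
```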

&lt;p&gt;The quality of retrieval here was highly sensitive to embedding model choice. A generic model caused semantic drift on technical terminology — "flux contamination" and "welding residue" weren't being retrieved together. Fine-tuning on a small corpus of manufacturing maintenance docs (scraped from public CMMS datasets) cut false negatives by ~40%.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Non-obvious Choice: Elasticsearch as the Agent Message Bus&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of Kafka or a task queue, agents communicate through a &lt;code&gt;factoryos-events&lt;/code&gt; index. Events are timestamped documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reorder_triggered"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M8-SS-BOLT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"quantity_needed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"handled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triggered_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"treasury_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-15T09:32:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents poll with &lt;code&gt;bool&lt;/code&gt; queries filtering on &lt;code&gt;event_type&lt;/code&gt; + &lt;code&gt;handled: false&lt;/code&gt;. On pickup, they update &lt;code&gt;handled: true&lt;/code&gt; with a partial update.&lt;/p&gt;
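
&lt;p&gt;The poll/ack cycle reduces to two small request bodies (a sketch; the field names follow the event document above, and the function names are illustrative):&lt;/p&gt;

```python
# Poll for unhandled events of a given type, then acknowledge one by ID.

def build_poll_query(event_type):
    # bool filter: exact match on event_type, handled must still be false.
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"event_type": event_type}},
                    {"term": {"handled": False}},
                ]
            }
        },
        "sort": [{"timestamp": "asc"}],  # oldest events first
    }

def build_ack_update():
    # Partial update flipping the handled flag.
    return {"doc": {"handled": True}}

# The 5s polling loop then looks like:
# hits = es.search(index="factoryos-events", body=build_poll_query("reorder_triggered"))
# for hit in hits["hits"]["hits"]:
#     es.update(index="factoryos-events", id=hit["_id"], body=build_ack_update())
```

&lt;p&gt;In a multi-consumer deployment you would also want optimistic concurrency control on the acknowledge step (Elasticsearch's &lt;code&gt;if_seq_no&lt;/code&gt; / &lt;code&gt;if_primary_term&lt;/code&gt; parameters) so two agents cannot claim the same event.&lt;/p&gt;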

&lt;p&gt;&lt;strong&gt;Why this worked better than expected:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full audit trail of every inter-agent action, queryable in Kibana&lt;/li&gt;
&lt;li&gt;Replay: re-run any agent's decision by replaying unhandled events from a timestamp&lt;/li&gt;
&lt;li&gt;Cross-event semantic search: "find all events semantically related to flux contamination issues" actually works because events are embedded&lt;/li&gt;
&lt;li&gt;Zero additional infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downside: polling latency (we ran polls every 5s) and no push-based triggering. For a real-time production system you'd add a watcher or use Elasticsearch's percolate API to trigger agents on index writes.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Treasury Agent — Autonomous Reordering Logic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Script query to find items below threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"script"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"script"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"doc['current_stock'].value &amp;lt; doc['safety_threshold'].value"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each result, the agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Runs a hybrid search on the procurement index to rank suppliers by semantic fit + reliability score&lt;/li&gt;
&lt;li&gt;Filters by &lt;code&gt;avg_lead_time_days &amp;lt; required_lead_time&lt;/code&gt; using a post-filter&lt;/li&gt;
&lt;li&gt;Generates a PO document and indexes it to &lt;code&gt;factoryos-orders&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Publishes a &lt;code&gt;purchase_order_created&lt;/code&gt; event to &lt;code&gt;factoryos-events&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
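
&lt;p&gt;Steps 1 and 2 can be sketched by combining the RRF retriever from earlier with the lead-time constraint. Note that where the post describes a post-filter, the sketch below pushes a range filter into each sub-retriever instead, a valid alternative in Elasticsearch 8.x; field names are from the procurement mapping, function and parameter names are illustrative:&lt;/p&gt;

```python
# Hybrid supplier search with a lead-time cap. The range filter is applied
# inside both legs so BM25 and kNN candidates respect the constraint.

def build_supplier_query(order_text, query_vector, required_lead_time_days):
    lead_time_filter = {"range": {"avg_lead_time_days": {"lt": required_lead_time_days}}}
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {"standard": {"query": {"bool": {
                        "must": {"multi_match": {
                            "query": order_text,
                            "fields": ["product_category", "invoice_summary"],
                        }},
                        "filter": [lead_time_filter],
                    }}}},
                    {"knn": {
                        "field": "embedding",
                        "query_vector": query_vector,
                        "k": 10,
                        "num_candidates": 50,
                        "filter": [lead_time_filter],
                    }},
                ],
                "rank_window_size": 20,
                "rank_constant": 60,
            }
        }
    }
```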

&lt;p&gt;The Procurement Agent picks up the event, verifies supplier availability via an external API call, and either confirms or triggers a fallback supplier search.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What I'd Do Differently&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ELSER instead of sentence-transformers&lt;/strong&gt;: Elastic's learned sparse encoder is better suited for domain-specific industrial text without requiring fine-tuning. I didn't use it because I wanted full local control over embeddings, but for a production system ELSER would reduce the embedding infrastructure overhead significantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Percolate API for event-driven triggers&lt;/strong&gt;: Polling every 5s works but is inelegant. Percolate queries registered per agent type would allow true push-based agent activation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ILM from day one&lt;/strong&gt;: I set up Index Lifecycle Management policies late in the project. The events and defect indices grew fast. Should have been day-one config.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Happy to go deep on any specific part — the hybrid search tuning, the embedding model choices, or the event bus design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Stack: Node.js, Elasticsearch 8.x (Elastic Cloud), sentence-transformers, GPT-4o-mini, FastAPI&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9wagoxc5tihm4dkgbo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9wagoxc5tihm4dkgbo6.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfcu858y9g7lmnj5et0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfcu858y9g7lmnj5et0w.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>elasticblogathon</category>
      <category>vectordatabase</category>
      <category>ai</category>
    </item>
    <item>
      <title>COBOL in Big 25</title>
      <dc:creator>Anbu Taco</dc:creator>
      <pubDate>Fri, 05 Dec 2025 23:23:44 +0000</pubDate>
      <link>https://dev.to/anbu_taco_8fcb10bbb30d7ed/cobol-in-big-25-33gl</link>
      <guid>https://dev.to/anbu_taco_8fcb10bbb30d7ed/cobol-in-big-25-33gl</guid>
      <description>&lt;p&gt;DarkLedger: Precision Payroll Infrastructure&lt;/p&gt;

&lt;p&gt;&lt;a href="https://darkledger.vercel.app" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Request a Demo&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;💀 The Inspiration: The Floating Point Ghost&lt;/p&gt;

&lt;p&gt;In the world of Web3 and modern SaaS, we have forgotten the old gods. We build financial systems on JavaScript and Python, languages whose default number types rely on IEEE 754 floating-point arithmetic.&lt;/p&gt;

&lt;p&gt;In a standard modern environment, simple addition can yield terrifying results due to binary approximation errors:&lt;/p&gt;

&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;0.1+0.2=0.30000000000000004
0.1 + 0.2 = 0.30000000000000004
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0.1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0.2&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0.30000000000000004&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;In a payroll run of $10,000,000, these "micro-pennies" accumulate. In the traditional banking world, this leads to audit failures. In the crypto world, where transactions are immutable, this leads to irreversible financial loss.&lt;/p&gt;

&lt;p&gt;We asked ourselves: "What if we brought the dead back to life to save the future?"&lt;/p&gt;

&lt;p&gt;⚡ What It Is&lt;/p&gt;

&lt;p&gt;DarkLedger (formerly Ledger-De-Main) is a "Frankenstein" architecture. We stitched together the oldest, most reliable financial engine (COBOL) with the newest, fastest settlement layer (Base L2).&lt;/p&gt;

&lt;p&gt;We refuse to modernize the math. We run the core payroll logic in a compiled COBOL binary—the same technology that powers 95% of the world's ATM swipes—ensuring 100% decimal precision. We then "stitch" this brain to a Python nervous system that executes payouts on the blockchain.&lt;/p&gt;

&lt;p&gt;🏗 How We Built It (The Architecture)&lt;/p&gt;

&lt;p&gt;We utilized Kiro's Agentic IDE to bridge two incompatible eras of computing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Brain: COBOL (Legacy Core)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We generated a GnuCOBOL program using Vibe Coding. This component handles the Gross-to-Net logic using Fixed-Point Arithmetic.&lt;/p&gt;

&lt;p&gt;Unlike floating point, our COBOL engine treats money as integers scaled by a power of 10. For a value &lt;em&gt;V&lt;/em&gt; stored as an integer &lt;em&gt;I&lt;/em&gt;:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;V=I102
V = \frac{I}{10^2}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;V&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;0&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;I&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;This ensures that our tax calculations (15% Federal, 5% State) are exact:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Net Pay=(Hours×Rate)−⌊Gross×0.15⌋−⌊Gross×0.05⌋
\text{Net Pay} = (\text{Hours} \times \text{Rate}) - \lfloor \text{Gross} \times 0.15 \rfloor - \lfloor \text{Gross} \times 0.05 \rfloor
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Net Pay&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Hours&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Rate&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;⌊&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Gross&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0.15&lt;/span&gt;&lt;span class="mclose"&gt;⌋&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;⌊&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Gross&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span 
class="mord"&gt;0.05&lt;/span&gt;&lt;span class="mclose"&gt;⌋&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
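
&lt;p&gt;The same gross-to-net arithmetic can be sanity-checked in plain integer cents (a Python sketch of the fixed-point idea, not our COBOL engine itself; the hours-in-hundredths and cents-per-hour encodings are illustrative):&lt;/p&gt;

```python
# Fixed-point gross-to-net: every money amount is an integer number of
# cents, mirroring COBOL's scaled-integer storage. Integer division
# implements the floor in the Net Pay formula above.

def gross_to_net_cents(hours_hundredths, rate_cents):
    # hours stored as hundredths of an hour, rate as cents per hour
    gross = (hours_hundredths * rate_cents) // 100
    federal = (gross * 15) // 100   # floor(Gross * 0.15)
    state = (gross * 5) // 100      # floor(Gross * 0.05)
    return gross, federal, state, gross - federal - state

# 40.00 hours at $15.25/hour:
gross, fed, st, net = gross_to_net_cents(4000, 1525)
# gross = 61000 cents ($610.00), fed = 9150, st = 3050, net = 48800 ($488.00)
```

&lt;p&gt;No binary approximation ever enters the pipeline, so the identity gross = fed + state + net holds to the penny on every run.&lt;/p&gt;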


&lt;p&gt;&lt;strong&gt;2. The Stitch: Python &amp;amp; Agent Hooks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest challenge was connecting a modern JSON-based frontend to a legacy binary that expects fixed-width text files.&lt;/p&gt;

&lt;p&gt;We used Kiro Agent Hooks to automate the "Stitching" process.&lt;/p&gt;

&lt;p&gt;The Hook: On every save of payroll.cbl, Kiro parses the DATA DIVISION, extracts the byte positions (e.g., PIC X(10)), and auto-generates a Python struct parser.&lt;/p&gt;

&lt;p&gt;The Interface:&lt;/p&gt;

&lt;p&gt;Input: 23 Bytes (Employee ID, Hours, Rate, Tax Code)&lt;/p&gt;

&lt;p&gt;Output: 60 Bytes (ID, Gross, Taxes, Net, Status)&lt;/p&gt;
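
&lt;p&gt;The 23-byte contract maps naturally onto Python's &lt;code&gt;struct&lt;/code&gt; module. A sketch with hypothetical field widths: only the 23-byte total and the PIC X(10) employee ID are fixed by the spec above; the 4/5/4 split for Hours, Rate, and Tax Code is illustrative:&lt;/p&gt;

```python
import struct

# Hypothetical layout summing to the 23-byte input contract:
# ID (10, per PIC X(10)) + Hours (4) + Rate (5) + Tax Code (4) = 23 bytes.
INPUT_RECORD = struct.Struct("10s4s5s4s")

def parse_input_record(raw):
    emp_id, hours, rate, tax_code = INPUT_RECORD.unpack(raw)
    return {
        "employee_id": emp_id.decode().strip(),
        "hours_hundredths": int(hours),   # e.g. b"4000" = 40.00 hours
        "rate_cents": int(rate),          # e.g. b"01525" = $15.25/hour
        "tax_code": tax_code.decode(),
    }

record = b"EMP0000001" + b"4000" + b"01525" + b"TX01"
parsed = parse_input_record(record)
```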

&lt;p&gt;&lt;strong&gt;3. The Hands: Settlement on Base&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the math is verified by the "Brain," the "Body" (Python) uses the Coinbase CDP SDK to convert the Net Pay into a USDC transaction on the Base L2 network.&lt;/p&gt;

&lt;p&gt;🧟‍♂️ Challenges &amp;amp; Lessons Learned&lt;/p&gt;

&lt;p&gt;Challenge 1: The Language Barrier&lt;/p&gt;

&lt;p&gt;COBOL does not speak JSON. It speaks Bytes.&lt;/p&gt;

&lt;p&gt;Lesson: We learned that modern ease-of-use has made us lazy with data types. We had to build a rigid byte-level contract (defined in our design.md spec) to ensure Python didn't send garbage to the mainframe.&lt;/p&gt;

&lt;p&gt;Challenge 2: Containerizing a Monster&lt;/p&gt;

&lt;p&gt;Running COBOL in the cloud isn't standard.&lt;/p&gt;

&lt;p&gt;Lesson: We built a custom Docker container that acts as a time machine. It installs a lightweight Linux OS, pulls the GnuCOBOL compiler dependencies, compiles the legacy code on build, and then spins up a modern FastAPI server to listen for requests.&lt;/p&gt;

&lt;p&gt;Challenge 3: "Audit-Proofing"&lt;/p&gt;

&lt;p&gt;We learned that "close enough" isn't good enough for payroll. By enforcing Banker's Rounding in COBOL (COMPUTE ROUNDED MODE NEAREST-EVEN), we achieved a level of precision that standard JavaScript libraries struggle to replicate without heavy dependencies.&lt;/p&gt;

&lt;p&gt;🚀 Usage&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Run the Container&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t ledger-de-main .
docker run -p 8000:8000 --env-file .env ledger-de-main
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. Execute Payroll&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Navigate to our Retro-Terminal UI and execute a batch command:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RUN PAYROLL --BATCH 2025-10-31&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a fixed-width input.dat.&lt;/li&gt;
&lt;li&gt;Spawn a subprocess to run cobol/bin/payroll.&lt;/li&gt;
&lt;li&gt;Read the output.rpt.&lt;/li&gt;
&lt;li&gt;Execute a gasless USDC transfer on Base.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;📜 Kiro Implementation Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specs: Used requirements.md to define the 23-byte input constraint.&lt;/li&gt;
&lt;li&gt;Steering: Enforced a "Sacred Timeline" rule in .kiro/steering/tech.md that forbade refactoring COBOL logic into Python.&lt;/li&gt;
&lt;li&gt;Hooks: Automated the binary compilation and Python model updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with 💚 (and 🧟‍♂️) for Kiroween 2025&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>legacy</category>
      <category>ai</category>
      <category>web3</category>
    </item>
  </channel>
</rss>
