Fred Santos

Posted on • Originally published at iteratools.com

Building a RAG Pipeline with IteraTools: Chunk, Embed, Store, Search

From Raw Text to Searchable Knowledge — One API

RAG (Retrieval-Augmented Generation) is how you give LLMs long-term memory. The idea is simple: split your documents into chunks, embed them into vectors, store them, and retrieve the most relevant ones at query time.

IteraTools now has every piece of that pipeline as a pay-per-use API endpoint. No infra. No self-hosting. Just HTTP.


The Full RAG Pipeline

Step 1: Chunk your document — POST /text/chunk ($0.001)

The newest endpoint. Splits any text (up to 500,000 chars) into overlapping chunks ready for embedding.

Three strategies:

  • token — fixed character windows (1 token ≈ 4 chars), great for guaranteed context limits
  • sentence — groups complete sentences up to chunk_size tokens; preserves readability (default)
  • paragraph — groups \n\n-separated paragraphs; ideal for structured documents

curl -X POST https://api.iteratools.com/text/chunk \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long document here...",
    "chunk_size": 500,
    "overlap": 50,
    "strategy": "sentence"
  }'
{
  "ok": true,
  "data": {
    "chunks": ["First chunk...", "Overlapping second chunk..."],
    "count": 12,
    "strategy": "sentence",
    "avg_length": 487
  }
}

The overlap parameter prevents context loss at chunk boundaries — crucial for coherent retrieval.

Zero external API calls. Runs entirely in Node.js. No surprises.
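To build intuition for what chunk_size and overlap do, here is a minimal local sketch of a sentence-style strategy (illustrative only; the API's actual implementation may differ). It uses the 1 token ≈ 4 chars rule of thumb from the docs:

```python
import re

def chunk_sentences(text, chunk_size=500, overlap=50):
    """Rough sketch of the 'sentence' strategy: group whole sentences
    up to chunk_size tokens (1 token ~= 4 chars), and seed each new
    chunk with the last `overlap` tokens of the previous one."""
    max_chars = chunk_size * 4
    overlap_chars = overlap * 4
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            # carry trailing overlap into the next chunk so context
            # is not lost at the boundary
            current = current[-overlap_chars:] if overlap_chars else ""
        current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk starts with the tail of the previous one, which is exactly why retrieval stays coherent across boundaries.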


Step 2: Embed chunks — POST /embeddings ($0.001)

Convert your chunks to float vectors using OpenAI text-embedding-3-small (1536 dims) or large (3072 dims). Accepts up to 100 strings per request — so you can batch all your chunks in one call.

curl -X POST https://api.iteratools.com/embeddings \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"text": ["chunk 1...", "chunk 2...", "chunk 3..."]}'

Returns { embeddings: [[...], [...]], model, dimensions, count, tokens }.
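For more than 100 chunks, split the list into batches. A small Python sketch (it assumes the top-level `embeddings` key shown above; if your responses are wrapped in a `data` object, adjust the key):

```python
import requests

API = "https://api.iteratools.com"
HEADERS = {"Authorization": "Bearer YOUR_KEY",
           "Content-Type": "application/json"}

def batched(seq, n=100):
    """Split a list into slices of at most n items
    (the documented per-request limit for /embeddings)."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

def embed_chunks(chunks):
    """Embed every chunk, batching up to 100 strings per call."""
    vectors = []
    for batch in batched(chunks):
        resp = requests.post(f"{API}/embeddings", headers=HEADERS,
                             json={"text": batch}).json()
        vectors.extend(resp["embeddings"])
    return vectors
```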


Step 3: Store in vector memory — POST /memory/upsert ($0.003)

Stores a document with its embedding in an isolated, per-API-key namespace.

curl -X POST https://api.iteratools.com/memory/upsert \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{
    "namespace": "my-docs",
    "id": "chunk-001",
    "text": "The first chunk of my document...",
    "metadata": {"source": "manual.pdf", "page": 1}
  }'

Namespaces are automatically isolated per API key — no user can access another user's data.


Step 4: Semantic search — POST /memory/search ($0.002)

At query time, pass a natural language question. The API embeds it, computes cosine similarity against all stored chunks, and returns the top-K most relevant ones.

curl -X POST https://api.iteratools.com/memory/search \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"namespace": "my-docs", "query": "What is the refund policy?", "top_k": 5}'

Feed those chunks as context into /ai/chat or your own LLM call. That's RAG.
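Cosine similarity itself is simple; a minimal local sketch of the ranking step (not the service's code) looks like this:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=5):
    """Rank stored (id, vector) pairs by similarity to the query
    vector and return the k best ids."""
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Identical vectors score 1.0, orthogonal ones 0.0, so the highest-scoring chunks are the most semantically similar to the query.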


Other Recent Endpoints

While building out the RAG toolkit, a few other useful tools landed:

POST /barcode/generate ($0.001)

Generate barcode images locally — Code128, Code39, EAN-13, EAN-8, UPC-A, ITF14, DataMatrix. Returns PNG as base64. Zero external API.

curl -X POST https://api.iteratools.com/barcode/generate \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"data": "123456789012", "type": "ean13"}'
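Writing the base64 PNG to disk is a one-liner plus a sanity check (the exact response field holding the base64 string isn't shown here, so pull it from the JSON payload first):

```python
import base64

def save_barcode_png(b64_png, path):
    """Decode a base64-encoded PNG string and write it to disk."""
    raw = base64.b64decode(b64_png)
    # every PNG file starts with this 8-byte signature
    assert raw.startswith(b"\x89PNG\r\n\x1a\n"), "payload is not a PNG"
    with open(path, "wb") as f:
        f.write(raw)
```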

POST /document/ocr ($0.015)

AI-powered OCR via Mistral mistral-ocr-latest. Far superior to Tesseract for:

  • Scanned PDFs and forms
  • Tables and multi-column layouts
  • Invoices, receipts, Brazilian notas fiscais

Returns plain text, markdown (with table structure preserved), and extracted table objects.

POST /json/validate ($0.001)

Five modes in one endpoint: validate, format (pretty-print), minify, stats (depth, key count, types), and get (extract value by dot-bracket path like user.name or items[0].id).
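The dot-bracket path syntax is easy to replicate locally; here is a sketch of how the get mode can resolve a path (the endpoint's exact semantics may differ):

```python
import re

def get_path(obj, path):
    """Resolve a dot-bracket path such as 'user.name' or 'items[0].id'
    against a parsed JSON object."""
    tokens = []
    for part in path.split("."):
        # key name before any brackets, then each [N] index as an int
        tokens.append(re.match(r"[^\[\]]+", part).group(0))
        tokens.extend(int(i) for i in re.findall(r"\[(\d+)\]", part))
    for token in tokens:
        obj = obj[token]
    return obj
```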


The Full RAG Flow in One Script

import requests

API = "https://api.iteratools.com"
KEY = "YOUR_KEY"
HEADERS = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

# 1. Read your document
with open("manual.txt") as f:
    document = f.read()

# 2. Chunk it
chunks_resp = requests.post(f"{API}/text/chunk", headers=HEADERS, json={
    "text": document, "chunk_size": 500, "overlap": 50, "strategy": "sentence"
}).json()
chunks = chunks_resp["data"]["chunks"]
print(f"Created {len(chunks)} chunks")

# 3. Store each chunk
for i, chunk in enumerate(chunks):
    requests.post(f"{API}/memory/upsert", headers=HEADERS, json={
        "namespace": "manual",
        "id": f"chunk-{i:04d}",
        "text": chunk,
        "metadata": {"index": i}
    })
print("All chunks stored!")

# 4. Query
question = "What is the return policy?"
results = requests.post(f"{API}/memory/search", headers=HEADERS, json={
    "namespace": "manual", "query": question, "top_k": 3
}).json()

context = "\n\n".join(r["text"] for r in results["data"]["results"])
print("Context retrieved:", context[:200])

# 5. Answer with LLM
answer = requests.post(f"{API}/ai/chat", headers=HEADERS, json={
    "message": question,
    "system": f"Answer based only on this context:\n\n{context}"
}).json()
print("Answer:", answer["data"]["response"])

Total cost for a 10-page document (~5,000 tokens, ~10 chunks): ~$0.06 for the full pipeline.


Current Tool Count: 58

IteraTools now has 58 tools across images, video, web, audio, text processing, code execution, external integrations, and full RAG pipelines — all pay-per-use with x402 micropayments on Base.

Browse all tools: iteratools.com/tools

Get your API key: iteratools.com/docs

MCP package (Claude, Cursor, Windsurf): npx mcp-iteratools
