DEV Community

diwushennian4955

Posted on • Originally published at nexa-api.com

VelociRAG + NexaAPI: Build the Fastest AI Agent RAG Pipeline (No PyTorch!)


I just found a new RAG library on PyPI that's doing something different: VelociRAG runs on ONNX Runtime instead of PyTorch. No 2GB PyTorch install, no CUDA setup, just fast, lean retrieval.

Paired with NexaAPI ($0.003/image, 56+ models), you get one of the fastest and cheapest AI agent stacks available today.

What is VelociRAG?

VelociRAG is a Python package for Retrieval-Augmented Generation (RAG) that runs on ONNX Runtime instead of PyTorch. Key features:

  • ONNX-powered: ~200MB footprint vs 2-4GB for PyTorch
  • 4-layer fusion for high-quality retrieval
  • MCP server: native integration with AI agent frameworks
  • ~5ms retrieval vs ~20ms with PyTorch
  • Install: pip install velocirag (no PyTorch!)
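To make the retrieval step concrete, here is a toy top-k retriever in pure Python: bag-of-words vectors ranked by cosine similarity. This only illustrates what a retrieval layer does conceptually; it is not VelociRAG's actual implementation, which uses ONNX embedding models and 4-layer fusion.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": token -> count
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, top_k=2):
    # Rank documents by similarity to the query, return the best top_k
    scored = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    return scored[:top_k]

docs = [
    "ONNX Runtime loads models without PyTorch.",
    "Image generation costs $0.003 per image.",
    "Retrieval returns the most relevant chunks.",
]
print(retrieve("how does retrieval find relevant chunks", docs, top_k=1))
```

A real retriever swaps `embed` for a neural embedding model; the ranking-and-slicing shape stays the same.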

What is NexaAPI?

NexaAPI is the cheapest AI inference API:

  • $0.003/image — 13x cheaper than DALL-E 3
  • 56+ models: Flux Schnell, Flux Dev, SDXL, Stable Diffusion 3, DALL-E, and more
  • Text, Image, TTS, Video: Full multimodal stack
  • Free tier: 100 images at rapidapi.com/user/nexaquency
  • Install: pip install nexaapi or npm install nexaapi
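The pricing claim is easy to sanity-check with a quick back-of-the-envelope calculation (the $0.040 DALL-E 3 figure is the one used in the comparison table later in this post):

```python
nexa_per_image = 0.003    # NexaAPI price quoted in this post
dalle3_per_image = 0.040  # DALL-E 3 standard 1024x1024 list price

ratio = dalle3_per_image / nexa_per_image
print(f"DALL-E 3 costs {ratio:.1f}x more per image")  # ~13.3x, i.e. the "13x cheaper" claim

# At volume the gap is easier to feel
print(f"1,000 images: ${1000 * nexa_per_image:.2f} vs ${1000 * dalle3_per_image:.2f}")
```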

The Architecture

User Query
    ↓
VelociRAG (ONNX retrieval, 4-layer fusion, ~5ms)
    ↓
Retrieved Context
    ↓
NexaAPI (text/image/TTS generation, 56+ models)
    ↓
Response + Generated Assets
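The flow above can be sketched end to end with stub functions standing in for the two services. The function bodies here are placeholders, not real VelociRAG or NexaAPI calls; the actual calls appear in the tutorials that follow.

```python
def retrieve_context(query, top_k=3):
    # Placeholder for VelociRAG's ONNX retrieval step
    index = {
        "pricing": "Image generation costs $0.003 per image.",
        "models": "NexaAPI exposes 56+ models including Flux and SDXL.",
    }
    return [v for k, v in index.items() if k in query.lower()][:top_k]

def generate_answer(query, context):
    # Placeholder for a NexaAPI text-generation call
    return f"Based on: {' '.join(context)} -> answering: {query}"

def rag_pipeline(query):
    context = retrieve_context(query)       # step 1: retrieval (~5ms claimed)
    return generate_answer(query, context)  # step 2: grounded generation

print(rag_pipeline("What is the pricing?"))
```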

Python Tutorial

# pip install velocirag nexaapi
import velocirag
from nexaapi import NexaAPI

# Initialize NexaAPI client
# Free key: https://rapidapi.com/user/nexaquency
client = NexaAPI(api_key='YOUR_NEXAAPI_KEY')

# Initialize VelociRAG — ONNX-powered, no PyTorch needed
rag = velocirag.VelociRAG()

# Add documents to the RAG index
documents = [
    'NexaAPI provides 56+ AI models at the cheapest prices.',
    'Image generation costs only $0.003 per image.',
    'Supports Flux, Stable Diffusion, SDXL, and more.',
]
rag.add_documents(documents)

# Query the RAG system
query = 'What image generation models are available?'
retrieved_context = rag.retrieve(query, top_k=3)

# Flatten the retrieved chunks into one context string
# (assuming retrieve() returns a list of text chunks)
context_text = '\n'.join(str(chunk) for chunk in retrieved_context)

# Use NexaAPI to generate a response grounded in the retrieved context
prompt = f'Context: {context_text}\n\nQuestion: {query}\n\nAnswer:'
response = client.text.generate(
    model='gpt-4o-mini',  # or any of 56+ supported models
    prompt=prompt
)
print(response.text)

# Bonus: Generate an image based on the RAG answer
image = client.image.generate(
    model='flux-schnell',
    prompt='AI agent processing documents at lightning speed',
    width=1024,
    height=1024
)
print(f'Image URL: {image.url}')  # Cost: $0.003
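One detail worth calling out: `retrieve()`'s exact return type isn't documented here. Assuming it returns a list of text chunks, a small helper that numbers each chunk makes the grounding more explicit and gives the model something concrete to cite (a sketch, not part of either library):

```python
def build_grounded_prompt(chunks, question):
    # Join retrieved chunks into a numbered context block so the model
    # can point at the chunk that grounds its answer
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (f"Context:\n{context}\n\n"
            f"Question: {question}\n"
            f"Answer using only the context above.")

chunks = [
    "Image generation costs only $0.003 per image.",
    "Supports Flux, Stable Diffusion, SDXL, and more.",
]
print(build_grounded_prompt(chunks, "What models are supported?"))
```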

JavaScript Tutorial

// npm install nexaapi
// Note: VelociRAG is Python-only; in JS, call VelociRAG via MCP server
import NexaAPI from 'nexaapi';

const client = new NexaAPI({ apiKey: 'YOUR_NEXAAPI_KEY' });

async function ragAgentWorkflow() {
  // Simulate RAG-retrieved context (from VelociRAG Python service or MCP server)
  const retrievedContext = [
    'NexaAPI supports 56+ models including Flux, SDXL, Kling video, and TTS.',
    'Pricing starts at $0.003 per image — cheapest in the market.',
  ].join(' ');

  const query = 'Generate a product image for my AI startup';

  // Step 1: Generate text response grounded in context
  const textResponse = await client.text.generate({
    model: 'gpt-4o-mini',
    prompt: `Context: ${retrievedContext}\n\nTask: ${query}`,
  });
  console.log('Agent response:', textResponse.text);

  // Step 2: Generate image via NexaAPI ($0.003!)
  const imageResponse = await client.image.generate({
    model: 'flux-schnell',
    prompt: 'Futuristic AI agent with lightning speed, digital art',
    width: 1024,
    height: 1024,
  });
  console.log('Generated image:', imageResponse.url);

  // Step 3: Generate audio narration
  const audioResponse = await client.audio.tts({
    text: textResponse.text,
    voice: 'alloy',
  });
  console.log('Audio URL:', audioResponse.url);
}

ragAgentWorkflow();
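Since VelociRAG is Python-only, one simple way to make retrieval reachable from the JavaScript side is a tiny HTTP bridge around the Python index. The sketch below uses only the standard library, with naive substring matching standing in for real VelociRAG retrieval; this is hypothetical glue code, not part of either package.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_retrieve(payload, index):
    # Core logic kept separate from the HTTP plumbing so it is easy to test:
    # naive word matching standing in for real VelociRAG retrieval
    query = payload["query"].lower()
    hits = [doc for doc in index if any(w in doc.lower() for w in query.split())]
    return {"context": hits[:payload.get("top_k", 3)]}

INDEX = [
    "NexaAPI supports 56+ models.",
    "Pricing starts at $0.003 per image.",
]

class RagHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects {"query": "...", "top_k": N} and returns {"context": [...]}
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = handle_retrieve(json.loads(body), INDEX)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(result).encode())

# To serve: HTTPServer(("127.0.0.1", 8900), RagHandler).serve_forever()
```

The JavaScript agent can then `fetch("http://127.0.0.1:8900", { method: "POST", body: JSON.stringify({ query }) })` instead of simulating the retrieved context.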

MCP Server Integration

VelociRAG includes a built-in MCP server. Add it to your claude.json:

{
  "mcpServers": {
    "velocirag": {
      "command": "python",
      "args": ["-m", "velocirag.mcp_server"],
      "env": {
        "VELOCIRAG_INDEX": "/path/to/your/index"
      }
    }
  }
}

With this setup, Claude or any MCP-compatible agent can call VelociRAG to retrieve context, then use NexaAPI to generate responses.
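That agent loop (retrieve via the MCP tool, then generate via NexaAPI) looks roughly like this. Both tool bodies are stubs, since the real calls go through the MCP client and the NexaAPI SDK:

```python
def velocirag_retrieve(query: str) -> list:
    # Stub for the velocirag MCP tool; a real agent would issue an MCP tool call
    return ["Image generation costs $0.003 per image."]

def nexaapi_generate(prompt: str) -> str:
    # Stub for a NexaAPI text-generation call
    return f"(answer grounded in: {prompt})"

# Hypothetical tool registry mirroring what an MCP client sees:
# each tool is a name plus a callable
TOOLS = {
    "velocirag.retrieve": velocirag_retrieve,
    "nexaapi.generate": nexaapi_generate,
}

def agent_step(query):
    context = TOOLS["velocirag.retrieve"](query)         # tool call 1: retrieval
    return TOOLS["nexaapi.generate"](" ".join(context))  # tool call 2: generation

print(agent_step("How much does an image cost?"))
```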

Performance Comparison

Metric                 VelociRAG + NexaAPI    PyTorch RAG + DALL-E
Install time           ~30 seconds            5-10 minutes
Memory footprint       ~200MB                 2-4GB
Retrieval speed        ~5ms (ONNX)            ~20ms (PyTorch)
Image generation cost  $0.003                 $0.040
Models available       56+                    1
Serverless-friendly    ✅ Yes                 ❌ No

Resources

Try NexaAPI free: 56+ models, $0.003/image. Get a free API key (100 free images) at rapidapi.com/user/nexaquency.
