DEV Community

Abhishek

Vectra: A Provider-Agnostic RAG SDK That Lets You Swap Anything Without Rewriting Code

I got tired of rebuilding RAG pipelines every time I wanted to change a provider. So I built an SDK where the entire pipeline is config-driven. 4.5K+ downloads and counting.

Every RAG tutorial follows the same pattern: pick an embedding model, pick a vector DB, pick an LLM, write a bunch of glue code, and hope it works. Three months later, you need to swap from OpenAI to Gemini, or move from Chroma to Postgres — and suddenly you're rewriting half your backend.

I built Vectra to fix that. It's an open-source, provider-agnostic SDK for building full RAG pipelines where every component is swappable through config.

npm install vectra-js
# or
pip install vectra-rag-py

What It Actually Does

Vectra covers the full pipeline:

Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream

All of it is configured through a single object. Here's a complete working example:

const { VectraClient, ProviderType } = require('vectra-js');
const { Pool } = require('pg');

const client = new VectraClient({
  embedding: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-3-small'
  },
  llm: {
    provider: ProviderType.GEMINI,
    apiKey: process.env.GOOGLE_API_KEY,
    modelName: 'gemini-2.5-flash'
  },
  database: {
    type: 'postgres',
    clientInstance: new Pool({ connectionString: process.env.DATABASE_URL }),
    tableName: 'document',
    columnMap: { content: 'content', metadata: 'metadata', vector: 'vector' }
  },
  chunking: {
    strategy: 'recursive',
    chunkSize: 1000,
    chunkOverlap: 200
  },
  retrieval: { strategy: 'hybrid' },
  reranking: { enabled: true, windowSize: 20, topN: 5 }
});

// Ingest
await client.ingestDocuments('./docs');

// Query
const res = await client.queryRAG('What is the vacation policy?');
console.log(res.answer);

// Stream
const stream = await client.queryRAG('Summarize the policy', null, true);
for await (const chunk of stream) process.stdout.write(chunk.delta || '');

Notice how the embedding provider is OpenAI and the LLM is Gemini. You can mix and match freely. Want to switch to Anthropic for generation? Change one line:

llm: {
  provider: ProviderType.ANTHROPIC,
  apiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-sonnet-4-20250514'
}

Your application code stays identical.

The Full Config — Everything That's Swappable

The quick start above shows a partial config. Here's the full picture — every stage of the RAG pipeline you can control:

| Pipeline Stage | What You Configure | Options |
| --- | --- | --- |
| Embedding | Provider, model, dimensions, API key | OpenAI, Gemini, Ollama, HuggingFace |
| LLM | Provider, model, temperature, max tokens | OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace |
| Vector Store | Backend, connection, table/collection | PostgreSQL (pgvector), Prisma, ChromaDB, Qdrant, Milvus |
| Chunking | Strategy, chunk size, overlap | Recursive (character-aware) or Agentic (LLM-driven semantic) |
| Retrieval | Search strategy | Naive, HyDE, Multi-Query, Hybrid RRF, MMR |
| Reranking | Enable/disable, window size, top N | LLM-based reordering of retrieved chunks |
| Memory | Backend, max messages, session config | In-memory, Redis, PostgreSQL |
| Observability | Enable/disable, storage path | SQLite-backed traces + web dashboard |
| Metadata Enrichment | Per-chunk summaries, keywords, hypothetical Qs | Generated at ingestion time |
| Query Planning | Grounding strictness, context assembly | How strictly answers must cite retrieved text |
| Streaming | Toggle per query | Unified async generator across all providers |
| Ingestion | File/directory, format handling | PDF, DOCX, XLSX, TXT, Markdown |

Prototype with Chroma + Ollama on your laptop. Ship with Postgres + OpenAI in prod. Your app code doesn't change.
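In practice that swap can live in a single config factory. A minimal sketch of the pattern, using plain provider strings and a hypothetical `pickConfig` helper (the real config would use the `ProviderType` constants and API keys from env, as in the quick start above):

```javascript
// Sketch: choose a pipeline config by environment. Only this object changes;
// the code that constructs VectraClient and calls queryRAG stays identical.
function pickConfig(env) {
  if (env === 'production') {
    return {
      embedding: { provider: 'openai', modelName: 'text-embedding-3-small' },
      llm: { provider: 'openai', modelName: 'gpt-4o-mini' },
      database: { type: 'postgres', tableName: 'document' }
    };
  }
  // Local prototyping: everything runs on the laptop.
  return {
    embedding: { provider: 'ollama', modelName: 'nomic-embed-text' },
    llm: { provider: 'ollama', modelName: 'llama3' },
    database: { type: 'chroma' }
  };
}

const config = pickConfig(process.env.NODE_ENV || 'development');
console.log(config.database.type);
```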

Features That Matter in Production

Agentic Chunking

Instead of blindly splitting by character count, Vectra can use an LLM to split documents into semantic propositions:

chunking: {
  strategy: 'agentic',
  agenticLlm: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4o-mini'
  }
}

This makes a huge difference for policy documents, legal text, and anything with complex structure.
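For contrast, the recursive strategy's `chunkSize` and `chunkOverlap` knobs (1000/200 in the quick start) behave roughly like a sliding character window. A toy sketch of that idea — an illustration of the two parameters, not Vectra's actual implementation:

```javascript
// Toy character splitter: fixed-size windows that step forward by
// (chunkSize - overlap), so text cut at a boundary reappears in the next chunk.
function splitByCharacters(text, chunkSize, overlap) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = splitByCharacters('a'.repeat(2500), 1000, 200);
console.log(chunks.length); // → 3 (windows at offsets 0, 800, 1600)
```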

Built-in Observability

observability: {
  enabled: true,
  sqlitePath: 'vectra-observability.db'
}

Then run vectra dashboard to get a local web UI showing ingestion latency, query traces, retrieval performance, and chat sessions.

Conversation Memory

memory: {
  enabled: true,
  type: 'redis',
  maxMessages: 20,
  redis: {
    clientInstance: redisClient,
    keyPrefix: 'vectra:chat:'
  }
}

Pass a sessionId and Vectra maintains multi-turn context automatically.
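A multi-turn sketch, assuming — as the streaming example's `null` argument suggests — that `queryRAG`'s second parameter is the session id. The `chatTurn` helper and the stub client are mine, for illustration only; a real app would pass the configured `VectraClient`:

```javascript
// Each turn passes the same sessionId so the memory backend (Redis above)
// can thread prior messages into the prompt.
async function chatTurn(client, sessionId, question) {
  const res = await client.queryRAG(question, sessionId);
  return res.answer;
}

// Stub client so the sketch runs without a live pipeline.
const stubClient = {
  history: [],
  async queryRAG(question, sessionId) {
    this.history.push({ sessionId, question });
    return { answer: `(${sessionId}) turn ${this.history.length}` };
  }
};

(async () => {
  console.log(await chatTurn(stubClient, 'user-42', 'What is the vacation policy?'));
  console.log(await chatTurn(stubClient, 'user-42', 'Does it carry over?'));
})();
```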

Evaluation

await client.evaluate([
  { question: 'Capital of France?', expectedGroundTruth: 'Paris' }
]);

Built-in faithfulness and relevance metrics. Know if your pipeline is actually working before shipping.
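If `evaluate()` returns per-case scores, a CI gate becomes a one-liner. The `faithfulness`/`relevance` field names and the `gatePipeline` helper below are assumptions for illustration — the SDK's actual result shape isn't documented here — so the sketch runs against a stub:

```javascript
// Sketch: fail fast when any eval case scores below a threshold.
// Field names { faithfulness, relevance } are assumed, not confirmed.
async function gatePipeline(client, cases, minScore = 0.7) {
  const results = await client.evaluate(cases);
  return results.every(r => r.faithfulness >= minScore && r.relevance >= minScore);
}

// Stub standing in for a configured VectraClient.
const stub = {
  async evaluate(cases) {
    return cases.map(() => ({ faithfulness: 0.9, relevance: 0.8 }));
  }
};

gatePipeline(stub, [{ question: 'Capital of France?', expectedGroundTruth: 'Paris' }])
  .then(ok => console.log(ok ? 'ship it' : 'fix retrieval first'));
```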

CLI

# Ingest from terminal
vectra ingest ./docs --config=./config.json

# Query from terminal
vectra query "What is our leave policy?" --config=./config.json --stream

# Interactive config builder
vectra webconfig

# Observability dashboard
vectra dashboard
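The `--config` file presumably mirrors the programmatic config. A plausible sketch — key names copied from the JS example above, but the exact JSON schema is an assumption, and API keys are omitted since how the CLI sources secrets (env vars?) isn't shown here:

```json
{
  "embedding": { "provider": "openai", "modelName": "text-embedding-3-small" },
  "llm": { "provider": "gemini", "modelName": "gemini-2.5-flash" },
  "chunking": { "strategy": "recursive", "chunkSize": 1000, "chunkOverlap": 200 },
  "retrieval": { "strategy": "hybrid" },
  "reranking": { "enabled": true, "windowSize": 20, "topN": 5 }
}
```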

Why I Built This

I'm a solo dev who kept running into the same problem: every RAG project started with hours of plumbing before I could write any actual application logic. And when requirements changed (they always do), switching providers meant touching code everywhere.

Vectra's design principle is simple: RAG is a pipeline, not a pile of libraries. Configure the pipeline once. Change any piece without touching the rest.

Numbers

~4,500 downloads across npm and PyPI, and 8 stars on GitHub. It's early, but the developers who have adopted it did so because it solves a real problem.

Links

Star the repo if this resonates. Open an issue if something's broken. PRs welcome.


Built by @iamabhishek-n
