I got tired of rebuilding RAG pipelines every time I wanted to change a provider. So I built an SDK where the entire pipeline is config-driven. 4.5K+ downloads and counting.
Every RAG tutorial follows the same pattern: pick an embedding model, pick a vector DB, pick an LLM, write a bunch of glue code, and hope it works. Three months later, you need to swap from OpenAI to Gemini, or move from Chroma to Postgres — and suddenly you're rewriting half your backend.
I built Vectra to fix that. It's an open-source, provider-agnostic SDK for building full RAG pipelines where every component is swappable through config.
```shell
npm install vectra-js
# or
pip install vectra-rag-py
```
## What It Actually Does
Vectra covers the full pipeline:
Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
All of it is configured through a single object. Here's a complete working example:
```javascript
const { VectraClient, ProviderType } = require('vectra-js');
const { Pool } = require('pg');

const client = new VectraClient({
  embedding: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-3-small'
  },
  llm: {
    provider: ProviderType.GEMINI,
    apiKey: process.env.GOOGLE_API_KEY,
    modelName: 'gemini-2.5-flash'
  },
  database: {
    type: 'postgres',
    clientInstance: new Pool({ connectionString: process.env.DATABASE_URL }),
    tableName: 'document',
    columnMap: { content: 'content', metadata: 'metadata', vector: 'vector' }
  },
  chunking: {
    strategy: 'recursive',
    chunkSize: 1000,
    chunkOverlap: 200
  },
  retrieval: { strategy: 'hybrid' },
  reranking: { enabled: true, windowSize: 20, topN: 5 }
});

// Ingest
await client.ingestDocuments('./docs');

// Query
const res = await client.queryRAG('What is the vacation policy?');
console.log(res.answer);

// Stream
const stream = await client.queryRAG('Summarize the policy', null, true);
for await (const chunk of stream) process.stdout.write(chunk.delta || '');
```
Notice that the embedding provider is OpenAI while the LLM is Gemini. You can mix and match freely. Want to switch to Anthropic for generation? Change one config block:
```javascript
llm: {
  provider: ProviderType.ANTHROPIC,
  apiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-sonnet-4-20250514'
}
```
Your application code stays identical.
## The Full Config — Everything That's Swappable
The quick start above shows a partial config. Here's the full picture — every stage of the RAG pipeline you can control:
| Pipeline Stage | What You Configure | Options |
|---|---|---|
| Embedding | Provider, model, dimensions, API key | OpenAI, Gemini, Ollama, HuggingFace |
| LLM | Provider, model, temperature, max tokens | OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace |
| Vector Store | Backend, connection, table/collection | PostgreSQL (pgvector), Prisma, ChromaDB, Qdrant, Milvus |
| Chunking | Strategy, chunk size, overlap | Recursive (character-aware) or Agentic (LLM-driven semantic) |
| Retrieval | Search strategy | Naive, HyDE, Multi-Query, Hybrid RRF, MMR |
| Reranking | Enable/disable, window size, top N | LLM-based reordering of retrieved chunks |
| Memory | Backend, max messages, session config | In-memory, Redis, PostgreSQL |
| Observability | Enable/disable, storage path | SQLite-backed traces + web dashboard |
| Metadata Enrichment | Per-chunk summaries, keywords, hypothetical Qs | Generated at ingestion time |
| Query Planning | Grounding strictness, context assembly | How strictly answers must cite retrieved text |
| Streaming | Toggle per query | Unified async generator across all providers |
| Ingestion | File/directory, format handling | PDF, DOCX, XLSX, TXT, Markdown |
Prototype with Chroma + Ollama on your laptop. Ship with Postgres + OpenAI in prod. Your app code doesn't change.
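As a sketch of that dev-to-prod swap, a local prototype config might look like the following. The `type: 'chroma'`, `collectionName`, and Ollama model names here are assumptions based on the options table above, not verified config keys — check the Vectra docs before copying:

```javascript
const { VectraClient, ProviderType } = require('vectra-js');

// Hypothetical local-dev config: Ollama for both embeddings and generation,
// ChromaDB as the vector store. Only the keys shown elsewhere in this post
// (provider, modelName, chunking, retrieval) are confirmed; the rest is
// an assumption for illustration.
const client = new VectraClient({
  embedding: {
    provider: ProviderType.OLLAMA,
    modelName: 'nomic-embed-text'    // any local embedding model
  },
  llm: {
    provider: ProviderType.OLLAMA,
    modelName: 'llama3.1'            // any local chat model
  },
  database: {
    type: 'chroma',                  // assumed key; see docs
    collectionName: 'documents'      // assumed key; see docs
  },
  chunking: { strategy: 'recursive', chunkSize: 1000, chunkOverlap: 200 },
  retrieval: { strategy: 'hybrid' }
});
```

Moving to production then means replacing the `embedding`, `llm`, and `database` blocks with the Postgres + OpenAI config from the quick start; the ingest and query calls stay the same.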
## Features That Matter in Production
### Agentic Chunking
Instead of blindly splitting by character count, Vectra can use an LLM to split documents into semantic propositions:
```javascript
chunking: {
  strategy: 'agentic',
  agenticLlm: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4o-mini'
  }
}
```
This makes a huge difference for policy documents, legal text, and anything with complex structure.
### Built-in Observability
```javascript
observability: {
  enabled: true,
  sqlitePath: 'vectra-observability.db'
}
```
Then run `vectra dashboard` to get a local web UI showing ingestion latency, query traces, retrieval performance, and chat sessions.
### Conversation Memory
```javascript
memory: {
  enabled: true,
  type: 'redis',
  maxMessages: 20,
  redis: {
    clientInstance: redisClient,
    keyPrefix: 'vectra:chat:'
  }
}
```
Pass a `sessionId` and Vectra maintains multi-turn context automatically.
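A sketch of what multi-turn usage might look like. The exact position of the session id in `queryRAG` is an assumption inferred from the streaming example above (`queryRAG(question, null, true)`), so verify against the docs:

```javascript
// Hypothetical multi-turn usage: pass the same session id on each call
// so Vectra can load and append conversation history. The parameter
// position is an assumption -- confirm it in the Vectra API reference.
const sessionId = 'user-42';

const first = await client.queryRAG('How many vacation days do I get?', sessionId);
const followUp = await client.queryRAG('And can I carry them over?', sessionId);
```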
### Evaluation
```javascript
await client.evaluate([
  { question: 'Capital of France?', expectedGroundTruth: 'Paris' }
]);
```
Built-in faithfulness and relevance metrics. Know if your pipeline is actually working before shipping.
### CLI
```shell
# Ingest from terminal
vectra ingest ./docs --config=./config.json

# Query from terminal
vectra query "What is our leave policy?" --config=./config.json --stream

# Interactive config builder
vectra webconfig

# Observability dashboard
vectra dashboard
```
## Why I Built This
I'm a solo dev who kept running into the same problem: every RAG project started with hours of plumbing before I could write any actual application logic. And when requirements changed (they always do), switching providers meant touching code everywhere.
Vectra's design principle is simple: RAG is a pipeline, not a pile of libraries. Configure the pipeline once. Change any piece without touching the rest.
## Numbers
~4,500 downloads across npm and PyPI. 8 stars on GitHub. It's early, but the developers who have adopted it did so because it solves a real problem.
## Links
- GitHub: github.com/iamabhishek-n/vectra-js
- npm: npmjs.com/package/vectra-js
- PyPI: pypi.org/project/vectra-rag-py
- Docs: vectra.thenxtgenagents.com
Star the repo if this resonates. Open an issue if something's broken. PRs welcome.
Built by @iamabhishek-n