
Laxmikanta Nayak

Why I built ragwise: pip-installable RAG with hybrid search, streaming, and agent tools by default

I spent a weekend integrating a RAG pipeline into a client project using LangChain.

By Sunday evening I had 200 lines of boilerplate, a requirements.txt with 47 packages, and retrieval quality that was noticeably worse than I knew it could be. Not because LangChain is bad. Because I'd made the same mistake everyone makes: I left dense-only search as the default.

Dense search misses exact keyword queries. Error codes. Product names. Version numbers. "Why is my API returning error 0x80004005" does not embed well. BM25 catches it immediately. Every benchmark run on real user queries shows the same thing: hybrid wins.

That's the insight behind ragwise. Hybrid search should be the default, not the opt-in.

The problem with existing options

There are two kinds of RAG libraries right now:

Easy to start, weak on retrieval — LangChain, LlamaIndex. Low barrier to entry, large ecosystems, but dense-only by default. Hybrid is opt-in, which means most teams ship without it because they never get around to configuring it.

Strong on retrieval, heavy to run — RAGFlow. Ships hybrid search by default. Makes the right call. But requires Docker, Docker Compose, and 16GB RAM minimum. You're running a platform, not importing a library.

There was no pip-installable library that shipped hybrid search as the default. So I built one.

What ragwise looks like

```bash
pip install ragwise
```

```python
from ragwise import RAG

async with RAG(llm="openai/gpt-4o-mini") as rag:
    await rag.ingest("./docs/")
    answer = await rag.query("What is the refund policy?")
    print(answer.text)
    print(answer.citations)
```

That's it. BM25 + dense retrieval, fused with Reciprocal Rank Fusion, runs on that query() call automatically. No configuration. No extra imports.
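
One note on running it: top-level await like this works in a notebook or the `python -m asyncio` REPL. In a plain script, wrap it in an entry point; this is standard asyncio, nothing ragwise-specific:

```python
import asyncio
from ragwise import RAG

async def main() -> None:
    async with RAG(llm="openai/gpt-4o-mini") as rag:
        await rag.ingest("./docs/")
        answer = await rag.query("What is the refund policy?")
        print(answer.text)

asyncio.run(main())
```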

*Hybrid search running — answer with citations from 4 docs in 3487ms*

How the hybrid search works

Each query triggers two retrievals in parallel:

  1. Dense search — embed the query, find nearest neighbours by cosine similarity
  2. Sparse search (BM25) — tokenise the query, score documents by term frequency

Then Reciprocal Rank Fusion merges both ranked lists:

```
RRF(doc) = Σ 1 / (60 + rank_i(doc))
```

Rank-based, so no score normalisation needed between BM25 and cosine scores. A document that ranks 3rd in both lists beats one that ranks 1st in only one. Simple, robust, needs zero training data.
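
For intuition, here's a minimal sketch of the fusion step (my own illustration of the algorithm, not ragwise internals):

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first

# Ranked 3rd in both lists: 1/63 + 1/63 ≈ 0.032
# Ranked 1st in only one:   1/61        ≈ 0.016
```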

Streaming

Most RAG in 2026 is embedded in UIs where users expect to see text stream in real time.

```python
async for token in rag.stream_query("What changed in v2?"):
    print(token, end="", flush=True)
```

Works with OpenAI, Anthropic, and Ollama. For custom LLMs that don't implement streaming, it falls back gracefully to yielding the full response as one token — no errors, no code changes needed.
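
The fallback pattern itself is worth copying. Here's a sketch of the idea, with `stream` and `complete` as stand-in method names for whatever your LLM wrapper exposes:

```python
from typing import AsyncIterator

async def stream_or_fallback(llm, prompt: str) -> AsyncIterator[str]:
    """Yield tokens if the backend streams, else yield the full answer once."""
    if hasattr(llm, "stream"):      # backend implements streaming
        async for token in llm.stream(prompt):
            yield token
    else:                           # degrade gracefully: one big "token"
        yield await llm.complete(prompt)
```

Callers iterate the same way in both cases, which is exactly why no code changes are needed.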

*Streaming response — tokens arriving in real time*

RAG as an agent tool

The market has shifted from "chatbot with RAG" to "agent with RAG as one of its tools." ragwise ships ready-made tool definitions for both Anthropic and OpenAI agents:

```python
from ragwise.agent import as_claude_tool

tool = as_claude_tool(rag)           # Anthropic-compatible tool schema
results = await rag.search("query")  # raw retrieval, no generation
```

Your agent decides when to search. ragwise handles the retrieval. The tool schema is pre-built — you don't write it.
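
To make that concrete, here's roughly how the pre-built schema plugs into the Anthropic SDK. The model string is just an example, and the `query` input field is my assumption about the schema's shape:

```python
import anthropic
from ragwise.agent import as_claude_tool

client = anthropic.Anthropic()
tool = as_claude_tool(rag)  # rag is the RAG instance from earlier

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any tool-capable Claude model
    max_tokens=1024,
    tools=[tool],                        # no hand-written JSON schema
    messages=[{"role": "user", "content": "What changed in v2?"}],
)

for block in response.content:
    if block.type == "tool_use":         # Claude chose to search
        results = await rag.search(block.input["query"])
```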

*Agent tools — schema table and raw search results*

Multi-tenant isolation

SaaS products need org A's documents to never appear in org B's queries. ragwise handles this with tenant_id at ingest and filtering at query time:

await rag.ingest("./org_a/", tenant_id="org_a")
await rag.ingest("./org_b/", tenant_id="org_b")

answer = await rag.query(
    "policy?",
    config=QueryConfig(tenant_id="org_a")
)
Enter fullscreen mode Exit fullscreen mode

Filtering happens post-retrieval (after RRF fusion) and works with all three store backends without schema changes.
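
Post-retrieval filtering is what keeps the stores schema-free: fuse first, then drop anything from the wrong tenant. A sketch of the idea (illustrative, not the actual source):

```python
def filter_by_tenant(fused_chunks: list[dict], tenant_id: str | None) -> list[dict]:
    """Keep only chunks whose metadata matches the requesting tenant."""
    if tenant_id is None:   # no tenant filter requested
        return fused_chunks
    return [
        chunk for chunk in fused_chunks
        if chunk.get("metadata", {}).get("tenant_id") == tenant_id
    ]
```

The usual caveat with filter-after-fusion applies: retrieve a few more candidates than you need so the top-k survives the filter.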

*Multi-tenant isolation — acme_corp and globex_inc fully isolated*

The store upgrade path

The decision I'm most happy with: the store is a single string, and the API is identical across all three backends.

```python
# Dev — volatile, zero setup
RAG(store="memory")

# Persistent dev — embedded, no server, survives restarts
RAG(store="lance://./ragwise-index")

# Production — PostgreSQL + pgvector
RAG(store="postgresql://user:pass@localhost/mydb")
```

Same application code. Only the connection string changes.
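
In practice that means the store can come straight from configuration. The env-var name here is my own choice, not a ragwise convention:

```python
import os
from ragwise import RAG

# "memory" locally, a lance:// path in CI, a postgresql:// URL in production
rag = RAG(store=os.environ.get("RAGWISE_STORE", "memory"))
```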

*Store upgrade path — three backends, one API*

Incremental indexing

Production teams call ingest() repeatedly as their doc corpus changes. Without incremental indexing, every call re-embeds everything.

```python
result1 = await rag.ingest("./docs/")   # indexes 200 files
result2 = await rag.ingest("./docs/")   # 0 files re-indexed (hashes unchanged)
```

SHA-256 hash per file. Skip if unchanged. Simple.
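
The mechanism really is that simple. A minimal sketch of hash-based skipping, with an in-memory dict standing in for whatever manifest the store keeps:

```python
import hashlib
from pathlib import Path

def files_to_reindex(paths: list[Path], seen: dict[str, str]) -> list[Path]:
    """Return only files whose content hash changed since the last ingest."""
    changed = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:   # new file, or content modified
            changed.append(path)
            seen[str(path)] = digest        # record for the next ingest() call
    return changed
```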

How it compares

| | ragwise | LangChain | LlamaIndex | RAGFlow |
| --- | --- | --- | --- | --- |
| Lines to get started | 4 | 40+ | 20+ | Docker |
| Hybrid search default | ✅ | opt-in | opt-in | ✅ |
| pip install, no server | ✅ | ✅ | ✅ | Docker |
| Async-first | ✅ | partial | partial | |
| Streaming | ✅ | partial | partial | |
| Agent tool built-in | ✅ | | | |
| Multi-tenant isolation | ✅ | | | |
| Incremental indexing | ✅ | opt-in | opt-in | |

Try it

```bash
pip install ragwise
```

If you build something with it or the API feels wrong anywhere, open an issue or start a discussion. It's still early and I want the feedback.
