
Laxmikanta Nayak

Why I built ragwise: pip-installable RAG with hybrid search, streaming, and agent tools by default

I spent a weekend integrating a RAG pipeline into a client project using LangChain.

By Sunday evening I had 200 lines of boilerplate, a requirements.txt with 47 packages, and retrieval quality that was noticeably worse than I knew it could be. Not because LangChain is bad. Because I'd made the same mistake everyone makes: I left dense-only search as the default.

Dense search misses exact keyword queries. Error codes. Product names. Version numbers. "Why is my API returning error 0x80004005" does not embed well. BM25 catches it immediately. Every benchmark run on real user queries shows the same thing: hybrid wins.

That's the insight behind ragwise. Hybrid search should be the default, not the opt-in.

The problem with existing options

There are two kinds of RAG libraries right now:

Easy to start, weak on retrieval — LangChain, LlamaIndex. Low barrier to entry, large ecosystems, but dense-only by default. Hybrid is opt-in, which means most teams ship without it because they never get around to configuring it.

Strong on retrieval, heavy to run — RAGFlow. Ships hybrid search by default. Makes the right call. But requires Docker, Docker Compose, and 16GB RAM minimum. You're running a platform, not importing a library.

There was no pip-installable library that shipped hybrid search as the default. So I built one.

What ragwise looks like

```bash
pip install ragwise
```

```python
from ragwise import RAG

async with RAG(llm="openai/gpt-4o-mini") as rag:
    await rag.ingest("./docs/")
    answer = await rag.query("What is the refund policy?")
    print(answer.text)
    print(answer.citations)
```

That's it. BM25 + dense retrieval, fused with Reciprocal Rank Fusion, runs on that query() call automatically. No configuration. No extra imports.
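
One note on running it: top-level await like this works in a notebook or the `python -m asyncio` REPL. In a plain script, wrap it in an entry point; this is standard asyncio, nothing ragwise-specific:

```python
import asyncio
from ragwise import RAG

async def main() -> None:
    async with RAG(llm="openai/gpt-4o-mini") as rag:
        await rag.ingest("./docs/")
        answer = await rag.query("What is the refund policy?")
        print(answer.text)

asyncio.run(main())
```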

*Hybrid search running — answer with citations from 4 docs in 3487ms*

How the hybrid search works

Each query triggers two retrievals in parallel:

  1. Dense search — embed the query, find nearest neighbours by cosine similarity
  2. Sparse search (BM25) — tokenise the query, score documents by term frequency

Then Reciprocal Rank Fusion merges both ranked lists:

```
RRF(doc) = Σ 1 / (60 + rank_i(doc))
```

Rank-based, so no score normalisation needed between BM25 and cosine scores. A document that ranks 3rd in both lists beats one that ranks 1st in only one. Simple, robust, needs zero training data.
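
For intuition, here's a minimal sketch of the fusion step (my own illustration of the algorithm, not ragwise internals):

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first

# Ranked 3rd in both lists: 1/63 + 1/63 ≈ 0.032
# Ranked 1st in only one:   1/61        ≈ 0.016
```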

Streaming

Most RAG in 2026 is embedded in UIs where users expect to see text stream in real time.

```python
async for token in rag.stream_query("What changed in v2?"):
    print(token, end="", flush=True)
```

Works with OpenAI, Anthropic, and Ollama. For custom LLMs that don't implement streaming, it falls back gracefully to yielding the full response as one token — no errors, no code changes needed.
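
The fallback pattern itself is worth copying. Here's a sketch of the idea, with `stream` and `complete` as stand-in method names for whatever your LLM wrapper exposes:

```python
from typing import AsyncIterator

async def stream_or_fallback(llm, prompt: str) -> AsyncIterator[str]:
    """Yield tokens if the backend streams, else yield the full answer once."""
    if hasattr(llm, "stream"):      # backend implements streaming
        async for token in llm.stream(prompt):
            yield token
    else:                           # degrade gracefully: one big "token"
        yield await llm.complete(prompt)
```

Callers iterate the same way in both cases, which is exactly why no code changes are needed.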

*Streaming response — tokens arriving in real time*

RAG as an agent tool

The market has shifted from "chatbot with RAG" to "agent with RAG as one of its tools." ragwise ships ready-made tool definitions for both Anthropic and OpenAI agents:

```python
from ragwise.agent import as_claude_tool

tool = as_claude_tool(rag)           # Anthropic-compatible tool schema
results = await rag.search("query")  # raw retrieval, no generation
```

Your agent decides when to search. ragwise handles the retrieval. The tool schema is pre-built — you don't write it.
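
To make that concrete, here's roughly how the pre-built schema plugs into the Anthropic SDK. The model string is just an example, and the `query` input field is my assumption about the schema's shape:

```python
import anthropic
from ragwise.agent import as_claude_tool

client = anthropic.Anthropic()
tool = as_claude_tool(rag)  # rag is the RAG instance from earlier

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any tool-capable Claude model
    max_tokens=1024,
    tools=[tool],                        # no hand-written JSON schema
    messages=[{"role": "user", "content": "What changed in v2?"}],
)

for block in response.content:
    if block.type == "tool_use":         # Claude chose to search
        results = await rag.search(block.input["query"])
```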

*Agent tools — schema table and raw search results*

Multi-tenant isolation

SaaS products need org A's documents to never appear in org B's queries. ragwise handles this with tenant_id at ingest and filtering at query time:

await rag.ingest("./org_a/", tenant_id="org_a")
await rag.ingest("./org_b/", tenant_id="org_b")

answer = await rag.query(
    "policy?",
    config=QueryConfig(tenant_id="org_a")
)
Enter fullscreen mode Exit fullscreen mode

Filtering happens post-retrieval (after RRF fusion) and works with all three store backends without schema changes.
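
Post-retrieval filtering is what keeps the stores schema-free: fuse first, then drop anything from the wrong tenant. A sketch of the idea (illustrative, not the actual source):

```python
def filter_by_tenant(fused_chunks: list[dict], tenant_id: str | None) -> list[dict]:
    """Keep only chunks whose metadata matches the requesting tenant."""
    if tenant_id is None:   # no tenant filter requested
        return fused_chunks
    return [
        chunk for chunk in fused_chunks
        if chunk.get("metadata", {}).get("tenant_id") == tenant_id
    ]
```

The usual caveat with filter-after-fusion applies: retrieve a few more candidates than you need so the top-k survives the filter.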

*Multi-tenant isolation — acme_corp and globex_inc fully isolated*

The store upgrade path

The decision I'm most happy with: the store is a single string, and the API is identical across all three backends.

```python
# Dev — volatile, zero setup
RAG(store="memory")

# Persistent dev — embedded, no server, survives restarts
RAG(store="lance://./ragwise-index")

# Production — PostgreSQL + pgvector
RAG(store="postgresql://user:pass@localhost/mydb")
```

Same application code. Only the connection string changes.
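
In practice that means the store can come straight from configuration. The env-var name here is my own choice, not a ragwise convention:

```python
import os
from ragwise import RAG

# "memory" locally, a lance:// path in CI, a postgresql:// URL in production
rag = RAG(store=os.environ.get("RAGWISE_STORE", "memory"))
```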

*Store upgrade path — three backends, one API*

Incremental indexing

Production teams call ingest() repeatedly as their doc corpus changes. Without incremental indexing, every call re-embeds everything.

```python
result1 = await rag.ingest("./docs/")   # indexes 200 files
result2 = await rag.ingest("./docs/")   # 0 files re-indexed (hashes unchanged)
```

SHA-256 hash per file. Skip if unchanged. Simple.
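
The mechanism really is that simple. A minimal sketch of hash-based skipping, with an in-memory dict standing in for whatever manifest the store keeps:

```python
import hashlib
from pathlib import Path

def files_to_reindex(paths: list[Path], seen: dict[str, str]) -> list[Path]:
    """Return only files whose content hash changed since the last ingest."""
    changed = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:   # new file, or content modified
            changed.append(path)
            seen[str(path)] = digest        # record for the next ingest() call
    return changed
```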

How it compares

| | ragwise | LangChain | LlamaIndex | RAGFlow |
| --- | --- | --- | --- | --- |
| Lines to get started | 4 | 40+ | 20+ | Docker |
| Hybrid search default | ✅ | opt-in | opt-in | ✅ |
| pip install, no server | ✅ | ✅ | ✅ | Docker |
| Async-first | ✅ | partial | partial | |
| Streaming | ✅ | partial | partial | |
| Agent tool built-in | ✅ | | | |
| Multi-tenant isolation | ✅ | | | |
| Incremental indexing | ✅ | opt-in | opt-in | |

Try it

```bash
pip install ragwise
```

If you build something with it or the API feels wrong anywhere, open an issue or start a discussion. It's still early and I want the feedback.
