DEV Community

Philip Solobay

RAGGuard: Filter During Vector Search, Not After Retrieval

If you're building a RAG application with document-level permissions, you've probably implemented something like this:

  1. User makes a query
  2. Retrieve top-k documents from vector DB
  3. Filter out documents the user shouldn't see
  4. Send remaining docs to LLM

The problem? By step 3, unauthorized documents have already been retrieved. They've hit your retrieval layer, been processed, and potentially logged.
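
To make the exposure concrete, here is a minimal sketch of the retrieve-then-filter pattern using a toy in-memory index. Everything in it (the `Doc` class, `post_filter_search`, the sample corpus and group names) is hypothetical and only illustrates the four steps above; it is not taken from any particular application.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Doc:
    doc_id: str
    embedding: list[float]
    allowed_groups: set[str]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def top_k(docs, query_vec, k):
    """Plain similarity search with no awareness of permissions."""
    ranked = sorted(docs, key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return ranked[:k]


def post_filter_search(docs, query_vec, user_groups, k):
    # Step 2: retrieve top-k. Unauthorized documents are included here.
    retrieved = top_k(docs, query_vec, k)
    # Step 3: drop what the user may not see, after the fact.
    visible = [d for d in retrieved if d.allowed_groups & user_groups]
    return retrieved, visible


# Hypothetical corpus: two HR-only documents and two engineering documents.
CORPUS = [
    Doc("hr-salaries", [1.0, 0.0], {"hr"}),
    Doc("hr-reviews", [0.9, 0.1], {"hr"}),
    Doc("eng-handbook", [0.7, 0.3], {"eng"}),
    Doc("eng-oncall", [0.1, 0.9], {"eng"}),
]

# An engineer asks a question whose embedding sits closest to the HR documents.
retrieved, visible = post_filter_search(CORPUS, [1.0, 0.0], {"eng"}, k=2)
print([d.doc_id for d in retrieved])  # what reached the application
print([d.doc_id for d in visible])    # what is left for the LLM
```

In this toy run both HR documents reach the application before being discarded, and the engineer ends up with no documents at all, even though a relevant engineering document exists in the corpus.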

Enter RAGGuard

I built RAGGuard to fix this. Instead of post-retrieval filtering, it translates permission policies into native vector database filters. Unauthorized documents are never retrieved in the first place.
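
The idea of translating a policy into a filter can be sketched as follows. This is not RAGGuard's implementation: `policy_to_filter`, `matches`, and `filtered_search` are hypothetical helpers, the tenant-and-group policy is an assumed example, and the filter syntax is only loosely modeled on the Mongo-style metadata filters (`$eq`, `$in`, `$and`) that some vector databases accept. The in-memory `filtered_search` stands in for a database that applies the filter while searching.

```python
from math import sqrt


def policy_to_filter(user):
    """Translate a (hypothetical) tenant + group policy into a metadata filter."""
    return {"$and": [
        {"tenant": {"$eq": user["tenant"]}},
        {"group": {"$in": sorted(user["groups"])}},
    ]}


def matches(metadata, flt):
    """Tiny evaluator for the three operators used above."""
    if "$and" in flt:
        return all(matches(metadata, clause) for clause in flt["$and"])
    (field, cond), = flt.items()
    (op, value), = cond.items()
    if op == "$eq":
        return metadata.get(field) == value
    if op == "$in":
        return metadata.get(field) in value
    raise ValueError(f"unsupported operator: {op}")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def filtered_search(index, query_vec, flt, k):
    """Stand-in for a vector DB: only rows that pass the filter are ever ranked."""
    candidates = [row for row in index if matches(row["metadata"], flt)]
    ranked = sorted(candidates, key=lambda r: cosine(r["vector"], query_vec), reverse=True)
    return [row["id"] for row in ranked[:k]]


# Hypothetical index with tenant and group metadata on every row.
INDEX = [
    {"id": "hr-salaries", "vector": [1.0, 0.0], "metadata": {"tenant": "acme", "group": "hr"}},
    {"id": "eng-handbook", "vector": [0.7, 0.3], "metadata": {"tenant": "acme", "group": "eng"}},
    {"id": "eng-oncall", "vector": [0.1, 0.9], "metadata": {"tenant": "acme", "group": "eng"}},
    {"id": "other-eng", "vector": [0.95, 0.05], "metadata": {"tenant": "globex", "group": "eng"}},
]

user = {"tenant": "acme", "groups": {"eng"}}
flt = policy_to_filter(user)
results = filtered_search(INDEX, [1.0, 0.0], flt, k=2)
print(results)
```

Because the filter is part of the search itself, the HR document and the other tenant's document never enter the result set, and the caller still receives a full k results drawn from what they are allowed to see.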

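from ragguard.langchain import SecureRetriever

# your_retriever: placeholder for the retriever you already use with your vector DB
# your_policy: placeholder for the permission policy you want enforced
retriever = SecureRetriever(
    base_retriever=your_retriever,
    policy=your_policy
)

# query is the user's question; user describes who is asking.
# The policy is applied as a filter at the DB level, so unauthorized documents are never returned.
docs = retriever.get_relevant_documents(query, user_context=user)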

What it supports

  • 14 Vector DBs: Qdrant, ChromaDB, Pinecone, pgvector, Weaviate, Milvus, and more
  • Any Auth: OPA, Cerbos, OpenFGA, or custom RBAC
  • Frameworks: LangChain, LlamaIndex, LangGraph

Why it matters

  • Compliance (HIPAA, SOC2, GDPR)
  • Multi-tenant isolation
  • Blocks 19/19 tested attack patterns

Get started

pip install ragguard

Open source (Apache 2.0). Feedback welcome!
