Alain Airom

Setting up Linagora’s OpenRAG locally

OpenRAG from LinaGora, set up on my laptop using the source code!

🔍 Exploring LinaGora’s OpenRAG

Recently, I delved into the GitHub repository for LinaGora’s OpenRAG. As is often the case when I encounter a promising open-source project, my curiosity was piqued, and I immediately wanted to set it up and test it locally on my machine.

💡 Understanding the Value Proposition

Before embarking on the technical steps of installing and testing the solution, it’s crucial to first understand what LinaGora OpenRAG claims to provide. At its core, OpenRAG is designed to offer a complete, open-source framework for implementing a Retrieval-Augmented Generation (RAG) system.

This framework typically aims to enhance the factual accuracy and relevance of large language model (LLM) responses by integrating an external knowledge source (retrieval) with the generative capabilities of the LLM (generation). LinaGora OpenRAG specifically claims to provide a ready-to-use, modular, and customizable solution for building highly effective RAG applications, promising features like the ones below (a toy sketch of the general retrieve-then-generate pattern follows the list):

  • Ease of integration with various data sources.
  • Optimized retrieval methods for better context sourcing.
  • A full stack implementation from ingestion to generation.
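
To make the retrieve-then-generate idea concrete, here is a toy Python sketch of the generic pattern. It is not OpenRAG code: the in-memory document list and the keyword-overlap scoring are purely illustrative stand-ins for a real vector store, embedder, and LLM call.

# Toy sketch of the generic RAG pattern (retrieve, then generate).
# Not OpenRAG code: the tiny in-memory knowledge base and the naive
# keyword-overlap retriever only illustrate the overall flow.

docs = [
    "OpenRAG converts ingested files to Markdown before chunking.",
    "Hybrid search combines semantic similarity with BM25 keyword matching.",
    "Partitions isolate different document collections.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by shared query words (toy scoring, not a vector search)."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Augment the user question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In a real system, build_prompt()'s output would be sent to the LLM (generation).
print(build_prompt("How does hybrid search work?"))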

The GitHub Project

The GitHub project page (link provided below) describes the project:

OpenRag is a lightweight, modular and extensible Retrieval-Augmented Generation (RAG) framework designed to explore and test advanced RAG techniques — 100% open source and focused on experimentation, not lock-in.

Built by Linagora, OpenRag offers a sovereign-by-design alternative to mainstream RAG stacks.

✨ Key Features (excerpt of GitHub)

📁 Rich File Format Support

OpenRag supports a comprehensive range of file formats for seamless document ingestion:

  • Text Files: txt, md
  • Document Files: pdf, docx, doc, pptx - Advanced PDF parsing with OCR support and Office document processing
  • Audio Files: wav, mp3, mp4, ogg, flv, wma, aac - Audio transcription and content extraction
  • Images: png, jpeg, jpg, svg - Vision Language Model (VLM) powered image captioning and analysis
All files are intelligently converted to Markdown, with images replaced by AI-generated captions, ensuring consistent processing across all document types.

🎛️ Native Web-Based Indexer UI

Experience intuitive document management through our built-in web interface.

Indexer UI Features

  • Drag-and-drop file upload with batch processing capabilities
  • Real-time indexing progress monitoring and status updates
  • Admin Dashboard to monitor RAG components (Indexer, VectorDB, TaskStateManager, etc.)
  • Partition management — organize documents into logical collections
  • Visual document preview and metadata inspection
  • Search and filtering capabilities for indexed content

🗂️ Partition-Based Architecture

Organize your knowledge base with flexible partition management:

  • Multi-tenant support — isolate different document collections

💬 Interactive Chat UI with Source Attribution

Engage with your documents through our sophisticated chat interface:

Chat UI Features

  • Chainlit-powered UI — modern, responsive chat experience
  • Source transparency — every response includes relevant document references

🔌 OpenAI API Compatibility

OpenRag’s API is tailored to be compatible with the OpenAI format (see the openai-compatibility section for more details), enabling seamless integration of your deployed RAG into popular frontends and workflows such as OpenWebUI, LangChain, N8N, and more. This ensures flexibility and ease of adoption without requiring custom adapters. A minimal client sketch, based on the default ports from the .env file, follows the feature summary below.

Summary of features

  • Drop-in replacement for OpenAI API endpoints
  • Compatible with popular frontends like OpenWebUI, LangChain, N8N, and more
  • Authentication support — secure your API with token-based auth
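
To illustrate what this compatibility enables, here is a hedged sketch that talks to a local OpenRAG deployment with the official openai Python package. The /v1 path, the port (APP_PORT=8080 from the .env), the sk-openrag-1234 token, and the model name are assumptions on my side; check the openai-compatibility section of the project for the exact path and model/partition naming.

# Hedged sketch: querying OpenRAG through its OpenAI-compatible API with the
# official openai Python client. Base URL path, token, and model name are
# assumptions based on the .env defaults; verify them against the project docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # FastAPI app port from .env, assumed /v1 prefix
    api_key="sk-openrag-1234",            # only required if AUTH_TOKEN is set in .env
)

response = client.chat.completions.create(
    model="openrag",  # placeholder model name, see the openai-compatibility docs
    messages=[{"role": "user", "content": "What do my indexed documents say about OpenRAG?"}],
)
print(response.choices[0].message.content)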

⚡ Distributed Ray Deployment

Scale your RAG pipeline across multiple machines and GPUs.

Distributed Ray Deployment

  • Horizontal scaling — distribute processing across worker nodes
  • GPU acceleration — optimize inference across available hardware
  • Resource management — intelligent allocation of compute resources
  • Monitoring dashboard — real-time cluster health and performance metrics
  • See the section on distributed deployment in a Ray cluster for more details

🔍 Advanced Retrieval & Reranking

OpenRag leverages state-of-the-art retrieval techniques for superior accuracy; a toy sketch of one common rank-fusion approach follows the list below.

Implemented advanced retrieval techniques

  • Hybrid search — combines semantic similarity with BM25 keyword matching
  • Contextual retrieval — Anthropic’s technique for enhanced chunk relevance
  • Multilingual reranking — using Alibaba-NLP/gte-multilingual-reranker-base
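
As a rough illustration of how a keyword ranking and a semantic ranking can be merged, here is a toy Reciprocal Rank Fusion (RRF) sketch in Python. RRF is just one common fusion method; it is not necessarily the exact hybrid-search combination OpenRag implements internally.

# Toy Reciprocal Rank Fusion (RRF): merge a BM25 ranking and a semantic ranking.
# One common fusion method, shown for illustration only.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each document scores sum(1 / (k + rank)) over all rankings it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]       # keyword (BM25) matches
semantic_ranking = ["doc1", "doc5", "doc3"]   # embedding similarity matches
print(rrf([bm25_ranking, semantic_ranking]))  # doc1 and doc3 rise to the top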

🚧 Coming Soon

  • 📂 Expanded Format Support: Future updates will introduce compatibility with additional formats such as csv, odt, html, and other widely used open-source document types.
  • 🔄 Unified Markdown Conversion: All files will continue to be converted to markdown using a consistent chunker. Format-specific chunkers (e.g., for CSV, HTML) are planned for enhanced processing.
  • 🤖 Advanced Features: Upcoming releases will include Tool Calling, Agentic RAG, and MCP to elevate your RAG workflows.
  • Enhanced Security: Ensures data encryption both during transit and at rest.

You want to test it?

Okay, if you want to test this on your own, here are the steps I followed with my local Podman/Podman Desktop setup and Ollama.

  • Clone the repository
git clone https://github.com/linagora/openrag.git
#
cd openrag
  • Install “docker-compose”
brew install docker-compose
  • Make a copy of the provided environment file template for your own settings:
cp .env.example .env
  • From this point, I made changes according to my own configuration ⬇️
# .env

# LLM --> using Ollama
BASE_URL=http://host.docker.internal:11434/v1
API_KEY=
MODEL=granite4

# VLM (Visual Language Model) you can set it to the same as LLM if your LLM supports images
VLM_BASE_URL=
VLM_API_KEY=
VLM_MODEL=

## FastAPI App (no need to change it)
# APP_PORT=8080 # this is the forwarded port
# API_NUM_WORKERS=1 # Number of uvicorn workers for the FastAPI app

## To enable API HTTP authentication via HTTPBearer
# AUTH_TOKEN=sk-openrag-1234

# SAVE_UPLOADED_FILES=true # useful for chainlit (chat interface) source viewing

# Set to true, it will mount chainlit chat ui to the fastapi app (Default: true)
## WITH_CHAINLIT_UI=true

# EMBEDDER ---> Using Ollama
EMBEDDER_MODEL_NAME=granite-embedding:latest # or another embedder from Hugging Face compatible with vLLM
EMBEDDER_BASE_URL=http://host.docker.internal:11434/v1
EMBEDDER_API_KEY=EMPTY


# RETRIEVER
# RETRIEVER_TOP_K=20 # number of top documents to retrieve, before reranking (lower (~10) is faster on CPU | on GPU, you can try to increase the value (~40) ).

# RERANKER
#RERANKER_ENABLED=true # deactivate the reranker if your CPU is not powerful enough
RERANKER_ENABLED=false
RERANKER_MODEL=Alibaba-NLP/gte-multilingual-reranker-base # or jinaai/jina-reranker-v2-base-multilingual

# Prompts
PROMPTS_DIR=../prompts/example1

# Ray
RAY_DEDUP_LOGS=0 # turns off deduplication of Ray logs that appear across multiple processes
RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 # enables task-level logs in the Ray dashboard
RAY_task_retry_delay_ms=3000
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 # critical with the newest version of UV

# Indexer UI 
## 1. replace X.X.X.X with localhost if launching local or with your server IP
## 2. Used by the frontend. Replace APP_PORT (8080 by default) with the actual port number of your FastAPI backend
## 3. Replace INDEXERUI_PORT with its value in the INDEXERUI_URL variable

INCLUDE_CREDENTIALS=false                       # set to true if FastAPI authentication is enabled, i.e. AUTH_TOKEN is set
INDEXERUI_PORT=8060                             # Port to expose the Indexer UI (default is 3042)
#### You can leave it as is or adapt it to http://localhost:8060/ if errors occur
# INDEXERUI_URL='http://localhost:8060'           # Update X.X.X.X to localhost
# API_BASE_URL='http://localhost:8080'            # Update X.X.X.X to localhost and APP_PORT to 8080
INDEXERUI_URL='http://X.X.X.X:INDEXERUI_PORT'                 
API_BASE_URL='http://X.X.X.X:APP_PORT'          # Base URL of your FastAPI backend. 

# LOGGING
LOG_LEVEL=DEBUG # See possible values https://loguru.readthedocs.io/en/stable/api/logger.html
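Before starting the stack, I like to confirm that Ollama actually serves the models referenced in the .env. The sketch below assumes Ollama’s OpenAI-compatible model listing is reachable at localhost:11434/v1/models from the host (the containers reach the same server as host.docker.internal).

# Sanity check (run on the host): list the models exposed by Ollama's
# OpenAI-compatible endpoint and verify the two models from .env are present.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/v1/models") as resp:
    models = [m["id"] for m in json.load(resp)["data"]]

print("Available models:", models)
for name in ("granite4", "granite-embedding:latest"):
    base = name.split(":")[0]
    print(name, "OK" if any(base in m for m in models) else "missing, try: ollama pull " + name)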
  • Adapt the “extern/infinity.yaml” to your environment (CPU for me)
x-reranker: &reranker_template
  networks:
    default:
      aliases:
        - reranker
  volumes:
    - ${VLLM_CACHE:-/root/.cache/huggingface}:/app/.cache/huggingface # Model weights for RAG
  # ports:
  #   - ${RERANKER_PORT:-7997}:${RERANKER_PORT:-7997}

services:
  reranker:
    <<: *reranker_template
    image: michaelf34/infinity
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all 
              capabilities: [gpu]
    command: >
      v2
      --model-id ${RERANKER_MODEL:-Alibaba-NLP/gte-multilingual-reranker-base}
      --port ${RERANKER_PORT:-7997}
    profiles:
      - ''

  reranker-cpu:
    <<: *reranker_template
    image: michaelf34/infinity:latest-cpu
    platform: linux/amd64
    deploy: {}
    command: >
      v2
      --engine torch
      --model-id ${RERANKER_MODEL:-Alibaba-NLP/gte-multilingual-reranker-base}
      --port ${RERANKER_PORT:-7997}
    profiles:
      - 'cpu'
  • Once these tasks are done, bring up the whole stack with the CPU profile (again, matching my configuration)
# CPU deployment
docker compose --profile cpu up -d
# docker compose --profile cpu down # to stop the application

# the URL is --> http://localhost:8060
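Once the containers are up, a quick way to check that the two ports from the .env are answering is a plain TCP probe; this deliberately avoids assuming any specific HTTP endpoint of the FastAPI app.

# Minimal post-deployment smoke test: verify that the Indexer UI (8060) and the
# FastAPI backend (8080) accept TCP connections on localhost.
import socket

for name, port in (("Indexer UI", 8060), ("FastAPI app", 8080)):
    with socket.socket() as s:
        s.settimeout(2)
        status = "up" if s.connect_ex(("localhost", port)) == 0 else "not reachable"
    print(f"{name} on port {port}: {status}")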

I will do more tests and keep you posted 📨

Conclusion

These initial steps represent my first hands-on engagement with LinaGora’s OpenRAG framework. This entire domain, the development of LLM-agnostic and sovereign Retrieval-Augmented Generation (RAG) solutions, is evolving at a remarkably fast pace. While I maintain a clear focus on experimenting with the code and tools to gain a pragmatic understanding of the landscape, this exploration is purely for learning and staying informed. It offers insights into current best practices and architectural trends, without constituting an immediate endorsement or commitment to production deployment. Getting hands-on ensures that I stay abreast of the powerful capabilities emerging in this rapidly advancing field.

Links

  • OpenRAG on GitHub: https://github.com/linagora/openrag