Setting up LinaGora's OpenRAG on my laptop from the source code!
🔍 Exploring LinaGora’s OpenRAG
Recently, I delved into the GitHub repository for LinaGora’s OpenRAG. As often happens with a promising open-source project, my curiosity was piqued and I wanted to set it up and test it locally on my machine.
💡 Understanding the Value Proposition
Before embarking on the technical steps of installing and testing the solution, it’s crucial to first understand what LinaGora OpenRAG claims to provide. At its core, OpenRAG is designed to offer a complete, open-source framework for implementing a Retrieval-Augmented Generation (RAG) system.
This framework typically aims to enhance the factual accuracy and relevance of large language model (LLM) responses by integrating an external knowledge source (retrieval) with the generative capabilities of the LLM (generation). LinaGora OpenRAG specifically claims to provide a ready-to-use, modular, and customizable solution for building highly effective RAG applications, promising features like:
- Ease of integration with various data sources.
- Optimized retrieval methods for better context sourcing.
- A full stack implementation from ingestion to generation.
The GitHub Project
The GitHub project page (link provided below) describes the project:
OpenRag is a lightweight, modular and extensible Retrieval-Augmented Generation (RAG) framework designed to explore and test advanced RAG techniques — 100% open source and focused on experimentation, not lock-in.
Built by Linagora, OpenRag offers a sovereign-by-design alternative to mainstream RAG stacks.
✨ Key Features (excerpt of GitHub)
📁 Rich File Format Support
OpenRag supports a comprehensive range of file formats for seamless document ingestion:
- Text Files: txt, md
- Document Files: pdf, docx, doc, pptx - Advanced PDF parsing with OCR support and Office document processing
- Audio Files: wav, mp3, mp4, ogg, flv, wma, aac - Audio transcription and content extraction
- Images: png, jpeg, jpg, svg - Vision Language Model (VLM) powered image captioning and analysis
- All files are intelligently converted to Markdown format with images replaced by AI-generated captions, ensuring consistent processing across all document types.
🎛️ Native Web-Based Indexer UI
Experience intuitive document management through our built-in web interface.
Indexer UI Features
- Drag-and-drop file upload with batch processing capabilities
- Real-time indexing progress monitoring and status updates
- Admin Dashboard to monitor RAG components (Indexer, VectorDB, TaskStateManager, etc.)
- Partition management — organize documents into logical collections
- Visual document preview and metadata inspection
- Search and filtering capabilities for indexed content
🗂️ Partition-Based Architecture
Organize your knowledge base with flexible partition management:
- Multi-tenant support — isolate different document collections
💬 Interactive Chat UI with Source Attribution
Engage with your documents through our sophisticated chat interface:
Chat UI Features
- Chainlit-powered UI — modern, responsive chat experience
- Source transparency — every response includes relevant document references
🔌 OpenAI API Compatibility
OpenRag API is tailored to be compatible with the OpenAI format (see the openai-compatibility section for more details), enabling seamless integration of your deployed RAG into popular frontends and workflows such as OpenWebUI, LangChain, N8N, and more. This ensures flexibility and ease of adoption without requiring custom adapters.
Summary of features
- Drop-in replacement for OpenAI API endpoints
- Compatible with popular frontends like OpenWebUI, LangChain, N8N, and more
- Authentication support — secure your API with token-based auth
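To make this compatibility concrete, here is the kind of smoke test I plan to run against the deployed stack once it is up. This is only a sketch based on the standard OpenAI chat-completions format: the port (8080, the default APP_PORT used later in this post), the bearer token and the model name are assumptions from my own configuration, and the exact route should be checked against the project’s openai-compatibility section.
# Hypothetical request against OpenRag's OpenAI-compatible endpoint (adjust port, token and model to your setup)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-openrag-1234" \
  -d '{"model": "granite4", "messages": [{"role": "user", "content": "What do my indexed documents say about onboarding?"}]}'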
⚡ Distributed Ray Deployment
Scale your RAG pipeline across multiple machines and GPUs.
Distributed Ray Deployment
- Horizontal scaling — distribute processing across worker nodes
- GPU acceleration — optimize inference across available hardware
- Resource management — intelligent allocation of compute resources
- Monitoring dashboard — real-time cluster health and performance metrics
- See the section on distributed deployment in a Ray cluster for more details
🔍 Advanced Retrieval & Reranking
OpenRag leverages state-of-the-art retrieval techniques for superior accuracy.
Implemented advanced retrieval techniques
- Hybrid search — combines semantic similarity with BM25 keyword matching
- Contextual retrieval — Anthropic’s technique for enhanced chunk relevance
- Multilingual reranking — using Alibaba-NLP/gte-multilingual-reranker-base
🚧 Coming Soon
- 📂 Expanded Format Support: Future updates will introduce compatibility with additional formats such as csv, odt, html, and other widely used open-source document types.
- 🔄 Unified Markdown Conversion: All files will continue to be converted to markdown using a consistent chunker. Format-specific chunkers (e.g., for CSV, HTML) are planned for enhanced processing.
- 🤖 Advanced Features: Upcoming releases will include Tool Calling, Agentic RAG, and MCP to elevate your RAG workflows.
- Enhanced Security: ensures data encryption both in transit and at rest.
Want to test it?
Okay, if you want to try this yourself, here are the steps I followed with my local Podman/Podman Desktop setup and Ollama.
- Clone the repository
git clone https://github.com/linagora/openrag.git
cd openrag
- Install “docker-compose”
brew install docker-compose
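Before going further, a quick sanity check that the tooling is in place (this assumes Podman Desktop’s Docker-compatible socket is enabled, which is what lets docker-compose talk to Podman):
# Verify Podman and docker-compose are available
podman --version
docker-compose --version
# On macOS, the Podman machine must be up and running
podman machine list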
- Make a copy of the provided environment file template for your own settings:
cp .env.example .env
- From this point, I made changes according to my own configuration ⬇️
# .env
# LLM --> using Ollama
BASE_URL=http://host.docker.internal:11434/v1
API_KEY=
MODEL=granite4
# VLM (Visual Language Model) you can set it to the same as LLM if your LLM supports images
VLM_BASE_URL=
VLM_API_KEY=
VLM_MODEL=
## FastAPI App (no need to change it)
# APP_PORT=8080 # this is the forwarded port
# API_NUM_WORKERS=1 # Number of uvicorn workers for the FastAPI app
## To enable API HTTP authentication via HTTPBearer
# AUTH_TOKEN=sk-openrag-1234
# SAVE_UPLOADED_FILES=true # useful for chainlit (chat interface) source viewing
# Set to true, it will mount chainlit chat ui to the fastapi app (Default: true)
## WITH_CHAINLIT_UI=true
# EMBEDDER ---> Using Ollama
EMBEDDER_MODEL_NAME=granite-embedding:latest # or another embedder from Hugging Face compatible with vLLM
EMBEDDER_BASE_URL=http://host.docker.internal:11434/v1
EMBEDDER_API_KEY=EMPTY
# RETRIEVER
# RETRIEVER_TOP_K=20 # number of top documents to retrieve, before reranking (lower (~10) is faster on CPU | on GPU, you can try to increase the value (~40) ).
# RERANKER
#RERANKER_ENABLED=true # deactivate the reranker if your CPU is not powerful enough
RERANKER_ENABLED=false
RERANKER_MODEL=Alibaba-NLP/gte-multilingual-reranker-base # or jinaai/jina-reranker-v2-base-multilingual
# Prompts
PROMPTS_DIR=../prompts/example1
# Ray
RAY_DEDUP_LOGS=0 # turns off Ray log deduplication for messages that appear across multiple processes
RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 # to enable logs at task level in the Ray dashboard
RAY_task_retry_delay_ms=3000
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 # critical with the newest version of UV
# Indexer UI
## 1. Replace X.X.X.X with localhost if launching locally, or with your server IP otherwise
## 2. Used by the frontend. Replace APP_PORT (8080 by default) with the actual port number of your FastAPI backend
## 3. Replace INDEXERUI_PORT with its value in the INDEXERUI_URL variable
INCLUDE_CREDENTIALS=false # set to true if FastAPI authentication is enabled, i.e. AUTH_TOKEN is set
INDEXERUI_PORT=8060 # Port to expose the Indexer UI (default is 3042)
#### You can leave it as is, or adapt it to http://localhost:8060/ if errors occur
# INDEXERUI_URL='http://localhost:8060' # Update X.X.X.X to localhost
# API_BASE_URL='http://localhost:8080' # Update X.X.X.X to localhost and APP_PORT to 8080
INDEXERUI_URL='http://X.X.X.X:INDEXERUI_PORT'
API_BASE_URL='http://X.X.X.X:APP_PORT' # Base URL of your FastAPI backend.
# LOGGING
LOG_LEVEL=DEBUG # See possible values https://loguru.readthedocs.io/en/stable/api/logger.html
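Since the .env above points both the LLM and the embedder to Ollama through host.docker.internal, it is worth checking that Ollama is reachable and that the referenced models are pulled. A minimal check, assuming Ollama’s default port 11434 and its OpenAI-compatible /v1 routes:
# List the models Ollama exposes through its OpenAI-compatible API
curl http://localhost:11434/v1/models
# Pull the models referenced in .env if they are not already present
ollama pull granite4
ollama pull granite-embedding:latest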
- Adapt the “extern/infinity.yaml” to your environment (CPU for me)
x-reranker: &reranker_template
  networks:
    default:
      aliases:
        - reranker
  volumes:
    - ${VLLM_CACHE:-/root/.cache/huggingface}:/app/.cache/huggingface # Model weights for RAG
  # ports:
  #   - ${RERANKER_PORT:-7997}:${RERANKER_PORT:-7997}

services:
  reranker:
    <<: *reranker_template
    image: michaelf34/infinity
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      v2
      --model-id ${RERANKER_MODEL:-Alibaba-NLP/gte-multilingual-reranker-base}
      --port ${RERANKER_PORT:-7997}
    profiles:
      - ''
  reranker-cpu:
    <<: *reranker_template
    image: michaelf34/infinity:latest-cpu
    platform: linux/amd64
    deploy: {}
    command: >
      v2
      --engine torch
      --model-id ${RERANKER_MODEL:-Alibaba-NLP/gte-multilingual-reranker-base}
      --port ${RERANKER_PORT:-7997}
    profiles:
      - 'cpu'
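I disabled the reranker in my .env, but if you keep it enabled on CPU you may want to check that the Infinity container answers. The ports mapping is commented out above, so it would have to be uncommented first; the /health route and port 7997 are assumptions based on Infinity’s defaults:
# Only relevant if RERANKER_ENABLED=true and the ports mapping is uncommented in infinity.yaml
curl http://localhost:7997/health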
- Once these tasks are done, run the whole stack with the CPU profile (again, matching my configuration):
# CPU deployment
docker compose --profile cpu up -d
# docker compose --profile cpu down # to stop the application
# the URL is --> http://localhost:8060
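Once the stack is up, a couple of standard compose commands help confirm that everything started correctly before opening the Indexer UI (they work the same through Podman’s Docker compatibility layer):
# List the services started with the cpu profile and their status
docker compose --profile cpu ps
# Follow the logs if http://localhost:8060 does not respond
docker compose --profile cpu logs -f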
I will do more tests and keep you posted 📨
Conclusion
These initial steps represent my first hands-on engagement with LinaGora’s OpenRAG framework. This entire domain — the development of LLM-agnostic and sovereign Retrieval-Augmented Generation (RAG) solutions — is evolving at a remarkably fast pace. While I maintain a clear focus on experimenting with the code and tools to gain a pragmatic understanding of the landscape, this exploration is purely for oversight and learning. It offers insights into current best practices and architectural trends, without constituting an immediate endorsement or a commitment to production deployment. Getting hands-on ensures that I stay abreast of the powerful capabilities emerging in this rapidly advancing field.
Links
- Main site page: https://open-rag.ai/
- GitHub repository: https://github.com/linagora/openrag