Most RAG tutorials show you how to chunk some text files and run similarity search. RAGFlow does something different: it reads your documents the way a human does — understanding table structure, recognizing figure captions, preserving heading hierarchy, and handling scanned PDFs with OCR — before a single token is sent to a vector store.
The project has accumulated 78K+ GitHub stars and is now in active use as a production knowledge layer at companies that can't afford hallucinated table values or context windows stuffed with broken PDF artifacts. Version 0.25.1 (released April 29, 2026) adds lazy loading for large PDFs, RESTful API unification, and DeepSeek v4 support on top of the agentic features and MCP server that shipped in v0.25.0.
This guide covers the full setup path: prerequisites, Docker Compose deployment, document ingestion, choosing a chunking strategy, and querying your knowledge base through the Python SDK. Effloow Lab verified the deployment configuration, Docker image structure, and API patterns against the upstream repository on 2026-05-05 (see data/lab-runs/ragflow-poc.md for notes).
## What Makes RAGFlow Different From Basic RAG
Standard RAG pipelines treat every document as a flat text blob. Split at 512 tokens, embed, store, retrieve. The problem: PDF tables become comma-separated garbage. Slide decks lose their visual hierarchy. Scanned images are invisible. By the time those chunks reach your LLM, the signal is degraded.
RAGFlow's approach starts with DeepDoc, a built-in vision model pipeline that performs:
- **OCR** — extracts text from scanned images and low-quality PDFs
- **TSR (Table Structure Recognition)** — identifies table boundaries, headers, and cell relationships before chunking
- **DLR (Document Layout Recognition)** — distinguishes headings, body text, figures, footers, and captions
The output isn't raw text. It's structured content with layout metadata, so the chunker knows not to split a table header from its rows, or a figure caption from the preceding paragraph.
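To make that concrete, here is a minimal sketch of what layout-tagged parser output and a structure-aware chunker could look like. The `Block` type, its field names, and the chunking rule are illustrative assumptions, not RAGFlow's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One layout unit from a DeepDoc-style parse (hypothetical shape)."""
    kind: str                      # "heading" | "body" | "table" | "caption" | ...
    text: str
    page: int
    keep_with_next: bool = False   # e.g. a table header must stay with its rows

def chunk_blocks(blocks, max_chars=500):
    """Greedy chunker: never ends a chunk on a block flagged keep_with_next."""
    chunks, current, size = [], [], 0
    for b in blocks:
        current.append(b)
        size += len(b.text)
        if size >= max_chars and not b.keep_with_next:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)
    return chunks

blocks = [
    Block("heading", "Q3 Results", page=4),
    Block("table", "Metric  | Q3", page=4, keep_with_next=True),
    Block("table", "Revenue | $1.4M", page=4),
]
chunks = chunk_blocks(blocks, max_chars=10)
# The table header and its row stay in one chunk despite exceeding the limit.
```

A real chunker would also track token budgets and overlap; the point is that layout metadata is what lets it make structure-aware decisions at all.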
Beyond parsing, RAGFlow ships with a built-in agent workflow engine, memory storage, sandbox code execution, an MCP server, and a web UI for managing knowledge bases — all in a single Docker Compose stack.
## Prerequisites
Before you start, confirm your environment meets these requirements:
| Requirement | Minimum | Confirmed by Lab |
|---|---|---|
| Docker | ≥ 24.0.0 | v29.2.0 ✅ |
| Docker Compose | ≥ v2.26.1 | v5.0.2 ✅ |
| RAM | 16 GB | — |
| Disk space | 50 GB free | — |
| Linux: vm.max_map_count | ≥ 262144 | — |
The 16 GB RAM requirement comes from Elasticsearch's memory lock behavior. On lower-memory machines you can switch the document engine to `infinity`, which is lighter but less battle-tested at scale.
**Linux-only step — set `vm.max_map_count`:**

```bash
# Temporary (reverts on reboot)
sudo sysctl -w vm.max_map_count=262144

# Permanent
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```
macOS and Windows Docker Desktop handle this internally. You can skip it if you're running Docker Desktop.
## Deploy RAGFlow With Docker Compose

### 1. Clone and pin to the latest stable tag

```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow
git checkout v0.25.1
cd docker
```
Pinning to v0.25.1 avoids pulling nightly builds that may include in-progress migrations. The `latest` Docker tag always tracks the most recent nightly, not the stable release.
### 2. Set secure passwords in `.env`

The default `.env` ships with obviously insecure credentials. Before running anything:

```bash
# Generate random passwords
openssl rand -hex 32   # use the output for each password below
```
Edit `docker/.env` and change at minimum:

```
ELASTIC_PASSWORD=<your-strong-password>
MYSQL_PASSWORD=<your-strong-password>
MINIO_PASSWORD=<your-strong-password>
```
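If you want to script the rotation, here is a small Python sketch. It runs against a throwaway file so it is safe to execute anywhere; in practice you would point `env_path` at `docker/.env`:

```python
import re
import secrets
from pathlib import Path

# Demo target: a throwaway copy of the insecure defaults.
# Point env_path at docker/.env for the real thing (keep a backup).
env_path = Path("/tmp/demo.env")
env_path.write_text(
    "ELASTIC_PASSWORD=infini_rag_flow\n"
    "MYSQL_PASSWORD=infini_rag_flow\n"
    "MINIO_PASSWORD=infini_rag_flow\n"
)

def rotate_secrets(path, keys):
    """Replace each KEY=... line with a fresh 64-hex-char random secret."""
    text = path.read_text()
    for key in keys:
        text = re.sub(rf"^{key}=.*$", f"{key}={secrets.token_hex(32)}",
                      text, flags=re.M)
    path.write_text(text)

rotate_secrets(env_path, ["ELASTIC_PASSWORD", "MYSQL_PASSWORD", "MINIO_PASSWORD"])
```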
**Important v0.25.0 change:** RAGFlow switched the default MinIO image from the official Docker Hub image (which was deprecated) to `pgsty/minio`. If you're upgrading from v0.24.x, the MinIO container will be replaced automatically, but existing bucket data needs a manual migration step documented in the v0.25.0 release notes.
### 3. Start the CPU stack

```bash
# CPU-only (default, suitable for most setups)
docker compose --profile elasticsearch,cpu up -d

# Wait for healthy status (takes 2-3 minutes on first run)
docker compose ps
```
The elasticsearch profile starts Elasticsearch 9.x (upgraded from 8.x in v0.25.0) and MinIO alongside the main RAGFlow server. The full image is 3.4 GB, so the first pull takes time on slower connections.
Expected output once healthy:
```
NAME                    STATUS                   PORTS
ragflow-ragflow-cpu-1   Up 2 minutes             0.0.0.0:80->80/tcp, 0.0.0.0:9380->9380/tcp
ragflow-es01-1          Up 2 minutes (healthy)
ragflow-minio-1         Up 2 minutes (healthy)
ragflow-mysql-1         Up 2 minutes (healthy)
```
Open http://localhost to reach the RAGFlow web UI.
### 4. (Optional) Switch document engine to Infinity

If your server has less than 16 GB RAM or you want faster cold starts:

```
# docker/.env
DOC_ENGINE=infinity
```

Then start with:

```bash
docker compose --profile infinity,cpu up -d
```
Infinity is RAGFlow's own lightweight vector database. It lacks Elasticsearch's full-text search depth but handles most knowledge-base workloads with lower memory overhead.
## Understanding Chunking Strategies
This is where RAGFlow separates itself from commodity RAG setups. When you create a dataset (knowledge base), you choose a chunk template that tells DeepDoc how to parse and split your source documents.
| Template | Best For | Chunk Logic | OCR |
|---|---|---|---|
| General | Mixed content, default choice | Heading-aware splits, respects paragraph boundaries | Yes |
| Naive | Plain text, simple docs | Fixed token windows with overlap | No |
| Paper | Academic PDFs, research papers | Abstract/section/reference aware splits | Yes |
| Book | Long-form documents, manuals | Chapter/section hierarchy preserved | Yes |
| Q&A | FAQ content, structured pairs | Matches Q/A pattern, one chunk per pair | No |
| Table | Spreadsheets, CSV, XLSX | Row-level granularity | No |
| Resume | HR pipelines, people search | Entity-aware (name, skills, dates) | Yes |
The practical difference between General and Naive: With the General template, a PDF with a table of quarterly metrics stays intact as a structured chunk. The LLM receives the table with its column headers and row labels. With Naive, that same table likely gets split mid-row, and the numbers become meaningless without their column context.
For most developers starting out, General is the right default. Switch to Paper when your corpus is primarily academic PDFs, or Q&A if you're chunking existing FAQ exports from a support system.
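A toy splitter makes the failure mode visible. This is a generic fixed-window split standing in for naive chunking in general, not RAGFlow's actual Naive implementation:

```python
# A quarterly-metrics table like the one described above.
table = (
    "Metric   | Q1    | Q2    | Q3\n"
    "Revenue  | $1.0M | $1.1M | $1.4M\n"
    "Margin   | 31%   | 33%   | 35%\n"
)

def naive_split(text, window=40):
    """Split at fixed character offsets, ignoring document structure."""
    return [text[i:i + window] for i in range(0, len(text), window)]

for i, chunk in enumerate(naive_split(table)):
    print(f"--- chunk {i} ---\n{chunk}")
# The middle chunk contains values like "$1.1M" with no header row,
# so a retriever that returns it hands the LLM numbers with no quarter labels.
```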
### Parent-Child Chunking (New in v0.25)
RAGFlow v0.25.0 introduced a parent-child chunking strategy. Large semantic units (paragraphs, sections) are stored as "parent" chunks while smaller, more precise sub-units are indexed for retrieval. When a query matches a child chunk, the system returns the broader parent context to the LLM. This addresses a common retrieval failure mode: the relevant sentence is retrieved, but without the surrounding context the answer is incomplete.
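The strategy is easy to sketch generically. In the toy below, keyword overlap stands in for vector similarity, and the data structures are illustrative rather than RAGFlow's internals:

```python
# Children are small and precise (good for matching);
# parents are large and contextual (good for answering).
parents = {
    "p1": "Q3 revenue grew 14% to $1.4M, driven by the cloud segment. "
          "Margins expanded to 35% as infrastructure costs fell.",
}
children = [
    {"id": "c1", "parent": "p1", "text": "Q3 revenue grew 14% to $1.4M"},
    {"id": "c2", "parent": "p1", "text": "Margins expanded to 35%"},
]

def retrieve(query):
    """Match on child chunks (keyword overlap as a stand-in for vector
    similarity), then return the full parent chunk for LLM context."""
    terms = set(query.lower().split())
    best = max(children, key=lambda c: len(terms & set(c["text"].lower().split())))
    return parents[best["parent"]]

context = retrieve("what was q3 revenue")
# The match hits the narrow child "Q3 revenue grew 14% to $1.4M",
# but the returned context also carries the surrounding margin sentence.
```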
## Ingest Documents and Run a Q&A Query

### Create a dataset via Python SDK

Install the SDK:

```bash
pip install ragflow-sdk
```
Create a dataset with the General template and upload a PDF:
```python
from ragflow_sdk import RAGFlow

# Connect to your local RAGFlow instance
rag = RAGFlow(
    api_key="<your-api-key>",  # Found under Settings → API in the web UI
    base_url="http://localhost:9380"
)

# Create a knowledge base (dataset) with General chunking
dataset = rag.create_dataset(
    name="company_docs",
    chunk_method="general",
    embedding_model="BAAI/bge-large-en-v1.5"
)

# Upload a PDF
with open("annual_report.pdf", "rb") as f:
    dataset.upload_documents([{
        "name": "annual_report.pdf",
        "blob": f.read()
    }])

print("Upload complete. RAGFlow will now parse and index the document.")
print("Parsing runs asynchronously — check status in the web UI or poll the API.")
```
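If you'd rather poll than watch the UI, the loop can be kept independent of any particular endpoint by injecting the status fetch as a callable. The status strings below are assumptions for illustration; check the API reference for the real enum values:

```python
import time

def all_done(doc_statuses):
    """True once every document has left the queued/running states.
    "DONE"/"FAIL" are placeholder values, not RAGFlow's exact enum."""
    return all(s in ("DONE", "FAIL") for s in doc_statuses)

def wait_for_parsing(fetch_statuses, timeout=600, interval=5):
    """Poll a status-fetching callable until parsing settles or we time out.
    In practice fetch_statuses would wrap a GET on the dataset's document list."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        statuses = fetch_statuses()
        if all_done(statuses):
            return statuses
        time.sleep(interval)
    raise TimeoutError("documents still parsing after timeout")

# Simulated fetches: the second poll reports both documents finished.
polls = iter([["RUNNING", "DONE"], ["DONE", "DONE"]])
result = wait_for_parsing(lambda: next(polls), timeout=10, interval=0)
```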
### Query your knowledge base
Once parsing finishes (visible in the UI under the dataset's document list):
```python
# Create a chat assistant bound to the dataset
assistant = rag.create_chat(
    name="doc_qa",
    dataset_ids=[dataset.id]
)

# Open a session and ask a question
session = assistant.create_session()
response = session.ask(
    question="What were the Q3 revenue figures?",
    stream=False
)

print(response.content)
# RAGFlow returns the answer with source citations,
# including the exact page and chunk that supported the answer.
```
### Use the REST API directly

For non-Python environments, the v0.25.1 RESTful API follows standard conventions:

```bash
# Create a dataset
curl -X POST http://localhost:9380/api/v1/datasets \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "company_docs", "chunk_method": "general"}'

# Upload a document
curl -X POST "http://localhost:9380/api/v1/datasets/{dataset_id}/documents" \
  -H "Authorization: Bearer <your-api-key>" \
  -F "file=@annual_report.pdf"

# Query
curl -X POST "http://localhost:9380/api/v1/chats/{chat_id}/sessions/{session_id}/ask" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"question": "What were the Q3 revenue figures?"}'
```
The v0.25.1 API unification means all endpoints now follow consistent /api/v1/ prefixes and standard HTTP verbs. If you built integrations against v0.24.x, check the migration notes for renamed endpoints.
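To exercise the same contract from Python without the SDK, here is a stdlib-only sketch that builds the dataset-creation request the first curl call sends. The endpoint and body mirror the example above; actually sending the request is left commented so the sketch runs offline:

```python
import json
import urllib.request

def create_dataset_request(api_key, name, chunk_method="general",
                           base="http://localhost:9380"):
    """Build the POST /api/v1/datasets request shown in the curl example."""
    body = json.dumps({"name": name, "chunk_method": chunk_method}).encode()
    return urllib.request.Request(
        f"{base}/api/v1/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = create_dataset_request("<your-api-key>", "company_docs")
# resp = urllib.request.urlopen(req)  # uncomment against a running instance
```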
## Enable the Built-In MCP Server

RAGFlow ships with an MCP server that exposes your knowledge bases as tools any MCP-compatible agent can call. This is particularly useful if you're building with Claude, LangChain agents, or any orchestration framework that speaks MCP.

Enable it in `docker/docker-compose.yml` by uncommenting the MCP flags:

```yaml
services:
  ragflow-cpu:
    command:
      - --enable-mcpserver
      - --mcp-host=0.0.0.0
      - --mcp-port=9382
      - --mcp-base-url=http://127.0.0.1:9380
      - --mcp-script-path=/ragflow/mcp/server/server.py
      - --mcp-mode=self-host
      - --mcp-host-api-key=ragflow-<your-api-key>
```

Then restart the container:

```bash
docker compose --profile elasticsearch,cpu up -d ragflow-cpu
```
Your RAGFlow instance now exposes an MCP endpoint at http://localhost:9382. Any agent with MCP tool support can now call search_knowledge_base against your private document corpus without exposing it to a third-party API.
## Agent Workflows in v0.25.0

Beyond basic Q&A, RAGFlow v0.25.0 introduced seven prebuilt pipeline templates for building agentic workflows. These are visual, node-based pipelines accessible from the web UI; among them:

- **Retrieve → Rerank → Answer**: classic RAG with a reranking step
- **Deep Research**: multi-turn retrieval with chain-of-thought reasoning
- **Data Analytics**: connects to tabular datasets for SQL-style queries with chart generation
- **Multi-agent collaboration**: routes different query types to specialized sub-agents
The sandbox code execution feature (introduced in v0.24.0 and expanded in v0.25.0) lets agents run Python code inside an isolated gVisor sandbox. A data analytics agent can retrieve numbers from a spreadsheet dataset, then execute the computation natively rather than asking the LLM to do math.
User-level memory storage (also new in v0.25.0) persists conversation context across sessions — useful when building assistant-style applications where users expect the system to remember previous preferences or decisions.
## Common Configuration Mistakes

**Forgetting to set `vm.max_map_count` on Linux.** Elasticsearch will start but fail silently under load. The healthcheck passes, but indexing stalls. Set the kernel parameter before launching.

**Using the default passwords in production.** The `.env` ships with `ELASTIC_PASSWORD=infini_rag_flow`. This is not a placeholder — it's the literal default. Anyone on your network can reach your Elasticsearch cluster with it.

**Pulling `latest` instead of pinning a tag.** The `latest` Docker tag tracks nightly builds. Nightly images can include schema migrations that break existing data. Always pin to a stable tag like v0.25.1 in production.

**Skipping the MinIO migration when upgrading from v0.24.x.** RAGFlow v0.25.0 switched from the official MinIO Docker image to `pgsty/minio`. The container name changes, so the old volume mount point differs. Run the migration script in `tools/scripts/` before upgrading.

**Using Naive chunking for PDFs with tables.** Naive splits at fixed token boundaries regardless of document structure. Tables get truncated at arbitrary points. For any document with structured data, use General or the appropriate specialized template.
## FAQ

**Q: What's the difference between the slim and full Docker images?**

The full image (`infiniflow/ragflow:v0.25.1`, 3.4 GB) bundles the DeepDoc model weights inside the container. The slim image (`v0.25.1-slim`) omits them and downloads models on first use. Use the full image if you want predictable cold-start times; use slim if you want a smaller initial pull and don't mind the first-run download.
**Q: Does RAGFlow support ARM64 / Apple Silicon?**
The official Docker images target x86 platforms. ARM64 is tested but not officially supported — the project docs recommend building the image yourself if you're on Apple Silicon. For M-series Macs, running via Docker Desktop with Rosetta emulation is the easiest path, though DeepDoc OCR inference will be slower.
**Q: Can I use RAGFlow without Elasticsearch?**

Yes. Set `DOC_ENGINE=infinity` in `docker/.env` to use Infinity, RAGFlow's own vector database. You can also use OpenSearch (`opensearch`) or OceanBase (`oceanbase`). Elasticsearch is the default because it provides the best full-text search alongside vector retrieval, but Infinity has lower resource requirements.
**Q: How does RAGFlow compare to basic LangChain RAG pipelines?**
Basic LangChain pipelines give you control and composability but don't include document-layout parsing. You bring your own OCR, your own chunking logic, and your own agent orchestration. RAGFlow bundles all of that into one deployed service — the trade-off is less granular control in exchange for faster deployment and better out-of-the-box document handling for complex formats like PDFs with tables and figures.
**Q: Is the MCP server production-ready?**
RAGFlow's MCP server (port 9382) is a self-hosted implementation introduced in v0.20.0. As of v0.25.1, it supports both SSE and Streamable HTTP transports. It's actively maintained alongside the main RAGFlow release cycle. For production use, place it behind a reverse proxy with TLS and apply rate limiting.
## Key Takeaways
RAGFlow v0.25.1 is a mature, production-oriented choice when your documents are complex PDFs, tables, or scanned files that basic text splitters destroy. The setup path is four commands: clone, edit .env, run docker compose up, and open the UI.
The chunking template system is the most important configuration decision. General covers most cases. Paper and Book pay off when your corpus has a consistent document type. Q&A is underused — if you have existing FAQ exports or support tickets, it handles them significantly better than naive chunking.
The MCP server and agent workflow engine make RAGFlow more than a retrieval layer. For teams building agentic applications, having the knowledge base, agent orchestration, and MCP interface in one self-hosted stack eliminates the need to wire together multiple services.
## Bottom Line
RAGFlow is the right choice when you need accurate retrieval from PDFs with tables, figures, and complex layouts — the document understanding layer alone justifies the Docker overhead. For simple text corpora, a lightweight LangChain pipeline might be faster to set up; for anything with structure, RAGFlow pays back the setup cost quickly.