The RAG tool that auto-generates Q&A pairs from your documents

#ai #docker #selfhosted #ollama

Title:

The RAG tool that auto-generates Q&A pairs from your documents
Tags:

ai, docker, ollama, selfhosted
Body:

Most RAG tools split your documents into chunks and embed them. FastGPT does something smarter: it uses an LLM to read your documents and generate question-answer pairs automatically.

27K GitHub stars. Visual workflow builder. Almost no English integration content. Here's the guide.

What is FastGPT?

FastGPT is an LLM-based knowledge platform with two standout features:

1. QA-pair extraction

Instead of naive chunking, FastGPT reads your document with an LLM and extracts pairs like:

Q: What is the return window? → A: 30 days from purchase with original receipt.
Q: Which payment methods are accepted? → A: Visa, Mastercard, PayPal.

These pairs are what gets embedded and retrieved. At query time, question matches question — dramatically more accurate than matching a question against a random document chunk.

Enable it: Dataset → Upload → Processing Mode → QA Split (not Simple Split).

2. Visual workflow builder

FastGPT has a node editor for building branching RAG pipelines without code. Classify intent → route to FAQ or document search → format output. Each step is a configurable node.

⚠️ License Notice

FastGPT uses its own license that prohibits reselling it as a SaaS service to others.

✅ Self-hosted for your own team — OK
✅ As a backend component in your own product — OK
❌ Selling FastGPT as a service to customers — not permitted

If you need commercial freedom, use MaxKB (Apache 2.0) or WeKnora (MIT) instead.

Setup

Clone the repo, copy .env.example to .env, then docker compose up -d. Open localhost:3000 → root / 1234.

FastGPT needs MongoDB (conversation storage) and PostgreSQL with pgvector (vector search). Both are included in the docker-compose.

Full docker-compose: fastgpt-production-stack

Connecting Ollama

FastGPT uses the OpenAI-compatible API that Ollama provides at /v1.

Settings → AI Models → Add:

Provider: OpenAI Compatible
Base URL: http://ollama:11434/v1
API Key: ollama (any non-empty string works)
Model name: llama3

QA-Pair Extraction vs Simple Chunking

This is FastGPT's real advantage. A concrete example:

Your document says: "Returns are accepted within 30 days of the original purchase date, provided the item is in original condition with all packaging."

Simple chunking embeds that sentence as-is. When a user asks "Can I return something after 3 weeks?" the retrieval depends on semantic similarity between the question and that chunk.

QA-pair extraction creates: Q: "What is the return deadline?" → A: "30 days from purchase." Now the question directly matches a question — much higher retrieval confidence.

For knowledge bases where accuracy matters more than speed, this technique consistently outperforms naive chunking.

FastGPT vs the Alternatives

	FastGPT	WeKnora	MaxKB	RAGFlow
QA-pair extraction	✅	❌	❌	❌
Visual workflow	✅	❌	❌	❌
License	Custom*	MIT	Apache 2.0	Apache 2.0
Setup complexity	Medium	Medium	Easy	Hard
PDF table parsing	Good	Basic	Basic	Excellent

Pick FastGPT when: you want QA-pair extraction for maximum accuracy, or you need a visual pipeline builder for complex routing logic.

Production Deployment

FastGPT needs Nginx + SSL for a real domain. The production docker-compose with Nginx config and Let's Encrypt instructions:

→ fastgpt-production-stack

Full Series

This is the last article in the Chinese AI tools series:

Meta repo with Docker Compose + Ollama + n8n for all five:
→ chinese-ai-tools-english-guide

QA-pair extraction is underrated. If you're building a customer support bot or internal knowledge base, try it once — going back to naive chunking feels wrong after.