The Question and Answer Chain node is n8n's built-in RAG (Retrieval-Augmented Generation) node. You connect it to a vector store, give it a user question, and it retrieves relevant document chunks, passes them to a language model, and returns a grounded answer — all without writing code.
This guide covers how the node works, the full wiring pattern, configuration options, the 6 gotchas that trip people up, and 3 production-ready workflow patterns with free JSON.
What the Question and Answer Chain Node Does
The QA Chain node implements the classic RAG pattern:
- Retrieve — embed the user's question and query a vector store for semantically similar chunks
- Augment — inject the retrieved chunks into a prompt as context
- Generate — pass the augmented prompt to a language model and return the answer
The result is an answer that's grounded in your actual documents — not just a hallucinated LLM response.
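The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how n8n implements the node: the bag-of-words `embed` function stands in for a real embeddings model, `generate` is a placeholder for the chat model call, and all function names and sample chunks are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Retrieve: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def augment(question: str, context: list[str]) -> str:
    """Augment: inject the retrieved chunks into the prompt as context."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Generate: placeholder for the chat model call (hypothetical)."""
    return f"[model answer for a {len(prompt)}-character prompt]"


# Hypothetical sample corpus, already split into chunks.
CHUNKS = [
    "A refund is processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, email billing with your order number.",
]

question = "How do I request a refund?"
top = retrieve(question, CHUNKS, top_k=2)
prompt = augment(question, top)
answer = generate(prompt)
```

With this toy corpus the two refund chunks rank above the unrelated office-hours chunk, which is the behaviour a real embeddings model gives you at much higher quality.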
Typical inputs:
- User questions (from a webhook, form, or chat trigger)
- A populated vector store (Pinecone, Qdrant, Weaviate, in-memory)
Typical outputs:
- A text answer grounded in retrieved context
- Optionally: source chunk metadata (which document the answer came from)
Node Wiring
The QA Chain has two mandatory sub-node connections:
[Question source: Webhook / Chat Trigger / Form Trigger]
              ↓
[Question and Answer Chain]
      ↑              ↑
[Chat Model]   [Vector Store Retriever]
                     ↑
            [Vector Store node]
   (Pinecone / Qdrant / Weaviate / In-Memory)
Required connections:
- Chat Model — the LLM that generates the answer (OpenAI, Anthropic, Gemini, Ollama, etc.)
- Vector Store Retriever — a retriever sub-node wrapping a populated vector store
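How to connect the Vector Store Retriever:
- Add a Vector Store Retriever sub-node and attach it to the QA Chain's retriever input
- Add a Vector Store node (e.g., Pinecone Vector Store) and connect it to the retriever
- Set the Vector Store node to its retrieve-documents mode for chains (labelled along the lines of Retrieve Documents (As Vector Store for Chain/Tool); the exact wording varies by n8n version)
- Connect an Embeddings sub-node to the Vector Store node so the question can be embedded at query time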
Configuration Options
Query field
The field on the incoming item that contains the user's question. Default is query — if your question is in a different field (e.g., message, text, question), change this.
Number of documents to retrieve (Top K)
How many chunks to pull from the vector store per query. Default is typically 4. Higher values give the LLM more context but increase token cost and can dilute relevance. For most use cases, 3–6 works well.
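To see how Top K drives token cost, here is a rough back-of-the-envelope estimate. The 400-token chunk size, the 30-token question, and the 80-token prompt template are illustrative assumptions, not values taken from n8n.

```python
def estimate_prompt_tokens(
    top_k: int,
    chunk_tokens: int,
    question_tokens: int = 30,   # assumed average question length
    template_tokens: int = 80,   # assumed size of the surrounding prompt text
) -> int:
    """Approximate input tokens sent to the model for one query."""
    return top_k * chunk_tokens + question_tokens + template_tokens


for k in (3, 4, 6):
    print(k, estimate_prompt_tokens(k, chunk_tokens=400))
# 3 1310
# 4 1710
# 6 2510
```

Under these assumptions, going from 3 to 6 chunks roughly doubles the input tokens per query, so it is worth checking whether the extra chunks actually improve answers before raising the value.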
Return source documents
When enabled, the node includes the source chunk metadata in the output — which document, page, or chunk the answer came from. Essential for citation-backed answers and debugging.
6 Gotchas
1. Your vector store must be pre-populated
The QA Chain retrieves from an existing vector store — it does not ingest documents. You need a separate ingestion workflow that embeds your documents and stores them. The QA Chain is read-only at query time.
2. The Embeddings model must match the ingestion model
When you query, n8n embeds the question using the Embeddings model connected to your Vector Store node. This must be the same model (and same dimensions) used when you originally ingested the documents. A different model with the same dimensions produces garbage retrieval because the cosine similarity scores are meaningless; a different dimension count is usually rejected by the vector store outright.
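One way to catch this early is to record the embedding model at ingestion time and compare it before querying. The sketch below is a hypothetical guard you could adapt for a Code node or an external script; `EmbeddingConfig` and `check_embedding_match` are names invented for the example, not n8n features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical record of which embedding model produced the stored vectors."""
    model: str
    dimensions: int


def check_embedding_match(ingested: EmbeddingConfig, query: EmbeddingConfig) -> None:
    """Raise if the query-time embedding setup differs from the ingestion setup."""
    if ingested.dimensions != query.dimensions:
        raise ValueError(
            f"Dimension mismatch: stored {ingested.dimensions}, query {query.dimensions}"
        )
    if ingested.model != query.model:
        # Same dimensions but a different model: the store accepts the vector,
        # yet the similarity scores no longer mean anything.
        raise ValueError(
            f"Model mismatch: stored '{ingested.model}', query '{query.model}'"
        )


stored = EmbeddingConfig("text-embedding-3-small", 1536)
check_embedding_match(stored, EmbeddingConfig("text-embedding-3-small", 1536))  # passes silently
```

The second check matters most: text-embedding-3-small and text-embedding-ada-002 both produce 1536-dimensional vectors, so swapping one for the other raises no error at the vector store and simply degrades retrieval.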
3. Query field mismatch returns empty answers
If your incoming question is in message but the node is configured to read query, it passes an empty string to the vector store. The retriever returns no chunks, and the LLM answers from its training data alone (or returns "I don't know"). Always verify the Query field setting matches your actual data structure.
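A small guard in front of the chain makes this failure loud instead of silent. The following is a minimal sketch, assuming the incoming item is a flat dictionary; `resolve_question` is a hypothetical helper, not part of n8n.

```python
def resolve_question(item: dict, query_field: str = "query") -> str:
    """Return the question from the configured field, or fail with a useful message."""
    value = item.get(query_field)
    if not isinstance(value, str) or not value.strip():
        raise ValueError(
            f"No question found in field '{query_field}'. "
            f"Available fields: {sorted(item)}"
        )
    return value.strip()


# The question lives in "message", so the field name must be passed explicitly.
incoming = {"message": "  How do I reset my password?  ", "user": "u_123"}
question = resolve_question(incoming, query_field="message")
```

Calling `resolve_question(incoming)` with the default field raises an error that lists the fields actually present, which is usually enough to spot the mismatch.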
4. Chunk size affects answer quality
If chunks are too small (< 100 tokens), retrieved context is fragmented and the LLM can't synthesize a complete answer. If chunks are too large (> 1000 tokens), you hit context limits faster and pay more per query. 300–500 tokens per chunk with ~50-token overlap is a good default for most document types.
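A fixed-size chunker with overlap can be sketched as follows. It approximates tokens with whitespace-separated words, which is an assumption made to keep the example dependency-free; a real tokenizer will count differently, so treat the sizes as rough.

```python
def chunk_tokens(tokens: list[str], size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Word-based approximation: one whitespace-separated word counts as one token."""
    return [" ".join(c) for c in chunk_tokens(text.split(), size, overlap)]


# A 1000-word document yields three chunks of 400, 400, and 300 words.
words = [f"w{i}" for i in range(1000)]
pieces = chunk_tokens(words, size=400, overlap=50)
```

The overlap means the last 50 words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable in one piece.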
5. The LLM still hallucinates without guardrails
The QA Chain grounds the LLM in retrieved context, but it can still hallucinate if the retrieved chunks don't fully answer the question. Add an explicit instruction in your prompt to respond with "I don't have enough information to answer that" when context is insufficient. Whether you can edit the prompt directly depends on your n8n version, so check the node's prompt template options.
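As an illustration of such a guardrail, the sketch below builds a prompt with an explicit fallback instruction and skips the model call entirely when nothing was retrieved. The prompt wording and the `call_model` callable are assumptions for the example; they are not the node's built-in template.

```python
from typing import Callable

FALLBACK = "I don't have enough information to answer that."


def build_guarded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that tells the model to refuse when context is insufficient."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer_or_fallback(
    question: str,
    chunks: list[str],
    call_model: Callable[[str], str],  # hypothetical wrapper around your chat model
) -> str:
    """Return the fallback without calling the model when retrieval came back empty."""
    if not chunks:
        return FALLBACK
    return call_model(build_guarded_prompt(question, chunks))
```

Short-circuiting on empty retrieval also saves a model call in exactly the case where the answer would otherwise come from training data alone.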
6. In-Memory Vector Store doesn't persist across executions
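n8n's built-in in-memory vector store keeps its data in process memory only, so nothing is written to disk: the contents are lost whenever n8n restarts and should not be relied on to survive between executions. It's useful for testing and single-run batch jobs but not for a live QA system where you want to query the same corpus repeatedly. For production, use Pinecone, Qdrant, or Weaviate.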
3 Workflow Patterns
Pattern 1: Internal Knowledge Base Bot
Scenario: Your team has a Notion wiki, internal docs, or a Confluence space. You want a Slack or Teams bot that answers employee questions by searching the actual docs.
Flow:
Ingestion workflow (runs once or on schedule):
HTTP Request (fetch docs from Notion/Confluence API)
→ Code node (chunk text into ~400-token segments)
→ Embeddings node (OpenAI text-embedding-3-small)
→ Pinecone Vector Store (upsert chunks with metadata: doc_title, url, updated_at)
Query workflow (runs on each question):
Webhook Trigger (Slack slash command or Events API)
→ Question and Answer Chain
↑ Chat Model (GPT-4o or Claude)
↑ Pinecone Vector Store Retriever (Top K: 5, return source docs: on)
→ Code node (format answer + source links)
→ HTTP Request (POST reply to Slack)
Why it works: The ingestion workflow keeps the vector store fresh. The query workflow is stateless and fast — each Slack question triggers a live retrieval + generation cycle. Source doc metadata lets you include links in the answer.
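The "format answer + source links" step could look roughly like this. It assumes each source document arrives as a dictionary with a `metadata` object carrying the `doc_title` and `url` fields stored during ingestion; the actual output shape of the node may differ, so inspect a real execution before relying on these keys.

```python
def format_reply(answer: str, source_docs: list[dict]) -> str:
    """Append a de-duplicated list of source links to the answer text."""
    seen = set()
    links = []
    for doc in source_docs:
        meta = doc.get("metadata", {})  # assumed location of the stored metadata
        url = meta.get("url")
        if not url or url in seen:
            continue  # skip chunks without a link and repeated documents
        seen.add(url)
        links.append(f"- {meta.get('doc_title', 'Untitled')}: {url}")
    if not links:
        return answer
    return answer + "\n\nSources:\n" + "\n".join(links)


# Hypothetical retrieved chunks: two from the same page, one without a URL.
docs = [
    {"metadata": {"doc_title": "VPN Setup", "url": "https://wiki.example.com/vpn"}},
    {"metadata": {"doc_title": "VPN Setup", "url": "https://wiki.example.com/vpn"}},
    {"metadata": {"doc_title": "Draft notes"}},
]
reply = format_reply("Install the client, then sign in with SSO.", docs)
```

De-duplicating by URL matters because several retrieved chunks often come from the same page, and repeating the link adds noise to the Slack message.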
Free JSON: Download the Knowledge Base Bot workflow →
Pattern 2: PDF Document Q&A (Customer-Facing)
Scenario: You have product manuals, compliance documents, or legal agreements as PDFs. Customers or internal users need to ask questions about specific documents without reading them end to end.
Flow:
Ingestion workflow:
HTTP Request (download PDF from URL or S3)
→ Extract From File node (read PDF as text)
→ Code node (chunk into 400-token segments, tag with doc_id metadata)
→ OpenAI Embeddings
→ Qdrant Vector Store (upsert with doc_id in each chunk's payload metadata)
Query workflow:
Webhook Trigger (POST: { doc_id: "...", question: "..." })
→ Question and Answer Chain
↑ Chat Model
↑ Qdrant Vector Store Retriever (filter on doc_id metadata, Top K: 4)
→ Respond to Webhook (return answer JSON)
Why it works: Tagging every chunk with its doc_id and filtering on that field at query time lets you host thousands of documents in one vector store and scope each question to the relevant document. Users get answers grounded in the actual text, not a generic LLM response.
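In Qdrant, this kind of per-document scoping is expressed as a payload filter. The sketch below builds one in Qdrant's filter format; the payload key `metadata.doc_id` is an assumption about how the ingestion step stores the tag, and where the filter is entered in n8n depends on the node version, so verify both against your own setup.

```python
import json


def build_doc_filter(doc_id: str, key: str = "metadata.doc_id") -> dict:
    """Build a Qdrant-style payload filter that matches a single document."""
    if not doc_id:
        raise ValueError("doc_id is required to scope the search")
    # "must" means every listed condition has to hold for a point to match.
    return {"must": [{"key": key, "match": {"value": doc_id}}]}


# Hypothetical webhook body matching the POST shape used in the query workflow.
body = {"doc_id": "manual-2024-07", "question": "What is the warranty period?"}
search_filter = build_doc_filter(body["doc_id"])
print(json.dumps(search_filter))
```

Rejecting an empty `doc_id` is deliberate: without the filter, a question would be answered from every document in the collection, which defeats the point of scoping.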
Free JSON: Download the PDF Q&A workflow →
Pattern 3: Support Ticket Auto-Responder
Scenario: Your support team has hundreds of resolved tickets with proven answers. When new tickets arrive, you want to auto-draft a response based on similar past answers, then route to a human for review before sending.
Flow:
Ingestion workflow (batch, run periodically):
HTTP Request (fetch resolved tickets from Zendesk/Linear API)
→ Filter node (only tickets with "resolved" + agent_rating > 4)
→ Code node (format as "Problem: ... Solution: ..." chunks)
→ Embeddings → Pinecone (upsert with ticket_id, category metadata)
Query workflow:
Webhook (new ticket created)
→ Question and Answer Chain
↑ Chat Model (with system prompt: "Draft a support reply based only on the provided context")
↑ Pinecone Retriever (Top K: 3, filter by ticket category)
→ HTTP Request (POST draft reply to Zendesk internal note)
→ Slack notification (agent review required)
Why it works: The LLM drafts based on proven past answers, not generic training data. Filtering by category improves retrieval precision. The human-in-the-loop step before sending keeps quality high.
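The filter and formatting steps of the ingestion workflow can be sketched together. The ticket field names (`status`, `agent_rating`, `problem`, `solution`, `id`, `category`) are hypothetical; map them to whatever your Zendesk or Linear export actually returns.

```python
from typing import Optional


def ticket_to_chunk(ticket: dict) -> Optional[dict]:
    """Turn one resolved, highly rated ticket into an embeddable chunk, else None."""
    if ticket.get("status") != "resolved" or ticket.get("agent_rating", 0) <= 4:
        return None  # mirrors the Filter node: resolved and agent_rating > 4
    text = f"Problem: {ticket['problem'].strip()}\nSolution: {ticket['solution'].strip()}"
    return {
        "text": text,
        "metadata": {
            "ticket_id": ticket["id"],
            "category": ticket.get("category", "uncategorized"),
        },
    }


# Hypothetical export: only the first ticket passes the filter.
tickets = [
    {"id": 101, "status": "resolved", "agent_rating": 5, "category": "billing",
     "problem": "Charged twice ", "solution": "Refunded the duplicate charge."},
    {"id": 102, "status": "open", "agent_rating": 5,
     "problem": "Login fails", "solution": ""},
    {"id": 103, "status": "resolved", "agent_rating": 3, "category": "billing",
     "problem": "Invoice missing", "solution": "Resent invoice."},
]
chunks = [c for c in map(ticket_to_chunk, tickets) if c is not None]
```

Keeping the problem and solution in one chunk means a new ticket that resembles the problem text retrieves the matching solution along with it.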
Free JSON: Download the Support Auto-Responder workflow →
QA Chain vs Other n8n AI Nodes
| Node | Best for |
|---|---|
| Question and Answer Chain | Grounded answers from a pre-built document corpus |
| Basic LLM Chain | Free-form text generation with no retrieval |
| AI Agent | Multi-step reasoning with dynamic tool calls |
| Information Extractor | Pulling specific fields from a single piece of text |
| Summarization Chain | Condensing long documents into shorter summaries |
Use the QA Chain when you need answers grounded in your own documents. Use Basic LLM Chain when you're fine with the model's training knowledge alone.
Quick Reference
Node: Question and Answer Chain
Required sub-nodes:
- Chat Model (mandatory)
- Vector Store Retriever (mandatory — wraps a populated vector store)
Key settings:
- Query field: must match incoming item field name
- Top K: 3–6 chunks (tune for quality vs cost)
- Return source documents: on for citation-backed answers
Gotchas:
- Pre-populate your vector store in a separate ingestion workflow
- Embeddings model must match between ingestion and query
- In-memory vector store resets between executions
- Add "I don't know" fallback instruction to your prompt
- Verify Query field matches actual data structure
- Chunk size: 300–500 tokens with overlap is a good default
Get the Free Workflow JSON
All three patterns above are included in the n8n Workflow Packs available on Gumroad. One download, instant access, plug the JSON into your n8n instance and go.
→ Download the n8n Workflow Pack
Found this useful? Drop a comment below — I'm especially curious what vector store you're using and what document corpus you're building QA over.