Pirate Prentice

Posted on Jul 5 • Edited on Jul 21

n8n Question and Answer Chain Node: Build Retrieval-Augmented Workflows with Any Document [Free Workflow JSON]

#n8n #ai #automation #javascript

The Question and Answer Chain node is n8n's built-in RAG (Retrieval-Augmented Generation) node. You connect it to a vector store, give it a user question, and it retrieves relevant document chunks, passes them to a language model, and returns a grounded answer — all without writing code.

This guide covers how the node works, the full wiring pattern, configuration options, the 6 gotchas that trip people up, and 3 production-ready workflow patterns with free JSON.

What the Question and Answer Chain Node Does

The QA Chain node implements the classic RAG pattern:

Retrieve — embed the user's question and query a vector store for semantically similar chunks
Augment — inject the retrieved chunks into a prompt as context
Generate — pass the augmented prompt to a language model and return the answer

The result is an answer that's grounded in your actual documents — not just a hallucinated LLM response.
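
To make the three steps concrete, here is a toy sketch in plain JavaScript. It is not how the node is implemented: the word-count "embedding", the three sample chunks, and the stubbed `generate` function are all stand-ins invented for illustration, so it runs without a vector store or an API key.

```javascript
// Toy corpus standing in for chunks already stored in a vector store.
const chunks = [
  { text: "Refunds are issued within 14 days of purchase.", source: "refund-policy" },
  { text: "Support is available Monday to Friday, 9am to 5pm.", source: "support-hours" },
  { text: "Enterprise plans include single sign-on.", source: "pricing" },
];

// Stand-in "embedding": a word-count map. A real setup uses an embeddings model.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two word-count maps.
function cosine(a, b) {
  let dot = 0;
  for (const [word, n] of a) dot += n * (b.get(word) ?? 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((s, n) => s + n * n, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// 1. Retrieve: rank chunks by similarity to the question and keep the top K.
function retrieve(question, topK) {
  const q = embed(question);
  return chunks
    .map((chunk) => ({ ...chunk, score: cosine(q, embed(chunk.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// 2. Augment: inject the retrieved chunks into the prompt as context.
function augment(question, retrieved) {
  const context = retrieved.map((c) => c.text).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

// 3. Generate: hand the prompt to a model (stubbed here; no API call is made).
function generate(prompt) {
  return `[model output for a ${prompt.length}-character prompt]`;
}

const question = "When are refunds issued?";
const retrieved = retrieve(question, 2);
const prompt = augment(question, retrieved);
console.log(retrieved[0].source);
console.log(generate(prompt));
```

In the real node, the embeddings model, the vector store, and the chat model replace `embed`, `retrieve`, and `generate` respectively.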

Typical inputs:

User questions (from a webhook, form, or chat trigger)
A populated vector store (Pinecone, Qdrant, Weaviate, in-memory)

Typical outputs:

A text answer grounded in retrieved context
Optionally: source chunk metadata (which document the answer came from)

Node Wiring

The QA Chain has two mandatory sub-node connections:

[Question source — Webhook / Chat Trigger / Form Trigger]
          ↓
[Question and Answer Chain]
    ↑              ↑
[Chat Model]  [Vector Store Retriever]
                    ↑
              [Vector Store node]
              (Pinecone / Qdrant / Weaviate / In-Memory)

Required connections:

Chat Model — the LLM that generates the answer (OpenAI, Anthropic, Gemini, Ollama, etc.)
Vector Store Retriever — a retriever sub-node wrapping a populated vector store

How to connect the Vector Store Retriever:

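Add a Vector Store Retriever sub-node and connect it to the QA Chain's retriever input
Add a Vector Store node (e.g., Pinecone Vector Store) and connect it to the Vector Store Retriever
Set the Vector Store node to Retrieve Documents (For Agent/Chain) mode (the exact label varies by n8n version)
Connect an Embeddings sub-node to the Vector Store node, using the same model you used at ingestion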
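
If you want to sanity-check an exported workflow before importing it, a small script can confirm that both mandatory inputs are wired. This is a sketch under assumptions: the node names are made up, and the connection type names (`ai_languageModel`, `ai_retriever`, `ai_vectorStore`) reflect what exported workflow JSON has looked like in my experience, so compare them with your own export before relying on this.

```javascript
// Hypothetical excerpt of a workflow's "connections" object.
// Node names are illustrative; connection type names are assumptions.
const connections = {
  "OpenAI Chat Model": {
    ai_languageModel: [[{ node: "Question and Answer Chain", type: "ai_languageModel", index: 0 }]],
  },
  "Vector Store Retriever": {
    ai_retriever: [[{ node: "Question and Answer Chain", type: "ai_retriever", index: 0 }]],
  },
  "Pinecone Vector Store": {
    ai_vectorStore: [[{ node: "Vector Store Retriever", type: "ai_vectorStore", index: 0 }]],
  },
};

// Collect the connection types that feed into a given node.
function incomingTypes(conns, targetNode) {
  const types = new Set();
  for (const outputs of Object.values(conns)) {
    for (const [type, groups] of Object.entries(outputs)) {
      for (const link of (groups ?? []).flat().filter(Boolean)) {
        if (link.node === targetNode) types.add(type);
      }
    }
  }
  return types;
}

// Return the mandatory input types that nothing is connected to.
function missingChainInputs(conns, chainName) {
  const required = ["ai_languageModel", "ai_retriever"];
  const present = incomingTypes(conns, chainName);
  return required.filter((type) => !present.has(type));
}

console.log(missingChainInputs(connections, "Question and Answer Chain"));
```

An empty array means both the Chat Model and the retriever are connected; anything else lists the input that is missing.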

Configuration Options

Query field

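The field on the incoming item that contains the user's question. This guide assumes query; depending on your n8n version, the node may instead default to the Chat Trigger's chatInput field, so check what your node actually reads. If your question is in a different field (e.g., message, text, question), change this.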
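
If several triggers feed the same chain, one option is to normalize the field name in a Code node first. The sketch below is an assumption-laden example, not part of the node: the candidate field list and the target field name `query` are choices made for illustration, so swap in whatever your node is configured to read.

```javascript
// Candidate field names, checked in order. Adjust to your trigger's payload.
const QUESTION_FIELDS = ["chatInput", "query", "question", "message", "text"];

// Copy the first non-empty candidate into a single `query` field.
function normalizeQuestion(json) {
  for (const field of QUESTION_FIELDS) {
    const value = json[field];
    if (typeof value === "string" && value.trim() !== "") {
      return { ...json, query: value.trim() };
    }
  }
  // Failing loudly beats sending an empty question to the retriever.
  throw new Error(`No question found in fields: ${QUESTION_FIELDS.join(", ")}`);
}

// In an n8n Code node ("Run Once for All Items") this would be roughly:
//   return $input.all().map((item) => ({ json: normalizeQuestion(item.json) }));
console.log(normalizeQuestion({ message: "  How do I reset my password? " }));
```

Throwing on a missing question stops the workflow at the point where the data is wrong, which is easier to debug than a vague answer further downstream.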

Number of documents to retrieve (Top K)

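How many chunks to pull from the vector store per query. The default is typically 4, and in many n8n versions the limit is set on the retriever sub-node rather than on the chain itself. Higher values give the LLM more context but increase token cost and can dilute relevance. For most use cases, 3–6 works well.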
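
The cost side of that trade-off is simple arithmetic: retrieved context tokens are roughly Top K times the chunk size. A quick sketch, assuming 400-token chunks (the chunk size and the token budget below are example numbers, not values from n8n):

```javascript
// Rough context size: every retrieved chunk is injected into the prompt.
function estimateContextTokens(topK, tokensPerChunk) {
  return topK * tokensPerChunk;
}

// Largest Top K that keeps retrieved context within a token budget.
function maxTopK(contextBudgetTokens, tokensPerChunk) {
  return Math.floor(contextBudgetTokens / tokensPerChunk);
}

for (const topK of [3, 4, 5, 6]) {
  console.log(`Top K ${topK}: ~${estimateContextTokens(topK, 400)} context tokens`);
}
console.log(`Budget of 3000 tokens allows Top K up to ${maxTopK(3000, 400)}`);
```

This ignores the question, the prompt template, and the answer itself, so treat it as a lower bound on per-query tokens.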

Return source documents

When enabled, the node includes the source chunk metadata in the output — which document, page, or chunk the answer came from. Essential for citation-backed answers and debugging.

6 Gotchas

1. Your vector store must be pre-populated

The QA Chain retrieves from an existing vector store — it does not ingest documents. You need a separate ingestion workflow that embeds your documents and stores them. The QA Chain is read-only at query time.

2. The Embeddings model must match the ingestion model

When you query, n8n embeds the question using the Embeddings model connected to your Vector Store node. This must be the same model (and same dimensions) used when you originally ingested the documents. Mismatched embeddings produce garbage retrieval: vectors from different models live in different spaces, so the cosine similarity scores between them are meaningless.
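
One way to catch this early is to record the embedding settings at ingestion time and compare them before querying. The config objects below are hypothetical (n8n does not produce them for you); the dimension counts are the documented defaults for the two OpenAI models named.

```javascript
// Compare the embedding settings used at ingestion with those used at query time.
function assertSameEmbeddingSpace(ingestion, query) {
  const problems = [];
  if (ingestion.model !== query.model) {
    problems.push(`model: ${ingestion.model} vs ${query.model}`);
  }
  if (ingestion.dimensions !== query.dimensions) {
    problems.push(`dimensions: ${ingestion.dimensions} vs ${query.dimensions}`);
  }
  if (problems.length > 0) {
    throw new Error(`Embedding mismatch (${problems.join("; ")})`);
  }
}

// Cosine similarity is only defined for vectors of the same length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const ingestionConfig = { model: "text-embedding-3-small", dimensions: 1536 };
const queryConfig = { model: "text-embedding-3-large", dimensions: 3072 };

try {
  assertSameEmbeddingSpace(ingestionConfig, queryConfig);
} catch (err) {
  console.log(err.message);
}
```

Note that two different models with the same dimension count will not raise a length error anywhere, which is why the check compares model names as well.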

3. Query field mismatch returns empty answers

If your incoming question is in message but the node is configured to read query, the chain receives an empty or undefined question. Depending on the vector store, the retriever then returns irrelevant chunks or none at all, and the LLM answers from its training data alone (or returns "I don't know"). Always verify the Query field setting matches your actual data structure.

4. Chunk size affects answer quality

If chunks are too small (< 100 tokens), retrieved context is fragmented and the LLM can't synthesize a complete answer. If chunks are too large (> 1000 tokens), you hit context limits faster and pay more per query. 300–500 tokens per chunk with ~50-token overlap is a good default for most document types.
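
A minimal chunker with overlap, of the kind you might put in a Code node during ingestion. One loud assumption: it counts whitespace-separated words as a stand-in for tokens, so real token counts will differ somewhat from the sizes you pass in.

```javascript
// Split text into overlapping chunks. Sizes are in words, used here as a
// rough proxy for tokens (an approximation, not a real tokenizer).
function chunkText(text, chunkSize = 400, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once this chunk has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

// Example: 1000 numbered words, 400-word chunks, 50-word overlap.
const sampleText = Array.from({ length: 1000 }, (_, i) => `w${i}`).join(" ");
const sampleChunks = chunkText(sampleText);
console.log(sampleChunks.length);
console.log(sampleChunks[1].split(" ")[0]);
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.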

5. The LLM still hallucinates without guardrails

The QA Chain grounds the LLM in retrieved context, but it can still hallucinate if the retrieved chunks don't fully answer the question. Add an explicit instruction in the chain's prompt to respond with "I don't have enough information to answer that" when context is insufficient. Check the node's prompt template options first: the built-in template may already include a similar instruction, in which case you only need to tighten its wording.
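
Here is a sketch of such a prompt, plus a helper for detecting the fallback downstream. The `{context}` placeholder is an assumption about how the template marks where retrieved chunks go; confirm the placeholder your node version expects before pasting the prompt in.

```javascript
const FALLBACK = "I don't have enough information to answer that";

// Build a system prompt that restricts the model to the retrieved context.
// "{context}" is assumed to be the placeholder the chain fills in.
function buildSystemPrompt(contextPlaceholder = "{context}") {
  return [
    "Answer the question using only the context below.",
    `If the context does not contain the answer, reply exactly: "${FALLBACK}"`,
    "Do not use outside knowledge.",
    "----------------",
    `Context: ${contextPlaceholder}`,
  ].join("\n");
}

// Detect the fallback so the workflow can branch (e.g., escalate to a human).
function isFallback(answer) {
  return answer.trim().startsWith(FALLBACK);
}

console.log(buildSystemPrompt());
```

Asking for an exact fallback phrase makes it easy to route unanswered questions with an IF node instead of showing users a guess.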

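6. In-Memory Vector Store isn't persistent storage

n8n's built-in in-memory vector store keeps its data only in the memory of the running n8n instance. The data is lost when n8n restarts and can be cleared under memory pressure, so you can't rely on it being there the next time a question comes in. It's useful for testing and single-run batch jobs but not for a live QA system where you want to query the same corpus repeatedly. For production, use Pinecone, Qdrant, or Weaviate.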

3 Workflow Patterns

Pattern 1: Internal Knowledge Base Bot

Scenario: Your team has a Notion wiki, internal docs, or a Confluence space. You want a Slack or Teams bot that answers employee questions by searching the actual docs.

Flow:

Ingestion workflow (runs once or on schedule):

HTTP Request (fetch docs from Notion/Confluence API)
→ Code node (chunk text into ~400-token segments)
→ Embeddings node (OpenAI text-embedding-3-small)
→ Pinecone Vector Store (upsert chunks with metadata: doc_title, url, updated_at)

Query workflow (runs on each question):

Webhook Trigger (Slack slash command or Events API)
→ Question and Answer Chain
    ↑ Chat Model (GPT-4o or Claude)
    ↑ Pinecone Vector Store Retriever (Top K: 5, return source docs: on)
→ Code node (format answer + source links)
→ HTTP Request (POST reply to Slack)

Why it works: The ingestion workflow keeps the vector store fresh. The query workflow is stateless and fast — each Slack question triggers a live retrieval + generation cycle. Source doc metadata lets you include links in the answer.
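
The "format answer + source links" Code node could look something like the sketch below. The input shape is an assumption: it expects an answer string plus an array of source documents carrying the `doc_title` and `url` metadata set at ingestion, so adapt the property access to whatever your chain actually outputs.

```javascript
// Build a Slack message body from an answer and its source documents.
// Links use Slack's <url|label> syntax; duplicate URLs are listed once.
function formatSlackReply(answer, sourceDocs) {
  const seen = new Set();
  const links = [];
  for (const doc of sourceDocs) {
    const { url, doc_title } = doc.metadata ?? {};
    if (!url || seen.has(url)) continue;
    seen.add(url);
    links.push(`<${url}|${doc_title ?? url}>`);
  }
  const text = links.length > 0
    ? `${answer}\n\nSources: ${links.join(", ")}`
    : answer;
  return { text };
}

// Example with made-up documents (two chunks from the same page).
const reply = formatSlackReply("Connect through the VPN portal.", [
  { metadata: { doc_title: "VPN Guide", url: "https://wiki.example.com/vpn" } },
  { metadata: { doc_title: "VPN Guide", url: "https://wiki.example.com/vpn" } },
]);
console.log(reply.text);
```

Deduplicating by URL matters because several retrieved chunks often come from the same page.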

Free JSON: Download the Knowledge Base Bot workflow →

Pattern 2: PDF Document Q&A (Customer-Facing)

Scenario: You have product manuals, compliance documents, or legal agreements as PDFs. Customers or internal users need to ask questions about specific documents without reading them end to end.

Flow:

Ingestion workflow:

HTTP Request (download PDF from URL or S3)
→ Extract From File node (read PDF as text)
→ Code node (chunk into 400-token segments, tag with doc_id metadata)
→ OpenAI Embeddings
→ Qdrant Vector Store (upsert; doc_id stored in each chunk's metadata)

Query workflow:

Webhook Trigger (POST: { doc_id: "...", question: "..." })
→ Question and Answer Chain
    ↑ Chat Model
    ↑ Qdrant Vector Store Retriever (metadata filter on doc_id, Top K: 4)
→ HTTP Response (return answer JSON)

Why it works: Filtering on doc_id metadata lets you host thousands of documents in one Qdrant collection and scope each question to the relevant document. (Qdrant has no namespaces; on Pinecone you could use a namespace per document instead.) Users get answers grounded in the actual text, not a generic LLM response.
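
Before the question reaches the chain, it helps to validate the webhook body and build the document scope once. A sketch, with assumptions flagged: the filter object follows Qdrant's `must`/`match` filter syntax, and the `metadata.doc_id` key path assumes chunk metadata is stored under a `metadata` payload field, which you should verify against your own collection.

```javascript
// Validate the POST body described above: { doc_id, question }.
function parseRequest(body) {
  const docId = typeof body?.doc_id === "string" ? body.doc_id.trim() : "";
  const question = typeof body?.question === "string" ? body.question.trim() : "";
  if (!docId) throw new Error("doc_id is required");
  if (!question) throw new Error("question is required");
  return { docId, question };
}

// Build a filter that restricts retrieval to one document.
// The "metadata.doc_id" key path is an assumption about the payload layout.
function buildDocFilter(docId) {
  return { must: [{ key: "metadata.doc_id", match: { value: docId } }] };
}

const request = parseRequest({ doc_id: " manual-42 ", question: "What is the warranty period?" });
console.log(JSON.stringify(buildDocFilter(request.docId)));
```

Rejecting requests without a doc_id is the important part: an unscoped query would search every customer's documents.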

Free JSON: Download the PDF Q&A workflow →

Pattern 3: Support Ticket Auto-Responder

Scenario: Your support team has hundreds of resolved tickets with proven answers. When new tickets arrive, you want to auto-draft a response based on similar past answers, then route to a human for review before sending.

Flow:

Ingestion workflow (batch, run periodically):

HTTP Request (fetch resolved tickets from Zendesk/Linear API)
→ Filter node (only tickets with "resolved" + agent_rating > 4)
→ Code node (format as "Problem: ... Solution: ..." chunks)
→ Embeddings → Pinecone (upsert with ticket_id, category metadata)

Query workflow:

Webhook (new ticket created)
→ Question and Answer Chain
    ↑ Chat Model (with system prompt: "Draft a support reply based only on the provided context")
    ↑ Pinecone Retriever (Top K: 3, filter by ticket category)
→ HTTP Request (POST draft reply to Zendesk internal note)
→ Slack notification (agent review required)

Why it works: The LLM drafts based on proven past answers, not generic training data. Filtering by category improves retrieval precision. The human-in-the-loop step before sending keeps quality high.
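
The filter and formatting steps of the ingestion workflow can be combined in one Code node. In this sketch the ticket field names (`status`, `agent_rating`, `problem`, `solution`, `category`) are hypothetical; map them to whatever your Zendesk or Linear API response actually returns.

```javascript
// Keep only well-rated resolved tickets and format each as one chunk.
// Ticket field names are hypothetical placeholders.
function ticketsToChunks(tickets) {
  return tickets
    .filter((t) => t.status === "resolved" && t.agent_rating > 4)
    .map((t) => ({
      text: `Problem: ${t.problem.trim()}\nSolution: ${t.solution.trim()}`,
      metadata: { ticket_id: t.id, category: t.category },
    }));
}

// Example with made-up tickets: only the first one passes the filter.
const sampleTickets = [
  { id: 101, status: "resolved", agent_rating: 5, category: "login", problem: "Login fails ", solution: " Clear cookies" },
  { id: 102, status: "resolved", agent_rating: 3, category: "billing", problem: "Double charge", solution: "Refund issued" },
  { id: 103, status: "open", agent_rating: 5, category: "login", problem: "2FA loop", solution: "" },
];
const ticketChunks = ticketsToChunks(sampleTickets);
console.log(ticketChunks);
```

Keeping each problem and its solution in a single chunk means a retrieved match always carries the fix along with the symptom.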

Free JSON: Download the Support Auto-Responder workflow →

QA Chain vs Other n8n AI Nodes

Node	Best for
Question and Answer Chain	Grounded answers from a pre-built document corpus
Basic LLM Chain	Free-form text generation with no retrieval
AI Agent	Multi-step reasoning with dynamic tool calls
Information Extractor	Pulling specific fields from a single piece of text
Summarization Chain	Condensing long documents into shorter summaries

Use the QA Chain when you need answers grounded in your own documents. Use Basic LLM Chain when you're fine with the model's training knowledge alone.

Quick Reference

Node: Question and Answer Chain
Required sub-nodes:
  - Chat Model (mandatory)
  - Vector Store Retriever (mandatory — wraps a populated vector store)

Key settings:
  - Query field: must match incoming item field name
  - Top K: 3–6 chunks (tune for quality vs cost)
  - Return source documents: on for citation-backed answers

Gotchas:
  - Pre-populate your vector store in a separate ingestion workflow
  - Embeddings model must match between ingestion and query
  - In-memory vector store is not persistent; use an external store in production
  - Add "I don't know" fallback instruction to your prompt
  - Verify Query field matches actual data structure
  - Chunk size: 300–500 tokens with overlap is a good default

Get the Free Workflow JSON

All three patterns above are included in the n8n Workflow Packs available on Gumroad. One download, instant access, plug the JSON into your n8n instance and go.

→ Download the n8n Workflow Pack

Found this useful? Drop a comment below — I'm especially curious what vector store you're using and what document corpus you're building QA over.

Related Articles

n8n Information Extractor Node: Extract Structured Data from Text
n8n Summarization Chain Node: Summarize Long Documents
n8n Basic LLM Chain Node: Add Language Model Text Generation

Free: n8n Integration Checklist
25 production checks before you ship any n8n workflow: credentials, error handling, dedup, and integration-specific gotchas.
→ Get the free checklist
