Pirate Prentice

n8n Summarization Chain Node: Summarize Long Documents and Web Pages in Your Workflows [Free Workflow JSON]

The Summarization Chain node in n8n lets you condense long text (PDFs, emails, web pages, transcripts, knowledge-base articles) into concise summaries using any supported language model, without writing a single line of code. This guide covers the main settings, the three summarization strategies, common gotchas, and three ready-to-use workflow patterns you can copy today.


What the Summarization Chain Node Does

The Summarization Chain node wraps LangChain's summarization chain and exposes it as a native n8n node. You feed it a Document (from a Document Loader sub-node), attach a Chat Model, and get back a summary string.

With the Map Reduce and Refine strategies, it handles chunking and multi-pass summarization automatically, so no manual splitting is required.
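
To make the chunking step concrete, here is a minimal Python sketch of fixed-size splitting with overlap. It is an illustration only, not n8n's or LangChain's actual splitter; the function name `split_into_chunks` and the 1,000/200 character sizes are assumptions chosen for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    for chunk in split_into_chunks(sample):
        print(len(chunk))
```

The overlap means the end of one chunk is repeated at the start of the next, which helps a sentence that straddles a boundary survive in at least one chunk.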


Required Inputs

| Input | Required | Description |
|---|---|---|
| Chat Model | Yes | Any n8n-supported LLM (OpenAI, Anthropic, Mistral, Ollama, etc.) |
| Document | Yes | One or more Document objects (from Document Loader sub-nodes) |

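Important: The Document input accepts LangChain Document objects, not raw text strings. Pass your data through a Document Loader sub-node such as the Default Data Loader. (Version 2 of the node also has a Data to Summarize setting that can read JSON or binary data directly from the incoming item; this guide uses the Document Loader route.)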


Summarization Type (Strategy)

The node offers three strategies. Select one in the Summarization Type dropdown.

1. Map Reduce (default — recommended for long documents)

Splits the document into chunks, summarizes each chunk independently ("map"), then summarizes all chunk summaries together ("reduce").

  • Best for: PDFs, long transcripts, knowledge-base articles, multi-page docs
  • Token cost: Moderate (each chunk + final reduce)
  • Quality: High — each chunk gets full model attention

Settings exposed:

  • Combine Map Summaries Prompt — prompt used in the reduce step (customize to add structure)
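
The map and reduce steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the node's implementation: `MAP_PROMPT`, `REDUCE_PROMPT`, and `fake_llm` are hypothetical stand-ins, and a real chain may run more than one reduce round when the partial summaries are themselves too long.

```python
from typing import Callable

# Hypothetical prompt templates for illustration; the node's real defaults may differ.
MAP_PROMPT = "Summarize this section:\n\n{text}"
REDUCE_PROMPT = "Combine these section summaries into one summary:\n\n{text}"


def map_reduce_summarize(chunks: list[str], llm: Callable[[str], str]) -> str:
    """Summarize each chunk on its own, then summarize the summaries."""
    if not chunks:
        return ""
    # Map step: one independent model call per chunk.
    partial_summaries = [llm(MAP_PROMPT.replace("{text}", chunk)) for chunk in chunks]
    # Reduce step: a single call over the joined partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(REDUCE_PROMPT.replace("{text}", combined))


def fake_llm(prompt: str) -> str:
    """Stand-in for a chat model call: returns the last line of the prompt."""
    return prompt.strip().splitlines()[-1]


if __name__ == "__main__":
    print(map_reduce_summarize(["Intro text.", "Details text."], fake_llm))
```

Because the map calls do not depend on each other, they can run in parallel, which is one reason this strategy scales well to long documents.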

2. Refine

Processes chunks sequentially: summarizes the first chunk, then passes that summary + the next chunk to the model to produce a refined summary, repeating until done.

  • Best for: Narrative text where context flows chronologically (meeting transcripts, stories, emails)
  • Token cost: Higher, because every step re-sends the running summary along with the next chunk, and the steps cannot run in parallel
  • Quality: Coherent narrative output

Settings exposed:

  • Refine Prompt — the instruction given at each refinement step
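
The sequential loop behind Refine looks roughly like this in Python. Treat it as a sketch, not the node's code: the two templates and the `{existing_answer}` placeholder name are assumptions made for this example, and `llm` is any callable that maps a prompt string to a reply string.

```python
from typing import Callable

# Hypothetical templates; {existing_answer} is a placeholder name assumed for this sketch.
INITIAL_PROMPT = "Summarize this text:\n\n{text}"
REFINE_PROMPT = (
    "Current summary:\n{existing_answer}\n\n"
    "Refine it using this additional text:\n{text}"
)


def refine_summarize(chunks: list[str], llm: Callable[[str], str]) -> str:
    """Summarize the first chunk, then fold each later chunk into the running summary."""
    if not chunks:
        return ""
    summary = llm(INITIAL_PROMPT.format(text=chunks[0]))
    for chunk in chunks[1:]:
        # Each step depends on the previous summary, so the calls must run in order.
        summary = llm(REFINE_PROMPT.format(existing_answer=summary, text=chunk))
    return summary
```

The number of model calls equals the number of chunks, but unlike Map Reduce none of them can be issued concurrently.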

3. Stuff

Stuffs all document content into a single prompt and summarizes in one pass.

  • Best for: Short documents that fit within the model's context window
  • Token cost: Lowest
  • Gotcha: If the document exceeds the model's context limit, the request either errors out or the input is silently truncated, depending on the provider. Only use for documents you know are short.
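
Since Stuff has no built-in guard, a rough pre-check can decide whether a document is safe to send in one prompt. The sketch below uses the common rule of thumb of about four tokens per three English words; that ratio, the function names, and the overhead numbers are all assumptions, and a real tokenizer for your model will give different counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 tokens per 3 words. Not a real tokenizer."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def fits_in_one_prompt(
    text: str,
    context_limit: int,
    prompt_overhead: int = 200,   # assumed size of the instruction text
    reserved_output: int = 500,   # assumed room left for the summary itself
) -> bool:
    """Return True if the text plausibly fits in a single Stuff-style prompt."""
    needed = estimate_tokens(text) + prompt_overhead + reserved_output
    return needed <= context_limit


if __name__ == "__main__":
    document = "word " * 3000
    strategy = "stuff" if fits_in_one_prompt(document, context_limit=8192) else "map_reduce"
    print(strategy)
```

In a workflow, the same check could live in a Code node that routes short items to a Stuff chain and long ones to Map Reduce.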

Prompt Customization

All three strategies expose prompt fields. The default prompts are generic ("Write a concise summary of the following…"). Override them to get structured output:

Summarize the following customer support ticket in 2–3 sentences.
Focus on: (1) the customer's core problem, (2) what was tried, (3) the resolution status.
Reply in plain text only.

Prompts use {text} as the placeholder for document content. Do not remove it.
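
A small Python sketch shows why the placeholder matters: the document content only reaches the model where `{text}` appears. This mimics the substitution for illustration and is not n8n's actual templating code; `render_prompt` is a hypothetical helper name.

```python
def render_prompt(template: str, document_text: str) -> str:
    """Insert the document into a custom prompt, refusing templates without {text}."""
    if "{text}" not in template:
        raise ValueError("custom prompt must contain the {text} placeholder")
    # Plain replacement leaves any other braces in the template untouched.
    return template.replace("{text}", document_text)


if __name__ == "__main__":
    template = (
        "Summarize the following customer support ticket in 2-3 sentences.\n"
        "Reply in plain text only.\n\n"
        "{text}"
    )
    print(render_prompt(template, "Customer cannot log in after password reset."))
```

Running a check like this on your custom prompts before pasting them into the node catches a missing placeholder early.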


Output

The node outputs a single item with one field:

{ "response": "Your summary text here..." }

Access it downstream with {{ $json.response }}.


6 Common Gotchas

1. Passing raw text instead of a Document object
The Document input rejects plain strings. Wrap raw text with the Default Data Loader sub-node, placing your text in its Data field.

2. Stuff strategy context-window overflow
There is no automatic guard. If your text exceeds the model's context limit, the request either fails with a context-length error or the input is silently truncated, depending on the provider. Use Map Reduce for anything longer than ~3,000 words.

3. No {text} placeholder in custom prompts
The node injects document content via {text}. If you remove this placeholder from a custom prompt, the document never reaches the model, so you get generic filler or an error instead of a summary.

4. Slow execution on Map Reduce with many chunks
Each chunk is a separate API call. A 50-page PDF with 1,500-token chunks and a Map Reduce strategy may make 20+ API calls. Watch rate limits and latency. The node takes a single Chat Model for every step, so consider a faster, cheaper model (e.g., GPT-4o mini or Claude Haiku) for the whole chain.

5. Document Loader node required — not an HTTP Request node output
You cannot directly connect an HTTP Request node to the Document input. You need a Document Loader (e.g., the "Default Data Loader" or "PDF Loader" sub-node) between them.

6. Summarization Chain vs Basic LLM Chain for summarization
You can summarize with a Basic LLM Chain by passing text in the prompt — but you must handle chunking yourself. The Summarization Chain handles chunking automatically. For documents that reliably fit in context, Basic LLM Chain is simpler; for variable-length documents, always use Summarization Chain.
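
The call count from gotcha 4 can be estimated with simple arithmetic. The sketch below assumes one map call per chunk plus a single reduce call, and the figure of roughly 600 tokens per PDF page is a hypothetical average picked to reproduce the "20+" number; real documents and multi-round reduces will differ.

```python
def estimate_map_reduce_calls(total_tokens: int, chunk_tokens: int) -> int:
    """Estimate API calls: one per chunk (map) plus one final call (reduce)."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    if total_tokens <= 0:
        return 0
    chunk_count = -(-total_tokens // chunk_tokens)  # ceiling division
    return chunk_count + 1


if __name__ == "__main__":
    pages = 50
    tokens_per_page = 600  # assumed average, not a measured value
    print(estimate_map_reduce_calls(pages * tokens_per_page, chunk_tokens=1500))
```

Multiplying the result by your model's typical latency gives a rough upper bound on run time if the calls execute one after another.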


3 Workflow Patterns

Pattern 1: PDF Inbox Summarizer → Slack

Trigger: Gmail Trigger (email with a PDF attachment received)
→ Download attachment (HTTP Request or Gmail node)
→ Default Data Loader (binary input, PDF)
→ Summarization Chain (Map Reduce, custom prompt: "Summarize this document in 5 bullet points, focusing on action items.")
→ Slack node (post to #documents channel with {{ $json.response }})

Use case: Legal, finance, and ops teams get instant Slack summaries of every inbound PDF without opening the file.

Pattern 2: Customer Support Ticket Summarizer

Trigger: Webhook (new ticket from Zendesk/Freshdesk)
→ Default Data Loader (ticket body + thread history as text)
→ Summarization Chain (Stuff — tickets are short; Map Reduce for long threads)
→ HTTP Request (PATCH ticket — write summary to custom field)

Use case: Agents see a 2-sentence summary at the top of every ticket before reading the thread. Cuts handle time.

Pattern 3: Nightly News Digest

Trigger: Schedule Trigger (daily at 6 AM)
→ RSS Feed Read node (10+ tech feeds)
→ Split In Batches (process each article URL)
→ HTTP Request (fetch article HTML)
→ HTML Extract (extract body text)
→ Default Data Loader
→ Summarization Chain (Map Reduce, prompt: "Summarize this article in 3 sentences. Include the main claim and any data points.")
→ Aggregate (collect all summaries)
→ Basic LLM Chain (combine into a single digest email)
→ Send Email (Gmail node)

Use case: A daily digest of 10 AI-summarized articles that takes well under 30 minutes to read.
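
The step between Aggregate and the final digest can be pictured as plain string assembly. This Python sketch assumes each collected item carries `title`, `summary`, and `link` fields; those field names are hypothetical and depend on how your upstream nodes label their output.

```python
def build_digest(items: list[dict[str, str]]) -> str:
    """Turn a list of per-article summaries into one numbered plain-text digest."""
    entries = []
    for number, item in enumerate(items, start=1):
        entries.append(
            f"{number}. {item['title']}\n"
            f"   {item['summary']}\n"
            f"   {item['link']}"
        )
    return "\n\n".join(entries)


if __name__ == "__main__":
    sample_items = [
        {"title": "First article", "summary": "Three-sentence summary.", "link": "https://example.com/1"},
        {"title": "Second article", "summary": "Another summary.", "link": "https://example.com/2"},
    ]
    print(build_digest(sample_items))
```

The resulting text can be passed to the Basic LLM Chain for a final polish or sent as the email body directly.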


Free Workflow JSON

Here is a minimal Summarization Chain workflow you can import directly into n8n:

{
  "name": "Summarization Chain — Starter",
  "nodes": [
    {
      "parameters": {},
      "name": "When clicking 'Test workflow'",
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "dataType": "defineBelow",
        "jsonData": "=Your long text goes here. Replace this with {{ $json.body }} or any text field from an upstream node."
      },
      "name": "Default Data Loader",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "typeVersion": 1,
      "position": [500, 420]
    },
    {
      "parameters": {
        "model": "gpt-4o-mini"
      },
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "typeVersion": 1,
      "position": [500, 180]
    },
    {
      "parameters": {
        "summarizationMethodAndPrompts": {
          "summarizationMethod": "mapReduce"
        }
      },
      "name": "Summarization Chain",
      "type": "@n8n/n8n-nodes-langchain.chainSummarization",
      "typeVersion": 2,
      "position": [750, 300]
    }
  ],
  "connections": {
    "When clicking 'Test workflow'": {
      "main": [[{ "node": "Summarization Chain", "type": "main", "index": 0 }]]
    },
    "Default Data Loader": {
      "ai_document": [[{ "node": "Summarization Chain", "type": "ai_document", "index": 0 }]]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [[{ "node": "Summarization Chain", "type": "ai_languageModel", "index": 0 }]]
    }
  }
}
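
Before importing a workflow like this, it can help to confirm that every connection points at a node that actually exists, since a renamed node silently breaks the wiring. The Python sketch below checks only the `nodes` and `connections` layout visible in the JSON above; it is not an official n8n validator, and `find_dangling_connections` is a hypothetical helper name.

```python
def find_dangling_connections(workflow: dict) -> list[str]:
    """Return a message for every connection whose source or target node is missing."""
    node_names = {node["name"] for node in workflow.get("nodes", [])}
    problems = []
    for source, outputs in workflow.get("connections", {}).items():
        if source not in node_names:
            problems.append(f"unknown source node: {source}")
        for output_groups in outputs.values():      # keyed by type, e.g. "main" or "ai_document"
            for group in output_groups:             # one list per output index
                for link in group or []:
                    if link["node"] not in node_names:
                        problems.append(f"{source} -> unknown target: {link['node']}")
    return problems


if __name__ == "__main__":
    sample = {
        "nodes": [{"name": "Trigger"}, {"name": "Summarization Chain"}],
        "connections": {
            "Trigger": {"main": [[{"node": "Summarization Chain", "type": "main", "index": 0}]]},
            "Default Data Loader": {
                "ai_document": [[{"node": "Summarization Chain", "type": "ai_document", "index": 0}]]
            },
        },
    }
    for problem in find_dangling_connections(sample):
        print(problem)
```

To check a saved export, load the file with `json.load` and pass the resulting dictionary to the function.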

Want a complete implementation of any of the three patterns above? The n8n Workflow Starter Pack includes production-ready versions of all three patterns with error handling, retry logic, and Slack/email delivery built in.


Summarization Chain vs Other n8n Nodes

| Node | Best For | Chunking | Output |
|---|---|---|---|
| Summarization Chain | Long documents of any length | Automatic | Summary string |
| Basic LLM Chain | Short text, custom prompts | Manual | Any LLM output |
| AI Agent node | Multi-step reasoning, tool calls | Manual | Agent response |
| Information Extractor | Structured data from text | None | JSON schema |

Key Takeaways

  • Use Map Reduce for anything over ~3,000 words
  • Use Refine for narrative content where sequence matters
  • Use Stuff only for short, known-length documents
  • Always customize the prompt — the default is generic
  • Output is always {{ $json.response }}
  • Pair with Document Loaders, not raw text nodes

The Summarization Chain is the cleanest way to add document intelligence to any n8n workflow without managing chunking, token limits, or multi-step prompts yourself.


Building document processing pipelines with n8n? The n8n Workflow Starter Pack includes production-ready summarization workflows for PDF inboxes, support ticket summarizers, and nightly news digests — $29, instant download.