Pirate Prentice

Posted on Jul 3 • Edited on Jul 21

n8n Extract From File Node: Parse CSV, JSON, HTML, XML, and Text Files in Your Workflows [Free Workflow JSON]

#n8n #automation #webdev #javascript

If you receive files as binary data — CSV exports, JSON payloads, HTML pages, XML feeds — the Extract From File node is how you turn them into structured items n8n can route, filter, and transform. This guide covers every supported format, the gotchas that trip people up, and three ready-to-use workflow patterns with free JSON.

What the Extract From File Node Does

The Extract From File node reads a binary file already present in your workflow (from an HTTP Request, email attachment, S3 download, FTP pull, etc.) and outputs its contents as structured n8n items.

It replaces the old Move Binary Data node's JSON-extraction path and the deprecated Spreadsheet File read path for non-Excel formats.

Supported Formats

Format	Notes
CSV	Delimiter auto-detected or set manually; first row = headers by default
JSON	Root must be an array or object; nested objects become sub-keys
HTML	Extracts text content; combine with HTML Extract node for CSS selector targeting
XML	Converted to JSON-like structure; attribute prefix configurable
Text	Splits on newline by default; returns one item per line
iCal	Parses `.ics` calendar files into event objects
ODS	OpenDocument Spreadsheet (LibreOffice)

Node Configuration

Input Binary Field — the property name on the incoming item that holds the binary file (default: data). If your HTTP Request node stores the file in attachment or file, change this to match.

File Type: select the format explicitly. n8n does not auto-detect it, and choosing the wrong type produces empty output or an error.

CSV options:

Delimiter — default comma; set to \t for TSV
Header Row — toggle off if your CSV has no headers (columns become column0, column1, …)
Skip Empty Lines — recommended on
Include Empty Cells — toggle on to preserve blank cells as empty strings instead of omitting them
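To make the Delimiter, Header Row, and Skip Empty Lines options concrete, here is a minimal sketch of the behavior they describe. It is not n8n's implementation: `parseDelimited` is a hypothetical helper, and the naive split does not handle quoted fields that contain the delimiter.

```javascript
// Illustration of the CSV options above; NOT n8n's actual parser.
// Naive splitting: quoted fields containing the delimiter are not supported.
function parseDelimited(text, { delimiter = ',', headerRow = true, skipEmptyLines = true } = {}) {
  let lines = text.split(/\r?\n/);
  if (skipEmptyLines) {
    lines = lines.filter((line) => line.trim() !== '');
  }
  const rows = lines.map((line) => line.split(delimiter));
  if (rows.length === 0) return [];

  // With Header Row off, columns are named column0, column1, ...
  const headers = headerRow ? rows[0] : rows[0].map((_, i) => `column${i}`);
  const dataRows = headerRow ? rows.slice(1) : rows;

  // Missing cells are kept as empty strings.
  return dataRows.map((cells) =>
    Object.fromEntries(headers.map((header, i) => [header, cells[i] ?? '']))
  );
}

// TSV input with a blank line in the middle.
const tsv = 'id\tstatus\n1\tactive\n\n2\tcancelled\n';
const withHeaders = parseDelimited(tsv, { delimiter: '\t' });

// Headerless CSV input.
const withoutHeaders = parseDelimited('1,active\n2,cancelled', { headerRow: false });

console.log(withHeaders);
console.log(withoutHeaders);
```

The second call shows why the Header Row toggle matters: without it, downstream nodes have to reference column0 and column1 instead of meaningful names.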

JSON options:

Root Property — if the JSON is {"records": [...]}, set this to records to unwrap the array
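Conceptually, Root Property picks a key out of the parsed JSON before items are created. The sketch below shows that idea in plain JavaScript; `toItems` is a hypothetical helper written for illustration, not n8n's source code.

```javascript
// Sketch of what unwrapping a root property means; not n8n's implementation.
function toItems(parsed, rootProperty) {
  const root = rootProperty ? parsed[rootProperty] : parsed;
  if (root === undefined) {
    throw new Error(`Root property "${rootProperty}" not found`);
  }
  // An array yields one item per element; a single object yields one item.
  const list = Array.isArray(root) ? root : [root];
  return list.map((entry) => ({ json: entry }));
}

const payload = JSON.parse('{"records": [{"id": 1}, {"id": 2}]}');

const unwrapped = toItems(payload, 'records'); // two items, one per record
const wrapped = toItems(payload);              // one item holding the whole object

console.log(unwrapped.length, wrapped.length);
```

Without the root property, the whole payload lands in a single item and later nodes have to dig into the nested records array themselves.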

XML options:

Attribute Prefix — default @; XML attributes appear as @id, @class, etc.

Common Gotchas

1. Binary field name mismatch
The most frequent error: Error: No binary data found for property "data". Open the output of the previous node, find the actual binary property name, and update Input Binary Field to match.
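If the property name is not obvious from the node output, a small helper can list the binary keys on each item. This is a sketch that assumes items follow n8n's { json, binary } shape; `listBinaryKeys` and the sample data are hypothetical.

```javascript
// Lists the binary property names on each item so you can see what to
// enter in Input Binary Field. Items without binary data report [].
function listBinaryKeys(items) {
  return items.map((item, index) => ({
    index,
    binaryKeys: Object.keys(item.binary ?? {}),
  }));
}

// Hypothetical sample input mimicking two incoming items.
const sampleItems = [
  {
    json: { subject: 'Daily report' },
    binary: { attachment_0: { fileName: 'report.csv', mimeType: 'text/csv', data: '' } },
  },
  { json: { subject: 'No file here' } },
];

const report = listBinaryKeys(sampleItems);
console.log(report);
```

Inside a Code node, the same function could be applied to `$input.all()` to inspect the real items. Here the first item exposes attachment_0, so that is the value Input Binary Field would need.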

2. CSV with BOM
Files exported from Excel often start with a UTF-8 BOM (\uFEFF). This corrupts the first header name. Strip it with a Code node upstream:

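const raw = $binary.data.data; // base64-encoded file contents
// Decode to text and drop a leading UTF-8 BOM if present
const decoded = Buffer.from(raw, 'base64').toString('utf-8').replace(/^\uFEFF/, '');
// Re-encode to base64 before passing the binary data forward
const cleaned = Buffer.from(decoded, 'utf-8').toString('base64');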
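As a self-contained version of the same idea, the sketch below wraps the decode, strip, and re-encode steps in one function that can be tested outside n8n. It assumes the binary payload is available as a base64 string, as in the snippet above; `stripBomFromBase64` is a hypothetical helper name.

```javascript
// Removes a leading UTF-8 BOM from base64-encoded text and returns base64.
// Buffer#toString('utf-8') keeps the BOM as U+FEFF, so it must be stripped explicitly.
function stripBomFromBase64(base64Data) {
  const text = Buffer.from(base64Data, 'base64').toString('utf-8');
  const cleaned = text.replace(/^\uFEFF/, '');
  return Buffer.from(cleaned, 'utf-8').toString('base64');
}

// Simulate an Excel-style export that starts with a BOM.
const withBom = Buffer.from('\uFEFFid,status\n1,active\n', 'utf-8').toString('base64');

const fixed = stripBomFromBase64(withBom);
const fixedText = Buffer.from(fixed, 'base64').toString('utf-8');

console.log(JSON.stringify(fixedText));
```

Files without a BOM pass through unchanged, so the step is safe to leave in place for mixed sources.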

3. JSON root is an object, not an array
If your JSON is {"user": {...}} rather than [{...}], set Root Property to user — otherwise you get one item with the whole object as a nested value.

4. Large files and memory
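Extract From File loads the entire file into memory. For files larger than about 50 MB on memory-constrained instances, pre-process or split the file server-side before it reaches n8n, and use Split in Batches downstream to limit how many extracted items later nodes handle at once.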
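One way to act on this is to check the file size before extraction and route oversized files elsewhere. The sketch below estimates the decoded size from a base64 string without decoding it. The 50 MB threshold comes from the guidance above (taken here as 50 × 1024 × 1024 bytes, which is an assumption), and both helper names are hypothetical.

```javascript
// Assumed threshold: 50 MB expressed in bytes.
const MAX_BYTES = 50 * 1024 * 1024;

// Computes the decoded byte length of a padded base64 string without decoding it.
function base64ByteLength(base64Data) {
  const padding = base64Data.endsWith('==') ? 2 : base64Data.endsWith('=') ? 1 : 0;
  return (base64Data.length * 3) / 4 - padding;
}

// True when the decoded payload would exceed the limit.
function isTooLarge(base64Data, maxBytes = MAX_BYTES) {
  return base64ByteLength(base64Data) > maxBytes;
}

const small = Buffer.from('hello', 'utf-8').toString('base64'); // 'aGVsbG8='
console.log(base64ByteLength(small), isTooLarge(small));
```

A check like this could feed an IF node so that large files take a separate branch instead of exhausting memory mid-run.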

5. HTML extraction is text-only
The HTML format strips tags and returns raw text. If you need to target specific elements, run the HTML through the HTML Extract node with CSS selectors rather than relying on the stripped text.

Three Workflow Patterns

Pattern 1: CSV Email Attachment Parser

Trigger: Email → Gmail / IMAP node receives email with CSV attachment

Extract From File: Input Binary Field = attachment, File Type = CSV

Filter: Remove rows where status = "cancelled"

Google Sheets: Append remaining rows to a tracking sheet

This replaces a manual download-open-copy-paste cycle. One workflow processes every inbound CSV report automatically.

{
  "name": "CSV Email Parser",
  "nodes": [
    {"type": "n8n-nodes-base.gmail", "name": "Watch Inbox", "parameters": {"operation": "getAll", "filters": {"q": "has:attachment filename:csv"}}},
    {"type": "n8n-nodes-base.extractFromFile", "name": "Parse CSV", "parameters": {"operation": "csv", "binaryPropertyName": "attachment"}},
    {"type": "n8n-nodes-base.filter", "name": "Active Only", "parameters": {"conditions": {"string": [{"value1": "={{$json.status}}", "operation": "notEqual", "value2": "cancelled"}]}}},
    {"type": "n8n-nodes-base.googleSheets", "name": "Append Rows", "parameters": {"operation": "append", "sheetId": "YOUR_SHEET_ID"}}
  ]
}
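The Filter step in this pattern amounts to a simple predicate over the parsed rows. The sketch below expresses it in plain JavaScript, for example as a Code node alternative. It is slightly more tolerant than a strict notEqual comparison because it trims whitespace and ignores case, which is an assumption about how messy the CSV may be; `keepActive` and the sample rows are hypothetical.

```javascript
// Keeps rows whose status is anything other than "cancelled".
// Trims and lowercases first so " Cancelled " is also removed.
function keepActive(rows) {
  return rows.filter(
    (row) => String(row.status ?? '').trim().toLowerCase() !== 'cancelled'
  );
}

// Hypothetical rows as they might look after CSV extraction.
const parsedRows = [
  { order: 'A-1', status: 'active' },
  { order: 'A-2', status: 'cancelled' },
  { order: 'A-3', status: ' Cancelled ' },
  { order: 'A-4' },
];

const activeRows = keepActive(parsedRows);
console.log(activeRows.map((row) => row.order));
```

Rows with no status column are kept, which matches what a notEqual comparison against "cancelled" would do.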

Pattern 2: JSON API Export Processor

Trigger: Schedule (daily)

HTTP Request: Download JSON export from internal API (returns {"records": [...]})

Extract From File: File Type = JSON, Root Property = records

Set: Map fields to your schema

HTTP Request: POST each record to downstream service

Useful when a vendor only offers file exports instead of a live API.
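The Set step here is a field rename from the vendor's names to your own. A minimal sketch of that mapping follows; every field name in `FIELD_MAP` is hypothetical, since the real names depend on the export you receive.

```javascript
// Hypothetical mapping: vendor field name -> field name in your schema.
const FIELD_MAP = { cust_id: 'customerId', amt: 'amount', ts: 'createdAt' };

// Copies only the mapped fields; anything not listed is dropped.
function mapRecord(record, fieldMap = FIELD_MAP) {
  const mapped = {};
  for (const [from, to] of Object.entries(fieldMap)) {
    if (from in record) {
      mapped[to] = record[from];
    }
  }
  return mapped;
}

const mappedRecord = mapRecord({
  cust_id: 'C-17',
  amt: 42.5,
  ts: '2024-01-01T00:00:00Z',
  extra: true,
});

console.log(mappedRecord);
```

Keeping the mapping in one object makes it easy to adjust when the vendor renames a column, without touching the rest of the workflow.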

Pattern 3: XML Feed Normalizer

Trigger: Schedule (hourly)

HTTP Request: Fetch RSS/Atom or vendor XML feed

Extract From File: File Type = XML

Set: Flatten @ prefixed attributes — ={{$json['@id']}} → id

Filter: Only items published in the last hour

Slack / Email: Alert on new entries

This is a lightweight alternative to the RSS Feed Read node when you need full XML access (namespaces, attributes, custom tags).
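The flatten and filter steps of this pattern can be sketched in plain JavaScript as follows. The @ prefix matches the default Attribute Prefix described earlier. The `pubDate` field name, the one-hour window constant, and both helper names are assumptions for illustration, so adjust them to the feed you actually consume.

```javascript
// Strips the attribute prefix from keys, e.g. '@id' -> 'id'.
// If both '@id' and 'id' exist, the key that appears later wins.
function flattenAttributes(entry, prefix = '@') {
  const flat = {};
  for (const [key, value] of Object.entries(entry)) {
    const name = key.startsWith(prefix) ? key.slice(prefix.length) : key;
    flat[name] = value;
  }
  return flat;
}

// True when entry.pubDate parses and falls within the window ending at `now`.
function publishedWithin(entry, windowMs, now = Date.now()) {
  const published = Date.parse(entry.pubDate);
  return !Number.isNaN(published) && published <= now && now - published <= windowMs;
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const fixedNow = Date.parse('2024-01-01T12:00:00Z'); // fixed clock for the example

// Hypothetical entries as they might look after XML extraction.
const entries = [
  { '@id': 'a1', title: 'Fresh', pubDate: '2024-01-01T11:30:00Z' },
  { '@id': 'a2', title: 'Stale', pubDate: '2024-01-01T10:00:00Z' },
  { '@id': 'a3', title: 'No date' },
];

const recent = entries
  .map((entry) => flattenAttributes(entry))
  .filter((entry) => publishedWithin(entry, ONE_HOUR_MS, fixedNow));

console.log(recent);
```

Passing the clock in as a parameter keeps the filter testable; in a live workflow the default of the current time applies.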

Free Workflow JSON

Download a ready-to-import workflow covering all three patterns above:

👉 n8n Workflow Starter Pack — pirateprentice.gumroad.com/l/sxcoe

Import via n8n → Settings → Import Workflow. Credentials not included — wire your own after import.

n8n Spreadsheet File Node — Excel/CSV read & write
n8n Read/Write Files Node — local disk access (self-hosted)
n8n Move Binary Data Node — rename/reorganize binary properties
n8n HTTP Request Node — fetching files from APIs
n8n Item Lists Node — manipulating the items after extraction

Have a file format you're struggling to parse in n8n? Drop a comment below.

Free: n8n Integration Checklist
25 production checks before you ship any n8n workflow: credentials, error handling, dedup, and integration-specific gotchas.
→ Get the free checklist

