Pirate Prentice

n8n Extract From File Node: Parse CSV, JSON, HTML, XML, and Text Files in Your Workflows [Free Workflow JSON]

If you receive files as binary data — CSV exports, JSON payloads, HTML pages, XML feeds — the Extract From File node is how you turn them into structured items n8n can route, filter, and transform. This guide covers every supported format, the gotchas that trip people up, and three ready-to-use workflow patterns with free JSON.


What the Extract From File Node Does

The Extract From File node reads a binary file already present in your workflow (from an HTTP Request, email attachment, S3 download, FTP pull, etc.) and outputs its contents as structured n8n items.

It replaces the old Move Binary Data node's JSON-extraction path and the deprecated Spreadsheet File read path for non-Excel formats.


Supported Formats

| Format | Notes |
| --- | --- |
| CSV | Delimiter auto-detected or set manually; first row = headers by default |
| JSON | Root must be an array or object; nested objects become sub-keys |
| HTML | Extracts text content; combine with HTML Extract node for CSS selector targeting |
| XML | Converted to JSON-like structure; attribute prefix configurable |
| Text | Splits on newline by default; returns one item per line |
| iCal | Parses .ics calendar files into event objects |
| ODS | OpenDocument Spreadsheet (LibreOffice) |

Node Configuration

Input Binary Field — the property name on the incoming item that holds the binary file (default: data). If your HTTP Request node stores the file in attachment or file, change this to match.

File Type — select the format. n8n does not auto-detect; wrong type = empty output or error.
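Because the type has to be chosen explicitly, it can help to decide it from the file name before the file reaches the node, for example to drive a Switch node with one Extract From File branch per format. The sketch below is a hypothetical helper, not part of n8n: the function name and the extension table are assumptions based on the formats listed in this article.

```javascript
// Hypothetical helper: maps a file name to the File Type you would pick in the node.
// The extension table is an assumption drawn from the formats listed in this article.
const FILE_TYPES = new Map([
  ['csv', 'CSV'],
  ['tsv', 'CSV'], // TSV is CSV with a tab delimiter
  ['json', 'JSON'],
  ['html', 'HTML'],
  ['htm', 'HTML'],
  ['xml', 'XML'],
  ['txt', 'Text'],
  ['ics', 'iCal'],
  ['ods', 'ODS'],
]);

function guessFileType(fileName) {
  // Take the text after the last dot, ignoring dots that belong to a directory name.
  const match = /\.([^.\/\\]+)$/.exec(String(fileName ?? ''));
  const extension = match ? match[1].toLowerCase() : '';
  return FILE_TYPES.get(extension) ?? null; // null = unknown, route to an error branch
}

console.log(guessFileType('report.CSV')); // CSV
console.log(guessFileType('feed.xml'));   // XML
console.log(guessFileType('archive'));    // null
```

Returning null for unknown extensions keeps the failure visible instead of silently sending the file down the wrong branch.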

CSV options:

  • Delimiter — default comma; set to \t for TSV
  • Header Row — toggle off if your CSV has no headers (columns become column0, column1, …)
  • Skip Empty Lines — recommended on
  • Include Empty Cells — toggle on to preserve blank cells as empty strings instead of omitting them
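
To make the effect of these toggles concrete, here is a deliberately simplified parser that mimics the behaviour described above. It is an illustration only, not n8n's implementation: it does not handle quoted fields, and the option names are hypothetical stand-ins for the UI settings.

```javascript
// Simplified illustration of the CSV options; no quoted-field support, not n8n's parser.
function parseCsv(text, options = {}) {
  const {
    delimiter = ',',
    headerRow = true,
    skipEmptyLines = true,
    includeEmptyCells = false,
  } = options;

  let lines = text.split(/\r?\n/);
  if (skipEmptyLines) lines = lines.filter((line) => line.trim() !== '');

  const rows = lines.map((line) => line.split(delimiter));
  if (rows.length === 0) return [];

  // Without a header row, name the columns column0, column1, ...
  const width = Math.max(...rows.map((cells) => cells.length));
  const headers = headerRow
    ? rows.shift()
    : Array.from({ length: width }, (_, i) => `column${i}`);

  return rows.map((cells) => {
    const item = {};
    headers.forEach((key, i) => {
      const value = cells[i] ?? '';
      // Blank cells are dropped unless includeEmptyCells is on.
      if (value !== '' || includeEmptyCells) item[key] = value;
    });
    return item;
  });
}

const csv = 'id,status\n1,open\n2,\n';
console.log(parseCsv(csv));
// [ { id: '1', status: 'open' }, { id: '2' } ]
console.log(parseCsv(csv, { includeEmptyCells: true }));
// [ { id: '1', status: 'open' }, { id: '2', status: '' } ]
console.log(parseCsv('a\tb\n', { delimiter: '\t', headerRow: false }));
// [ { column0: 'a', column1: 'b' } ]
```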

JSON options:

  • Root Property — if the JSON is {"records": [...]}, set this to records to unwrap the array

XML options:

  • Attribute Prefix — default @; XML attributes appear as @id, @class, etc.

Common Gotchas

1. Binary field name mismatch
The most frequent error: Error: No binary data found for property "data". Open the previous node's output, check the Binary tab for the actual binary property name, and set Input Binary Field to match.
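
If you would rather check the names programmatically, a small function can list the binary properties on each item. This is a sketch that assumes items have n8n's usual { json, binary } shape; the function name and the sample data are made up for illustration.

```javascript
// Lists the binary property names present on each item.
// Assumes items follow n8n's { json, binary } shape; sample data is invented.
function listBinaryFields(items) {
  return items.map((item, index) => ({
    index,
    binaryFields: Object.keys(item.binary ?? {}),
  }));
}

const sampleItems = [
  {
    json: { subject: 'Weekly report' },
    binary: { attachment: { fileName: 'report.csv', mimeType: 'text/csv', data: '' } },
  },
  { json: { subject: 'No attachment' } },
];

console.log(JSON.stringify(listBinaryFields(sampleItems)));
// [{"index":0,"binaryFields":["attachment"]},{"index":1,"binaryFields":[]}]
```

In a Code node set to run once for all items, the same function could be fed with $input.all(), wrapping each result as { json: result } before returning it.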

2. CSV with BOM
Files exported from Excel often start with a UTF-8 BOM (\uFEFF). This corrupts the first header name. Strip it with a Code node upstream:

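// Code node, mode "Run Once for Each Item"; assumes the binary property is named "data"
const raw = $input.item.binary.data.data; // base64
const decoded = Buffer.from(raw, 'base64').toString('utf-8').replace(/^\uFEFF/, '');
// re-encode and pass forward
$input.item.binary.data.data = Buffer.from(decoded, 'utf-8').toString('base64');
return $input.item;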

3. JSON root is an object, not an array
If your JSON is {"user": {...}} rather than [{...}], set Root Property to user — otherwise you get one item with the whole object as a nested value.
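
The difference is easier to see side by side. The function below is a hypothetical sketch of the unwrapping described above, written as plain JavaScript rather than taken from n8n's source; the name toItems and the error message are assumptions.

```javascript
// Hypothetical sketch of the unwrapping behaviour described above, not n8n's source code.
function toItems(parsed, rootProperty) {
  const root = rootProperty ? parsed?.[rootProperty] : parsed;
  if (root === undefined) {
    throw new Error(`Property "${rootProperty}" not found in JSON`);
  }
  // An array becomes one item per element; anything else becomes a single item.
  const list = Array.isArray(root) ? root : [root];
  return list.map((entry) => ({ json: entry }));
}

const payload = { user: { name: 'Ada', role: 'admin' } };

console.log(JSON.stringify(toItems(payload)));
// [{"json":{"user":{"name":"Ada","role":"admin"}}}]
console.log(JSON.stringify(toItems(payload, 'user')));
// [{"json":{"name":"Ada","role":"admin"}}]
console.log(toItems({ records: [{ id: 1 }, { id: 2 }] }, 'records').length);
// 2
```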

4. Large files and memory
Extract From File loads the entire file into memory. For files > 50 MB on constrained instances, use Split in Batches + streaming patterns or pre-process server-side.
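
One way to act on this is to measure the file before parsing and route oversized files elsewhere. The sketch below estimates the decoded size from the base64 string. It assumes the binary data is held inline as base64 (as in the BOM snippet above); if your instance stores binary data outside the item, the data field will not contain the file contents and this check does not apply. The function names are hypothetical and the threshold simply reuses the 50 MB figure above.

```javascript
// Threshold taken from the 50 MB figure mentioned above; tune it for your instance.
const MAX_BYTES = 50 * 1024 * 1024;

// Decoded size of a base64 string: 3 bytes per 4 characters, minus padding.
function base64ByteLength(base64) {
  const clean = base64.replace(/\s/g, '');
  const padding = clean.endsWith('==') ? 2 : clean.endsWith('=') ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// binaryField is assumed to look like { data: '<base64>', fileName, mimeType }.
function isTooLarge(binaryField, maxBytes = MAX_BYTES) {
  return base64ByteLength(binaryField.data) > maxBytes;
}

const sample = { fileName: 'hello.txt', data: Buffer.from('hello').toString('base64') };
console.log(base64ByteLength(sample.data)); // 5
console.log(isTooLarge(sample));            // false
console.log(isTooLarge(sample, 4));         // true
```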

5. HTML extraction is text-only
The HTML format strips tags and returns raw text. If you need to target specific elements, feed the output through the HTML Extract node with CSS selectors.


Three Workflow Patterns

Pattern 1: CSV Email Attachment Parser

Trigger: Email → Gmail / IMAP node receives email with CSV attachment

Extract From File: Input Binary Field = attachment, File Type = CSV

Filter: Remove rows where status = "cancelled"

Google Sheets: Append remaining rows to a tracking sheet

This replaces a manual download-open-copy-paste cycle. One workflow processes every inbound CSV report automatically.

{
  "name": "CSV Email Parser",
  "nodes": [
    {"type": "n8n-nodes-base.gmail", "name": "Watch Inbox", "parameters": {"operation": "getAll", "filters": {"q": "has:attachment filename:csv"}}},
    {"type": "n8n-nodes-base.extractFromFile", "name": "Parse CSV", "parameters": {"operation": "csv", "binaryPropertyName": "attachment"}},
    {"type": "n8n-nodes-base.filter", "name": "Active Only", "parameters": {"conditions": {"string": [{"value1": "={{$json.status}}", "operation": "notEqual", "value2": "cancelled"}]}}},
    {"type": "n8n-nodes-base.googleSheets", "name": "Append Rows", "parameters": {"operation": "append", "sheetId": "YOUR_SHEET_ID"}}
  ]
}

Pattern 2: JSON API Export Processor

Trigger: Schedule (daily)

HTTP Request: Download JSON export from internal API (returns {"records": [...]})

Extract From File: File Type = JSON, Root Property = records

Set: Map fields to your schema

HTTP Request: POST each record to downstream service

Useful when a vendor only offers file exports instead of a live API.
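
The mapping step can also live in a Code node when the Set node gets unwieldy. A minimal sketch, assuming invented field names on both sides (id, email_address, and created in the export; externalId, email, and createdAt in the target schema):

```javascript
// Field names on both sides are hypothetical; adjust them to your export and target schema.
function mapRecord(record) {
  const created = new Date(record.created);
  return {
    externalId: String(record.id),
    email: (record.email_address ?? '').trim().toLowerCase(),
    // Invalid or missing dates become null instead of throwing.
    createdAt: Number.isNaN(created.getTime()) ? null : created.toISOString(),
  };
}

const exported = { id: 7, email_address: ' Ada@Example.com ', created: '2024-01-02T03:04:05Z' };
console.log(JSON.stringify(mapRecord(exported)));
// {"externalId":"7","email":"ada@example.com","createdAt":"2024-01-02T03:04:05.000Z"}
```

Normalizing in one place keeps the downstream POST step free of per-field cleanup.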


Pattern 3: XML Feed Normalizer

Trigger: Schedule (hourly)

HTTP Request: Fetch RSS/Atom or vendor XML feed

Extract From File: File Type = XML

Set: Flatten @-prefixed attributes, e.g. map ={{$json['@id']}} to id

Filter: Only items published in the last hour

Slack / Email: Alert on new entries

This is a lightweight alternative to the RSS Feed Read node when you need full XML access (namespaces, attributes, custom tags).
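
The flatten and filter steps can be sketched in plain JavaScript as follows. Both helpers are hypothetical, and the published field name is an assumption (Atom feeds use it; your feed may use a different date field).

```javascript
// Renames keys that start with the attribute prefix, e.g. "@id" -> "id".
// Hypothetical helper; on a name clash the existing element key wins.
function flattenAttributes(entry, prefix = '@') {
  const out = {};
  for (const [key, value] of Object.entries(entry)) {
    if (!key.startsWith(prefix)) out[key] = value;
  }
  for (const [key, value] of Object.entries(entry)) {
    if (!key.startsWith(prefix)) continue;
    const plain = key.slice(prefix.length);
    if (!Object.prototype.hasOwnProperty.call(out, plain)) out[plain] = value;
  }
  return out;
}

// True when entry.published (assumed field name) falls inside the last windowMs milliseconds.
function publishedWithin(entry, windowMs, now = Date.now()) {
  const published = Date.parse(entry.published);
  return !Number.isNaN(published) && published <= now && now - published <= windowMs;
}

const entry = { '@id': '42', title: 'Release notes', published: '2024-01-01T11:30:00Z' };
const now = Date.parse('2024-01-01T12:00:00Z');

console.log(JSON.stringify(flattenAttributes(entry)));
// {"title":"Release notes","published":"2024-01-01T11:30:00Z","id":"42"}
console.log(publishedWithin(entry, 60 * 60 * 1000, now)); // true
```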


Free Workflow JSON

Download a ready-to-import workflow covering all three patterns above:

👉 n8n Workflow Starter Pack — pirateprentice.gumroad.com/l/sxcoe

Import via n8n → Settings → Import Workflow. Credentials not included — wire your own after import.


Have a file format you're struggling to parse in n8n? Drop a comment below.
