1. Introduction: The Engine Room of Automation
n8n is a robust, open-source workflow automation tool and low-code platform. While standard nodes (Set, If, Merge, Switch) handle linear task execution, the Python Code Node represents the "engine room" of sophisticated automation. It allows users to implement custom logic, complex data manipulations, and performance optimizations that standard nodes cannot achieve.
1.1 The Role of Python in n8n
The Code Node and Inline Expressions empower users to perform operations—specifically in data science, string manipulation, and mathematical calculation—that are often more verbose or difficult in JavaScript. A single Python node can effectively replace chains of 10 to 15 standard nodes, resulting in cleaner, more maintainable, and significantly faster workflows.
Why Use Python in n8n?
- Logic Complexity: Supports nested loops, complex conditionals, regex, and advanced control flow where standard nodes would require multiple "If" nodes.
- Data Processing: High-performance, in-memory processing of 1,000+ items, whereas standard node chains carry high overhead from per-step node instantiation.
- Libraries: Access to `pandas`, `numpy`, `math`, `re`, `scikit-learn`, `requests`, and more (if configured).
- Data Shaping: Advanced list comprehensions, flattening, pivoting, and restructuring of complex JSON/Arrays.
- Batch Processing: Efficiently process multiple items together.
- Cross-item Logic: Handle items that need to reference each other (e.g., deduplication based on global context).
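As a quick illustration of cross-item logic, the sketch below deduplicates a batch by normalized email — something that would otherwise take a chain of If/Merge nodes. The sample `items` list stands in for `_input.all()` inside a real Code Node, and the field names are illustrative.

```python
# "items" stands in for _input.all() inside an n8n Python Code Node
items = [
    {'json': {'email': 'a@example.com', 'name': 'Alice'}},
    {'json': {'email': 'A@Example.com', 'name': 'Alice (duplicate)'}},
    {'json': {'email': 'b@example.com', 'name': 'Bob'}},
]

seen = set()
deduped = []
for item in items:
    key = item['json']['email'].lower()  # normalize so case differences don't slip through
    if key not in seen:
        seen.add(key)
        deduped.append(item)

# In n8n you would finish with: return deduped
```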
1.2 Performance Benefits and Benchmarks
The performance benefits of consolidating logic into a Code node are substantial. When processing a dataset of 1,000 items:
- Code Node (Python): Efficient execution via Pyodide or Local Python process.
- Standard Set/If Node Chains: Significantly higher overhead due to node instantiation and context switching between nodes.
Verdict: For high-volume data, the Code Node is drastically faster due to reduced execution overhead. This makes it essential for enterprise-grade automations.
2. Architecture & Execution Environments
Before writing code, it is critical to understand where and how your Python code runs. You must select an environment via the Language dropdown in the Code Node.
2.1 Option A: Python (Standard) - The Default
- Engine: Pyodide (WebAssembly) running inside the main n8n process.
- Best For: String manipulation, date formatting, regex, math, and small loops (<500 items).
- Startup: Instant (no network overhead).
- Limitations:
- No access to the host filesystem.
- Cannot use C-extension libraries like `pandas` or `numpy`.
- Slower for heavy iteration compared to native Python.
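A minimal sketch of the kind of work the Standard (Pyodide) engine handles well — stdlib-only regex and math, with no C-extension imports. The sample `items` list stands in for `_input.all()`, and the SKU/price fields are hypothetical.

```python
import math
import re

# "items" stands in for _input.all(); the fields are illustrative
items = [{'json': {'sku': 'AB-1234', 'price': 19.999}}]

out = []
for item in items:
    match = re.search(r'\d+', item['json']['sku'])  # stdlib re works fine in Pyodide
    out.append({'json': {
        'sku_number': match.group() if match else None,
        'price': math.floor(item['json']['price'] * 100) / 100,  # truncate to cents
    }})
# In n8n: return out
```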
2.2 Option B: Python (Native) - The Heavy Lifter
- Engine: A full Python process running on an isolated Task Runner (or sidecar).
- Best For: Heavy Data Processing, ETL, Machine Learning, and large datasets (10k+ items).
- Capabilities: Full access to external libraries (`pandas`, `numpy`, `scikit-learn`, `requests`) if configured.
- Startup: Slight latency (~10-50 ms) for serialization/network, but far faster execution for heavy data work.
3. Configuration & Infrastructure: The Docker Reality
Many users fail here. Simply adding pandas to the "Allow List" is not enough; the library must physically exist in the container. The default n8n image is lightweight and does not contain data science libraries (though some versions may pre-install pandas/numpy, relying on this is risky without verification).
3.1 Step 1: Build a Custom Docker Image
To use libraries like pandas, numpy, or requests in the Native environment, you must extend the base n8n image.
Dockerfile:
# Use the official n8n image as base
FROM n8nio/n8n:latest
USER root
# Install Python3 and PIP (Alpine Linux syntax)
RUN apk add --update --no-cache python3 py3-pip
# Install the heavy libraries
RUN pip3 install pandas numpy requests beautifulsoup4
# Switch back to the node user for security
USER node
3.2 Step 2: Configure the Allow List
Once your custom image is running, you must tell n8n it is safe to import these modules via Environment Variables on the n8n instance (or Task Runners).
Docker Compose (docker-compose.yml):
services:
  n8n:
    build: . # Point to your custom Dockerfile
    # or image: n8nio/n8n if using the default
    environment:
      # Allow specific external Python modules (comma-separated)
      - N8N_PYTHON_MODULE_ALLOW_LIST=pandas,numpy,requests,beautifulsoup4
    ports:
      - "5678:5678"
4. Modules and Libraries
4.1 Standard Library (Built-in)
These are available immediately in both Standard (Pyodide) and Native environments without configuration.
| Module | Purpose | Example Code |
|---|---|---|
| `json` | JSON parsing/dumping | `json.loads('{"a":1}')` |
| `re` | Regular expressions | `re.search(r'pattern', text)` |
| `math` | Mathematical functions | `math.ceil(4.2)` |
| `datetime` | Date and time handling | `datetime.now()` |
| `hashlib` | Hashing (SHA-256, MD5) | `hashlib.sha256(b'data')` |
| `random` | Random number generation | `random.choice(my_list)` |
| `itertools` | Efficient looping | `itertools.chain(...)` |
| `collections` | Specialized containers | `collections.Counter(data)` |
| `urllib` | URL handling | `urllib.parse.urlparse(url)` |
4.2 Common External Packages (Requires Configuration)
These must be installed in the environment and allowed via N8N_PYTHON_MODULE_ALLOW_LIST.
| Package | Use Case | Quick Example |
|---|---|---|
| `pandas` | Heavy data analysis | `df = pd.DataFrame(data)` |
| `numpy` | Scientific computing | `np.array([1, 2, 3])` |
| `requests` | HTTP requests (synchronous) | `requests.get('https://...')` |
| `beautifulsoup4` | HTML parsing/scraping | `BeautifulSoup(html, 'html.parser')` |
| `faker` | Generating fake data | `faker.Faker().name()` |
4.3 Restrictions
- System Access: Direct file system access (`open()`, `os.system`) is often restricted or ephemeral depending on the hosting environment.
- Async: While Python supports `asyncio`, n8n Python node logic is usually written synchronously for data transformation, though `await` is required for n8n-specific helpers (like `_helpers.httpRequest`).
5. Syntax & Core Concepts: The "Strict Dictionary" Rule
The most common mistake developers make is applying JavaScript dot-notation to Python. n8n Python nodes require Strict Dictionary Syntax.
5.1 Syntax Cheatsheet
| Feature | ❌ JS / Legacy Style (Avoid) | ✅ Python Standard (Use This) |
|---|---|---|
| JSON Access | `item.json.myField` | `item['json']['myField']` |
| Nested Data | `item.json.user.id` | `item['json']['user']['id']` |
| Safe Access | `item.json.optional?` | `item['json'].get('optional', default_val)` |
| Params | `_input.params.myVal` | `_input.params['myVal']` |
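A small sketch of the right-hand column in practice; the `item` dict simply mimics the shape of one n8n item.

```python
# A sample dict shaped like a single n8n item
item = {'json': {'user': {'id': 42}}}

# Strict dictionary syntax for nested access
user_id = item['json']['user']['id']

# .get() returns a default instead of raising KeyError (like JS optional chaining)
nickname = item['json'].get('nickname', 'n/a')
```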
5.2 Inline Expressions vs. Code Node
It is vital to distinguish between the two systems:
- Inline Expressions `{{ $json.id }}`: Always JavaScript. Act as a templating engine used inside node parameters (toggle "Fixed / Expression").
- Code Node: Python. Used for logic scripts.
5.3 The Bridge: Using Python to Replace Inline Logic
Because Inline Expressions are JS-only, if you prefer Python, you should move that logic into a Python Code Node preceding your target node.
Example: Instead of {{ $json.name.toUpperCase() }} in a Set node (JS), do this in a Python node:
for item in _input.all():
item['json']['name'] = item['json']['name'].upper()
return _input.all()
6. Python Equivalents to Inline Transformations
Since you cannot use Python in the {{ }} fields, here are the Python Code Node equivalents for common transformations.
6.1 String Transformations
- Check Email: `re.match(r"[^@]+@[^@]+\.[^@]+", email)`
- Extract Domain: `email.split('@')[1]` or `urllib.parse`
- Remove Tags: `re.sub('<[^<]+?>', '', text)`
- Base64: `base64.b64encode(data)`
- Snake Case: `text.lower().replace(' ', '_')`
- Trim: `text.strip()`
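The one-liners above combine naturally; here is a runnable sketch using an illustrative email value:

```python
import base64
import re

email = '  Jane.Doe@Example.COM  '

cleaned = email.strip().lower()                            # trim + normalize
is_valid = bool(re.match(r"[^@]+@[^@]+\.[^@]+", cleaned))  # rough email check
domain = cleaned.split('@')[1]                             # extract domain
encoded = base64.b64encode(cleaned.encode('utf-8')).decode('utf-8')
snake = 'First Name'.lower().replace(' ', '_')             # snake_case
```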
6.2 List (Array) Transformations
- Sum: `sum([10, 20])` → `30`
- Remove Duplicates: `list(set(my_list))`
- Merge: `list1 + list2` or `list1.extend(list2)`
- Is Empty: `len(my_list) == 0`
- Random Item: `random.choice(my_list)`
- First/Last: `my_list[0]` / `my_list[-1]`
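A combined sketch of these operations (note that `set()` does not preserve order, so sort the result if order matters):

```python
import random

my_list = [10, 20, 20, 30]

total = sum(my_list)             # 80
unique = sorted(set(my_list))    # set() drops duplicates but loses order; sort to stabilize
merged = my_list + [40]
first, last = my_list[0], my_list[-1]
picked = random.choice(my_list)  # some element of my_list
is_empty = len([]) == 0
```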
6.3 Number Transformations
- Round: `round(amount, 2)`
- To Boolean: `bool(value)`
- Format: `"{:,.2f}".format(price)` (e.g., `1,000.00`)
- Is Even: `num % 2 == 0`
6.4 Dictionary (Object) Transformations
- Is Empty: `not my_dict`
- Remove Key: `my_dict.pop('key', None)`
- Merge: `{**dict1, **dict2}` or `dict1 | dict2` (Python 3.9+)
- To JSON String: `json.dumps(my_dict)`
6.5 Date & Time (datetime)
- Current Time: `now = datetime.now()`
- Parse String: `dt = datetime.fromisoformat('2023-01-01')`
- Add Time: `dt + timedelta(weeks=1)`
- Format: `dt.strftime('%Y-%m-%d')`
- Is Weekend: `dt.weekday() >= 5`
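Putting the date recipes together (the sample date is illustrative; 2023-01-01 happened to fall on a Sunday):

```python
from datetime import datetime, timedelta

dt = datetime.fromisoformat('2023-01-01')   # parse an ISO date string
next_week = dt + timedelta(weeks=1)         # add time
formatted = next_week.strftime('%Y-%m-%d')  # format back to a string
is_weekend = dt.weekday() >= 5              # Saturday=5, Sunday=6
```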
7. Input, Output, and Execution Modes
How you access data depends entirely on the "Mode" toggle. Warning: Using the wrong access method for the selected mode will crash the node.
7.1 Mode: "Run Once for All Items" (Recommended)
The script runs once. You receive the full batch of data as a list.
- Best For: Aggregation, Filtering, Pandas, Cross-referencing items.
- Performance: Most performant. The interpreter initializes once.
- Data Access: `_input.all()` (returns a list).
- Unavailable: `_input.item` (will throw an error).
items = _input.all()
# Process all items at once
return [
{'json': {**item['json'], 'processed': True}}
for item in items
]
7.2 Mode: "Run Once for Each Item"
The script runs separately for every single incoming item.
- Best For: Simple 1-to-1 transformations where items are isolated.
- Performance Hit: Higher overhead (interpreter context switching).
- Data Access: `_input.item` (returns a dictionary).
- Unavailable: `_input.all()` (will throw an error). The list of other items is not available.
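A minimal per-item sketch; the sample dict stands in for `_input.item`, and the field names are hypothetical.

```python
# In "Run Once for Each Item" mode, _input.item holds the current item.
# Here a plain dict stands in for it.
item = {'json': {'first': 'Ada', 'last': 'Lovelace'}}

result = [{'json': {
    **item['json'],
    'full_name': f"{item['json']['first']} {item['json']['last']}",
}}]
# In n8n: return result
```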
7.3 The Output Contract
To prevent workflow errors, your return statement must strictly follow the n8n data structure: A List of Dictionaries containing a 'json' key.
Strict Rules:
- Always Return a List: Output must be `[...]`.
- Use the `json` Key: Data must be inside a dictionary key named `'json'`.
- No Primitives: `return "success"` is invalid.
- Dictionary Structure: `{'json': {...}, 'binary': {...}}`.
The Golden Valid Format:
return [
{ 'json': { 'id': 1, 'name': 'Alice' } },
{ 'json': { 'id': 2, 'name': 'Bob' } }
]
8. Accessing Data Contexts & State Management
n8n exposes special objects prefixed with _.
8.1 Accessing Input Data
| Variable | Description | Execution Mode |
|---|---|---|
| `_input.all()` | Returns a list of all items. | "Run Once for All" |
| `_input.item` | The specific item (dict) being processed. | "Run Once for Each" |
| `_input.first()` | The very first item in the batch. | Both |
| `_input.last()` | The last item in the batch. | Both |
| `_input.params` | Node configuration parameters. | Both |
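The `first()`/`last()` helpers are convenience shortcuts; in plain Python they correspond to index access, as this stand-in sketch shows (`batch` plays the role of `_input.all()`):

```python
# "batch" stands in for _input.all()
batch = [{'json': {'id': i}} for i in range(1, 6)]

first_item = batch[0]    # equivalent of _input.first()
last_item = batch[-1]    # equivalent of _input.last()

summary = {'json': {
    'first_id': first_item['json']['id'],
    'last_id': last_item['json']['id'],
    'count': len(batch),
}}
# In n8n: return [summary]
```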
8.2 Accessing Other Nodes (Cross-Node)
- `_("NodeName").all()`: Access output from any previous node.
- `_("NodeName").first()` / `.last()`: First/last item.
- `_("NodeName").item`: (Run Once for Each) Item aligned with the current index.
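A typical use is enriching the current item from an earlier node's output. Here `ref_items` stands in for `_("Users").all()` ("Users" is a hypothetical node name) and `current` stands in for `_input.item`:

```python
# ref_items stands in for _("Users").all(); "Users" is a hypothetical node name
ref_items = [
    {'json': {'id': 1, 'plan': 'pro'}},
    {'json': {'id': 2, 'plan': 'free'}},
]
current = {'json': {'id': 2, 'event': 'login'}}  # stands in for _input.item

# Build a lookup dictionary once, then join in O(1)
plans = {r['json']['id']: r['json']['plan'] for r in ref_items}
enriched = [{'json': {**current['json'], 'plan': plans.get(current['json']['id'])}}]
# In n8n: return enriched
```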
8.3 Global Variables & Metadata
- `_vars['my_var']`: Global variables defined in Workflow Settings.
- `_env['API_KEY']`: System environment variables.
- `_secrets`: External secrets.
- `_workflow`: Metadata (`id`, `name`, `active`).
- `_execution`: Metadata (`id`, `mode`).
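In a Code Node you would read `_env['API_KEY']` directly (the key name is hypothetical); outside n8n the closest stdlib analogue is `os.environ`, and `.get()` gives a safe fallback either way:

```python
import os

# In n8n: api_key = _env['API_KEY']  (the key name is hypothetical)
# os.environ is the plain-Python analogue; .get() avoids a KeyError
api_key = os.environ.get('API_KEY', 'missing-key')
headers = {'Authorization': f'Bearer {api_key}'}
```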
8.4 Persisting State (staticData)
Variables (`_vars`) are constant. If you need to store data between executions (e.g., for deduplication or polling triggers), you must use staticData.
Note: staticData is only saved when the workflow is **Active**. It does not persist in manual "Test" runs.
Basic Example (Polling):
# Access the global static data store
static_data = _getWorkflowStaticData('global')
# Retrieve last processed ID (default to 0)
last_id = static_data.get('last_id', 0)
# Update the state for the next execution
static_data['last_id'] = 150
Caching Example:
static_data = _getWorkflowStaticData('global')
# Initialize cache if not exists
if 'cache' not in static_data:
static_data['cache'] = {}
item_id = str(_input.item['json']['id'])
# Check cache
if item_id in static_data['cache']:
return [{'json': static_data['cache'][item_id]}]
# Fetch and update cache
data = {'id': item_id, 'fetched': True}
static_data['cache'][item_id] = data
return [{'json': data}]
9. Essential Transformations & Recipes
9.1 Data Cleaning (List Comprehension)
Mode: Run Once for All Items
items = _input.all()
return [
{
'json': {
# .get() for safety (like optional chaining)
'id': item['json'].get('user_id', 'unknown'),
'email': item['json'].get('email', '').strip().lower(),
# Default value
'role': item['json'].get('role', 'guest')
}
}
for item in items
if item['json'].get('isActive') # Filter: Only active users
]
9.2 Filtering
Use Python's list comprehensions with if clauses.
items = _input.all()
# Filter users that are active and have a company email
filtered_users = [
{'json': {**item['json'], 'valid': True}} # Unpack and add flag
for item in items
if item['json'].get('isActive') and '@company.com' in item['json'].get('email', '')
]
return filtered_users
9.3 High-Speed Aggregation (Pandas)
Mode: Run Once for All Items | **Requires: Python (Native) & a custom Docker image**
import pandas as pd
# 1. Load Data
data = [i['json'] for i in _input.all()]
df = pd.DataFrame(data)
# 2. Group by Category and Sum Price
report = df.groupby('category')['price'].sum().reset_index()
# 3. Convert back to n8n format
return [{'json': row} for row in report.to_dict(orient='records')]
9.4 Grouping and Aggregation (Standard Python)
If you cannot use Pandas, use collections.defaultdict.
from collections import defaultdict
items = _input.all()
grouped = defaultdict(list)
for item in items:
cat = item['json'].get('category', 'Other')
grouped[cat].append(item['json'])
results = []
for category, products in grouped.items():
total_price = sum(p['price'] for p in products)
results.append({
'json': {
'category': category,
'count': len(products),
'total': total_price
}
})
return results
9.5 Flattening
Explode lists (e.g., 1 Order → 5 Line Items).
items = _input.all()
results = []
for item in items:
order_id = item['json']['id']
for product in item['json']['lineItems']:
results.append({
'json': {
'orderId': order_id,
'productName': product['name'],
'price': product['price']
}
})
return results
9.6 Recursive Flattening
For deep API responses.
def flatten_dict(d, parent_key='', sep='_'):
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(flatten_dict(v, new_key, sep=sep).items())
else:
items.append((new_key, v))
return dict(items)
return [{'json': flatten_dict(_input.item['json'])}]
9.7 Multi-Input Handling
If you cannot use a Merge node:
inputs = _input.all()
if len(inputs) < 2:
raise Exception('Expected 2 inputs')
primary = inputs[0]
secondary = inputs[1]
return [{'json': {'primary': primary['json'], 'secondary': secondary['json']}}]
9.8 Item Linking (pairedItem)
Preserve lineage for UI debugging.
items = _input.all()
output = []
for index, item in enumerate(items):
output.append({
'json': item['json'],
'pairedItem': {'item': index}
})
return output
9.9 Binary Data Manipulation
Mode: Run Once for All Items
Binary data in n8n is not a file object; it is a Base64 String.
Modifying Binary Data:
import base64
items = _input.all()
for item in items:
# 1. Extract Base64 string from input structure
b64_string = item['binary']['data']['data']
# 2. Decode to Bytes (and string if text)
file_content = base64.b64decode(b64_string).decode('utf-8')
# 3. Modify content
modified_content = file_content.replace('Draft', 'Final')
# 4. Re-encode to Base64
encoded = base64.b64encode(modified_content.encode('utf-8')).decode('utf-8')
# 5. Save back to the item
item['binary']['data']['data'] = encoded
return items
Creating Binary Data:
import base64
text_content = "<html><h1>Hello</h1></html>"
# Encode to base64 bytes, then decode to string for JSON compatibility
b64_string = base64.b64encode(text_content.encode('utf-8')).decode('utf-8')
return [{
'json': {'generated': True},
'binary': {
'data': {
'data': b64_string,
'mimeType': 'text/html',
'fileName': 'report.html'
}
}
}]
9.10 The Datetime Serialization Trap
n8n cannot automatically serialize Python datetime objects back to JSON. You must convert them to ISO strings before returning.
from datetime import datetime
items = _input.all()
for item in items:
# INPUT: n8n sends dates as ISO Strings
date_str = item['json']['created_at']
# Convert string to Python Object
dt_obj = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
# ... perform logic (e.g. dt_obj.weekday()) ...
# OUTPUT: Must convert back to String
item['json']['processed_date'] = dt_obj.isoformat()
return items
10. HTTP Requests & Async Logic
In n8n's Python node, standard logic is synchronous. However, n8n provides a helper for HTTP requests that handles authentication automatically.
10.1 Using _helpers.httpRequest (Recommended)
Works in both Standard and Native modes. Requires await. This is often preferred over the requests library because it utilizes credentials stored in n8n.
# Top-level await is supported
response = await _helpers.httpRequest({
'method': 'GET',
'url': 'https://api.example.com/data',
'headers': {'Authorization': f"Bearer {_env['API_KEY']}"},
'json': True
})
return [{'json': response}]
10.2 Concurrent Requests (Batching)
Since n8n's Python environment handles async differently than Node.js, simple looping with await is the standard approach.
items = _input.all()
results = []
for item in items:
try:
# Perform request
res = await _helpers.httpRequest({
'url': 'https://api.example.com/update',
'method': 'POST',
'body': item['json']
})
results.append({'json': {'success': True, 'data': res}})
except Exception as e:
results.append({'json': {'success': False, 'error': str(e)}})
return results
11. Infrastructure Limits & Optimization
Even with Python, n8n is not a Big Data platform. You must respect hardware limits.
11.1 Memory Limits (OOM Kills)
n8n loads all input data into RAM. If you query 50,000 rows from a database and pass them to a Python node, the node will likely crash with an "Out of Memory" error or simply restart silently.
- Rule of Thumb: Keep total payload under 50MB per execution.
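One practical way to stay under that budget is to stream intermediate results with generator expressions instead of materializing full lists; a small sketch:

```python
# A generator expression yields rows one at a time instead of
# holding every intermediate dict in memory at once
rows = ({'id': i, 'val': i * 2} for i in range(1000))

# Aggregation consumes the stream without ever building a full list
total = sum(r['val'] for r in rows)
```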
11.2 The Timeout Trap
Python nodes often have a default timeout (e.g., 300 seconds). Heavy ETL jobs will be killed if they exceed this.
11.3 Optimization Techniques
- Batching: If processing >5,000 items, use a "Split In Batches" node before the Python node. Set batch size to 500 or 1,000, process, and loop back.
- List Comprehensions: Generally faster than `for` loops.
- Generators: Use generators `(x for x in data)` instead of lists `[x for x in data]` for intermediate calculations to save memory.
- Vectorization: If Pandas is enabled, use DataFrame operations instead of looping.
- Avoid N+1 Problems (Lookup Maps): Fetch reference data once before the loop and create a dictionary for O(1) lookups.
# Get all items from the "Refs" node
ref_items = _("Refs").all()
# Create a lookup dictionary: { '123': {...}, '124': {...} }
id_map = {item['json']['id']: item['json'] for item in ref_items}
items = _input.all()
# O(1) lookup instead of O(N) search inside the loop
return [
    {'json': {**item['json'], 'match': id_map.get(item['json']['id'])}}
    for item in items
]
12. Real-World Recipes
12.1 LinkedIn URL ID Extraction
url = _("LinkedIn").item['json']['query']['url']
# Extract path, split by slash, get appropriate segment
linked_in_id = url.split('?')[0].strip('/').split('/')[-1]
return [{'json': {'id': linked_in_id}}]
12.2 Invoice Generation
Calculates totals, tax, and formats dates.
items = _input.all()
results = []
from datetime import datetime
for item in items:
# Calculate total
total = sum(line['qty'] * line['price'] for line in item['json']['items'])
results.append({
'json': {
**item['json'],
'total': f"{total:.2f}",
'greeting': f"Dear {item['json']['name']}",
'date': datetime.now().isoformat()
}
})
return results
12.3 Data Enrichment (Dictionary Merge)
user_data = _input.item['json']
# Simulate fetching extra data or pulling from variables
enrichment = {'status': 'vip', 'score': 99}
# Python 3.9+ syntax for merging dicts
merged = user_data | enrichment
return [{'json': merged}]
12.4 Cryptography (Hashlib)
import hashlib
email = _input.item['json']['email']
email_hash = hashlib.sha256(email.encode('utf-8')).hexdigest()
return [{'json': {'email_hash': email_hash}}]
13. FAQ and Checklists
13.1 Common Questions
- Can I use Python in Inline Expressions `{{ }}`? No. Parameter fields are strictly JavaScript. Use a Code Node for Python.
- Why does `import pandas` fail? You must check `N8N_PYTHON_MODULE_ALLOW_LIST` and ensure the library is installed in the environment (usually requires a custom Docker image).
- How do I access nested JSON? Use dictionary syntax: `item['json']['key']`. Do not use dot notation (`item.json.key` won't work).
- "Object of type ... is not JSON serializable"? You are trying to return a Python object (like a `datetime` object) directly. Convert it to a string first.
- Can I use `print()`? Yes, it logs to the execution console/server logs, but doesn't output data to the next node. Use it for debugging.
13.2 Final Checklist for Success
- [ ] Docker Setup: If using `pandas`, did you build a custom image and add it to the `ALLOW_LIST`?
- [ ] Mode Selection: Did you choose "Run Once for All Items" (usually best)?
- [ ] Return: Is your output wrapped in `[{ 'json': ... }]`?
- [ ] Syntax: Are you using `['key']` access (strict dict) and not `.key`?
- [ ] Imports: Did you import necessary libraries (`json`, `datetime`)?
- [ ] Dates: Did you `.isoformat()` all `datetime` objects before returning?
- [ ] Binary: Did you Base64 decode/encode correctly?
- [ ] Error Handling: Wrapped risky code in `try...except`?
- [ ] Batching: Are you processing huge lists? If so, did you add a Split In Batches node?