1. Introduction: The Engine Room of Automation
n8n is a robust, open-source workflow automation tool and low-code platform. While standard nodes (Set, If, Merge, Switch) handle linear task execution, the Python Code Node represents the "engine room" of sophisticated automation. It allows users to implement custom logic, complex data manipulations, and performance optimizations that standard nodes cannot achieve.
1.1 The Role of Python in n8n
The Code Node and Inline Expressions empower users to perform operations—specifically in data science, string manipulation, and mathematical calculation—that are often more verbose or difficult in JavaScript. A single Python node can effectively replace chains of 10 to 15 standard nodes, resulting in cleaner, more maintainable, and significantly faster workflows.
Why Use Python in n8n?
- Logic Complexity: Supports nested loops, complex conditionals, regex, and advanced control flow where standard nodes would require multiple "If" nodes.
- Data Processing: High-performance, in-memory processing of 1,000+ items, whereas standard node chains carry high overhead from per-step node instantiation.
- Libraries: Access to `pandas`, `numpy`, `math`, `re`, `scikit-learn`, `requests`, and more (if configured).
- Data Shaping: Advanced list comprehensions, flattening, pivoting, and restructuring of complex JSON/Arrays.
- Batch Processing: Efficiently process multiple items together.
- Cross-item Logic: Handle items that need to reference each other (e.g., deduplication based on global context).
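As a quick illustration of cross-item logic, the sketch below deduplicates a batch by normalized email — something that would otherwise take a chain of If/Merge nodes. The sample `items` list stands in for `_input.all()` inside a real Code Node, and the field names are illustrative.

```python
# "items" stands in for _input.all() inside an n8n Python Code Node
items = [
    {'json': {'email': 'a@example.com', 'name': 'Alice'}},
    {'json': {'email': 'A@Example.com', 'name': 'Alice (duplicate)'}},
    {'json': {'email': 'b@example.com', 'name': 'Bob'}},
]

seen = set()
deduped = []
for item in items:
    key = item['json']['email'].lower()  # normalize so case differences don't slip through
    if key not in seen:
        seen.add(key)
        deduped.append(item)

# In n8n you would finish with: return deduped
```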
1.2 Performance Benefits and Benchmarks
The performance benefits of consolidating logic into a Code node are substantial. When processing a dataset of 1,000 items:
- Code Node (Python): Efficient execution via Pyodide or Local Python process.
- Standard Set/If Node Chains: Significantly higher overhead due to node instantiation and context switching between nodes.
Verdict: For high-volume data, the Code Node is drastically faster due to reduced execution overhead. This makes it essential for enterprise-grade automations.
2. Architecture & Execution Environments
Before writing code, it is critical to understand where and how your Python code runs. You must select an environment via the Language dropdown in the Code Node.
2.1 Option A: Python (Standard) - The Default
- Engine: Pyodide (WebAssembly) running inside the main n8n process.
- Best For: String manipulation, date formatting, regex, math, and small loops (<500 items).
- Startup: Instant (no network overhead).
- Limitations:
- No access to the host filesystem.
- Cannot use C-extension libraries like `pandas` or `numpy`.
- Slower for heavy iteration compared to native Python.
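A minimal sketch of the kind of work the Standard (Pyodide) engine handles well — stdlib-only regex and math, with no C-extension imports. The sample `items` list stands in for `_input.all()`, and the SKU/price fields are hypothetical.

```python
import math
import re

# "items" stands in for _input.all(); the fields are illustrative
items = [{'json': {'sku': 'AB-1234', 'price': 19.999}}]

out = []
for item in items:
    match = re.search(r'\d+', item['json']['sku'])  # stdlib re works fine in Pyodide
    out.append({'json': {
        'sku_number': match.group() if match else None,
        'price': math.floor(item['json']['price'] * 100) / 100,  # truncate to cents
    }})
# In n8n: return out
```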
2.2 Option B: Python (Native) - The Heavy Lifter
- Engine: A full Python process running on an isolated Task Runner (or sidecar).
- Best For: Heavy Data Processing, ETL, Machine Learning, and large datasets (10k+ items).
- Capabilities: Full access to external libraries (`pandas`, `numpy`, `scikit-learn`, `requests`) if configured.
- Startup: Slight latency (~10-50 ms) for serialization/network, but far faster execution for heavy data work.
3. Configuration & Infrastructure: The Docker Reality
Many users fail here. Simply adding pandas to the "Allow List" is not enough; the library must physically exist in the container. The default n8n image is lightweight and does not contain data science libraries (though some versions may pre-install pandas/numpy, relying on this is risky without verification).
3.1 Step 1: Build a Custom Docker Image
To use libraries like pandas, numpy, or requests in the Native environment, you must extend the base n8n image.
Dockerfile:
# Use the official n8n image as base
FROM n8nio/n8n:latest
USER root
# Install Python3 and PIP (Alpine Linux syntax)
RUN apk add --update --no-cache python3 py3-pip
# Install the heavy libraries
RUN pip3 install pandas numpy requests beautifulsoup4
# Switch back to the node user for security
USER node
3.2 Step 2: Configure the Allow List
Once your custom image is running, you must tell n8n it is safe to import these modules via Environment Variables on the n8n instance (or Task Runners).
Docker Compose (docker-compose.yml):
services:
  n8n:
    build: . # Point to your custom Dockerfile
    # or image: n8nio/n8n if using the default
    environment:
      # Allow specific external Python modules (comma-separated)
      - N8N_PYTHON_MODULE_ALLOW_LIST=pandas,numpy,requests,beautifulsoup4
    ports:
      - "5678:5678"
4. Modules and Libraries
4.1 Standard Library (Built-in)
These are available immediately in both Standard (Pyodide) and Native environments without configuration.
| Module | Purpose | Example Code |
|---|---|---|
| `json` | JSON parsing/dumping | `json.loads('{"a":1}')` |
| `re` | Regular expressions | `re.search(r'pattern', text)` |
| `math` | Mathematical functions | `math.ceil(4.2)` |
| `datetime` | Date and time handling | `datetime.now()` |
| `hashlib` | Hashing (SHA-256, MD5) | `hashlib.sha256(b'data')` |
| `random` | Random number generation | `random.choice(my_list)` |
| `itertools` | Efficient looping | `itertools.chain(...)` |
| `collections` | Specialized containers | `collections.Counter(data)` |
| `urllib` | URL handling | `urllib.parse.urlparse(url)` |
4.2 Common External Packages (Requires Configuration)
These must be installed in the environment and allowed via N8N_PYTHON_MODULE_ALLOW_LIST.
| Package | Use Case | Quick Example |
|---|---|---|
| `pandas` | Heavy data analysis | `df = pd.DataFrame(data)` |
| `numpy` | Scientific computing | `np.array([1, 2, 3])` |
| `requests` | HTTP requests (synchronous) | `requests.get('https://...')` |
| `beautifulsoup4` | HTML parsing/scraping | `BeautifulSoup(html, 'html.parser')` |
| `faker` | Generating fake data | `faker.Faker().name()` |
4.3 Restrictions
- System Access: Direct file system access (`open()`, `os.system`) is often restricted or ephemeral depending on the hosting environment.
- Async: While Python supports `asyncio`, n8n Python node logic is usually written synchronously for data transformation, though `await` is required for n8n-specific helpers (like `_helpers.httpRequest`).
5. Syntax & Core Concepts: The "Strict Dictionary" Rule
The most common mistake developers make is applying JavaScript dot-notation to Python. n8n Python nodes require Strict Dictionary Syntax.
5.1 Syntax Cheatsheet
| Feature | ❌ JS / Legacy Style (Avoid) | ✅ Python Standard (Use This) |
|---|---|---|
| JSON Access | `item.json.myField` | `item['json']['myField']` |
| Nested Data | `item.json.user.id` | `item['json']['user']['id']` |
| Safe Access | `item.json.optional?` | `item['json'].get('optional', default_val)` |
| Params | `_input.params.myVal` | `_input.params['myVal']` |
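A small sketch of the right-hand column in practice; the `item` dict simply mimics the shape of one n8n item.

```python
# A sample dict shaped like a single n8n item
item = {'json': {'user': {'id': 42}}}

# Strict dictionary syntax for nested access
user_id = item['json']['user']['id']

# .get() returns a default instead of raising KeyError (like JS optional chaining)
nickname = item['json'].get('nickname', 'n/a')
```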
5.2 Inline Expressions vs. Code Node
It is vital to distinguish between the two systems:
- Inline Expressions `{{ $json.id }}`: Always JavaScript. Act as a templating engine used inside node parameters (toggle "Fixed / Expression").
- Code Node: Python. Used for logic scripts.
5.3 The Bridge: Using Python to Replace Inline Logic
Because Inline Expressions are JS-only, if you prefer Python, you should move that logic into a Python Code Node preceding your target node.
Example: Instead of {{ $json.name.toUpperCase() }} in a Set node (JS), do this in a Python node:
for item in _input.all():
item['json']['name'] = item['json']['name'].upper()
return _input.all()
6. Python Equivalents to Inline Transformations
Since you cannot use Python in the {{ }} fields, here are the Python Code Node equivalents for common transformations.
6.1 String Transformations
- Check Email: `re.match(r"[^@]+@[^@]+\.[^@]+", email)`
- Extract Domain: `email.split('@')[1]` or `urllib.parse`
- Remove Tags: `re.sub('<[^<]+?>', '', text)`
- Base64: `base64.b64encode(data)`
- Snake Case: `text.lower().replace(' ', '_')`
- Trim: `text.strip()`
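The one-liners above combine naturally; here is a runnable sketch using an illustrative email value:

```python
import base64
import re

email = '  Jane.Doe@Example.COM  '

cleaned = email.strip().lower()                            # trim + normalize
is_valid = bool(re.match(r"[^@]+@[^@]+\.[^@]+", cleaned))  # rough email check
domain = cleaned.split('@')[1]                             # extract domain
encoded = base64.b64encode(cleaned.encode('utf-8')).decode('utf-8')
snake = 'First Name'.lower().replace(' ', '_')             # snake_case
```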
6.2 List (Array) Transformations
- Sum: `sum([10, 20])` → `30`
- Remove Duplicates: `list(set(my_list))`
- Merge: `list1 + list2` or `list1.extend(list2)`
- Is Empty: `len(my_list) == 0`
- Random Item: `random.choice(my_list)`
- First/Last: `my_list[0]` / `my_list[-1]`
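A combined sketch of these operations (note that `set()` does not preserve order, so sort the result if order matters):

```python
import random

my_list = [10, 20, 20, 30]

total = sum(my_list)             # 80
unique = sorted(set(my_list))    # set() drops duplicates but loses order; sort to stabilize
merged = my_list + [40]
first, last = my_list[0], my_list[-1]
picked = random.choice(my_list)  # some element of my_list
is_empty = len([]) == 0
```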
6.3 Number Transformations
- Round: `round(amount, 2)`
- To Boolean: `bool(value)`
- Format: `"{:,.2f}".format(price)` (e.g., `1,000.00`)
- Is Even: `num % 2 == 0`
6.4 Dictionary (Object) Transformations
- Is Empty: `not my_dict`
- Remove Key: `my_dict.pop('key', None)`
- Merge: `{**dict1, **dict2}` or `dict1 | dict2` (Python 3.9+)
- To JSON String: `json.dumps(my_dict)`
6.5 Date & Time (datetime)
- Current Time: `now = datetime.now()`
- Parse String: `dt = datetime.fromisoformat('2023-01-01')`
- Add Time: `dt + timedelta(weeks=1)`
- Format: `dt.strftime('%Y-%m-%d')`
- Is Weekend: `dt.weekday() >= 5`
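Putting the date recipes together (the sample date is illustrative; 2023-01-01 happened to fall on a Sunday):

```python
from datetime import datetime, timedelta

dt = datetime.fromisoformat('2023-01-01')   # parse an ISO date string
next_week = dt + timedelta(weeks=1)         # add time
formatted = next_week.strftime('%Y-%m-%d')  # format back to a string
is_weekend = dt.weekday() >= 5              # Saturday=5, Sunday=6
```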
7. Input, Output, and Execution Modes
How you access data depends entirely on the "Mode" toggle. Warning: Using the wrong access method for the selected mode will crash the node.
7.1 Mode: "Run Once for All Items" (Recommended)
The script runs once. You receive the full batch of data as a list.
- Best For: Aggregation, Filtering, Pandas, Cross-referencing items.
- Performance: Most performant. The interpreter initializes once.
- Data Access: `_input.all()` (returns a list).
- Unavailable: `_input.item` (will throw an error).
items = _input.all()
# Process all items at once
return [
{'json': {**item['json'], 'processed': True}}
for item in items
]
7.2 Mode: "Run Once for Each Item"
The script runs separately for every single incoming item.
- Best For: Simple 1-to-1 transformations where items are isolated.
- Performance Hit: Higher overhead (interpreter context switching).
- Data Access: `_input.item` (returns a dictionary).
- Unavailable: `_input.all()` (will throw an error). The list of other items is not available.
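A minimal per-item sketch; the sample dict stands in for `_input.item`, and the field names are hypothetical.

```python
# In "Run Once for Each Item" mode, _input.item holds the current item.
# Here a plain dict stands in for it.
item = {'json': {'first': 'Ada', 'last': 'Lovelace'}}

result = [{'json': {
    **item['json'],
    'full_name': f"{item['json']['first']} {item['json']['last']}",
}}]
# In n8n: return result
```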
7.3 The Output Contract
To prevent workflow errors, your return statement must strictly follow the n8n data structure: A List of Dictionaries containing a 'json' key.
Strict Rules:
- Always Return a List: Output must be `[...]`.
- Use the `json` Key: Data must be inside a dictionary key named `'json'`.
- No Primitives: `return "success"` is invalid.
- Dictionary Structure: `{'json': {...}, 'binary': {...}}`.
The Golden Valid Format:
return [
{ 'json': { 'id': 1, 'name': 'Alice' } },
{ 'json': { 'id': 2, 'name': 'Bob' } }
]
8. Accessing Data Contexts & State Management
n8n exposes special objects prefixed with _.
8.1 Accessing Input Data
| Variable | Description | Execution Mode |
|---|---|---|
| `_input.all()` | Returns a list of all items. | "Run Once for All" |
| `_input.item` | The specific item (dict) being processed. | "Run Once for Each" |
| `_input.first()` | The very first item in the batch. | Both |
| `_input.last()` | The last item in the batch. | Both |
| `_input.params` | Node configuration parameters. | Both |
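The `first()`/`last()` helpers are convenience shortcuts; in plain Python they correspond to index access, as this stand-in sketch shows (`batch` plays the role of `_input.all()`):

```python
# "batch" stands in for _input.all()
batch = [{'json': {'id': i}} for i in range(1, 6)]

first_item = batch[0]    # equivalent of _input.first()
last_item = batch[-1]    # equivalent of _input.last()

summary = {'json': {
    'first_id': first_item['json']['id'],
    'last_id': last_item['json']['id'],
    'count': len(batch),
}}
# In n8n: return [summary]
```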
8.2 Accessing Other Nodes (Cross-Node)
- `_("NodeName").all()`: Access output from any previous node.
- `_("NodeName").first()` / `.last()`: First/last item.
- `_("NodeName").item`: (Run Once for Each) Item aligned with the current index.
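A typical use is enriching the current item from an earlier node's output. Here `ref_items` stands in for `_("Users").all()` ("Users" is a hypothetical node name) and `current` stands in for `_input.item`:

```python
# ref_items stands in for _("Users").all(); "Users" is a hypothetical node name
ref_items = [
    {'json': {'id': 1, 'plan': 'pro'}},
    {'json': {'id': 2, 'plan': 'free'}},
]
current = {'json': {'id': 2, 'event': 'login'}}  # stands in for _input.item

# Build a lookup dictionary once, then join in O(1)
plans = {r['json']['id']: r['json']['plan'] for r in ref_items}
enriched = [{'json': {**current['json'], 'plan': plans.get(current['json']['id'])}}]
# In n8n: return enriched
```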
8.3 Global Variables & Metadata
- `_vars['my_var']`: Global variables defined in Workflow Settings.
- `_env['API_KEY']`: System environment variables.
- `_secrets`: External secrets.
- `_workflow`: Metadata (`id`, `name`, `active`).
- `_execution`: Metadata (`id`, `mode`).
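In a Code Node you would read `_env['API_KEY']` directly (the key name is hypothetical); outside n8n the closest stdlib analogue is `os.environ`, and `.get()` gives a safe fallback either way:

```python
import os

# In n8n: api_key = _env['API_KEY']  (the key name is hypothetical)
# os.environ is the plain-Python analogue; .get() avoids a KeyError
api_key = os.environ.get('API_KEY', 'missing-key')
headers = {'Authorization': f'Bearer {api_key}'}
```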
8.4 Persisting State (staticData)
Variables (`_vars`) are constant. If you need to store data between executions (e.g., for deduplication or polling triggers), you must use staticData.
Note: staticData is only saved when the workflow is **Active**. It does not persist in manual "Test" runs.
Basic Example (Polling):
# Access the global static data store
static_data = _getWorkflowStaticData('global')
# Retrieve last processed ID (default to 0)
last_id = static_data.get('last_id', 0)
# Update the state for the next execution
static_data['last_id'] = 150
Caching Example:
static_data = _getWorkflowStaticData('global')
# Initialize cache if not exists
if 'cache' not in static_data:
static_data['cache'] = {}
item_id = str(_input.item['json']['id'])
# Check cache
if item_id in static_data['cache']:
return [{'json': static_data['cache'][item_id]}]
# Fetch and update cache
data = {'id': item_id, 'fetched': True}
static_data['cache'][item_id] = data
return [{'json': data}]
9. Essential Transformations & Recipes
9.1 Data Cleaning (List Comprehension)
Mode: Run Once for All Items
items = _input.all()
return [
{
'json': {
# .get() for safety (like optional chaining)
'id': item['json'].get('user_id', 'unknown'),
'email': item['json'].get('email', '').strip().lower(),
# Default value
'role': item['json'].get('role', 'guest')
}
}
for item in items
if item['json'].get('isActive') # Filter: Only active users
]
9.2 Filtering
Use Python's list comprehensions with if clauses.
items = _input.all()
# Filter users that are active and have a company email
filtered_users = [
{'json': {**item['json'], 'valid': True}} # Unpack and add flag
for item in items
if item['json'].get('isActive') and '@company.com' in item['json'].get('email', '')
]
return filtered_users
9.3 High-Speed Aggregation (Pandas)
Mode: Run Once for All Items | **Requires: Python (Native) & a custom Docker image**
import pandas as pd
# 1. Load Data
data = [i['json'] for i in _input.all()]
df = pd.DataFrame(data)
# 2. Group by Category and Sum Price
report = df.groupby('category')['price'].sum().reset_index()
# 3. Convert back to n8n format
return [{'json': row} for row in report.to_dict(orient='records')]
9.4 Grouping and Aggregation (Standard Python)
If you cannot use Pandas, use collections.defaultdict.
from collections import defaultdict
items = _input.all()
grouped = defaultdict(list)
for item in items:
cat = item['json'].get('category', 'Other')
grouped[cat].append(item['json'])
results = []
for category, products in grouped.items():
total_price = sum(p['price'] for p in products)
results.append({
'json': {
'category': category,
'count': len(products),
'total': total_price
}
})
return results
9.5 Flattening
Explode lists (e.g., 1 Order → 5 Line Items).
items = _input.all()
results = []
for item in items:
order_id = item['json']['id']
for product in item['json']['lineItems']:
results.append({
'json': {
'orderId': order_id,
'productName': product['name'],
'price': product['price']
}
})
return results
9.6 Recursive Flattening
For deep API responses.
def flatten_dict(d, parent_key='', sep='_'):
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(flatten_dict(v, new_key, sep=sep).items())
else:
items.append((new_key, v))
return dict(items)
return [{'json': flatten_dict(_input.item['json'])}]
9.7 Multi-Input Handling
If you cannot use a Merge node:
inputs = _input.all()
if len(inputs) < 2:
raise Exception('Expected 2 inputs')
primary = inputs[0]
secondary = inputs[1]
return [{'json': {'primary': primary['json'], 'secondary': secondary['json']}}]
9.8 Item Linking (pairedItem)
Preserve lineage for UI debugging.
items = _input.all()
output = []
for index, item in enumerate(items):
output.append({
'json': item['json'],
'pairedItem': {'item': index}
})
return output
9.9 Binary Data Manipulation
Mode: Run Once for All Items
Binary data in n8n is not a file object; it is a Base64 String.
Modifying Binary Data:
import base64
items = _input.all()
for item in items:
# 1. Extract Base64 string from input structure
b64_string = item['binary']['data']['data']
# 2. Decode to Bytes (and string if text)
file_content = base64.b64decode(b64_string).decode('utf-8')
# 3. Modify content
modified_content = file_content.replace('Draft', 'Final')
# 4. Re-encode to Base64
encoded = base64.b64encode(modified_content.encode('utf-8')).decode('utf-8')
# 5. Save back to the item
item['binary']['data']['data'] = encoded
return items
Creating Binary Data:
import base64
text_content = "<html><h1>Hello</h1></html>"
# Encode to base64 bytes, then decode to string for JSON compatibility
b64_string = base64.b64encode(text_content.encode('utf-8')).decode('utf-8')
return [{
'json': {'generated': True},
'binary': {
'data': {
'data': b64_string,
'mimeType': 'text/html',
'fileName': 'report.html'
}
}
}]
9.10 The Datetime Serialization Trap
n8n cannot automatically serialize Python datetime objects back to JSON. You must convert them to ISO strings before returning.
from datetime import datetime
items = _input.all()
for item in items:
# INPUT: n8n sends dates as ISO Strings
date_str = item['json']['created_at']
# Convert string to Python Object
dt_obj = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
# ... perform logic (e.g. dt_obj.weekday()) ...
# OUTPUT: Must convert back to String
item['json']['processed_date'] = dt_obj.isoformat()
return items
10. HTTP Requests & Async Logic
In n8n's Python node, standard logic is synchronous. However, n8n provides a helper for HTTP requests that handles authentication automatically.
10.1 Using _helpers.httpRequest (Recommended)
Works in both Standard and Native modes. Requires await. This is often preferred over the requests library because it utilizes credentials stored in n8n.
# Top-level await is supported
response = await _helpers.httpRequest({
'method': 'GET',
'url': 'https://api.example.com/data',
'headers': {'Authorization': f"Bearer {_env['API_KEY']}"},
'json': True
})
return [{'json': response}]
10.2 Concurrent Requests (Batching)
Since n8n's Python environment handles async differently than Node.js, simple looping with await is the standard approach.
items = _input.all()
results = []
for item in items:
try:
# Perform request
res = await _helpers.httpRequest({
'url': 'https://api.example.com/update',
'method': 'POST',
'body': item['json']
})
results.append({'json': {'success': True, 'data': res}})
except Exception as e:
results.append({'json': {'success': False, 'error': str(e)}})
return results
11. Infrastructure Limits & Optimization
Even with Python, n8n is not a Big Data platform. You must respect hardware limits.
11.1 Memory Limits (OOM Kills)
n8n loads all input data into RAM. If you query 50,000 rows from a database and pass them to a Python node, the node will likely crash with an "Out of Memory" error or simply restart silently.
- Rule of Thumb: Keep total payload under 50MB per execution.
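One practical way to stay under that budget is to stream intermediate results with generator expressions instead of materializing full lists; a small sketch:

```python
# A generator expression yields rows one at a time instead of
# holding every intermediate dict in memory at once
rows = ({'id': i, 'val': i * 2} for i in range(1000))

# Aggregation consumes the stream without ever building a full list
total = sum(r['val'] for r in rows)
```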
11.2 The Timeout Trap
Python nodes often have a default timeout (e.g., 300 seconds). Heavy ETL jobs will be killed if they exceed this.
11.3 Optimization Techniques
- Batching: If processing >5,000 items, use a "Split In Batches" node before the Python node. Set batch size to 500 or 1,000, process, and loop back.
- List Comprehensions: Generally faster than `for` loops.
- Generators: Use generators `(x for x in data)` instead of lists `[x for x in data]` for intermediate calculations to save memory.
- Vectorization: If Pandas is enabled, use DataFrame operations instead of looping.
- Avoid N+1 Problems (Lookup Maps): Fetch reference data once before the loop and create a dictionary for O(1) lookups.
# Get all items from the "Refs" node
ref_items = _("Refs").all()
# Create a lookup dictionary: { '123': {...}, '124': {...} }
id_map = {item['json']['id']: item['json'] for item in ref_items}
items = _input.all()
# O(1) lookup instead of O(N) search inside the loop
return [
    {'json': {**item['json'], 'match': id_map.get(item['json']['id'])}}
    for item in items
]
12. Real-World Recipes
12.1 LinkedIn URL ID Extraction
url = _("LinkedIn").item['json']['query']['url']
# Extract path, split by slash, get appropriate segment
linked_in_id = url.split('?')[0].strip('/').split('/')[-1]
return [{'json': {'id': linked_in_id}}]
12.2 Invoice Generation
Calculates totals, tax, and formats dates.
items = _input.all()
results = []
from datetime import datetime
for item in items:
# Calculate total
total = sum(line['qty'] * line['price'] for line in item['json']['items'])
results.append({
'json': {
**item['json'],
'total': f"{total:.2f}",
'greeting': f"Dear {item['json']['name']}",
'date': datetime.now().isoformat()
}
})
return results
12.3 Data Enrichment (Dictionary Merge)
user_data = _input.item['json']
# Simulate fetching extra data or pulling from variables
enrichment = {'status': 'vip', 'score': 99}
# Python 3.9+ syntax for merging dicts
merged = user_data | enrichment
return [{'json': merged}]
12.4 Cryptography (Hashlib)
import hashlib
email = _input.item['json']['email']
email_hash = hashlib.sha256(email.encode('utf-8')).hexdigest()
return [{'json': {'email_hash': email_hash}}]
13. FAQ and Checklists
13.1 Common Questions
- Can I use Python in Inline Expressions `{{ }}`? No. Parameter fields are strictly JavaScript. Use a Code Node for Python.
- Why does `import pandas` fail? You must check `N8N_PYTHON_MODULE_ALLOW_LIST` and ensure the library is installed in the environment (usually requires a custom Docker image).
- How do I access nested JSON? Use dictionary syntax: `item['json']['key']`. Do not use dot notation (`item.json.key` won't work).
- "Object of type ... is not JSON serializable"? You are trying to return a Python object (like a `datetime` object) directly. Convert it to a string first.
- Can I use `print()`? Yes, it logs to the execution console/server logs, but doesn't output data to the next node. Use it for debugging.
13.2 Final Checklist for Success
- [ ] Docker Setup: If using `pandas`, did you build a custom image and add it to the `ALLOW_LIST`?
- [ ] Mode Selection: Did you choose "Run Once for All Items" (usually best)?
- [ ] Return: Is your output wrapped in `[{ 'json': ... }]`?
- [ ] Syntax: Are you using `['key']` access (strict dict) and not `.key`?
- [ ] Imports: Did you import necessary libraries (`json`, `datetime`)?
- [ ] Dates: Did you `.isoformat()` all `datetime` objects before returning?
- [ ] Binary: Did you Base64 decode/encode correctly?
- [ ] Error Handling: Wrapped risky code in `try...except`?
- [ ] Batching: Are you processing huge lists? If so, did you add a Split In Batches node?