I have been writing agents that build n8n workflows. The hard part is not "call the n8n API and post a workflow JSON." The hard part is "pick the right node, with the right operation, with the right parameters, without hallucinating fields that do not exist."
The n8n GUI is the source of truth. The TypeScript source files are the second source of truth. Neither is a thing you can hand to an LLM at inference time.
So I extracted everything into one structured catalog and put it on HuggingFace.
524 nodes. Every operation. Every credential type. Properties schema. Free. CC-BY-4.0.
- HuggingFace: automatelab/n8n-nodes-catalog
- Browsable index: automatelab.tech/products/datasets/n8n-nodes-catalog/
Why a catalog and not "just scrape n8n.io"
Three real problems with leaving this implicit:
- Hallucination cost is high. An LLM that invents a `slack.sendDM` operation will produce a workflow that imports fine and fails at runtime. Hard to detect, expensive to debug.
- Context window pressure. Dropping the entire n8n source tree into a prompt is not realistic. You want a compact index the agent can search.
- Coverage is non-obvious. There are two source packages (`nodes-base` and `@n8n/nodes-langchain`), and the split between them is not visible in the UI.
The catalog flattens all of that into one row per node.
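To make the hallucination point concrete, here is a minimal sketch of a guard that rejects operations a node does not list. The `CATALOG` rows are a hand-copied excerpt for illustration, and `is_known_operation` is a hypothetical helper, not something the dataset or n8n ships.

```python
# Minimal sketch: reject node/operation pairs that the catalog does not list.
# CATALOG is a tiny hand-written excerpt; real rows come from the dataset.
CATALOG = [
    {"node_name": "slack",
     "operations_supported": ["message", "channel", "user", "reaction"]},
    {"node_name": "lmChatOpenAi", "operations_supported": []},
]

# Index rows by node_name so each lookup is a dictionary access.
OPS_BY_NODE = {
    row["node_name"]: set(row["operations_supported"] or [])
    for row in CATALOG
}


def is_known_operation(node_name: str, operation: str) -> bool:
    """Return True only if the node exists and lists the operation."""
    return operation in OPS_BY_NODE.get(node_name, set())


print(is_known_operation("slack", "message"))  # True
print(is_known_operation("slack", "sendDM"))   # False
```

Running a check like this on the agent's output before import turns a runtime failure into an immediate, cheap rejection.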
What is in each row
| Field | What it is |
|---|---|
| `node_name` | Internal id (e.g. `slack`, `airtable`, `lmChatOpenAi`) |
| `display_name` | UI label |
| `categories` | Top-level categories (Communication, AI, Data and Storage) |
| `subcategories` | Leaf taxonomy values |
| `group` | `input`, `output`, or `transform` |
| `version` | Default version for multi-version nodes |
| `description` | One-liner |
| `credentials_required` | Credential type names (e.g. `slackApi`, `openAiApi`) |
| `operations_supported` | Operation values for the node |
| `properties_schema` | JSON describing top-level property descriptors |
| `source_package` | `nodes-base` or `@n8n` |
| `source_file_path` | Repo-relative path to the `.node.ts` |
| `github_permalink` | Pinned GitHub link to the source |
Format: JSON and Parquet (Snappy). License: CC-BY-4.0. Updates monthly.
A sample row
Here is the Slack node, trimmed:
{
"node_name": "slack",
"display_name": "Slack",
"categories": ["Communication"],
"group": ["transform"],
"version": "2.3",
"description": "Send and read messages, manage channels",
"credentials_required": ["slackApi"],
"operations_supported": ["message", "channel", "user", "reaction"],
"properties_schema": "[{\"name\":\"resource\",\"type\":\"options\"},{\"name\":\"operation\",\"type\":\"options\"}]",
"source_package": "nodes-base",
"github_permalink": "https://github.com/n8n-io/n8n/blob/stable/packages/nodes-base/nodes/Slack/Slack.node.ts"
}
And an AI node, to show the cross-package coverage:
{
"node_name": "lmChatOpenAi",
"display_name": "OpenAI Chat Model",
"categories": ["AI"],
"subcategories": ["Language Models", "Chat Models (Recommended)"],
"group": ["transform"],
"version": "1.3",
"credentials_required": ["openAiApi"],
"source_package": "@n8n",
"source_file_path": "packages/@n8n/nodes-langchain/nodes/llms/LMChatOpenAi/LmChatOpenAi.node.ts"
}
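Note that `properties_schema` in the Slack sample is a JSON-encoded string, not a nested object, so it needs a second decode before you can use it. A short sketch, using a trimmed copy of that row (the helper name `property_names` is hypothetical):

```python
import json

# Trimmed copy of the Slack sample row shown above.
row = {
    "node_name": "slack",
    "properties_schema": '[{"name":"resource","type":"options"},'
                         '{"name":"operation","type":"options"}]',
}


def property_names(row: dict) -> list[str]:
    """Decode the JSON string and return the top-level property names."""
    descriptors = json.loads(row.get("properties_schema") or "[]")
    return [d["name"] for d in descriptors]


print(property_names(row))  # ['resource', 'operation']
```

The `or "[]"` fallback covers rows where the field is missing or empty, as in the trimmed AI node sample.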
Numbers I did not expect
A few things that fell out of the catalog once it existed:
- 431 nodes from `nodes-base`, 93 from `@n8n/nodes-langchain`. The langchain side is a real and growing chunk.
- The single most common credential type, by a wide margin, is `httpBasicAuth` (because the generic HTTP Request node is everywhere). After that the long tail starts immediately.
- A non-trivial number of nodes have an empty `operations_supported` list. Those are usually root nodes (LLMs, vector stores, output parsers) where the "operation" abstraction does not apply.
Useful to know if you are writing a planner that filters by operation.
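A planner that filters strictly by operation would silently drop those nodes. One way to handle that is to partition the catalog first. This is a sketch that assumes an empty or missing list means "no operation abstraction"; the rows are illustrative, `exampleRootNode` is a made-up entry, and `split_by_operations` is a hypothetical helper.

```python
# Illustrative rows; "exampleRootNode" is invented to show the None case.
rows = [
    {"node_name": "slack", "operations_supported": ["message", "channel"]},
    {"node_name": "lmChatOpenAi", "operations_supported": []},
    {"node_name": "exampleRootNode", "operations_supported": None},
]


def split_by_operations(rows):
    """Partition rows into (with_ops, without_ops), preserving order."""
    with_ops, without_ops = [], []
    for row in rows:
        # An empty list and None are both treated as "no operations".
        if row.get("operations_supported"):
            with_ops.append(row)
        else:
            without_ops.append(row)
    return with_ops, without_ops


with_ops, without_ops = split_by_operations(rows)
print([r["node_name"] for r in with_ops])     # ['slack']
print([r["node_name"] for r in without_ops])  # ['lmChatOpenAi', 'exampleRootNode']
```

The planner can then match the first group by operation and the second by category or description instead.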
How agents actually use it
from datasets import load_dataset

# Load the catalog; this assumes the dataset exposes a "train" split
ds = load_dataset("automatelab/n8n-nodes-catalog")["train"]

# Filter to nodes that can post messages somewhere
messaging = ds.filter(
    lambda r: "message" in (r["operations_supported"] or [])
)

# Print each matching node with the credentials it needs
for row in messaging:
    print(row["node_name"], row["credentials_required"])
Typical pipeline:
- Embed every row (description, operations, credentials) into a vector store.
- At plan time, retrieve the top N nodes for a user request.
- Hand the agent only those rows. Compact context, no hallucinated operations.
- The agent emits an n8n workflow JSON. Validation against `properties_schema` catches malformed configs before deploy.
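The steps above can be sketched end to end with the standard library only. Two loud assumptions: token overlap stands in for the embedding and vector search steps, and validation only checks top-level parameter names, since that is all `properties_schema` describes. The helper names (`retrieve`, `unknown_parameters`) and the parameter values are illustrative, not part of the dataset or of n8n.

```python
import json
import re

# Illustrative rows, trimmed from the samples earlier in this post.
CATALOG = [
    {
        "node_name": "slack",
        "display_name": "Slack",
        "description": "Send and read messages, manage channels",
        "operations_supported": ["message", "channel", "user", "reaction"],
        "credentials_required": ["slackApi"],
        "properties_schema": '[{"name":"resource","type":"options"},'
                             '{"name":"operation","type":"options"}]',
    },
    {
        "node_name": "lmChatOpenAi",
        "display_name": "OpenAI Chat Model",
        "operations_supported": [],
        "credentials_required": ["openAiApi"],
    },
]


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def row_text(row):
    """The text you would embed: label, description, operations, credentials."""
    parts = [row.get("display_name", ""), row.get("description", "")]
    parts += row.get("operations_supported") or []
    parts += row.get("credentials_required") or []
    return " ".join(parts)


def retrieve(query, rows, top_n=1):
    """Stand-in for vector search: rank rows by tokens shared with the query."""
    q = tokens(query)
    ranked = sorted(rows, key=lambda r: len(q & tokens(row_text(r))), reverse=True)
    return ranked[:top_n]


def unknown_parameters(row, parameters):
    """Top-level parameter names that properties_schema does not declare."""
    declared = {d["name"] for d in json.loads(row.get("properties_schema") or "[]")}
    return sorted(set(parameters) - declared)


best = retrieve("send a message to a channel", CATALOG)[0]
print(best["node_name"])  # slack

# Parameters as an agent might emit them; "sendDM" is a hallucinated field.
emitted = {"resource": "message", "operation": "post", "sendDM": True}
print(unknown_parameters(best, emitted))  # ['sendDM']
```

In a real setup you would swap `retrieve` for a call to your vector store and keep the validation step as the last gate before deploy.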
This is the same shape as RAG over a tool catalog, which is becoming a pattern in its own right.
Caveats
- The properties schema is a top-level summary, not the full recursive parameter tree. For deep parameter shapes the `github_permalink` is your friend.
- Multi-version nodes only report the default version. If you need every version of a node, the source link covers it.
- License is CC-BY-4.0 on the catalog additions; the n8n source itself is governed by n8n's own license, which you should respect when you ship.
Links
- Dataset: automatelab/n8n-nodes-catalog
- Browsable index: automatelab.tech/products/datasets/n8n-nodes-catalog/
- All open datasets: automatelab.tech/products/datasets/
If you build agent tooling on top of this, the thing I would most like to see is an open eval set: prompts in, expected n8n workflow JSON out. That is the next obvious missing piece, and I do not think anyone has shipped one yet.