DEV Community

FlowYantra


How to Run AI Workflows Locally with n8n + Ollama (No API Costs)

Every AI workflow tutorial assumes you are paying OpenAI $0.03 per 1K tokens. But what if you could run the same workflows locally, with zero API costs, and keep your data on your own machine?

You can. Here is how to connect n8n with Ollama to build local AI workflows.


What You Need

  • n8n (self-hosted or desktop) -- install guide
  • Ollama -- local LLM runner, dead simple to install
  • A machine with at least 8GB RAM (16GB recommended for larger models)

That is it. No API keys, no billing dashboards, no usage limits.


Step 1: Install Ollama

Head to ollama.com and download the installer for your OS. On macOS and Windows it is a standard installer. On Linux:

curl -fsSL https://ollama.com/install.sh | sh

Verify it is running:

ollama --version

Ollama runs a local API server on http://localhost:11434 by default. This is what n8n will talk to.
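
If you prefer to script that check, here is a minimal Python sketch that probes Ollama's /api/tags endpoint (which lists the models you have pulled). The helper name ollama_is_up is my own:

```python
import json
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if a local Ollama server answers on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.load(resp)  # /api/tags returns the pulled models as JSON
        return True
    except (OSError, ValueError):
        return False

print("Ollama running:", ollama_is_up())
```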


Step 2: Pull a Model

Ollama supports dozens of open-source models. For workflow automation, I recommend starting with one of these:

# Fast and lightweight (3.8B params) -- good for summarization and extraction
ollama pull phi3

# More capable (7B params) -- good for content generation
ollama pull mistral

phi3 runs comfortably on 8GB RAM. mistral needs about 8GB free and runs better with 16GB.

Test it works:

ollama run phi3 "Summarize this in one sentence: n8n is an open-source workflow automation tool."

You should see a response within a few seconds. The model is loaded and ready.


Step 3: Connect n8n to Ollama

Ollama exposes a simple HTTP API (plus an OpenAI-compatible one, covered later). In n8n, you connect to it using the HTTP Request node -- no special plugin needed.

The endpoint you will call:

POST http://localhost:11434/api/generate

The request body:

{
  "model": "phi3",
  "prompt": "Your prompt here",
  "stream": false
}

Setting stream: false is important -- it makes Ollama return the complete response in one JSON object instead of streaming chunks.
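
To see why this matters: with stream: true, Ollama emits one JSON object per line (NDJSON), and you would have to stitch the pieces back together yourself. A small Python illustration -- the chunk contents here are made up:

```python
import json

# Two illustrative streamed chunks, one JSON object per line (NDJSON).
streamed = ('{"response": "n8n is an automation tool", "done": false}\n'
            '{"response": ".", "done": true}')

def join_stream(ndjson: str) -> str:
    """Concatenate the "response" field of each streamed chunk."""
    return "".join(json.loads(line)["response"] for line in ndjson.splitlines())

print(join_stream(streamed))  # the same text stream:false returns in one object
```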

The response looks like:

{
  "model": "phi3",
  "response": "The generated text appears here...",
  "done": true
}

You grab {{ $json.response }} in the next node and use it however you want.
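
Outside n8n, the same round trip is a few lines of Python. This is a sketch assuming a running Ollama server with phi3 pulled; build_body and extract_text are hypothetical helper names:

```python
import json
import urllib.request

def build_body(prompt: str, model: str = "phi3") -> dict:
    # "stream": False makes Ollama return one complete JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def extract_text(reply: dict) -> str:
    # Same field the n8n expression {{ $json.response }} reads
    return reply["response"]

def ollama_generate(prompt: str, model: str = "phi3",
                    base_url: str = "http://localhost:11434") -> str:
    """POST /api/generate and return the generated text (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_body(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```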


Step 4: Build a Text Summarizer Workflow

Let us build a practical example: a webhook that accepts text and returns an AI-generated summary.

The workflow (4 nodes):

1. Webhook node (trigger)

  • Method: POST
  • Path: /summarize
  • This receives the text to summarize

2. HTTP Request node (Ollama call)

  • Method: POST
  • URL: http://localhost:11434/api/generate
  • Body (JSON):
{
  "model": "phi3",
  "prompt": "Summarize the following text in 2-3 sentences. Be concise and capture the key points.\n\nText: {{ $json.body.text }}",
  "stream": false
}

3. Set node (format response)

  • Set a field summary to {{ $json.response }}

4. Respond to Webhook node

  • Returns the summary to the caller
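
The prompt that node 2 assembles from the webhook payload can be sketched as a plain function -- build_summarize_prompt and build_request_body are illustrative names, not part of n8n:

```python
def build_summarize_prompt(text: str) -> str:
    """Mirror of the n8n expression inside the HTTP Request node's JSON body."""
    return ("Summarize the following text in 2-3 sentences. "
            "Be concise and capture the key points.\n\nText: " + text)

def build_request_body(text: str) -> dict:
    """The body n8n sends to Ollama for a given webhook payload."""
    return {"model": "phi3", "prompt": build_summarize_prompt(text), "stream": False}
```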

Test it:

curl -X POST http://localhost:5678/webhook/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "n8n is a workflow automation tool that allows users to connect various services and automate tasks. It supports over 400 integrations and can be self-hosted for complete data privacy. Unlike SaaS alternatives, n8n has no per-task pricing, making it cost-effective for high-volume automation."}'

You get back a clean summary, generated locally, with zero API costs.


Going Further: Chat Completions API

Ollama also supports the OpenAI-compatible chat completions endpoint:

POST http://localhost:11434/v1/chat/completions

This means you can use n8n's built-in OpenAI node by pointing it at http://localhost:11434/v1 as a custom base URL. Same node, same interface, but the model runs on your hardware.
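
As a sketch of that OpenAI-compatible shape (assuming a running Ollama server; chat_body and chat are my own names):

```python
import json
import urllib.request

def chat_body(user_message: str, model: str = "phi3") -> dict:
    # Standard OpenAI-style chat payload; Ollama serves this shape under /v1
    return {"model": model,
            "messages": [{"role": "user", "content": user_message}]}

def chat(user_message: str, model: str = "phi3",
         base_url: str = "http://localhost:11434/v1") -> str:
    """Needs a running Ollama server; returns the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_body(user_message, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```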


When to Use Local vs Cloud AI

Use Case              Local (Ollama)       Cloud (OpenAI/Claude)
Summarization         Great                Overkill
Text extraction       Great                Overkill
Content generation    Good (7B+ models)    Better quality
Complex reasoning     Limited              Much better
Data privacy          Full control         Data leaves your machine
Cost at scale         Free                 Adds up fast

For most automation tasks -- summarizing, extracting, classifying, reformatting -- a local 7B model is more than enough. Save the cloud APIs for tasks that genuinely need GPT-4 level reasoning.
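
If a single workflow mixes both, that rule of thumb can be encoded as a trivial router. Purely illustrative -- the task names and helper are my own, not an n8n feature:

```python
# Tasks a local 7B-class model handles well, per the rule of thumb above.
LOCAL_TASKS = {"summarization", "extraction", "classification", "reformatting"}

def pick_backend(task: str) -> str:
    """Route a task to the local model or a cloud API."""
    return "local" if task.lower() in LOCAL_TASKS else "cloud"
```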


What We Are Building

At FlowYantra, we are working on n8n templates that use local LLMs for privacy-first automation. If you need a cloud-based AI workflow right now, check out our Blog to Social AI template -- it uses OpenAI today but the architecture is model-agnostic, so swapping to Ollama is straightforward.

All our templates (free and paid) are on our Gumroad store.

Local AI is not the future. It is already here. And with n8n, it takes about 20 minutes to set up.
