DEV Community

FlowYantra


How to Run AI Workflows Locally with n8n + Ollama (No API Costs)

Every AI workflow tutorial assumes you are paying OpenAI $0.03 per 1K tokens. But what if you could run the same workflows locally, with zero API costs, and keep your data on your own machine?

You can. Here is how to connect n8n with Ollama to build local AI workflows.


What You Need

  • n8n (self-hosted or desktop) -- install guide
  • Ollama -- local LLM runner, dead simple to install
  • A machine with at least 8GB RAM (16GB recommended for larger models)

That is it. No API keys, no billing dashboards, no usage limits.


Step 1: Install Ollama

Head to ollama.com and download the installer for your OS. On macOS and Windows it is a standard installer. On Linux:

curl -fsSL https://ollama.com/install.sh | sh

Verify it is running:

ollama --version

Ollama runs a local API server on http://localhost:11434 by default. This is what n8n will talk to.
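
If you prefer to script that check, here is a minimal Python sketch that probes Ollama's /api/tags endpoint (which lists the models you have pulled). The helper name ollama_is_up is my own:

```python
import json
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if a local Ollama server answers on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.load(resp)  # /api/tags returns the pulled models as JSON
        return True
    except (OSError, ValueError):
        return False

print("Ollama running:", ollama_is_up())
```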


Step 2: Pull a Model

Ollama supports dozens of open-source models. For workflow automation, I recommend starting with one of these:

# Fast and lightweight (3.8B params) -- good for summarization and extraction
ollama pull phi3

# More capable (7B params) -- good for content generation
ollama pull mistral

phi3 runs comfortably on 8GB RAM. mistral needs about 8GB free and runs better with 16GB.

Test it works:

ollama run phi3 "Summarize this in one sentence: n8n is an open-source workflow automation tool."

You should see a response within a few seconds. The model is loaded and ready.


Step 3: Connect n8n to Ollama

Ollama exposes a simple HTTP API (plus an OpenAI-compatible one, covered later). In n8n, you connect to it using the HTTP Request node -- no special plugin needed.

The endpoint you will call:

POST http://localhost:11434/api/generate

The request body:

{
  "model": "phi3",
  "prompt": "Your prompt here",
  "stream": false
}

Setting stream: false is important -- it makes Ollama return the complete response in one JSON object instead of streaming chunks.
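
To see why this matters: with stream: true, Ollama emits one JSON object per line (NDJSON), and you would have to stitch the pieces back together yourself. A small Python illustration -- the chunk contents here are made up:

```python
import json

# Two illustrative streamed chunks, one JSON object per line (NDJSON).
streamed = ('{"response": "n8n is an automation tool", "done": false}\n'
            '{"response": ".", "done": true}')

def join_stream(ndjson: str) -> str:
    """Concatenate the "response" field of each streamed chunk."""
    return "".join(json.loads(line)["response"] for line in ndjson.splitlines())

print(join_stream(streamed))  # the same text stream:false returns in one object
```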

The response looks like:

{
  "model": "phi3",
  "response": "The generated text appears here...",
  "done": true
}

You grab {{ $json.response }} in the next node and use it however you want.
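
Outside n8n, the same round trip is a few lines of Python. This is a sketch assuming a running Ollama server with phi3 pulled; build_body and extract_text are hypothetical helper names:

```python
import json
import urllib.request

def build_body(prompt: str, model: str = "phi3") -> dict:
    # "stream": False makes Ollama return one complete JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def extract_text(reply: dict) -> str:
    # Same field the n8n expression {{ $json.response }} reads
    return reply["response"]

def ollama_generate(prompt: str, model: str = "phi3",
                    base_url: str = "http://localhost:11434") -> str:
    """POST /api/generate and return the generated text (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_body(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```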


Step 4: Build a Text Summarizer Workflow

Let us build a practical example: a webhook that accepts text and returns an AI-generated summary.

The workflow (4 nodes):

1. Webhook node (trigger)

  • Method: POST
  • Path: /summarize
  • This receives the text to summarize

2. HTTP Request node (Ollama call)

  • Method: POST
  • URL: http://localhost:11434/api/generate
  • Body (JSON):
{
  "model": "phi3",
  "prompt": "Summarize the following text in 2-3 sentences. Be concise and capture the key points.\n\nText: {{ $json.body.text }}",
  "stream": false
}

3. Set node (format response)

  • Set a field summary to {{ $json.response }}

4. Respond to Webhook node

  • Returns the summary to the caller
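
The prompt that node 2 assembles from the webhook payload can be sketched as a plain function -- build_summarize_prompt and build_request_body are illustrative names, not part of n8n:

```python
def build_summarize_prompt(text: str) -> str:
    """Mirror of the n8n expression inside the HTTP Request node's JSON body."""
    return ("Summarize the following text in 2-3 sentences. "
            "Be concise and capture the key points.\n\nText: " + text)

def build_request_body(text: str) -> dict:
    """The body n8n sends to Ollama for a given webhook payload."""
    return {"model": "phi3", "prompt": build_summarize_prompt(text), "stream": False}
```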

Test it:

curl -X POST http://localhost:5678/webhook/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "n8n is a workflow automation tool that allows users to connect various services and automate tasks. It supports over 400 integrations and can be self-hosted for complete data privacy. Unlike SaaS alternatives, n8n has no per-task pricing, making it cost-effective for high-volume automation."}'

You get back a clean summary, generated locally, with zero API costs.


Going Further: Chat Completions API

Ollama also supports the OpenAI-compatible chat completions endpoint:

POST http://localhost:11434/v1/chat/completions

This means you can use n8n's built-in OpenAI node by pointing it at http://localhost:11434/v1 as a custom base URL. Same node, same interface, but the model runs on your hardware.
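
As a sketch of that OpenAI-compatible shape (assuming a running Ollama server; chat_body and chat are my own names):

```python
import json
import urllib.request

def chat_body(user_message: str, model: str = "phi3") -> dict:
    # Standard OpenAI-style chat payload; Ollama serves this shape under /v1
    return {"model": model,
            "messages": [{"role": "user", "content": user_message}]}

def chat(user_message: str, model: str = "phi3",
         base_url: str = "http://localhost:11434/v1") -> str:
    """Needs a running Ollama server; returns the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_body(user_message, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```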


When to Use Local vs Cloud AI

Use Case              Local (Ollama)       Cloud (OpenAI/Claude)
Summarization         Great                Overkill
Text extraction       Great                Overkill
Content generation    Good (7B+ models)    Better quality
Complex reasoning     Limited              Much better
Data privacy          Full control         Data leaves your machine
Cost at scale         Free                 Adds up fast

For most automation tasks -- summarizing, extracting, classifying, reformatting -- a local 7B model is more than enough. Save the cloud APIs for tasks that genuinely need GPT-4 level reasoning.
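
If a single workflow mixes both, that rule of thumb can be encoded as a trivial router. Purely illustrative -- the task names and helper are my own, not an n8n feature:

```python
# Tasks a local 7B-class model handles well, per the rule of thumb above.
LOCAL_TASKS = {"summarization", "extraction", "classification", "reformatting"}

def pick_backend(task: str) -> str:
    """Route a task to the local model or a cloud API."""
    return "local" if task.lower() in LOCAL_TASKS else "cloud"
```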


What We Are Building

At FlowYantra, we are working on n8n templates that use local LLMs for privacy-first automation. If you need a cloud-based AI workflow right now, check out our Blog to Social AI template -- it uses OpenAI today but the architecture is model-agnostic, so swapping to Ollama is straightforward.

All our templates (free and paid) are on our Gumroad store.

Local AI is not the future. It is already here. And with n8n, it takes about 20 minutes to set up.
