A real build log of a local AI content pipeline: what worked, what failed, and why the boring solutions beat the clever ones.
The Problem With Paid AI Writing Tools
If you run multiple content sites, the math on AI writing APIs turns ugly fast. Every draft, every rewrite, every metadata pass costs tokens. Multiply that across six blogs with different niches and different content strategies, and you're looking at a monthly API bill that eats into whatever AdSense is paying out.
The alternative most people land on is prompting ChatGPT manually and copy-pasting into WordPress. That's not automation. That's just a fancier way to do the same work with an extra tab open.
The Stack
- n8n — orchestration, running natively on Windows without Docker
- Ollama — local inference, serving Mistral on a GTX 1660 (6GB VRAM)
- WordPress REST API — draft delivery via application passwords, no plugins
One trigger. Six sub-workflows in sequence. Six WordPress drafts in 13 minutes at zero marginal cost.
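For the delivery step, the WordPress REST API with an application password needs nothing more than Basic auth and a JSON POST. A minimal sketch of how such a request could be assembled — the site URL, username, and password here are placeholders, and the function name is illustrative, not the workflow's exact node configuration:

```javascript
// Build a WordPress REST request that creates a post in "draft" status.
// Application passwords use plain Basic auth: base64("user:app-password").
function buildDraftRequest(site, user, appPassword, title, content) {
  const auth = Buffer.from(`${user}:${appPassword}`).toString("base64");
  return {
    url: `${site}/wp-json/wp/v2/posts`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Basic ${auth}`,
        "Content-Type": "application/json",
      },
      // status: "draft" keeps the post out of the live feed until review.
      body: JSON.stringify({ title, content, status: "draft" }),
    },
  };
}

// Usage: const { url, options } = buildDraftRequest(...); await fetch(url, options);
```

No plugins required — application passwords ship with WordPress core since 5.6, which is why the stack can stay plugin-free.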
What Broke First
phi4 Timed Out
phi4 at 9.1GB doesn't fit in 6GB of VRAM. It spills to CPU memory, inference slows to a crawl, and n8n times out before the draft completes. Mistral at 4.4GB fits entirely in VRAM. Reliable inference, pipeline completes. A draft with flaws beats a timeout with nothing.
JSON Output Format Broke Everything
Asking Mistral to return a JSON object with a multi-paragraph article in the content field is asking for parse errors. Newlines, apostrophes, and other special characters all break JSON parsing. Three sessions went into debugging the same failure from different angles.
The fix: plain text separators. ---TITLE---, ---CONTENT---. A Code node extracts sections using string indexing. String indexing doesn't care about special characters. It hasn't failed since.
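A minimal sketch of what that Code node could look like — the separator tokens come from the article, but the function name and structure are assumptions, not the pipeline's exact code:

```javascript
// Extract title and content from Mistral's separator-delimited output.
// Plain string indexing: quotes, apostrophes, and newlines in the body
// can't break anything, because nothing is being parsed as JSON.
function extractSections(raw) {
  const TITLE = "---TITLE---";
  const CONTENT = "---CONTENT---";
  const contentAt = raw.indexOf(CONTENT);
  const title = raw.slice(raw.indexOf(TITLE) + TITLE.length, contentAt).trim();
  const content = raw.slice(contentAt + CONTENT.length).trim();
  return { title, content };
}
```

The trade-off is that the model only has to emit two literal tokens in the right order — a far easier ask than escaping an entire article correctly.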
The Groq Detour That Wasted an Hour
The plan included a quality rewrite pass via Groq's free API tier. n8n's HTTP Request node kept double-encoding the request body. An hour of debugging, zero progress. Cut the rewrite pass. Human editorial review replaced it and does better work anyway.
What Actually Works
Separator-based output format — Mistral writes content, the Code node handles structure. Neither task interferes with the other.
JavaScript scoring instead of LLM ranking — Sending topics back to Mistral for ranking returned a ranked list of entirely different topics in place of the originals. A JavaScript scoring function running in milliseconds checks demand signals and niche keywords instead. Reliable and fast.
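A sketch of what such a scoring function might look like. The article only says it checks demand signals and niche keywords, so the field names, keyword list, and weights below are all assumptions:

```javascript
// Illustrative niche keywords -- in practice these would differ per blog.
const NICHE_KEYWORDS = ["budget", "beginner", "review"];

// Score one topic: demand signal plus a bonus per matched niche keyword.
function scoreTopic(topic) {
  let score = (topic.searchVolume || 0) / 100; // hypothetical demand field
  for (const kw of NICHE_KEYWORDS) {
    if (topic.title.toLowerCase().includes(kw)) score += 5;
  }
  return score;
}

// Deterministic ranking: same input, same order, every run.
function rankTopics(topics) {
  return [...topics].sort((a, b) => scoreTopic(b) - scoreTopic(a));
}
```

Unlike an LLM ranking pass, this can never invent topics — it can only reorder the list it was given, which is exactly the property that was missing.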
One click, walk away — The master executor chains six sub-workflows via Execute Sub-workflow nodes. Change the content type (info/hook/money) in one field, hit execute, come back in 13 minutes to six drafts.
When Local Is Enough
Mistral on a mid-range consumer GPU is enough when the output feeds a human editorial process, not when it needs to be publication-ready on its own. Topic clusters, first drafts, metadata suggestions: local handles all of it.
The pipeline is also designed to swap Ollama for a cloud API call without rebuilding anything. When revenue crosses the threshold where an API subscription costs less than the editorial time saved, the upgrade makes sense. Until then, local is the right call.
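The swap point can be kept that small because local and cloud inference both reduce to "prompt in, text out". A sketch of isolating the difference in one request-builder — the Ollama endpoint and model match the stack above, while the cloud branch is a placeholder shape, not any specific provider's API:

```javascript
// Build the generation request for whichever backend is active.
// Only this function changes when swapping Ollama for a cloud API.
function buildGenerateRequest(provider, prompt) {
  if (provider === "ollama") {
    return {
      url: "http://localhost:11434/api/generate",
      headers: { "Content-Type": "application/json" },
      body: { model: "mistral", prompt, stream: false },
    };
  }
  // Placeholder cloud branch -- endpoint, model, and key are hypothetical,
  // to be filled in when the revenue threshold is crossed.
  return {
    url: "https://api.example.com/v1/generate",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer API_KEY",
    },
    body: { model: "cloud-model", prompt },
  };
}
```

Everything downstream of this function — separators, scoring, WordPress delivery — stays identical either way, which is what makes the upgrade a one-field change rather than a rebuild.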