<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Robin Converse</title>
    <description>The latest articles on DEV Community by Robin Converse (@cloudninealt).</description>
    <link>https://dev.to/cloudninealt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F577245%2F67494a6c-062c-413e-be2a-76933938bbce.jpeg</url>
      <title>DEV Community: Robin Converse</title>
      <link>https://dev.to/cloudninealt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cloudninealt"/>
    <language>en</language>
    <item>
      <title>Self-Hosting Gemma 4 for Production Automation Revealed Two Ollama Bugs</title>
      <dc:creator>Robin Converse</dc:creator>
      <pubDate>Sat, 16 May 2026 11:27:31 +0000</pubDate>
      <link>https://dev.to/cloudninealt/self-hosting-gemma-4-for-production-automation-revealed-two-ollama-bugs-1oo4</link>
      <guid>https://dev.to/cloudninealt/self-hosting-gemma-4-for-production-automation-revealed-two-ollama-bugs-1oo4</guid>
      <description>&lt;p&gt;I thought Gemma 4's reasoning traces were wasting tokens. During testing, I realized they were acting as an audit layer for automation. That realization changed how I designed an n8n node for self-hosted AI workflows.&lt;/p&gt;

&lt;p&gt;In most automation systems, the model output is the only thing the operator sees. But once AI starts triggering downstream workflows, hidden reasoning becomes operationally important. If the model is making decisions on behalf of a business, the logic path matters as much as the final response.&lt;/p&gt;

&lt;p&gt;Here's what I built, what I found, and what it means for AI automation on owned infrastructure.&lt;/p&gt;




&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;An n8n community node that connects any n8n workflow to a self-hosted Gemma 4 26B MoE endpoint. The node calls Ollama's native &lt;code&gt;/api/generate&lt;/code&gt; API, returns clean text, and works with a custom model called &lt;code&gt;triava-prod&lt;/code&gt; — a Gemma 4 26B derivative with Triava Labs' brand voice baked in.&lt;/p&gt;

&lt;p&gt;The tagline for Triava Labs is "Your model. Your voice. Your business." This node operationalizes that idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/triavalabs/n8n-nodes-triava" rel="noopener noreferrer"&gt;github.com/triavalabs/n8n-nodes-triava&lt;/a&gt;&lt;/p&gt;
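&lt;p&gt;The node's core request is a single POST to the native endpoint. Here is a minimal Python sketch of the request and response shape (the field names follow Ollama's &lt;code&gt;/api/generate&lt;/code&gt; schema; the sample response body is illustrative, not real model output):&lt;/p&gt;

```python
import json

# Default Ollama port; behind Caddy this would be the HTTPS hostname instead.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str, system: str = "") -> dict:
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        # Optional per-call system prompt. Leave it empty for triava-prod,
        # whose voice is already baked into the Modelfile (see Bug 2 below).
        payload["system"] = system
    return payload

def extract_text(response_body: str) -> str:
    """The native endpoint folds reasoning and output into one 'response' field."""
    return json.loads(response_body)["response"]

payload = build_generate_payload("triava-prod:latest", "Write one X post about sovereign AI.")

# Illustrative response body, shaped like Ollama's non-streaming reply:
sample = '{"model": "triava-prod:latest", "response": "Own the model, own the voice.", "done": true}'
print(extract_text(sample))
```

&lt;p&gt;Setting &lt;code&gt;stream&lt;/code&gt; to false is the natural fit for a workflow node: one request, one complete JSON object, no token stream to reassemble.&lt;/p&gt;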




&lt;h2&gt;The Infrastructure&lt;/h2&gt;

&lt;p&gt;Everything runs on a single Hetzner CCX33 server: Ollama serving the model, Caddy as reverse proxy, Let's Encrypt for SSL.&lt;/p&gt;

&lt;p&gt;No GPU cluster.&lt;br&gt;
No cloud API dependency.&lt;br&gt;
One server, owned infrastructure, real inference.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;triava-prod&lt;/code&gt; is a Q4_K_M quantization of Gemma 4 26B MoE — 25.8B parameters loaded, roughly 4B active per token. Built using Ollama's Modelfile system with a custom system prompt that encodes Triava's brand voice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SYSTEM "You are a direct, professional AI assistant for independent operators.
Reply with the answer only. Never show reasoning, drafts, or thinking process.
Match the operator's voice and tone. Be concise unless asked for detail."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
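&lt;p&gt;For context, a derivative like this is built from a short Modelfile. A sketch (the actual &lt;code&gt;triava-prod&lt;/code&gt; Modelfile isn't published; &lt;code&gt;FROM&lt;/code&gt; and &lt;code&gt;SYSTEM&lt;/code&gt; are standard Ollama Modelfile directives, and the base model tag here is assumed):&lt;/p&gt;

```
# Illustrative Modelfile, not the published triava-prod file
FROM gemma4:26b

SYSTEM "You are a direct, professional AI assistant for independent operators.
Reply with the answer only. Never show reasoning, drafts, or thinking process.
Match the operator's voice and tone. Be concise unless asked for detail."
```

&lt;p&gt;Then &lt;code&gt;ollama create triava-prod -f Modelfile&lt;/code&gt; registers it, and it serves like any other local model.&lt;/p&gt;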






&lt;h2&gt;Why Gemma 4 26B MoE&lt;/h2&gt;

&lt;p&gt;The MoE design gives high-capability reasoning behavior at roughly 4B active-parameter inference cost per token. That means it runs at practical throughput on a single owned server — which is the whole point of sovereign infrastructure. A model that requires an A100 cluster isn't sovereign in any meaningful sense for an independent operator or small agency.&lt;/p&gt;

&lt;p&gt;Gemma 4 also introduced native system-role support. That matters specifically for this project because the brand voice &lt;em&gt;is&lt;/em&gt; a system prompt. The whole pipeline depends on reliable system-role adherence and consistent on-voice output.&lt;/p&gt;

&lt;p&gt;Then I actually tested it in production-like conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold inference on a Hetzner CCX33: &lt;strong&gt;~16-31 seconds&lt;/strong&gt; via &lt;code&gt;/api/generate&lt;/code&gt; for a full brand-voice response&lt;/li&gt;
&lt;li&gt;Output quality: coherent, on-tone, holds the voice across 150+ word outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model reasons before writing.&lt;/p&gt;

&lt;p&gt;What initially looked like a bug turned out to be a feature.&lt;/p&gt;




&lt;h2&gt;What I Actually Discovered&lt;/h2&gt;

&lt;p&gt;Two upstream Ollama bugs, found through methodical testing during the Phase 2 build.&lt;/p&gt;

&lt;h3&gt;Bug 1 — &lt;code&gt;/v1/chat/completions&lt;/code&gt; returns empty content for all Gemma 4 models&lt;/h3&gt;

&lt;p&gt;(&lt;a href="https://github.com/ollama/ollama/issues/15288" rel="noopener noreferrer"&gt;Ollama issue #15288&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;When using Gemma 4 via the OpenAI-compatible endpoint, the content field is always empty. The reasoning trace exhausts the &lt;code&gt;max_tokens&lt;/code&gt; budget before any final output is generated. I confirmed this affects the base &lt;code&gt;gemma4:26b&lt;/code&gt; model too — it's not a Modelfile issue.&lt;/p&gt;

&lt;p&gt;I diagnosed it with five comparative curl tests: three against &lt;code&gt;/v1/chat/completions&lt;/code&gt; (all empty), two against &lt;code&gt;/api/generate&lt;/code&gt; (both clean). The native endpoint folds reasoning and output into one &lt;code&gt;response&lt;/code&gt; field and runs 4× faster — ~16s vs ~60s.&lt;/p&gt;

&lt;p&gt;Decision: the node targets &lt;code&gt;/api/generate&lt;/code&gt;. This isn't a workaround — it's the correct endpoint for Gemma 4 on Ollama right now.&lt;/p&gt;
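&lt;p&gt;The difference in response shape is easy to see side by side. A minimal Python sketch (the response bodies are illustrative; the field paths follow the OpenAI-compatible and native Ollama schemas, and the empty &lt;code&gt;content&lt;/code&gt; mirrors what issue #15288 describes):&lt;/p&gt;

```python
import json

def openai_content(body: str) -> str:
    """OpenAI-compatible endpoint: text lives at choices[0].message.content."""
    return json.loads(body)["choices"][0]["message"]["content"]

def native_content(body: str) -> str:
    """Native /api/generate: reasoning and output are folded into 'response'."""
    return json.loads(body)["response"]

# Illustrative bodies. The chat completion comes back empty after the
# reasoning trace exhausts max_tokens; the native endpoint returns clean text.
chat_body = '{"choices": [{"message": {"role": "assistant", "content": ""}, "finish_reason": "length"}]}'
gen_body = '{"model": "gemma4:26b", "response": "A clean brand-voice answer.", "done": true}'

print(repr(openai_content(chat_body)))  # empty string, per issue #15288
print(native_content(gen_body))
```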

&lt;h3&gt;Bug 2 — Long system prompts return empty responses on the 26B MoE&lt;/h3&gt;

&lt;p&gt;(&lt;a href="https://github.com/ollama/ollama/issues/15428" rel="noopener noreferrer"&gt;Ollama issue #15428&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The Gemma 4 26B MoE returns empty output when the combined system prompt exceeds roughly 500 characters. Dense models handle the same prompt correctly. This is isolated to the MoE architecture.&lt;/p&gt;

&lt;p&gt;Practical implication: &lt;code&gt;triava-prod&lt;/code&gt; already has the brand voice baked into its Modelfile. Passing an additional system prompt from the node adds to the total and can hit the threshold. The fix: leave the System Prompt field blank when using &lt;code&gt;triava-prod&lt;/code&gt;. The voice lives in the model, not in the API call.&lt;/p&gt;

&lt;p&gt;This is what "your voice" actually means architecturally. The brand voice isn't injected per-call. It lives in the model you own.&lt;/p&gt;
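&lt;p&gt;Until the upstream fix lands, a length guard in the calling code is cheap insurance. A Python sketch (the ~500-character threshold is the empirical figure from issue #15428, not a documented limit; the helper and its warning text are mine):&lt;/p&gt;

```python
# Empirical threshold from Ollama issue #15428, not a documented limit.
MOE_SYSTEM_PROMPT_LIMIT = 500

def check_system_prompt(baked_in: str, per_call: str) -> list[str]:
    """Warn when the combined system prompt risks an empty 26B MoE response."""
    warnings = []
    if baked_in and per_call:
        warnings.append("Model already carries a baked-in system prompt; "
                        "leave the per-call System Prompt field blank.")
    total = len(baked_in) + len(per_call)
    if total > MOE_SYSTEM_PROMPT_LIMIT:
        warnings.append(f"Combined system prompt is {total} chars; "
                        f"the 26B MoE may return an empty response.")
    return warnings

# triava-prod: the voice lives in the Modelfile, so the per-call field stays empty.
print(check_system_prompt(baked_in="x" * 300, per_call=""))  # no warnings
```

&lt;p&gt;The node-level decision above is the zero-code version of the same rule: the voice ships in the model, so the API call carries none of it.&lt;/p&gt;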




&lt;h2&gt;The Demo&lt;/h2&gt;

&lt;p&gt;The workflow: Manual Trigger → Triava node → brand-voice content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;n8n Workflow
     ↓
Triava Node
     ↓
Ollama /api/generate
     ↓
triava-prod (Gemma 4 26B MoE)
     ↓
Reasoning + Output
     ↓
Clean text → downstream automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reasoning trace can optionally be retained for auditability before execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fku43vorlmu31hr3okcx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fku43vorlmu31hr3okcx2.png" alt="Triava node executing in n8n — model set to triava-prod:latest, message prompt visible, system prompt blank, inference running" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write 3 X post variants (max 280 characters each) about this topic.

Topic: Building a sovereign n8n node for Gemma 4
Angle: The reasoning trace is an auditing layer, not noise
Voice: First-person, direct, technically honest. No hype.

Number them V1, V2, V3. No hashtags.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz8sviuk47p9101rh7ha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz8sviuk47p9101rh7ha.png" alt="Triava node output showing Node executed successfully with V1 V2 V3 brand-voice X post variants in the JSON panel" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Output from &lt;code&gt;triava-prod&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;V1:&lt;/strong&gt; Building a sovereign n8n node for Gemma 4. The reasoning trace isn't noise; it's an auditing layer. You need to be able to verify the logic behind the automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V2:&lt;/strong&gt; I'm working on a Gemma 4 node for n8n. I'm treating the reasoning trace as an audit log rather than extra text. For sovereign workflows, seeing the steps taken is more important than a clean output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V3:&lt;/strong&gt; Developing an n8n node for Gemma 4. The reasoning trace is often seen as clutter, but I'm using it as an auditing layer. It provides the transparency necessary to trust automated decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ZQmto4PvzC0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;




&lt;h2&gt;Model Selection — The Honest Summary&lt;/h2&gt;

&lt;p&gt;I picked the 26B MoE. I tested it. I found two real bugs. I made deliberate engineering decisions based on what the tests showed.&lt;/p&gt;

&lt;p&gt;The 26B MoE delivers high-capability reasoning behavior at ~4B active-parameter inference cost on hardware an independent operator can actually own. It has native system-role support that makes brand-voice workflows possible. And its reasoning behavior — which initially looked like a problem — turns out to be an auditing layer that makes the model's logic inspectable before it triggers downstream automation.&lt;/p&gt;

&lt;p&gt;If automation is going to make decisions on behalf of operators, the reasoning layer cannot remain invisible.&lt;/p&gt;

&lt;p&gt;That last point isn't something I planned to write about. It's something I observed. Which is the only kind of model-selection story worth telling.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The OpenAI-compatible path (&lt;code&gt;/v1/chat/completions&lt;/code&gt;) is a real goal for Triava Labs — if the upstream Ollama issue gets resolved, the node's architecture is already designed to support it. That's a v1.5 roadmap item, not a contest deliverable.&lt;/p&gt;

&lt;p&gt;The node is at &lt;a href="https://github.com/triavalabs/n8n-nodes-triava" rel="noopener noreferrer"&gt;github.com/triavalabs/n8n-nodes-triava&lt;/a&gt;. npm publish is in progress via GitHub Actions with provenance.&lt;/p&gt;

&lt;p&gt;Triava Labs v1 is in active development at triavalabs.com. The node is the first production component of the broader Triava Labs infrastructure.&lt;/p&gt;

&lt;p&gt;The deeper lesson from this build was that self-hosting a model is only part of sovereignty. The other part is being able to inspect the model's reasoning before automation turns it into action.&lt;/p&gt;

&lt;h2&gt;Update — May 16, 2026&lt;/h2&gt;

&lt;p&gt;Since publishing, an unexpected cross-article thread emerged with &lt;a href="https://dev.to/alimafana"&gt;@alimafana&lt;/a&gt;, who independently hit complementary Gemma 4 26B MoE failure modes from a completely different deployment context — a production Arabic e-commerce chat router on Google AI Studio rather than self-hosted Ollama.&lt;/p&gt;

&lt;p&gt;Their finding: MoE and Dense handle ambiguous instructions in opposite ways. Same prompt, two architectures, inverse failures.&lt;/p&gt;

&lt;p&gt;The intersection: both findings point to the same underlying picture — each Gemma 4 variant has its own tax, paid on different inputs. Their behavioral observation from the application layer and my infrastructure-level bug documentation appear to be two angles on the same architectural reality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The upstream bugs filed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ollama/ollama/issues/15288" rel="noopener noreferrer"&gt;Ollama issue #15288&lt;/a&gt; — &lt;code&gt;/v1/chat/completions&lt;/code&gt; empty content for all Gemma 4 models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ollama/ollama/issues/15428" rel="noopener noreferrer"&gt;Ollama issue #15428&lt;/a&gt; — long system prompts return empty responses on the 26B MoE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Related:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/alimafana/i-added-three-rules-to-gemma-4-the-moe-searched-the-dense-model-refused-1j18"&gt;@alimafana's submission&lt;/a&gt; — "I Added Three Rules to Gemma 4. The MoE Searched. The Dense Model Refused."&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built by Robin Converse · &lt;a href="https://triavalabs.com" rel="noopener noreferrer"&gt;Triava Labs&lt;/a&gt; · "Your model. Your voice. Your business."&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
