
Hugo Kuznicki


How I Run My Content Tooling on a Local Model for $0

A few months ago I added up what I was spending on AI APIs just to draft social posts. It wasn't a lot — a few dollars here, a few there — but it was a recurring cost for something I do every single day. And every time I wanted to experiment, regenerate, or tweak a prompt, a little meter ticked in the back of my head telling me to stop wasting tokens.

So I moved the whole thing local. No API keys, no per-token billing, nothing leaving my machine. Here's exactly how, including the parts that aren't as clean as the pitch.

Why local at all?

Three reasons, in order of how much they actually mattered to me:

  1. Cost goes to zero. Not "cheaper" — zero. Once the model is on your disk, generating a thousand drafts costs the same as generating one.
  2. Iteration becomes free, which changes your behavior. This is the part nobody tells you. When each generation is metered, you ration attempts. When it's free, you regenerate aggressively — and the output gets better because you stop being precious about it.
  3. Privacy by default. My prompts, drafts, and half-baked ideas never touch a third-party server. For content I haven't published yet, that's a real comfort.

The setup: Ollama in five minutes

Ollama is the easiest way to run an LLM locally. Install it, pull a model, and you've got an HTTP server on localhost that speaks a simple API.

# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an instruct-tuned model
ollama pull llama3.1:8b

# It's now serving on http://localhost:11434

That's the entire infrastructure. No account, no key, no dashboard. The model runs as a local service and you talk to it over HTTP like any other API — except this one is on your machine and free.
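
Before wiring anything else up, it can help to confirm the server is reachable and the model is actually pulled. Here's a minimal sketch, assuming Ollama's /api/tags endpoint (which lists local models) and the default port. The helper names model_names and installed_models are my own, and it sticks to the standard library so nothing extra needs installing:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address from the setup above


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def installed_models(base_url=OLLAMA_URL, timeout=5):
    """Ask the local server which models it has pulled.

    Raises OSError (e.g. connection refused) if the server is not running.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

With the server running, installed_models() should return a list that includes "llama3.1:8b". A connection error usually means the service isn't up yet.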

The pipeline

My content workflow is deliberately boring: one topic in, a batch of platform-specific posts out. The whole thing is a thin layer around three ideas — a per-platform prompt template, a call to the local model, and a tiny bit of cleanup.

Here's the core call. Ollama exposes a /api/generate endpoint:

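import requests

def generate(prompt, model="llama3.1:8b"):
    # stream=False returns one JSON object instead of a stream of chunks
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # a cold model can take a while to load
    )
    resp.raise_for_status()  # fail loudly if the model is missing or the server errors
    return resp.json()["response"].strip()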

No SDK, no auth header, no OPENAI_API_KEY in your environment. It's just a POST to localhost.
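
If you'd rather watch text arrive as it's generated, the same endpoint can stream. What follows is a sketch rather than what my tool does: it assumes that with "stream": True the server sends newline-delimited JSON objects, each carrying a "response" fragment, with "done" set to true on the last one. The names join_stream and generate_streaming are hypothetical, and it uses the standard library's urllib instead of requests so it runs without extra installs:

```python
import json
import urllib.request


def join_stream(lines):
    """Concatenate the "response" fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the reply
    return "".join(parts).strip()


def generate_streaming(prompt, model="llama3.1:8b", timeout=120):
    """POST with stream=True and assemble the reply as it arrives."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the response object yields one line of bytes at a time.
        return join_stream(resp)
```

For batch drafting, the non-streaming call is simpler. Streaming mostly pays off when a UI needs to show progress.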

The interesting part is the templating. Each platform gets its own prompt with its own constraints baked in:

TEMPLATES = {
    "twitter": (
        "Write 3 punchy tweet hooks about: {topic}\n"
        "Rules: under 280 chars, no hashtags, no emoji spam, "
        "lead with the most surprising angle."
    ),
    "linkedin": (
        "Write a short LinkedIn post about: {topic}\n"
        "Rules: 1 strong opening line, 3 short paragraphs, "
        "a question at the end. Plain language, no buzzwords."
    ),
    "thread": (
        "Outline a 5-tweet thread about: {topic}\n"
        "Each tweet on its own line, numbered, each able to stand alone."
    ),
}

def run(topic, platforms):
    out = {}
    for p in platforms:
        prompt = TEMPLATES[p].format(topic=topic)
        out[p] = generate(prompt)
    return out

Call run("local LLMs for content", ["twitter", "linkedin", "thread"]) and you get a dict of drafts back, generated entirely on your own hardware, for nothing.

The full tool I built around this adds a UI, a platform picker, and output cleanup, but the engine is genuinely this small. That's the point. Most of the value isn't in the model; it's in the templates that constrain the model into something usable.
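
The cleanup step is worth a sketch, since "write 3 hooks" comes back as one blob of text. This is an illustration, not the code from my tool: split_options is a hypothetical helper that assumes the model puts each option on its own line with a leading number or bullet, and it optionally drops anything over a character limit.

```python
import re

# Leading list markers such as "1.", "2)", "-", "*", or "•".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def split_options(raw, max_chars=None):
    """Turn a numbered or bulleted reply into a list of clean candidates."""
    options = []
    for line in raw.splitlines():
        text = _MARKER.sub("", line).strip()
        text = text.strip('"“”').strip()  # models often wrap hooks in quotes
        if not text:
            continue  # blank line or a bare marker
        if max_chars is not None and len(text) > max_chars:
            continue  # drop candidates that break the platform limit
        options.append(text)
    return options
```

Something like split_options(generate(prompt), max_chars=280) would then give a list of tweet candidates to choose from. A preamble line such as "Here are 3 hooks:" would slip through as a candidate, so a tighter prompt ("output only the hooks") or an extra filter is still needed.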

The thing that actually makes it good: tight prompts

Smaller local models are less forgiving than a frontier API. A vague prompt to a GPT-class hosted model still produces something passable. A vague prompt to an 8B local model produces mush. So the work shifts from "pay for a smarter model" to "write a sharper prompt."

Concretely, what moved quality the most:

  • Bake the constraints into the template, not the topic. Character limits, tone, structure — put them in the reusable template so every generation inherits them.
  • Ask for multiple options. "Write 3 hooks" beats "write a hook" — you pick the best and the model explores more of the space.
  • Keep a Modelfile for a custom system prompt if you find yourself repeating instructions:
FROM llama3.1:8b
SYSTEM """You are a concise copywriter. No clichés, no 'in today's
fast-paced world', no emoji unless asked. Plain, specific language."""
Then build the custom model from that file:
ollama create copywriter -f Modelfile

Now copywriter carries that voice everywhere and your per-call prompts get shorter.
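
On the Python side, switching to the custom model is just a different "model" value in the request body. The sketch below uses a hypothetical build_payload helper. It also shows two optional /api/generate fields I'm assuming you might want: "system", which overrides the Modelfile's system prompt for a single call, and "options" for sampling settings such as temperature.

```python
def build_payload(prompt, model="copywriter", system=None, temperature=None):
    """Assemble the JSON body for a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system is not None:
        # Overrides the SYSTEM message baked into the Modelfile for this call only.
        payload["system"] = system
    if temperature is not None:
        payload["options"] = {"temperature": temperature}
    return payload
```

Passing the result as the json= argument of the earlier requests.post call keeps the per-call prompt short while the Modelfile carries the voice.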

The honest tradeoffs

I'm not going to pretend local is strictly better. It isn't.

  • Long-form coherence is weaker. For short-form (hooks, captions, threads) local models are great. For a 2,000-word essay that needs to hold an argument, a frontier API still wins. Know which job you're doing.
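  • Cold-start latency is real. Ollama unloads an idle model after about five minutes by default, and the first request after that is slow while it reloads. If you generate in bursts, keep it warm: leave ollama run open in the background, send a periodic keepalive ping, or raise the keep_alive value on your requests.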
  • You own the ops. No hosted API means no one else patches, scales, or babysits it. For a personal tool that's fine; for a product serving others it's a real consideration.
  • Hardware matters. An 8B model is comfortable on a modern laptop; as a rough guide, the Ollama README suggests at least 8 GB of RAM for 7B-class models. Bigger models want more RAM/VRAM. Match the model to your machine instead of reaching for the biggest one.
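
For the cold-start point above, a keepalive ping can be very small. This is a sketch under a few assumptions: a request to /api/generate with no "prompt" just loads the model, "keep_alive" controls how long it stays in memory afterwards, and the "30m" value and the names warmup_payload and warm_up are placeholders of mine.

```python
import json
import urllib.request


def warmup_payload(model="llama3.1:8b", keep_alive="30m"):
    """Body for a request that loads the model without generating anything."""
    # No "prompt" key: the server is only asked to load the model.
    # keep_alive says how long it may stay in memory after this request.
    return {"model": model, "keep_alive": keep_alive, "stream": False}


def warm_up(model="llama3.1:8b", keep_alive="30m", timeout=120):
    """Send the warm-up request; returns True once the server reports done."""
    body = json.dumps(warmup_payload(model, keep_alive)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return bool(json.load(resp).get("done", False))
```

Calling warm_up() before a drafting session should move the slow load out of the first real generation.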

The trade I'm making — slightly less polish in exchange for $0 cost, full privacy, and unlimited iteration — is overwhelmingly worth it for high-frequency, templated work. That's most of what content generation actually is.

Wrapping up

The headline isn't "local models are magic." It's that for the specific job of churning out daily, templated content, the economics and the workflow both flip in local's favor — and the setup is genuinely a five-minute Ollama install plus a few prompt templates.

I packaged my own version of this into a small tool called Content Studio (idea → batch of posts, runs fully local, $0 to run) if you'd rather not wire it up yourself — it's on Gumroad and the open-source pieces live on my GitHub. And if you want the longer build-in-public breakdowns, I write them up in my newsletter.

But honestly — even if you build your own from the snippets above, do it. Watching your API bill hit $0 while your output goes up is a weirdly satisfying way to start a week.
