<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hugo Kuznicki</title>
    <description>The latest articles on DEV Community by Hugo Kuznicki (@hugo_kuznicki_1ff20709904).</description>
    <link>https://dev.to/hugo_kuznicki_1ff20709904</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4006064%2Fd2d4b695-b32c-4fe1-921d-4a92a08977e4.jpg</url>
      <title>DEV Community: Hugo Kuznicki</title>
      <link>https://dev.to/hugo_kuznicki_1ff20709904</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hugo_kuznicki_1ff20709904"/>
    <language>en</language>
    <item>
      <title>How I Run My Content Tooling on a Local Model for $0</title>
      <dc:creator>Hugo Kuznicki</dc:creator>
      <pubDate>Sun, 28 Jun 2026 04:58:53 +0000</pubDate>
      <link>https://dev.to/hugo_kuznicki_1ff20709904/how-i-run-my-content-tooling-on-a-local-model-for-0-1oig</link>
      <guid>https://dev.to/hugo_kuznicki_1ff20709904/how-i-run-my-content-tooling-on-a-local-model-for-0-1oig</guid>
      <description>&lt;p&gt;A few months ago I added up what I was spending on AI APIs just to draft social posts. It wasn't a lot — a few dollars here, a few there — but it was a &lt;em&gt;recurring&lt;/em&gt; cost for something I do every single day. And every time I wanted to experiment, regenerate, or tweak a prompt, a little meter ticked in the back of my head telling me to stop wasting tokens.&lt;/p&gt;

&lt;p&gt;So I moved the whole thing local. No API keys, no per-token billing, nothing leaving my machine. Here's exactly how, including the parts that aren't as clean as the pitch.&lt;/p&gt;

&lt;h2&gt;Why local at all?&lt;/h2&gt;

&lt;p&gt;Three reasons, in order of how much they actually mattered to me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
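&lt;strong&gt;Marginal cost goes to zero.&lt;/strong&gt; Not "cheaper", &lt;em&gt;zero&lt;/em&gt;. Once the model is on your disk, generating a thousand drafts costs the same in API fees as generating one: nothing. You still pay for the hardware and the electricity, but no meter runs per token.&lt;/li&gt;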
&lt;li&gt;
&lt;strong&gt;Iteration becomes free, which changes your behavior.&lt;/strong&gt; This is the part nobody tells you. When each generation is metered, you ration attempts. When it's free, you regenerate aggressively — and the output gets &lt;em&gt;better&lt;/em&gt; because you stop being precious about it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy by default.&lt;/strong&gt; My prompts, drafts, and half-baked ideas never touch a third-party server. For content I haven't published yet, that's a real comfort.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The setup: Ollama in five minutes&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is the easiest way I've found to run an LLM locally. Install it, pull a model, and you've got an HTTP server on &lt;code&gt;localhost&lt;/code&gt; that speaks a simple JSON API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install (macOS/Linux)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull an instruct-tuned model&lt;/span&gt;
ollama pull llama3.1:8b

&lt;span class="c"&gt;# It's now serving on http://localhost:11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire infrastructure. No account, no key, no dashboard. The model runs as a local service and you talk to it over HTTP like any other API — except this one is on your machine and free.&lt;/p&gt;
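
&lt;p&gt;If you want to confirm the server is actually up before wiring anything else, a quick check like the one below works. Treat it as a minimal sketch: it assumes the default address and uses Ollama's &lt;code&gt;/api/tags&lt;/code&gt; endpoint, which lists the models on disk. The helper names &lt;code&gt;model_names&lt;/code&gt; and &lt;code&gt;list_local_models&lt;/code&gt; are my own, not part of Ollama.&lt;/p&gt;

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def model_names(payload):
    """Extract model names from the JSON body returned by /api/tags."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL):
    """Ask the local server which models it has on disk."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

&lt;p&gt;Calling &lt;code&gt;list_local_models()&lt;/code&gt; should return a list that includes &lt;code&gt;llama3.1:8b&lt;/code&gt; once the pull has finished. If the server isn't running, the call raises an &lt;code&gt;OSError&lt;/code&gt; instead.&lt;/p&gt;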

&lt;h2&gt;The pipeline&lt;/h2&gt;

&lt;p&gt;My content workflow is deliberately boring: &lt;strong&gt;one topic in, a batch of platform-specific posts out.&lt;/strong&gt; The whole thing is a thin layer around three ideas — a per-platform prompt template, a call to the local model, and a tiny bit of cleanup.&lt;/p&gt;

&lt;p&gt;Here's the core call. Ollama exposes a &lt;code&gt;/api/generate&lt;/code&gt; endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
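        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# a cold start can take a while&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# fail loudly, e.g. if the model was never pulled&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;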
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SDK, no auth header, no &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in your environment. It's just a POST to localhost.&lt;/p&gt;
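
&lt;p&gt;The &lt;code&gt;"stream": False&lt;/code&gt; flag in that call is doing quiet work. Without it, Ollama streams the reply as one JSON object per line, each carrying a fragment in &lt;code&gt;response&lt;/code&gt; and a &lt;code&gt;done&lt;/code&gt; flag on the last one. If you ever want tokens as they arrive, the reassembly could look roughly like this. The function name &lt;code&gt;join_stream&lt;/code&gt; is hypothetical and this is a sketch, not how my tool does it.&lt;/p&gt;

```python
import json


def join_stream(lines):
    """Concatenate the text fragments from a streamed /api/generate reply.

    Each non-empty line is one JSON object; the final one has "done": true.
    Accepts str or bytes lines.
    """
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # ignore anything after the final object
    return "".join(parts).strip()
```

&lt;p&gt;With &lt;code&gt;requests&lt;/code&gt; you would pass &lt;code&gt;stream=True&lt;/code&gt; to &lt;code&gt;requests.post&lt;/code&gt;, drop the &lt;code&gt;"stream": False&lt;/code&gt; entry from the JSON body, and feed &lt;code&gt;resp.iter_lines()&lt;/code&gt; into this function. For batch drafting the non-streaming call is simpler, which is why I use it.&lt;/p&gt;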

&lt;p&gt;The interesting part is the templating. Each platform gets its own prompt with its own constraints baked in:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TEMPLATES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write 3 punchy tweet hooks about: {topic}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rules: under 280 chars, no hashtags, no emoji spam, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lead with the most surprising angle.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short LinkedIn post about: {topic}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rules: 1 strong opening line, 3 short paragraphs, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a question at the end. Plain language, no buzzwords.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Outline a 5-tweet thread about: {topic}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Each tweet on its own line, numbered, each able to stand alone.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;platforms&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;platforms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TEMPLATES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call &lt;code&gt;run("local LLMs for content", ["twitter", "linkedin", "thread"])&lt;/code&gt; and you get a dict of drafts back, generated entirely on your own hardware, for nothing.&lt;/p&gt;

&lt;p&gt;The real product wraps this with a UI, a platform picker, and output cleanup — but the engine is genuinely this small. That's the point. Most of the value isn't in the model; it's in the &lt;em&gt;templates&lt;/em&gt; that constrain the model into something usable.&lt;/p&gt;
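
&lt;p&gt;For a sense of what "output cleanup" can mean in practice, here is a minimal sketch. It is not the code from my tool: the name &lt;code&gt;clean_lines&lt;/code&gt; and the exact rules are assumptions for illustration. It splits a reply into lines, strips list numbering and stray quotes, and drops anything empty or longer than the 280-character limit from the Twitter template.&lt;/p&gt;

```python
import re

TWEET_LIMIT = 280  # same limit the twitter template asks for

# Leading list markers such as "1.", "2)", "-" or "*".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def clean_lines(text, limit=TWEET_LIMIT):
    """Split model output into items, strip numbering, drop unusable items."""
    items = []
    for line in text.splitlines():
        item = _MARKER.sub("", line).strip().strip('"')
        # Keep only items that are between 1 and `limit` characters long.
        if len(item) in range(1, limit + 1):
            items.append(item)
    return items
```

&lt;p&gt;Something like &lt;code&gt;clean_lines(run(topic, ["thread"])["thread"])&lt;/code&gt; then gives you a plain list of tweets. Small models often ignore length rules, so checking in code is cheaper than arguing with the prompt.&lt;/p&gt;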

&lt;h2&gt;The thing that actually makes it good: tight prompts&lt;/h2&gt;

&lt;p&gt;Smaller local models are less forgiving than frontier hosted models. A vague prompt to a GPT-class hosted model still produces something passable. A vague prompt to an 8B local model produces mush. So the work shifts from "pay for a smarter model" to "write a sharper prompt."&lt;/p&gt;

&lt;p&gt;Concretely, what moved quality the most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bake the constraints into the template, not the topic.&lt;/strong&gt; Character limits, tone, structure — put them in the reusable template so every generation inherits them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask for multiple options.&lt;/strong&gt; "Write 3 hooks" beats "write a hook" — you pick the best and the model explores more of the space.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a &lt;code&gt;Modelfile&lt;/code&gt; for a custom system prompt&lt;/strong&gt; if you find yourself repeating instructions:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; llama3.1:8b&lt;/span&gt;
SYSTEM """You are a concise copywriter. No clichés, no 'in today's
fast-paced world', no emoji unless asked. Plain, specific language."""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama create copywriter &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;copywriter&lt;/code&gt; carries that voice everywhere: pass &lt;code&gt;model="copywriter"&lt;/code&gt; to &lt;code&gt;generate&lt;/code&gt; and your per-call prompts get shorter.&lt;/p&gt;
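
&lt;p&gt;If you'd rather experiment with a voice before committing it to a Modelfile, the request body can carry it too. The sketch below assumes Ollama's &lt;code&gt;system&lt;/code&gt; and &lt;code&gt;options&lt;/code&gt; request fields. The helper &lt;code&gt;build_payload&lt;/code&gt; is a hypothetical name of mine, and it only builds the JSON body, which you would hand to the same POST as before.&lt;/p&gt;

```python
def build_payload(prompt, model="copywriter", system=None, temperature=None):
    """Build the JSON body for a non-streaming /api/generate call."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system is not None:
        # Per-request system prompt; it takes precedence over the
        # SYSTEM line baked into the model's Modelfile.
        payload["system"] = system
    if temperature is not None:
        # Sampling settings go under "options".
        payload["options"] = {"temperature": temperature}
    return payload
```

&lt;p&gt;Once a system prompt stops changing between runs, that's my cue to move it into the Modelfile and go back to the shorter call.&lt;/p&gt;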

&lt;h2&gt;The honest tradeoffs&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend local is strictly better. It isn't.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-form coherence is weaker.&lt;/strong&gt; For short-form (hooks, captions, threads) local models are great. For a 2,000-word essay that needs to hold an argument, a frontier API still wins. Know which job you're doing.&lt;/li&gt;
&lt;li&gt;
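&lt;strong&gt;Cold-start latency is real.&lt;/strong&gt; The first request after the model unloads is slow because the weights have to be loaded back into memory. Keep it warm if you generate in bursts: leave &lt;code&gt;ollama run&lt;/code&gt; open in another terminal, send a periodic keepalive request, or raise the &lt;code&gt;keep_alive&lt;/code&gt; value on your API calls.&lt;/li&gt;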
&lt;li&gt;
&lt;strong&gt;You own the ops.&lt;/strong&gt; No hosted API means no one else patches, scales, or babysits it. For a personal tool that's fine; for a product serving others it's a real consideration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware matters.&lt;/strong&gt; An 8B model is comfortable on a modern laptop (the default &lt;code&gt;llama3.1:8b&lt;/code&gt; download is roughly 5 GB). Bigger models want more RAM/VRAM. Match the model to your machine instead of reaching for the biggest one.&lt;/li&gt;
&lt;/ul&gt;
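
&lt;p&gt;The keepalive ping from the cold-start point can be very small. Here is a sketch under a couple of assumptions: the default address, and Ollama's documented behavior that a &lt;code&gt;/api/generate&lt;/code&gt; request without a prompt just loads the model, with &lt;code&gt;keep_alive&lt;/code&gt; controlling how long it stays in memory. The function names and the &lt;code&gt;"30m"&lt;/code&gt; value are my own choices.&lt;/p&gt;

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default address


def warmup_request(model="llama3.1:8b", keep_alive="30m"):
    """Build a request that loads the model without generating any text.

    Leaving out the prompt asks Ollama to load the model; keep_alive sets
    how long it stays in memory after this call.
    """
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_up(model="llama3.1:8b", keep_alive="30m"):
    """Send the warm-up request; call this before a burst of generations."""
    with urllib.request.urlopen(warmup_request(model, keep_alive), timeout=300) as resp:
        return resp.status == 200
```

&lt;p&gt;I'd call &lt;code&gt;warm_up()&lt;/code&gt; once at the start of a session rather than on a timer. The long timeout is there because this call is the one that pays the cold-start cost.&lt;/p&gt;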

&lt;p&gt;The trade I'm making — slightly less polish in exchange for $0 cost, full privacy, and unlimited iteration — is overwhelmingly worth it for high-frequency, templated work. That's most of what content generation actually is.&lt;/p&gt;

&lt;h2&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;The headline isn't "local models are magic." It's that &lt;strong&gt;for the specific job of churning out daily, templated content, the economics and the workflow both flip in local's favor&lt;/strong&gt; — and the setup is genuinely a five-minute Ollama install plus a few prompt templates.&lt;/p&gt;

&lt;p&gt;I packaged my own version of this into a small tool called &lt;strong&gt;Content Studio&lt;/strong&gt; (idea → batch of posts, runs fully local, $0 to run) if you'd rather not wire it up yourself — it's &lt;a href="https://kuznicki6.gumroad.com/l/kqusjo" rel="noopener noreferrer"&gt;on Gumroad&lt;/a&gt; and the open-source pieces live on &lt;a href="https://github.com/kuznickicapital-ship-it" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;. And if you want the longer build-in-public breakdowns, I write them up in &lt;a href="https://hugos-newsletter-e0c067.beehiiv.com/" rel="noopener noreferrer"&gt;my newsletter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But honestly — even if you build your own from the snippets above, do it. Watching your API bill hit $0 while your output goes &lt;em&gt;up&lt;/em&gt; is a weirdly satisfying way to start a week.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>localllm</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
