<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: StickyTr33</title>
    <description>The latest articles on DEV Community by StickyTr33 (@stickyhashtr33).</description>
    <link>https://dev.to/stickyhashtr33</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4000679%2Fd0fa7622-7da7-4aa4-b9a4-04b25b15a43b.png</url>
      <title>DEV Community: StickyTr33</title>
      <link>https://dev.to/stickyhashtr33</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stickyhashtr33"/>
    <language>en</language>
    <item>
      <title>Why I Run AI Locally Instead of Using ChatGPT for Client Work</title>
      <dc:creator>StickyTr33</dc:creator>
      <pubDate>Wed, 24 Jun 2026 14:01:27 +0000</pubDate>
      <link>https://dev.to/stickyhashtr33/why-i-run-ai-locally-instead-of-using-chatgpt-for-client-work-20l</link>
      <guid>https://dev.to/stickyhashtr33/why-i-run-ai-locally-instead-of-using-chatgpt-for-client-work-20l</guid>
      <description>&lt;p&gt;Let me start with a question my clients ask me a lot:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Can't we just use ChatGPT for this?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My answer is always the same: &lt;strong&gt;it depends on what "this" is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When "this" involves client intake forms for a law firm, tax documents for an accounting practice, or patient records for a medical office — the answer is no. And once I explain why, they always get it.&lt;/p&gt;

&lt;p&gt;This post is about that explanation, and the toolchain I actually use instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part Everyone Glosses Over
&lt;/h2&gt;

&lt;p&gt;When you send a prompt to ChatGPT or Claude via the API, that data leaves your network. It travels to a third-party server, gets processed, and comes back. The companies have policies about how they handle it — and you should read them — but the fundamental truth is: &lt;em&gt;you handed your client's sensitive information to someone else.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For a lot of use cases, that's totally fine. Write me a landing page? Sure, use whatever.&lt;/p&gt;

&lt;p&gt;But when the prompt contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attorney-client communications&lt;/li&gt;
&lt;li&gt;Personally Identifiable Information (PII)&lt;/li&gt;
&lt;li&gt;Financial records subject to confidentiality agreements&lt;/li&gt;
&lt;li&gt;Proprietary business logic a client has spent a decade refining&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you're in a different conversation. One that involves client trust, potential legal exposure, and in some industries, real regulatory obligations. HIPAA doesn't care that the AI gave a good answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Use Instead: Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is the cleanest tool I've found for running large language models locally. It runs on Mac, Linux, and Windows, wraps model management into a simple CLI, and exposes a local REST API. That API is compatible with the OpenAI format — which means most integrations you'd build against ChatGPT work against Ollama with one line changed.&lt;/p&gt;

&lt;p&gt;Getting started takes about five minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama (macOS/Linux)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a model — llama3.2 is a solid general-purpose starting point&lt;/span&gt;
ollama pull llama3.2

&lt;span class="c"&gt;# Start the server&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it's running, you have a local API at &lt;code&gt;http://localhost:11434&lt;/code&gt;. No API key. No rate limits. No bill at the end of the month. Here's a basic Python call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_local_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Example: summarize a client intake document
&lt;/span&gt;&lt;span class="n"&gt;intake_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Client Jane Doe, referred by attorney Martinez, is seeking...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask_local_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this client intake in 3 concise bullet points:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;intake_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The model runs on the local machine. The data never leaves the building.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Client Use Cases
&lt;/h2&gt;

&lt;p&gt;Here's where it gets concrete. These are the kinds of deployments I've built:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Law office — client intake summaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Attorneys were drowning in intake forms and needed quick summaries before consultations. The obvious fix is AI. The blocker: those forms contain PII, case details, and sometimes confidential disclosures that flat-out cannot go to a cloud provider.&lt;/p&gt;

&lt;p&gt;Solution: Ollama running on a local machine in their office, a Python script that reads the intake PDF, summarizes it with &lt;code&gt;llama3.2&lt;/code&gt;, and outputs a clean brief. Setup time: half a day. Data never leaves their network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accounting firm — document Q&amp;amp;A&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Staff needed to locate specific information across large financial documents and past filings quickly. Paired Ollama with a basic RAG (retrieval-augmented generation) pipeline — documents get chunked and embedded locally, queries get answered against the local vector store. The client's financial data stays on their server. As a bonus, it's actually faster than cloud solutions for this use case because there's zero round-trip latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small business — proprietary process assistant&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one was less about compliance and more about competitive advantage. The client had a pricing model they'd refined over ten years. They were not interested in that logic ending up anywhere near a third-party training pipeline. Local deployment was the only acceptable option, full stop.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Trade-offs
&lt;/h2&gt;

&lt;p&gt;I'm not going to oversell this. Here's what you give up going local:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model capability&lt;/strong&gt; — &lt;code&gt;llama3.2&lt;/code&gt; is impressive for its size. It is not GPT-4o. For pure reasoning tasks with no sensitivity concerns, the frontier cloud models still have an edge on harder problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware requirements&lt;/strong&gt; — Running a useful model locally needs real resources. I typically recommend at least 16GB of RAM and, ideally, a dedicated GPU. Clients who already have a server are usually fine. Clients on thin hardware turn into a hardware conversation first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup and maintenance overhead&lt;/strong&gt; — There's no sign-up-and-get-a-key path here. You're managing software, models, and updates. For non-technical clients, that means building something bulletproof or staying on the hook for maintenance.&lt;/p&gt;

&lt;p&gt;For the right client, these trade-offs are absolutely worth it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part I Didn't Expect
&lt;/h2&gt;

&lt;p&gt;The clients who care most about local deployment aren't always the most technical. They're often the ones who've been in business long enough to be careful. When I tell them their data stays in-house — no monthly API bill that scales with usage, no third-party terms of service to worry about, they own the whole stack — that lands differently than any feature comparison I could make.&lt;/p&gt;

&lt;p&gt;Local AI isn't for everyone. But when the fit is right, it's a genuinely different value proposition than "here's your ChatGPT wrapper with some prompt engineering on top."&lt;/p&gt;

&lt;p&gt;If you're building for clients who handle sensitive data, have this conversation before you default to the cloud. You might be surprised how often they've already been thinking about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm stickytr33 — I build AI integrations, local LLM deployments, and IT infrastructure for small businesses. If this is relevant to what you're working on, find me on &lt;a href="https://github.com/stickyhashtr33" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or drop a comment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>privacy</category>
      <category>python</category>
    </item>
  </channel>
</rss>
