<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: EveryLocalAI</title>
    <description>The latest articles on DEV Community by EveryLocalAI (@everylocalai).</description>
    <link>https://dev.to/everylocalai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976359%2F71d4622f-e778-4e5d-9bb3-63d38db82c18.png</url>
      <title>DEV Community: EveryLocalAI</title>
      <link>https://dev.to/everylocalai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/everylocalai"/>
    <language>en</language>
    <item>
      <title>How to Set Up a Local AI Coding Assistant in VS Code – Free &amp; Private</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Thu, 18 Jun 2026 09:02:36 +0000</pubDate>
      <link>https://dev.to/everylocalai/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private-2nkk</link>
      <guid>https://dev.to/everylocalai/how-to-set-up-a-local-ai-coding-assistant-in-vs-code-free-private-2nkk</guid>
      <description>&lt;p&gt;Want a Cursor/Copilot-style coding assistant that runs entirely on your machine? Your code never leaves your computer and there's no subscription fee. Here's how to set it up with VS Code, Continue, and Ollama.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tab autocomplete (like Copilot) that suggests code as you type&lt;/li&gt;
&lt;li&gt;Chat with your codebase - ask questions, generate functions, write tests&lt;/li&gt;
&lt;li&gt;100% local - zero data sent to any cloud service&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 24GB+ VRAM (RTX 3090/4090 or better)&lt;/li&gt;
&lt;li&gt;For smaller GPUs (8-12GB), use Qwen2.5 Coder 7B instead&lt;/li&gt;
&lt;li&gt;Ollama installed (see ollama.com)&lt;/li&gt;
&lt;li&gt;VS Code (free from code.visualstudio.com)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Pull the Model
&lt;/h2&gt;

&lt;p&gt;Open a terminal and pull a coding-focused model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5-coder:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes a few minutes depending on your internet. The model is ~8GB at Q4 quantization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Install Continue
&lt;/h2&gt;

&lt;p&gt;In VS Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Extensions (Ctrl+Shift+X)&lt;/li&gt;
&lt;li&gt;Search for "Continue"&lt;/li&gt;
&lt;li&gt;Click Install&lt;/li&gt;
&lt;li&gt;Reload VS Code when prompted&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 3: Configure
&lt;/h2&gt;

&lt;p&gt;Create or edit &lt;code&gt;~/.continue/config.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Qwen2.5 Coder 14B&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen2.5-coder:14b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;chat&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;edit&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Qwen2.5 Coder (autocomplete)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen2.5-coder:14b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;autocomplete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Use It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Autocomplete&lt;/strong&gt;: Start typing. Continue suggests completions in gray. Press Tab to accept.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat&lt;/strong&gt;: Press Ctrl+L (or Cmd+L on Mac) to open the chat panel. Ask questions about your code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit&lt;/strong&gt;: Select code and press Ctrl+Shift+L to ask for changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline&lt;/strong&gt;: Highlight code, press Ctrl+I, and describe what you want changed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Notes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3090 (24GB)&lt;/td&gt;
&lt;td&gt;Qwen2.5-Coder 14B&lt;/td&gt;
&lt;td&gt;25-35 tok/s&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 (24GB)&lt;/td&gt;
&lt;td&gt;Qwen2.5-Coder 14B&lt;/td&gt;
&lt;td&gt;40-50 tok/s&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 (12GB)&lt;/td&gt;
&lt;td&gt;Qwen2.5-Coder 7B&lt;/td&gt;
&lt;td&gt;30-40 tok/s&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4060 (8GB)&lt;/td&gt;
&lt;td&gt;Qwen2.5-Coder 7B (Q4)&lt;/td&gt;
&lt;td&gt;20-30 tok/s&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why Go Local?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$0/month&lt;/strong&gt; vs $20/seat for Copilot or Cursor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: your proprietary code never touches a third-party server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline&lt;/strong&gt;: works without internet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model choice&lt;/strong&gt;: swap models anytime, no vendor lock-in&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/local-cursor" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vscode</category>
    </item>
    <item>
      <title>Build Your Own Private ChatGPT in 15 Minutes – Local AI, Zero Cloud Cost</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Thu, 18 Jun 2026 09:01:47 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-your-own-private-chatgpt-in-15-minutes-local-ai-zero-cloud-cost-1gbm</link>
      <guid>https://dev.to/everylocalai/build-your-own-private-chatgpt-in-15-minutes-local-ai-zero-cloud-cost-1gbm</guid>
      <description>&lt;p&gt;Want a ChatGPT-like experience that runs entirely on your own GPU? No monthly fees, no data leaving your machine, and it works offline. Here's how to set it up in 15 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A full ChatGPT-style web UI running locally&lt;/li&gt;
&lt;li&gt;Your choice of open-source LLM (Qwen3 14B or Llama 3.1 8B)&lt;/li&gt;
&lt;li&gt;Multiple user accounts for your LAN&lt;/li&gt;
&lt;li&gt;100% private - nothing leaves your network&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 12GB+ VRAM (RTX 3060 12GB works great)&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose installed&lt;/li&gt;
&lt;li&gt;NVIDIA Container Toolkit for GPU passthrough (Linux) or WSL2 (Windows)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;open-webui&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_URL=http://ollama:11434&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;open-webui:/app/backend/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;strong&gt;&lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;&lt;/strong&gt;, create your admin account, pick &lt;code&gt;qwen3:14b&lt;/code&gt; from the dropdown, and start chatting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Great
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$0/month&lt;/strong&gt; vs $20/month for ChatGPT Plus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full privacy&lt;/strong&gt; - conversations stay on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works offline&lt;/strong&gt; - no internet connection needed after setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user&lt;/strong&gt; - share with family or your team on the same LAN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model switching&lt;/strong&gt; - swap between different models mid-conversation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;On an RTX 3060 12GB with Qwen3 14B (Q4): ~20-25 tok/s, smooth for chat. For 8GB cards, use Llama 3.1 8B instead.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/private-chatgpt" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
    </item>
    <item>
      <title>Run Qwen3.6-27B Locally: The Most Capable Open Model for a Single GPU</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Thu, 18 Jun 2026 08:34:28 +0000</pubDate>
      <link>https://dev.to/everylocalai/run-qwen36-27b-locally-the-most-capable-open-model-for-a-single-gpu-3bio</link>
      <guid>https://dev.to/everylocalai/run-qwen36-27b-locally-the-most-capable-open-model-for-a-single-gpu-3bio</guid>
      <description>&lt;h1&gt;
  
  
  Run Qwen3.6-27B Locally: The Most Capable Open Model for a Single GPU
&lt;/h1&gt;

&lt;p&gt;Qwen3.6-27B is a dense 27-billion parameter model from Alibaba that scores 77.2% on SWE-bench Verified — matching closed-source models like Claude Sonnet 4.5 on real-world coding tasks. It ships under Apache 2.0 license with native vision support, 262K context window, and hybrid thinking mode.&lt;/p&gt;

&lt;p&gt;Paired with Ollama for one-command serving and Open WebUI for a ChatGPT-like interface, this stack gives you a private AI assistant that rivals cloud services with no monthly fee.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes Qwen3.6-27B special
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vision understanding&lt;/strong&gt; — baked-in vision encoder, upload images and ask about them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;262K context window&lt;/strong&gt; — entire codebases or long documents in one pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid thinking&lt;/strong&gt; — shows reasoning before answering, skip with /no_think&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;77.2% SWE-bench&lt;/strong&gt; — competes with Sonnet 4.5 on real PRs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 license&lt;/strong&gt; — free for any use&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hardware requirements
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;VRAM needed&lt;/th&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;16-18 GB&lt;/td&gt;
&lt;td&gt;RTX 3090, RTX 4070 Ti Super, Mac 24GB+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q8_0&lt;/td&gt;
&lt;td&gt;28 GB&lt;/td&gt;
&lt;td&gt;RTX 4090, Mac 32GB+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BF16&lt;/td&gt;
&lt;td&gt;54 GB&lt;/td&gt;
&lt;td&gt;2x RTX 4090, A100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;Q4_K_M&lt;/strong&gt; sweet spot fits a single RTX 3090 (24GB, ~$750 used). On Mac, you need 24GB+ unified memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-command setup with Ollama
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama (macOS/Linux)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull Qwen3.6-27B (auto-selects Q4 for your hardware)&lt;/span&gt;
ollama pull qwen3.6:27b

&lt;span class="c"&gt;# Run it&lt;/span&gt;
ollama run qwen3.6:27b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Hybrid thinking is on by default — the model shows reasoning before answering. Use &lt;code&gt;/no_think&lt;/code&gt; for faster responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add a chat UI with Open WebUI
&lt;/h2&gt;

&lt;p&gt;Run Open WebUI alongside Ollama for a polished ChatGPT experience:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_URL=http://host.docker.internal:11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance on consumer GPUs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;Q4 speed&lt;/th&gt;
&lt;th&gt;Q8 speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3090 (24GB)&lt;/td&gt;
&lt;td&gt;25-35 tok/s&lt;/td&gt;
&lt;td&gt;15-20 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4070 Ti Super (16GB)&lt;/td&gt;
&lt;td&gt;10-15 tok/s&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac M4 Max (48GB)&lt;/td&gt;
&lt;td&gt;20-30 tok/s&lt;/td&gt;
&lt;td&gt;12-18 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac M2 Pro (24GB)&lt;/td&gt;
&lt;td&gt;10-15 tok/s&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost vs cloud
&lt;/h2&gt;

&lt;p&gt;Local: $0/month + $750 for RTX 3090. Claude Sonnet: $20/month + per-token charges. The GPU pays for itself in ~8 months of heavy API use. Plus complete privacy and no rate limits.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/qwen36-local-multimodal" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
    </item>
    <item>
      <title>Build a Private Windows AI Assistant with LM Studio and AnythingLLM</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Thu, 18 Jun 2026 08:33:50 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-a-private-windows-ai-assistant-with-lm-studio-and-anythingllm-4mki</link>
      <guid>https://dev.to/everylocalai/build-a-private-windows-ai-assistant-with-lm-studio-and-anythingllm-4mki</guid>
      <description>&lt;h1&gt;
  
  
  Build a Private Windows AI Assistant with LM Studio and AnythingLLM
&lt;/h1&gt;

&lt;p&gt;A fully private AI stack for Windows that never touches the cloud. LM Studio serves as your local model server with a visual interface — browse, download, and run models from HuggingFace without typing a single command. AnythingLLM adds document RAG, workspace isolation, and agent skills on top.&lt;/p&gt;

&lt;p&gt;This stack is built for &lt;strong&gt;Windows users who prefer a graphical interface&lt;/strong&gt; — no Docker, no terminal commands beyond the basics.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual model browser&lt;/strong&gt; — search HuggingFace models inside LM Studio, download with one click&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop-in document Q&amp;amp;A&lt;/strong&gt; — PDF, DOCX, TXT, CSV, code files. Drag them into AnythingLLM and ask questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No data leaves your PC&lt;/strong&gt; — all inference and embedding runs locally, works completely offline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Docker, no WSL, no CLI&lt;/strong&gt; — both apps are native Windows desktop installers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$0/month&lt;/strong&gt; — the only cost is the GPU you already own&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Windows 11 (64-bit)&lt;/li&gt;
&lt;li&gt;GPU with 4GB+ VRAM (6GB+ preferred), CPU works but slower&lt;/li&gt;
&lt;li&gt;16GB RAM minimum&lt;/li&gt;
&lt;li&gt;10-30GB free disk for models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install LM Studio
&lt;/h2&gt;

&lt;p&gt;Go to &lt;a href="https://lmstudio.ai" rel="noopener noreferrer"&gt;lmstudio.ai&lt;/a&gt; and download the Windows installer. Run it — default path is fine.&lt;/p&gt;

&lt;p&gt;LM Studio is both a model manager and a local OpenAI-compatible API server. You search models from Hugging Face visually and serve them over a local HTTP endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Download a model
&lt;/h2&gt;

&lt;p&gt;In LM Studio, go to the &lt;strong&gt;Discover&lt;/strong&gt; tab and search for &lt;code&gt;Qwen2.5-14B&lt;/code&gt;. Look for a &lt;strong&gt;Q4_K_M&lt;/strong&gt; quantized version — best balance of quality and size. Click &lt;strong&gt;Download&lt;/strong&gt; and wait (~8 GB).&lt;/p&gt;

&lt;p&gt;If you have 8GB VRAM or less, search for &lt;code&gt;Qwen2.5-7B&lt;/code&gt; or &lt;code&gt;Llama 3.2 3B&lt;/code&gt; instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Start the local server
&lt;/h2&gt;

&lt;p&gt;Go to the &lt;strong&gt;Developer&lt;/strong&gt; tab in LM Studio, select your model, and click &lt;strong&gt;Start Server&lt;/strong&gt;. You should see: &lt;code&gt;Server listening on http://localhost:1234&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Install AnythingLLM
&lt;/h2&gt;

&lt;p&gt;Go to &lt;a href="https://anythingllm.com/desktop" rel="noopener noreferrer"&gt;anythingllm.com/desktop&lt;/a&gt; and download the Windows installer. &lt;strong&gt;Install for Current User only&lt;/strong&gt; — not All Users — to avoid a known spawn error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Connect AnythingLLM to LM Studio
&lt;/h2&gt;

&lt;p&gt;In AnythingLLM Settings &amp;gt; LLM Preference, select &lt;strong&gt;LM Studio&lt;/strong&gt; as the provider and set the base URL to &lt;code&gt;http://localhost:1234&lt;/code&gt;. Save changes. Go to Embedding Model and set to &lt;strong&gt;AnythingLLM built-in&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Chat and upload documents
&lt;/h2&gt;

&lt;p&gt;Create a workspace, then drag files into the chat area. AnythingLLM creates embeddings locally and lets you ask questions about your documents. Workspaces are isolated — perfect for keeping work and personal contexts separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance by GPU
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Max model&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 12GB&lt;/td&gt;
&lt;td&gt;14B at Q4&lt;/td&gt;
&lt;td&gt;15-20 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4060 8GB&lt;/td&gt;
&lt;td&gt;7B at Q4&lt;/td&gt;
&lt;td&gt;20-30 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU-only 16GB&lt;/td&gt;
&lt;td&gt;3B at Q4&lt;/td&gt;
&lt;td&gt;3-5 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost comparison
&lt;/h2&gt;

&lt;p&gt;Local stack: $0/month + $200 for used RTX 3060. ChatGPT Plus: $20/month with no privacy guarantees. The GPU pays for itself in 10 months.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/lm-studio-anythingllm-windows" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tutorialwindows</category>
    </item>
    <item>
      <title>Build a Private Voice Assistant with Whisper, Ollama, and Kokoro TTS</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 22:09:16 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-a-private-voice-assistant-with-whisper-ollama-and-kokoro-tts-3400</link>
      <guid>https://dev.to/everylocalai/build-a-private-voice-assistant-with-whisper-ollama-and-kokoro-tts-3400</guid>
      <description>&lt;p&gt;Have you ever wanted your own Jarvis? A voice assistant that listens, thinks, and speaks back - all running privately on your own hardware? Here's how to build one with Whisper.cpp, Ollama, and Kokoro TTS.&lt;/p&gt;

&lt;p&gt;No cloud, no wake-word fees, no data leaving your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; Any modern computer with a microphone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software:&lt;/strong&gt; Python 3.10+, &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; installed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~30 minutes setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install Ollama and Pull a Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Install Whisper.cpp
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ggerganov/whisper.cpp.git
&lt;span class="nb"&gt;cd &lt;/span&gt;whisper.cpp
cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build &lt;span class="nt"&gt;--config&lt;/span&gt; Release
bash models/download-ggml-model.sh medium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Install Kokoro TTS
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;kokoro pyaudio requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wiring It All Together
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;code&gt;voice_assistant.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wave&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kokoro&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KPipeline&lt;/span&gt;

&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:14b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;WHISPER_BIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./whisper.cpp/build/bin/whisper-cli&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;WHISPER_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./whisper.cpp/models/ggml-medium.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;tts_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PyAudio&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;frames_per_buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;frames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;terminate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.wav&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;wf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wave&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;wf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setnchannels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;wf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setsampwidth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;wf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setframerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;wf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeframes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;WHISPER_BIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WHISPER_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_file&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                          &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tts_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.wav&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ffplay&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-nodisp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-autoexit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Run it
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Listening...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;audio_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;record_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python voice_assistant.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Speak into your mic. Wait 5 seconds. Hear the AI respond.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Whisper medium&lt;/strong&gt; on CPU: transcribes in 2-4 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen3 14B&lt;/strong&gt; on RTX 3060: responds in 3-5 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kokoro TTS&lt;/strong&gt; on CPU: speaks in real-time (&amp;lt; 1 second latency)&lt;/li&gt;
&lt;li&gt;Total round-trip: ~10 seconds on modest hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For faster responses, use Whisper &lt;code&gt;tiny&lt;/code&gt; or a smaller LLM like Llama 3.1 8B.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/local-voice-assistant" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Give Your Local AI Tool-Calling Superpowers with Open WebUI and MCP</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 22:07:37 +0000</pubDate>
      <link>https://dev.to/everylocalai/give-your-local-ai-tool-calling-superpowers-with-open-webui-and-mcp-3jdi</link>
      <guid>https://dev.to/everylocalai/give-your-local-ai-tool-calling-superpowers-with-open-webui-and-mcp-3jdi</guid>
      <description>&lt;p&gt;Want a ChatGPT-like experience where your AI can search the web, read your files, query databases, and run code? Open WebUI + MCP makes it possible - all running locally on your hardware.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open standard that lets AI connect to external tools. Open WebUI supports MCP natively, turning your local Ollama setup into a tool-equipped AI assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU:&lt;/strong&gt; RTX 3060 12GB or better (for Qwen3 14B at Q8)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software:&lt;/strong&gt; Docker + Docker Compose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~25 minutes setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_URL=http://ollama:11434&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MCP_ENABLE=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ENABLE_TOOLS=true&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;open-webui:/app/backend/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;

  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull a model with strong tool-calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull qwen3:14b:q8_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;strong&gt;&lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;&lt;/strong&gt; and create your admin account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding MCP Tools
&lt;/h2&gt;

&lt;p&gt;Go to &lt;strong&gt;Admin Panel → Settings → External Tools&lt;/strong&gt; in Open WebUI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web Search Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @anthropic/mcp-server-brave-search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Filesystem Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-filesystem /allowed/path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure each tool in the Open WebUI admin panel to give your AI real-world capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;Start a new chat and click the tools icon (wrench) next to the input box. Select which tools the AI can use, then ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Search the web for latest AI news"&lt;/li&gt;
&lt;li&gt;"Read my project's README and summarize it"&lt;/li&gt;
&lt;li&gt;"Query the sales database for Q3 results"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI decides when to call tools and incorporates results into its responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;With Qwen3 14B Q8 on an RTX 4070 Super: tool calls complete in 3-5 seconds. Web search results are returned in 2-3 seconds. All data stays on your machine.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/open-webui-mcp-tools" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>beginners</category>
    </item>
    <item>
      <title>AI Pair Programming in Your Terminal with Aider and Ollama</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 22:05:39 +0000</pubDate>
      <link>https://dev.to/everylocalai/ai-pair-programming-in-your-terminal-with-aider-and-ollama-4hdi</link>
      <guid>https://dev.to/everylocalai/ai-pair-programming-in-your-terminal-with-aider-and-ollama-4hdi</guid>
      <description>&lt;p&gt;Want an AI coding assistant that works on YOUR codebase, respects YOUR git history, and doesn't send your code to the cloud? Aider + Ollama gives you exactly that.&lt;/p&gt;

&lt;p&gt;Aider is an AI pair programming tool that works directly in your terminal. It sees your files, understands your git repo, and makes real edits to your code. Paired with Ollama running a local model, you get a fully private coding assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU:&lt;/strong&gt; RTX 3090 or 4090 with 16GB+ VRAM (for Qwen3 Coder 30B at Q4)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software:&lt;/strong&gt; Python 3.10+, &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; installed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~10 minutes setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Aider&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;aider-chat

&lt;span class="c"&gt;# Pull a capable coding model&lt;/span&gt;
ollama pull qwen3-coder:30b-a3b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;Set Aider to use your local Ollama model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For bash/zsh&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OLLAMA_CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8192

&lt;span class="c"&gt;# Run Aider with Ollama&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama_chat/qwen3-coder:30b-a3b &lt;span class="nt"&gt;--editor&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For persistent config, create &lt;code&gt;.env&lt;/code&gt; in your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;OLLAMA_CONTEXT_LENGTH&lt;/span&gt;=&lt;span class="m"&gt;8192&lt;/span&gt;
&lt;span class="n"&gt;AIDER_MODEL&lt;/span&gt;=&lt;span class="n"&gt;ollama_chat&lt;/span&gt;/&lt;span class="n"&gt;qwen3&lt;/span&gt;-&lt;span class="n"&gt;coder&lt;/span&gt;:&lt;span class="m"&gt;30&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;-&lt;span class="n"&gt;a3b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start Aider in your project directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project
aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama_chat/qwen3-coder:30b-a3b

&lt;span class="c"&gt;# Now just describe what you want:&lt;/span&gt;
&lt;span class="c"&gt;# "Add error handling to the API routes"&lt;/span&gt;
&lt;span class="c"&gt;# "Refactor the database connection into a singleton"&lt;/span&gt;
&lt;span class="c"&gt;# "Write unit tests for the user service"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aider reads your files, makes changes, and commits them with sensible messages. You approve each change before it's applied.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;On a &lt;strong&gt;RTX 4090&lt;/strong&gt; with Qwen3 Coder 30B at Q4: ~15-20 tok/s, enough for real-time code suggestions.&lt;/p&gt;

&lt;p&gt;Qwen2.5 Coder 14B runs faster (~35 tok/s) and fits on a 12GB GPU, great for smaller projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; - your proprietary code never leaves your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No API costs&lt;/strong&gt; - unlimited suggestions for $0/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works offline&lt;/strong&gt; - code on a plane, in a cafe, anywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rate limits&lt;/strong&gt; - use it all day without throttling&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://everylocalai.com/stack/aider-ollama-coding" rel="noopener noreferrer"&gt;everylocalai.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Build a Unified AI Gateway with LiteLLM and Ollama</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:54:58 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-a-unified-ai-gateway-with-litellm-and-ollama-387a</link>
      <guid>https://dev.to/everylocalai/build-a-unified-ai-gateway-with-litellm-and-ollama-387a</guid>
      <description>&lt;p&gt;Unify all your AI models - local and cloud - behind a single OpenAI-compatible API with LiteLLM and Ollama.&lt;/p&gt;

&lt;p&gt;LiteLLM is a proxy server that exposes 100+ LLM providers through one endpoint. Connect it to Ollama for local inference, and you get load balancing, cost tracking, rate limits, and automatic fallback routing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;Ollama installed and running&lt;/li&gt;
&lt;li&gt;About 20 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install LiteLLM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'litellm[proxy]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create config.yaml
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen3-local&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/qwen3:14b&lt;/span&gt;
      &lt;span class="na"&gt;api_base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;
      &lt;span class="na"&gt;rpm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;
      &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;os.environ/OPENAI_API_KEY&lt;/span&gt;

&lt;span class="na"&gt;general_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sk-your-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Start the Proxy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;litellm &lt;span class="nt"&gt;--config&lt;/span&gt; config.yaml &lt;span class="nt"&gt;--port&lt;/span&gt; 4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Use It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:4000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3-local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smart fallback&lt;/strong&gt; - if local model fails, auto-route to cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt; - distribute across multiple GPU instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost tracking&lt;/strong&gt; - per-model spend dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; - control requests per user/key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One API&lt;/strong&gt; - use any tool that supports OpenAI format&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cost vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LiteLLM + Ollama&lt;/th&gt;
&lt;th&gt;Direct Cloud APIs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gateway&lt;/td&gt;
&lt;td&gt;Free, self-hosted&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local inference&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model switching&lt;/td&gt;
&lt;td&gt;One endpoint&lt;/td&gt;
&lt;td&gt;Multiple SDKs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failover&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Full guide with advanced config examples: &lt;a href="https://everylocalai.com/stack/litellm-ollama-gateway" rel="noopener noreferrer"&gt;https://everylocalai.com/stack/litellm-ollama-gateway&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Generate Professional AI Images Locally with ComfyUI and FLUX</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:54:00 +0000</pubDate>
      <link>https://dev.to/everylocalai/generate-professional-ai-images-locally-with-comfyui-and-flux-3m7h</link>
      <guid>https://dev.to/everylocalai/generate-professional-ai-images-locally-with-comfyui-and-flux-3m7h</guid>
      <description>&lt;p&gt;Professional-grade image generation that runs entirely on your own GPU. ComfyUI + FLUX.1 Dev gives you Midjourney-quality output with full creative control and zero data leaving your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 12GB+ VRAM (24GB recommended)&lt;/li&gt;
&lt;li&gt;Python 3.10+ or the ComfyUI desktop app&lt;/li&gt;
&lt;li&gt;About 20 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A: Desktop App (Easiest)
&lt;/h3&gt;

&lt;p&gt;Download from comfy.org, install, and use the built-in model manager to download FLUX.1 Dev.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Manual Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Comfy-Org/ComfyUI.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:8188" rel="noopener noreferrer"&gt;http://localhost:8188&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic FLUX Workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Add a &lt;strong&gt;Checkpoint Loader&lt;/strong&gt; node - load flux1-dev.safetensors&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;CLIP Text Encoder&lt;/strong&gt; - enter your prompt&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;KSampler&lt;/strong&gt; - connect model, CLIP, and empty latent&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;VAE Decode&lt;/strong&gt; - decode to image&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Save Image&lt;/strong&gt; - save result&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Queue Prompt&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Prompt example: "a photorealistic cat sitting on a vintage leather chair, warm lighting, depth of field"&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LoRA&lt;/strong&gt; - add a LoRA Loader node for style control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ControlNet&lt;/strong&gt; - pose/edge guidance with extra nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image-to-Image&lt;/strong&gt; - feed an existing image through VAE Encode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API mode&lt;/strong&gt; - integrate with n8n or custom apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch generation&lt;/strong&gt; - queue multiple prompts at once&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cost vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local&lt;/th&gt;
&lt;th&gt;Midjourney&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$10-60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per image&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0.04-0.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Stays on your GPU&lt;/td&gt;
&lt;td&gt;Sent to cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control&lt;/td&gt;
&lt;td&gt;Full node-level&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Full guide with troubleshooting and hardware tips: &lt;a href="https://everylocalai.com/stack/comfyui-flux-local-image" rel="noopener noreferrer"&gt;https://everylocalai.com/stack/comfyui-flux-local-image&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Chat With Your Documents Locally Using AnythingLLM and Ollama</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:53:08 +0000</pubDate>
      <link>https://dev.to/everylocalai/chat-with-your-documents-locally-using-anythingllm-and-ollama-j6b</link>
      <guid>https://dev.to/everylocalai/chat-with-your-documents-locally-using-anythingllm-and-ollama-j6b</guid>
      <description>&lt;p&gt;A private RAG system where you drop in PDFs, Word docs, and code files and ask questions. Runs on any machine, no cloud dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Any computer (GPU optional - CPU works fine)&lt;/li&gt;
&lt;li&gt;Ollama installed&lt;/li&gt;
&lt;li&gt;About 10 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AnythingLLM&lt;/td&gt;
&lt;td&gt;Desktop/server app with RAG, agents, built-in vector DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Serves local LLM for chat + embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3 14B&lt;/td&gt;
&lt;td&gt;Default model for answering questions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install from ollama.com, or run with Docker:&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama ollama/ollama

&lt;span class="c"&gt;# Pull a model:&lt;/span&gt;
ollama pull qwen3:14b
&lt;span class="c"&gt;# Pull an embedder:&lt;/span&gt;
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Install AnythingLLM
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Desktop app (easiest):&lt;/strong&gt; Download from anythingllm.com&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3001:3001 &lt;span class="nt"&gt;--name&lt;/span&gt; anythingllm &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-host&lt;/span&gt; host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; anythingllm:/app/server/storage &lt;span class="se"&gt;\&lt;/span&gt;
  mintplexlabs/anythingllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Connect &amp;amp; Use
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open AnythingLLM (&lt;a href="http://localhost:3001" rel="noopener noreferrer"&gt;http://localhost:3001&lt;/a&gt; or desktop app)&lt;/li&gt;
&lt;li&gt;Settings &amp;gt; LLM Provider &amp;gt; Select Ollama, model qwen3:14b&lt;/li&gt;
&lt;li&gt;Settings &amp;gt; Embedder &amp;gt; Select Ollama, model nomic-embed-text&lt;/li&gt;
&lt;li&gt;Create a workspace, drop in documents, start asking questions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What You Can Do
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Chat with PDFs, Word docs, code files, web pages&lt;/li&gt;
&lt;li&gt;Create isolated workspaces per project&lt;/li&gt;
&lt;li&gt;Use built-in agent skills (web search, summarization)&lt;/li&gt;
&lt;li&gt;Works on CPU-only machines like a mini PC&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cost vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local&lt;/th&gt;
&lt;th&gt;ChatGPT + GPTs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$20-200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;$0-300&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Stays on your machine&lt;/td&gt;
&lt;td&gt;Sent to cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documents&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Token-limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Full guide with troubleshooting: &lt;a href="https://everylocalai.com/stack/anythingllm-ollama-rag" rel="noopener noreferrer"&gt;https://everylocalai.com/stack/anythingllm-ollama-rag&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build Visual AI Agent Pipelines with Langflow and Ollama</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:28:30 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-visual-ai-agent-pipelines-with-langflow-and-ollama-5ah0</link>
      <guid>https://dev.to/everylocalai/build-visual-ai-agent-pipelines-with-langflow-and-ollama-5ah0</guid>
      <description>&lt;p&gt;Prototype and deploy multi-agent and RAG applications with a visual drag-and-drop interface - all running locally with your own models.&lt;/p&gt;

&lt;p&gt;Langflow is an open-source visual framework for building AI applications. Connect it to Ollama for local inference, and you get a powerful environment for designing agent architectures, RAG pipelines, and chatbot workflows without writing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 12GB+ VRAM (or CPU-only for prototyping)&lt;/li&gt;
&lt;li&gt;Docker or Python 3.10+&lt;/li&gt;
&lt;li&gt;About 15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Langflow&lt;/td&gt;
&lt;td&gt;Visual drag-and-drop flow builder and API server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Serves local LLM models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3 14B&lt;/td&gt;
&lt;td&gt;Default model - fits 12GB at Q4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A: Docker (Recommended)
&lt;/h3&gt;

&lt;p&gt;Save this as &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

  &lt;span class="na"&gt;langflow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langflowai/langflow:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langflow&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7860:7860"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;langflow_data:/app/langflow&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;LANGFLOW_AUTO_LOGIN=true&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;langflow_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Launch it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:7860" rel="noopener noreferrer"&gt;http://localhost:7860&lt;/a&gt; to access Langflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: pip Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langflow
langflow run
&lt;span class="c"&gt;# In another terminal:&lt;/span&gt;
ollama pull qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:7860" rel="noopener noreferrer"&gt;http://localhost:7860&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect Langflow to Ollama
&lt;/h2&gt;

&lt;p&gt;In the Langflow canvas, add:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ollama Chat Model&lt;/strong&gt; component - Base URL: &lt;a href="http://ollama:11434" rel="noopener noreferrer"&gt;http://ollama:11434&lt;/a&gt; (Docker) or &lt;a href="http://localhost:11434" rel="noopener noreferrer"&gt;http://localhost:11434&lt;/a&gt; (pip)&lt;/li&gt;
&lt;li&gt;Select model: qwen3:14b&lt;/li&gt;
&lt;li&gt;Connect to a &lt;strong&gt;Prompt&lt;/strong&gt; node and &lt;strong&gt;Chat Output&lt;/strong&gt; for a basic chatbot&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;h3&gt;
  
  
  RAG Chatbot
&lt;/h3&gt;

&lt;p&gt;Drag in: File &amp;gt; Ollama Embeddings &amp;gt; Vector Store (Chroma) &amp;gt; Ollama Chat Model &amp;gt; Chat Output. Upload a PDF, ask questions - answers come from your documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Agent Research System
&lt;/h3&gt;

&lt;p&gt;Add an Agent node with a Web Search Tool + Ollama, add a second Agent for summarization. One agent gathers info, the other condenses it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document Processing Pipeline
&lt;/h3&gt;

&lt;p&gt;Combine File Loader &amp;gt; Splitter &amp;gt; Ollama Embeddings &amp;gt; Vector Store. Add Ollama Chat Model with custom prompts for Q&amp;amp;A over your documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local Langflow + Ollama&lt;/th&gt;
&lt;th&gt;Langflow Cloud + OpenAI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$50-200+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;~$300-600 once&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data privacy&lt;/td&gt;
&lt;td&gt;Stays on your machine&lt;/td&gt;
&lt;td&gt;Sent to cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI calls&lt;/td&gt;
&lt;td&gt;Unlimited, free&lt;/td&gt;
&lt;td&gt;Per-token billing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Full guide with detailed troubleshooting and alternatives: &lt;a href="https://everylocalai.com/stack/langflow-ollama-rag-agent" rel="noopener noreferrer"&gt;https://everylocalai.com/stack/langflow-ollama-rag-agent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Build a Local AI Workflow Automation with n8n and Ollama</title>
      <dc:creator>EveryLocalAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 21:27:03 +0000</pubDate>
      <link>https://dev.to/everylocalai/build-a-local-ai-workflow-automation-with-n8n-and-ollama-3e3m</link>
      <guid>https://dev.to/everylocalai/build-a-local-ai-workflow-automation-with-n8n-and-ollama-3e3m</guid>
      <description>&lt;p&gt;Automate tasks with AI-powered workflows that run entirely on your own hardware. n8n + Ollama = self-hosted Zapier with local LLM inference. No monthly fees, no data leaving your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 12GB+ VRAM (for local AI) or any machine with Docker (n8n works CPU-only too)&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose&lt;/li&gt;
&lt;li&gt;About 15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;n8n&lt;/td&gt;
&lt;td&gt;Visual workflow engine with 500+ integrations and AI agent nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Serves local LLM via OpenAI-compatible API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3 14B&lt;/td&gt;
&lt;td&gt;Default model - strong reasoning, fits 12GB at Q4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

  &lt;span class="na"&gt;n8n&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker.n8n.io/n8nio/n8n&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8n&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_RUNNERS_ENABLED=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;GENERIC_TIMEZONE=America/New_York&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;TZ=America/New_York&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;n8n_data:/home/node/.n8n&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5678:5678"&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n8n_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:5678" rel="noopener noreferrer"&gt;http://localhost:5678&lt;/a&gt; to access n8n.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect n8n to Ollama
&lt;/h2&gt;

&lt;p&gt;In n8n, add an &lt;strong&gt;Ollama Chat Model&lt;/strong&gt; node and set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL: &lt;a href="http://ollama:11434" rel="noopener noreferrer"&gt;http://ollama:11434&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model: qwen3:14b&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use it with n8n's &lt;strong&gt;AI Agent&lt;/strong&gt; node for agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Workflows
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Email Summarizer
&lt;/h3&gt;

&lt;p&gt;Trigger: New email → AI step: "Summarize this email in 2 sentences" → Output: Slack message&lt;/p&gt;

&lt;h3&gt;
  
  
  Content Generator
&lt;/h3&gt;

&lt;p&gt;Trigger: Cron schedule → AI step: "Write a newsletter about [topic]" → Output: Email to subscribers&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Classifier
&lt;/h3&gt;

&lt;p&gt;Trigger: Webhook (support tickets) → AI step: "Classify as billing/technical/feature" → Output: Route to different teams&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local n8n + Ollama&lt;/th&gt;
&lt;th&gt;Zapier + ChatGPT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$20-100+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;~$300 once&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data safety&lt;/td&gt;
&lt;td&gt;Stays on your LAN&lt;/td&gt;
&lt;td&gt;Sent to cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI calls&lt;/td&gt;
&lt;td&gt;Unlimited, free&lt;/td&gt;
&lt;td&gt;Token-limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflows&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Task-limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After 3-6 months the hardware pays for itself.&lt;/p&gt;




&lt;p&gt;Full guide with detailed troubleshooting and alternatives: &lt;a href="https://everylocalai.com/stack/n8n-ollama-ai-automation" rel="noopener noreferrer"&gt;https://everylocalai.com/stack/n8n-ollama-ai-automation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
