<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arsen Apostolov</title>
    <description>The latest articles on DEV Community by Arsen Apostolov (@sikamikanikobg).</description>
    <link>https://dev.to/sikamikanikobg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1410108%2Fb7d644e9-449a-4ef9-8bcc-73a2ff63902f.jpeg</url>
      <title>DEV Community: Arsen Apostolov</title>
      <link>https://dev.to/sikamikanikobg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sikamikanikobg"/>
    <language>en</language>
    <item>
      <title>I want to let an AI agent roam my homelab — looking for someone to build the MCP server</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sun, 07 Jun 2026 08:46:32 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/i-want-to-let-an-ai-agent-roam-my-homelab-looking-for-someone-to-build-the-mcp-server-3kh7</link>
      <guid>https://dev.to/sikamikanikobg/i-want-to-let-an-ai-agent-roam-my-homelab-looking-for-someone-to-build-the-mcp-server-3kh7</guid>
      <description>&lt;p&gt;I maintain a small open-source tool called HomeLab Monitor — one dashboard for every box in my homelab: host vitals, containers, systemd services, GPU, and which AI model servers are loaded right now.&lt;/p&gt;

&lt;p&gt;It's good at being a pair of human eyes. The next thing I want is to make it a source of &lt;em&gt;context&lt;/em&gt; for an AI agent.&lt;/p&gt;

&lt;p&gt;So the idea: give it an &lt;strong&gt;MCP server&lt;/strong&gt;. Model Context Protocol is the thing that lets an agent like Claude call tools and read resources. If the monitor speaks MCP, an agent can connect and explore the whole fleet — "which container is leaking RAM?", "the GPU's been pinned for an hour, who's driving it?", "this host wants a reboot and an OS upgrade, what order is safe?" — and start helping with the maintenance instead of me squinting at graphs.&lt;/p&gt;

&lt;p&gt;The fun part for whoever builds it: it's mostly a thin wrapper over a REST API that already exists. The monitor already serves clean, read-only JSON (&lt;code&gt;/api/data&lt;/code&gt;, &lt;code&gt;/api/fleet&lt;/code&gt;, &lt;code&gt;/api/host_data/&amp;lt;name&amp;gt;&lt;/code&gt;, &lt;code&gt;/metrics&lt;/code&gt;). MCP just adds the semantics — tools and resources with names an LLM can reason about instead of a raw blob. Read-only to start; any future write tool stays opt-in.&lt;/p&gt;

&lt;p&gt;It's genuinely weekend-sized if you've wrapped an MCP server around an API before — and a great first one if you haven't and want to learn.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/SikamikanikoBG/homelab-monitor" rel="noopener noreferrer"&gt;https://github.com/SikamikanikoBG/homelab-monitor&lt;/a&gt;&lt;br&gt;
The idea + a suggested first PR: &lt;a href="https://github.com/SikamikanikoBG/homelab-monitor/issues/70" rel="noopener noreferrer"&gt;https://github.com/SikamikanikoBG/homelab-monitor/issues/70&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If wiring this sounds fun, come say hi on the issue — I'll help scope the first commit.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>help</category>
      <category>selfhosted</category>
      <category>ai</category>
    </item>
    <item>
      <title>The homelab box you forgot you own is probably 47 updates behind — here’s the safe fix</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sun, 07 Jun 2026 06:01:16 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/the-homelab-box-you-forgot-you-own-is-probably-47-updates-behind-heres-the-safe-fix-1n0i</link>
      <guid>https://dev.to/sikamikanikobg/the-homelab-box-you-forgot-you-own-is-probably-47-updates-behind-heres-the-safe-fix-1n0i</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; My homelab monitor flagged my Plex/Pi-hole box &lt;strong&gt;47 packages and a kernel behind&lt;/strong&gt; — and I'd forgotten the machine existed. Here's the 5-minute non-interactive fix, and the one upgrade I deliberately &lt;em&gt;didn't&lt;/em&gt; run.&lt;/p&gt;

&lt;p&gt;This is the dev.to short version of &lt;a href="https://medium.com/@arsen.apostolov/my-own-dashboard-caught-one-of-my-machines-stealing-16gb-of-vram-e8288ea10f31" rel="noopener noreferrer"&gt;the Medium write-up&lt;/a&gt;. Same dashboard that caught &lt;a href="https://dev.to/sikamikanikobg/reclaiming-16gb-of-idle-vram-a-30-line-sidecar-that-evicts-comfyui-when-it-stops-working-2d9l"&gt;a service hoarding 16GB of VRAM last week&lt;/a&gt; — different, more boring villain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The signal
&lt;/h2&gt;

&lt;p&gt;The overview wore one small badge: &lt;strong&gt;⚠ 1 host behind&lt;/strong&gt;. Not my GPU box that I touch daily — &lt;strong&gt;cloudy&lt;/strong&gt;, the Plex / Pi-hole / Samba box that just works and therefore never gets looked at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmubf7ggrelymn0st511.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmubf7ggrelymn0st511.png" alt="HomeLab Monitor — " width="800" height="507"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The monitor also flags a release upgrade as available — I'm deferring that one regardless of which version it lands on (more below).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;UPDATES column: &lt;strong&gt;&lt;code&gt;47 pending · ⬆ 26.04 available&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  The diagnosis
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ssh anakin@cloudy
&lt;span class="nv"&gt;$ &lt;/span&gt;lsb_release &lt;span class="nt"&gt;-ds&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;
Ubuntu 22.04.5 LTS
5.15.0-179-generic          &lt;span class="c"&gt;# running — but 5.15.0-181 was already installed, waiting on a reboot&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;apt list &lt;span class="nt"&gt;--upgradable&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; upgradable
47
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/run/reboot-required
&lt;span class="k"&gt;***&lt;/span&gt; System restart required &lt;span class="k"&gt;***&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Nothing was broken — Plex streamed, Pi-hole resolved, shares mounted. That's the trap: &lt;strong&gt;a box that's 47 behind doesn't tell you.&lt;/strong&gt; Among the 47: &lt;code&gt;systemd&lt;/code&gt;, &lt;code&gt;snapd&lt;/code&gt;, &lt;code&gt;apparmor&lt;/code&gt;, &lt;code&gt;nftables&lt;/code&gt;, &lt;code&gt;cloud-init&lt;/code&gt;, &lt;code&gt;linux-firmware&lt;/code&gt;, &lt;code&gt;openldap&lt;/code&gt;. Plenty of it security-relevant.&lt;/p&gt;
&lt;h2&gt;
  
  
  The fix (non-interactive, config-preserving)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DEBIAN_FRONTEND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;noninteractive &lt;span class="nv"&gt;NEEDRESTART_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;a
apt-get update
apt-get &lt;span class="nt"&gt;-o&lt;/span&gt; Dpkg::Options::&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--force-confold"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;-o&lt;/span&gt; Dpkg::Options::&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--force-confdef"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;-y&lt;/span&gt; full-upgrade
apt-get &lt;span class="nt"&gt;-y&lt;/span&gt; autoremove &lt;span class="nt"&gt;--purge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
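&lt;li&gt;
&lt;code&gt;--force-confold&lt;/code&gt; (with &lt;code&gt;--force-confdef&lt;/code&gt;) → take dpkg's default action where one exists, otherwise keep my existing config files; never stop to ask.&lt;/li&gt;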
&lt;li&gt;
&lt;code&gt;NEEDRESTART_MODE=a&lt;/code&gt; → let &lt;code&gt;needrestart&lt;/code&gt; restart affected services itself instead of showing the blue full-screen menu that hangs an unattended run.&lt;/li&gt;
&lt;li&gt;Result: &lt;strong&gt;45 upgraded, 2 newly installed, 0 removed.&lt;/strong&gt; Clean.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then activate the kernel/systemd the box had been holding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reboot              &lt;span class="c"&gt;# ~90s of no DNS for the LAN — an on-purpose action, not a background one&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;
5.15.0-181-generic    &lt;span class="c"&gt;# back on the tailnet, now on the staged kernel&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Before / after
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmubf7ggrelymn0st511.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmubf7ggrelymn0st511.png" alt="cloudy before — 47 pending" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchcqkc21yqgf6snzcds6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchcqkc21yqgf6snzcds6.png" alt="cloudy after — All updated" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;47 → 0.&lt;/strong&gt; The package badge cleared.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I deliberately did NOT run
&lt;/h2&gt;

&lt;p&gt;The monitor also flags a full Ubuntu &lt;strong&gt;release&lt;/strong&gt; upgrade waiting. &lt;code&gt;do-release-upgrade&lt;/code&gt; on a remote, headless, house-critical box is a scheduled-window job — with a backup and a console in reach — not an unattended one. The dashboard surfacing it is the win; choosing to defer it is the right call. So I left it flagged, on purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The point
&lt;/h2&gt;

&lt;p&gt;I'm not disciplined about my boring boxes — nobody is. The only reason this got caught is one badge in one dashboard I already look at. The tool is &lt;strong&gt;HomeLab Monitor&lt;/strong&gt; — one container, MIT, no Prometheus/Grafana to stand up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;span class="c"&gt;# github.com/SikamikanikoBG/homelab-monitor&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When did you last log into your most reliable box, and how would you find out it was a month behind? Mine used a badge. What's watching yours — a cron &lt;code&gt;apt list --upgradable&lt;/code&gt;, &lt;code&gt;unattended-upgrades&lt;/code&gt; mail you actually read, or nothing? Genuinely curious which holds up for people.&lt;/p&gt;
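&lt;p&gt;For the cron option, here is a minimal sketch of what such a check could print. The script name &lt;code&gt;patch-status.sh&lt;/code&gt;, the &lt;code&gt;patch_status&lt;/code&gt; function and the &lt;code&gt;REBOOT_FLAG&lt;/code&gt; override are hypothetical names, not part of HomeLab Monitor. It counts package lines from &lt;code&gt;apt list --upgradable&lt;/code&gt; on stdin and checks the same reboot flag file used in the diagnosis above:&lt;/p&gt;

```shell
#!/bin/sh
# Summarise patch state from `apt list --upgradable` output read on stdin.
# REBOOT_FLAG can be overridden so the logic can be tried off-box (an assumption for testing).
REBOOT_FLAG=${REBOOT_FLAG:-/var/run/reboot-required}

patch_status() {
  # Package lines end in "[upgradable from: ...]"; the "Listing..." header does not.
  # grep -c exits non-zero on zero matches, so "|| true" keeps the count of 0 usable.
  pending=$(grep -c 'upgradable from' || true)
  if [ -f "$REBOOT_FLAG" ]; then reboot=yes; else reboot=no; fi
  echo "pending=$pending reboot_required=$reboot"
}

# Only read stdin when asked to, e.g.: apt list --upgradable | sh patch-status.sh run
if [ "${1:-}" = "run" ]; then patch_status; fi
```

&lt;p&gt;Piping &lt;code&gt;apt list --upgradable&lt;/code&gt; into &lt;code&gt;sh patch-status.sh run&lt;/code&gt; from cron would print one line such as &lt;code&gt;pending=47 reboot_required=yes&lt;/code&gt;; where that line goes (mail, a log, a webhook) is left open.&lt;/p&gt;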

</description>
      <category>homelab</category>
      <category>selfhosting</category>
      <category>linux</category>
      <category>sysadmin</category>
    </item>
    <item>
      <title>Reclaiming 16GB of idle VRAM: a 30-line sidecar that evicts ComfyUI when it stops working</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sat, 06 Jun 2026 10:21:05 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/reclaiming-16gb-of-idle-vram-a-30-line-sidecar-that-evicts-comfyui-when-it-stops-working-2d9l</link>
      <guid>https://dev.to/sikamikanikobg/reclaiming-16gb-of-idle-vram-a-30-line-sidecar-that-evicts-comfyui-when-it-stops-working-2d9l</guid>
      <description>&lt;p&gt;My homelab is one Linux box with a single RTX 3090. 24GB of VRAM, and&lt;br&gt;
three GPU-hungry services that all want it: ComfyUI for image generation,&lt;br&gt;
WhisperX for transcription, Ollama for local LLMs. On one card, that's&lt;br&gt;
already a negotiation.&lt;/p&gt;

&lt;p&gt;Last week the negotiation broke. My own monitoring dashboard caught the&lt;br&gt;
culprit at a glance, so this is the short version: what it was, how I saw&lt;br&gt;
it, and the 30-line container that fixed it for good.&lt;/p&gt;

&lt;p&gt;(If you want the prequel — the time the Ollama triage model reserved a&lt;br&gt;
40,000-token context to do 8,000 tokens of work — that's&lt;br&gt;
&lt;a href="https://medium.com/@arsen.apostolov/two-llms-one-3090-zero-oom-96b54ab2afdd" rel="noopener noreferrer"&gt;Two LLMs, One 3090, Zero OOM&lt;/a&gt;.&lt;br&gt;
Same box. Same lesson.)&lt;/p&gt;
&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;I opened the GPU tab of my homelab dashboard for something unrelated and&lt;br&gt;
saw the card sitting at &lt;strong&gt;71% full while nothing was running&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse990lvzj2yvcwf5hk9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse990lvzj2yvcwf5hk9i.png" alt="GPU right now: 17GB used, comfyui holding 16.3GB, 0% utilisation" width="799" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;16.3GB held by &lt;code&gt;comfyui&lt;/code&gt;. GPU utilisation: &lt;strong&gt;0%&lt;/strong&gt;. The model was loaded&lt;br&gt;
and doing absolutely nothing. The per-service history made it&lt;br&gt;
unambiguous — ComfyUI peaked at 16.3GB and held it 100% of the window:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlk5qelirx9ilbfafoc5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlk5qelirx9ilbfafoc5.png" alt="Services on the GPU: comfyui peak 16.3GB, 100% of the time" width="799" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the whole reason I built the dashboard. &lt;code&gt;nvidia-smi&lt;/code&gt; tells you&lt;br&gt;
VRAM is at 17/24GB. It does not tell you &lt;em&gt;which service&lt;/em&gt;, &lt;em&gt;which model&lt;/em&gt;,&lt;br&gt;
or &lt;em&gt;since when&lt;/em&gt;. The GPU tab maps every VRAM-using PID back to its&lt;br&gt;
container automatically, so "who is holding my GPU" is a glance, not five&lt;br&gt;
minutes of &lt;code&gt;ps -o cgroup&lt;/code&gt; archaeology.&lt;/p&gt;
&lt;h2&gt;
  
  
  The diagnosis
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;nvidia-smi&lt;/code&gt; on the host confirmed it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;nvidia-smi &lt;span class="nt"&gt;--query-compute-apps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;pid,used_memory,process_name &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader
&lt;span class="gp"&gt;111465, 16666 MiB, python3      #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&amp;lt;- ComfyUI, idle
&lt;span class="go"&gt;109583, 588 MiB, /app/.venv/bin/python
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ComfyUI keeps the checkpoint resident after a generation so the next&lt;br&gt;
request is fast. Sensible on a dedicated image-gen box. On a shared 24GB&lt;br&gt;
card it is hostile: the FLUX fp8 checkpoint is ~16GB, and ComfyUI 0.22&lt;br&gt;
has no idle timeout to give it back. Once you've generated one image,&lt;br&gt;
that 16GB is gone until you restart the container.&lt;/p&gt;

&lt;p&gt;Good news: ComfyUI has an API for exactly this. &lt;code&gt;POST /free&lt;/code&gt; with&lt;br&gt;
&lt;code&gt;unload_models&lt;/code&gt; drops the model out of VRAM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8188/free &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="go"&gt;    -H 'Content-Type: application/json' \
    -d '{"unload_models": true, "free_memory": true}'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One call took ComfyUI from 16666 MiB to 378 MiB. The model reloads&lt;br&gt;
automatically on the next &lt;code&gt;/prompt&lt;/code&gt; — about 20–30s added to that one&lt;br&gt;
request, which is effectively free for images I generate a few times a day.&lt;/p&gt;

&lt;p&gt;So I don't want to call &lt;code&gt;/free&lt;/code&gt; after every job (kills warm-cache speed&lt;br&gt;
for bursts). I want to call it after ComfyUI has been &lt;strong&gt;idle&lt;/strong&gt; for a&lt;br&gt;
while. ComfyUI won't do that itself, so I bolted it on from outside.&lt;/p&gt;
&lt;h2&gt;
  
  
  The fix: an idle-unload sidecar
&lt;/h2&gt;

&lt;p&gt;No ComfyUI fork, no custom node. A tiny container that watches the queue&lt;br&gt;
and evicts the model after a few minutes of inactivity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c"&gt;# Unload ComfyUI models from VRAM after a period of queue inactivity.&lt;/span&gt;
&lt;span class="nv"&gt;INTERVAL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INTERVAL&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;30&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;IDLE_SECONDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IDLE_SECONDS&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;300&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMFY_URL&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;http&lt;/span&gt;://localhost:8188&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;idle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INTERVAL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nv"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 10 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$URL&lt;/span&gt;&lt;span class="s2"&gt;/queue"&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;
  &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$q&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;
  &lt;span class="c"&gt;# idle == both queue_running and queue_pending are empty arrays&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;q&lt;/span&gt;&lt;span class="p"&gt;#*\&lt;/span&gt;&lt;span class="s2"&gt;"queue_running&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: []}"&lt;/span&gt;&lt;span class="p"&gt; != &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$q&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt; ] &amp;amp;&amp;amp; \
     [ &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;q&lt;/span&gt;&lt;span class="p"&gt;#*\&lt;/span&gt;&lt;span class="s2"&gt;"queue_pending&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: []}"&lt;/span&gt;&lt;span class="p"&gt; != &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$q&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt; ]; then
    idle=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;idle &lt;span class="o"&gt;+&lt;/span&gt; INTERVAL&lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="p"&gt;
    if [ &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$idle&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt; -ge &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$IDLE_SECONDS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt; ]; then
      curl -s -m 30 -X POST &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$URL&lt;/span&gt;&lt;span class="s2"&gt;/free"&lt;/span&gt;&lt;span class="p"&gt; -H &lt;/span&gt;&lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt;&lt;span class="p"&gt; \
        -d &lt;/span&gt;&lt;span class="s1"&gt;'{"unload_models":true,"free_memory":true}'&lt;/span&gt;&lt;span class="p"&gt; &amp;gt;/dev/null 2&amp;gt;&amp;amp;1
      idle=0
    fi
  else
    idle=0          # a job is running or queued: reset the idle clock
  fi
done
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It polls &lt;code&gt;/queue&lt;/code&gt; every 30s. If both &lt;code&gt;queue_running&lt;/code&gt; and &lt;code&gt;queue_pending&lt;/code&gt;&lt;br&gt;
are empty, it adds to an idle counter. After 300s of continuous idle it&lt;br&gt;
POSTs &lt;code&gt;/free&lt;/code&gt; and resets. Any job resets the counter, so a burst of&lt;br&gt;
generations keeps the model warm — eviction only happens once you've&lt;br&gt;
genuinely stopped.&lt;/p&gt;
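&lt;p&gt;The prefix-stripping comparison in the script is compact but easy to misread. An equivalent way to express the same idle test is a &lt;code&gt;case&lt;/code&gt; match. This is a sketch, with &lt;code&gt;is_idle&lt;/code&gt; as a hypothetical helper name, and it relies on the same assumption as the original: that &lt;code&gt;/queue&lt;/code&gt; serialises empty queues as &lt;code&gt;[]&lt;/code&gt; with a single space after the colon.&lt;/p&gt;

```shell
#!/bin/sh
# is_idle JSON: succeed only when both queues appear as empty arrays in the /queue response.
# Hypothetical helper; plain string matching, not a JSON parser.
is_idle() {
  case "$1" in
    *'"queue_running": []'*) ;;          # nothing executing, go on to check pending
    *) return 1 ;;
  esac
  case "$1" in
    *'"queue_pending": []'*) return 0 ;; # nothing waiting either: idle
    *) return 1 ;;
  esac
}

# Two sample responses: both queues empty, then one job executing.
for sample in \
  '{"queue_running": [], "queue_pending": []}' \
  '{"queue_running": [["job"]], "queue_pending": []}'
do
  if is_idle "$sample"; then echo idle; else echo busy; fi
done
```

&lt;p&gt;In the sidecar loop this would stand in for the two bracket tests, as &lt;code&gt;if is_idle "$q"; then&lt;/code&gt;, with the rest of the script unchanged.&lt;/p&gt;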

&lt;p&gt;No new image to build — &lt;code&gt;curlimages/curl&lt;/code&gt; already has &lt;code&gt;sh&lt;/code&gt; and &lt;code&gt;curl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;docker run -d --name comfyui-idle-unloader --restart unless-stopped \
  --network host -e IDLE_SECONDS=300 -e INTERVAL=30 \
  -v /opt/comfyui-idle-unloader/unload-idle.sh:/unload-idle.sh:ro \
  --entrypoint sh curlimages/curl:latest /unload-idle.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--network host&lt;/code&gt; so it can reach ComfyUI on localhost,&lt;br&gt;
&lt;code&gt;--restart unless-stopped&lt;/code&gt; so it survives reboots. That's the whole deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before / after
&lt;/h2&gt;

&lt;p&gt;Watching it on the same dashboard, the story is one cliff:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnak5i94j9rwu4zr28ufl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnak5i94j9rwu4zr28ufl.png" alt="VRAM by service over time: long plateau at capacity, then a cliff down to ~1GB" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ComfyUI, idle&lt;/th&gt;
&lt;th&gt;VRAM held&lt;/th&gt;
&lt;th&gt;GPU "full"&lt;/th&gt;
&lt;th&gt;Free for WhisperX + Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Before&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16.3 GB&lt;/td&gt;
&lt;td&gt;71%&lt;/td&gt;
&lt;td&gt;~7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;After&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.37 GB&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;td&gt;~23 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ul1mt2hueukza60l5ll.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ul1mt2hueukza60l5ll.png" alt="After: comfyui evicted to 378MB, 16GB handed back" width="799" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;~16GB back, for the cost of one slightly slower image every once in a&lt;br&gt;
while. WhisperX and Ollama stopped fighting over the leftovers.&lt;/p&gt;

&lt;p&gt;No fork, no patch, no upstream PR to wait on. Thirty lines of &lt;code&gt;sh&lt;/code&gt; and a&lt;br&gt;
container that does one thing. If ComfyUI ships an idle TTL tomorrow, I&lt;br&gt;
delete it and lose nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a sidecar and not a patch
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decoupled.&lt;/strong&gt; It knows nothing about ComfyUI's internals — just the
public &lt;code&gt;/queue&lt;/code&gt; and &lt;code&gt;/free&lt;/code&gt; endpoints. ComfyUI can update under it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nothing to maintain.&lt;/strong&gt; No fork to rebase and no custom node to keep
compatible: one script on a stock &lt;code&gt;curlimages/curl&lt;/code&gt; image.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same pattern works elsewhere.&lt;/strong&gt; Anything with an "unload model"
endpoint (A1111, vLLM with sleep mode, TGI) can be evicted the same
way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The meta-point, and the reason I keep building on the dashboard: I didn't&lt;br&gt;
find this by reading logs. I found it because a tool attributed 16GB to a&lt;br&gt;
named, idle service on one screen. You can't reclaim VRAM you can't see.&lt;/p&gt;

&lt;p&gt;The monitor is one container, MIT-licensed, &lt;code&gt;docker compose up -d --build&lt;/code&gt;&lt;br&gt;
to try. NVIDIA-only on the GPU panel for now, single-host by design:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/SikamikanikoBG/homelab-monitor" rel="noopener noreferrer"&gt;github.com/SikamikanikoBG/homelab-monitor&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How does everyone else handle idle model eviction on a shared GPU — a&lt;br&gt;
sidecar like this, a TTL in the model server, or do you just&lt;br&gt;
&lt;code&gt;docker restart&lt;/code&gt; and move on? Genuinely curious which approach holds up.&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>selfhosted</category>
      <category>ai</category>
      <category>docker</category>
    </item>
    <item>
      <title>Fitting WhisperX large-v3 + a 24B LLM on one 3090: a reproducible context-capping recipe</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Wed, 03 Jun 2026 03:35:14 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/fitting-whisperx-large-v3-a-24b-llm-on-one-3090-a-reproducible-context-capping-recipe-22g0</link>
      <guid>https://dev.to/sikamikanikobg/fitting-whisperx-large-v3-a-24b-llm-on-one-3090-a-reproducible-context-capping-recipe-22g0</guid>
      <description>&lt;p&gt;This is the technical, reproducible version of a fix I shipped on my own homelab. If you want the narrative version, that's on Medium. This one is the recipe: the measurements, the math, the Modelfile, and the exact prompt I gave Claude Code to generate it. Copy-paste friendly.&lt;/p&gt;

&lt;p&gt;Repo for the dashboard used throughout: &lt;strong&gt;&lt;a href="https://github.com/SikamikanikoBG/homelab-monitor" rel="noopener noreferrer"&gt;https://github.com/SikamikanikoBG/homelab-monitor&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One 24GB RTX 3090, two GPU services: &lt;strong&gt;WhisperX large-v3&lt;/strong&gt; (STT, 7.7GB peak) and a &lt;strong&gt;Devstral Small 24B&lt;/strong&gt; email-triage LLM (Q4_K_M, ~18.3GB).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;18.3 + 7.7 = 26GB&lt;/code&gt; → CUDA OOM whenever they overlapped.&lt;/li&gt;
&lt;li&gt;The LLM was loaded with a 40k context window but the triage job never needed more than ~5–8k tokens.&lt;/li&gt;
&lt;li&gt;Capped &lt;code&gt;num_ctx&lt;/code&gt; to 8192 → KV cache drops from ~6.1GB to ~1.25GB → model footprint ~18.3GB → &lt;strong&gt;~14.2GB&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;14.2 + 7.7 = 21.9GB&lt;/code&gt; → both resident, zero OOM, no quality loss.&lt;/li&gt;
&lt;/ul&gt;
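&lt;p&gt;The arithmetic behind those bullets, as a small sketch. It assumes the KV cache grows linearly with &lt;code&gt;num_ctx&lt;/code&gt; and that "40k" means 40,000 tokens; &lt;code&gt;kv_scaled&lt;/code&gt; and &lt;code&gt;headroom&lt;/code&gt; are hypothetical helper names, and the GB figures are the ones quoted above:&lt;/p&gt;

```shell
#!/bin/sh
# Back-of-envelope VRAM budget using the figures from the TL;DR.
# Assumption: KV-cache size scales linearly with the context length (num_ctx).

# kv_scaled KV_GB OLD_CTX NEW_CTX: estimated KV-cache size after changing the context length
kv_scaled() {
  awk -v kv="$1" -v old_ctx="$2" -v new_ctx="$3" \
    'BEGIN { printf "%.2f\n", kv * new_ctx / old_ctx }'
}

# headroom CAPACITY_GB A_GB B_GB: VRAM left over with both services resident
# (a negative result means the pair cannot fit on the card)
headroom() {
  awk -v cap="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.1f\n", cap - a - b }'
}

kv_scaled 6.1 40000 8192   # prints 1.25  (KV cache at num_ctx=8192)
headroom 24 18.3 7.7       # prints -2.0  (before the cap)
headroom 24 14.2 7.7       # prints 2.1   (after the cap)
```

&lt;p&gt;The negative headroom is the overlap that produced the OOM; the positive one is the margin left with both services resident.&lt;/p&gt;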

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;openSUSE, Xeon (56 threads), 125GB RAM, 1x RTX 3090 (24GB)&lt;/span&gt;
&lt;span class="na"&gt;GPU svc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WhisperX large-v3  (speech-to-text)&lt;/span&gt;
&lt;span class="na"&gt;GPU svc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ollama -&amp;gt; devstral-small-2 (24B, Q4_K_M) for background email triage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both services run all the time. The OOM only happened when I dictated to my assistant (WhisperX) &lt;em&gt;while&lt;/em&gt; the triage loop was active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — Make the contention measurable
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;nvidia-smi&lt;/code&gt; shows instantaneous VRAM. It can't show you &lt;em&gt;which&lt;/em&gt; service spiked or &lt;em&gt;when&lt;/em&gt; two of them overlapped — and an intermittent OOM is a timing problem. You need per-service VRAM history.&lt;/p&gt;

&lt;p&gt;I use my own dashboard (homelab-monitor) for this. The relevant view is "AI Models", which attributes VRAM per model server and per loaded model, over a time range, with OOM markers and a capacity ceiling line.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07mbomkuuvfftsdj7v2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07mbomkuuvfftsdj7v2w.png" alt="VRAM by service over time, spiking into the 24GB ceiling with an OOM marker" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What the history showed at the overlap window:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Peak VRAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Devstral 24B (triage)&lt;/td&gt;
&lt;td&gt;~18.3 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WhisperX large-v3&lt;/td&gt;
&lt;td&gt;7.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~26 GB on a 24 GB card&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
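&lt;p&gt;The check behind that table is simple enough to sketch: sum the per-service VRAM at each sample and flag the moments where the total passes the card's capacity. The peak values come from the table; the three-point timeline and the service keys are assumptions for illustration.&lt;/p&gt;

```python
CAPACITY_GB = 24.0  # RTX 3090, from the host spec

def over_capacity(samples, capacity_gb=CAPACITY_GB):
    """Return (timestamp, total_gb) for each sample whose summed VRAM exceeds capacity."""
    hits = []
    for ts, per_service in samples:
        total = round(sum(per_service.values()), 1)
        if total > capacity_gb:
            hits.append((ts, total))
    return hits

# Hypothetical timeline: triage alone, then dictation overlapping it, then dictation alone.
samples = [
    ("t0", {"devstral": 18.3}),
    ("t1", {"devstral": 18.3, "whisperx": 7.7}),
    ("t2", {"whisperx": 7.7}),
]
spikes = over_capacity(samples)
```

&lt;p&gt;Only the overlapping sample is flagged, which is why the OOM looked intermittent.&lt;/p&gt;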

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4b0anarc20daso9owds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4b0anarc20daso9owds.png" alt="Per-model attribution: old model 18.3GB vs capped triage variant 14.2GB, WhisperX 7.7GB — before/after on one frame" width="799" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to reproduce the measurement, the dashboard runs as a single container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SikamikanikoBG/homelab-monitor
&lt;span class="nb"&gt;cd &lt;/span&gt;homelab-monitor
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;span class="c"&gt;# open http://&amp;lt;host&amp;gt;:9800  -&amp;gt; AI Models / GPU views&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(NVIDIA Container Toolkit required for GPU metrics. Remote hosts are monitored over SSH, no agent.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 — Measure what the job actually needs
&lt;/h2&gt;

&lt;p&gt;Weights are a fixed cost (~15GB for Devstral 24B at Q4_K_M). The variable cost is the &lt;strong&gt;KV cache&lt;/strong&gt;, which scales linearly with &lt;code&gt;num_ctx&lt;/code&gt;. So the question is: how much context does background email triage actually use?&lt;/p&gt;

&lt;p&gt;I pulled the request traces from Langfuse. The triage pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;truncates each email body to 300–500 chars,&lt;/li&gt;
&lt;li&gt;batches ~10 emails per call,&lt;/li&gt;
&lt;li&gt;caps generation around 2k tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real prompts topped out at roughly 5–8k tokens. The model was loaded with a 40k window, so about 32k tokens of reserved KV cache sat idle.&lt;/p&gt;
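&lt;p&gt;A rough sketch of that comparison, assuming you have already exported one token total per call from your traces. The list below is made-up data in the 5–8k range, not my real traces, and the function name is hypothetical.&lt;/p&gt;

```python
def context_utilisation(total_tokens_per_call, num_ctx):
    """Peak tokens used by any call, and how much of the reserved window stays idle."""
    peak = max(total_tokens_per_call)
    idle = num_ctx - peak
    return {"peak": peak, "idle_tokens": idle, "idle_share": round(idle / num_ctx, 2)}

# Made-up per-call token totals; replace with values exported from your tracing tool.
observed = [5200, 6100, 7400, 8000]
stats = context_utilisation(observed, num_ctx=40960)
```

&lt;p&gt;With a 40,960-token window and an 8,000-token peak, about four fifths of the reservation is never touched.&lt;/p&gt;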

&lt;h2&gt;
  
  
  Step 3 — Do the KV-cache math
&lt;/h2&gt;

&lt;p&gt;Devstral Small is &lt;code&gt;mistral3&lt;/code&gt;. Pull the architecture straight from Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/show &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name":"devstral-small-2:latest"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys,json;mi=json.load(sys.stdin)['model_info'];&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
print({k:v for k,v in mi.items() if 'head_count' in k or 'block_count' in k or 'length' in k})"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Relevant values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="err"&gt;block_count&lt;/span&gt; &lt;span class="err"&gt;(layers)&lt;/span&gt;      &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="err"&gt;40&lt;/span&gt;
&lt;span class="py"&gt;attention.head_count_kv&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;8&lt;/span&gt;
&lt;span class="py"&gt;attention.key_length&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;128&lt;/span&gt;
&lt;span class="py"&gt;attention.value_length&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;128&lt;/span&gt;
&lt;span class="err"&gt;context_length&lt;/span&gt; &lt;span class="err"&gt;(native)&lt;/span&gt;   &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="err"&gt;8192&lt;/span&gt;   &lt;span class="c"&gt;# rope-extended to 393216
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;KV cache per token (f16) = &lt;code&gt;2 (K+V) × layers × kv_heads × head_dim × 2 bytes&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2 × 40 × 8 × 128 × 2  =  163,840 bytes  ≈  0.156 MB / token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;num_ctx&lt;/th&gt;
&lt;th&gt;KV cache (f16)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;40,960&lt;/td&gt;
&lt;td&gt;~6.25 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8,192&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~1.25 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;~0.6 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
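&lt;p&gt;To rerun the numbers for a different model, the formula fits in a few lines of Python. The defaults are the Devstral values listed above; the function names are mine, and sizes are reported in GiB (2^30 bytes).&lt;/p&gt;

```python
def kv_bytes_per_token(layers=40, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """One K and one V vector per KV head per layer; f16 means 2 bytes per element."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def kv_cache_gib(num_ctx, **arch):
    """KV cache reserved for a full context window, in GiB."""
    return num_ctx * kv_bytes_per_token(**arch) / 2**30

for ctx in (40960, 16384, 8192, 4096):
    print(ctx, round(kv_cache_gib(ctx), 2))
```

&lt;p&gt;Swap in the &lt;code&gt;block_count&lt;/code&gt;, &lt;code&gt;head_count_kv&lt;/code&gt; and &lt;code&gt;key_length&lt;/code&gt; of another model to see what its context window costs. This covers only the KV cache, not weights or runtime buffers.&lt;/p&gt;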

&lt;p&gt;8192 is the sweet spot: it's above the real worst-case prompt (~5–8k) &lt;strong&gt;and&lt;/strong&gt; it's the model's native context length, so there's no rope extrapolation quality hit. I rejected 4096 — a 10-email batch with 2k generation can brush up against it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — Generate the capped model
&lt;/h2&gt;

&lt;p&gt;Ollama lets you inherit existing weights and override parameters in a Modelfile, so this costs no extra disk and no re-download.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Modelfile.triage&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; devstral-small-2:latest&lt;/span&gt;

&lt;span class="c"&gt;# Native 8K window: covers every triage prompt (10-email batches + 2K generation)&lt;/span&gt;
&lt;span class="c"&gt;# while keeping the KV cache ~1.25GB so the model + WhisperX fit on one 24GB GPU.&lt;/span&gt;
PARAMETER num_ctx 8192
PARAMETER temperature 0
PARAMETER num_predict 2048

SYSTEM """You are a background email-triage engine. Follow the exact output
format in each request. Output only the requested label(s) or field(s). Never
&lt;span class="k"&gt;add&lt;/span&gt;&lt;span class="s"&gt; explanations, preamble, or commentary. When uncertain, pick the closest&lt;/span&gt;
valid option. Be terse and deterministic."""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama create devstral-small-2:triage &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile.triage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The optional &lt;code&gt;SYSTEM&lt;/code&gt; block is a small bonus: triage prompts want terse, structured output, and pinning that behaviour cuts stray preamble (fewer reparse/retry calls = less GPU time).&lt;/p&gt;
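&lt;p&gt;If you end up with several capped variants (one per background job, say), generating the Modelfile text from a template keeps them consistent. This is a minimal sketch using only the &lt;code&gt;FROM&lt;/code&gt;, &lt;code&gt;PARAMETER&lt;/code&gt; and &lt;code&gt;SYSTEM&lt;/code&gt; instructions shown above; &lt;code&gt;render_modelfile&lt;/code&gt; is a hypothetical helper, not part of Ollama.&lt;/p&gt;

```python
def render_modelfile(base, num_ctx, temperature=0, num_predict=2048, system=None):
    """Return Modelfile text that inherits weights from `base` and overrides parameters."""
    lines = [
        f"FROM {base}",
        "",
        f"PARAMETER num_ctx {num_ctx}",
        f"PARAMETER temperature {temperature}",
        f"PARAMETER num_predict {num_predict}",
    ]
    if system:
        lines += ["", f'SYSTEM """{system}"""']
    return "\n".join(lines) + "\n"

text = render_modelfile("devstral-small-2:latest", 8192)
print(text)
```

&lt;p&gt;Write the result to a file and pass it to &lt;code&gt;ollama create&lt;/code&gt; exactly as in the build step.&lt;/p&gt;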

&lt;h3&gt;
  
  
  The Claude Code prompt I used
&lt;/h3&gt;

&lt;p&gt;I let Claude Code do the measuring and the Modelfile generation. The prompt, roughly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Analyze my background email triage. Pull the Langfuse traces to find the real prompt/context sizes the triage job uses, decide a safe &lt;code&gt;num_ctx&lt;/code&gt; cap that won't truncate worst-case batches, confirm the KV-cache savings against the model's actual architecture, and generate an Ollama Modelfile for a context-capped &lt;code&gt;:triage&lt;/code&gt; variant. Then tell me the expected VRAM footprint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It came back with: traces show ≤8k tokens, cap at 8192 (native window), ~5GB KV saved, expected footprint ~14–16GB. Which matched what the dashboard measured after I deployed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5 — Verify on the GPU
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# load it&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/generate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"devstral-small-2:triage","prompt":"ping","stream":false}'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null

&lt;span class="c"&gt;# check resident VRAM + context&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/ps &lt;span class="se"&gt;\&lt;/span&gt;
  | python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys,json;[print(m['name'],round(m['size_vram']/1e9,1),'GB ctx',m['context_length']) for m in json.load(sys.stdin)['models']]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: the triage model holds ~14GB resident at &lt;code&gt;ctx=8192&lt;/code&gt;, down from ~18GB.&lt;/p&gt;
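&lt;p&gt;The one-liner above is fine interactively. For a scheduled check, the same parsing reads better as a function. This sketch uses only the fields the one-liner already reads (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;size_vram&lt;/code&gt;, &lt;code&gt;context_length&lt;/code&gt;); the sample payload is made up to match the result I saw, not captured output.&lt;/p&gt;

```python
import json

def summarise_ps(payload):
    """Return (name, resident VRAM in GB, context length) for each loaded model."""
    return [
        (m["name"], round(m["size_vram"] / 1e9, 1), m.get("context_length"))
        for m in payload.get("models", [])
    ]

# Made-up response body shaped like the fields read from /api/ps above.
sample = json.loads(
    '{"models": [{"name": "devstral-small-2:triage",'
    ' "size_vram": 14200000000, "context_length": 8192}]}'
)
rows = summarise_ps(sample)
```

&lt;p&gt;Asserting that the context length is 8192 catches the case where a client silently loads the uncapped model instead of the &lt;code&gt;:triage&lt;/code&gt; variant.&lt;/p&gt;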

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkgk2m71ohlj44egu37t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkgk2m71ohlj44egu37t.png" alt="After: both services coresident on the GPU, no pressure" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Triage LLM&lt;/td&gt;
&lt;td&gt;~18.3 GB&lt;/td&gt;
&lt;td&gt;~14.2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WhisperX large-v3&lt;/td&gt;
&lt;td&gt;7.7 GB&lt;/td&gt;
&lt;td&gt;7.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Combined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~26 GB → OOM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~21.9 GB → fits&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both services now sit on the card together. Full STT quality, email triage in parallel, ~2GB headroom. No quant change, no CPU offload, no smaller Whisper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A shared-GPU OOM is a timing problem.&lt;/strong&gt; Point-in-time &lt;code&gt;nvidia-smi&lt;/code&gt; can't diagnose it — get per-service VRAM &lt;em&gt;history&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Match &lt;code&gt;num_ctx&lt;/code&gt; to the real workload.&lt;/strong&gt; Reserved context is pure VRAM cost. Background jobs almost always over-reserve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer the model's native context length&lt;/strong&gt; as your cap when you can — no rope-extrapolation quality hit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure twice (traces + GPU history), cap once.&lt;/strong&gt; The fix was three lines; knowing it was the right three lines took the data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dashboard used for the per-service VRAM history: &lt;strong&gt;&lt;a href="https://github.com/SikamikanikoBG/homelab-monitor" rel="noopener noreferrer"&gt;https://github.com/SikamikanikoBG/homelab-monitor&lt;/a&gt;&lt;/strong&gt; — it's open source, runs in one container, and exists because I needed exactly this view and &lt;code&gt;nvidia-smi&lt;/code&gt; wouldn't give it to me.&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>ollama</category>
      <category>localllm</category>
      <category>devops</category>
    </item>
    <item>
      <title>I got tired of guessing which model holds my VRAM, so I built a tiny dashboard</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Tue, 26 May 2026 19:32:29 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/i-got-tired-of-guessing-which-model-holds-my-vram-so-i-built-a-tiny-dashboard-3jfn</link>
      <guid>https://dev.to/sikamikanikobg/i-got-tired-of-guessing-which-model-holds-my-vram-so-i-built-a-tiny-dashboard-3jfn</guid>
      <description>&lt;p&gt;Quick story.&lt;/p&gt;

&lt;p&gt;I run a small homelab — one box, an NVIDIA card, around ten Docker containers, and a couple of local model servers (Ollama mostly, vLLM when I'm playing around).&lt;/p&gt;

&lt;p&gt;Every "why is this model OOM-ing" turned into the same five minutes of archaeology:&lt;/p&gt;

&lt;p&gt;nvidia-smi  →  pick a PID&lt;br&gt;
ps -o cgroup -p &amp;lt;PID&amp;gt;  →  find the container ID&lt;br&gt;
docker ps  →  map ID to name&lt;/p&gt;

&lt;p&gt;Just to answer: &lt;strong&gt;which container, which model, is eating my VRAM right now?&lt;/strong&gt;&lt;/p&gt;
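&lt;p&gt;The middle step of that archaeology can be automated. Here is a minimal sketch, assuming Docker's usual cgroup naming where the full 64-character container ID appears somewhere in the process's cgroup path. It is an illustration, not the dashboard's actual code, and the ID below is fake.&lt;/p&gt;

```python
import re

_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")

def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex container ID found in /proc/PID/cgroup text, else None."""
    match = _CONTAINER_ID.search(cgroup_text)
    return match.group(0) if match else None

def short_id(container_id):
    """The 12-character form that `docker ps` prints by default."""
    return container_id[:12]

fake_id = "0123456789abcdef" * 4                       # made-up container ID
cgroup_v1 = f"12:memory:/docker/{fake_id}\n"           # cgroup v1 style path
cgroup_v2 = f"0::/system.slice/docker-{fake_id}.scope\n"  # systemd / cgroup v2 style path
print(short_id(container_id_from_cgroup(cgroup_v2)))
```

&lt;p&gt;A process that does not run in a container has no such ID in its cgroup path, so the function returns &lt;code&gt;None&lt;/code&gt; for it.&lt;/p&gt;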

&lt;p&gt;I tried Prometheus + Grafana + node-exporter + dcgm-exporter. It works, but for one box it's a stack-on-a-stack to answer a single question.&lt;/p&gt;

&lt;p&gt;So I built a third option: one container, one page. GPU panel maps VRAM-using processes back to their Docker container automatically. AI Models panel queries each model server's own API (Ollama &lt;code&gt;/api/ps&lt;/code&gt;, vLLM &lt;code&gt;/v1/models&lt;/code&gt;, llama.cpp, TGI, A1111, ComfyUI) and shows you which model is loaded.&lt;/p&gt;
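&lt;p&gt;To give a feel for the "ask each server's own API" part, here is a small sketch covering two of those endpoints. It assumes the response shapes I know: Ollama's &lt;code&gt;/api/ps&lt;/code&gt; returns a &lt;code&gt;models&lt;/code&gt; list with &lt;code&gt;name&lt;/code&gt; fields, and vLLM's &lt;code&gt;/v1/models&lt;/code&gt; returns an OpenAI-style &lt;code&gt;data&lt;/code&gt; list with &lt;code&gt;id&lt;/code&gt; fields. The function and the sample payloads are illustrative, not the dashboard's code.&lt;/p&gt;

```python
def loaded_models(kind, payload):
    """Extract model names from an already-decoded JSON response."""
    if kind == "ollama":   # GET /api/ps
        return [m["name"] for m in payload.get("models", [])]
    if kind == "vllm":     # GET /v1/models (OpenAI-compatible list)
        return [m["id"] for m in payload.get("data", [])]
    raise ValueError(f"unsupported server kind: {kind}")

# Made-up payloads for illustration.
ollama_ps = {"models": [{"name": "devstral-small-2:triage"}]}
vllm_models = {"object": "list", "data": [{"id": "example-org/example-model", "object": "model"}]}

names = loaded_models("ollama", ollama_ps) + loaded_models("vllm", vllm_models)
```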

&lt;p&gt;&lt;code&gt;docker compose up -d --build&lt;/code&gt; and that's the whole setup.&lt;/p&gt;

&lt;p&gt;History in SQLite, downsampled on read. No agents, no cloud, no Prometheus.&lt;/p&gt;
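&lt;p&gt;"Downsampled on read" means the raw samples stay in the table and the bucketing happens in the query. A minimal sketch with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module follows; the table name, columns and numbers are assumptions for illustration, not the project's real schema.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vram (ts INTEGER, service TEXT, mib INTEGER)")
conn.executemany("INSERT INTO vram VALUES (?, ?, ?)", [
    (0, "ollama", 14000), (10, "ollama", 14200), (20, "ollama", 18300),
    (60, "ollama", 14100), (70, "ollama", 14300),
])

def downsample(conn, service, bucket_s):
    """Average and peak per time bucket, computed at query time; raw rows stay untouched."""
    return conn.execute(
        "SELECT (ts / ?) * ? AS bucket, AVG(mib), MAX(mib) FROM vram "
        "WHERE service = ? GROUP BY bucket ORDER BY bucket",
        (bucket_s, bucket_s, service),
    ).fetchall()

rows = downsample(conn, "ollama", 60)
```

&lt;p&gt;Keeping the per-bucket maximum next to the average matters for VRAM: a short spike is exactly what you are looking for, and an average alone would smooth it away.&lt;/p&gt;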

&lt;p&gt;The repo, with the longer technical write-up and screenshots:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/SikamikanikoBG/homelab-monitor" rel="noopener noreferrer"&gt;github.com/SikamikanikoBG/homelab-monitor&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. NVIDIA-only on the GPU panel for now — AMD/Intel back-ends are a &lt;code&gt;good first issue&lt;/code&gt; if anyone wants to extend.&lt;/p&gt;

&lt;p&gt;Curious how others here solve the "who holds my VRAM" problem. Different tool? Different stack? Or did you also build something tiny because the big stacks felt like too much for one box?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>monitoring</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building a Game for My Daughter with AI — Part 1: What If She Could Build It Too?</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sun, 22 Mar 2026 08:35:15 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/she-drew-a-dragon-dog-i-built-it-an-ai-89i</link>
      <guid>https://dev.to/sikamikanikobg/she-drew-a-dragon-dog-i-built-it-an-ai-89i</guid>
      <description>&lt;p&gt;She drew a purple dragon with six legs and called it "a dragon but also a dog."&lt;/p&gt;

&lt;p&gt;Capped the marker. Moved on. Zero doubt.&lt;/p&gt;

&lt;p&gt;I thought: what if that drawing could actually come to life?&lt;/p&gt;

&lt;p&gt;So I started building it. In secret.&lt;/p&gt;

&lt;p&gt;The full story — the design thinking, the tech stack, and why the reveal moment matters more than the launch — is on Medium:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://medium.com/@arsen.apostolov/building-a-game-for-my-daughter-with-ai-part-1" rel="noopener noreferrer"&gt;https://medium.com/@arsen.apostolov/building-a-game-for-my-daughter-with-ai-part-1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Part 2 (Lovable vs Replit, MCP architecture, GPT-4 Vision pipeline) drops next week.&lt;/p&gt;

&lt;p&gt;Follow if you want to watch a slightly obsessed AI developer build something for an audience of one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Hit a $200 AI Bill and Built My Own Server Instead - Complete Guide!</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sat, 07 Jun 2025 17:03:36 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/i-hit-a-200-ai-bill-and-built-my-own-server-instead-4a59</link>
      <guid>https://dev.to/sikamikanikobg/i-hit-a-200-ai-bill-and-built-my-own-server-instead-4a59</guid>
      <description>&lt;p&gt;Hit a $200 Claude API bill last month ($2400 and above on yearly basis!). That was my wake-up call.&lt;/p&gt;

&lt;p&gt;Built my own AI server instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTX 3090 24GB (used): $750 one-time&lt;/li&gt;
&lt;li&gt;Zero monthly costs&lt;/li&gt;
&lt;li&gt;Access from anywhere via VPN&lt;/li&gt;
&lt;li&gt;Unlimited usage&lt;/li&gt;
&lt;/ul&gt;
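&lt;p&gt;A quick back-of-the-envelope check on the payback time, as a sketch. It counts only the GPU price against the avoided API bill and ignores electricity and the rest of the machine, so treat the result as a lower bound.&lt;/p&gt;

```python
import math

def break_even_months(hardware_cost, monthly_bill):
    """Whole months of avoided API spend needed to cover a one-time hardware purchase."""
    return math.ceil(hardware_cost / monthly_bill)

at_peak_bill = break_even_months(750, 200)   # the $200 month that triggered this
at_typical_high = break_even_months(750, 60)  # upper end of the $40-60/month range quoted later
at_typical_low = break_even_months(750, 40)   # lower end of that range
print(at_peak_bill, at_typical_high, at_typical_low)
```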

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama locally&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.ai/install.sh | sh

&lt;span class="c"&gt;# Download coding models&lt;/span&gt;
ollama pull qwen2.5-coder:14b
ollama pull devstral

&lt;span class="c"&gt;# Use with aider&lt;/span&gt;
OLLAMA_API_BASE=http://10.0.0.1:11434 aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama_chat/devstral
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
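&lt;p&gt;You don't need aider to talk to the server. Any script can call Ollama's HTTP API directly. Below is a minimal sketch with the standard library that builds a non-streaming request to &lt;code&gt;/api/generate&lt;/code&gt;; the address is the same example address used above, and the helper name is made up.&lt;/p&gt;

```python
import json
import urllib.request

def build_generate_request(base_url, model, prompt):
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("http://10.0.0.1:11434", "devstral", "ping")
# To send it once the server is reachable:
#   answer = json.load(urllib.request.urlopen(req))["response"]
```

&lt;p&gt;Building the request separately from sending it makes the script easy to test without the server being up.&lt;/p&gt;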



&lt;h2&gt;
  
  
  Remote Access via WireGuard
&lt;/h2&gt;

&lt;p&gt;The trick: secure VPN tunnel to home server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux server running Ollama&lt;/li&gt;
&lt;li&gt;WireGuard VPN for encrypted access&lt;/li&gt;
&lt;li&gt;Router port forwarding (UDP 51820)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works from coffee shops, client offices, anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results After 6 Months
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$0 monthly bills&lt;/strong&gt; (was $40-60/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster responses&lt;/strong&gt; than cloud APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rate limits&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% private&lt;/strong&gt; - code never leaves my network&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Want the Full Guide?
&lt;/h2&gt;

&lt;p&gt;Complete walkthrough here: &lt;strong&gt;&lt;a href="https://medium.com/@arsen.apostolov/stop-paying-for-chatgpt-run-your-own-ai-models-and-access-them-from-anywhere-32338d94b6e9" rel="noopener noreferrer"&gt;Stop Paying for ChatGPT - Run Your Own AI Models&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step-by-step server setup&lt;/li&gt;
&lt;li&gt;WireGuard VPN configuration&lt;/li&gt;
&lt;li&gt;Router setup&lt;/li&gt;
&lt;li&gt;Client configs for all platforms&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Beyond saving money, you learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure management&lt;/li&gt;
&lt;li&gt;VPN security&lt;/li&gt;
&lt;li&gt;Cost optimization&lt;/li&gt;
&lt;li&gt;Enterprise-ready solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies increasingly want AI that keeps data internal. This gives you both the skills and the setup.&lt;/p&gt;




&lt;p&gt;Connect: &lt;a href="https://www.linkedin.com/in/arsenapostolov"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>coding</category>
      <category>remote</category>
    </item>
    <item>
      <title>Stop Copy-Pasting Your Code to LLMs — I Built a Tool That Does It Automatically</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sun, 23 Mar 2025 06:46:18 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/stop-copy-pasting-your-code-to-llms-i-built-a-tool-that-does-it-automatically-383c</link>
      <guid>https://dev.to/sikamikanikobg/stop-copy-pasting-your-code-to-llms-i-built-a-tool-that-does-it-automatically-383c</guid>
      <description>&lt;p&gt;Ever wanted to ask ChatGPT or Claude about your codebase, but got tired of copy-pasting files one by one? Yeah, me too. It was driving me crazy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I was working on a project with multiple files and needed some help understanding how everything connected. So I started copy-pasting files into ChatGPT, one after another, trying to give it enough context.&lt;/p&gt;

&lt;p&gt;It was a complete pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The context window would fill up&lt;/li&gt;
&lt;li&gt;I’d forget important files&lt;/li&gt;
&lt;li&gt;I had to manually explain connections between files&lt;/li&gt;
&lt;li&gt;The LLM would lose track of what was what&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the third time doing this dance, I decided to just build something to solve it myself.&lt;/p&gt;

&lt;p&gt;Find the full guide in my &lt;a href="https://medium.com/@arsen.apostolov/stop-copy-pasting-your-code-to-llms-i-built-a-tool-that-does-it-automatically-1b554e188c17" rel="noopener noreferrer"&gt;Medium article&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>productivity</category>
      <category>tooling</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Testing Aider: Practical Experience with Different Models</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Mon, 17 Feb 2025 05:14:13 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/testing-aider-practical-experience-with-different-models-58f2</link>
      <guid>https://dev.to/sikamikanikobg/testing-aider-practical-experience-with-different-models-58f2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7qdnv1ftehapp90q8k2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7qdnv1ftehapp90q8k2.png" alt="Aider" width="800" height="1255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I like coding agents and all kinds of supporting plug-ins that make my day easier, from Continue to Copilot. The most complete solution for me so far has been Cursor, but it is not free (see the plans).&lt;/p&gt;

&lt;p&gt;So I found this super cool alternative: Aider chat.&lt;/p&gt;

&lt;p&gt;I tested Aider with multiple AI backends: OpenAI, Claude, and a local Ollama server.&lt;/p&gt;

&lt;p&gt;The first 30 minutes were spent learning the tool: understanding its commands and workflow. After this initial setup phase, development speed increased significantly.&lt;/p&gt;

&lt;p&gt;Model comparison from practical use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude:&lt;/strong&gt; best performance when working remotely&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local setup:&lt;/strong&gt; Ollama with deepseek r1 7b and qwen coder 2.5 7b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Home setup preference:&lt;/strong&gt; Architect mode with Ollama models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key observation: Local models provide good performance without cloud dependencies. The initial learning curve is worth the productivity gain.&lt;/p&gt;

&lt;p&gt;What's your experience with Aider? Particularly interested in local model configurations and performance comparisons.&lt;/p&gt;

</description>
      <category>coding</category>
      <category>ai</category>
      <category>powerfuldevs</category>
      <category>programming</category>
    </item>
    <item>
      <title>I’m sharing my LLM Code Lens Python package: My Secret Weapon for AI-Powered Coding</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Fri, 10 Jan 2025 04:56:15 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/i-wrote-llm-code-lens-python-package-my-secret-weapon-for-ai-powered-coding-1e4h</link>
      <guid>https://dev.to/sikamikanikobg/i-wrote-llm-code-lens-python-package-my-secret-weapon-for-ai-powered-coding-1e4h</guid>
      <description>&lt;p&gt;Hey everyone,&lt;/p&gt;

&lt;p&gt;I want to share my tool that has completely transformed how I work with AI assistants. If you're tired of writing endless, complex prompts and struggling to get precise code insights, this is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Working with large language models like ChatGPT, Claude, or Mistral always felt like a communication battle. How do you explain an entire project's context in a single prompt?&lt;/p&gt;

&lt;h2&gt;
  
  
  My Solution: LLM Code Lens
&lt;/h2&gt;

&lt;p&gt;I built a simple, powerful package that generates a comprehensive project context file in seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Go to the output dir:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kchmzu9j9xzy3y3lsf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kchmzu9j9xzy3y3lsf2.png" alt=" " width="799" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Preview analysis.txt. It contains a summary of your codebase (imports, functions, documentation, etc.) so that LLMs get good context.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p7rv4d07csnvyaznwdt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p7rv4d07csnvyaznwdt.png" alt="analysis.txt" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;
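&lt;p&gt;To give a feel for what such a summary involves, here is a toy sketch that pulls imports and function signatures out of Python source with the standard &lt;code&gt;ast&lt;/code&gt; module. It is a simplified illustration of the idea, not the package's actual implementation, and the sample source is made up.&lt;/p&gt;

```python
import ast

def summarise_source(source):
    """Collect imported module names and top-level function signatures from Python source."""
    tree = ast.parse(source)
    imports, functions = [], []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or ".")  # "." marks a bare relative import
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node)
            signature = f"{node.name}({args})"
            functions.append(f"{signature}: {doc}" if doc else signature)
    return {"imports": imports, "functions": functions}

# Made-up source file for illustration.
sample = (
    "import os\n"
    "from pathlib import Path\n"
    "\n"
    "def load(path, mode):\n"
    "    'Read a file.'\n"
    "    return Path(path).read_text()\n"
)
summary = summarise_source(sample)
```

&lt;p&gt;A few lines like these per file are much cheaper in tokens than the file itself, which is the point of a condensed summary.&lt;/p&gt;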

&lt;h3&gt;
  
  
  Preview full.txt. It contains your whole codebase aggregated into one file. Great for smaller projects (below 10K lines of code, in my experience).
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focrxzl82eoleedcf2qor.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focrxzl82eoleedcf2qor.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Use It
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Install the package
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-code-lens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Generate project context
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create condensed description of the project. Sufficient for big projects to provide context to LLMs.&lt;/span&gt;
&lt;span class="c"&gt;# Generates analysis.txt in the output folder. Default folder: .codelens&lt;/span&gt;
llmcl

&lt;span class="c"&gt;# On top of the analysis.txt it generates all the codebase in full.txt so that LLM can see all your projects. The output files are divided into 100K tokens file so that they can fit in any LLM including local Llama3.3 70B&lt;/span&gt;
llmcl &lt;span class="nt"&gt;--full&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
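&lt;p&gt;Splitting by token budget can be sketched in a few lines. The version below is an illustration only: it approximates tokens as four characters each (a rough rule of thumb, not a real tokenizer) and is not how the package itself counts tokens.&lt;/p&gt;

```python
def split_by_token_budget(text, max_tokens=100_000, chars_per_token=4):
    """Split text on line boundaries into chunks under an approximate token budget."""
    budget = max_tokens * chars_per_token
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) > budget:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

# Ten 10-character lines with a tiny budget, so the split is easy to see.
text = ("x" * 9 + "\n") * 10
chunks = split_by_token_budget(text, max_tokens=10, chars_per_token=4)
```

&lt;p&gt;Splitting on line boundaries keeps every line of code whole. The trade-off is that a single line longer than the budget still ends up as one oversized chunk.&lt;/p&gt;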



&lt;ol start="3"&gt;
&lt;li&gt;Get context files in &lt;code&gt;.codelens/&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;analysis.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;analysis.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Paste these directly into your AI assistant&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The prompt&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60v0odkkvxkdwjxtzkub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60v0odkkvxkdwjxtzkub.png" alt=" " width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Immediate result&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62vtg8o7w1kcwq3xmap6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62vtg8o7w1kcwq3xmap6.png" alt=" " width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant Efficiency!
&lt;/h3&gt;

&lt;p&gt;No more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing 500-word explanations&lt;/li&gt;
&lt;li&gt;Struggling to provide context&lt;/li&gt;
&lt;li&gt;Wasting time on prompt engineering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just pure, efficient code insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Personal Experience
&lt;/h2&gt;

&lt;p&gt;This tool has been an essential part of my daily coding routine for months. It has significantly improved my development performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;🚀 &lt;a href="https://github.com/SikamikanikoBG/codelens" rel="noopener noreferrer"&gt;Star the Project on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it. Love it. Share it.&lt;/p&gt;

&lt;p&gt;Hope it helps you as much as it's helped me.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br&gt;
Arsen&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect! 🤝
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;💼 Connect with me on &lt;a href="https://www.linkedin.com/in/arsenapostolov" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎮 Join our Random42 community on &lt;a href="https://discord.gg/vYEjQvtcqU" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; - AI news, Success stories, Use cases and Support for your project!&lt;/li&gt;
&lt;li&gt;📝 Follow my tech journey on &lt;a href="https://dev.to/sikamikanikobg"&gt;Dev.to&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
    </item>
    <item>
      <title>My Weekend project on GitHub: Making AI Art Creation Simple For Everyone 🎨</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Sun, 05 Jan 2025 12:30:39 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/my-weekend-project-on-github-making-ai-art-creation-simple-for-everyone-43b9</link>
      <guid>https://dev.to/sikamikanikobg/my-weekend-project-on-github-making-ai-art-creation-simple-for-everyone-43b9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Check out the project on &lt;a href="https://github.com/SikamikanikoBG/ImageGenerator" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give it a ⭐ if you find it useful!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a developer passionate about AI, I saw an opportunity to make image generation more accessible. I wanted to create something that would enable everyone to explore the amazing possibilities of AI art without getting caught up in technical complexities. That's why I built ImageGenerator - a tool I crafted from the ground up to handle all the technical aspects behind the scenes, letting you focus purely on creativity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built ImageGenerator Differently
&lt;/h2&gt;

&lt;p&gt;A key feature that sets ImageGenerator apart is that it runs completely locally on your machine. Unlike cloud-based solutions, your data never leaves your computer - there's no uploading to external servers, no privacy concerns, and no usage limits. You have full control over everything.&lt;/p&gt;

&lt;p&gt;What also sets my implementation apart is that it's built by a developer who prioritizes both user experience and data privacy. Every feature comes from solving real problems I encountered, and I've refined the interface based on actual usage and feedback. Whether you're working on personal projects or professional tasks, you can use ImageGenerator with complete peace of mind.&lt;/p&gt;

&lt;p&gt;Here's what you get out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🏠 100% local setup - your data never leaves your machine&lt;/li&gt;
&lt;li&gt;🔒 Complete privacy - no cloud services or external servers needed&lt;/li&gt;
&lt;li&gt;🎯 Simple web interface - no command line needed&lt;/li&gt;
&lt;li&gt;🚀 One-click installation with all dependencies handled&lt;/li&gt;
&lt;li&gt;🎨 Support for both local and online AI models&lt;/li&gt;
&lt;li&gt;🎥 Built-in image-to-video conversion&lt;/li&gt;
&lt;li&gt;📊 Real-time generation progress and status updates&lt;/li&gt;
&lt;li&gt;⚡ No usage limits or API costs - generate as much as you want&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  See It In Action
&lt;/h2&gt;

&lt;p&gt;Here are some images I generated using ImageGenerator:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t3q7poz3t5d55rg99vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t3q7poz3t5d55rg99vj.png" alt="Baby Gizmo" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0t5dtuxprwg6i80bm14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0t5dtuxprwg6i80bm14.png" alt="Smiling woman" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhm4ygxndzndkjuh0ry8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhm4ygxndzndkjuh0ry8z.png" alt="Kid with balloons" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8einih3ec5p0a3dbhnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8einih3ec5p0a3dbhnd.png" alt="ISS orbiting earth" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started in 2 Minutes
&lt;/h2&gt;

&lt;p&gt;The best part? Everything runs &lt;strong&gt;locally on your machine&lt;/strong&gt;. No accounts to create, no API keys to manage, and no data privacy concerns. Just follow these simple steps:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/SikamikanikoBG/ImageGenerator
&lt;span class="nb"&gt;cd &lt;/span&gt;ImageGenerator

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch torchvision torchaudio &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cu118
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Start the server&lt;/span&gt;
python server.py

&lt;span class="c"&gt;# In a new terminal, start the client&lt;/span&gt;
python client.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! Visit &lt;a href="http://localhost:7860" rel="noopener noreferrer"&gt;http://localhost:7860&lt;/a&gt; in your browser, and you're ready to start generating images.&lt;/p&gt;
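
&lt;p&gt;If the page does not load, a quick way to tell whether anything is listening on that port is a small standard-library check. This is a minimal sketch and not part of ImageGenerator: the helper name &lt;code&gt;is_port_open&lt;/code&gt; is hypothetical, and it assumes the interface is served on localhost port 7860 as stated above.&lt;/p&gt;

```python
import socket


def is_port_open(host="localhost", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; the default port is the one
    mentioned in the article and may differ in your setup.
    """
    try:
        # create_connection raises OSError when nothing is listening
        # or when the timeout expires before a connection is made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on port 7860; try the browser again.")
    else:
        print("Nothing is listening on port 7860 yet; check that both scripts are still running.")
```

&lt;p&gt;A successful connection only shows that some process has the port open, so it is a first check rather than proof that the interface itself is healthy.&lt;/p&gt;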

&lt;h2&gt;
  
  
  Key Features That Make Life Easier
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🏠 True Local Processing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;All processing happens on your machine&lt;/li&gt;
&lt;li&gt;No internet connection needed after setup&lt;/li&gt;
&lt;li&gt;Your images and prompts stay private&lt;/li&gt;
&lt;li&gt;Generate unlimited images without restrictions&lt;/li&gt;
&lt;li&gt;Full control over your data and models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🎨 Intuitive Web Interface
&lt;/h3&gt;

&lt;p&gt;No more juggling command-line parameters. Everything you need is organized in clear, easy-to-understand tabs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connection settings&lt;/li&gt;
&lt;li&gt;Project management&lt;/li&gt;
&lt;li&gt;Generation parameters&lt;/li&gt;
&lt;li&gt;Output gallery&lt;/li&gt;
&lt;li&gt;Video conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🤖 Smart Model Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic scanning and loading of models&lt;/li&gt;
&lt;li&gt;Support for both local and Hugging Face models&lt;/li&gt;
&lt;li&gt;Easy model comparison to find what works best for you&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🎥 One-Click Video Creation
&lt;/h3&gt;

&lt;p&gt;Transform your still images into videos with preset animations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subtle movements&lt;/li&gt;
&lt;li&gt;Normal flow&lt;/li&gt;
&lt;li&gt;Slow motion&lt;/li&gt;
&lt;li&gt;Ultra-slow effects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;p&gt;Students and developers are already using ImageGenerator for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating custom artwork for projects&lt;/li&gt;
&lt;li&gt;Generating placeholder images for websites&lt;/li&gt;
&lt;li&gt;Experimenting with AI art styles&lt;/li&gt;
&lt;li&gt;Building portfolios of AI-generated content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Join Our Community!
&lt;/h2&gt;

&lt;p&gt;If you find ImageGenerator useful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;⭐ Star the &lt;a href="https://github.com/SikamikanikoBG/ImageGenerator" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤝 Join our &lt;a href="https://discord.gg/vYEjQvtcqU" rel="noopener noreferrer"&gt;Discord server&lt;/a&gt; for:

&lt;ul&gt;
&lt;li&gt;Tips and tricks&lt;/li&gt;
&lt;li&gt;A place to showcase your creations&lt;/li&gt;
&lt;li&gt;Help when you need it&lt;/li&gt;
&lt;li&gt;Connections with other creators&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Behind the Scenes: How I Built It
&lt;/h2&gt;

&lt;p&gt;I carefully chose each technology to create the most robust and user-friendly experience possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+ for its stability and extensive AI libraries&lt;/li&gt;
&lt;li&gt;FastAPI for a lightning-fast, modern backend that can handle heavy processing&lt;/li&gt;
&lt;li&gt;Gradio for building an intuitive interface that anyone can use&lt;/li&gt;
&lt;li&gt;CUDA support for GPU acceleration, which I optimized for both performance and memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire system took several iterations to get right. I spent considerable time optimizing the model loading process, fine-tuning the memory management, and creating a seamless experience between the backend and frontend.&lt;/p&gt;

&lt;p&gt;I'm releasing all of this under the MIT license because I believe in open source and want others to build upon what I've created. Feel free to use it in your own projects - that's exactly why I built it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Ready to start creating? Head over to the &lt;a href="https://github.com/SikamikanikoBG/ImageGenerator" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; and follow the quick start guide. Don't forget to star the repo if you find it useful!&lt;/p&gt;

&lt;p&gt;Have questions or want to connect with other users? Join our &lt;a href="https://discord.gg/vYEjQvtcqU" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; - we'd love to see what you create!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Looking for feedback on the next article about home lab setups - AI, automation, or ML?</title>
      <dc:creator>Arsen Apostolov</dc:creator>
      <pubDate>Fri, 03 Jan 2025 07:27:31 +0000</pubDate>
      <link>https://dev.to/sikamikanikobg/looking-for-feedback-on-the-next-article-about-home-lab-setups-ai-automation-or-ml-1kkd</link>
      <guid>https://dev.to/sikamikanikobg/looking-for-feedback-on-the-next-article-about-home-lab-setups-ai-automation-or-ml-1kkd</guid>
      <description>&lt;p&gt;Last year, I wrote this comprehensive guide about transforming a regular home PC into a powerful learning environment using open-source tools like Linux, Anaconda, Apache Airflow, and more. The article continues to help newcomers and intermediate users get more value from their hardware.&lt;/p&gt;

&lt;p&gt;Check it out here: &lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://medium.com/@arsen.apostolov/use-your-pcs-potential-expand-your-knowledge-6788e63bf341" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;medium.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;I'm planning to write Part 2, and I'd love your input on which topic would be most valuable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Home GenAI Lab Setup&lt;/strong&gt;: Running LLMs locally, experimenting with different models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Home Automation System&lt;/strong&gt;: Smart home integration, custom automation solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Home ML Lab&lt;/strong&gt;: Setting up for machine learning experiments and model training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What would you like to learn about? Share your preference in the comments!&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect! 🤝
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;💼 Connect with me on &lt;a href="https://www.linkedin.com/in/arsenapostolov" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎮 Join our Random42 community on &lt;a href="https://discord.gg/vYEjQvtcqU" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; for AI news, success stories, use cases, and support for your project!&lt;/li&gt;
&lt;li&gt;📝 Follow my tech journey on &lt;a href="https://dev.to/sikamikanikobg"&gt;Dev.to&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>homelab</category>
      <category>automation</category>
      <category>learning</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
