<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vasyl</title>
    <description>The latest articles on DEV Community by Vasyl (@mrviduus).</description>
    <link>https://dev.to/mrviduus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F333461%2Fea3cc6b2-e942-4848-8606-30c345279779.jpg</url>
      <title>DEV Community: Vasyl</title>
      <link>https://dev.to/mrviduus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrviduus"/>
    <language>en</language>
    <item>
      <title>I put Ollama on a 4 GB mobile GPU and got 2.5×: here's the VRAM math</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Wed, 13 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/mrviduus/i-put-ollama-on-a-4-gb-mobile-gpu-and-got-25-heres-the-vram-math-3mhk</link>
      <guid>https://dev.to/mrviduus/i-put-ollama-on-a-4-gb-mobile-gpu-and-got-25-heres-the-vram-math-3mhk</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📎 Companion piece to my earlier post: &lt;a href="https://dev.to/mrviduus/i-shipped-local-llm-features-two-months-ago-production-never-ran-them-once-41g7"&gt;I shipped local LLM features two months ago — production never ran them once&lt;/a&gt;. Same &lt;code&gt;gemma4:e2b&lt;/code&gt;, same box — this one is the &lt;strong&gt;GPU offload follow-up&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔬 TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;2.5× faster, 10°C cooler — on a 4 GB laptop GPU that "shouldn't" fit the model.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CPU only&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GPU hybrid&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens / sec&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;39&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-call latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~5.5 s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2.0 s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak CPU temp under burst&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~10 °C lower&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layers on GPU&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;35 / 36&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same prompt. Same model. Same hardware. The only thing that changed was whether Ollama was allowed to touch the card.&lt;/p&gt;

&lt;p&gt;Honest take: I was hoping for more. The math at the end of this post explains exactly why &lt;strong&gt;2.5× is the ceiling&lt;/strong&gt; on 4 GB of VRAM with Gemma 4, and what it would take to push higher.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Setup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemma4:e2b&lt;/code&gt; (2 B effective params, ~7.2 GB on disk)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AMD Ryzen 5 4600H, 6 cores / 12 threads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NVIDIA GTX 1650 Ti Mobile, &lt;strong&gt;4 GB VRAM&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS / runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ubuntu + Docker, Ollama 0.23.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distractor + hint + explanation generator from my reader app — fixed across runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output budget&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~60 tokens per call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;num_gpu=0&lt;/code&gt; → CPU only · &lt;code&gt;num_gpu=999&lt;/code&gt; → let Ollama auto-split&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Warm-up&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One throwaway call per mode before the timed samples&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both modes ran &lt;strong&gt;after warm-up&lt;/strong&gt;, so the numbers reflect steady-state inference, not first-load cost. Each &lt;code&gt;/api/generate&lt;/code&gt; response came back as NDJSON, so I pulled &lt;code&gt;eval_count&lt;/code&gt;, &lt;code&gt;eval_duration&lt;/code&gt;, and &lt;code&gt;total_duration&lt;/code&gt; straight from the engine — no external timing noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Why I picked E2B
&lt;/h2&gt;

&lt;p&gt;Gemma 4 ships in three flavours — the small E2B/E4B family, a 31B Dense model, and a 26B MoE. The model that runs in this benchmark is the smallest of those, and that wasn't accidental.&lt;/p&gt;

&lt;p&gt;The work is a fire-and-forget enrichment step inside a vocabulary-save flow — distractors plus a hint plus a short explanation, all generated in one call. It has to feel synchronous on a save action, and it has to run on the same commodity laptop as the rest of the app. Anything bigger is the wrong tool.&lt;/p&gt;

&lt;p&gt;The 31B Dense doesn't fit. The 26B MoE would, but its VRAM patterns on a 4 GB card are punishing. E4B is the obvious step up in quality from E2B, but its size pushes total memory over the line where Ollama has to keep more on CPU — slower for the same job at the latency profile a save action needs. E2B at Q4 lands the quality where I need it for distractor generation while leaving headroom for the KV cache and everything else.&lt;/p&gt;

&lt;p&gt;The framing that matters here isn't "the biggest model I could fit" but "the smallest model that gave me the output I needed." On constrained hardware, that distinction is the whole game — and it's what made the GPU experiment below worth running at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  📊 Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;CPU only&lt;/th&gt;
&lt;th&gt;GPU hybrid (35/36 layers on GPU)&lt;/th&gt;
&lt;th&gt;Δ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg output tokens / call&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;~same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Avg eval latency&lt;/strong&gt; (token gen only)&lt;/td&gt;
&lt;td&gt;3,506 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,411 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.48× faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Avg total latency&lt;/strong&gt; (prompt + gen)&lt;/td&gt;
&lt;td&gt;5,390 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2,174 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.48× faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tokens / sec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;39&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.29× faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
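
&lt;p&gt;The Δ column can be rechecked from the averages in the table. Here is a minimal Python sketch, assuming tokens/sec is simply average output tokens divided by average eval latency. The variable and function names are mine, not Ollama's.&lt;/p&gt;

```python
# Averages copied from the results table above.
cpu = {"tokens": 60, "eval_ms": 3506, "total_ms": 5390}
gpu = {"tokens": 55, "eval_ms": 1411, "total_ms": 2174}


def tokens_per_sec(run):
    """Generation throughput: output tokens divided by eval time in seconds."""
    return run["tokens"] / (run["eval_ms"] / 1000)


def speedup(slow, fast):
    """How many times smaller the second latency is than the first."""
    return slow / fast


cpu_tps = tokens_per_sec(cpu)
gpu_tps = tokens_per_sec(gpu)

print(f"CPU tokens/sec:   {cpu_tps:.1f}")
print(f"GPU tokens/sec:   {gpu_tps:.1f}")
print(f"eval speedup:     {speedup(cpu['eval_ms'], gpu['eval_ms']):.2f}x")
print(f"total speedup:    {speedup(cpu['total_ms'], gpu['total_ms']):.2f}x")
print(f"throughput ratio: {gpu_tps / cpu_tps:.2f}x")
```

&lt;p&gt;The throughput ratio comes out lower than the latency ratio because the GPU runs produced slightly fewer tokens per call (55 vs 60), so a small part of the latency win is just shorter output.&lt;/p&gt;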

&lt;p&gt;&lt;code&gt;ollama ps&lt;/code&gt; during the GPU run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME          SIZE      PROCESSOR        CONTEXT   UNTIL
gemma4:e2b    7.8 GB    74%/26% CPU/GPU  4096      Forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nvidia-smi&lt;/code&gt; during a generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NVIDIA GTX 1650 Ti, used 1998 MiB, free 1909 MiB, util 32 %
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;&lt;code&gt;ollama ps&lt;/code&gt; lies to you.&lt;/strong&gt;&lt;br&gt;
That "74%/26% CPU/GPU" string is a memory split, &lt;strong&gt;not a layer split&lt;/strong&gt;. The Ollama server logs are the only place that tells you which layers actually moved. Mine showed &lt;code&gt;offloaded 35/36 layers to GPU&lt;/code&gt;. Almost the whole transformer — minus one layer that matters a lot. More on that in a second.&lt;/p&gt;
&lt;/blockquote&gt;
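
&lt;p&gt;A quick cross-check supports reading the percentage as bytes rather than layers. The sketch below assumes the &lt;code&gt;SIZE&lt;/code&gt; column is in decimal gigabytes and that nothing else on the machine was holding significant VRAM. Both are my assumptions, not something the tools state.&lt;/p&gt;

```python
# Values copied from the `ollama ps` and `nvidia-smi` output above.
model_size_gb = 7.8      # SIZE column (ASSUMPTION: decimal GB)
gpu_share = 0.26         # GPU part of "74%/26% CPU/GPU"
vram_used_mib = 1998     # nvidia-smi "used"

MIB_IN_BYTES = 1024 * 1024
GB_IN_BYTES = 1000 ** 3

# Memory placed on the GPU if the percentage is a memory split.
expected_gpu_gb = model_size_gb * gpu_share

# What the driver reports as in use, converted to the same unit.
observed_gpu_gb = vram_used_mib * MIB_IN_BYTES / GB_IN_BYTES

# What the GPU share would look like if the string counted layers instead.
layer_share = 35 / 36

print(f"expected on GPU: {expected_gpu_gb:.2f} GB")
print(f"observed on GPU: {observed_gpu_gb:.2f} GB")
print(f"layer share:     {layer_share:.0%}")
```

&lt;p&gt;The expected and observed figures land within about 0.1 GB of each other, while a layer-based reading would put the GPU share near 97%, nowhere close to 26%.&lt;/p&gt;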




&lt;h2&gt;
  
  
  🧠 Why 2.5× and not 10×
&lt;/h2&gt;

&lt;p&gt;The load log counts 36 layers for this model. Ollama put &lt;strong&gt;35 of them on the GPU&lt;/strong&gt;. The lone holdout is the &lt;strong&gt;output projection layer&lt;/strong&gt;, the one that maps the final hidden state back into Gemma's vocabulary.&lt;/p&gt;

&lt;p&gt;Gemma 4's vocab is enormous (~256k tokens). That output layer is dense, fat, and would happily swallow what's left of the 4 GB after the rest of the stack moves over. So Ollama leaves it on CPU.&lt;/p&gt;

&lt;p&gt;The consequence is brutal in the steady state:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Every single generated token has to round-trip through the CPU at the end.&lt;/strong&gt; GPU is fast for the 35 layers it owns, then the pipeline stalls on the one layer the GPU couldn't take. Average across thousands of tokens and the CPU side becomes the floor.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the whole story of 2.5× instead of 10×. Hybrid inference is gated by the slower of the two devices, and on this card the slower device is doing real work on every token.&lt;/p&gt;
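
&lt;p&gt;One way to put a number on that floor is Amdahl's law. This is a back-of-envelope sketch, not a measurement: the 10× factor for the GPU-resident work is an assumption borrowed from the headline figure above, and the function names are hypothetical.&lt;/p&gt;

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only part of the per-token work gets faster."""
    return 1 / ((1 - accelerated_fraction) + accelerated_fraction / factor)


def accelerated_fraction_for(observed_speedup, factor):
    """Invert Amdahl's law: what share of the work must have been accelerated?"""
    return (1 - 1 / observed_speedup) / (1 - 1 / factor)


observed = 2.48          # measured eval-latency speedup from the results table
assumed_gpu_factor = 10  # ASSUMPTION: GPU-resident work runs 10x faster

share = accelerated_fraction_for(observed, assumed_gpu_factor)
ceiling = 1 / (1 - share)  # limit as the GPU factor grows without bound

print(f"share of per-token time accelerated: {share:.0%}")
print(f"speedup ceiling with that share:     {ceiling:.2f}x")
```

&lt;p&gt;Under that assumption, roughly a third of the per-token time is still spent off the GPU, and no amount of extra GPU speed would move the result much past 3×. The exact split depends on the assumed factor, so treat it as an illustration of the shape of the problem.&lt;/p&gt;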

&lt;p&gt;The takeaway worth bolding: &lt;strong&gt;if you only ever look at &lt;code&gt;ollama ps&lt;/code&gt;, you'll get the wrong picture of what your setup is doing.&lt;/strong&gt; The server load logs are the source of truth for which layers went where.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 What 2.5× actually buys you
&lt;/h2&gt;

&lt;p&gt;In the app, a single save — distractors + hint + short explanation, ~60 output tokens — used to take &lt;strong&gt;5.5 s&lt;/strong&gt;. Now it's &lt;strong&gt;just over 2 s&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That moves the action from the &lt;em&gt;"is this hanging?"&lt;/em&gt; zone into the &lt;em&gt;"yeah, it's working"&lt;/em&gt; zone. That's the threshold that actually matters for a save action.&lt;/p&gt;

&lt;p&gt;Five saves in a row:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before:&lt;/strong&gt; ~30 seconds of full-tilt CPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After:&lt;/strong&gt; ~10 seconds, work split between CPU and GPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bonus:&lt;/strong&gt; peak CPU temperature during that burst dropped &lt;strong&gt;~10 °C&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a thin laptop in a small room, that last number is the difference between a fan you hear and a fan you don't.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 What would push it higher
&lt;/h2&gt;

&lt;p&gt;Three options, in order of how willing I am to do them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Smaller quant on just the output layer.&lt;/strong&gt; If that layer fit in the remaining ~1.9 GB, the whole model would run on GPU and you'd see the 10× numbers other writeups quote. The cost is real quality loss on the output distribution — worth measuring on your own prompt set rather than assuming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A bigger GPU.&lt;/strong&gt; A 16 GB card holds the whole thing with room to spare. The point of this exercise was specifically &lt;em&gt;"what does a commodity laptop GPU do"&lt;/em&gt;, so a $500 desktop card isn't really in scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swap engines.&lt;/strong&gt; llama.cpp directly, vLLM, and so on. This is the one I'm least inclined to do: two seconds is already inside budget for the action this model powers, and optimising past "fast enough" is how you end up with three benchmarks and zero users.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🛠️ Reproducing this
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Pull the model&lt;/span&gt;
ollama pull gemma4:e2b

&lt;span class="c"&gt;# 2. Force CPU only&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "gemma4:e2b",
  "prompt": "Give me 5 distractors for the word \"warehouse\".",
  "stream": false,
  "options": { "num_gpu": 0 }
}'&lt;/span&gt; | jq &lt;span class="s1"&gt;'{tokens: .eval_count, eval_ms: (.eval_duration/1e6), total_ms: (.total_duration/1e6)}'&lt;/span&gt;

&lt;span class="c"&gt;# 3. Let Ollama use the GPU&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "gemma4:e2b",
  "prompt": "Give me 5 distractors for the word \"warehouse\".",
  "stream": false,
  "options": { "num_gpu": 999 }
}'&lt;/span&gt; | jq &lt;span class="s1"&gt;'{tokens: .eval_count, eval_ms: (.eval_duration/1e6), total_ms: (.total_duration/1e6)}'&lt;/span&gt;

&lt;span class="c"&gt;# 4. Check what actually landed where&lt;/span&gt;
docker logs ollama 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"offloaded|layers"&lt;/span&gt;
nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;name,memory.used,memory.free,utilization.gpu &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run each curl a handful of times to flush warm-up effects, then average &lt;code&gt;eval_ms&lt;/code&gt; and &lt;code&gt;total_ms&lt;/code&gt;. The interesting number is the &lt;strong&gt;ratio&lt;/strong&gt;, not the absolute timings — they'll vary with your CPU.&lt;/p&gt;
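
&lt;p&gt;If you would rather script the repetition than rerun curl by hand, something like the following Python sketch would do it. It assumes the same local endpoint and model tag as the commands above. The function names, the run count, and the single warm-up call are my choices, and it reads only the &lt;code&gt;eval_count&lt;/code&gt;, &lt;code&gt;eval_duration&lt;/code&gt;, and &lt;code&gt;total_duration&lt;/code&gt; fields already used in the jq filter.&lt;/p&gt;

```python
import json
import statistics
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl calls
PROMPT = 'Give me 5 distractors for the word "warehouse".'


def generate(num_gpu, model="gemma4:e2b"):
    """Run one non-streaming generation and return Ollama's own timing fields."""
    body = json.dumps({
        "model": model,
        "prompt": PROMPT,
        "stream": False,
        "options": {"num_gpu": num_gpu},
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Durations are reported in nanoseconds; convert to milliseconds.
    return {
        "tokens": reply["eval_count"],
        "eval_ms": reply["eval_duration"] / 1e6,
        "total_ms": reply["total_duration"] / 1e6,
    }


def summarize(samples):
    """Average each metric across the timed samples."""
    return {key: statistics.mean(s[key] for s in samples) for key in samples[0]}


def benchmark(num_gpu, runs=5):
    """One throwaway warm-up call, then `runs` timed calls."""
    generate(num_gpu)
    return summarize([generate(num_gpu) for _ in range(runs)])
```

&lt;p&gt;With the server running, &lt;code&gt;benchmark(num_gpu=0)&lt;/code&gt; and &lt;code&gt;benchmark(num_gpu=999)&lt;/code&gt; return the two sets of averages, and dividing the CPU &lt;code&gt;eval_ms&lt;/code&gt; by the GPU &lt;code&gt;eval_ms&lt;/code&gt; gives the ratio.&lt;/p&gt;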




&lt;h2&gt;
  
  
  ✅ Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4 GB VRAM is enough to be useful&lt;/strong&gt;, even on a model that "should" need more. Just don't expect 10×.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid inference is gated by the slower device.&lt;/strong&gt; If one critical layer stays on CPU, that's your floor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust the load logs, not &lt;code&gt;ollama ps&lt;/code&gt;.&lt;/strong&gt; The pretty CPU/GPU percentage is a memory split, not a layer count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2.5× is the difference between a UX that feels broken and one that doesn't.&lt;/strong&gt; That's enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop optimising once you're inside budget.&lt;/strong&gt; "Fast enough" beats "fastest" every time.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;📖 Full write-up with all the load-log spelunking on my blog: &lt;a href="https://vasyl.blog/2026/05/12/i-put-ollama-on-a-4-gb-mobile-gpu-and-got-2-5x-heres-the-vram-math/" rel="noopener noreferrer"&gt;vasyl.blog — I put Ollama on a 4 GB mobile GPU and got 2.5×&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐ The reader app this powers is open-source (AGPL-3.0): &lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;github.com/mrviduus/textstack&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;code&gt;gemma4:e2b&lt;/code&gt; for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge&lt;/a&gt;. If you're entering too, drop a link in the comments — happy to read yours.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ollama</category>
    </item>
    <item>
      <title>I shipped local LLM features two months ago. Production never ran them once.</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Tue, 12 May 2026 11:23:14 +0000</pubDate>
      <link>https://dev.to/mrviduus/i-shipped-local-llm-features-two-months-ago-production-never-ran-them-once-41g7</link>
      <guid>https://dev.to/mrviduus/i-shipped-local-llm-features-two-months-ago-production-never-ran-them-once-41g7</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Two months ago I shipped local-LLM features in &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;TextStack&lt;/a&gt; — an open-source reader for developers who want to finish dense English technical books in their native language. Yesterday I noticed something strange about the production server's RAM. 3 GB used out of 30. The model that runs all those features should be ~13 GB resident.&lt;/p&gt;

&lt;p&gt;I SSH'd in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama list
NAME    ID    SIZE    MODIFIED
&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing. The Ollama container had been running for 60+ days without a single model pulled. Every distractor call had fired, hit the fallback path, and returned random vocabulary words. I never noticed because the failure mode is silent — the user sees distractors, just not LLM-generated ones.&lt;/p&gt;

&lt;p&gt;This is the post-mortem of that, plus the &lt;strong&gt;two model swaps&lt;/strong&gt; that finally got the features working: &lt;code&gt;qwen3:8b → gemma4:e4b&lt;/code&gt; on day one to bring local inference up at all, then &lt;code&gt;e4b → e2b&lt;/code&gt; once production load showed e4b couldn't keep up on CPU. &lt;strong&gt;Six production bugs surfaced along the way.&lt;/strong&gt; The article ends with a real 63,000-request load test on the e2b deploy: 100% success, p95 = 20.5 ms, total OpenAI cost = $0.002.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;&lt;strong&gt;TextStack&lt;/strong&gt;&lt;/a&gt; is an open-source (&lt;a href="https://github.com/mrviduus/textstack/blob/main/LICENSE" rel="noopener noreferrer"&gt;AGPL-3.0&lt;/a&gt;) reader for developers who keep abandoning English technical books like &lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;. Tap any term → context-aware translation that knows the book's domain ("attention" in an ML chapter gets &lt;em&gt;увага (механізм у нейромережах)&lt;/em&gt;, not the everyday meaning). Words you save feed a capped weekly SRS queue.&lt;/p&gt;

&lt;p&gt;Local &lt;strong&gt;Gemma 4 e2b&lt;/strong&gt; generates the multiple-choice distractors, hints, native-language explanations, and book metadata enrichment — four jobs that previously needed paid OpenAI calls per user. OpenAI &lt;code&gt;gpt-5-mini&lt;/code&gt; stays for translation (multilingual quality matters) and for in-reader live explanations (latency-sensitive). Everything else runs on a single-CPU 30 GB-RAM VPS, no GPU.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🌐 &lt;strong&gt;Live:&lt;/strong&gt; &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt; — sample chapters open without signup. Tap any word in &lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;, then check the vocabulary review.&lt;/p&gt;

&lt;p&gt;🎬 &lt;strong&gt;37-second walkthrough — read → save word → MCQ with Gemma-generated distractors → answer feedback:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf51hhn85wyb4ge1io7q.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf51hhn85wyb4ge1io7q.gif" alt="TextStack vocabulary review demo: tap-translations in DDIA, save word to vocabulary, MCQ card with 4 Gemma-generated distractors, red/green answer feedback" width="720" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📸 &lt;strong&gt;Single MCQ card — "___ the data from these external systems..." with 4 Gemma-generated distractors (battle / bringing / storm / courage):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw82vsnspx9gt45n9dz05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw82vsnspx9gt45n9dz05.png" alt="Vocabulary multiple-choice card with cloze sentence from DDIA and 4 Gemma-generated distractor options" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note for judges:&lt;/strong&gt; Sample chapters are unauthenticated; the vocabulary review needs a free account because progress and SRS state are per-user. Use any throwaway email — there's no email verification gate on read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;📦 &lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;github.com/mrviduus/textstack&lt;/a&gt; — AGPL-3.0, 200+ merged PRs, deployed at &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐ &lt;strong&gt;&lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;Star the repo on GitHub&lt;/a&gt;&lt;/strong&gt; — every star tells me one more developer wants to finish DDIA without giving up&lt;/p&gt;

&lt;p&gt;📐 &lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend: ASP.NET Core 10 (clean architecture: Domain / Application / Infrastructure / Api / Worker)&lt;/li&gt;
&lt;li&gt;Database: PostgreSQL 16 with FTS for in-book search&lt;/li&gt;
&lt;li&gt;Frontend: React 19 + Vite, React Native 0.83 (Expo) for mobile&lt;/li&gt;
&lt;li&gt;LLM: Ollama running &lt;code&gt;gemma4:e2b&lt;/code&gt; for local jobs, OpenAI &lt;code&gt;gpt-5-mini&lt;/code&gt; for translation&lt;/li&gt;
&lt;li&gt;Deployment: docker-compose, Cloudflare Tunnel, single VPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔧 &lt;strong&gt;Key commits behind the story:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/pull/232" rel="noopener noreferrer"&gt;PR #232&lt;/a&gt; — original swap &lt;code&gt;qwen3:8b&lt;/code&gt; → &lt;code&gt;gemma4:e4b&lt;/code&gt;, image pin, memory bump&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/commit/3999944" rel="noopener noreferrer"&gt;&lt;code&gt;3999944&lt;/code&gt;&lt;/a&gt; — worker &lt;code&gt;Connection refused&lt;/code&gt; fix + the real timeout bump (30s → 90s after measurement)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/commit/966b398" rel="noopener noreferrer"&gt;&lt;code&gt;966b398&lt;/code&gt;&lt;/a&gt; — the second model swap, &lt;code&gt;e4b&lt;/code&gt; → &lt;code&gt;e2b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/commit/c6db540" rel="noopener noreferrer"&gt;&lt;code&gt;c6db540&lt;/code&gt;&lt;/a&gt; — 63,000-request load test + full LoadSurge report&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full PR/commit history for the swap arc lives in &lt;a href="https://github.com/mrviduus/textstack/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;&lt;code&gt;CHANGELOG.md&lt;/code&gt; under &lt;code&gt;[Unreleased]&lt;/code&gt;&lt;/a&gt;. The Gemma-using code lives in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/blob/main/backend/src/Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs" rel="noopener noreferrer"&gt;&lt;code&gt;backend/src/Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs&lt;/code&gt;&lt;/a&gt; — prompt template, parser, fallback cascade&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mrviduus/textstack/blob/main/backend/src/Worker/Services/BookMetadataGenerator.cs" rel="noopener noreferrer"&gt;&lt;code&gt;backend/src/Worker/Services/BookMetadataGenerator.cs&lt;/code&gt;&lt;/a&gt; — fire-and-forget metadata enrichment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The model selection went through two rounds.&lt;/strong&gt; Gemma 4 ships in four sizes. The first time I built a trade-off table, I picked the wrong one — for understandable reasons. The second time I had production data and picked correctly. Both decisions live in the same article.&lt;/p&gt;

&lt;p&gt;Here's the matrix at the time of the first pick (E4B, day-one swap):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Disk&lt;/th&gt;
&lt;th&gt;RAM resident&lt;/th&gt;
&lt;th&gt;Fits on my VPS?&lt;/th&gt;
&lt;th&gt;First-pick reasoning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;E2B&lt;/strong&gt; (2B effective)&lt;/td&gt;
&lt;td&gt;7.2 GB&lt;/td&gt;
&lt;td&gt;~5 GiB&lt;/td&gt;
&lt;td&gt;✅ trivially&lt;/td&gt;
&lt;td&gt;"Too small for nuanced technical-vocab distractors" — &lt;em&gt;I'd find out this was wrong&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;E4B&lt;/strong&gt; (4B effective)&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;13 GiB&lt;/td&gt;
&lt;td&gt;✅ with cgroup bump 4G → 12G&lt;/td&gt;
&lt;td&gt;"Sweet spot — strong enough on quality, fits the VPS" — &lt;em&gt;picked first&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;31B Dense&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~18 GB&lt;/td&gt;
&lt;td&gt;~24 GiB&lt;/td&gt;
&lt;td&gt;⚠️ tight, no headroom for Postgres + .NET&lt;/td&gt;
&lt;td&gt;"Overkill, no room for the rest of the stack"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;26B MoE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~15 GB&lt;/td&gt;
&lt;td&gt;~20 GiB&lt;/td&gt;
&lt;td&gt;⚠️ same constraint&lt;/td&gt;
&lt;td&gt;"MoE doesn't help short prompts here"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 31B and 26B MoE models would need either a GPU box or a much bigger VPS, neither of which fits an open-source project that has to remain deployable on a $20/month consumer host. So the real choice was between E2B and E4B. I went with E4B. I was wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Gemma 4 unlocked vs the cloud alternative.&lt;/strong&gt; Pre-swap, every distractor generation was a ~5¢ OpenAI call per word saved per user. With ~50 saved words per active reader per book, that's $2.50/book/user — fine for me running the only instance, fatal the moment someone else self-hosts it. Local Gemma 4 makes the marginal cost per distractor ~0 (just CPU on a box already running). Same for hints, explanations, and book metadata enrichment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local inference changed the economics of the feature completely.&lt;/strong&gt; That's the real reason the swap mattered — not the model quality, the cost shape.&lt;/p&gt;
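
&lt;p&gt;The $2.50 figure is just calls times price, and it scales linearly with readers. Here is a small Python sketch of that arithmetic. The per-call price and words-per-book come from the paragraph above, while the reader counts are hypothetical instance sizes I picked for illustration.&lt;/p&gt;

```python
# Figures from the text above, kept in whole cents to avoid float rounding.
cost_per_distractor_call_cents = 5   # ~5 cents per OpenAI call
saved_words_per_book = 50            # ~50 saved words per active reader per book


def cloud_cost_cents(readers, books_each=1):
    """Distractor spend if every saved word triggers one paid call."""
    calls = readers * books_each * saved_words_per_book
    return calls * cost_per_distractor_call_cents


for readers in (1, 100, 1000):  # HYPOTHETICAL instance sizes
    dollars = cloud_cost_cents(readers) / 100
    print(f"{readers:5d} readers, one book each: ${dollars:,.2f}")
```

&lt;p&gt;The local path replaces that per-call charge with CPU time on a box that is already paid for, which is why the cost curve flattens instead of growing with every saved word.&lt;/p&gt;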

&lt;h2&gt;
  
  
  What surfaced when I actually flipped it on
&lt;/h2&gt;

&lt;p&gt;The bug story isn't decoration — it's how I learned what each Gemma 4 quirk does in production. &lt;strong&gt;Six lessons.&lt;/strong&gt; The first four came from getting e4b to run at all. The last two came from staring at the production stats after it was "running".&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 1: floating image tags lie
&lt;/h3&gt;

&lt;p&gt;Original &lt;code&gt;docker-compose.yml&lt;/code&gt; had:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama&lt;/span&gt;   &lt;span class="c1"&gt;# no version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker pulled &lt;code&gt;latest&lt;/code&gt; two months ago and cached it. At that moment &lt;code&gt;latest&lt;/code&gt; was 0.22.x, which predates Gemma 4, so the cached binary doesn't recognize the model family. From the host's perspective, the "local Ollama" IS the latest version: &lt;code&gt;docker image ls&lt;/code&gt; shows the cached SHA, not whether upstream has moved. Pinning an explicit version forces a fresh pull:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- image: ollama/ollama
&lt;/span&gt;&lt;span class="gi"&gt;+ image: ollama/ollama:0.23.1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull succeeded after pinning. 9.6 GB on disk for e4b.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 2: cgroup limits were a guess from the qwen3 era
&lt;/h3&gt;

&lt;p&gt;The container memory cap (4 GB) had been sized for &lt;code&gt;qwen3:8b&lt;/code&gt; and never re-evaluated. Gemma 4 e4b weights need 9.8 GiB. Inference returned &lt;code&gt;model requires more system memory (9.8 GiB) than is available&lt;/code&gt; until I bumped the limit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;  deploy:
    resources:
      limits:
&lt;span class="gd"&gt;-       memory: 4G
&lt;/span&gt;&lt;span class="gi"&gt;+       memory: 12G
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lesson: every model swap should also re-evaluate the container resource block. Picked-once-and-forgotten limits are a category of silent drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 3: cold load and warm latency both blew past my API timeout
&lt;/h3&gt;

&lt;p&gt;The first inference call hung for ~60 s before the first token. Ollama's default &lt;code&gt;keep_alive&lt;/code&gt; is 5 minutes; after that the model unloads and the next cold call burns 60 s again. Fix: &lt;code&gt;OLLAMA_KEEP_ALIVE=-1&lt;/code&gt;, plus an API timeout bump from 10 s to 30 s.&lt;/p&gt;

&lt;p&gt;I shipped it. Then watched production: &lt;strong&gt;2 distractor generations out of 13 saved words succeeded.&lt;/strong&gt; The model was resident the entire time. Every miss was a wall-clock timeout. E4B on CPU just takes more than 30 seconds for many prompts.&lt;/p&gt;

&lt;p&gt;So 30s wasn't enough either:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- "TimeoutSeconds": 30
&lt;/span&gt;&lt;span class="gi"&gt;+ "TimeoutSeconds": 90
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Success rate climbed to ~100%. &lt;strong&gt;For CPU-only Gemma 4 on a 6-core consumer VPS, your timeout has to absorb 60–90 s tail latency, not 10 s.&lt;/strong&gt; That gap between toy-benchmark numbers and production reality is where most local-LLM ship-and-forget bugs live.&lt;/p&gt;
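
&lt;p&gt;One way to avoid guessing the timeout a third time is to derive it from measured latencies. The Python sketch below uses a nearest-rank percentile plus headroom. The helper name, the 99th percentile, the 20% headroom, and the sample latencies are all hypothetical and are not numbers from my production logs.&lt;/p&gt;

```python
import math


def timeout_from_samples(latencies_s, percentile=0.99, headroom=1.2):
    """Pick a timeout: a high percentile of observed latency plus headroom."""
    ordered = sorted(latencies_s)
    # Nearest-rank percentile: smallest value covering `percentile` of samples.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return math.ceil(ordered[rank - 1] * headroom)


# HYPOTHETICAL wall-clock latencies in seconds, not measurements from the post.
observed = [22, 31, 35, 41, 44, 48, 52, 58, 63, 71]

print(timeout_from_samples(observed))
```

&lt;p&gt;Feeding it real per-request durations from the worker logs after each model swap keeps the timeout tied to what the hardware actually does instead of what a toy benchmark suggested.&lt;/p&gt;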

&lt;h3&gt;
  
  
  Lesson 4: the parser silently dropped half my output
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;DistractorGenerator&lt;/code&gt;'s prompt asks for 5 wrong-answer words. Smoke test for &lt;code&gt;linearizability&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;consistency, atomicity, serialization, concurrency, visibility
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five single-word distractors. Clean. Then I tried &lt;code&gt;eventual consistency&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strong consistency, read-after-write, data loss, causality, serialization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now look at the parser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsLetter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;originalWord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StringComparison&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OrdinalIgnoreCase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;      &lt;span class="c1"&gt;// ← drops "strong consistency", "data loss"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filter rejects multi-word entries. Three of the five gone. With the &lt;code&gt;distractors.Count &amp;gt;= 3&lt;/code&gt; requirement, the call returned &lt;code&gt;null&lt;/code&gt; and the fire-and-forget path fell back to the hardcoded random-word picker.&lt;/p&gt;
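&lt;p&gt;To see the failure mode in isolation, here is a rough Python port of the filter above. It is a sketch under assumptions: the comma split, the function name, and the sample reply with three phrasal entries are all hypothetical, and only the filter conditions and the minimum of three survivors come from the text.&lt;/p&gt;

```python
MIN_REQUIRED = 3  # minimum surviving distractors, as described above


def parse_distractors(raw, original_word):
    """Split a comma-separated model reply and apply the same filters."""
    kept = []
    for piece in raw.split(","):
        w = piece.strip()
        if (
            len(w) in range(2, 50)              # longer than 1, shorter than 50
            and any(ch.isalpha() for ch in w)   # contains at least one letter
            and w.lower() != original_word.lower()
            and " " not in w                    # rejects multi-word phrases
        ):
            kept.append(w)
    # Too few survivors: return None so the caller uses its fallback.
    if len(kept) in range(MIN_REQUIRED):
        return None
    return kept


# Hypothetical reply in which three of five entries are phrases.
phrasal = "strong consistency, data loss, read your writes, causality, serialization"
print(parse_distractors(phrasal, "eventual consistency"))

single = "consistency, atomicity, serialization, concurrency, visibility"
print(parse_distractors(single, "linearizability"))
```

&lt;p&gt;The first call returns &lt;code&gt;None&lt;/code&gt; without raising or logging anything, which is what makes this kind of failure silent.&lt;/p&gt;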

&lt;p&gt;The filter had been there since the original implementation. &lt;strong&gt;qwen3 outputs single words by default, so the constraint was hidden. Gemma 4 prefers phrasal answers&lt;/strong&gt;, and answer shape is the parsing surface most sensitive to model family when you swap models. The fix was a single line in the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- SINGLE WORD ONLY — no spaces, no multi-word phrases
  (use "linearizability" not "strong consistency"). Hyphens are fine.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After all four fixes, a real production save of &lt;code&gt;warehouse&lt;/code&gt; returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"storeroom"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"depot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"facility"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"silo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loft"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five domain-adjacent single-word distractors, exactly the shape the prompt asks for. That's the moment local Gemma 4 was finally doing real work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 5: the worker had been silently failing for two months
&lt;/h3&gt;

&lt;p&gt;While collecting production stats for &lt;em&gt;this article&lt;/em&gt;, I grepped the worker logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker compose logs worker | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Connection refused"&lt;/span&gt;
... lots of lines ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docker-compose.yml&lt;/code&gt; had set &lt;code&gt;Ollama__BaseUrl&lt;/code&gt; on the &lt;code&gt;api&lt;/code&gt; service but &lt;strong&gt;not on the &lt;code&gt;worker&lt;/code&gt; service&lt;/strong&gt;. The worker fell back to the default (&lt;code&gt;localhost:11434&lt;/code&gt; inside the worker container — there is nothing there) and every &lt;code&gt;BookMetadataGenerator&lt;/code&gt; call hit &lt;code&gt;Connection refused&lt;/code&gt; silently. Every user-uploaded book ended up with &lt;code&gt;genre = NULL&lt;/code&gt;, which in turn meant the domain-aware translation prompt had nothing to bias against.&lt;/p&gt;

&lt;p&gt;This was a &lt;em&gt;second&lt;/em&gt; silent fallback, completely orthogonal to the original one. Same shape, different surface. Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;  worker:
    environment:
&lt;span class="gi"&gt;+     Ollama__BaseUrl: http://ollama:11434
+     Ollama__Model: gemma4:e2b
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus a one-shot &lt;code&gt;MetadataBackfillWorker&lt;/code&gt; (a small &lt;code&gt;BackgroundService&lt;/code&gt; that runs on worker startup) to heal the ~10 user-uploaded books with &lt;code&gt;genre = NULL&lt;/code&gt;, idempotently.&lt;/p&gt;

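&lt;p&gt;&lt;strong&gt;The pattern is the lesson.&lt;/strong&gt; Anywhere you distribute environment via a compose file, ask: which services &lt;em&gt;actually need this variable&lt;/em&gt;, and is it set on each of them? "Inherits from .env" is not a thing in docker-compose service blocks: Compose reads &lt;code&gt;.env&lt;/code&gt; to substitute values into the compose file itself, and a container only receives a variable that its own service lists under &lt;code&gt;environment&lt;/code&gt; or loads through &lt;code&gt;env_file&lt;/code&gt;.&lt;/p&gt;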
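&lt;p&gt;That audit is easy to automate. The sketch below assumes the compose configuration has already been loaded into a Python dict (for instance from the JSON that &lt;code&gt;docker compose config --format json&lt;/code&gt; prints); the function name and the sample config are hypothetical.&lt;/p&gt;

```python
def services_missing_var(compose, var_name, services):
    """Return the listed services whose environment lacks var_name."""
    missing = []
    for name in services:
        env = compose.get("services", {}).get(name, {}).get("environment") or {}
        if isinstance(env, list):
            # List form: ["KEY=value", "OTHER_KEY"]
            keys = {item.split("=", 1)[0] for item in env}
        else:
            # Mapping form: {"KEY": "value"}
            keys = set(env)
        if var_name not in keys:
            missing.append(name)
    return missing


# Hypothetical config resembling the broken state described above.
compose = {
    "services": {
        "api": {"environment": {"Ollama__BaseUrl": "http://ollama:11434"}},
        "worker": {"environment": ["ASPNETCORE_ENVIRONMENT=Production"]},
    }
}
print(services_missing_var(compose, "Ollama__BaseUrl", ["api", "worker"]))
```

&lt;p&gt;Run as a CI step with an explicit list of services that talk to Ollama, a check like this fails the build instead of failing silently in production.&lt;/p&gt;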

&lt;h3&gt;
  
  
  Lesson 6: turn off thinking mode for structured outputs
&lt;/h3&gt;

&lt;p&gt;Modern Ollama models (including Gemma 4) default to a chain-of-thought "thinking" pass before the final answer. For freeform reasoning that's a quality win. For my use case — output a 5-element list of single words — the thinking pass is pure overhead. Every request was generating 50–200 tokens of internal reasoning the parser then threw away.&lt;/p&gt;

&lt;p&gt;In the Ollama call options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- options: { "temperature": 0.7 }
&lt;/span&gt;&lt;span class="gi"&gt;+ options: { "temperature": 0.7, "think": false }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Roughly halved the per-request token output. Roughly halved end-to-end latency. The quality of the distractors did not drop in my testing — for "give me 5 plausible wrong-answer words for &lt;code&gt;warehouse&lt;/code&gt;", chain-of-thought wasn't doing anything load-bearing.&lt;/p&gt;

&lt;p&gt;If you're using Ollama for structured outputs, this is the single biggest perf knob most people don't know about.&lt;/p&gt;
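&lt;p&gt;If you call Ollama's REST API directly rather than through a client wrapper, the request body might look like the sketch below. It assumes that &lt;code&gt;think&lt;/code&gt; and &lt;code&gt;keep_alive&lt;/code&gt; are top-level fields of the &lt;code&gt;/api/generate&lt;/code&gt; body while &lt;code&gt;temperature&lt;/code&gt; sits under &lt;code&gt;options&lt;/code&gt;, which is how Ollama's API reference lays them out; check that against the Ollama version you run. The helper name and the prompt text are hypothetical.&lt;/p&gt;

```python
import json


def build_generate_request(model, prompt, temperature=0.7):
    """Build a non-streaming /api/generate body with thinking disabled."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "think": False,      # skip the reasoning pass (assumed top-level field)
        "keep_alive": -1,    # keep the model resident between calls
        "options": {"temperature": temperature},
    }


body = build_generate_request(
    "gemma4:e2b",
    "List 5 single-word distractors for: warehouse",  # illustrative prompt
)
print(json.dumps(body, indent=2))
```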

&lt;h2&gt;
  
  
  The second swap: e4b → e2b
&lt;/h2&gt;

&lt;p&gt;After all six lessons above, distractor calls were succeeding at ~100%. But end-to-end save latency was still tail-heavy. Looking at the numbers honestly: most calls landed in the 30–60 s range, and the 90 s timeout was absorbing what should have been a comfortable fit.&lt;/p&gt;

&lt;p&gt;Two things were happening at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;E4B's 13 GiB resident was contesting RAM with Postgres + .NET&lt;/strong&gt; on a 30 GB box. Not OOM-level, but the working set wasn't always in cache.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Even with &lt;code&gt;think=false&lt;/code&gt;, e4b is genuinely slow on a 6-core CPU.&lt;/strong&gt; I'd been benchmarking on a warm cache and short prompts; longer prompts (explanations, multi-sentence hints) routinely hit 60 s+.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I swapped to &lt;strong&gt;e2b&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;e4b (after all fixes)&lt;/th&gt;
&lt;th&gt;e2b (current prod)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Disk&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.2 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM resident with &lt;code&gt;KEEP_ALIVE=-1&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;13 GiB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference speed on same CPU&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2–3× faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality on single-word distractor task&lt;/td&gt;
&lt;td&gt;reference&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;comparable&lt;/strong&gt; for short structured outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first-pick reasoning ("E2B's quality is too weak for technical vocabulary") had been based on a &lt;em&gt;quality&lt;/em&gt; benchmark. The real production constraint turned out to be &lt;em&gt;latency&lt;/em&gt;. &lt;strong&gt;For short structured outputs — distractor lists, single-line hints — e2b is fast enough that quality differences disappear into the prompt template&lt;/strong&gt;. The prompt was doing more work than I'd given it credit for.&lt;/p&gt;

&lt;p&gt;For longer freeform outputs (the 2–3 sentence native-language explanation), e2b is measurably less polished. Acceptable for the use case (it's a study aid, not a translation). If a future task demands better explanation quality, the path is a fine-tune of e2b on TextStack's domain corpus, not jumping back to e4b. Same hardware envelope, better domain fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numbers (real, post-e2b)
&lt;/h2&gt;

&lt;p&gt;The numbers below are measured on the production server: AMD Ryzen 5 4600H, 6 cores / 12 threads, 30 GiB RAM, no GPU. Same box that serves traffic to &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disk (&lt;code&gt;gemma4:e2b&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7.2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RAM resident&lt;/strong&gt; with &lt;code&gt;KEEP_ALIVE=-1&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;7.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cold load&lt;/strong&gt; (container restart)&lt;/td&gt;
&lt;td&gt;~10 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Distractor cost per word&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~0¢ (CPU on existing box)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Equivalent OpenAI cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5¢ per word at gpt-5-mini rates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Load test: 63,000 requests, 100% success, $0.002
&lt;/h3&gt;

&lt;p&gt;After the e2b swap I stress-tested the production deploy with &lt;a href="https://github.com/mrviduus/textstack/tree/main/tests/TextStack.LoadTests" rel="noopener noreferrer"&gt;LoadSurge&lt;/a&gt;. Three scenarios — &lt;code&gt;GET /health&lt;/code&gt;, &lt;code&gt;POST /translate&lt;/code&gt;, &lt;code&gt;POST /explain&lt;/code&gt; — at 30–50 virtual users for 30–60 seconds each. Headlines:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total requests&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;63,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Success rate&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;100%&lt;/strong&gt; (0 failures)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worst-case p95 latency&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;20.5 ms&lt;/strong&gt; (smoke; translate and explain were lower)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained RPS at 50 VU&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;500&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI cost during the run&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0.002&lt;/strong&gt; (10 cache-prewarm calls; zero during the stress phase)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak temperature on the host&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;42 °C&lt;/strong&gt; (throttle threshold 95 °C)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The interesting part isn't the throughput — 500 RPS on a $20 box is real but not surprising for cached HTTP. The interesting part is that the expensive path disappeared entirely behind the cache. Translate and Explain are keyed by &lt;code&gt;(input, target_language, genre, sentence)&lt;/code&gt;; on a hot cache the LLM never enters the request lifecycle.&lt;/p&gt;
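&lt;p&gt;The effect is easy to reproduce in miniature. The sketch below is a hypothetical in-memory stand-in, not the production cache: it keys entries by the same four-part tuple and counts how often the expensive function actually runs.&lt;/p&gt;

```python
class ExplainCache:
    """In-memory cache keyed by (input, target_language, genre, sentence)."""

    def __init__(self, generate):
        self._generate = generate  # the expensive LLM-backed function
        self._store = {}
        self.llm_calls = 0

    def get(self, text, target_language, genre, sentence):
        key = (text, target_language, genre, sentence)
        if key not in self._store:
            # Cache miss: this is the only place the model is invoked.
            self.llm_calls += 1
            self._store[key] = self._generate(*key)
        return self._store[key]


def fake_llm(text, target_language, genre, sentence):
    # Stand-in for the real model call.
    return f"[{target_language}] explanation of {text!r}"


cache = ExplainCache(fake_llm)
for _ in range(1000):
    cache.get("quorum", "uk", "distributed-systems", "A quorum of nodes must agree.")
print(cache.llm_calls)
```

&lt;p&gt;A thousand identical requests reach the model once, which is why a load test made of repeated inputs measures the cache and the HTTP stack rather than the LLM.&lt;/p&gt;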

&lt;p&gt;The auth-gated &lt;code&gt;POST /me/vocabulary/words&lt;/code&gt; path that triggers actual Gemma 4 distractor generation wasn't covered by this run — that's the next test, with test-auth tokens and a bounded-concurrency queue in front of Ollama. The full per-scenario breakdown is in &lt;a href="https://github.com/mrviduus/textstack/blob/main/docs/loadtest/run-20260511-103451/REPORT.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/loadtest/run-20260511-103451/REPORT.md&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where OpenAI stays
&lt;/h2&gt;

&lt;p&gt;The split after both swaps:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary distractors&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Local Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tolerable quality, fire-and-forget, no per-user cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Word hints&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Local Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native-language explanations&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Local Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same; acceptable on long-form quality given the use case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Book metadata enrichment&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Local Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation (18+ langs, incl. Ukrainian)&lt;/td&gt;
&lt;td&gt;OpenAI gpt-5-mini&lt;/td&gt;
&lt;td&gt;Small-model multilingual translation is still a weak spot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In-reader term explanation (live)&lt;/td&gt;
&lt;td&gt;OpenAI gpt-5-mini&lt;/td&gt;
&lt;td&gt;&amp;lt;1 s latency requirement during reading&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Local LLMs aren't a wholesale cloud replacement. &lt;strong&gt;They're a tool for tasks where quality is tolerant, latency is amortizable, privacy matters, or per-user cost matters.&lt;/strong&gt; When any of those breaks down — multilingual translation, latency-sensitive UI — cloud still wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons (for anyone shipping local LLMs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Silent fallback is the worst kind of bug.&lt;/strong&gt; Distractor generation had been failing in production for 60+ days and I had no signal: the fallback was a hardcoded random-word picker, indistinguishable to the user. &lt;strong&gt;And it happened twice in the same system, on two different surfaces&lt;/strong&gt; (Ollama-not-installed, then Worker-can't-reach-Ollama). Next time: emit &lt;code&gt;llm.success&lt;/code&gt; and &lt;code&gt;llm.fallback&lt;/code&gt; counters per service, alert if fallbacks exceed 5% of calls, and never make fallbacks bit-for-bit indistinguishable from the primary path.&lt;/p&gt;
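&lt;p&gt;A minimal sketch of those counters, assuming an in-process tally is enough to illustrate the idea; a real deployment would export the same two metrics to whatever monitoring stack it already has. The class name and the recorded numbers are hypothetical.&lt;/p&gt;

```python
from collections import Counter
from operator import gt


class LlmOutcomes:
    """Per-service success/fallback counters with a simple ratio alert."""

    def __init__(self, max_fallback_ratio=0.05):
        self.max_fallback_ratio = max_fallback_ratio
        self.counts = Counter()

    def record(self, service, used_fallback):
        metric = "llm.fallback" if used_fallback else "llm.success"
        self.counts[(service, metric)] += 1

    def fallback_ratio(self, service):
        ok = self.counts[(service, "llm.success")]
        fb = self.counts[(service, "llm.fallback")]
        total = ok + fb
        return fb / total if total else 0.0

    def should_alert(self, service):
        # True when the fallback share is strictly above the threshold.
        return gt(self.fallback_ratio(service), self.max_fallback_ratio)


outcomes = LlmOutcomes()
# Illustrative numbers: 2 successes and 11 fallbacks on one service.
for used_fallback in [False] * 2 + [True] * 11:
    outcomes.record("api", used_fallback)
for _ in range(100):
    outcomes.record("worker", False)
print(outcomes.should_alert("api"), outcomes.should_alert("worker"))
```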

&lt;p&gt;&lt;strong&gt;Floating image tags lie.&lt;/strong&gt; Pin Ollama, pin Postgres, pin everything. &lt;code&gt;latest&lt;/code&gt; freezes the day Docker pulls it; two months later it's lagging upstream and you have no signal until a new model breaks it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defend at parse, always, even if your model behaved on first try.&lt;/strong&gt; Given the same prompt, qwen3 returns single words and Gemma 4 returns phrases. The parser's pre-existing &lt;code&gt;!w.Contains(' ')&lt;/code&gt; filter was correct in spirit but hidden from the model. Moved into the prompt, it became explicit and Gemma satisfied it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bench with real prompts on real hardware.&lt;/strong&gt; I tested e4b's quality on warm-cache short prompts and concluded it was the right pick. Real production tail latency on longer prompts was 3× what the smoke test suggested, and that's what forced the e2b downgrade. Toy benchmarks hide both model-family quirks (parsing) and hardware-bound failure modes (CPU latency).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turn off thinking mode for structured outputs.&lt;/strong&gt; &lt;code&gt;think: false&lt;/code&gt; is the single biggest perf knob on Ollama for short structured tasks. Most documentation doesn't surface it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distribute env vars deliberately across services.&lt;/strong&gt; Docker-compose service blocks don't inherit from each other. Whichever service actually needs a variable — list it explicitly in &lt;em&gt;that service's&lt;/em&gt; env block. The day you add a new service, audit every variable.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;The interesting part wasn't that the model failed. It was how long the system kept pretending it hadn't.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fine-tune Gemma 4 e2b on TextStack's distractor task.&lt;/strong&gt; I now have a real production corpus building (a few hundred (term, distractor-list) pairs per week post-fix). The corpus that existed before the fix is gone — every distractor it produced came from the hardcoded fallback, not the model. The dataset starts fresh.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a bounded-concurrency queue in front of Ollama for the write path.&lt;/strong&gt; From the load test recommendations: a &lt;code&gt;Channels&lt;/code&gt;-based worker with &lt;code&gt;MaxConcurrency = 2&lt;/code&gt; plus a per-&lt;code&gt;(word, language)&lt;/code&gt; shared cache. Mirrors the translate/explain caches that just held 500 RPS with zero LLM cost.&lt;/p&gt;
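&lt;p&gt;The planned version uses .NET &lt;code&gt;Channels&lt;/code&gt;; the sketch below swaps in a Python &lt;code&gt;asyncio.Semaphore&lt;/code&gt; to show the same bounded-concurrency idea in a few lines. The function names and the fake model call are hypothetical, and the shared cache is left out.&lt;/p&gt;

```python
import asyncio

MAX_CONCURRENCY = 2  # mirrors the MaxConcurrency = 2 idea above


async def run_bounded(jobs, worker, limit=MAX_CONCURRENCY):
    """Run worker(job) for every job with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def guarded(job):
        async with gate:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            try:
                return await worker(job)
            finally:
                state["active"] -= 1

    # gather preserves input order in its result list.
    results = await asyncio.gather(*(guarded(j) for j in jobs))
    return results, state["peak"]


async def fake_generate(word):
    # Stand-in for a slow Ollama call.
    await asyncio.sleep(0.01)
    return word.upper()


results, peak = asyncio.run(
    run_bounded(["depot", "silo", "loft", "quay"], fake_generate)
)
print(results, peak)
```

&lt;p&gt;The recorded peak never exceeds the limit, so a burst of saves queues up behind the model instead of piling more work onto the same CPU cores at once.&lt;/p&gt;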

&lt;p&gt;&lt;strong&gt;Run a second load test against the auth-gated write path.&lt;/strong&gt; The 63k-request test only measured cached reads. Distractor generation is the actual bottleneck, and it sits behind authentication. Need test-auth tokens and 10–20 VU to bound it.&lt;/p&gt;

&lt;p&gt;The full TextStack codebase is AGPL-3.0 at &lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;github.com/mrviduus/textstack&lt;/a&gt;. If you've shipped local-LLM features in production, &lt;strong&gt;run &lt;code&gt;ollama list&lt;/code&gt; on your server, then &lt;code&gt;docker compose logs worker | grep -i refused&lt;/code&gt;&lt;/strong&gt;. One of those might surprise you. Mine surprised me twice in the same codebase — same shape, different surface, two months apart. That's the part of operating local LLMs that nobody writes about, and the part that takes the longest to learn.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful, the strongest signal is a star on the &lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. Every star tells me the next person abandoning DDIA mid-way might find this tool — and that's the whole point.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ollama</category>
    </item>
    <item>
      <title>Open-source licenses 101: which one to actually pick</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Thu, 07 May 2026 17:24:43 +0000</pubDate>
      <link>https://dev.to/mrviduus/open-source-licenses-101-which-one-to-actually-pick-232f</link>
      <guid>https://dev.to/mrviduus/open-source-licenses-101-which-one-to-actually-pick-232f</guid>
      <description>&lt;p&gt;Sooner or later, every developer runs into The License Question. You shipped something to GitHub, GitHub asked you to pick a license, and you scrolled the dropdown — MIT, Apache, GPL, AGPL, BUSL, MPL, ISC, Unlicense, "Other" — and picked whatever sounded least scary. That's how I did it. That's also how I ended up rewriting my LICENSE file three weeks later.&lt;/p&gt;

&lt;p&gt;Licenses are a dark forest for devs. We don't read legal docs, nothing in our day-to-day teaches us when each one matters, and most online advice is either a wall of legalese or someone's religious argument. Here's the version I wish someone had given me: a tour of the five licenses you'll actually meet, the mistakes that bite, and what changing my license did to my project's discoverability in the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a license actually does
&lt;/h2&gt;

&lt;p&gt;By default, your code is "all rights reserved." That sounds like the default-est thing possible — but it means &lt;em&gt;no one&lt;/em&gt; can legally copy, fork, run, or modify your code without your written permission. Sticking your project on a public GitHub repo doesn't change that. A license is the contract you write with the world that relaxes the default.&lt;/p&gt;

&lt;p&gt;The question you're answering when you pick one: &lt;em&gt;how much can people do with this, and what do you get back?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The five you'll actually meet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MIT.&lt;/strong&gt; "Use my code. Just keep my name in the file. Don't sue me." Three paragraphs long. Maximum adoption, zero protection. Most of the JavaScript ecosystem runs on MIT, and most of those projects don't have a monetization plan, which is exactly why it works for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apache 2.0.&lt;/strong&gt; Like MIT, but explicitly grants patent rights from contributors to users. That sounds boring until you realize half the tech world is built on patented stuff and silently assumes nobody will sue. Apache is the grown-up version of MIT — same vibe, fewer landmines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPL-3.0.&lt;/strong&gt; "Modify and distribute my code? Your modifications are also GPL." This is &lt;em&gt;copyleft&lt;/em&gt;. It infects everything downstream, which is why corporate lawyers hate it and Linux thrives on it (the kernel is GPL-2). Companies can't quietly fold GPL code into their proprietary stack — the license would force the whole stack open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AGPL-3.0.&lt;/strong&gt; GPL with a single, brutal addition: §13. If you modify the code and run it as a network service — a SaaS, a hosted dashboard, anything users hit over the network — you have to publish your modifications. This closes the loophole that GPL leaves open, where a company can fork, modify privately, and host the modified version. AGPL says: nope, your fork has to be public the moment users touch it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BUSL-1.1.&lt;/strong&gt; Not actually open source by the OSI's definition; it's "source-available." You can read the code, fork it, run it for yourself; you can't sell it as a hosted commercial service competing with the original author. Each release auto-converts to a real open-source license (often Apache 2.0) no later than four years after it ships. Sentry, MariaDB (for MaxScale), and CockroachDB have all shipped code under BUSL at some point. It's a defensive license aimed at the "AWS forks our project and undercuts us on hosting" scenario.&lt;/p&gt;

&lt;p&gt;(There's also MPL-2.0 — file-level copyleft, used by Firefox. A reasonable middle ground if MIT feels too loose and AGPL too aggressive. Not your most-likely first encounter, so I'm leaving it as a footnote.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I see all the time
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Picking MIT for a thing you might monetize.&lt;/strong&gt; The most expensive mistake. MIT lets a competitor fork your work, polish it, host it, and out-market you — with zero recourse. Fine for a library nobody wants to commercialize. Bad for a product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copying BUSL because Sentry uses BUSL.&lt;/strong&gt; Different threat models. Sentry has hyperscaler-competition risk; you have nobody-knows-you-exist risk. BUSL solves a problem you don't have, while costing you contributor goodwill, awesome-list eligibility, and brand clarity. I learned this one personally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slapping GPL or AGPL on a library.&lt;/strong&gt; Copyleft on a library is contagious — anything that links to it inherits your license. Devs see it and walk away because they can't safely use your code in their proprietary or differently-licensed project. Libraries should almost always be MIT or Apache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No license at all.&lt;/strong&gt; The silent killer. "All rights reserved" is the default, so a public repo with no LICENSE file is technically a public repo nobody can legally use. You're sending the message: &lt;em&gt;here's my code, but also nobody can touch it.&lt;/em&gt; If you want adoption, ship a license.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Picking the most "open" license to look generous.&lt;/strong&gt; MIT looks generous. It's also the easiest license to regret. The right question isn't "how open should I look" — it's &lt;em&gt;"what business model do I want to keep available?"&lt;/em&gt; Be honest with yourself before you optimize for image.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changing the license actually changed
&lt;/h2&gt;

&lt;p&gt;I shipped &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;TextStack&lt;/a&gt; — a reading tool I'm building solo — under BUSL-1.1. My reasoning was the same one MariaDB and Sentry articulated: &lt;em&gt;protect against AWS-style cloning before it happens.&lt;/em&gt; Sounded smart. Felt smart. Wasn't.&lt;/p&gt;

&lt;p&gt;The first sign was awesome-selfhosted. I went to add my project to the most-trafficked self-hosted directory on GitHub, opened the contributing guide, and saw a rule I hadn't expected: OSI-approved licenses only. BUSL doesn't qualify. The same pattern showed up across every awesome-* list I checked — awesome-react-native, awesome-dotnet-applications, awesome-llm-apps. Most either explicitly require an OSI-approved license or implicitly do. The world of curated, high-traffic developer discovery is gated by the OSI definition, and BUSL sits on the wrong side of the gate.&lt;/p&gt;

&lt;p&gt;Then the second-order effects started showing up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On GitHub Topics&lt;/strong&gt;, the license filter is how a lot of devs browse for tools — &lt;code&gt;license:agpl-3.0&lt;/code&gt; has its own discovery surface, &lt;code&gt;license:other&lt;/code&gt; is essentially invisible. Switching from BUSL to AGPL moved my repo from one bucket to the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On the README itself&lt;/strong&gt;, the license badge is the first thing a potential contributor reads. "BUSL-1.1" makes most devs hesitate — &lt;em&gt;what is this, can I actually contribute?&lt;/em&gt; "AGPL-3.0" is recognized instantly. For a portfolio project where you want stars, forks, contributors, and word-of-mouth, that hesitation is the whole game.&lt;/p&gt;

&lt;p&gt;And here's the kicker: AGPL didn't even cost me the protection I was after. The §13 network-copyleft clause makes most cloud-cloning impractical — the moment a competitor publishes a hosted fork, their differentiator is public. I kept the defensive moat; I shed the friction. On top of that, AGPL leaves dual-licensing on the table — the same playbook that funds Plausible, PostHog, and Cal.com (AGPL for the community, paid commercial license for clients who can't comply with §13). With BUSL, that revenue path was already pre-closed; BUSL &lt;em&gt;is&lt;/em&gt; the commercial-restricted license, there's nothing to upgrade away from.&lt;/p&gt;

&lt;p&gt;The lesson, if you're building a portfolio project, is uncomfortable: &lt;strong&gt;license choice is a discoverability decision, not just a legal one.&lt;/strong&gt; Awesome lists, GitHub Topics, contributor pipelines — all gated by the OSI definition. Pick the one that opens doors, not closes them.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to actually decide
&lt;/h2&gt;

&lt;p&gt;Three questions, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Library or product?&lt;/strong&gt; Library → Apache 2.0. Product → keep reading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Will you monetize someday?&lt;/strong&gt; Yes → AGPL-3.0. No → MIT or Apache.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are your future customers mostly enterprises with strict no-AGPL policies?&lt;/strong&gt; Some big companies (Google, famously) ban AGPL internally. If your TAM is enterprise, lean Apache.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For most solo-dev side projects: &lt;strong&gt;AGPL-3.0&lt;/strong&gt;. It's real open source, qualifies for awesome-list submissions, attracts contributors, and keeps the dual-licensing door open if you ever decide to monetize. That's the honest default.&lt;/p&gt;

&lt;p&gt;I picked BUSL-1.1 first, switched to AGPL-3.0 two weeks later, and watched the discovery dynamics flip in the same week. The shorter version of this whole post: pick AGPL, save yourself the relicensing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://vasyl.blog/2026/05/06/open-source-licenses-101/" rel="noopener noreferrer"&gt;vasyl.blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>beginners</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Quit Designing Data-Intensive Applications (DDIA) Three Times. Here's What I Built on the Fourth Try.</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Wed, 22 Apr 2026 05:14:01 +0000</pubDate>
      <link>https://dev.to/mrviduus/i-quit-designing-data-intensive-applications-ddia-three-times-heres-what-i-build-on-the-fourth-5bom</link>
      <guid>https://dev.to/mrviduus/i-quit-designing-data-intensive-applications-ddia-three-times-heres-what-i-build-on-the-fourth-5bom</guid>
      <description>&lt;p&gt;In 2023 I bought DDIA on Kindle. Opened the replication chapter. Quit after 40 pages and didn't open it for six months.&lt;/p&gt;

&lt;p&gt;In 2024 I bought it again, because the book is clearly worth finishing. Got to page 80. Closed it.&lt;/p&gt;

&lt;p&gt;In 2025 I tried a third time with ChatGPT open in another tab to explain the hard terms. It got easier. But every lookup was the same loop — alt-tab, paste the sentence, wait, come back, find my place. After three chapters I wasn't really reading the book anymore. I was reading my own habit of switching tabs.&lt;/p&gt;

&lt;p&gt;The book still sits in my Kindle library, marked unfinished. If you have a book like that on your shelf, this post is for you. I finally figured out why I kept quitting, and built a tool that fixes it for me. Maybe it fixes it for you too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was actually breaking
&lt;/h2&gt;

&lt;p&gt;When I quit for the third time, I sat down and tried to be honest about what was stopping me.&lt;/p&gt;

&lt;p&gt;It wasn't that the book was too hard. I understood most of what was on the page. The problem was the rest — the unfamiliar terms.&lt;/p&gt;

&lt;p&gt;Every unknown term forced a decision between two bad options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option one: stop and look it up.&lt;/strong&gt; Alt-tab, paste the sentence, wait, come back, find my place. Flow broken. The next paragraph is harder to hold in my head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option two: skip it and hope context saves me.&lt;/strong&gt; Sometimes it does. But after a dozen skips in a chapter, the quality of my reading drops noticeably. And each "I'll figure it out later" turns into debt.&lt;/p&gt;

&lt;p&gt;The exhaustion wasn't coming from reading. It was coming from the constant small decisions.&lt;/p&gt;

&lt;p&gt;There was a third problem too. Even when I did look something up, a week later I'd forgotten it. ChatGPT doesn't remember you asked. Anki remembers, but making cards by hand is its own pile of friction. I was learning words in order to forget them. And reading books in order to quit them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I got wrong about AI and reading
&lt;/h2&gt;

&lt;p&gt;When ChatGPT arrived, a lot of people thought long books were dead. Why read 600 pages of DDIA when you can ask and get a summary in a minute?&lt;/p&gt;

&lt;p&gt;I believed that for about a year.&lt;/p&gt;

&lt;p&gt;Then I sat in a 2025 interview being asked about replication strategies in distributed systems, and realized I couldn't explain the difference between synchronous and asynchronous replication past surface-level buzzwords. I'd read dozens of summaries, listened to podcasts, watched YouTube breakdowns. I knew things on the surface. I didn't understand any of them deeply.&lt;/p&gt;

&lt;p&gt;For staying current, summaries are fine. For real understanding, nothing replaces sitting with a book that someone spent years structuring. Those are exactly the books I kept quitting around page 40.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;In January 2026 I started building what became TextStack — a reader where I could read technical books without the tab switching.&lt;/p&gt;

&lt;p&gt;The idea is simple. Tap a word you don't know. An explanation appears inline — not a dictionary entry, but a short concept explanation from Claude that takes into account what the book is about and what the sentence is doing. For everyday words, a short translation. For technical terms like RLHF, attention mechanism, or eventual consistency — two or three sentences on what it is and why it matters, with links to related ideas and common confusions.&lt;/p&gt;

&lt;p&gt;The word goes into a personal dictionary automatically. But not the way LingQ does it, where your review queue grows to hundreds of items and you quit the app. I built a filter — only words from roughly the top 15,000 English words by frequency, or technical terms, enter spaced repetition. The rest are saved as reference. The weekly review queue is capped, so it never spirals.&lt;/p&gt;
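
&lt;p&gt;Here is a minimal sketch of that filter. The type, the function names, and the weekly cap of 50 are assumptions made for illustration, not TextStack's actual code; only the 15,000-word cutoff comes from the description above.&lt;/p&gt;

```typescript
// Hypothetical shape of a saved word; field names are illustrative.
type SavedWord = {
  text: string;
  frequencyRank: number | null; // 1 = most common English word, null = unranked
  isTechnicalTerm: boolean;
};

const FREQUENCY_CUTOFF = 15000; // "roughly the top 15,000 English words"
const WEEKLY_REVIEW_CAP = 50;   // assumed value; the post gives no number

// A word enters spaced repetition if it is a technical term
// or sits inside the frequency cutoff. Everything else stays reference-only.
function entersSpacedRepetition(word: SavedWord): boolean {
  if (word.isTechnicalTerm) return true;
  if (word.frequencyRank === null) return false;
  return FREQUENCY_CUTOFF >= word.frequencyRank;
}

// Filter first, then cut the queue off at the cap so it never spirals.
function buildWeeklyQueue(saved: SavedWord[]): SavedWord[] {
  return saved.filter(entersSpacedRepetition).slice(0, WEEKLY_REVIEW_CAP);
}
```

&lt;p&gt;The cap is applied after the filter, so rare words never take a slot that a useful term could have had.&lt;/p&gt;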

&lt;p&gt;Over three and a half months I put together a working version on .NET 10, React, and React Native. PostgreSQL, Claude API for explanations, Edge TTS for audio, offline PWA. It ingests EPUB, PDF, and FB2. The catalog started wide, but I'm pruning it hard — I'm realizing focus matters more than I thought.&lt;/p&gt;

&lt;p&gt;It lives at &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt; — full pitch at the end of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I got wrong for three months
&lt;/h2&gt;

&lt;p&gt;For the first three months I was building for an abstract "non-native English speaker who wants to read books." Nobody needs that.&lt;/p&gt;

&lt;p&gt;In April I looked at it honestly and asked who I'd actually built it for. The answer was: a developer trying to read AI engineering books. Because that's what I'd been trying to read for two years. Chip Huyen's &lt;em&gt;AI Engineering&lt;/em&gt;. &lt;em&gt;Hands-On Large Language Models&lt;/em&gt;. &lt;em&gt;Designing Machine Learning Systems&lt;/em&gt;. &lt;em&gt;Building Agentic AI Systems&lt;/em&gt;. &lt;em&gt;Prompt Engineering for LLMs&lt;/em&gt;. I bought all of them. I finished none.&lt;/p&gt;

&lt;p&gt;When I looked at other developers' reading lists online, I saw I wasn't alone. A lot of developers are trying to move into AI engineering right now. We're all reading the same books, and a lot of us aren't finishing them.&lt;/p&gt;

&lt;p&gt;This isn't a generic "non-native English" problem. It's a specific problem for a specific group going through a specific career transition.&lt;/p&gt;

&lt;p&gt;So I'm pivoting. Not "a reader for everyone." A reader for developers learning AI engineering. A narrow niche where I'm already the user.&lt;/p&gt;

&lt;h2&gt;
  
  
  The next six months
&lt;/h2&gt;

&lt;p&gt;Four things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Rebuild the product around the AI angle.&lt;/strong&gt; Trim the catalog to 15–20 AI engineering books. Rewrite the homepage. Shift the framing from translation to explanation. Improve the prompts for technical terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Actually start reading.&lt;/strong&gt; &lt;em&gt;Hands-On LLMs&lt;/em&gt; in May. &lt;em&gt;AI Engineering&lt;/em&gt; in June and July. &lt;em&gt;Building Agentic AI Systems&lt;/em&gt; in August. Not as a task — as something I want. I want to work as an AI engineer in two years, and the only way there is through these books. I'll read them inside TextStack, because if it doesn't work for me, it won't work for anyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Write about the process.&lt;/strong&gt; This is the first post. If you want to follow along, the blog has RSS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Find the first paying customer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'll say it openly:&lt;/strong&gt; if in six months there's one stranger paying for TextStack, I'll consider this project a success regardless of the other numbers. The first dollar from someone you don't know is a threshold most solo devs never cross. Crossing it is a big part of the work of leaving employment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Live at &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt; — you can open a sample chapter of &lt;em&gt;Pragmatic Programmer&lt;/em&gt; or &lt;em&gt;Hands-On LLMs&lt;/em&gt; without signing up.&lt;/p&gt;

&lt;p&gt;If you're in a similar spot — non-native dev, bought the AI engineering books, didn't finish them — send me a note. Twitter: &lt;a href="https://x.com/Rexetdeus" rel="noopener noreferrer"&gt;@Rexetdeus&lt;/a&gt;. Email on the site. I'll give you early access and listen to what works and what doesn't. In exchange I need honest feedback.&lt;/p&gt;

&lt;p&gt;If it's not your thing, thanks for reading this far. If someone you know is stuck on Chapter 3 of &lt;em&gt;AI Engineering&lt;/em&gt;, maybe forward them this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  P.S.
&lt;/h2&gt;

&lt;p&gt;One more thing. This problem — quitting hard books at page 40 — isn't really about English and isn't really about AI. It's that reading tools are stuck in the early 2010s while content has gotten much denser.&lt;/p&gt;

&lt;p&gt;Kindle Word Wise is from 2014, and it still shows single-word definitions that can't handle &lt;em&gt;eventual consistency&lt;/em&gt; or &lt;em&gt;attention mechanism&lt;/em&gt;. LingQ has been showing translations and adding words to SRS for close to two decades, and the core experience hasn't really changed. Readlang was a clever browser extension in 2013; development stopped when the founder went to Duolingo.&lt;/p&gt;

&lt;p&gt;Modern books need different tools. Not dictionaries — explanations. Not infinite queues — capped ones. Not one experience for everyone — context-aware understanding.&lt;/p&gt;

&lt;p&gt;That's the opening I'm walking into. I'll let you know in six months how it went.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;First post in a series about building TextStack as an AI engineering books reader. Star the repo if you want to follow along: &lt;a href="https://github.com/mrviduus/textstack" rel="noopener noreferrer"&gt;github.com/mrviduus/textstack&lt;/a&gt; · &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;textstack.app&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How We Made Our React SPA Visible to Google Without Rewriting Everything</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Sat, 17 Jan 2026 23:31:51 +0000</pubDate>
      <link>https://dev.to/mrviduus/how-we-made-our-react-spa-visible-to-google-without-rewriting-everything-1916</link>
      <guid>https://dev.to/mrviduus/how-we-made-our-react-spa-visible-to-google-without-rewriting-everything-1916</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; We needed Google to index 500+ book pages on our SPA. Instead of migrating to Next.js or building a complex SSR solution, we added dynamic rendering with Prerender in 3 files. Here's exactly how.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Google Can't See Your Beautiful SPA
&lt;/h2&gt;

&lt;p&gt;We built &lt;a href="https://textstack.app" rel="noopener noreferrer"&gt;TextStack&lt;/a&gt; — a free online library with a Kindle-like reader. React frontend, ASP.NET Core API, PostgreSQL. Classic stack, works great.&lt;/p&gt;

&lt;p&gt;One problem: &lt;strong&gt;Google saw nothing.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- What users see --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;As I Lay Dying by William Faulkner | TextStack&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;As I Lay Dying&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;After a woman in rural Mississippi dies...&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- 98 chapters, rich metadata, Schema.org markup --&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- What Googlebot saw --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Free Online Library | TextStack&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"root"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- Empty. Nothing. Void. --&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We had 500+ books with beautiful SEO metadata, Schema.org structured data, Open Graph tags, all generated client-side. Googlebot does execute JavaScript, but rendering is queued separately from crawling, so it is slower and less reliable than serving plain HTML. Our pages weren't getting indexed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Options We Considered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Server-Side Rendering (Next.js/Remix)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Industry standard, great DX, built-in optimizations&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; Complete frontend rewrite. Our React app was ~50 components, custom reader with offline sync, complex state management. Estimated time: 3-4 weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Static Site Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Fastest possible page loads, works everywhere&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; We tried this. Built a Next.js SSG version. It worked... until we opened it in the browser. The reader was broken. Styles were wrong. We'd essentially need to maintain two frontends.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Dynamic Rendering (Prerender)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Zero changes to existing React app. Add a service, configure nginx, done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; Additional infrastructure, slight latency for first bot request.&lt;/p&gt;

&lt;p&gt;We chose &lt;strong&gt;Option 3&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Dynamic Rendering?
&lt;/h2&gt;

&lt;p&gt;Dynamic rendering means serving different content based on who's asking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Regular User (Chrome, Safari, Firefox):
  User → nginx → SPA (index.html + JS) → JS renders in browser

Search Bot (Googlebot, Bingbot):
  Bot → nginx → Prerender → Headless Chrome renders page → HTML response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Google &lt;a href="https://developers.google.com/search/docs/crawling-indexing/javascript/dynamic-rendering" rel="noopener noreferrer"&gt;documents this approach&lt;/a&gt; and doesn't consider it cloaking as long as bots and users get the same content, though its docs now describe it as a workaround rather than a long-term solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's our setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                         nginx                                │
│  ┌─────────────────────────────────────────────────────────┐│
│  │ Check User-Agent                                        ││
│  │ Is it Googlebot/Bingbot/etc?                           ││
│  └──────────────┬────────────────────┬────────────────────┘│
│                 │ YES               │ NO                    │
│                 ▼                   ▼                       │
│  ┌──────────────────┐    ┌──────────────────┐              │
│  │ Prerender        │    │ Static Files /   │              │
│  │ (Headless Chrome)│    │ Vite Dev Server  │              │
│  └────────┬─────────┘    └──────────────────┘              │
│           │                                                 │
│           ▼                                                 │
│  ┌──────────────────┐                                      │
│  │ Fetch &amp;amp; Render   │                                      │
│  │ React App        │──────► API                           │
│  └──────────────────┘                                      │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Add Prerender Service
&lt;/h3&gt;

&lt;p&gt;We used &lt;code&gt;tvanro/prerender-alpine&lt;/code&gt;, a lightweight Docker image with headless Chrome:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;prerender&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tvanro/prerender-alpine:7.2.0&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;books_prerender&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MEMORY_CACHE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;      &lt;span class="c1"&gt;# Enable in-memory cache&lt;/span&gt;
      &lt;span class="na"&gt;CACHE_MAXSIZE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;   &lt;span class="c1"&gt;# Cache up to 500 pages&lt;/span&gt;
      &lt;span class="na"&gt;CACHE_TTL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;      &lt;span class="c1"&gt;# 1 hour cache&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3030:3000"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1G&lt;/span&gt;       &lt;span class="c1"&gt;# Chrome is hungry&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Configure nginx Bot Detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bot detection map&lt;/span&gt;
&lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_user_agent&lt;/span&gt; &lt;span class="nv"&gt;$prerender_ua&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;default&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*googlebot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*bingbot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*yandex"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*facebookexternalhit"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*twitterbot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*linkedinbot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*slackbot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*whatsapp"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;"~*applebot"&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;# Add more as needed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
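
&lt;p&gt;If you want to unit-test the bot list before touching nginx, the same check is easy to mirror in TypeScript. This is only a sketch: the function name is made up, and the token list is copied from the map above.&lt;/p&gt;

```typescript
// Tokens copied from the nginx map; matching is a case-insensitive
// substring check, which is what the "~*" patterns above amount to.
const BOT_TOKENS: string[] = [
  "googlebot",
  "bingbot",
  "yandex",
  "facebookexternalhit",
  "twitterbot",
  "linkedinbot",
  "slackbot",
  "whatsapp",
  "applebot",
];

// Hypothetical helper: true when the User-Agent would be routed to Prerender.
function isPrerenderUserAgent(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return BOT_TOKENS.some((token) => ua.includes(token));
}

console.log(isPrerenderUserAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
```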



&lt;h3&gt;
  
  
  Step 3: Route Bots to Prerender
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;textstack.app&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;# Internal prerender location&lt;/span&gt;
    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/prerender-internal/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://prerender:3000/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# Check if bot&lt;/span&gt;
        &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$prerender&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prerender_ua&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$prerender&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Don't prerender static files&lt;/span&gt;
        &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;.(js|css|png|jpg|svg|woff2)&lt;/span&gt;$&lt;span class="s"&gt;")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$prerender&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Route bots to prerender&lt;/span&gt;
        &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prerender&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;rewrite&lt;/span&gt; &lt;span class="s"&gt;^(.*)&lt;/span&gt;$ &lt;span class="n"&gt;/prerender-internal/http://&lt;/span&gt;&lt;span class="nv"&gt;$host$1&lt;/span&gt; &lt;span class="s"&gt;last&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Normal users get SPA&lt;/span&gt;
        &lt;span class="kn"&gt;try_files&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="n"&gt;/index.html&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
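
&lt;p&gt;The three &lt;code&gt;if&lt;/code&gt; blocks boil down to one small decision. Here it is as a TypeScript sketch so the logic is easy to read in isolation; &lt;code&gt;prerenderPath&lt;/code&gt; is a hypothetical name, and &lt;code&gt;isBot&lt;/code&gt; stands in for the User-Agent map result.&lt;/p&gt;

```typescript
// Same extension list as the nginx regex above, case-insensitive.
const STATIC_FILE = /\.(js|css|png|jpg|svg|woff2)$/i;

// Returns the internal path a request is rewritten to,
// or null when the request should fall through to the SPA.
function prerenderPath(host: string, uri: string, isBot: boolean): string | null {
  if (!isBot) return null;                 // normal users get the SPA
  if (STATIC_FILE.test(uri)) return null;  // never prerender static assets
  return "/prerender-internal/http://" + host + uri;
}

console.log(prerenderPath("textstack.app", "/en/books/as-i-lay-dying", true));
```

&lt;p&gt;Static assets are excluded even for bots because Prerender's Chrome requests them itself while rendering the page.&lt;/p&gt;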



&lt;h2&gt;
  
  
  The Challenge: API Calls Inside Prerender
&lt;/h2&gt;

&lt;p&gt;Here's where it got interesting. After setting everything up, our book detail pages showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Error&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;Failed to fetch&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Our SPA makes API calls to fetch book data. The API URL was configured as &lt;code&gt;http://localhost:8080&lt;/code&gt;. Inside the Prerender container, &lt;code&gt;localhost&lt;/code&gt; is... the container itself. Not our API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Vite's dev server proxy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// vite.config.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://api:8080&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Docker service name&lt;/span&gt;
        &lt;span class="na"&gt;changeOrigin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;rewrite&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;api/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;// Allow prerender to access via Docker network&lt;/span&gt;
    &lt;span class="na"&gt;allowedHosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;web&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we changed our API base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;API_BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;API_BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;// Relative, works everywhere&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when Prerender's Chrome loads our SPA:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;JS executes and calls &lt;code&gt;/api/en/books/some-book&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Vite proxies to &lt;code&gt;http://api:8080/en/books/some-book&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;API returns data&lt;/li&gt;
&lt;li&gt;React renders the page&lt;/li&gt;
&lt;li&gt;Prerender captures the HTML&lt;/li&gt;
&lt;/ol&gt;
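
&lt;p&gt;Step 2 is the only part with any logic in it. A rough sketch of what the proxy rule does to a path, with the target and the rewrite taken from the config above (&lt;code&gt;proxiedUrl&lt;/code&gt; is a hypothetical helper, not part of Vite):&lt;/p&gt;

```typescript
const API_TARGET = "http://api:8080"; // Docker service name, as in vite.config.ts

// Hypothetical helper: returns the upstream URL for a proxied path,
// or null when the path is not covered by the "/api" rule.
function proxiedUrl(path: string): string | null {
  if (!path.startsWith("/api")) return null;
  // Same rewrite as the config: strip the leading "/api".
  return API_TARGET + path.replace(/^\/api/, "");
}

console.log(proxiedUrl("/api/en/books/some-book"));
```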

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before (what Googlebot saw):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Free Online Library | TextStack&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"root"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;As I Lay Dying by William Faulkner | TextStack&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"description"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"After a woman in rural Mississippi dies,
her husband and five children begin an arduous journey..."&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"application/ld+json"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@context&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://schema.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Book&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;As I Lay Dying&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;author&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Person&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;William Faulkner&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inLanguage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;As I Lay Dying&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"book-detail__author"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;William Faulkner&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;ul&amp;gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!-- 98 chapters with links --&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/ul&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First request (cold): ~3-5 seconds (Chrome needs to render)&lt;/li&gt;
&lt;li&gt;Cached requests: ~50ms&lt;/li&gt;
&lt;li&gt;Cache hit rate: ~95% (bots recrawl the same pages)&lt;/li&gt;
&lt;/ul&gt;
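
&lt;p&gt;To get a feel for what these numbers mean for an average bot request, here is a rough back-of-the-envelope sketch. It assumes the approximate figures above (95% hit rate, 50 ms cached, 3-5 s cold); the function name is only for illustration.&lt;/p&gt;

```python
# Rough estimate of the average prerender response time for bots.
# The inputs are the approximate figures quoted in the list above.

def expected_latency_ms(hit_rate, cached_ms, cold_ms):
    """Weighted average of cached and cold response times."""
    return hit_rate * cached_ms + (1 - hit_rate) * cold_ms


best = expected_latency_ms(0.95, 50, 3000)   # cold render takes about 3 s
worst = expected_latency_ms(0.95, 50, 5000)  # cold render takes about 5 s
print(f"average bot response: {best:.0f}-{worst:.0f} ms")
```

&lt;p&gt;Even with a slow cold render, the average lands at roughly 200 to 300 ms, because only about one request in twenty misses the cache.&lt;/p&gt;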

&lt;h2&gt;
  
  
  Quick Test
&lt;/h2&gt;

&lt;p&gt;Want to see what Googlebot sees on your site?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Your site as a regular user&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"https://yoursite.com/page"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;title&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Your site as Googlebot&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="s2"&gt;"Googlebot"&lt;/span&gt; &lt;span class="s2"&gt;"https://yoursite.com/page"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;title&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the titles are different (or the second one is empty), you have an SEO problem.&lt;/p&gt;
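
&lt;p&gt;If you want to run the same check from a script, a minimal Python sketch could look like the one below. It uses only the standard library; the helper names and the &lt;code&gt;Mozilla/5.0&lt;/code&gt; user agent are assumptions for illustration, and &lt;code&gt;https://yoursite.com/page&lt;/code&gt; is the same placeholder as in the curl commands.&lt;/p&gt;

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first title element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url, user_agent):
    """Download the page with the given User-Agent and return its title."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=15) as response:
        body = response.read().decode("utf-8", errors="replace")
    return extract_title(body)


def compare_titles(url):
    """Fetch the page as a browser-like client and as Googlebot."""
    as_user = fetch_title(url, "Mozilla/5.0")
    as_bot = fetch_title(url, "Googlebot")
    return as_user, as_bot, as_user == as_bot


# Example (placeholder URL):
# print(compare_titles("https://yoursite.com/page"))
```

&lt;p&gt;Like curl, this does not execute JavaScript, so it sees the raw HTML your server returns for each user agent.&lt;/p&gt;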

&lt;h2&gt;
  
  
  Files Changed
&lt;/h2&gt;

&lt;p&gt;The entire implementation touched &lt;strong&gt;5 files&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Lines&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker-compose.yml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;+20&lt;/td&gt;
&lt;td&gt;Add prerender service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nginx.conf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;+85&lt;/td&gt;
&lt;td&gt;Bot detection &amp;amp; routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vite.config.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;+10&lt;/td&gt;
&lt;td&gt;API proxy for prerender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker-compose.prod.yml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;+18&lt;/td&gt;
&lt;td&gt;Production prerender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nginx-prod.conf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;+118&lt;/td&gt;
&lt;td&gt;Production bot routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No React components changed. No business logic touched. The SPA remains exactly as it was.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Use This?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have an existing SPA that works well&lt;/li&gt;
&lt;li&gt;You need SEO but can't justify a rewrite&lt;/li&gt;
&lt;li&gt;Your content doesn't change every second&lt;/li&gt;
&lt;li&gt;You're comfortable with Docker/nginx&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No, if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're starting a new project (just use Next.js)&lt;/li&gt;
&lt;li&gt;You need real-time SEO updates&lt;/li&gt;
&lt;li&gt;Your pages are highly personalized&lt;/li&gt;
&lt;li&gt;You can't add infrastructure&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Have questions about implementing this for your SPA? Drop a comment below or open an issue on GitHub!&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#react&lt;/code&gt; &lt;code&gt;#seo&lt;/code&gt; &lt;code&gt;#docker&lt;/code&gt; &lt;code&gt;#nginx&lt;/code&gt; &lt;code&gt;#webdev&lt;/code&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>react</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Expose Your Local Server to the Internet (Without Port Forwarding)</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Fri, 02 Jan 2026 17:37:18 +0000</pubDate>
      <link>https://dev.to/mrviduus/how-to-expose-your-local-server-to-the-internet-without-port-forwarding-3f3h</link>
      <guid>https://dev.to/mrviduus/how-to-expose-your-local-server-to-the-internet-without-port-forwarding-3f3h</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Use Cloudflare Tunnel to make your home server accessible from anywhere. Free, secure, no router configuration needed.&lt;/p&gt;




&lt;p&gt;I recently deployed a web app running on my laptop to a real domain. No cloud hosting, no VPS, no monthly bills. Just my laptop, a domain, and Cloudflare Tunnel.&lt;/p&gt;

&lt;p&gt;Here's exactly how I did it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I wanted to host my side project on my own hardware. Sounds simple, right?&lt;/p&gt;

&lt;p&gt;But my setup had issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No access to the main router's admin panel&lt;/li&gt;
&lt;li&gt;Dynamic IP address&lt;/li&gt;
&lt;li&gt;Port forwarding wasn't an option&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional solutions like opening ports 80/443 weren't going to work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Cloudflare Tunnel
&lt;/h2&gt;

&lt;p&gt;Cloudflare Tunnel (formerly Argo Tunnel) creates an &lt;strong&gt;outbound connection&lt;/strong&gt; from your server to Cloudflare's edge. No incoming ports needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet → Cloudflare (SSL) → Tunnel → Your laptop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's free and handles SSL automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A domain name (any registrar works)&lt;/li&gt;
&lt;li&gt;A Cloudflare account (free tier is fine)&lt;/li&gt;
&lt;li&gt;A Linux/Mac/Windows machine running your app&lt;/li&gt;
&lt;li&gt;~10 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Add Your Domain to Cloudflare
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://dash.cloudflare.com" rel="noopener noreferrer"&gt;dash.cloudflare.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Add a site&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enter your domain (e.g., &lt;code&gt;myapp.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Free&lt;/strong&gt; plan&lt;/li&gt;
&lt;li&gt;Cloudflare will show you new nameservers&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 2: Update Nameservers
&lt;/h2&gt;

&lt;p&gt;Go to your domain registrar and replace the nameservers with Cloudflare's.&lt;/p&gt;

&lt;p&gt;For example, if you're using Porkbun:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to Domain Management → Your domain&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Nameservers&lt;/strong&gt; → &lt;strong&gt;Edit&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Replace the existing entries with the two nameservers Cloudflare assigned to your domain, for example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   carter.ns.cloudflare.com
   vita.ns.cloudflare.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Save and wait 5-30 minutes for propagation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 3: Create a Tunnel
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;In Cloudflare, go to &lt;strong&gt;Zero Trust&lt;/strong&gt; (left sidebar)&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Networks&lt;/strong&gt; → &lt;strong&gt;Tunnels&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create a tunnel&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Name it something like &lt;code&gt;my-server&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You'll get an installation command with a token&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Install Cloudflared
&lt;/h2&gt;

&lt;p&gt;On your server, install the &lt;code&gt;cloudflared&lt;/code&gt; daemon:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ubuntu/Debian:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
&lt;span class="nb"&gt;sudo &lt;/span&gt;dpkg &lt;span class="nt"&gt;-i&lt;/span&gt; cloudflared.deb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;cloudflared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
Download from &lt;a href="https://github.com/cloudflare/cloudflared/releases" rel="noopener noreferrer"&gt;GitHub releases&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 5: Connect the Tunnel
&lt;/h2&gt;

&lt;p&gt;Run the command Cloudflare gave you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;cloudflared service &lt;span class="nb"&gt;install&lt;/span&gt; &amp;lt;YOUR_TOKEN&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs cloudflared as a system service that starts automatically on boot.&lt;/p&gt;

&lt;p&gt;Check that it's running (on Linux with systemd):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status cloudflared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Add a Public Hostname
&lt;/h2&gt;

&lt;p&gt;Back in Cloudflare Dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your tunnel → &lt;strong&gt;Configure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Public Hostname&lt;/strong&gt; → &lt;strong&gt;Add a public hostname&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Fill in:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subdomain:&lt;/strong&gt; leave empty (or use &lt;code&gt;www&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain:&lt;/strong&gt; select your domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Type:&lt;/strong&gt; &lt;code&gt;HTTP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;code&gt;localhost:80&lt;/code&gt; (or whatever port your app runs on)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Save&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you get a "DNS record already exists" error, first delete the existing A record for your domain in &lt;strong&gt;DNS&lt;/strong&gt; → &lt;strong&gt;Records&lt;/strong&gt;, then save the hostname again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Test It
&lt;/h2&gt;

&lt;p&gt;Open your domain in a browser. It should load your local app with HTTPS!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Or test from command line&lt;/span&gt;
curl https://myapp.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
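
&lt;p&gt;To confirm that a response really travels through Cloudflare rather than some other path, you can look at the response headers. The sketch below is a minimal example that relies on the &lt;code&gt;Server: cloudflare&lt;/code&gt; and &lt;code&gt;CF-RAY&lt;/code&gt; headers Cloudflare adds to proxied responses; the function names are hypothetical and &lt;code&gt;myapp.com&lt;/code&gt; is the placeholder domain from the steps above.&lt;/p&gt;

```python
from urllib.request import urlopen


def served_via_cloudflare(headers):
    """True if the response headers look like they came through Cloudflare."""
    names = {name.lower(): value for name, value in headers.items()}
    return names.get("server", "").lower() == "cloudflare" or "cf-ray" in names


def check_site(url):
    """Return (HTTP status, came-through-Cloudflare flag) for the URL."""
    # urlopen raises URLError on connection or TLS failures
    # and HTTPError on 4xx/5xx responses.
    with urlopen(url, timeout=15) as response:
        return response.status, served_via_cloudflare(response.headers)


# Example (placeholder domain):
# print(check_site("https://myapp.com"))
```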



&lt;h2&gt;
  
  
  Running Multiple Services
&lt;/h2&gt;

&lt;p&gt;You can route different domains or paths to different local services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;myapp.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;localhost:3000&lt;/code&gt; (frontend)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;api.myapp.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;localhost:8080&lt;/code&gt; (API)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Just add multiple public hostnames in the tunnel configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup
&lt;/h2&gt;

&lt;p&gt;Here's what I'm running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;textstack.app     → localhost:80 → nginx → React frontend
textstack.app/api → localhost:80 → nginx → Docker API (port 8080)
textstack.dev     → localhost:80 → nginx → Same app, different site
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All from a laptop sitting in my living room.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Tips
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Firewall:&lt;/strong&gt; Allow inbound traffic only on the ports you actually need
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow 22/tcp   &lt;span class="c"&gt;# SSH&lt;/span&gt;
   &lt;span class="c"&gt;# No rule for port 80: cloudflared reaches the app over loopback, which ufw allows by default&lt;/span&gt;
   &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't expose admin panels&lt;/strong&gt; to the internet. Keep them on &lt;code&gt;localhost&lt;/code&gt; only.&lt;/p&gt;&lt;/li&gt;
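&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloudflare adds SSL automatically&lt;/strong&gt;, so there's no need for Let's Encrypt. Visitors' TLS connections terminate at Cloudflare's edge, traffic then travels through the encrypted tunnel, and only the final hop from &lt;code&gt;cloudflared&lt;/code&gt; to your app is plain HTTP on &lt;code&gt;localhost&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;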
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Access Policies&lt;/strong&gt; (in Zero Trust) to require authentication for sensitive routes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Site not loading?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check tunnel is running&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status cloudflared

&lt;span class="c"&gt;# Check your app is running&lt;/span&gt;
curl localhost:80

&lt;span class="c"&gt;# Check DNS propagation&lt;/span&gt;
dig myapp.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
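
&lt;p&gt;If &lt;code&gt;curl&lt;/code&gt; isn't available, the "is my app listening?" check can also be done with a few lines of Python. This is a minimal sketch using only the standard library; the function name is made up for illustration, and port 80 is assumed to match the URL entered in Step 6.&lt;/p&gt;

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


# Port 80 matches the URL used in Step 6; change it to your app's port.
print("app reachable:", port_is_open("localhost", 80))
```

&lt;p&gt;A result of &lt;code&gt;False&lt;/code&gt; here means nothing is accepting connections on that port, which is the usual cause of a 502 from the tunnel.&lt;/p&gt;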



&lt;p&gt;&lt;strong&gt;"Address Not Found" on mobile?&lt;/strong&gt;&lt;br&gt;
DNS might not have propagated to your carrier yet. Try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Toggle airplane mode on/off&lt;/li&gt;
&lt;li&gt;Use a different DNS resolver (for example 1.1.1.1)&lt;/li&gt;
&lt;li&gt;Wait 15-30 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;502 Bad Gateway?&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;cloudflared&lt;/code&gt; can't reach your local app. Check that the app is running and that its port matches the URL you entered for the public hostname.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare Tunnel: &lt;strong&gt;Free&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Domain: ~$10/year&lt;/li&gt;
&lt;li&gt;Hosting: &lt;strong&gt;$0&lt;/strong&gt; (your own hardware)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When NOT to Use This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;High-traffic production apps (your home internet has limits)&lt;/li&gt;
&lt;li&gt;Apps requiring 99.99% uptime (your laptop can crash)&lt;/li&gt;
&lt;li&gt;Sensitive data without proper security measures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For serious production workloads, use proper cloud hosting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Cloudflare Tunnel is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Side projects&lt;/li&gt;
&lt;li&gt;Development/staging environments&lt;/li&gt;
&lt;li&gt;Self-hosted apps&lt;/li&gt;
&lt;li&gt;Home automation dashboards&lt;/li&gt;
&lt;li&gt;Personal APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It took me about 15 minutes to go from "app running locally" to "app accessible worldwide with HTTPS."&lt;/p&gt;

&lt;p&gt;No cloud bills. No DevOps complexity. Just your code, running on your hardware, accessible to the world.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have questions?&lt;/strong&gt; Drop a comment below or find me on &lt;a href="https://x.com/rexetdeus" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building something cool with this setup?&lt;/strong&gt; I'd love to hear about it!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: cloudflare, self-hosting, devops, tutorial, web-development&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>networking</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build and Push .NET 8 Apps as Docker Images (No Dockerfile)</title>
      <dc:creator>Vasyl</dc:creator>
      <pubDate>Thu, 24 Jul 2025 02:19:51 +0000</pubDate>
      <link>https://dev.to/mrviduus/build-and-push-net-8-apps-as-docker-images-no-dockerfile-mf4</link>
      <guid>https://dev.to/mrviduus/build-and-push-net-8-apps-as-docker-images-no-dockerfile-mf4</guid>
      <description>&lt;h2&gt;
  
  
  1. Why do this?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;.NET 8 can build a Docker image for you.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One command creates and tags the image.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You can push the image without Docker on the build server.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. What you need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;.NET SDK 8.0.200+&lt;/strong&gt; — Build and publish the image.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker or Podman (optional)&lt;/strong&gt; — Run the image on your PC.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub or Azure DevOps&lt;/strong&gt; — Run CI/CD examples.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Build and run on your PC
&lt;/h2&gt;

&lt;p&gt;Open a terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet new console &lt;span class="nt"&gt;-o&lt;/span&gt; MyApp
&lt;span class="nb"&gt;cd &lt;/span&gt;MyApp
dotnet publish &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer &lt;span class="nt"&gt;-p&lt;/span&gt;:EnableSdkContainerSupport&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; myapp:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK picks a base image, builds your app image on top of it, tags the result &lt;strong&gt;&lt;code&gt;myapp:latest&lt;/code&gt;&lt;/strong&gt;, and stores it in your local container runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Change image settings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Alpine Linux&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  dotnet publish &lt;span class="nt"&gt;--os&lt;/span&gt; linux-musl &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Ubuntu 22.04&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  dotnet publish &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerFamily&lt;span class="o"&gt;=&lt;/span&gt;jammy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tiny chiseled image&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  dotnet publish &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerFamily&lt;span class="o"&gt;=&lt;/span&gt;jammy-chiseled-extra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set repo &amp;amp; tag&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  dotnet publish &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerRepository&lt;span class="o"&gt;=&lt;/span&gt;ghcr.io/user/app &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerImageTags&lt;span class="o"&gt;=&lt;/span&gt;v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push in build&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  dotnet publish &lt;span class="nt"&gt;-t&lt;/span&gt;:PublishContainer &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerRegistry&lt;span class="o"&gt;=&lt;/span&gt;ghcr.io &lt;span class="nt"&gt;-p&lt;/span&gt;:ContainerPush&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
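
&lt;p&gt;The registry, repository, and tag settings above combine into a single image reference. The small Python sketch below only illustrates that naming scheme; it is not the SDK's own code. The function name is hypothetical, and the lowercasing mirrors how the &lt;code&gt;MyApp&lt;/code&gt; project earlier ended up as &lt;code&gt;myapp:latest&lt;/code&gt;.&lt;/p&gt;

```python
def image_reference(repository, tag="latest", registry=None):
    """Compose registry/repository:tag, the way a container image is addressed."""
    repository = repository.lower()  # image repository names are lowercase
    name = f"{registry}/{repository}" if registry else repository
    return f"{name}:{tag}"


print(image_reference("MyApp"))                      # myapp:latest
print(image_reference("user/app", "v1", "ghcr.io"))  # ghcr.io/user/app:v1
```

&lt;p&gt;Keeping the registry separate from the repository name makes it clear which part says where the image is pushed and which part says what it is called.&lt;/p&gt;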



&lt;h2&gt;
  
  
  5. GitHub Actions
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;strong&gt;&lt;code&gt;.github/workflows/docker.yml&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push image&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
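    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
      &lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;  &lt;span class="c1"&gt;# lets GITHUB_TOKEN push to ghcr.io&lt;/span&gt;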
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-dotnet@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;dotnet-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8.0.x&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.actor }}&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;dotnet publish -c Release -t:PublishContainer \&lt;/span&gt;
            &lt;span class="s"&gt;-p:ContainerRegistry=ghcr.io \&lt;/span&gt;
            &lt;span class="s"&gt;-p:ContainerRepository=${{ github.repository }} \&lt;/span&gt;
            &lt;span class="s"&gt;-p:ContainerImageTags=${{ github.sha }} \&lt;/span&gt;
            &lt;span class="s"&gt;-p:ContainerPush=true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Azure DevOps
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;strong&gt;&lt;code&gt;azure-pipelines.yml&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;

&lt;span class="na"&gt;pool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vmImage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;acrName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myacr.azurecr.io&lt;/span&gt;
  &lt;span class="na"&gt;imageRepo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo/myapp&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;checkout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;UseDotNet@2&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;packageType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sdk&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8.0.x&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;dotnet publish -c Release -t:PublishContainer \&lt;/span&gt;
      &lt;span class="s"&gt;-p:ContainerRegistry=$(acrName) \&lt;/span&gt;
      &lt;span class="s"&gt;-p:ContainerRepository=$(imageRepo) \&lt;/span&gt;
      &lt;span class="s"&gt;-p:ContainerImageTags=$(Build.BuildNumber) \&lt;/span&gt;
      &lt;span class="s"&gt;-p:ContainerPush=true&lt;/span&gt;
  &lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Quick fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hidden folders like &lt;strong&gt;&lt;code&gt;.well-known&lt;/code&gt;&lt;/strong&gt; are missing → Rename the folder or add a custom MSBuild target.
&lt;/li&gt;
&lt;li&gt;Need &lt;strong&gt;&lt;code&gt;apt&lt;/code&gt;&lt;/strong&gt; packages → Make your own base image and set &lt;strong&gt;&lt;code&gt;ContainerBaseImage&lt;/code&gt;&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private NuGet feed&lt;/strong&gt; → Use the same NuGet auth you use in normal builds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Remember
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Dockerfile&lt;/strong&gt; for most apps.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images are smaller&lt;/strong&gt; and run as &lt;strong&gt;non‑root&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One command&lt;/strong&gt; can build and push.&lt;/li&gt;
&lt;/ul&gt;




</description>
    </item>
  </channel>
</rss>
