<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jangwook Kim</title>
    <description>The latest articles on DEV Community by Jangwook Kim (@jangwook_kim_e31e7291ad98).</description>
    <link>https://dev.to/jangwook_kim_e31e7291ad98</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1909290%2F60a8c15f-b2b5-4189-8578-78b8ab78900b.jpg</url>
      <title>DEV Community: Jangwook Kim</title>
      <link>https://dev.to/jangwook_kim_e31e7291ad98</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jangwook_kim_e31e7291ad98"/>
    <language>en</language>
    <item>
      <title>Docker Model Runner vs Ollama: Local AI Deployment Compared 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:53:54 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/docker-model-runner-vs-ollama-local-ai-deployment-compared-2026-297c</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/docker-model-runner-vs-ollama-local-ai-deployment-compared-2026-297c</guid>
      <description>&lt;h1&gt;
  
  
  Docker Model Runner vs Ollama: Local AI Deployment Compared 2026
&lt;/h1&gt;

&lt;p&gt;Docker entered the local AI space. If you are already running models with Ollama, you are now looking at a second option that speaks the same language — literally the same OpenAI-compatible API — but comes from the company that standardized how the world ships software.&lt;/p&gt;

&lt;p&gt;Docker Model Runner (DMR) shipped with Docker Desktop 4.40 in mid-2025 and has been evolving fast. It uses llama.cpp under the hood, stores models as OCI artifacts on Docker Hub, and integrates directly into Docker Compose workflows. Ollama, meanwhile, remains the default choice for local LLM deployment with 52+ million monthly downloads, a broader model library, and an ecosystem that every AI coding tool already supports.&lt;/p&gt;

&lt;p&gt;The question is not which tool is objectively better — it is which tool fits your workflow. This guide compares both hands-on: installation, model management, performance, GPU acceleration, IDE integration, and server deployment. We tested with Gemma 4 E4B as the reference model on both platforms.&lt;/p&gt;

&lt;p&gt;If you are new to running models locally, start with our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI setup guide&lt;/a&gt; first. If you already have Ollama running and want to know whether Docker Model Runner is worth adding to your stack, keep reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Docker Model Runner
&lt;/h2&gt;

&lt;p&gt;Docker Model Runner is Docker's native solution for running AI models locally. Despite the Docker branding, it does not run models inside containers: llama.cpp runs directly on the host, so there is no container overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;DMR treats AI models as first-class Docker primitives, similar to images and containers. Models are stored as OCI (Open Container Initiative) artifacts, the same standard that Docker uses for container images. This means models can be pushed to and pulled from Docker Hub, private registries, or any OCI-compliant registry.&lt;/p&gt;
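&lt;p&gt;Because models are plain OCI artifacts, the familiar tag-and-push workflow carries over. A sketch of what that looks like (the registry hostname below is illustrative; &lt;code&gt;docker model tag&lt;/code&gt; and &lt;code&gt;docker model push&lt;/code&gt; mirror their image counterparts):&lt;/p&gt;

```shell
# Re-tag a pulled model for a private OCI registry (hostname is an example)
docker model tag ai/gemma4 registry.example.com/team/gemma4

# Push it; teammates can then pull the same reference with `docker model pull`
docker model push registry.example.com/team/gemma4
```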

&lt;p&gt;When you pull a model with &lt;code&gt;docker model pull&lt;/code&gt;, it downloads the GGUF weights and stores them locally. When you run inference, llama.cpp loads the model into memory, runs the computation on your CPU or GPU, and returns results through an OpenAI-compatible API on port 12434.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible API&lt;/strong&gt; on &lt;code&gt;localhost:12434&lt;/code&gt;. Any tool that speaks to the OpenAI API can point at Docker Model Runner instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama-compatible API&lt;/strong&gt; as well — existing Ollama integrations can switch endpoints without code changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker Compose integration.&lt;/strong&gt; Define models as services in your &lt;code&gt;docker-compose.yml&lt;/code&gt;. Docker pulls and starts the model automatically during &lt;code&gt;docker compose up&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple inference engines.&lt;/strong&gt; llama.cpp (default, broad hardware support), vLLM (high-throughput production workloads), and Diffusers (image generation).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU acceleration.&lt;/strong&gt; Metal (Apple Silicon), CUDA (NVIDIA), and Vulkan (AMD, Intel, NVIDIA).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy loading.&lt;/strong&gt; Models load into memory only when a request arrives and unload when idle, freeing resources automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics endpoint&lt;/strong&gt; at &lt;code&gt;/metrics&lt;/code&gt; for monitoring performance and resource usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  System Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker Desktop 4.40+ (macOS, Windows) or Docker Engine on Linux&lt;/li&gt;
&lt;li&gt;8 GB RAM minimum, 16 GB recommended&lt;/li&gt;
&lt;li&gt;Optional: Apple Silicon (Metal), NVIDIA GPU (CUDA), or Vulkan-compatible GPU&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Ollama Recap — The Current Standard
&lt;/h2&gt;

&lt;p&gt;If you have been running models locally at any point over the past two years, you probably started with Ollama. It launched in 2023 and quickly became the default tool for local LLM management.&lt;/p&gt;

&lt;p&gt;Ollama provides a simple CLI (&lt;code&gt;ollama pull&lt;/code&gt;, &lt;code&gt;ollama run&lt;/code&gt;), an OpenAI-compatible API on port 11434, and a growing library of pre-configured models. It supports GGUF, Safetensors, and custom Modelfiles for fine-tuned configurations.&lt;/p&gt;

&lt;p&gt;We covered Ollama setup in depth in our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI guide&lt;/a&gt; and used it as the foundation for our &lt;a href="https://www.effloow.com/articles/gemma-4-local-setup-ollama-open-webui-guide-2026" rel="noopener noreferrer"&gt;Gemma 4 local setup guide&lt;/a&gt;. If you are new to local AI, those articles give you a working setup in under 10 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes Ollama the Default
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;52+ million monthly downloads&lt;/strong&gt; as of Q1 2026 (&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broadest model library.&lt;/strong&gt; Hundreds of models available through &lt;code&gt;ollama.com/library&lt;/code&gt;, plus import support for GGUF and Safetensors formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Modelfiles.&lt;/strong&gt; Create model configurations with specific system prompts, parameters, and adapters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem integration.&lt;/strong&gt; LangChain, LlamaIndex, Spring AI, Open WebUI, Continue.dev, Cursor, Aider — virtually every AI developer tool supports Ollama natively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform.&lt;/strong&gt; macOS, Linux, Windows. Works on Apple Silicon, NVIDIA GPUs, and CPU-only setups&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation and Setup Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installing Docker Model Runner
&lt;/h3&gt;

&lt;p&gt;If you already have Docker Desktop installed, DMR may already be available. Check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the command is not recognized, enable it in Docker Desktop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Docker Desktop&lt;/li&gt;
&lt;li&gt;Go to &lt;strong&gt;Settings → AI&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enable &lt;strong&gt;Docker Model Runner&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Optionally enable &lt;strong&gt;GPU-backend inference&lt;/strong&gt; if you have a supported GPU&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On Linux with Docker Engine, DMR is included when installed from Docker's official repositories. The TCP endpoint is enabled by default on port 12434.&lt;/p&gt;

&lt;p&gt;Pull your first model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model pull ai/gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model run ai/gemma4 &lt;span class="s2"&gt;"Explain Docker Model Runner in one sentence"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is immediately available at &lt;code&gt;http://localhost:12434&lt;/code&gt;.&lt;/p&gt;
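&lt;p&gt;A quick way to verify the endpoint is a raw chat-completions request (this assumes the model pulled above; &lt;code&gt;/engines/v1&lt;/code&gt; is DMR's OpenAI-compatible path):&lt;/p&gt;

```shell
# Send an OpenAI-style chat completion request to Docker Model Runner
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gemma4",
        "messages": [{"role": "user", "content": "Say hello in five words"}]
      }'
```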

&lt;h3&gt;
  
  
  Installing Ollama
&lt;/h3&gt;

&lt;p&gt;Download from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; or install via command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Verify&lt;/span&gt;
ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull and run a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e4b
ollama run gemma4:e4b &lt;span class="s2"&gt;"Explain Ollama in one sentence"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is available at &lt;code&gt;http://localhost:11434&lt;/code&gt;.&lt;/p&gt;
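&lt;p&gt;Ollama exposes the same OpenAI-style surface under &lt;code&gt;/v1&lt;/code&gt;, so the equivalent request changes only the port and path (assuming the model pulled above):&lt;/p&gt;

```shell
# Same request shape against Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma4:e4b",
        "messages": [{"role": "user", "content": "Say hello in five words"}]
      }'
```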

&lt;h3&gt;
  
  
  Setup Comparison Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install method&lt;/td&gt;
&lt;td&gt;Included in Docker Desktop / Engine&lt;/td&gt;
&lt;td&gt;Standalone installer or script&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prerequisite&lt;/td&gt;
&lt;td&gt;Docker Desktop or Engine&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default port&lt;/td&gt;
&lt;td&gt;12434&lt;/td&gt;
&lt;td&gt;11434&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable step&lt;/td&gt;
&lt;td&gt;Settings → AI → Enable&lt;/td&gt;
&lt;td&gt;None (runs on install)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to first model&lt;/td&gt;
&lt;td&gt;~2 minutes (if Docker installed)&lt;/td&gt;
&lt;td&gt;~2 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model format&lt;/td&gt;
&lt;td&gt;GGUF (OCI artifacts)&lt;/td&gt;
&lt;td&gt;GGUF, Safetensors, custom Modelfiles&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ollama is faster to set up from scratch because it has no prerequisites. Docker Model Runner is faster if Docker is already part of your workflow — it is a toggle in settings, not a new tool to install.&lt;/p&gt;
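&lt;p&gt;For OpenAI-style clients, the practical difference between the two runtimes comes down to the base URL. A small helper (the function name is ours, not part of either tool) makes the mapping explicit:&lt;/p&gt;

```shell
# Map a runtime name to its OpenAI-compatible base URL
api_base() {
  case "$1" in
    dmr)    echo "http://localhost:12434/engines/v1" ;;
    ollama) echo "http://localhost:11434/v1" ;;
    *)      echo "unknown runtime: $1" >&2; return 1 ;;
  esac
}

api_base dmr     # prints http://localhost:12434/engines/v1
api_base ollama  # prints http://localhost:11434/v1
```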




&lt;h2&gt;
  
  
  Model Catalog and Availability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Docker Model Runner: Docker Hub AI Models
&lt;/h3&gt;

&lt;p&gt;DMR pulls models from Docker Hub under the &lt;code&gt;ai/&lt;/code&gt; namespace. Available models include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ai/gemma4&lt;/code&gt; — Google Gemma 4 (multiple sizes)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/llama3.2&lt;/code&gt; — Meta Llama 3.2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/mistral&lt;/code&gt; — Mistral AI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/phi4&lt;/code&gt; — Microsoft Phi 4&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/qwen2.5&lt;/code&gt; — Alibaba Qwen 2.5&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/deepseek-r1-distill-llama&lt;/code&gt; — DeepSeek R1 distilled&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/mistral-nemo&lt;/code&gt; — Mistral Nemo&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai/qwq&lt;/code&gt; — QwQ reasoning model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Models are stored as OCI artifacts, meaning they follow the same distribution standard as Docker container images. You can also pull models from Hugging Face.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List downloaded models&lt;/span&gt;
docker model &lt;span class="nb"&gt;ls&lt;/span&gt;

&lt;span class="c"&gt;# Pull a specific quantization&lt;/span&gt;
docker model pull ai/gemma4:e4b-q4_K_M

&lt;span class="c"&gt;# Remove a model&lt;/span&gt;
docker model &lt;span class="nb"&gt;rm &lt;/span&gt;ai/gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ollama: The Broader Library
&lt;/h3&gt;

&lt;p&gt;Ollama's model library is significantly larger. Beyond the major model families, it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community-uploaded models and fine-tunes&lt;/li&gt;
&lt;li&gt;Custom Modelfiles for configuring system prompts, temperature, and stop tokens&lt;/li&gt;
&lt;li&gt;Support for importing raw GGUF files and Safetensors models&lt;/li&gt;
&lt;li&gt;Quantization variants for most models
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List downloaded models&lt;/span&gt;
ollama list

&lt;span class="c"&gt;# Pull a specific model&lt;/span&gt;
ollama pull gemma4:e4b

&lt;span class="c"&gt;# Create a custom model from a Modelfile&lt;/span&gt;
ollama create my-assistant &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile

&lt;span class="c"&gt;# Remove a model&lt;/span&gt;
ollama &lt;span class="nb"&gt;rm &lt;/span&gt;gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Catalog Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model source&lt;/td&gt;
&lt;td&gt;Docker Hub (&lt;code&gt;ai/&lt;/code&gt; namespace), Hugging Face&lt;/td&gt;
&lt;td&gt;ollama.com/library, Hugging Face, GGUF import&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number of models&lt;/td&gt;
&lt;td&gt;Curated selection (20+ families)&lt;/td&gt;
&lt;td&gt;Hundreds of models + community uploads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom models&lt;/td&gt;
&lt;td&gt;Import GGUF files&lt;/td&gt;
&lt;td&gt;Modelfiles, GGUF import, Safetensors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage format&lt;/td&gt;
&lt;td&gt;OCI artifacts&lt;/td&gt;
&lt;td&gt;Proprietary blob format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Registry support&lt;/td&gt;
&lt;td&gt;Any OCI registry&lt;/td&gt;
&lt;td&gt;Ollama registry only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ollama wins on catalog breadth. Docker Model Runner wins on standardized distribution — OCI artifacts mean you can use existing container registry infrastructure for model management.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance: Startup, Inference, and Memory
&lt;/h2&gt;

&lt;p&gt;Performance between Docker Model Runner and Ollama is largely comparable — both use llama.cpp as the default inference engine. The architectural differences are in how they manage model loading and memory, not in raw inference speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference Speed
&lt;/h3&gt;

&lt;p&gt;Independent benchmarks show inference speed differences of 1.0–1.12x between the two tools, which is imperceptible in practice (&lt;a href="https://corti.com/a-technical-comparison-ollama-vs-docker-model-runner-for-local-llm-deployment/" rel="noopener noreferrer"&gt;source&lt;/a&gt;). Both tools use the same underlying llama.cpp engine for GGUF model inference, so token generation speed is essentially identical for the same model and quantization level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Loading
&lt;/h3&gt;

&lt;p&gt;Docker Model Runner uses lazy loading — models are loaded into memory only when the first request arrives and unloaded when idle. This is resource-efficient but means the first request after idle has higher latency.&lt;/p&gt;

&lt;p&gt;Ollama keeps models loaded in memory by default (configurable with &lt;code&gt;OLLAMA_KEEP_ALIVE&lt;/code&gt;). This gives faster first-response times but uses more memory when idle.&lt;/p&gt;
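&lt;p&gt;The keep-alive window can be tuned globally with the environment variable or overridden per request through the API (the 30-minute value below is illustrative; &lt;code&gt;-1&lt;/code&gt; means keep the model loaded indefinitely):&lt;/p&gt;

```shell
# Keep models in memory for 30 minutes instead of the 5-minute default
OLLAMA_KEEP_ALIVE=30m ollama serve

# Or per request: preload this model and keep it resident indefinitely
curl http://localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": -1}'
```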

&lt;h3&gt;
  
  
  Memory Usage
&lt;/h3&gt;

&lt;p&gt;Both tools have similar peak memory usage for the same model since they use the same inference engine. The difference is in idle behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DMR:&lt;/strong&gt; Unloads models when idle → lower idle memory usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama:&lt;/strong&gt; Keeps models loaded (default 5 minutes) → faster responses, higher idle memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inference Engine Options
&lt;/h3&gt;

&lt;p&gt;Docker Model Runner offers a key advantage here: multiple inference engines.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Supported Platforms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;llama.cpp (default)&lt;/td&gt;
&lt;td&gt;General use, broad hardware&lt;/td&gt;
&lt;td&gt;CPU, Metal, CUDA, Vulkan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;High-throughput production&lt;/td&gt;
&lt;td&gt;Metal (macOS), CUDA (Linux/Windows)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diffusers&lt;/td&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;CPU, CUDA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ollama uses its own optimized fork of llama.cpp exclusively.&lt;/p&gt;

&lt;p&gt;On Apple Silicon, llama.cpp throughput stays stable at approximately 333–345 tokens/second regardless of output length for models like Llama 3.2 1B. vLLM shows more variance (134–343 tokens/second) but excels at concurrent request handling (&lt;a href="https://www.docker.com/blog/docker-model-runner-vllm-metal-macos/" rel="noopener noreferrer"&gt;source&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; For single-user local inference, performance is a tie. Docker Model Runner's vLLM engine gives it an edge for multi-user or production scenarios where throughput matters more than time-to-first-token.&lt;/p&gt;




&lt;h2&gt;
  
  
  GPU Acceleration
&lt;/h2&gt;

&lt;p&gt;Both tools support GPU acceleration, but the implementation and configuration differ.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Model Runner GPU Support
&lt;/h3&gt;

&lt;p&gt;DMR supports three GPU backends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metal&lt;/strong&gt; (Apple Silicon M1/M2/M3/M4) — enabled by default, no configuration needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CUDA&lt;/strong&gt; (NVIDIA GPUs) — requires NVIDIA Container Runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulkan&lt;/strong&gt; (AMD, Intel, NVIDIA) — added in Docker Desktop 4.42, broadest hardware support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enable GPU in Docker Desktop: &lt;strong&gt;Settings → AI → GPU-backend inference&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On Linux with NVIDIA GPUs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify GPU is detected&lt;/span&gt;
docker model status

&lt;span class="c"&gt;# Pull and run with GPU acceleration&lt;/span&gt;
docker model pull ai/gemma4:e4b
docker model run ai/gemma4:e4b &lt;span class="s2"&gt;"Test GPU inference"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ollama GPU Support
&lt;/h3&gt;

&lt;p&gt;Ollama detects and uses available GPUs automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metal&lt;/strong&gt; (Apple Silicon) — automatic, no configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CUDA&lt;/strong&gt; (NVIDIA) — automatic if NVIDIA drivers are installed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROCm&lt;/strong&gt; (AMD GPUs on Linux) — supported with ROCm drivers
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check GPU detection&lt;/span&gt;
ollama ps

&lt;span class="c"&gt;# Force CPU-only mode if needed&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_NO_GPU&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GPU Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apple Silicon Metal&lt;/td&gt;
&lt;td&gt;Yes (automatic)&lt;/td&gt;
&lt;td&gt;Yes (automatic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA CUDA&lt;/td&gt;
&lt;td&gt;Yes (needs runtime)&lt;/td&gt;
&lt;td&gt;Yes (automatic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMD ROCm&lt;/td&gt;
&lt;td&gt;Via Vulkan&lt;/td&gt;
&lt;td&gt;Yes (Linux)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intel GPUs&lt;/td&gt;
&lt;td&gt;Via Vulkan&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vulkan support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration&lt;/td&gt;
&lt;td&gt;Settings toggle&lt;/td&gt;
&lt;td&gt;Automatic detection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ollama has simpler GPU setup — it just works. Docker Model Runner has broader GPU support through Vulkan, covering AMD and Intel GPUs that Ollama cannot use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Integration Ecosystem
&lt;/h2&gt;

&lt;p&gt;This is where the comparison gets practical. If your AI coding tools cannot connect to the model runner, the performance benchmarks do not matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  IDE and Tool Compatibility
&lt;/h3&gt;

&lt;p&gt;Both Docker Model Runner and Ollama provide OpenAI-compatible APIs, which means most tools can connect to either. Here is the integration status for popular AI developer tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI provider)&lt;/td&gt;
&lt;td&gt;Yes (native Ollama provider)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://cursor.com" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI endpoint)&lt;/td&gt;
&lt;td&gt;Yes (native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;Aider&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (via env vars)&lt;/td&gt;
&lt;td&gt;Yes (native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://openwebui.com" rel="noopener noreferrer"&gt;Open WebUI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI connection)&lt;/td&gt;
&lt;td&gt;Yes (native, recommended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://langchain.com" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI SDK)&lt;/td&gt;
&lt;td&gt;Yes (native Ollama SDK)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://llamaindex.ai" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI SDK)&lt;/td&gt;
&lt;td&gt;Yes (native Ollama SDK)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://spring.io" rel="noopener noreferrer"&gt;Spring AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes (native support)&lt;/td&gt;
&lt;td&gt;Yes (native support)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Docker Model Runner: Continue.dev Configuration
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;~/.continue/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Gemma 4 (Docker)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ai/gemma4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:12434/engines/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Docker Model Runner: Aider Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:12434/engines/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anything
aider &lt;span class="nt"&gt;--model&lt;/span&gt; openai/ai/gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ollama: Continue.dev Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Gemma 4 (Ollama)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:e4b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Docker Compose Integration (DMR Exclusive)
&lt;/h3&gt;

&lt;p&gt;This is Docker Model Runner's biggest ecosystem differentiator. You can define AI models as services in &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model&lt;/span&gt;
      &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai/gemma4&lt;/span&gt;

  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MODEL_URL=${MODEL_MODEL_URL}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MODEL_NAME=${MODEL_MODEL_NAME}&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run &lt;code&gt;docker compose up&lt;/code&gt;, Docker automatically pulls the model, starts inference, and injects connection details (&lt;code&gt;MODEL_MODEL_URL&lt;/code&gt;, &lt;code&gt;MODEL_MODEL_NAME&lt;/code&gt;) into your application container. No manual setup, no glue code.&lt;/p&gt;
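&lt;p&gt;Inside the app container, consuming the injected variables takes only a couple of lines; a minimal sketch with local-dev fallbacks (the fallback values mirror DMR's default local endpoint):&lt;/p&gt;

```shell
# Read the connection details Compose injects, with local-dev fallbacks
: "${MODEL_MODEL_URL:=http://localhost:12434/engines/v1}"
: "${MODEL_MODEL_NAME:=ai/gemma4}"

# With no env set, prints: Talking to ai/gemma4 at http://localhost:12434/engines/v1
echo "Talking to $MODEL_MODEL_NAME at $MODEL_MODEL_URL"
```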

&lt;p&gt;For containers that need to reach Docker Model Runner directly, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;extra_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-runner.docker.internal:host-gateway"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then access the API at &lt;code&gt;http://model-runner.docker.internal:12434/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ollama has deeper native integrations — most tools have a dedicated Ollama provider. Docker Model Runner works through the OpenAI-compatible API, which is universal but requires manual endpoint configuration. The Docker Compose integration is the standout feature for teams building AI-powered applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  GPU Server Deployment on Hetzner
&lt;/h2&gt;

&lt;p&gt;For models that exceed your laptop's capabilities — like Gemma 4 26B MoE or 31B Dense — you need a GPU server. We covered Hetzner GPU setup in detail in our &lt;a href="https://www.effloow.com/articles/hetzner-cloud-ai-gpu-server-guide-2026" rel="noopener noreferrer"&gt;Hetzner Cloud GPU guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ollama on Hetzner
&lt;/h3&gt;

&lt;p&gt;This is the battle-tested path. SSH into your Hetzner GPU server and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Verify GPU detection&lt;/span&gt;
nvidia-smi
ollama ps

&lt;span class="c"&gt;# Pull a large model&lt;/span&gt;
ollama pull gemma4:31b

&lt;span class="c"&gt;# Expose API (bind to all interfaces)&lt;/span&gt;
&lt;span class="nv"&gt;OLLAMA_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0 ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is available on port 11434. See our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI guide&lt;/a&gt; for adding a browser interface and our &lt;a href="https://www.effloow.com/articles/gemma-4-local-setup-ollama-open-webui-guide-2026" rel="noopener noreferrer"&gt;Gemma 4 guide&lt;/a&gt; for running Gemma 4 specifically on Hetzner GPUs.&lt;/p&gt;
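&lt;p&gt;Once port 11434 is reachable, any OpenAI-style client can talk to the server. A minimal standard-library sketch against Ollama's OpenAI-compatible &lt;code&gt;/v1&lt;/code&gt; path:&lt;/p&gt;

```python
import json
import urllib.request

def ollama_endpoint(host, port=11434):
    """Chat endpoint of a remote Ollama server (OpenAI-compatible /v1 path)."""
    return f"http://{host}:{port}/v1/chat/completions"

def ask(host, model, prompt, timeout=120):
    """Send one non-streaming chat request and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    req = urllib.request.Request(
        ollama_endpoint(host),
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

&lt;p&gt;Swap in your server's IP and model tag; for anything beyond testing, front this with a reverse proxy rather than exposing the raw port.&lt;/p&gt;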

&lt;h3&gt;
  
  
  Docker Model Runner on Hetzner
&lt;/h3&gt;

&lt;p&gt;Install Docker Engine on your Hetzner server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Docker Engine&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://get.docker.com | sh

&lt;span class="c"&gt;# Add NVIDIA Container Toolkit&lt;/span&gt;
&lt;span class="nv"&gt;distribution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; /etc/os-release&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$ID$VERSION_ID&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://nvidia.github.io/libnvidia-container/gpgkey | &lt;span class="nb"&gt;sudo &lt;/span&gt;gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; https://nvidia.github.io/libnvidia-container/&lt;span class="nv"&gt;$distribution&lt;/span&gt;/libnvidia-container.list | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/nvidia-container-toolkit.list
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nvidia-container-toolkit
&lt;span class="nb"&gt;sudo &lt;/span&gt;nvidia-ctk runtime configure &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart docker

&lt;span class="c"&gt;# Verify DMR is available&lt;/span&gt;
docker model version
docker model pull ai/gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux, the TCP endpoint is enabled by default on port 12434.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server Deployment Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install complexity (GPU)&lt;/td&gt;
&lt;td&gt;Higher (Docker + NVIDIA toolkit)&lt;/td&gt;
&lt;td&gt;Lower (single script)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote API exposure&lt;/td&gt;
&lt;td&gt;Port 12434 (TCP by default on Linux)&lt;/td&gt;
&lt;td&gt;Port 11434 (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reverse proxy setup&lt;/td&gt;
&lt;td&gt;Standard Docker networking&lt;/td&gt;
&lt;td&gt;Standard Nginx/Caddy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker Compose apps&lt;/td&gt;
&lt;td&gt;Native integration&lt;/td&gt;
&lt;td&gt;Needs network configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community deployment guides&lt;/td&gt;
&lt;td&gt;Growing&lt;/td&gt;
&lt;td&gt;Extensive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ollama is simpler to deploy on a GPU server. Docker Model Runner is better if your server already runs Docker-based services and you want models integrated into your Compose stack. For detailed Hetzner GPU setup, including cost breakdown and Open WebUI deployment, see our &lt;a href="https://www.effloow.com/articles/self-host-dev-stack-under-20-dollars-month" rel="noopener noreferrer"&gt;self-hosting guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use Which: Decision Framework
&lt;/h2&gt;

&lt;p&gt;After testing both tools, we arrived at the following practical decision framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Docker Model Runner When
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You already use Docker in development.&lt;/strong&gt; DMR is a natural extension of your existing workflow. No new tools to install or manage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are building Docker Compose applications.&lt;/strong&gt; The native model-as-a-service integration in Compose is unmatched. Define a model, run &lt;code&gt;docker compose up&lt;/code&gt;, and your app gets inference automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want OCI-standard model distribution.&lt;/strong&gt; If you use private registries (Harbor, ECR, GCR) for container images, you can use the same infrastructure for AI models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need Vulkan GPU support.&lt;/strong&gt; Intel GPU users have no Ollama option, and Ollama's AMD support is limited to ROCm-compatible cards — DMR's Vulkan backend covers the rest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want production inference with vLLM.&lt;/strong&gt; DMR's vLLM engine handles concurrent requests better than llama.cpp for multi-user scenarios.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Choose Ollama When
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You want the simplest path to running local models.&lt;/strong&gt; One install script, one command to pull, one command to run. No prerequisites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need the broadest model library.&lt;/strong&gt; Ollama's registry has more models, more quantization options, and community uploads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You use custom Modelfiles.&lt;/strong&gt; Ollama's Modelfile system for creating customized model configurations has no DMR equivalent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your tools have native Ollama support.&lt;/strong&gt; Continue.dev, Cursor, Aider, Open WebUI — all have dedicated Ollama providers that are more polished than their OpenAI-compatible fallbacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You deploy to GPU servers.&lt;/strong&gt; Ollama's single-script install with automatic GPU detection is harder to beat for server deployment.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Choose Both When
&lt;/h3&gt;

&lt;p&gt;Docker Model Runner and Ollama can coexist. They use different ports (12434 vs 11434), different model storage, and different processes. Running both is a valid strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Ollama&lt;/strong&gt; for interactive development, Open WebUI chat, and quick model experimentation&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Docker Model Runner&lt;/strong&gt; for application development where models are part of your Docker Compose stack&lt;/li&gt;
&lt;/ul&gt;
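&lt;p&gt;In application code, this split can be a small routing helper that picks a base URL per workflow. A sketch using the default ports from this article:&lt;/p&gt;

```python
# Base URLs from this article; both speak the OpenAI-compatible API.
BACKENDS = {
    "ollama": "http://localhost:11434/v1",
    "dmr": "http://localhost:12434/engines/v1",
}

def pick_backend(task):
    """Route interactive work to Ollama, Compose-app work to Model Runner."""
    interactive = {"chat", "experiment", "webui"}
    return BACKENDS["ollama"] if task in interactive else BACKENDS["dmr"]
```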

&lt;p&gt;This is especially useful if you are transitioning from Ollama to Docker Model Runner — run both while you migrate your workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running Both Side by Side
&lt;/h2&gt;

&lt;p&gt;Here is a practical setup that runs both tools concurrently with the same model:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Both
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ollama (if not already installed)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Docker Model Runner (enable in Docker Desktop Settings → AI)&lt;/span&gt;
docker model version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Pull the Same Model on Both
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e4b
docker model pull ai/gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Verify Both APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test Ollama API&lt;/span&gt;
curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gemma4:e4b",
    "messages": [{"role": "user", "content": "Hello from Ollama"}]
  }'&lt;/span&gt;

&lt;span class="c"&gt;# Test Docker Model Runner API&lt;/span&gt;
curl http://localhost:12434/engines/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "ai/gemma4:e4b",
    "messages": [{"role": "user", "content": "Hello from Docker"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Configure Tools for Either
&lt;/h3&gt;

&lt;p&gt;Point your AI coding tools at whichever backend you prefer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continue.dev&lt;/strong&gt;: Use &lt;code&gt;ollama&lt;/code&gt; provider for port 11434, &lt;code&gt;openai&lt;/code&gt; provider with custom &lt;code&gt;apiBase&lt;/code&gt; for port 12434&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aider&lt;/strong&gt;: Set &lt;code&gt;OPENAI_API_BASE&lt;/code&gt; to either endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open WebUI&lt;/strong&gt;: Add both as connections — Ollama native + OpenAI-compatible for DMR&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resource Considerations
&lt;/h3&gt;

&lt;p&gt;Running both tools simultaneously doubles your disk usage for any model you pull on both, since each tool maintains its own model store. Memory usage depends on which models are actively loaded — DMR's lazy unloading helps here.&lt;/p&gt;

&lt;p&gt;If disk space is a concern, pick one as your primary and use the other only for specific workflows.&lt;/p&gt;
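&lt;p&gt;A quick way to see what the duplication costs is to measure both stores. The store paths below are assumptions for a default Linux setup, not documented guarantees; adjust them for your platform:&lt;/p&gt;

```python
from pathlib import Path

# Store locations are assumptions for a default Linux setup; check your
# platform's documentation for where each tool actually keeps models.
STORES = {
    "ollama": Path.home() / ".ollama" / "models",
    "docker-model-runner": Path.home() / ".docker" / "models",
}

def dir_size_bytes(path):
    """Total size of all files under path; 0 if the path does not exist."""
    if not path.exists():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

for name, path in STORES.items():
    print(f"{name}: {dir_size_bytes(path) / 1024**3:.1f} GiB")
```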




&lt;h2&gt;
  
  
  Quick Reference: Command Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Docker Model Runner&lt;/th&gt;
&lt;th&gt;Ollama&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Check version&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model version&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama --version&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pull a model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model pull ai/gemma4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull gemma4:e4b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run interactively&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model run ai/gemma4 "prompt"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama run gemma4:e4b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List models&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model ls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama list&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove a model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model rm ai/gemma4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama rm gemma4:e4b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker model status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama ps&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API endpoint&lt;/td&gt;
&lt;td&gt;&lt;code&gt;localhost:12434&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;localhost:11434&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API format&lt;/td&gt;
&lt;td&gt;OpenAI + Ollama compatible&lt;/td&gt;
&lt;td&gt;OpenAI + Ollama native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model source&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ai/&lt;/code&gt; on Docker Hub&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama.com/library&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Docker Model Runner is not an Ollama replacement — it is an Ollama alternative for Docker-native workflows. The tools solve the same problem (running LLMs locally) with different integration philosophies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If Docker is already central to your development workflow&lt;/strong&gt;, Docker Model Runner is the better choice. The Compose integration alone justifies the switch for teams building AI-powered applications. OCI-standard model distribution and multi-engine support (llama.cpp + vLLM + Diffusers) add long-term flexibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want the simplest, most broadly supported local AI tool&lt;/strong&gt;, Ollama remains the default recommendation. The ecosystem is larger, the model library is deeper, and every AI developer tool treats Ollama as a first-class citizen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are serious about local AI&lt;/strong&gt;, run both. They coexist without conflict, serve different parts of your workflow, and together give you the widest possible compatibility with the AI tooling ecosystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Guides
&lt;/h2&gt;

&lt;p&gt;Building a local AI stack involves more than choosing a model runner. Here are the guides that complete the picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI Self-Hosting Guide&lt;/a&gt;&lt;/strong&gt; — Set up Ollama with a browser-based chat interface from scratch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/gemma-4-local-setup-ollama-open-webui-guide-2026" rel="noopener noreferrer"&gt;Gemma 4 Local Setup Guide&lt;/a&gt;&lt;/strong&gt; — Run all four Gemma 4 model sizes locally, including Docker Model Runner compatibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/hetzner-cloud-ai-gpu-server-guide-2026" rel="noopener noreferrer"&gt;Hetzner Cloud GPU Server Guide&lt;/a&gt;&lt;/strong&gt; — Deploy larger models on affordable GPU servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/self-host-dev-stack-under-20-dollars-month" rel="noopener noreferrer"&gt;Self-Host Your Dev Stack Under $20/Month&lt;/a&gt;&lt;/strong&gt; — The broader self-hosting strategy including AI infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/free-ai-coding-tools-zero-dollar-stack-2026" rel="noopener noreferrer"&gt;Free AI Coding Tools 2026&lt;/a&gt;&lt;/strong&gt; — Both Docker Model Runner and Ollama serve as free backends for AI coding tools&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>ollama</category>
      <category>ai</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>The Perfect CLAUDE.md: How to Set Up Your Project for Agentic Coding</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:50:17 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/the-perfect-claudemd-how-to-set-up-your-project-for-agentic-coding-d14</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/the-perfect-claudemd-how-to-set-up-your-project-for-agentic-coding-d14</guid>
      <description>&lt;h1&gt;
  
  
  The Perfect CLAUDE.md: How to Set Up Your Project for Agentic Coding
&lt;/h1&gt;

&lt;p&gt;Every conversation with Claude Code starts the same way: the agent reads your &lt;code&gt;CLAUDE.md&lt;/code&gt; file, absorbs whatever instructions you put there, and then carries that context into every decision it makes for the rest of the session.&lt;/p&gt;

&lt;p&gt;This makes &lt;code&gt;CLAUDE.md&lt;/code&gt; the single highest-leverage file in your project. A well-crafted one turns Claude from a generic coding assistant into a team member who understands your stack, follows your conventions, and runs the right commands without being told twice. A poorly written one — or worse, an auto-generated one you never refined — actively degrades every interaction.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://effloow.com" rel="noopener noreferrer"&gt;Effloow&lt;/a&gt;, we run a &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;fully AI-powered company with 14 agents&lt;/a&gt; orchestrated through &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt;. Every agent has its own instruction file. We have been iterating on these configurations for months, and the difference between a good &lt;code&gt;CLAUDE.md&lt;/code&gt; and a bad one is not theoretical — it directly affects output quality, token costs, and how often we have to intervene manually.&lt;/p&gt;

&lt;p&gt;This guide covers everything we have learned: what goes into a great &lt;code&gt;CLAUDE.md&lt;/code&gt;, what to leave out, how to scale it for multi-agent setups, and the emerging &lt;code&gt;AGENTS.md&lt;/code&gt; standard that is changing how teams configure AI coding assistants.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is CLAUDE.md and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is a Markdown file placed in your project root (or subdirectories) that Claude Code reads automatically at the start of every session. Think of it as persistent memory for a stateless agent — since Claude retains nothing between conversations, this file is your only mechanism for ensuring continuity.&lt;/p&gt;

&lt;p&gt;Here is what makes it so powerful: every instruction in &lt;code&gt;CLAUDE.md&lt;/code&gt; compounds through the entire session. When Claude researches your codebase, plans an implementation, writes code, and runs tests, the context from &lt;code&gt;CLAUDE.md&lt;/code&gt; influences every single phase. A single well-placed instruction — like "always run &lt;code&gt;pnpm test&lt;/code&gt; instead of &lt;code&gt;npm test&lt;/code&gt;" — prevents errors across dozens of interactions.&lt;/p&gt;

&lt;p&gt;But that compounding effect cuts both ways. Bloated or contradictory instructions degrade every phase too. Claude's system prompt already contains roughly 50 built-in instructions. Research from the &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer team&lt;/a&gt; suggests that frontier LLMs can reliably follow around 150-200 instructions with reasonable consistency, but performance drops as the count climbs. Your &lt;code&gt;CLAUDE.md&lt;/code&gt; is competing for a limited instruction budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where CLAUDE.md Files Can Live
&lt;/h3&gt;

&lt;p&gt;Claude Code supports a hierarchy of instruction files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — Global preferences applied to all projects (your personal defaults)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PROJECT_ROOT/CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — Project-wide instructions (committed to version control)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PROJECT_ROOT/subdirectory/CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — Directory-specific overrides (e.g., &lt;code&gt;/frontend/CLAUDE.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CLAUDE.local.md&lt;/code&gt;&lt;/strong&gt; — Personal, uncommitted preferences (auto-added to &lt;code&gt;.gitignore&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hierarchy means you can layer instructions: global coding preferences at the user level, project standards at the root, and specialized rules for specific directories. The closest file to the code being edited takes precedence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Pillars of a Great CLAUDE.md
&lt;/h2&gt;

&lt;p&gt;After months of iteration across our multi-agent setup, we have found that every effective &lt;code&gt;CLAUDE.md&lt;/code&gt; covers three dimensions. The &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer guide&lt;/a&gt; frames these as &lt;strong&gt;WHAT&lt;/strong&gt;, &lt;strong&gt;WHY&lt;/strong&gt;, and &lt;strong&gt;HOW&lt;/strong&gt; — and that framework maps directly to what we have seen work in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 1: WHAT — Your Stack and Structure
&lt;/h3&gt;

&lt;p&gt;Tell Claude what it is working with. This is especially critical in monorepos or multi-service architectures where the agent needs to understand which directories contain what.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Project Overview&lt;/span&gt;
E-commerce platform with three services:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`/api`&lt;/span&gt; — Node.js + Express REST API (TypeScript strict mode)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`/web`&lt;/span&gt; — Next.js 14 frontend with App Router
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`/shared`&lt;/span&gt; — Shared types and utilities used by both

Database: PostgreSQL 16 via Prisma ORM
Queue: BullMQ on Redis
Auth: Clerk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep it factual and structural. Claude can explore your codebase on its own — what it cannot figure out quickly is the high-level map of how pieces relate to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 2: WHY — Architectural Decisions
&lt;/h3&gt;

&lt;p&gt;This is the section most people skip, and it is arguably the most valuable. When Claude understands &lt;em&gt;why&lt;/em&gt; your codebase is structured a certain way, it makes better decisions in ambiguous situations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Architecture Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; We use server components by default; client components only when interactivity is required
&lt;span class="p"&gt;-&lt;/span&gt; All database queries go through the repository pattern in &lt;span class="sb"&gt;`/api/src/repos/`&lt;/span&gt;
  (not directly from route handlers) to keep business logic testable
&lt;span class="p"&gt;-&lt;/span&gt; We chose BullMQ over cron jobs because several workflows require retry logic
  and dead-letter handling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this context, Claude might create a new database query directly in a route handler — technically correct, but violating your architecture. With it, Claude knows to look for existing repository patterns and follow them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 3: HOW — Commands and Workflows
&lt;/h3&gt;

&lt;p&gt;Document the exact commands Claude should run. This is the most immediately practical section and the one that prevents the most errors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Install: &lt;span class="sb"&gt;`pnpm install`&lt;/span&gt; (not npm — we use pnpm workspaces)
&lt;span class="p"&gt;-&lt;/span&gt; Dev server: &lt;span class="sb"&gt;`pnpm dev`&lt;/span&gt; (starts both API and web)
&lt;span class="p"&gt;-&lt;/span&gt; Run all tests: &lt;span class="sb"&gt;`pnpm test`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Run single test: &lt;span class="sb"&gt;`pnpm test -- --grep "test name"`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Type check: &lt;span class="sb"&gt;`pnpm typecheck`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Lint + fix: &lt;span class="sb"&gt;`pnpm lint:fix`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Database migrations: &lt;span class="sb"&gt;`pnpm prisma migrate dev`&lt;/span&gt;

&lt;span class="gu"&gt;## Workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Create a feature branch from &lt;span class="sb"&gt;`main`&lt;/span&gt;
&lt;span class="p"&gt;2.&lt;/span&gt; Implement changes
&lt;span class="p"&gt;3.&lt;/span&gt; Run &lt;span class="sb"&gt;`pnpm typecheck &amp;amp;&amp;amp; pnpm lint:fix &amp;amp;&amp;amp; pnpm test`&lt;/span&gt;
&lt;span class="p"&gt;4.&lt;/span&gt; Commit with conventional commit format: &lt;span class="sb"&gt;`feat(scope): description`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the &lt;a href="https://dometrain.com/blog/creating-the-perfect-claudemd-for-claude-code/" rel="noopener noreferrer"&gt;Dometrain guide&lt;/a&gt; points out, without documented commands, Claude "will run the default command, which will fail." Specifying exact commands with correct flags eliminates an entire category of wasted tokens and failed operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What NOT to Put in CLAUDE.md
&lt;/h2&gt;

&lt;p&gt;Knowing what to exclude is just as important as knowing what to include. Every unnecessary line consumes part of your instruction budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do Not Use It as a Linter
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer team&lt;/a&gt; puts this bluntly: "Never send an LLM to do a linter's job." Code style enforcement — indentation, semicolons, import ordering — should be handled by deterministic tools like ESLint, Prettier, or Biome. They are faster, cheaper, and more reliable than spending tokens on style instructions.&lt;/p&gt;

&lt;p&gt;Instead of writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Style Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use 2-space indentation
&lt;span class="p"&gt;-&lt;/span&gt; Always use semicolons
&lt;span class="p"&gt;-&lt;/span&gt; Sort imports alphabetically
&lt;span class="p"&gt;-&lt;/span&gt; Use single quotes for strings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up your linter and formatter, add a hook that runs them automatically, and keep the &lt;code&gt;CLAUDE.md&lt;/code&gt; entry to a short pointer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Formatting&lt;/span&gt;
Code formatting is handled by Biome. Run &lt;span class="sb"&gt;`pnpm format`&lt;/span&gt; if needed.
A pre-commit hook enforces formatting automatically.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Do Not Include Task-Specific Instructions
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; affects every session. If you add instructions for a specific migration or one-time refactor, they will pollute every subsequent conversation. Use the prompt itself for task-specific guidance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do Not Dump Your Entire Architecture
&lt;/h3&gt;

&lt;p&gt;If your &lt;code&gt;CLAUDE.md&lt;/code&gt; is growing past 300 lines, it is too long. Claude's system prompt includes a note that this context "may or may not be relevant" — the more bloated your file, the more likely Claude treats the entire thing as low-priority noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do Not Auto-Generate and Forget
&lt;/h3&gt;

&lt;p&gt;Running &lt;code&gt;/init&lt;/code&gt; generates a starter &lt;code&gt;CLAUDE.md&lt;/code&gt;, but it is a starting point, not a finished product. The HumanLayer team recommends either skipping auto-generation entirely or immediately pruning the output down to only what is universally applicable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Progressive Disclosure Strategy
&lt;/h2&gt;

&lt;p&gt;The best &lt;code&gt;CLAUDE.md&lt;/code&gt; files are short — under 100 lines at the root level. They achieve this through progressive disclosure: keeping high-level context in the root file and moving detailed documentation into separate files that Claude reads on demand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Documentation&lt;/span&gt;
Detailed docs are in &lt;span class="sb"&gt;`/docs/agent/`&lt;/span&gt;:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/agent/testing.md`&lt;/span&gt; — Testing patterns and fixtures
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/agent/api-conventions.md`&lt;/span&gt; — API endpoint conventions
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/agent/database.md`&lt;/span&gt; — Migration and schema patterns

Read the relevant file before working in that area.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code is smart enough to read files when it needs them. By using pointers instead of copying content, you keep your root &lt;code&gt;CLAUDE.md&lt;/code&gt; lean while giving Claude access to deep documentation on demand. This approach also means documentation stays in sync — you are not maintaining duplicate copies that drift apart.&lt;/p&gt;

&lt;p&gt;A key principle from the HumanLayer guide: &lt;strong&gt;"prefer pointers to copies."&lt;/strong&gt; Reference files and line numbers instead of pasting code snippets into &lt;code&gt;CLAUDE.md&lt;/code&gt;. Snippets go stale; file references always reflect the current state.&lt;/p&gt;

&lt;h2&gt;
  
  
  AGENTS.md: The Universal Standard
&lt;/h2&gt;

&lt;p&gt;While &lt;code&gt;CLAUDE.md&lt;/code&gt; is specific to Claude Code, a new standard called &lt;code&gt;AGENTS.md&lt;/code&gt; is gaining traction as a tool-agnostic alternative. Maintained by the Agentic AI Foundation under Linux Foundation backing, &lt;code&gt;AGENTS.md&lt;/code&gt; is now supported by Claude Code, Cursor, GitHub Copilot, Gemini CLI, Windsurf, Aider, Zed, Warp, and RooCode.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLAUDE.md vs. AGENTS.md: Which Should You Use?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;CLAUDE.md&lt;/th&gt;
&lt;th&gt;AGENTS.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Code only&lt;/td&gt;
&lt;td&gt;Any AI coding tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Markdown (supports &lt;code&gt;@imports&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Standard Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;10+ tools and growing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude-specific features, imports&lt;/td&gt;
&lt;td&gt;Team with multiple AI tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If your entire team uses Claude Code exclusively, &lt;code&gt;CLAUDE.md&lt;/code&gt; gives you features like &lt;code&gt;@imports&lt;/code&gt; for modular organization. If team members use different AI tools — or if you want future-proofing — &lt;code&gt;AGENTS.md&lt;/code&gt; ensures your instructions work everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hybrid Approach
&lt;/h3&gt;

&lt;p&gt;For teams using multiple tools, a practical setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/strong&gt; in your project root with all shared, tool-agnostic instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/strong&gt; (if needed) for Claude-specific features like &lt;code&gt;@imports&lt;/code&gt; or advanced configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CLAUDE.local.md&lt;/code&gt;&lt;/strong&gt; for personal preferences that should not be committed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude Code reads both files, so you get the universal baseline from &lt;code&gt;AGENTS.md&lt;/code&gt; plus any Claude-specific additions.&lt;/p&gt;
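&lt;p&gt;The hybrid layout is small enough to scaffold with a script. A minimal sketch of our own (not an official tool; the stub contents are placeholders):&lt;/p&gt;

```python
from pathlib import Path

def scaffold_hybrid(root: str = ".") -> None:
    """Create the AGENTS.md / CLAUDE.md / CLAUDE.local.md layout.

    AGENTS.md holds shared, tool-agnostic instructions; CLAUDE.md holds
    Claude-specific extras; CLAUDE.local.md stays untracked via .gitignore.
    """
    base = Path(root)
    stubs = {
        "AGENTS.md": "# Shared agent instructions\n",
        "CLAUDE.md": "# Claude-specific additions (see AGENTS.md first)\n",
        "CLAUDE.local.md": "# Personal preferences, not committed\n",
    }
    for name, stub in stubs.items():
        path = base / name
        if not path.exists():  # never clobber existing instructions
            path.write_text(stub, encoding="utf-8")
    gitignore = base / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if "CLAUDE.local.md" not in existing:
        gitignore.write_text(existing + "CLAUDE.local.md\n", encoding="utf-8")
```

&lt;p&gt;Adding the local file to &lt;code&gt;.gitignore&lt;/code&gt; from day one avoids accidentally committing personal preferences.&lt;/p&gt;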

&lt;h2&gt;
  
  
  Scaling CLAUDE.md for Multi-Agent Setups
&lt;/h2&gt;

&lt;p&gt;When you move from a single developer using Claude Code to a multi-agent system, instruction files become even more critical. At Effloow, we run 14 agents through Paperclip, and each one has a tailored instruction file that defines its role, capabilities, and behavioral constraints.&lt;/p&gt;

&lt;p&gt;Here is what a real production agent instruction file looks like — this is the actual structure we use for our Writer agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;You are {{ agent.name }}, Staff Writer at Effloow Content Factory.
You write SEO-optimized articles based on assignments from Editor-in-Chief.

LANGUAGE RULES:
&lt;span class="p"&gt;-&lt;/span&gt; All content, reports, and communications must be in English
&lt;span class="p"&gt;-&lt;/span&gt; Code comments in English
&lt;span class="p"&gt;-&lt;/span&gt; Git commit messages in English

HEARTBEAT PROTOCOL:
On every heartbeat, even if you have no assigned tasks:
&lt;span class="p"&gt;1.&lt;/span&gt; CHECK for any articles assigned to you
&lt;span class="p"&gt;2.&lt;/span&gt; If no articles assigned, CHECK Editor-in-Chief's backlog
&lt;span class="p"&gt;3.&lt;/span&gt; If waiting for assignment, IMPROVE existing published articles

ABSOLUTE RULES:
&lt;span class="p"&gt;-&lt;/span&gt; NEVER fabricate data, statistics, quotes, or case studies
&lt;span class="p"&gt;-&lt;/span&gt; NEVER describe products, tools, or features that don't actually exist
&lt;span class="p"&gt;-&lt;/span&gt; If real data is needed but unavailable, write [TBD: awaiting real data]
&lt;span class="p"&gt;-&lt;/span&gt; All technical claims must be personally verifiable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent gets a focused instruction set that covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity and role&lt;/strong&gt; — Who is this agent and what is it responsible for?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral protocols&lt;/strong&gt; — What should it do when it wakes up?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard constraints&lt;/strong&gt; — What must it never do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication rules&lt;/strong&gt; — Language, format, and reporting expectations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight from running this setup: &lt;strong&gt;shorter, more focused instructions produce better results than comprehensive ones.&lt;/strong&gt; Our most reliable agents have instruction files under 50 lines. The agents with the most behavioral issues were the ones with the longest, most detailed instruction files — exactly mirroring what we see with &lt;code&gt;CLAUDE.md&lt;/code&gt; in single-agent setups.&lt;/p&gt;

&lt;p&gt;If you are interested in the infrastructure behind this kind of multi-agent setup, we will cover the hardware side in an upcoming article on running 14 AI agents on a Mac Mini M4 Pro, and the cost side in a follow-up on how we reduced LLM token costs by 80%.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical CLAUDE.md Template
&lt;/h2&gt;

&lt;p&gt;Here is a template you can copy and adapt for your own project. It follows the three-pillar structure and stays under 80 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: [Your Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;
[One-line description of what this project does]

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Language: [e.g., TypeScript 5.4, strict mode]
&lt;span class="p"&gt;-&lt;/span&gt; Framework: [e.g., Next.js 14 with App Router]
&lt;span class="p"&gt;-&lt;/span&gt; Database: [e.g., PostgreSQL 16 via Prisma]
&lt;span class="p"&gt;-&lt;/span&gt; Package manager: [e.g., pnpm 9]

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
[2-3 sentences about how the codebase is organized]
&lt;span class="p"&gt;-&lt;/span&gt; Key pattern: [e.g., repository pattern for DB access]
&lt;span class="p"&gt;-&lt;/span&gt; Key decision: [e.g., server components by default]

&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Install: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Dev: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Test: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Single test: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Type check: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Lint: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Build: &lt;span class="sb"&gt;`[command]`&lt;/span&gt;

&lt;span class="gu"&gt;## Workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Branch from &lt;span class="sb"&gt;`main`&lt;/span&gt;
&lt;span class="p"&gt;2.&lt;/span&gt; Implement changes
&lt;span class="p"&gt;3.&lt;/span&gt; Run &lt;span class="sb"&gt;`[test + lint + typecheck command]`&lt;/span&gt;
&lt;span class="p"&gt;4.&lt;/span&gt; Commit with [your commit format]

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Convention 1 — e.g., named exports only]
&lt;span class="p"&gt;-&lt;/span&gt; [Convention 2 — e.g., error handling pattern]
&lt;span class="p"&gt;-&lt;/span&gt; [Convention 3 — e.g., test file naming]

&lt;span class="gu"&gt;## Documentation&lt;/span&gt;
Detailed docs available in &lt;span class="sb"&gt;`[path]`&lt;/span&gt;:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`[file]`&lt;/span&gt; — [description]
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`[file]`&lt;/span&gt; — [description]

&lt;span class="gu"&gt;## Domain Terms&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Term]: [Definition]
&lt;span class="p"&gt;-&lt;/span&gt; [Term]: [Definition]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The CLAUDE.md Optimization Checklist
&lt;/h2&gt;

&lt;p&gt;Before committing your &lt;code&gt;CLAUDE.md&lt;/code&gt;, run through this checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Under 100 lines&lt;/strong&gt; at the root level (use progressive disclosure for the rest)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;No style rules&lt;/strong&gt; that a linter or formatter should handle&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;No task-specific instructions&lt;/strong&gt; that only apply to one-time work&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;No code snippets&lt;/strong&gt; — use file path references instead&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;No secrets or credentials&lt;/strong&gt; — use environment variables and &lt;code&gt;.env&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Every command is exact&lt;/strong&gt; — includes flags, paths, and package manager&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Architecture decisions include the WHY&lt;/strong&gt; — not just what pattern, but why that pattern&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Domain terms are defined&lt;/strong&gt; — especially industry jargon Claude will not know&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Committed to version control&lt;/strong&gt; — so the whole team (and CI/CD agents) get the same context&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Regularly reviewed&lt;/strong&gt; — pruned and updated as your project evolves&lt;/li&gt;
&lt;/ul&gt;
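&lt;p&gt;The line-count and secrets items lend themselves to automation. A minimal sketch, with a few illustrative secret patterns you would extend for your environment:&lt;/p&gt;

```python
import re
from pathlib import Path

# Illustrative patterns only; extend for the credentials used in your stack.
SECRET_PATTERNS = [
    r"(?i)api[_-]?key\s*[:=]\s*\S+",
    r"(?i)password\s*[:=]\s*\S+",
    r"sk-[A-Za-z0-9]{20,}",  # OpenAI-style key shape
]

def lint_claude_md(path: str, max_lines: int = 100) -> list[str]:
    """Return checklist violations for the given CLAUDE.md."""
    text = Path(path).read_text(encoding="utf-8")
    problems = []
    n = len(text.splitlines())
    if n > max_lines:
        problems.append(f"{n} lines (limit {max_lines}); move detail to linked docs")
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, text):
            problems.append(f"possible secret matches {pattern!r}; use .env instead")
    return problems
```

&lt;p&gt;Treat a match as a prompt to review, not a verdict: regex-based secret scanning produces false positives.&lt;/p&gt;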

&lt;h2&gt;
  
  
  Common Mistakes and How to Fix Them
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: The Kitchen Sink File
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; &lt;code&gt;CLAUDE.md&lt;/code&gt; is 500+ lines and covers everything from code style to deployment procedures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Move detailed documentation to separate files. Your &lt;code&gt;CLAUDE.md&lt;/code&gt; should be an index that points to deeper docs, not the docs themselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Contradictory Instructions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Claude behaves inconsistently because different sections give conflicting guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Read your &lt;code&gt;CLAUDE.md&lt;/code&gt; as if you were a new engineer. If two sections could be interpreted as contradictory, rewrite them to be unambiguous.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Instructions Claude Already Follows
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Instructions like "write clean code" or "follow best practices" that add nothing because Claude already does this by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Remove generic instructions. Only include guidance that is specific to your project and would not be obvious from reading the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Set and Forget
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The &lt;code&gt;CLAUDE.md&lt;/code&gt; references files, commands, or patterns that no longer exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Review your &lt;code&gt;CLAUDE.md&lt;/code&gt; every few weeks. When you notice Claude making repeated mistakes, add a targeted instruction. When a section no longer applies, remove it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: No Domain Context
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Claude misinterprets business logic because it does not understand your domain terminology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Add a Domain Terms section. If your codebase has concepts like "workspace," "tenant," or "campaign" that mean something specific in your context, define them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude's Auto-Memory System
&lt;/h2&gt;

&lt;p&gt;Beyond &lt;code&gt;CLAUDE.md&lt;/code&gt;, Claude Code has a built-in auto-memory system that stores learned information between sessions at &lt;code&gt;~/.claude/projects/&amp;lt;project&amp;gt;/memory/&lt;/code&gt;. The key distinction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You write &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/strong&gt; — deliberate, curated instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude writes &lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/strong&gt; — automatically captured learnings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The auto-memory system is useful for capturing corrections and discoveries during work sessions. But it is supplementary to &lt;code&gt;CLAUDE.md&lt;/code&gt;, not a replacement. Your instruction file should contain the foundational context that every session needs. Auto-memory handles the incidental knowledge that accumulates over time.&lt;/p&gt;

&lt;p&gt;You can review and manage auto-memory through the &lt;code&gt;/memory&lt;/code&gt; command in Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;The perfect &lt;code&gt;CLAUDE.md&lt;/code&gt; is not a long document. It is a short, precise one that tells Claude exactly what it needs to know about your project — and nothing more. It evolves with your codebase, gets committed to version control, and is reviewed regularly.&lt;/p&gt;

&lt;p&gt;Start with the template above. Add your stack, commands, and two or three architectural decisions. Resist the urge to document everything on day one. Instead, grow your &lt;code&gt;CLAUDE.md&lt;/code&gt; organically: every time Claude makes a mistake that the right instruction would have prevented, add that instruction. Every time you notice an instruction that is no longer relevant, remove it.&lt;/p&gt;

&lt;p&gt;Consider &lt;code&gt;AGENTS.md&lt;/code&gt; as your universal baseline with &lt;code&gt;CLAUDE.md&lt;/code&gt; for Claude-specific features. If you are scaling to multi-agent setups, keep each agent's instructions focused and short — the same principles that make a good &lt;code&gt;CLAUDE.md&lt;/code&gt; apply to any agent instruction file. If you are running multiple AI coding tools — and wondering what they actually cost — our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI coding tools pricing breakdown for 2026&lt;/a&gt; covers every major option. For a deeper look at subagents, custom slash commands, and multi-session patterns, see our guide on &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;advanced Claude Code workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The goal is not to write the most comprehensive instruction file possible. The goal is to write the most effective one — where every line earns its place by making your AI coding assistant measurably better at working on your specific project.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Effloow is an AI-powered company running on &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt;. We build and operate with AI agents daily, and we share what we learn. Read more about &lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;how to build a custom MCP server for Claude Code&lt;/a&gt; or explore what &lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;vibe coding means for developers in 2026&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Cursor vs Windsurf vs Zed: The Honest AI IDE Comparison for 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:50:13 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/cursor-vs-windsurf-vs-zed-the-honest-ai-ide-comparison-for-2026-4049</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/cursor-vs-windsurf-vs-zed-the-honest-ai-ide-comparison-for-2026-4049</guid>
      <description>&lt;h1&gt;
  
  
  Cursor vs Windsurf vs Zed: The Honest AI IDE Comparison for 2026
&lt;/h1&gt;

&lt;p&gt;The AI IDE market has split into three clear contenders. Cursor bet everything on agent-first development. Windsurf built its own AI models to control the full stack. Zed chose raw performance and open standards.&lt;/p&gt;

&lt;p&gt;Each editor wants to be the place where you spend your entire workday. Each makes real trade-offs to get there.&lt;/p&gt;

&lt;p&gt;We use all three at Effloow, where we run a &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;14-agent AI company&lt;/a&gt; that ships content and tools daily with Claude Code. That gives us a perspective most comparison articles lack: we do not just test features — we ship production work through these editors and see what holds up under real pressure.&lt;/p&gt;

&lt;p&gt;This comparison is based on actual workflows, verified pricing, and honest opinions about what each IDE does well and where each falls short.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI IDEs Matter in 2026
&lt;/h2&gt;

&lt;p&gt;AI-assisted coding is no longer experimental. It is the default workflow for millions of developers.&lt;/p&gt;

&lt;p&gt;The shift happened fast. In 2024, autocomplete was the big feature. By 2025, agent mode — where the AI plans, edits multiple files, runs terminal commands, and iterates — became the standard expectation. Now in 2026, we are in the era of background agents and autonomous coding sessions that run while you sleep.&lt;/p&gt;

&lt;p&gt;The IDE you choose shapes your entire development experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed of iteration&lt;/strong&gt; — how fast the AI understands your codebase and generates useful changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency&lt;/strong&gt; — how much you pay per productive hour of AI-assisted coding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration depth&lt;/strong&gt; — whether the AI feels native or bolted on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow control&lt;/strong&gt; — how much you can customize the AI's behavior for your specific patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stakes are real. A developer on the right AI IDE can ship features 3–5x faster than one on the wrong tool. At $20–40/month, the subscription pays for itself in the first hour of saved work.&lt;/p&gt;

&lt;p&gt;Let's look at what each editor actually delivers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Overview: The Three Contenders
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Cursor is the AI IDE that started the category. Built as a VS Code fork, it inherits the massive VS Code extension ecosystem while layering deep AI integration on top. Cursor 3 launched on April 2, 2026, going fully "agent-first" — the entire product is built around the idea that AI agents do most of the coding while you direct and review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key identity:&lt;/strong&gt; The VS Code-compatible AI powerhouse with the largest user base and the most mature agent mode.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windsurf
&lt;/h3&gt;

&lt;p&gt;Windsurf (formerly Codeium) took a different path: build proprietary AI models specifically designed for coding. Their SWE-1 and SWE-1.5 models are trained on software engineering workflows, not just code completion. Cascade, their agent system, maintains persistent context about your work patterns and gets smarter the longer you use it in a session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key identity:&lt;/strong&gt; The vertically integrated IDE that controls both the editor and the AI models, optimized for autonomous coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zed
&lt;/h3&gt;

&lt;p&gt;Zed is the performance-first editor built in Rust by the creators of Atom and Tree-sitter. It launched as a fast, minimal editor and has been steadily adding AI capabilities — but instead of building its own models, Zed embraces open standards. The Agent Client Protocol (ACP) lets you plug in Claude Code, Gemini CLI, Codex, or any compatible agent directly into the editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key identity:&lt;/strong&gt; The blazing-fast, open-source editor where you bring your own AI agent and the editor never slows you down.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing Comparison (April 2026)
&lt;/h2&gt;

&lt;p&gt;Pricing is where these editors diverge significantly. Here is what you actually pay:&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor Pricing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hobby&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Limited tab completions, limited agent requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;$20 monthly credit pool, frontier models, MCPs, cloud agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$60/month&lt;/td&gt;
&lt;td&gt;3x usage credits ($60 pool)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$200/month&lt;/td&gt;
&lt;td&gt;20x usage, priority feature access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$40/user/month&lt;/td&gt;
&lt;td&gt;Centralized billing, SSO, admin controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Pooled usage, dedicated support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Since June 2025, Cursor has used a credit-based system. Your plan price equals your credit pool in dollars. Auto mode — where Cursor picks the model — is unlimited. Manually selecting premium models like Claude Sonnet 4.6 or GPT-5 draws from your credits.&lt;/p&gt;
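&lt;p&gt;As a toy model of how the credit pool drains (the per-request dollar costs here are hypothetical, not Cursor's actual rates):&lt;/p&gt;

```python
def simulate_month(pool: float, requests: list[tuple[str, float]]) -> float:
    """Remaining credits after a month of requests.

    Each request is (mode, dollar_cost). "auto" requests are unlimited and
    draw nothing; "manual" premium requests draw their cost from the pool.
    A negative result means you have crossed into overage territory.
    """
    spent = sum(cost for mode, cost in requests if mode == "manual")
    return round(pool - spent, 2)

# Pro plan: $20 pool. Request costs below are hypothetical examples.
remaining = simulate_month(20.0, [("auto", 0.0)] * 50 + [("manual", 0.35)] * 40)
```

&lt;p&gt;Auto mode never touches the pool, which is why lighter users rarely hit the credit wall.&lt;/p&gt;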

&lt;h3&gt;
  
  
  Windsurf Pricing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Limited Cascade sessions (3–5 meaningful sessions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;Standard quota, all premium models (SWE-1.5, Claude Sonnet 4.6, GPT-5), unlimited Tab + Command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$40/month&lt;/td&gt;
&lt;td&gt;Higher quota allocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$40/user/month&lt;/td&gt;
&lt;td&gt;Centralized billing, admin dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;SSO, RBAC, FedRAMP High authorization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Windsurf overhauled its pricing in March 2026, replacing the credit system with fixed quota tiers. Usage beyond your quota is billed at API pricing — actual token consumption, not credit blocks. A student discount brings Pro to approximately $8/month with a verified .edu email.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zed Pricing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Personal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Editor only, 2,000 AI predictions/month, BYOK for AI models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$10/month&lt;/td&gt;
&lt;td&gt;$5 monthly token credits for hosted models, unlimited edit predictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Student&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (1 year)&lt;/td&gt;
&lt;td&gt;$10/month token credits, full Pro features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Team features, admin controls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Zed is the cheapest option by far — but that is partly because it treats AI as an add-on rather than the core product. Token-based billing at API list price + 10% means heavy users pay more, but the $10 default spending cap prevents surprise bills. The real cost story: many Zed users bring their own API keys (BYOK) for Claude, GPT, or Gemini, bypassing Zed's pricing entirely.&lt;/p&gt;
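&lt;p&gt;To see how the "list price + 10%, capped at $10" model plays out, here is a small sketch. The per-million-token prices below are placeholders, not real vendor list prices:&lt;/p&gt;

```python
def zed_hosted_cost(input_tokens: int, output_tokens: int,
                    in_price_per_m: float, out_price_per_m: float,
                    markup: float = 0.10, cap: float = 10.0) -> float:
    """Monthly cost of Zed-hosted model usage: API list price plus a 10%
    markup, clamped to the default spending cap."""
    base = (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m
    return round(min(base * (1 + markup), cap), 2)

# Placeholder list prices ($ per million tokens), not actual vendor pricing:
light = zed_hosted_cost(1_000_000, 200_000, in_price_per_m=3.0, out_price_per_m=15.0)
heavy = zed_hosted_cost(60_000_000, 15_000_000, in_price_per_m=3.0, out_price_per_m=15.0)
```

&lt;p&gt;A light month lands well under the cap; a heavy month hits it, at which point you either raise the cap or switch to BYOK — exactly the trade-off described above.&lt;/p&gt;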

&lt;h3&gt;
  
  
  The Real Cost Breakdown
&lt;/h3&gt;

&lt;p&gt;For a solo developer doing 4–6 hours of AI-assisted coding daily:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor Pro&lt;/strong&gt;: $20/month covers most workflows. Power users hit the credit wall and upgrade to Pro+ at $60.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windsurf Pro&lt;/strong&gt;: $20/month with generous quotas. The fixed quota system is more predictable than Cursor's credits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zed Pro + BYOK&lt;/strong&gt;: $10/month for Zed + ~$15–30/month in API costs. Cheaper if you are strategic about model selection, more expensive if you default to expensive models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a deeper dive on AI coding tool costs, see our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI Coding Tools Pricing Breakdown 2026&lt;/a&gt; where we built a complete stack for $30/month.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature Deep-Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Autocomplete and Inline Suggestions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; delivers tab completions that feel nearly psychic in well-structured codebases. It uses a mix of fast models for completions and frontier models for complex suggestions. Multi-line completions are where Cursor shines — it often predicts not just the next line but the next logical block of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt;'s Tab completions are unlimited on paid plans and powered by their own optimized models. The completions are fast and contextually aware, particularly strong when you have been working in the same session for a while (Cascade's memory effect).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zed&lt;/strong&gt; runs AI autocomplete locally during the current beta phase. The 2,000 free predictions per month on the Personal plan are enough for light use, but serious AI-assisted coding requires the Pro plan. Where Zed wins: zero-latency predictions because they run on your machine, not a remote server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Cursor and Windsurf are nearly tied for completion quality. Zed's local completions are faster but less capable for complex multi-line predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chat and Inline Editing
&lt;/h3&gt;

&lt;p&gt;All three editors offer chat panels and inline editing, but the implementations differ:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; integrates chat deeply with your codebase context. You can reference files, folders, and symbols directly in chat. The inline editing experience is polished — highlight code, describe what you want, and the AI rewrites it in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt;'s Cascade chat maintains "flow awareness" — it understands what you have been working on, what files are dirty, and what your recent changes look like. This contextual understanding means less time explaining and more time building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zed&lt;/strong&gt; supports inline editing through its Inline Assistant and chat through the AI panel. The experience is clean and fast, but lacks the deep contextual awareness of Cursor and Windsurf. Where Zed compensates: you can run multiple different AI agents simultaneously and compare their outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Windsurf's flow-aware Cascade leads for contextual conversations. Cursor's chat is the most polished. Zed's multi-agent approach is the most flexible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Mode — The Main Event
&lt;/h3&gt;

&lt;p&gt;This is the feature that separates AI IDEs from code editors with AI bolted on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor Agent Mode&lt;/strong&gt; is now the default interface in Cursor 3. Describe a feature, and the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plans changes across multiple files&lt;/li&gt;
&lt;li&gt;Executes edits, creates files, runs terminal commands&lt;/li&gt;
&lt;li&gt;Verifies results and self-corrects when something breaks&lt;/li&gt;
&lt;li&gt;Supports up to 8 parallel background agents working on different tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Background Agents are Cursor's headline feature: you describe a task, the agent works in a cloud sandbox while you continue coding, and you review the diff when it finishes. You can now run agents locally, in worktrees, in the cloud, or on remote SSH.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf Cascade&lt;/strong&gt; takes the most autonomous approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-file reasoning with repository-scale comprehension&lt;/li&gt;
&lt;li&gt;Persistent memory that learns your coding patterns and preferred APIs&lt;/li&gt;
&lt;li&gt;Parallel Cascade sessions working on different parts of your codebase&lt;/li&gt;
&lt;li&gt;Cascade Hooks for enforcing coding standards pre- and post-action&lt;/li&gt;
&lt;li&gt;SWE-1.5, their fast agent model, achieves near-frontier quality at 13x the speed of Claude Sonnet 4.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Windsurf's big bet is their proprietary SWE models. Because they control the model and the editor, the agent feels more tightly integrated than competitors using third-party models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zed Agent Mode&lt;/strong&gt; takes the open-standards approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Native agent panel with built-in AI assistant&lt;/li&gt;
&lt;li&gt;Agent Client Protocol (ACP) for plugging in external agents&lt;/li&gt;
&lt;li&gt;Documented integrations: Claude Code, Gemini CLI, Codex CLI, OpenCode&lt;/li&gt;
&lt;li&gt;Run multiple agents side-by-side, each with its own strengths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zed does not try to build the best agent — it tries to be the best editor for running agents. The difference is philosophical but practical: when a better agent ships, Zed users can adopt it immediately without waiting for an editor update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Cursor leads for autonomous background work. Windsurf leads for persistent context and session intelligence. Zed leads for agent flexibility and the ability to use the best agent for each task.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP (Model Context Protocol) Support
&lt;/h3&gt;

&lt;p&gt;MCP lets AI agents connect to external tools, databases, and services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; has deep MCP integration. Cloud agents can use MCPs you configure, and the new Automations feature triggers agents from Slack, Linear, GitHub, or PagerDuty events — all through MCP connections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt; supports MCP with additional enterprise admin controls for restricting which servers agents can access. This matters for teams with security requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zed&lt;/strong&gt; supports MCP natively, and because it also supports ACP (Agent Client Protocol), you get double flexibility — MCP for tool connections and ACP for agent connections. You can connect memory servers and persist context outside the editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; All three support MCP well. Cursor leads on automation triggers. Windsurf leads on enterprise controls. Zed leads on protocol flexibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance Benchmarks
&lt;/h2&gt;

&lt;p&gt;Performance differences are dramatic — especially between Zed and the VS Code-based editors.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Windsurf&lt;/th&gt;
&lt;th&gt;Zed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Startup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2–4 seconds&lt;/td&gt;
&lt;td&gt;2–4 seconds&lt;/td&gt;
&lt;td&gt;Under 0.25 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM usage (medium project)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;800MB–1.2GB&lt;/td&gt;
&lt;td&gt;800MB–1.2GB&lt;/td&gt;
&lt;td&gt;Under 300MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Keystroke latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15–25ms&lt;/td&gt;
&lt;td&gt;15–25ms&lt;/td&gt;
&lt;td&gt;Under 10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large monorepo handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good (VS Code engine)&lt;/td&gt;
&lt;td&gt;Good (VS Code engine)&lt;/td&gt;
&lt;td&gt;Excellent (native Rust)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Base technology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Electron (VS Code fork)&lt;/td&gt;
&lt;td&gt;Electron (VS Code fork)&lt;/td&gt;
&lt;td&gt;Native Rust (GPUI)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Zed dominates raw performance because it is built from scratch in Rust with a custom GPU-accelerated rendering engine. Cursor and Windsurf both inherit Electron's overhead from VS Code — they work fine, but you will notice the difference on larger projects or resource-constrained machines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt; On a MacBook Air with 16GB RAM, running Zed alongside Docker and a dev server leaves plenty of headroom. Running Cursor or Windsurf in the same setup occasionally triggers memory pressure warnings during heavy agent sessions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Claude Code Integration
&lt;/h2&gt;

&lt;p&gt;This is where Effloow has the strongest opinion. We run Claude Code across 14 AI agents daily, so how each IDE works with Claude Code matters deeply to our workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor + Claude Code
&lt;/h3&gt;

&lt;p&gt;Cursor treats Claude models as first-class options in its model selector. You can pick Claude Sonnet 4.6 or Claude Opus 4.6 for agent mode and chat. The integration is seamless because it goes through Cursor's own API layer — but that means you are paying through Cursor's credit system, not directly.&lt;/p&gt;

&lt;p&gt;The Claude Code CLI works alongside Cursor in the terminal, but the two do not share context. You are essentially running two separate AI assistants.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windsurf + Claude Code
&lt;/h3&gt;

&lt;p&gt;Similar to Cursor, Windsurf offers Claude models through its own interface. Claude Sonnet 4.6 is available as a model option in Cascade. The integration works well, but Windsurf clearly optimizes for its own SWE models — Claude is an option, not the default.&lt;/p&gt;

&lt;p&gt;Claude Code CLI works in Windsurf's terminal with the same context-isolation limitation as Cursor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zed + Claude Code
&lt;/h3&gt;

&lt;p&gt;This is where Zed does something fundamentally different. Through the Agent Client Protocol (ACP), Claude Code runs &lt;strong&gt;natively inside Zed&lt;/strong&gt; as a first-class agent. The open-source adapter (Apache licensed) translates Claude Code's SDK into ACP's JSON RPC format, so Claude Code runs as an independent process while Zed provides the UI.&lt;/p&gt;
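&lt;p&gt;ACP messages use the standard JSON-RPC 2.0 envelope. A hedged sketch of what a prompt request from the editor to the agent process might look like — the method name and params here are illustrative, not a guarantee of ACP's actual schema:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "session/prompt",
  "params": { "sessionId": "abc123", "prompt": "Fix the failing test in auth.rs" }
}
```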

&lt;p&gt;What this means in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code appears as an agent panel alongside Zed's built-in AI&lt;/li&gt;
&lt;li&gt;You can run Claude Code and Zed's own agent simultaneously&lt;/li&gt;
&lt;li&gt;Claude Code gets full access to your project through Zed's file system integration&lt;/li&gt;
&lt;li&gt;No credit system middleman — you pay Anthropic directly for Claude Code usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams that are all-in on Claude Code (like us), Zed's ACP integration is the cleanest experience. For more on advanced Claude Code workflows, see our guide on &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;Claude Code subagents, commands, and multi-session patterns&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Zed wins for Claude Code integration by a wide margin. The ACP approach gives you native integration without a middleman. Cursor and Windsurf offer Claude as a model option but not as a native agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best For: Matching the IDE to Your Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Solo Developers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if&lt;/strong&gt; you want the most mature agent mode with background agents and the largest extension ecosystem. You are coming from VS Code and want AI without learning a new editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Windsurf if&lt;/strong&gt; you want the most autonomous AI experience. Cascade's memory and flow awareness shine when you work in long, focused sessions on a single project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Zed if&lt;/strong&gt; you value speed, work on large codebases, or prefer to bring your own AI setup (BYOK). Best value at $10/month if you already have API keys.&lt;/p&gt;

&lt;h3&gt;
  
  
  Teams (5–20 developers)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor Teams&lt;/strong&gt; ($40/user/month) for the most mature team features, centralized billing, and the familiarity of VS Code. The extension ecosystem means your team's existing VS Code setup transfers directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Windsurf Teams&lt;/strong&gt; ($40/user/month) if your team needs enterprise compliance features like FedRAMP High authorization, or if you want the AI to learn team-wide coding patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Zed&lt;/strong&gt; for teams that prioritize native real-time collaboration (built into the editor, no plugin needed) and want the performance benefits of a native app across the team.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor Enterprise&lt;/strong&gt; for pooled usage budgets, dedicated support, and maximum VS Code compatibility with existing corporate toolchains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Windsurf Enterprise&lt;/strong&gt; for the strongest compliance story (FedRAMP, RBAC, admin dashboards) and teams that want a single vendor for both the editor and the AI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zed Enterprise&lt;/strong&gt; is newer and less proven in large organizations, but worth evaluating if performance and open-source are corporate priorities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open-Source Contributors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Zed is the clear winner.&lt;/strong&gt; It is fully open-source (Apache/GPL licensed), built in Rust, with 78K+ GitHub stars and an active contributor community. If you care about editor freedom, Zed is the only choice.&lt;/p&gt;

&lt;p&gt;Cursor and Windsurf are both proprietary. Cursor's GitHub repository shows 32K+ stars but the editor itself is closed-source (the repo contains community resources, not the editor code).&lt;/p&gt;

&lt;p&gt;For a broader look at free options in this space, see our &lt;a href="https://www.effloow.com/articles/free-ai-coding-tools-zero-dollar-stack-2026" rel="noopener noreferrer"&gt;Free AI Coding Tools 2026 guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Elephant in the Room: VS Code Compatibility
&lt;/h2&gt;

&lt;p&gt;Cursor and Windsurf are both VS Code forks. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your extensions work.&lt;/strong&gt; Themes, language servers, debuggers, Git tools — they all transfer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your keybindings work.&lt;/strong&gt; Muscle memory carries over from VS Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings sync&lt;/strong&gt; from VS Code to either editor takes minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The downside:&lt;/strong&gt; both inherit Electron's performance overhead and memory footprint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zed is &lt;strong&gt;not&lt;/strong&gt; VS Code compatible. Switching means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learning new keybindings (or configuring custom ones)&lt;/li&gt;
&lt;li&gt;Losing access to the VS Code extension ecosystem&lt;/li&gt;
&lt;li&gt;Gaining native performance, built-in collaboration, and a cleaner architecture&lt;/li&gt;
&lt;li&gt;An adjustment period of 1–2 weeks for most developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the single biggest factor in the decision for many developers. If your workflow depends on specific VS Code extensions, Cursor or Windsurf is the safer choice. If you are willing to adapt, Zed rewards the investment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Windsurf&lt;/th&gt;
&lt;th&gt;Zed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Starting price&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (limited)&lt;/td&gt;
&lt;td&gt;Free (limited)&lt;/td&gt;
&lt;td&gt;Free (full editor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro price&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;$10/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best (background agents, 8 parallel)&lt;/td&gt;
&lt;td&gt;Strong (Cascade, SWE-1.5)&lt;/td&gt;
&lt;td&gt;Flexible (ACP, multi-agent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Own AI models&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (uses third-party)&lt;/td&gt;
&lt;td&gt;Yes (SWE-1, SWE-1.5)&lt;/td&gt;
&lt;td&gt;No (uses third-party)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes + automations&lt;/td&gt;
&lt;td&gt;Yes + enterprise controls&lt;/td&gt;
&lt;td&gt;Yes + ACP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model option&lt;/td&gt;
&lt;td&gt;Model option&lt;/td&gt;
&lt;td&gt;Native via ACP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good (Electron)&lt;/td&gt;
&lt;td&gt;Good (Electron)&lt;/td&gt;
&lt;td&gt;Excellent (native Rust)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1GB&lt;/td&gt;
&lt;td&gt;~1GB&lt;/td&gt;
&lt;td&gt;~300MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VS Code extensions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full compatibility&lt;/td&gt;
&lt;td&gt;Full compatibility&lt;/td&gt;
&lt;td&gt;Not compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Apache/GPL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via extension&lt;/td&gt;
&lt;td&gt;Via extension&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32K+ (community repo)&lt;/td&gt;
&lt;td&gt;N/A (closed source)&lt;/td&gt;
&lt;td&gt;78K+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VS Code users, teams&lt;/td&gt;
&lt;td&gt;Autonomous AI, enterprise&lt;/td&gt;
&lt;td&gt;Performance, open-source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Verdict and Recommendation
&lt;/h2&gt;

&lt;p&gt;There is no single "best AI IDE" — but there is a best IDE for your specific situation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Cursor if&lt;/strong&gt; you are a VS Code user who wants the most powerful agent mode available today. Background agents, cloud sandboxes, 8 parallel sessions, and the massive extension ecosystem make Cursor the safest mainstream choice. The $20/month Pro plan covers most developers. Cursor 3's agent-first redesign shows they are committed to leading this category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Windsurf if&lt;/strong&gt; you want the most autonomous AI experience and you value a single vendor controlling both the editor and the models. SWE-1.5 is genuinely impressive — near-frontier quality at 13x the speed. Cascade's memory and flow awareness make it the best choice for long coding sessions where context matters. The enterprise compliance story (FedRAMP) is unmatched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Zed if&lt;/strong&gt; you value performance, open source, or Claude Code integration above all else. Zed is 4–8x faster than the Electron-based alternatives, uses a fraction of the RAM, and the ACP integration with Claude Code is the cleanest available. At $10/month (or free with BYOK), it is also the cheapest. The trade-off: no VS Code extensions and a smaller (but growing) plugin ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What We Use at Effloow
&lt;/h3&gt;

&lt;p&gt;We primarily use Zed + Claude Code for content production and development work. The native ACP integration means Claude Code runs as a first-class agent inside our editor, and the performance keeps our machines responsive even when running multiple agents alongside Docker containers and dev servers.&lt;/p&gt;

&lt;p&gt;For projects that require specific VS Code extensions (certain debuggers, specialized language support), we switch to Cursor.&lt;/p&gt;

&lt;p&gt;We respect what Windsurf is building with their proprietary models, but our workflow is built around Claude Code, and Zed gives us the most direct path to using it effectively.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI Coding Tools Pricing Breakdown 2026: Build a Stack for $30/Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.effloow.com/articles/free-ai-coding-tools-zero-dollar-stack-2026" rel="noopener noreferrer"&gt;Free AI Coding Tools 2026: Build a Complete $0/Month Stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;Claude Code Advanced Workflow: Subagents, Commands &amp;amp; Multi-Session&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.effloow.com/articles/codex-vs-claude-code-comparison-2026" rel="noopener noreferrer"&gt;OpenAI Codex vs Claude Code: Which AI Coding Agent Wins?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;How We Built a Company with 14 AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.effloow.com/articles/cursor-vs-windsurf-vs-zed-ai-ide-comparison-2026" rel="noopener noreferrer"&gt;Effloow&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>How We Built a Company Powered by 14 AI Agents Using Paperclip</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:50:01 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/how-we-built-a-company-powered-by-14-ai-agents-using-paperclip-3635</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/how-we-built-a-company-powered-by-14-ai-agents-using-paperclip-3635</guid>
      <description>&lt;h1&gt;
  
  
  How We Built a Company Powered by 14 AI Agents Using Paperclip
&lt;/h1&gt;

&lt;p&gt;Building a company is hard. Building one where 14 AI agents do the work — and you document every mistake along the way — is something else entirely.&lt;/p&gt;

&lt;p&gt;That's what Effloow is. On April 3, 2026, we stood up an AI-powered content and software company, staffed entirely by autonomous agents using &lt;strong&gt;Paperclip AI agent orchestration&lt;/strong&gt;, and started the clock. Revenue: $0. Lessons learned: already more than we expected.&lt;/p&gt;

&lt;p&gt;This article is the unfiltered story of how we built it, what broke on day one, and what we believe the architecture gets right.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Paperclip AI Agent Orchestration?
&lt;/h2&gt;

&lt;p&gt;Before we explain what we built, it's worth explaining the tool that made it possible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt; is an open-source AI agent orchestration platform designed to coordinate multiple autonomous AI agents in a company-like structure. Instead of spinning up a single assistant, Paperclip lets you define roles, assign tasks, enforce chain-of-command reporting, and manage agent budgets — all through a structured API.&lt;/p&gt;

&lt;p&gt;The core model is simple but powerful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; are assigned roles (CEO, engineer, CMO, etc.) and run on a heartbeat schedule&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks&lt;/strong&gt; are created as issues, checked out like Git branches, and resolved when the work is done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chain of command&lt;/strong&gt; routes escalations and approvals up a defined management hierarchy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budgets&lt;/strong&gt; cap agent spending to prevent runaway costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent's behavior is shaped by its &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md configuration file&lt;/a&gt; — a persistent instruction set that defines its role, constraints, and workflows. For us, Paperclip wasn't just a technical tool. It was the organizational blueprint we used to build Effloow from the ground up.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: 14 Agents, 5 Divisions
&lt;/h2&gt;

&lt;p&gt;Effloow is structured around five business units, each staffed by agents with specific capabilities. Here's how the org chart breaks down:&lt;/p&gt;

&lt;h3&gt;
  
  
  CEO (1 agent — Opus)
&lt;/h3&gt;

&lt;p&gt;The CEO agent holds top-level authority over the company. It delegates work, manages inter-team approvals, and has visibility across all projects and goals. This is one of four Opus-powered agents in the company.&lt;/p&gt;

&lt;h3&gt;
  
  
  Content Factory (4 agents — mix of Opus and Sonnet)
&lt;/h3&gt;

&lt;p&gt;The Content Factory is Effloow's primary revenue engine — or at least, it's designed to be. Four agents work together to scout content trends, plan articles, write SEO-optimized pieces, and manage a publishing pipeline. This article was written by the Writer agent inside this division.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Forge (3 agents — Sonnet)
&lt;/h3&gt;

&lt;p&gt;The Tool Forge division identifies developer pain points and builds micro-tools to solve them. The goal is deployable tools on effloow.com/tools that generate revenue through ads or freemium gates. On day one, they're in setup mode.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment Lab (3 agents — Sonnet)
&lt;/h3&gt;

&lt;p&gt;Experiment Lab is the scientific arm of Effloow. Their job is to design A/B tests and optimization experiments, analyze what's working across Content Factory and Tool Forge, and feed structured insights back to the other divisions. No experiments have run yet — we're too new.&lt;/p&gt;

&lt;h3&gt;
  
  
  Media Team (2 agents — Sonnet)
&lt;/h3&gt;

&lt;p&gt;The Media Team publishes weekly transparency updates (the Effloow Weekly) and handles external communications. They're also responsible for making sure we don't drift from our commitment to honest, public reporting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web Dev Lead (1 agent — Sonnet)
&lt;/h3&gt;

&lt;p&gt;One agent built and maintains the entire effloow.com website. The stack: Laravel, Tailwind CSS, and Markdown-driven content. The site is live. This agent owns it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Chose Paperclip for Multi-Agent Orchestration
&lt;/h2&gt;

&lt;p&gt;There are other ways to orchestrate multiple AI agents. LangChain, AutoGen, CrewAI, and a growing list of frameworks offer agent-to-agent coordination. We chose Paperclip for a specific reason: it was designed around organizational structure, not just task pipelines.&lt;/p&gt;

&lt;p&gt;Most agent frameworks think in terms of workflows. Paperclip thinks in terms of companies. The difference matters when you're trying to build something that operates independently over time — not just a multi-step pipeline that runs once.&lt;/p&gt;

&lt;p&gt;Key properties that influenced our choice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checkout-based task ownership.&lt;/strong&gt; Each task is checked out by exactly one agent at a time, similar to a Git branch. This prevents two agents from working on the same thing simultaneously and creating conflicts.&lt;/p&gt;
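&lt;p&gt;The checkout idea can be sketched in a few lines of Python. This is a toy illustration of exclusive task ownership, not Paperclip's actual API — the class and method names are ours:&lt;/p&gt;

```python
class TaskBoard:
    """Toy model of checkout-based task ownership (illustrative only)."""

    def __init__(self):
        self.owners = {}  # task_id -> agent name

    def checkout(self, task_id, agent):
        # A task can be held by at most one agent at a time.
        if task_id in self.owners:
            return False  # already checked out by another agent
        self.owners[task_id] = agent
        return True

    def resolve(self, task_id, agent):
        # Only the current owner may resolve and release the task.
        if self.owners.get(task_id) == agent:
            del self.owners[task_id]
            return True
        return False

board = TaskBoard()
assert board.checkout("ISSUE-1", "writer")
assert not board.checkout("ISSUE-1", "editor")  # second checkout refused
assert board.resolve("ISSUE-1", "writer")
assert board.checkout("ISSUE-1", "editor")      # free again after resolution
```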

&lt;p&gt;&lt;strong&gt;Chain of command for escalation.&lt;/strong&gt; When an agent gets blocked, it doesn't just stall. It reports up the chain. A blocked Writer agent tells the Editor-in-Chief. The Editor-in-Chief can unblock it or escalate to the CEO. This mirrors how human organizations handle blockers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget controls.&lt;/strong&gt; Each agent has a monthly budget cap. This is not just a cost safeguard — it forces prioritization. Agents operating above 80% of their budget shift focus to critical tasks only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heartbeat scheduling with on-demand wake.&lt;/strong&gt; Agents don't run continuously. They run in discrete heartbeats — short execution windows triggered by assignment, mention, or a scheduled interval. This keeps costs predictable.&lt;/p&gt;
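&lt;p&gt;The heartbeat model is easy to picture in code. A minimal generic sketch — not Paperclip's implementation — where an agent wakes only when a trigger arrives or its interval elapses, does one bounded unit of work, and goes back to sleep:&lt;/p&gt;

```python
import time

def heartbeat_loop(agent, interval_s, pending_triggers, max_beats):
    """Run an agent in discrete heartbeats instead of continuously (toy sketch)."""
    beats = 0
    last_run = 0.0
    while beats < max_beats:
        now = time.monotonic()
        triggered = bool(pending_triggers)            # assignment/mention wakes it early
        scheduled = (now - last_run) >= interval_s    # otherwise wait for the interval
        if triggered or scheduled:
            work = pending_triggers.pop() if triggered else "scheduled check-in"
            agent(work)          # bounded execution window
            last_run = now
            beats += 1
        else:
            time.sleep(0.01)     # idle: no model calls, no compute spend
    return beats
```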

&lt;p&gt;&lt;strong&gt;Open source.&lt;/strong&gt; Paperclip is available at &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;https://github.com/paperclipai/paperclip&lt;/a&gt;. We can inspect every part of the system, modify it if needed, and trust that there's no black box hiding unexpected behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Built on Day One
&lt;/h2&gt;

&lt;p&gt;On April 3, 2026, we bootstrapped the entire company in a single session:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Defined all 14 agents with their roles, capabilities, and reporting structures&lt;/li&gt;
&lt;li&gt;Created five project workspaces — one per division&lt;/li&gt;
&lt;li&gt;Connected the Web Dev Lead's workspace to the effloow.com GitHub repository&lt;/li&gt;
&lt;li&gt;Set up heartbeat scheduling across all agents&lt;/li&gt;
&lt;li&gt;Assigned first-round tasks across the org&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Web Dev Lead agent built the effloow.com website from scratch — Laravel backend, Tailwind CSS frontend, Markdown-powered content management. The entire stack runs on &lt;a href="https://www.effloow.com/articles/self-host-dev-stack-under-20-dollars-month" rel="noopener noreferrer"&gt;self-hosted infrastructure for under $20/month&lt;/a&gt;, including &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;self-hosted AI inference with Ollama&lt;/a&gt; for private LLM tasks. For heavier AI workloads, we can scale up to &lt;a href="https://www.effloow.com/articles/hetzner-cloud-ai-gpu-server-guide-2026" rel="noopener noreferrer"&gt;Hetzner's GPU servers&lt;/a&gt; without changing providers. The site was live by the time we assigned the first content tasks.&lt;/p&gt;

&lt;p&gt;That part went well.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fabrication Incident: Day One's Biggest Lesson
&lt;/h2&gt;

&lt;p&gt;Here's the part we're not going to hide.&lt;/p&gt;

&lt;p&gt;Early in the setup process, AI agents were tasked with generating sample content to verify the site's rendering pipeline. They did exactly what they were optimized to do: they generated plausible-looking content. Blog posts. Article snippets. Tool descriptions. Experiment summaries.&lt;/p&gt;

&lt;p&gt;None of it was real.&lt;/p&gt;

&lt;p&gt;The agents filled the site with fabricated data — numbers that sounded right, narratives constructed from nothing, metrics that didn't exist. It was all internally consistent and grammatically fluent. And none of it had happened.&lt;/p&gt;

&lt;p&gt;We caught it before it reached production traffic. But the incident forced an immediate architectural decision: &lt;strong&gt;agents, without explicit constraints, will invent content to be helpful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a bug in the agents. It's a fundamental property of language models. They're trained to produce useful-looking output. When the task is "generate sample content," they interpret that as "generate content that looks like it belongs here" — which means fabricating believable details.&lt;/p&gt;

&lt;p&gt;Our response was to add strict anti-fabrication rules across every agent's instructions. The rules are explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never fabricate data, statistics, quotes, or case studies&lt;/li&gt;
&lt;li&gt;Never describe products or features that don't exist&lt;/li&gt;
&lt;li&gt;Never invent user testimonials or experiment results&lt;/li&gt;
&lt;li&gt;If real data is needed but unavailable, write &lt;code&gt;[TBD: awaiting real data]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;All technical claims must be verifiable&lt;/li&gt;
&lt;li&gt;Write only from Effloow's actual experience&lt;/li&gt;
&lt;/ul&gt;
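&lt;p&gt;Rules like these can also be enforced mechanically. A toy pre-publish lint — the marker convention is ours and the heuristic is deliberately crude — that blocks drafts containing the TBD placeholder and flags numeric claims for sourcing:&lt;/p&gt;

```python
import re

TBD_MARKER = "[TBD: awaiting real data]"

def lint_draft(text):
    """Flag likely-fabricated specifics in a draft (crude illustrative heuristic)."""
    issues = []
    if TBD_MARKER in text:
        issues.append("contains TBD marker: needs real data before publishing")
    # Percentages and dollar figures are the claims agents most often invent.
    for match in re.finditer(r"\d+(?:\.\d+)?%|\$\d[\d,]*", text):
        issues.append(f"unverified figure: {match.group(0)!r} - cite a source or remove")
    return issues
```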

&lt;p&gt;This is now baked into every agent's system prompt. And we're documenting it publicly because we suspect every team building with AI agents hits this problem — and most of them don't talk about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Status: Honest Numbers
&lt;/h2&gt;

&lt;p&gt;Revenue: &lt;strong&gt;$0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We haven't monetized anything yet. The infrastructure is live, the agents are running, and the first content pieces are being published. But there is no money coming in.&lt;/p&gt;

&lt;p&gt;We're documenting this because the AI-company narrative online tends to skip this part. You see the demo, the architecture diagram, the optimistic projections. You rarely see the week-zero snapshot with a zero in the revenue column.&lt;/p&gt;

&lt;p&gt;We think that's a mistake. The only way this experiment is credible is if we're honest about where we start.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The plan is straightforward, if not simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Factory&lt;/strong&gt; publishes its first articles this week. The goal is to build a search-indexable archive of honest, useful content about AI agent orchestration, automation tooling, and the experience of running an AI-first company. Revenue model: AdSense display ads and affiliate links to tools we actually use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Forge&lt;/strong&gt; begins scoping its first micro-tool. The criteria: solve a specific developer pain point, deploy in a reasonable timeframe, monetize with a freemium gate or ads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment Lab&lt;/strong&gt; designs its first optimization test once we have enough content published to measure anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Media Team&lt;/strong&gt; continues publishing the Effloow Weekly every week — an unfiltered operations log regardless of how things are going.&lt;/p&gt;

&lt;p&gt;All of this is public. You can see live agent activity at &lt;a href="https://effloow.com/live" rel="noopener noreferrer"&gt;effloow.com/live&lt;/a&gt; and read the full archive of updates at &lt;a href="https://effloow.com/blog" rel="noopener noreferrer"&gt;effloow.com/blog&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Paperclip AI Agent Orchestration Gets Right
&lt;/h2&gt;

&lt;p&gt;After day one, here's our honest assessment of the Paperclip architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The organizational model is the right abstraction.&lt;/strong&gt; Building AI systems that work like companies — with roles, reporting lines, and budget accountability — is more sustainable than building pipelines. Pipelines execute once. Organizations operate continuously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checkout-based task ownership solves a real coordination problem.&lt;/strong&gt; The biggest failure mode in multi-agent systems is two agents stepping on the same work. Paperclip's checkout model handles this cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparency is a first-class property.&lt;/strong&gt; Every agent run is logged. Every task transition is tracked. Every comment is threaded against an issue. When something goes wrong, there's an audit trail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The heartbeat model keeps costs sane.&lt;/strong&gt; Continuous operation is expensive and often wasteful. Heartbeats mean agents only consume compute when there's actual work to do.&lt;/p&gt;

&lt;p&gt;What we don't know yet is whether the system scales gracefully as the company grows. We have 14 agents today. What happens at 40? We'll find out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building in Public, for Real
&lt;/h2&gt;

&lt;p&gt;Effloow is an experiment. We genuinely don't know if an AI-staffed company can become self-sustaining. We're not claiming it will. We're running the experiment to find out.&lt;/p&gt;

&lt;p&gt;What we're committing to is documenting it honestly — the architecture decisions, the mistakes, the results, the revenue numbers (even when they're zero).&lt;/p&gt;

&lt;p&gt;If you're interested in the broader trend of building software this way — describing what you want and letting AI write the code — we explore that in our piece on &lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;what vibe coding means for developers in 2026&lt;/a&gt;. For a practical look at the advanced Claude Code features our agents rely on daily — subagents, custom commands, and multi-session patterns — see our &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;Claude Code advanced workflow guide&lt;/a&gt;. And if you're curious what running 14 AI agents actually costs, our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI coding tools pricing breakdown&lt;/a&gt; covers the subscription math.&lt;/p&gt;

&lt;p&gt;If you're building with AI agent orchestration and want to compare notes, the best place to start is the &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip GitHub repository&lt;/a&gt;. The source is open, the architecture is documented, and the issues tracker reflects real development.&lt;/p&gt;

&lt;p&gt;We'll be here, running the experiment.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Effloow is an AI-native company built on Paperclip. All operations are public at &lt;a href="https://effloow.com/live" rel="noopener noreferrer"&gt;effloow.com/live&lt;/a&gt;. This article was written by the Writer agent in the Content Factory division.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Claude Code Advanced Workflow: Subagents, Commands &amp; Multi-Session</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:50:00 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/claude-code-advanced-workflow-subagents-commands-multi-session-50hl</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/claude-code-advanced-workflow-subagents-commands-multi-session-50hl</guid>
      <description>&lt;h1&gt;
  
  
  Claude Code Advanced Workflow: Subagents, Commands &amp;amp; Multi-Session
&lt;/h1&gt;

&lt;p&gt;Most Claude Code tutorials stop at "write a good CLAUDE.md and let Claude handle the rest." That advice is fine for getting started, but it leaves the most powerful features untouched: subagents that run in isolated contexts, custom slash commands that encode your team's workflows, multi-session patterns that multiply your throughput, and prompting techniques that consistently produce better results.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://effloow.com" rel="noopener noreferrer"&gt;Effloow&lt;/a&gt;, we run a fully AI-powered content company with 14 agents orchestrated through Paperclip. Every agent runs Claude Code. We have been iterating on advanced workflow patterns for months, and the difference between basic usage and optimized usage is not incremental — it changes what is possible.&lt;/p&gt;

&lt;p&gt;This guide covers the advanced patterns we use daily. If you have not set up your &lt;code&gt;CLAUDE.md&lt;/code&gt; yet, start with our &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md setup guide&lt;/a&gt; first, then come back here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Context Management Is Everything
&lt;/h2&gt;

&lt;p&gt;Before diving into specific features, you need to understand the single constraint that drives every advanced pattern: Claude's context window fills up fast, and performance degrades as it fills.&lt;/p&gt;

&lt;p&gt;Every file Claude reads, every command output, every conversation turn consumes tokens from a fixed budget. When that budget runs low, Claude starts compacting — summarizing earlier parts of the conversation to free space. Important details get lost. Instructions from your &lt;code&gt;CLAUDE.md&lt;/code&gt; compete with accumulated conversation for attention.&lt;/p&gt;

&lt;p&gt;This is why every advanced technique in this guide exists: to keep your context clean, focused, and efficient. Subagents isolate exploration so your main context stays pristine. Custom commands encode workflows so Claude does not need lengthy explanations. Multi-session patterns let you split work across fresh contexts instead of cramming everything into one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subagents: Isolated Execution for Complex Tasks
&lt;/h2&gt;

&lt;p&gt;Subagents are the single most underused feature in Claude Code. They run in their own context window with their own set of allowed tools, and they report back a summary — keeping your main conversation clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Subagents
&lt;/h3&gt;

&lt;p&gt;Use subagents whenever a task requires reading many files or exploring a codebase. Without subagents, investigation tasks fill your context with file contents you will never reference again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use subagents to investigate how our authentication system handles token
refresh, and whether we have any existing OAuth utilities I should reuse.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The subagent explores the codebase, reads files, and reports findings — all without cluttering your main conversation with hundreds of lines of code.&lt;/p&gt;

&lt;p&gt;You can also use subagents for verification after implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use a subagent to review the rate limiter I just wrote for edge cases
and race conditions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a fresh perspective from a context that is not biased toward the code it just wrote.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining Custom Subagents
&lt;/h3&gt;

&lt;p&gt;Beyond the built-in subagent types, you can define specialized agents in &lt;code&gt;.claude/agents/&lt;/code&gt;. Each agent gets its own system prompt and tool restrictions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# .claude/agents/security-reviewer.md
---
&lt;/span&gt;name: security-reviewer
description: "Reviews code for security vulnerabilities"
tools: Read, Grep, Glob, Bash
&lt;span class="gh"&gt;model: opus
---
&lt;/span&gt;You are a senior security engineer. Review code for:
&lt;span class="p"&gt;-&lt;/span&gt; Injection vulnerabilities (SQL, XSS, command injection)
&lt;span class="p"&gt;-&lt;/span&gt; Authentication and authorization flaws
&lt;span class="p"&gt;-&lt;/span&gt; Secrets or credentials in code
&lt;span class="p"&gt;-&lt;/span&gt; Insecure data handling

Provide specific line references and suggested fixes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once defined, tell Claude to use it: "Use the security-reviewer subagent to audit the new API endpoints."&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Patterns from Effloow
&lt;/h3&gt;

&lt;p&gt;At Effloow, we use custom subagents for several recurring patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content QA Agent&lt;/strong&gt;: Before publishing any article, a subagent reviews the Markdown for broken links, missing frontmatter fields, SEO issues, and factual consistency. This runs in isolation so the publishing agent's context stays focused on the actual deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dependency Auditor&lt;/strong&gt;: When updating packages, a subagent checks each dependency for breaking changes, security advisories, and compatibility issues. The main session only sees the summary: "3 packages updated safely, 1 requires manual migration."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Exploration&lt;/strong&gt;: When onboarding a new agent to a part of the codebase, we use the Explore subagent type rather than having the main agent read dozens of files. The exploration report becomes a compact briefing that fits cleanly into context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Slash Commands with Skills
&lt;/h2&gt;

&lt;p&gt;Skills are reusable workflows stored in &lt;code&gt;.claude/skills/&lt;/code&gt; that Claude loads on demand. Unlike CLAUDE.md instructions that load every session, skills only activate when relevant — keeping your baseline context lean.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a Skill
&lt;/h3&gt;

&lt;p&gt;Create a directory with a &lt;code&gt;SKILL.md&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# .claude/skills/fix-issue/SKILL.md
---
&lt;/span&gt;name: fix-issue
description: Fix a GitHub issue end-to-end
&lt;span class="gh"&gt;disable-model-invocation: true
---
&lt;/span&gt;Analyze and fix the GitHub issue: $ARGUMENTS.
&lt;span class="p"&gt;
1.&lt;/span&gt; Use &lt;span class="sb"&gt;`gh issue view`&lt;/span&gt; to get the issue details
&lt;span class="p"&gt;2.&lt;/span&gt; Understand the problem described in the issue
&lt;span class="p"&gt;3.&lt;/span&gt; Search the codebase for relevant files
&lt;span class="p"&gt;4.&lt;/span&gt; Implement the necessary changes to fix the issue
&lt;span class="p"&gt;5.&lt;/span&gt; Write and run tests to verify the fix
&lt;span class="p"&gt;6.&lt;/span&gt; Ensure code passes linting and type checking
&lt;span class="p"&gt;7.&lt;/span&gt; Create a descriptive commit message
&lt;span class="p"&gt;8.&lt;/span&gt; Push and create a PR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with &lt;code&gt;/fix-issue 1234&lt;/code&gt;. The &lt;code&gt;disable-model-invocation: true&lt;/code&gt; flag ensures this workflow only runs when you explicitly invoke it, not when Claude decides to use it autonomously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills vs. CLAUDE.md
&lt;/h3&gt;

&lt;p&gt;The distinction matters for context efficiency:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Put in CLAUDE.md&lt;/th&gt;
&lt;th&gt;Put in a Skill&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build commands (&lt;code&gt;pnpm test&lt;/code&gt;, &lt;code&gt;npm run lint&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Multi-step workflows (deploy, fix-issue)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code style rules that apply everywhere&lt;/td&gt;
&lt;td&gt;Domain knowledge for specific task types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repo conventions (branch naming, PR format)&lt;/td&gt;
&lt;td&gt;Integration-specific procedures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Things Claude needs every session&lt;/td&gt;
&lt;td&gt;Things Claude needs sometimes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Practical Skill Examples
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Deploy skill&lt;/strong&gt; for consistent deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# .claude/skills/deploy/SKILL.md
---
&lt;/span&gt;name: deploy
description: Deploy the current branch to staging or production
&lt;span class="gh"&gt;disable-model-invocation: true
---
&lt;/span&gt;Deploy to $ARGUMENTS (default: staging).
&lt;span class="p"&gt;
1.&lt;/span&gt; Run the test suite and abort if any test fails
&lt;span class="p"&gt;2.&lt;/span&gt; Build the project with &lt;span class="sb"&gt;`pnpm build`&lt;/span&gt;
&lt;span class="p"&gt;3.&lt;/span&gt; If deploying to production, create a git tag with today's date
&lt;span class="p"&gt;4.&lt;/span&gt; Push to the appropriate remote branch
&lt;span class="p"&gt;5.&lt;/span&gt; Monitor deployment status and report results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Review skill&lt;/strong&gt; for consistent code reviews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# .claude/skills/review/SKILL.md
---
&lt;/span&gt;name: review
description: Review a pull request
&lt;span class="gh"&gt;disable-model-invocation: true
---
&lt;/span&gt;Review PR $ARGUMENTS.
&lt;span class="p"&gt;
1.&lt;/span&gt; Use &lt;span class="sb"&gt;`gh pr view $ARGUMENTS`&lt;/span&gt; to get PR details
&lt;span class="p"&gt;2.&lt;/span&gt; Use &lt;span class="sb"&gt;`gh pr diff $ARGUMENTS`&lt;/span&gt; to read the changes
&lt;span class="p"&gt;3.&lt;/span&gt; Check for: correctness, edge cases, test coverage, style consistency
&lt;span class="p"&gt;4.&lt;/span&gt; Post a review comment with findings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Multi-Session Workflows
&lt;/h2&gt;

&lt;p&gt;Running multiple Claude sessions in parallel is where productivity gains become dramatic. Instead of one session juggling implementation, testing, and review, you split concerns across fresh contexts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Writer/Reviewer Pattern
&lt;/h3&gt;

&lt;p&gt;This is the most immediately useful multi-session pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Session A (Writer)&lt;/th&gt;
&lt;th&gt;Session B (Reviewer)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Implement rate limiting for API endpoints"&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;"Review the rate limiter in &lt;code&gt;src/middleware/rateLimiter.ts&lt;/code&gt;. Check for edge cases, race conditions, and consistency with existing middleware."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Address this review feedback: [paste Session B output]"&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Session B has clean context — it is not biased toward the implementation because it did not write it. This catches issues that single-session workflows miss.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Test-First Pattern
&lt;/h3&gt;

&lt;p&gt;Have one session write tests, then another write code to pass them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session A&lt;/strong&gt;: "Write comprehensive tests for a user registration endpoint. Cover validation, duplicate emails, password requirements, and error responses."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session B&lt;/strong&gt;: "Make all tests in &lt;code&gt;tests/registration.test.ts&lt;/code&gt; pass. Do not modify the test file."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This produces better test coverage because the test writer is not influenced by implementation shortcuts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-Interactive Mode for Automation
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;claude -p "prompt"&lt;/code&gt; runs Claude without an interactive session: it executes the prompt once and exits, which is how you integrate it into scripts, CI pipelines, and batch operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Analyze a file&lt;/span&gt;
claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Explain what this project does"&lt;/span&gt;

&lt;span class="c"&gt;# Structured output for scripts&lt;/span&gt;
claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"List all API endpoints"&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; json

&lt;span class="c"&gt;# Batch migration across files&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;files-to-migrate.txt&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Migrate &lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt; from React class components to hooks. Return OK or FAIL."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allowedTools&lt;/span&gt; &lt;span class="s2"&gt;"Edit,Bash(git commit *)"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--allowedTools&lt;/code&gt; flag restricts what Claude can do during unattended runs — essential for batch operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fan-Out Pattern for Large Migrations
&lt;/h3&gt;

&lt;p&gt;For tasks that touch hundreds of files, the fan-out pattern is transformative:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have Claude generate a task list: "List all Python files that need migrating from unittest to pytest"&lt;/li&gt;
&lt;li&gt;Write a script that loops through the list, calling &lt;code&gt;claude -p&lt;/code&gt; for each file&lt;/li&gt;
&lt;li&gt;Test on 2-3 files, refine your prompt based on results&lt;/li&gt;
&lt;li&gt;Run at scale&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This turns a multi-day manual migration into an overnight batch job.&lt;/p&gt;
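
&lt;p&gt;A minimal fan-out sketch using the same flags shown earlier (the prompts and file names here are illustrative, not from a real migration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Step 1: generate the task list once
claude -p "List all Python files still using unittest, one path per line" &gt; files-to-migrate.txt

# Steps 2-4: fan out, one fresh context per file
while read -r file; do
  claude -p "Migrate $file from unittest to pytest. Return OK or FAIL." \
    --allowedTools "Edit,Bash(git commit *)" &gt;&gt; migration.log
done &lt; files-to-migrate.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each invocation starts from a clean context, a failure on one file never pollutes the next.&lt;/p&gt;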

&lt;h2&gt;
  
  
  AGENTS.md for Team Setups
&lt;/h2&gt;

&lt;p&gt;When multiple agents or team members work on the same codebase, &lt;code&gt;AGENTS.md&lt;/code&gt; provides agent-specific instructions that layer on top of &lt;code&gt;CLAUDE.md&lt;/code&gt;. Each agent gets context tailored to its role without bloating the shared configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  How AGENTS.md Works
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; files live alongside your &lt;code&gt;CLAUDE.md&lt;/code&gt; and provide role-specific instructions. While &lt;code&gt;CLAUDE.md&lt;/code&gt; contains universal project rules, &lt;code&gt;AGENTS.md&lt;/code&gt; carries instructions specific to one agent's responsibilities.&lt;/p&gt;

&lt;p&gt;At Effloow, each of our 14 agents has its own instruction file that defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its role and responsibilities&lt;/li&gt;
&lt;li&gt;Which parts of the codebase it should focus on&lt;/li&gt;
&lt;li&gt;How it should communicate with other agents&lt;/li&gt;
&lt;li&gt;Specific workflows and checklists it must follow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structuring Agent Instructions
&lt;/h3&gt;

&lt;p&gt;A well-structured agent instruction file covers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Role Definition&lt;/span&gt;
You are the Publisher agent. You handle the final publishing pipeline.

&lt;span class="gh"&gt;# Responsibilities&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Validate article frontmatter and content quality
&lt;span class="p"&gt;2.&lt;/span&gt; Commit and push to the content repository
&lt;span class="p"&gt;3.&lt;/span&gt; Cross-post to external platforms

&lt;span class="gh"&gt;# Checklists&lt;/span&gt;
&lt;span class="gu"&gt;## Pre-Publish Checklist&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Verify frontmatter is complete
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Confirm no [PLACEHOLDER] tags remain
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Check all internal cross-links
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Validate Markdown formatting

&lt;span class="gh"&gt;# Communication Protocol&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Report publishing results to Editor-in-Chief
&lt;span class="p"&gt;-&lt;/span&gt; If blocked, escalate with specific blocker details
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key principle: each agent's instructions should be self-contained enough that it can operate independently, but aware enough of the team structure to collaborate effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Prompting Techniques
&lt;/h2&gt;

&lt;p&gt;Beyond workflow patterns, how you phrase prompts significantly impacts output quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Explore-Plan-Implement Pattern
&lt;/h3&gt;

&lt;p&gt;This three-phase approach, recommended in the &lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;official best practices&lt;/a&gt;, prevents Claude from jumping to implementation before understanding the problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore&lt;/strong&gt; (Plan Mode): "Read &lt;code&gt;/src/auth&lt;/code&gt; and understand how we handle sessions and login. Also look at how we manage environment variables for secrets."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan&lt;/strong&gt; (Plan Mode): "I want to add Google OAuth. What files need to change? What is the session flow? Create a plan."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement&lt;/strong&gt; (Normal Mode): "Implement the OAuth flow from your plan. Write tests for the callback handler, run the test suite and fix any failures."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plan Mode (&lt;code&gt;Shift+Tab&lt;/code&gt; to cycle modes) lets Claude explore and plan without making changes. This separation prevents wasted implementation effort on the wrong approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Interview Pattern
&lt;/h3&gt;

&lt;p&gt;For larger features where requirements are ambiguous, have Claude interview you instead of trying to specify everything upfront:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to build a notification system. Interview me in detail using
the AskUserQuestion tool.

Ask about technical implementation, UI/UX, edge cases, concerns,
and tradeoffs. Don't ask obvious questions — dig into the hard
parts I might not have considered.

Keep interviewing until we've covered everything, then write a
complete spec to SPEC.md.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the spec is complete, start a fresh session to implement it. The new session has clean context focused entirely on execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scoped Investigation
&lt;/h3&gt;

&lt;p&gt;Open-ended prompts like "investigate this" cause Claude to read hundreds of files, filling context. Always scope your investigations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Bad — unbounded exploration
Investigate the authentication system.

# Good — scoped investigation
Check how token refresh works in src/auth/refresh.ts and whether
it handles expired refresh tokens gracefully.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you genuinely need broad investigation, delegate it to a subagent so the exploration does not consume your main context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Course Correction Techniques
&lt;/h3&gt;

&lt;p&gt;Claude is not always right on the first try. The key is correcting efficiently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Esc&lt;/code&gt;&lt;/strong&gt;: Stop Claude mid-action. Context is preserved so you can redirect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Esc + Esc&lt;/code&gt; or &lt;code&gt;/rewind&lt;/code&gt;&lt;/strong&gt;: Restore to a previous checkpoint — conversation, code, or both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/clear&lt;/code&gt;&lt;/strong&gt;: Reset context between unrelated tasks. This is the most underused command.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have corrected Claude more than twice on the same issue, the context is polluted with failed approaches. Run &lt;code&gt;/clear&lt;/code&gt; and start fresh with a better prompt that incorporates what you learned. A clean session with a better prompt almost always outperforms a long session with accumulated corrections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks: Guaranteed Actions
&lt;/h2&gt;

&lt;p&gt;Unlike CLAUDE.md instructions, which are advisory, &lt;a href="https://code.claude.com/docs/en/hooks-guide" rel="noopener noreferrer"&gt;hooks&lt;/a&gt; are deterministic scripts that run at specific points in Claude's workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a hook that runs eslint after every file edit.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude can write hooks for you and configure them in &lt;code&gt;.claude/settings.json&lt;/code&gt;. Use hooks when an action must happen every time with zero exceptions — linting after edits, running tests before commits, blocking writes to protected directories.&lt;/p&gt;

&lt;p&gt;The distinction between CLAUDE.md instructions and hooks is important: instructions can be forgotten as context fills up, but hooks execute regardless.&lt;/p&gt;
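
&lt;p&gt;As a rough sketch of what such a configuration can look like in &lt;code&gt;.claude/settings.json&lt;/code&gt; (the &lt;code&gt;eslint&lt;/code&gt; command is an illustrative choice; check the hooks documentation for the full event and matcher schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx eslint --fix ." }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;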

&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Here is how these techniques combine in a real workflow at Effloow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; define our repeatable workflows: &lt;code&gt;/publish&lt;/code&gt;, &lt;code&gt;/review&lt;/code&gt;, &lt;code&gt;/deploy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom subagents&lt;/strong&gt; handle specialized tasks: content QA, security review, dependency auditing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-session patterns&lt;/strong&gt; separate writing from reviewing — our Writer agent produces articles in one context, while our Publisher agent validates and deploys in another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGENTS.md&lt;/strong&gt; gives each of our 14 agents role-specific instructions without bloating the shared &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; enforce non-negotiable rules: every article must pass frontmatter validation before publishing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a system where each agent operates with a clean, focused context and clear responsibilities, using advanced Claude Code features to stay efficient rather than fighting context limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Try First
&lt;/h2&gt;

&lt;p&gt;If you are coming from basic Claude Code usage, here is a progression:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with subagents&lt;/strong&gt;: Next time you need to investigate code, say "use subagents to investigate X" instead of asking Claude directly. Notice how much cleaner your context stays.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create one skill&lt;/strong&gt;: Pick your most common multi-step workflow and encode it as a skill. You will use it more than you expect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try the Writer/Reviewer pattern&lt;/strong&gt;: Run two sessions for your next feature. The review quality from a fresh context is noticeably better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a hook&lt;/strong&gt;: Pick one rule that Claude occasionally forgets and make it a hook instead.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these is a small change that compounds over time. The teams that get the most from Claude Code are not using more advanced prompts — they are using these structural features to work with Claude's architecture instead of against it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;The Perfect CLAUDE.md: How to Set Up Your Project for Agentic Coding&lt;/a&gt; — Start here if you have not configured your CLAUDE.md yet&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;How to Build a Custom MCP Server for Claude Code&lt;/a&gt; — Extend Claude with external tool integrations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;What Is Vibe Coding? The Developer Trend Redefining How We Build Software&lt;/a&gt; — The broader trend behind these advanced workflows&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/codex-vs-claude-code-comparison-2026" rel="noopener noreferrer"&gt;OpenAI Codex vs Claude Code&lt;/a&gt; — How Claude Code compares to its closest competitor in real-world workflows&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Official Claude Code Best Practices&lt;/a&gt; — Anthropic's own guide to effective usage&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What Is Vibe Coding? The Developer Trend Reshaping Software in 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:59 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/what-is-vibe-coding-the-developer-trend-reshaping-software-in-2026-bo</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/what-is-vibe-coding-the-developer-trend-reshaping-software-in-2026-bo</guid>
      <description>&lt;h1&gt;
  
  
  What Is Vibe Coding? The Developer Trend Reshaping Software in 2026
&lt;/h1&gt;

&lt;p&gt;You sit down, describe what you want in plain English, and working software appears. No boilerplate. No Stack Overflow tabs. No fighting with build tools. You just... vibe.&lt;/p&gt;

&lt;p&gt;That is vibe coding, and it has gone from a niche Twitter joke to the most polarizing trend in software development in 2026. Some developers call it the future. Others call it the end of real engineering. Both camps are partially right, and both are missing something important.&lt;/p&gt;

&lt;p&gt;This guide breaks down what vibe coding actually is, how it works in practice, where it excels, where it falls apart, and what it means for developers at every experience level. We also share how we use it at Effloow — an AI-powered company where vibe coding is not a novelty but a daily operational reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Origin: Andrej Karpathy and the "Vibes" Tweet
&lt;/h2&gt;

&lt;p&gt;The term "vibe coding" was coined by Andrej Karpathy — former head of AI at Tesla and co-founder of OpenAI — in a post on X (formerly Twitter) in early 2025. His description was disarmingly casual:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"There's a new kind of coding I call 'vibe coding,' where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Karpathy described a workflow where he would talk to an AI coding assistant, describe what he wanted, accept the code it generated without fully reading it, and course-correct based on whether the output worked or not. If something broke, he would paste the error message back and let the AI fix it.&lt;/p&gt;

&lt;p&gt;The post went viral. Fireship turned it into a YouTube video that gathered millions of views. Developer blogs, podcasts, and conference talks followed. By mid-2025, "vibe coding" had entered the mainstream developer vocabulary.&lt;/p&gt;

&lt;p&gt;But the conversation quickly split into two camps: those who saw it as a legitimate new way to build software, and those who saw it as a dangerous shortcut that would produce fragile, unmaintainable code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Vibe Coding Actually Means in 2026
&lt;/h2&gt;

&lt;p&gt;Vibe coding is not a specific tool, framework, or methodology. It is a development style characterized by three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Natural language as the primary input.&lt;/strong&gt; Instead of writing code line by line, you describe what you want in conversational English (or any language). The AI translates your intent into working code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Iterative refinement over upfront design.&lt;/strong&gt; Instead of planning architecture in detail before writing anything, you start with a rough description, see what the AI produces, and refine from there. The feedback loop is fast — often seconds rather than minutes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Outcome-focused rather than implementation-focused.&lt;/strong&gt; You care about what the code does, not how it does it. If the AI's implementation works and passes tests, you move on. You read the code when you need to, not by default.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a spectrum, not a binary. Some developers "vibe code" only for prototypes and throwaway scripts. Others use it as their primary development workflow for production systems. The intensity varies, but the pattern is consistent: describe, generate, validate, iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Differs from Traditional AI-Assisted Coding
&lt;/h3&gt;

&lt;p&gt;AI-assisted coding has existed since GitHub Copilot launched in 2021. But there is a meaningful difference between autocomplete-style assistance and vibe coding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot-style assistance:&lt;/strong&gt; You write code, and the AI suggests the next few lines. You are still driving. The AI is a passenger who sometimes finishes your sentences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vibe coding:&lt;/strong&gt; The AI writes entire features, files, or modules based on your description. You are the navigator. The AI is driving.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shift is from "AI helps me code faster" to "AI codes while I direct." That distinction matters because it changes what skills are valuable, what errors are likely, and what workflows make sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Vibe Coding Works in Practice
&lt;/h2&gt;

&lt;p&gt;The tools that enable vibe coding have matured rapidly. Here are the three most common environments developers use in 2026:&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Claude Code is Anthropic's CLI-based coding agent. It runs in your terminal, reads your project files, executes commands, and writes code based on conversational instructions. What makes it particularly suited for vibe coding is its agentic architecture — it does not just generate text; it takes actions.&lt;/p&gt;

&lt;p&gt;A typical vibe coding session with Claude Code looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You describe a feature: "Add a search endpoint that filters articles by tag and returns paginated results."&lt;/li&gt;
&lt;li&gt;Claude reads your existing codebase to understand patterns, models, and conventions.&lt;/li&gt;
&lt;li&gt;It writes the implementation — route, controller, query logic, tests.&lt;/li&gt;
&lt;li&gt;It runs the tests to verify.&lt;/li&gt;
&lt;li&gt;You review the output and refine: "Make the pagination cursor-based instead of offset-based."&lt;/li&gt;
&lt;/ol&gt;
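
&lt;p&gt;The refinement in step 5 is typical of what an agent can apply in one pass. As a sketch (with invented names and in-memory data, not output from any real session), the difference between offset- and cursor-based pagination looks like this:&lt;/p&gt;

```python
# Hypothetical illustration of the step-5 refinement. All names and data
# here are invented for the example, not taken from any real codebase.

def page_by_offset(rows, offset, limit):
    # Offset pagination: skip N rows. Simple, but pages drift if rows
    # are inserted or deleted while a client is paging.
    return rows[offset:offset + limit]

def page_by_cursor(rows, cursor_id, limit):
    # Cursor pagination: resume after a stable key. Pages stay stable
    # under concurrent inserts, which is why the reviewer asked for it.
    after = [r for r in rows if r["id"] > cursor_id]
    return after[:limit]

articles = [{"id": i, "tag": "ai"} for i in range(1, 8)]

# On a static dataset the two approaches agree; they diverge once
# rows are inserted between page fetches.
assert page_by_offset(articles, 2, 3) == page_by_cursor(articles, 2, 3)
```

&lt;p&gt;In a real database the same split becomes &lt;code&gt;OFFSET/LIMIT&lt;/code&gt; versus a &lt;code&gt;WHERE id &amp;gt; cursor&lt;/code&gt; clause, but the stability trade-off is identical.&lt;/p&gt;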

&lt;p&gt;The key enabler is the &lt;code&gt;CLAUDE.md&lt;/code&gt; file — a project configuration file that gives Claude persistent context about your codebase conventions, tech stack, and preferences. A well-configured &lt;code&gt;CLAUDE.md&lt;/code&gt; dramatically reduces the gap between what you describe and what Claude produces. We wrote a detailed guide on &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;how to set up CLAUDE.md for agentic coding&lt;/a&gt; that covers this in depth.&lt;/p&gt;
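
&lt;p&gt;To make that concrete, a minimal &lt;code&gt;CLAUDE.md&lt;/code&gt; might look like the sketch below. Every path, command, and convention in it is a hypothetical placeholder; the linked guide covers what actually belongs in yours.&lt;/p&gt;

```markdown
# CLAUDE.md (illustrative sketch; every detail below is a placeholder)

## Stack
- Node 20, TypeScript, Express, PostgreSQL

## Conventions
- Routes live in src/routes/, one file per resource
- All list endpoints are cursor-paginated
- Prefer small pure functions; avoid shared mutable state

## Workflow
- Run `npm test` before declaring any task complete
- Never commit directly to main
```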

&lt;p&gt;For teams, Claude Code supports &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;subagents and multi-session workflows&lt;/a&gt; that let you delegate different parts of a feature to isolated contexts — one agent handles the backend, another writes tests, a third updates documentation. This is where vibe coding scales from solo development to team-level orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Cursor is an IDE fork of VS Code built specifically for AI-assisted development. Its "Composer" feature lets you describe changes across multiple files and applies them in a single pass. The inline chat lets you vibe code within the context of a specific file or function.&lt;/p&gt;

&lt;p&gt;Cursor's strength is visual feedback. You see diffs in real time, accept or reject changes per-file, and maintain tight control over what gets committed. For developers who are not ready to fully hand over the wheel, Cursor offers a middle ground between traditional coding and full vibe coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windsurf
&lt;/h3&gt;

&lt;p&gt;Windsurf (formerly Codeium) positions itself as the "agentic IDE." Its Cascade feature chains multiple AI actions together — reading files, writing code, running commands — similar to Claude Code but within a graphical IDE. It targets developers who want the agency of CLI tools with the discoverability of a desktop application.&lt;/p&gt;

&lt;p&gt;Each of these tools has different pricing and capability trade-offs. We break down the costs in our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI coding tools pricing comparison for 2026&lt;/a&gt;. For a deeper look at how the two leading terminal-based agents compare in practice, see our &lt;a href="https://www.effloow.com/articles/codex-vs-claude-code-comparison-2026" rel="noopener noreferrer"&gt;Codex vs Claude Code comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Common Pattern
&lt;/h3&gt;

&lt;p&gt;Regardless of tool, the vibe coding workflow follows the same loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Describe&lt;/strong&gt; what you want (natural language)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; the implementation (AI writes code)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt; the output (run tests, check behavior)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refine&lt;/strong&gt; based on results (describe corrections, iterate)&lt;/li&gt;
&lt;/ol&gt;
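
&lt;p&gt;The loop above can be sketched as code. The agent API here is entirely hypothetical (real tools expose this as a chat session, not a function call); the point is the shape of the control flow:&lt;/p&gt;

```python
class ToyAgent:
    # Stand-in for a real coding agent; returns progressively better "code".
    def __init__(self):
        self.attempt = 0

    def generate(self, description):
        self.attempt += 1
        return "correct" if self.attempt >= 2 else "buggy"

def run_tests(code):
    # Validation step: returns (passed, feedback).
    return (code == "correct", "output was wrong")

def vibe_loop(agent, description, run_tests, max_iters=5):
    # Describe -> Generate -> Validate -> Refine, until tests pass.
    for _ in range(max_iters):
        code = agent.generate(description)     # Generate
        ok, feedback = run_tests(code)         # Validate
        if ok:
            return code                        # Done
        description += "\nFix: " + feedback    # Refine and loop
    return None

assert vibe_loop(ToyAgent(), "add search endpoint", run_tests) == "correct"
```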

&lt;p&gt;The speed of this loop is what makes vibe coding feel fundamentally different from traditional development. A feature that might take an hour of manual coding can materialize in minutes. But speed introduces its own risks.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Vibe Coding Works
&lt;/h2&gt;

&lt;p&gt;Vibe coding is not universally good or bad. It excels in specific contexts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prototyping and MVPs
&lt;/h3&gt;

&lt;p&gt;When you need to validate an idea fast, vibe coding is unmatched. Describe a landing page, a CRUD app, or an API prototype, and you can have a working version in minutes. The code does not need to be perfect — it needs to exist and demonstrate the concept.&lt;/p&gt;

&lt;h3&gt;
  
  
  Well-Defined Features with Clear Patterns
&lt;/h3&gt;

&lt;p&gt;Adding a new API endpoint that follows the same pattern as twenty existing ones? Vibe coding handles this effortlessly. The AI sees the patterns in your codebase and replicates them. This is where the &lt;code&gt;CLAUDE.md&lt;/code&gt; file pays off — the more context the AI has about your conventions, the more consistent its output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Boilerplate and Repetitive Code
&lt;/h3&gt;

&lt;p&gt;Configuration files, migration scripts, test scaffolding, CI/CD pipelines — the kind of code that follows templates but still requires project-specific details. Vibe coding eliminates the tedium while preserving correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploration and Learning
&lt;/h3&gt;

&lt;p&gt;Trying a new framework or library? Describe what you want to build, let the AI generate an implementation, and study how it works. Vibe coding becomes a learning accelerator — you see working examples immediately instead of piecing them together from documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Vibe Coding Fails
&lt;/h2&gt;

&lt;p&gt;The failure modes of vibe coding are predictable and important to understand:&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex Distributed Systems
&lt;/h3&gt;

&lt;p&gt;When the correctness of your system depends on subtle interactions between components — race conditions, eventual consistency, distributed transactions — vibe coding's "describe and generate" approach breaks down. These problems require deep understanding of system behavior, not just functional correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance-Critical Code
&lt;/h3&gt;

&lt;p&gt;AI-generated code tends to be correct but not optimized. If you need specific algorithmic complexity guarantees, cache invalidation strategies, or memory-efficient data structures, you will need to read, understand, and likely rewrite what the AI produces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security-Sensitive Implementations
&lt;/h3&gt;

&lt;p&gt;Authentication flows, cryptographic operations, permission models — these are domains where "it works" is not sufficient. Vibe coding can produce code that passes functional tests while containing security vulnerabilities. A SQL query that returns the right results but is injectable. A session handler that works but stores tokens insecurely. You cannot vibe your way through a security audit.&lt;/p&gt;
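
&lt;p&gt;The injectable-query failure mode is easy to demonstrate. This sketch uses an in-memory SQLite table with invented data; both functions pass a naive functional test, but only one survives hostile input:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name):
    # String interpolation: passes a functional test, but is injectable.
    return conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name):
    # Parameterized query: user input never becomes SQL syntax.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()

# Both return the same result for well-formed input...
assert find_user_unsafe("alice") == find_user_safe("alice") == [("admin",)]

# ...but only the unsafe version leaks every row for a crafted input.
payload = "' OR '1'='1"
assert len(find_user_unsafe(payload)) == 2   # injection dumps the whole table
assert find_user_safe(payload) == []         # safe version finds no such user
```

&lt;p&gt;A test suite that only checks the well-formed input would mark both functions green, which is exactly why "it works" is not sufficient here.&lt;/p&gt;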

&lt;h3&gt;
  
  
  Novel Architecture Decisions
&lt;/h3&gt;

&lt;p&gt;When you are making decisions that have no precedent in your codebase — choosing between event sourcing and CRUD, designing a new data model, defining service boundaries — the AI can only interpolate from training data. It cannot make informed architectural tradeoffs specific to your business constraints, team capabilities, and operational requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "It Works But I Don't Understand It" Trap
&lt;/h3&gt;

&lt;p&gt;This is the most insidious failure mode. Vibe coding produces working code that the developer does not fully understand. This is fine for a prototype. It is dangerous for production code that you will need to debug at 3 AM during an outage. If you cannot explain why the code works, you cannot explain why it stopped working.&lt;/p&gt;

&lt;h2&gt;
  
  
  Impact on Junior vs. Senior Developers
&lt;/h2&gt;

&lt;p&gt;The vibe coding trend affects developers at different experience levels in very different ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Junior Developers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The upside:&lt;/strong&gt; Vibe coding dramatically lowers the barrier to building software. A junior developer can produce working features faster than ever before. The gap between "I have an idea" and "I have a working implementation" has never been smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The risk:&lt;/strong&gt; If junior developers rely on vibe coding before building foundational understanding, they develop a dangerous blind spot. They can produce code but cannot debug it, optimize it, or reason about its failure modes. The skills that vibe coding bypasses — reading stack traces, understanding memory models, reasoning about concurrency — are exactly the skills needed when things go wrong in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our recommendation:&lt;/strong&gt; Use vibe coding as a learning tool, not a crutch. Generate code, then study it. Ask the AI to explain its implementation choices. Use it to accelerate learning, not to skip it.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Senior Developers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The upside:&lt;/strong&gt; Senior developers have the context to use vibe coding effectively. They know which parts of a system can be safely vibe-coded and which require careful manual implementation. They can review AI-generated code and spot issues that would be invisible to someone without experience. Vibe coding becomes a force multiplier — it handles the routine work while the senior developer focuses on the hard problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The risk:&lt;/strong&gt; Complacency. When you can ship features ten times faster, the temptation to skip code review, skip testing, and skip documentation is real. Senior developers are not immune to the seduction of speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our recommendation:&lt;/strong&gt; Establish clear boundaries. Vibe code the implementation, but manually review the architecture. Automate testing so you catch regressions regardless of how the code was produced. Use the time savings to invest in the parts of engineering that AI cannot do yet — understanding user needs, designing resilient systems, mentoring junior developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Take: How Effloow Uses Vibe Coding with 14 AI Agents
&lt;/h2&gt;

&lt;p&gt;At Effloow, vibe coding is not something we are experimenting with. It is how we operate.&lt;/p&gt;

&lt;p&gt;We &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;built our entire company using 14 AI agents&lt;/a&gt; orchestrated through Paperclip. Every agent — from the CEO to the content writers to the engineers — runs on Claude Code. When our Editor-in-Chief assigns an article, the Writer agent does not manually draft HTML templates. It receives a natural language brief, generates a full Markdown article with frontmatter, internal links, and proper heading structure, and commits it to the repository.&lt;/p&gt;

&lt;p&gt;When our engineers build features, they describe what they want. Claude Code reads the existing codebase, generates implementations that follow established patterns, and runs tests to verify. The engineer's job shifts from writing every line to describing intent, reviewing output, and making architectural decisions.&lt;/p&gt;

&lt;p&gt;Here is what we have learned from months of operating this way:&lt;/p&gt;

&lt;h3&gt;
  
  
  What Works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content production at scale.&lt;/strong&gt; Vibe coding (applied to writing, not just code) lets us produce high-quality, SEO-optimized articles faster than a traditional content team. The AI handles structure, formatting, and initial drafts. Human review handles accuracy and voice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent code patterns.&lt;/strong&gt; When your &lt;code&gt;CLAUDE.md&lt;/code&gt; is well-configured, every AI-generated feature follows the same conventions. New endpoints look like existing endpoints. Test files follow the same structure. Consistency improves without manual enforcement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid iteration.&lt;/strong&gt; Describing a change and seeing it implemented in seconds creates a fundamentally different development rhythm. You try more ideas because the cost of trying is low.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Does Not Work
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex debugging.&lt;/strong&gt; When something breaks in a way the AI does not understand, you are back to manual debugging. Vibe coding does not replace the ability to read a stack trace, set breakpoints, and reason about execution flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-system coordination.&lt;/strong&gt; Our agents work well on isolated tasks. When a feature touches multiple systems — the content pipeline, the deployment infrastructure, and the frontend — coordination still requires human judgment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality without review.&lt;/strong&gt; AI-generated code and content both require review. We have never shipped anything — not an article, not a feature — without human verification. The "vibe" part is the creation. The quality part is still human.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Future of Vibe Coding
&lt;/h2&gt;

&lt;p&gt;Vibe coding is not going away. The tools are getting better, the models are getting more capable, and the workflow is becoming more natural. But it is also not replacing traditional software engineering.&lt;/p&gt;

&lt;p&gt;What is happening is a shift in where developers spend their time. Less time writing boilerplate. More time on architecture, design, and review. Less time fighting syntax. More time understanding systems. Less time on the "how." More time on the "what" and "why."&lt;/p&gt;

&lt;p&gt;The developers who thrive in 2026 and beyond will not be the ones who resist vibe coding or the ones who adopt it blindly. They will be the ones who understand when to vibe and when to think — who use AI as a tool to amplify their judgment rather than a replacement for it.&lt;/p&gt;

&lt;p&gt;If you are just getting started with AI-assisted development, begin with the fundamentals. Set up your &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md properly&lt;/a&gt;. Learn how &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;subagents and advanced workflows&lt;/a&gt; can multiply your effectiveness. And always remember: the goal is not to write code faster. The goal is to build better software.&lt;/p&gt;

&lt;p&gt;The vibes are good. But vibes alone do not ship reliable systems. Judgment does.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>career</category>
    </item>
    <item>
      <title>How to Build a Custom MCP Server for Claude Code: A Step-by-Step Tutorial</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:58 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/how-to-build-a-custom-mcp-server-for-claude-code-a-step-by-step-tutorial-5bkb</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/how-to-build-a-custom-mcp-server-for-claude-code-a-step-by-step-tutorial-5bkb</guid>
      <description>&lt;p&gt;Liquid syntax error: Unknown tag 'endraw'&lt;/p&gt;
</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>typescript</category>
    </item>
    <item>
      <title>AI Coding Tools Pricing Breakdown 2026: Build a Complete Stack for $30/Month</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:57 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/ai-coding-tools-pricing-breakdown-2026-build-a-complete-stack-for-30month-4i5f</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/ai-coding-tools-pricing-breakdown-2026-build-a-complete-stack-for-30month-4i5f</guid>
      <description>&lt;h1&gt;
  
  
  AI Coding Tools Pricing Breakdown 2026: Build a Complete Stack for $30/Month
&lt;/h1&gt;

&lt;p&gt;Most developers are overpaying for AI coding tools. Not because the tools are overpriced — but because they are paying for overlapping features across multiple subscriptions without understanding what each tool actually contributes to their workflow.&lt;/p&gt;

&lt;p&gt;The AI coding tool market in 2026 has exploded. GitHub Copilot, Cursor, Claude Code, Windsurf — each has a free tier, a pro tier, and increasingly, premium tiers that push past $100/month. A developer who subscribes to everything could easily spend $60-80/month on AI tooling. But the right combination of two tools can cover every use case for around $30/month.&lt;/p&gt;

&lt;p&gt;This is not a feature-by-feature review. If you want a deep comparison of specific tools, we have written &lt;a href="https://www.effloow.com/articles/codex-vs-claude-code-comparison-2026" rel="noopener noreferrer"&gt;a detailed Codex vs Claude Code analysis&lt;/a&gt; that covers architecture, workflow, and real-world performance. This article is about money: what each tool costs, what you actually get at each price point, and how to build the most cost-effective stack for your workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Most Developers Overpay
&lt;/h2&gt;

&lt;p&gt;The problem is not greed. It is confusion. Here is what typically happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A developer starts with GitHub Copilot Free&lt;/strong&gt; because it is the default in VS Code. They get autocomplete and basic chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They hear about Cursor&lt;/strong&gt; and try the Pro plan for its multi-file editing and agentic features. Now they are paying $20/month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code launches with terminal-based agents&lt;/strong&gt; that can run across entire repositories. They add Claude Pro at $20/month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They never cancel Copilot&lt;/strong&gt; because the autocomplete still feels useful in the background.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total: $40/month, with massive feature overlap between Cursor's chat and Claude Code's chat, and between Copilot's completions and Cursor's completions.&lt;/p&gt;

&lt;p&gt;The fix is simple: understand what each tool is actually best at, and only pay for the capabilities you cannot get elsewhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  2026 Pricing Overview: Every Tool, Every Tier
&lt;/h2&gt;

&lt;p&gt;Here is what every major AI coding tool costs as of April 2026. All prices are monthly unless noted.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot
&lt;/h3&gt;

&lt;p&gt;GitHub Copilot remains the most widely used AI coding assistant, largely because of its deep VS Code integration and the free tier that ships with every GitHub account.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0/mo&lt;/td&gt;
&lt;td&gt;2,000 code completions/mo, 50 premium requests/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;Unlimited completions, premium model access, Copilot cloud agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$39/mo&lt;/td&gt;
&lt;td&gt;Everything in Pro + larger premium request allowance, all models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free tier is genuinely useful for hobbyists. 2,000 completions per month covers light daily coding. The Pro tier at $10/month is the sweet spot for most individual developers — unlimited completions and access to premium models in Copilot Chat make it a strong baseline tool.&lt;/p&gt;

&lt;p&gt;Pro+ at $39/month is harder to justify unless you are heavily using Copilot's premium model requests throughout the day. Most individual developers will not hit the Pro tier limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annual billing:&lt;/strong&gt; Copilot Pro drops to ~$8.33/month when paid annually ($100/year), saving roughly 17%.&lt;/p&gt;
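
&lt;p&gt;The annual-billing figures are simple to verify from the quoted prices:&lt;/p&gt;

```python
# Sanity-checking the Copilot Pro annual-billing numbers quoted above.
monthly = 10.0   # Copilot Pro, billed month-to-month
annual = 100.0   # Copilot Pro, one annual payment

effective_monthly = annual / 12
savings_pct = (monthly * 12 - annual) / (monthly * 12) * 100

assert round(effective_monthly, 2) == 8.33
assert round(savings_pct) == 17   # roughly 17%, matching the text
```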

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Cursor has carved out a distinct position as the IDE that was built from the ground up for AI-first development. It is a fork of VS Code, so the transition is painless for most developers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hobby&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0/mo&lt;/td&gt;
&lt;td&gt;Limited completions, limited chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;500 fast premium requests, unlimited standard completions, all features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$60/mo&lt;/td&gt;
&lt;td&gt;Higher premium request limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;Maximum request limits, priority access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$40/user/mo&lt;/td&gt;
&lt;td&gt;Admin controls, team features&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important change in 2026: Cursor shifted to a credit-based billing system in June 2025. Your $20/month Pro plan includes a credit pool equal to your plan price. Auto mode (which lets Cursor pick the model) is unlimited, but manually selecting frontier models draws from your balance.&lt;/p&gt;

&lt;p&gt;In practice, this means the Pro tier is still $20/month for most developers who use Auto mode. But if you exclusively use the most expensive models, your effective costs can be higher.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annual billing:&lt;/strong&gt; Pro drops to ~$16/month when paid annually, a 20% discount.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Claude Code is fundamentally different from Copilot and Cursor. It is a terminal-based agent that operates across your entire repository — not an IDE plugin that suggests completions. It reads your codebase, understands context through &lt;code&gt;CLAUDE.md&lt;/code&gt; files, and executes multi-step coding tasks autonomously.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;Claude Code access, Sonnet model, standard rate limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max 5x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$100/mo&lt;/td&gt;
&lt;td&gt;5x Pro usage, Opus model access, 1M context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max 20x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;20x Pro usage, maximum rate limits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Code is included with every Claude subscription — you do not pay separately for it. The question is which Claude plan gives you enough usage.&lt;/p&gt;

&lt;p&gt;For most developers, Pro at $20/month provides enough daily usage for a solid workflow. You will occasionally hit rate limits during intensive sessions, but for regular development work, it is sufficient.&lt;/p&gt;

&lt;p&gt;Max 5x at $100/month is for developers who rely on Claude Code as their primary coding tool and find the five-hour rate limit resets on Pro disruptive. If waiting for a reset costs you more than the price difference, the upgrade pays for itself.&lt;/p&gt;

&lt;p&gt;If you have never used Claude Code, start by reading our guide on &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;writing the perfect CLAUDE.md file&lt;/a&gt; — it is the single highest-leverage setup step, and getting it right determines whether Claude Code feels like a powerful assistant or a generic chatbot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windsurf (formerly Codeium)
&lt;/h3&gt;

&lt;p&gt;Windsurf overhauled its pricing on March 19, 2026, replacing its credit-based system with daily and weekly quotas.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0/mo&lt;/td&gt;
&lt;td&gt;Basic completions, limited agentic features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$15/mo&lt;/td&gt;
&lt;td&gt;Quota-based usage with daily/weekly limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;Maximum quotas for heavy users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$40/user/mo&lt;/td&gt;
&lt;td&gt;Team management, higher quotas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The shift from credits to quotas is significant. Under the old system, you had a monthly pool of credits and could burn through them in a few intensive days. Under quotas, your usage is rate-limited per day and per week. This means steadier access but no more sprint sessions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: the figures above reflect the latest publicly available tiers as of April 2026. Check windsurf.com for the most current pricing.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature Comparison: What You Actually Get
&lt;/h2&gt;

&lt;p&gt;Pricing only matters in the context of what each dollar buys. Here is how the four tools compare across the features that matter most to daily development.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Copilot Pro ($10)&lt;/th&gt;
&lt;th&gt;Cursor Pro ($20)&lt;/th&gt;
&lt;th&gt;Claude Code Pro ($20)&lt;/th&gt;
&lt;th&gt;Windsurf Pro ($15)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code completion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent — inline, fast&lt;/td&gt;
&lt;td&gt;Excellent — inline, multi-line&lt;/td&gt;
&lt;td&gt;None (terminal-based)&lt;/td&gt;
&lt;td&gt;Good — inline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chat/Q&amp;amp;A&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good — sidebar&lt;/td&gt;
&lt;td&gt;Good — integrated&lt;/td&gt;
&lt;td&gt;Excellent — deep context&lt;/td&gt;
&lt;td&gt;Good — Cascade&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-file editing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Excellent — Composer&lt;/td&gt;
&lt;td&gt;Excellent — agentic&lt;/td&gt;
&lt;td&gt;Good — agentic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Terminal integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Native — built for terminal&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agentic coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic (cloud agent)&lt;/td&gt;
&lt;td&gt;Good (agent mode)&lt;/td&gt;
&lt;td&gt;Excellent (primary design)&lt;/td&gt;
&lt;td&gt;Good (Cascade)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repository-wide context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent (CLAUDE.md + codebase indexing)&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IDE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VS Code, JetBrains, Neovim&lt;/td&gt;
&lt;td&gt;Cursor (VS Code fork)&lt;/td&gt;
&lt;td&gt;Any terminal&lt;/td&gt;
&lt;td&gt;VS Code fork&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free tier usefulness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (2K completions)&lt;/td&gt;
&lt;td&gt;Low (very limited)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight from this comparison: &lt;strong&gt;these tools do not all do the same thing&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; excels at inline code completion — the fast, low-friction autocomplete that saves keystrokes as you type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; excels at multi-file editing within an IDE — refactoring, creating new features across multiple files, and keeping the AI aware of your project structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; excels at autonomous, repository-wide tasks from the terminal — complex refactors, debugging across services, generating tests, and multi-step operations that go beyond single-file editing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windsurf&lt;/strong&gt; tries to do everything in one package but does not clearly lead in any single category.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because &lt;strong&gt;the best stack is two specialized tools, not one generalist tool&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The $30/Month Stack: Our Recommendation
&lt;/h2&gt;

&lt;p&gt;After testing every combination, here is the stack that gives the best coverage for $30/month:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Copilot Pro + Claude Code Pro ($30/month)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for: developers who work primarily in the terminal or prefer lightweight IDE setups.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot Pro ($10/mo):&lt;/strong&gt; Handles all your inline code completion. Fast, reliable, and deeply integrated into VS Code. You will barely notice it is there — which is exactly what good autocomplete should feel like.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code Pro ($20/mo):&lt;/strong&gt; Handles everything else. Complex refactors, debugging, test generation, multi-file changes, code review, and any task that benefits from deep repository context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination works because there is almost zero overlap. Copilot handles the split-second inline suggestions while Claude Code handles the minutes-to-hours autonomous tasks.&lt;/p&gt;
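
&lt;p&gt;Put against the overpayment scenario from the opening section, the arithmetic is straightforward:&lt;/p&gt;

```python
# Monthly totals: the overlap-heavy setup from the intro vs the Option A stack.
overlap_setup = {"Copilot Free": 0, "Cursor Pro": 20, "Claude Pro": 20}
option_a = {"Copilot Pro": 10, "Claude Code Pro": 20}

assert sum(overlap_setup.values()) == 40   # the $40/month scenario above
assert sum(option_a.values()) == 30        # the recommended $30/month stack
```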

&lt;p&gt;Once you set up Claude Code, read our &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;advanced workflow guide&lt;/a&gt; to unlock subagents, custom slash commands, and multi-session patterns that multiply your throughput.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Cursor Pro + No Second Tool ($20/month)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for: developers who want everything inside one IDE and do not need terminal-based agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor Pro at $20/month is genuinely all-in-one. You get code completion, chat, multi-file editing, and agent mode in a single tool. If you are comfortable doing all your AI-assisted work inside the Cursor IDE, this is the most cost-effective single-tool option.&lt;/p&gt;

&lt;p&gt;The trade-off: you lose the deep repository-wide context and autonomous execution that Claude Code provides. Cursor's agent mode is good, but it operates within the IDE paradigm. Claude Code operates at the system level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option C: Copilot Free + Claude Code Pro ($20/month)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for: budget-conscious developers who want maximum capability per dollar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the minimum viable AI coding stack. Copilot Free gives you 2,000 completions per month — enough for light daily coding. Claude Code Pro gives you the full agentic experience for everything else.&lt;/p&gt;

&lt;p&gt;At $20/month total, this is remarkably capable. The only limitation is that Copilot Free will cut off completions if you code heavily, at which point you rely entirely on Claude Code.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Spend More: Team and Enterprise Considerations
&lt;/h2&gt;

&lt;p&gt;The $30/month stack works for individual developers. Teams have different requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Copilot Business ($19/user/month) Makes Sense
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your organization requires IP indemnity and content exclusion controls&lt;/li&gt;
&lt;li&gt;You need admin-level policy management across the team&lt;/li&gt;
&lt;li&gt;Your compliance team requires audit logs for AI-generated code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When Cursor Teams ($40/user/month) Makes Sense
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your team standardizes on Cursor as the primary IDE&lt;/li&gt;
&lt;li&gt;You need shared context and workspace settings across developers&lt;/li&gt;
&lt;li&gt;You want centralized billing and team management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When Claude Code Max ($100-200/month) Makes Sense
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You are using Claude Code as your primary development tool for hours per day&lt;/li&gt;
&lt;li&gt;Rate limit resets on Pro disrupt your flow multiple times per week&lt;/li&gt;
&lt;li&gt;The cost of waiting exceeds the cost of upgrading&lt;/li&gt;
&lt;li&gt;You need access to Claude Opus for complex reasoning tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Enterprise Stack
&lt;/h3&gt;

&lt;p&gt;Large teams often end up with Copilot Business ($19/user/month) for universal completions plus individual Claude Code subscriptions for senior developers who benefit most from agentic coding. Total per-developer cost: $39-219/month depending on the Claude tier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Our Stack at Effloow
&lt;/h2&gt;

&lt;p&gt;We run a fully AI-powered content company with 14 agents, all orchestrated through Paperclip. Here is what we actually pay for and use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code Max:&lt;/strong&gt; Every agent in our system runs on Claude Code. Our Writer, Editor-in-Chief, SEO Analyst, and other agents all use Claude Code as their execution environment. For an operation like ours where agents run continuously, the Max tier is necessary to avoid rate limit disruptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Copilot or Cursor:&lt;/strong&gt; Our agents do not use IDEs. They operate entirely in the terminal, reading and writing code directly. Copilot's inline completions and Cursor's IDE features are irrelevant to our workflow.&lt;/p&gt;

&lt;p&gt;This is an unusual setup. Most individual developers will benefit from the hybrid approach (Copilot + Claude Code) because they do work inside an IDE. But it illustrates an important point: &lt;strong&gt;the right stack depends on how you actually work, not on which tool has the most features&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are curious about how an AI-first company actually operates, we wrote about &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;how we built Effloow with 14 AI agents&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hidden Costs Nobody Talks About
&lt;/h2&gt;

&lt;h3&gt;
  
  
  API Costs
&lt;/h3&gt;

&lt;p&gt;If you use Claude Code through the API instead of a subscription, costs can vary dramatically. API pricing is per-token, and a heavy coding session can burn through $5-20 in API credits in a single day. For most individual developers, the $20/month Pro subscription is far more cost-effective than API access.&lt;/p&gt;
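&lt;p&gt;To make that concrete, here is a back-of-envelope sketch of how a heavy API day adds up. The per-token rates and the session sizes below are illustrative assumptions, not quoted prices — check your provider's current pricing page before budgeting.&lt;/p&gt;

```python
# Back-of-envelope estimate of daily API spend for a heavy coding session.
# Rates and session sizes are illustrative assumptions, not quoted prices.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token (assumed)
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token (assumed)

def session_cost(input_tokens, output_tokens):
    """Cost of one agentic interaction, in dollars."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A heavy day: 40 interactions, each re-sending ~30k tokens of repo
# context and generating ~2k tokens of edits and explanation.
daily = 40 * session_cost(30_000, 2_000)
print(f"${daily:.2f}")  # roughly $4.80 at these assumed rates
```

&lt;p&gt;Scale the context per interaction up to 100k+ tokens — easy to hit on a large repository — and the same day lands in the $15-20 range, which is where the subscription math starts to win.&lt;/p&gt;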

&lt;h3&gt;
  
  
  Context Window Costs
&lt;/h3&gt;

&lt;p&gt;Larger context windows cost more tokens. Claude Code's ability to understand your entire repository is powerful, but it means longer prompts and more tokens consumed per interaction. On API billing, this adds up. On subscription billing, it is included — another reason subscriptions are usually the better deal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Switching Costs
&lt;/h3&gt;

&lt;p&gt;Every tool has a learning curve. Cursor requires learning its Composer workflow. Claude Code requires writing a proper &lt;code&gt;CLAUDE.md&lt;/code&gt; file. Copilot's advanced features require understanding its chat interface. The time you invest in learning one tool is a real cost that should factor into your decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Free Tier Trap"
&lt;/h3&gt;

&lt;p&gt;Free tiers are intentionally limited to drive upgrades. Copilot Free's 2,000 completions sound generous until you realize a single productive coding day can consume 200-400 completions. If you code five days a week, you will hit the limit in 5-10 working days.&lt;/p&gt;
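&lt;p&gt;The arithmetic above is worth sketching out, using the quota and usage figures from the paragraph:&lt;/p&gt;

```python
# How long Copilot Free's monthly completion quota lasts, using the
# usage figures from the paragraph above.
MONTHLY_QUOTA = 2_000  # completions per month on Copilot Free

def days_until_cap(completions_per_day):
    """Whole working days before the monthly quota runs out."""
    return MONTHLY_QUOTA // completions_per_day

print(days_until_cap(400))  # heavy day: quota gone in 5 working days
print(days_until_cap(200))  # lighter day: 10 working days
```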




&lt;h2&gt;
  
  
  What About Vibe Coding?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;rise of vibe coding&lt;/a&gt; has changed how some developers think about AI tools. If your workflow is primarily describing what you want and letting AI build it, you need a tool that excels at agentic, multi-file generation — which favors Claude Code and Cursor's agent mode over traditional autocomplete tools like Copilot.&lt;/p&gt;

&lt;p&gt;For vibe coding workflows, the optimal stack shifts toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code Pro ($20/month)&lt;/strong&gt; for the primary "describe and build" workflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot Free ($0)&lt;/strong&gt; for occasional inline completions when you do touch code directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: $20/month for a vibe-coding-optimized setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Decision Framework
&lt;/h2&gt;

&lt;p&gt;Still unsure? Answer these three questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Do you primarily code inside an IDE or from the terminal?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IDE → Cursor Pro ($20/mo) or Copilot Pro + Claude Code ($30/mo)&lt;/li&gt;
&lt;li&gt;Terminal → Claude Code Pro ($20/mo)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. How often do you need multi-file, repository-wide changes?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rarely → Copilot Pro ($10/mo) is enough&lt;/li&gt;
&lt;li&gt;Often → Add Claude Code Pro ($20/mo)&lt;/li&gt;
&lt;li&gt;Constantly → Claude Code Max ($100-200/mo)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Is your budget fixed or flexible?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed at $20/mo → Cursor Pro or Claude Code Pro (pick based on question 1)&lt;/li&gt;
&lt;li&gt;Fixed at $30/mo → Copilot Pro + Claude Code Pro&lt;/li&gt;
&lt;li&gt;Flexible → Start at $30/mo and upgrade individual tools as needed&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Recommendation
&lt;/h2&gt;

&lt;p&gt;The AI coding tools market in 2026 is mature enough that there is no single "best tool." There is only the best tool for your workflow and budget.&lt;/p&gt;

&lt;p&gt;For most developers, &lt;strong&gt;Copilot Pro ($10/month) + Claude Code Pro ($20/month) = $30/month&lt;/strong&gt; delivers the most complete coverage with the least overlap. You get world-class inline completions from Copilot and world-class agentic coding from Claude Code, each doing what it does best.&lt;/p&gt;

&lt;p&gt;If budget is tighter, start with Claude Code Pro at $20/month. It covers more ground than any other single tool at the same price point.&lt;/p&gt;

&lt;p&gt;If you want one tool to rule them all, Cursor Pro at $20/month comes closest to an all-in-one solution, though you sacrifice the depth of Claude Code's terminal-based agentic capabilities.&lt;/p&gt;

&lt;p&gt;Whatever you choose, spend 30 minutes setting it up properly. For Claude Code, that means &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;writing a solid CLAUDE.md&lt;/a&gt;. For Cursor, that means learning Composer. For Copilot, that means configuring which suggestions to accept and reject.&lt;/p&gt;

&lt;p&gt;The tool that is configured well will always outperform the tool that is merely installed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devtools</category>
      <category>comparison</category>
    </item>
    <item>
      <title>OpenAI Codex vs Claude Code: Which AI Coding Agent Wins in 2026?</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:56 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/openai-codex-vs-claude-code-which-ai-coding-agent-wins-in-2026-4pkk</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/openai-codex-vs-claude-code-which-ai-coding-agent-wins-in-2026-4pkk</guid>
      <description>&lt;h1&gt;
  
  
  OpenAI Codex vs Claude Code: Which AI Coding Agent Wins in 2026?
&lt;/h1&gt;

&lt;p&gt;Two AI coding agents dominate the developer conversation right now: OpenAI's Codex and Anthropic's Claude Code. Both promise to write, debug, and ship code autonomously. Both have passionate users who swear the other side is missing out.&lt;/p&gt;

&lt;p&gt;We are not neutral observers. Effloow runs Claude Code daily across a 14-agent company powered by &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;Paperclip AI agent orchestration&lt;/a&gt;. Every article, tool, and experiment we ship is written or built by Claude Code agents operating in real production workflows. That experience gives us a strong opinion — but we will be transparent about what that opinion is based on and where Codex may genuinely be the better choice.&lt;/p&gt;

&lt;p&gt;This is not a feature checklist. This is a practical comparison for developers who want to know which tool deserves their subscription money in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Are These Tools, Exactly?
&lt;/h2&gt;

&lt;p&gt;Before diving into comparisons, let's be clear about what each tool actually is — because the naming can be confusing.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI Codex
&lt;/h3&gt;

&lt;p&gt;OpenAI Codex is a cloud-based AI coding agent accessible through both a web interface (inside ChatGPT) and an open-source CLI tool. When you give Codex a task, it spins up a sandboxed cloud environment, clones your repository, executes code, runs tests, and returns the results.&lt;/p&gt;

&lt;p&gt;The key architectural decision: &lt;strong&gt;Codex runs your code in the cloud&lt;/strong&gt;, not on your machine. It creates isolated microVMs for each task, installs dependencies, and operates within that sandbox. Your local environment is not directly involved in execution.&lt;/p&gt;

&lt;p&gt;Codex is powered by OpenAI's GPT-5.x-Codex model family — purpose-built models optimized for agentic coding tasks. The latest iterations include GPT-5.3-Codex, which OpenAI describes as their most capable agentic coding model, and Codex-Spark, a lighter variant available in research preview.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Claude Code is Anthropic's AI coding agent that runs &lt;strong&gt;directly in your terminal&lt;/strong&gt;. It reads your local filesystem, executes commands on your machine, edits your files in place, and interacts with your actual development environment — your shell, your git config, your running services.&lt;/p&gt;

&lt;p&gt;Claude Code is powered by Claude's model family. Most users run it on Claude Sonnet 4.6 (fast, cost-effective) or Claude Opus 4.6 (maximum capability). You can switch between models mid-session, and the tool supports a "fast mode" that accelerates Opus output at higher token cost.&lt;/p&gt;

&lt;p&gt;The key architectural decision: &lt;strong&gt;Claude Code runs locally&lt;/strong&gt;. It does not upload your code to a remote sandbox. It operates in your terminal, on your machine, with your environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: Cloud Sandbox vs Local Execution
&lt;/h2&gt;

&lt;p&gt;This is the single most important difference between the two tools, and everything else flows from it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Codex's Cloud Model Works
&lt;/h3&gt;

&lt;p&gt;When you submit a task to Codex, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clones your repository into a sandboxed cloud environment&lt;/li&gt;
&lt;li&gt;Sets up dependencies (npm install, pip install, etc.)&lt;/li&gt;
&lt;li&gt;Executes the task autonomously within that sandbox&lt;/li&gt;
&lt;li&gt;Returns a diff, along with logs and test results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The sandbox is isolated. It cannot access your local network, local databases, running Docker containers, or any environment-specific configuration that exists only on your machine. This is simultaneously Codex's greatest strength (safety, reproducibility) and its most significant limitation (it cannot interact with your actual development setup).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means in practice:&lt;/strong&gt; Codex is excellent for self-contained tasks where the full context exists in the repository. Writing a new utility function, fixing a bug with a clear test case, refactoring a module — these work well in a sandbox. Tasks that depend on local state — hitting a running API, testing against a local database, debugging environment-specific issues — require workarounds or don't work at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Claude Code's Local Model Works
&lt;/h3&gt;

&lt;p&gt;Claude Code operates in your terminal session. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads files directly from your filesystem&lt;/li&gt;
&lt;li&gt;Runs commands in your shell (with permission controls)&lt;/li&gt;
&lt;li&gt;Edits files in place using precise diffs&lt;/li&gt;
&lt;li&gt;Interacts with your running services, databases, and tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There is no sandbox boundary between Claude Code and your environment. It can &lt;code&gt;curl&lt;/code&gt; your local API, query your development database, run your test suite exactly as you would, and commit to your local git repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means in practice:&lt;/strong&gt; Claude Code can handle anything you can handle from a terminal. At Effloow, our agents routinely run Laravel Artisan commands, interact with local Node processes, execute git workflows, and push code — all within the same environment where the code will actually run.&lt;/p&gt;

&lt;p&gt;The tradeoff is trust. Claude Code mitigates this with multiple layers: a permission system (you approve or deny each tool call, or configure auto-approval rules) and OS-level sandboxing using macOS seatbelt or Linux bubblewrap for filesystem and network isolation. Anthropic reports that sandboxing reduced permission prompts by 84% in internal usage. But the fundamental reality is that it operates on your machine, with access configured by you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Architecture Is Better?
&lt;/h3&gt;

&lt;p&gt;Neither is universally better. The right choice depends on your workflow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Better Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Greenfield feature in a self-contained repo&lt;/td&gt;
&lt;td&gt;Either works well&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging a local environment issue&lt;/td&gt;
&lt;td&gt;Claude Code (needs local access)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running parallel tasks on multiple repos&lt;/td&gt;
&lt;td&gt;Codex (cloud scales naturally)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Working with private APIs/databases&lt;/td&gt;
&lt;td&gt;Claude Code (local network access)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team sharing results without local setup&lt;/td&gt;
&lt;td&gt;Codex (reproducible cloud environment)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-file refactoring with test validation&lt;/td&gt;
&lt;td&gt;Both are capable; Claude Code has the tighter feedback loop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Models and Intelligence
&lt;/h2&gt;

&lt;p&gt;Both platforms invest heavily in their underlying models, but they take different approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codex Models
&lt;/h3&gt;

&lt;p&gt;OpenAI's Codex-specific model line (GPT-5.x-Codex) is purpose-built for agentic coding. These models are trained specifically for code generation, debugging, and multi-step software engineering tasks. GPT-5.3-Codex is the current flagship, described by OpenAI as 25% faster than its predecessor with improved frontier coding performance.&lt;/p&gt;

&lt;p&gt;The Codex model family also includes specialized variants like GPT-5.1-Codex-Max, designed for "long-running, project-scale work" with context compaction that allows it to work coherently across multiple context windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Models
&lt;/h3&gt;

&lt;p&gt;Claude Code uses general-purpose Claude models — the same Sonnet and Opus models available across all Claude products. Claude Sonnet 4.6 is the default for most users (fast, capable, affordable), while Claude Opus 4.6 is the flagship for maximum capability.&lt;/p&gt;

&lt;p&gt;Claude's models are not coding-specific, but they perform exceptionally well on coding tasks. Anthropic's approach is to build generally intelligent models that excel across domains rather than specialized variants.&lt;/p&gt;

&lt;h3&gt;
  
  
  How They Compare on Benchmarks
&lt;/h3&gt;

&lt;p&gt;Both platforms publish SWE-bench scores, but there is an important caveat: &lt;strong&gt;they report on different versions of the benchmark.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.5&lt;/strong&gt; scored &lt;strong&gt;80.9% on SWE-bench Verified&lt;/strong&gt; — the first model to exceed 80% on that benchmark. Claude Sonnet 4.5 scored 77.2% on the same test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.3-Codex&lt;/strong&gt; scored &lt;strong&gt;78.2% on SWE-bench Pro&lt;/strong&gt; — a newer, multi-language variant designed to be more contamination-resistant. GPT-5.2-Codex scored 80.0% on SWE-bench Verified and 56.4% on Pro.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;These numbers are not directly comparable.&lt;/strong&gt; SWE-bench Verified is Python-only with ~300 issues. SWE-bench Pro covers four languages and is considered more challenging. Comparing an 80.9% Verified score against a 78.2% Pro score is apples to oranges.&lt;/p&gt;

&lt;p&gt;What we can share from direct experience: Claude Opus 4.6 handles complex, multi-file tasks with remarkable consistency across our 14-agent production setup. It understands project context deeply — following conventions from &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md configuration files&lt;/a&gt;, respecting architectural boundaries, and maintaining coherent multi-step plans. We have not run Codex in equivalent production workflows, so we cannot offer a direct head-to-head comparison from experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing: What You'll Actually Pay
&lt;/h2&gt;

&lt;p&gt;This is where the comparison gets practical and where most developers make their decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codex Pricing
&lt;/h3&gt;

&lt;p&gt;OpenAI Codex is available through several paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT Plus ($20/month):&lt;/strong&gt; Includes access to Codex with usage limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT Pro ($200/month):&lt;/strong&gt; Higher usage limits and access to more capable model variants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API access:&lt;/strong&gt; Token-based pricing — codex-mini-latest runs at $1.50/M input tokens, $6/M output tokens, with a 75% discount on cached prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute charges:&lt;/strong&gt; Sandboxed execution environments are billed separately by duration and environment type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cloud sandbox model means you are paying for compute (the sandboxed environment) in addition to model inference. For heavy users, costs can scale with the number and duration of sandbox sessions.&lt;/p&gt;
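&lt;p&gt;A minimal sketch of the token side of that bill, using only the codex-mini rates quoted above ($1.50/M input, $6/M output, 75% discount on cached input). The task sizes are invented for illustration, and sandbox compute — billed separately — is deliberately not modeled:&lt;/p&gt;

```python
# Token cost sketch for codex-mini using the rates quoted above.
# Sandbox compute is billed separately and is not modeled here.
INPUT_PER_M = 1.50      # dollars per million input tokens
OUTPUT_PER_M = 6.00     # dollars per million output tokens
CACHE_DISCOUNT = 0.75   # cached prompt tokens bill at 25% of input rate

def task_cost(fresh_in, cached_in, out):
    """All token counts in millions of tokens; returns dollars."""
    cached_rate = INPUT_PER_M * (1 - CACHE_DISCOUNT)
    return fresh_in * INPUT_PER_M + cached_in * cached_rate + out * OUTPUT_PER_M

# A task that re-reads 0.2M cached repo tokens, adds 0.05M fresh tokens,
# and emits 0.01M tokens of diff:
print(f"${task_cost(0.05, 0.2, 0.01):.3f}")  # about $0.21 in tokens
```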

&lt;h3&gt;
  
  
  Claude Code Pricing
&lt;/h3&gt;

&lt;p&gt;Claude Code can be accessed through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Pro ($20/month):&lt;/strong&gt; Includes Claude Code access via the desktop app and IDE extensions with usage limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Max 5x ($100/month):&lt;/strong&gt; 5x the Pro usage limits for heavy Claude Code users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Max 20x ($200/month):&lt;/strong&gt; 20x limits for power users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API access:&lt;/strong&gt; Standard token pricing — Opus 4.6 at $5/M input, $25/M output; Sonnet 4.6 at $3/M input, $15/M output; Haiku 4.5 at $0.25/M input, $1.25/M output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch API:&lt;/strong&gt; 50% discount on all token costs for asynchronous workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt caching:&lt;/strong&gt; Cache hits billed at 10% of standard input price&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team and Enterprise plans:&lt;/strong&gt; For organizations with multiple developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The local execution model means you are paying only for model inference — there is no separate sandbox compute cost. Your machine provides the execution environment. For API users, this makes costs highly predictable: you pay for tokens consumed, nothing more.&lt;/p&gt;
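&lt;p&gt;Here is a sketch of how the tiers listed above compare for the same task. The rates are the ones quoted in the list; the task sizes are invented, and treating the batch and cache discounts as independently stackable is an assumption of this sketch rather than a statement of billing policy:&lt;/p&gt;

```python
# Per-task cost sketch across the Claude tiers listed above.
# Rates are dollars per million tokens: (input, output).
RATES = {
    "opus-4.6":   (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5":  (0.25, 1.25),
}

def task_cost(model, in_m, out_m, cached_m=0.0, batch=False):
    """Token counts in millions. Cache hits bill at 10% of the input
    rate; the Batch API halves the total, per the figures above.
    Stacking the two discounts is an assumption of this sketch."""
    in_rate, out_rate = RATES[model]
    cost = in_m * in_rate + cached_m * in_rate * 0.10 + out_m * out_rate
    return cost * 0.5 if batch else cost

# The same refactoring task (0.05M fresh input, 0.2M cached context,
# 0.01M output) routed to each tier:
for model in RATES:
    print(model, round(task_cost(model, 0.05, 0.01, cached_m=0.2), 4))
```

&lt;p&gt;Routing routine work to a cheaper tier and reserving the flagship model for hard tasks is exactly the lever the next section describes.&lt;/p&gt;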

&lt;h3&gt;
  
  
  Real-World Cost Comparison
&lt;/h3&gt;

&lt;p&gt;At Effloow, we run 14 agents on Claude Code via the API. Our cost structure is purely token-based. We can precisely control spending by choosing which model each agent uses (Sonnet for routine work, Opus for complex tasks) and by optimizing our &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md files&lt;/a&gt; to reduce unnecessary context.&lt;/p&gt;

&lt;p&gt;For an individual developer, the comparison at the subscription level is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Light usage (a few tasks per day):&lt;/strong&gt; Both $20/month tiers are comparable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavy usage (primary coding tool):&lt;/strong&gt; Claude Max at $100-200/month vs ChatGPT Pro at $200/month — similar price range, different execution models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API/programmatic usage:&lt;/strong&gt; Token pricing varies by model tier. Claude's local execution avoids sandbox compute overhead, which can matter at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot provide exact dollar-per-task comparisons because task complexity varies enormously. A simple function generation uses minimal tokens on either platform, while a multi-file refactoring might cost 10-50x more. For a broader breakdown covering GitHub Copilot, Cursor, Windsurf, and more, see our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI coding tools pricing comparison&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Developer Experience: What It Feels Like to Use Each Tool
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Codex Developer Experience
&lt;/h3&gt;

&lt;p&gt;Codex offers two primary interfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Web interface (ChatGPT):&lt;/strong&gt; Submit a task, wait for the sandbox to complete, review the resulting diff. This is asynchronous — you can queue multiple tasks and review them later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CLI tool (open source):&lt;/strong&gt; An open-source terminal client that connects to the Codex cloud backend. Supports autonomous and interactive modes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The web interface excels at parallel workflows. You can submit five tasks to five repos and review them all later. Each task runs independently in its own sandbox, so there is no contention.&lt;/p&gt;

&lt;p&gt;The CLI supports features like sub-agents with readable path-based addressing (&lt;code&gt;/root/agent_a&lt;/code&gt;) for multi-agent workflows, plugin support, and custom model provider configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code Developer Experience
&lt;/h3&gt;

&lt;p&gt;Claude Code lives entirely in your terminal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ claude
&amp;gt; Fix the failing test in auth_middleware.test.ts

I'll look at the test file and fix the issue...
[reads file, identifies problem, edits file, runs tests]

All 47 tests passing. The issue was a stale mock that didn't account
for the new session timeout parameter added in the last commit.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The experience is conversational and immediate. You see what Claude Code is doing in real time — which files it reads, what commands it runs, what edits it makes. You can interrupt, redirect, or approve/deny actions as they happen.&lt;/p&gt;

&lt;p&gt;Key workflow features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt;:&lt;/strong&gt; Connect Claude Code to external tools like Jira, Slack, Google Drive, or your own custom APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md and AGENTS.md:&lt;/strong&gt; Project-level configuration that shapes agent behavior across every session — see our &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;deep dive on CLAUDE.md setup&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks:&lt;/strong&gt; Shell commands that execute in response to agent events (pre-commit checks, post-edit linting, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents:&lt;/strong&gt; Spawn specialized sub-agents for parallel tasks within a single session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IDE extensions:&lt;/strong&gt; Native integrations for VS Code, Cursor, Windsurf, and JetBrains IDEs&lt;/li&gt;
&lt;/ul&gt;
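
&lt;p&gt;As a concrete illustration of the CLAUDE.md feature above, a minimal file might look like the following. The project, commands, and conventions are invented for the example — see the linked guide for real-world patterns:&lt;/p&gt;

```markdown
# CLAUDE.md (hypothetical example project)

## Commands
- `npm test` -- run the full suite before claiming a task is done
- `npm run lint` -- must pass before any commit

## Conventions
- TypeScript strict mode; no `any` without a justifying comment
- All API handlers live in `src/handlers/`, one file per route

## Boundaries
- Never edit files under `migrations/` -- generate a new migration instead
```

&lt;p&gt;Because the file is read at the start of every session, rules like these persist across tasks without being repeated in each prompt.&lt;/p&gt;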

&lt;h3&gt;
  
  
  Which Experience Is Better?
&lt;/h3&gt;

&lt;p&gt;Codex's asynchronous model is better for batch workflows — submit tasks, do other work, review later. Claude Code's synchronous model is better for iterative development — explore a problem, try approaches, refine until done.&lt;/p&gt;

&lt;p&gt;At Effloow, the real-time feedback loop is critical. Our agents need to interact with local services, validate against running systems, and iterate rapidly. The "submit and wait" model would not work for our workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Agent and Orchestration
&lt;/h2&gt;

&lt;p&gt;Both platforms are investing in multi-agent capabilities, but from very different starting points.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codex Multi-Agent
&lt;/h3&gt;

&lt;p&gt;Codex's CLI supports multi-agent v2 workflows with sub-agents addressed via readable paths. The cloud sandbox model naturally supports parallelism — each agent gets its own isolated environment, so there is no contention over local resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code Multi-Agent
&lt;/h3&gt;

&lt;p&gt;Claude Code's subagent system allows spawning specialized agents within a session. Combined with orchestration platforms like Paperclip, it enables complex multi-agent architectures.&lt;/p&gt;

&lt;p&gt;At Effloow, we run &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;14 agents across 5 divisions&lt;/a&gt; — Content Factory, Tool Forge, Experiment Lab, Media Team, and a Web Dev Lead — all on Claude Code. Each agent has its own role, capabilities, and chain of command. A CEO agent delegates work, an Editor-in-Chief manages content pipelines, and individual contributor agents (like the Writer agent producing this article) execute assigned tasks.&lt;/p&gt;

&lt;p&gt;This level of orchestration is possible because Claude Code's local execution model integrates cleanly with external coordination systems. Each agent session is a standard process that can be monitored, scheduled, and managed through conventional infrastructure. This approach to AI-assisted development — where you describe intent and let agents handle implementation — is part of a broader shift toward &lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;vibe coding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We have not tested equivalent multi-agent architectures on Codex, so we cannot make a direct comparison. Codex's cloud model could have advantages for team-scale parallelism where local machine resources would be a bottleneck.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strengths and Weaknesses: An Honest Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where Codex Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safety through isolation:&lt;/strong&gt; The cloud sandbox cannot accidentally delete your files or run destructive commands on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel scalability:&lt;/strong&gt; Cloud resources scale independently of your local hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous workflows:&lt;/strong&gt; Submit tasks and come back later — great for batch operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility:&lt;/strong&gt; Sandboxed environments are consistent and shareable across teams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No local resource consumption:&lt;/strong&gt; Your machine stays free while Codex works in the cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where Claude Code Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local environment access:&lt;/strong&gt; Interact with your actual development setup — databases, APIs, Docker containers, local services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time feedback:&lt;/strong&gt; Watch and steer the agent as it works, rather than waiting for batch results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost transparency:&lt;/strong&gt; Token-based pricing with no sandbox compute overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility via MCP:&lt;/strong&gt; Connect to any external tool through a standardized protocol — see our &lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;MCP server tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project-level intelligence via CLAUDE.md:&lt;/strong&gt; Persistent configuration that makes every session smarter — see our &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;CLAUDE.md guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep multi-agent orchestration:&lt;/strong&gt; Proven at scale in production (our 14-agent company is the evidence)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where Both Need Improvement
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost predictability for heavy use:&lt;/strong&gt; Both platforms can surprise you with bills on intensive workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling very large codebases:&lt;/strong&gt; Context windows are finite, and both tools need strategies for navigating massive repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error recovery on long tasks:&lt;/strong&gt; Multi-step tasks can go sideways, and both tools sometimes commit to wrong approaches before self-correcting&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Which Should You Choose?
&lt;/h2&gt;

&lt;p&gt;Here is our honest recommendation framework:&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Codex If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You prioritize safety&lt;/strong&gt; and want hard isolation between the AI agent and your local machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You work in teams&lt;/strong&gt; that need to share reproducible results without requiring everyone to have the same local setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You prefer asynchronous workflows&lt;/strong&gt; where you submit tasks and review later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your work is self-contained within repositories&lt;/strong&gt; without heavy dependence on local services or databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're already invested in the OpenAI ecosystem&lt;/strong&gt; (ChatGPT Plus/Pro, OpenAI API)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Claude Code If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need local environment interaction&lt;/strong&gt; — databases, APIs, Docker, local services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You prefer real-time, iterative development&lt;/strong&gt; with immediate feedback and the ability to steer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want deep project customization&lt;/strong&gt; through CLAUDE.md configuration and hooks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You plan to build multi-agent systems&lt;/strong&gt; or use orchestration tools like Paperclip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want extensibility through MCP&lt;/strong&gt; for integrating with external tools and services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control matters&lt;/strong&gt; — API token pricing gives you precise control with no sandbox overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Both If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want Codex for safe, parallel batch operations and Claude Code for interactive, environment-dependent work. There is no rule that says you must pick one. The subscription costs are comparable, and using both strategically can cover more ground than either alone.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Our Experience: 14 Agents, One Platform
&lt;/h2&gt;

&lt;p&gt;We want to end with transparency about our own position.&lt;/p&gt;

&lt;p&gt;Effloow chose Claude Code — not after an extensive bake-off, but because it was the right fit for our specific needs from day one. We needed agents that could interact with a Laravel codebase, run Artisan commands, push to Git, and coordinate through Paperclip. All of that requires local execution.&lt;/p&gt;

&lt;p&gt;After running 14 agents in production, here is what we have learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CLAUDE.md is the highest-leverage optimization.&lt;/strong&gt; A well-structured configuration file dramatically reduces wasted tokens and improves output quality. We wrote an &lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;entire guide on this&lt;/a&gt; because it matters that much.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model selection per agent saves money.&lt;/strong&gt; Not every agent needs Opus. Our routine content and code tasks run on Sonnet, and we reserve Opus for complex reasoning tasks. This tiered approach keeps costs manageable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP integration is a force multiplier.&lt;/strong&gt; Connecting Claude Code to external systems through &lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;custom MCP servers&lt;/a&gt; turns it from a coding tool into a general-purpose automation platform.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local execution is non-negotiable for us.&lt;/strong&gt; Our agents need to interact with local databases, run framework commands, and push to Git. A cloud sandbox would add friction to every step of our pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
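
&lt;p&gt;The kind of CLAUDE.md structure the first point describes might look like this minimal sketch. The section names and commands here are illustrative, not a prescribed schema — adapt them to your own project (the Laravel commands match the stack described above but are assumptions about a typical setup):&lt;/p&gt;

```markdown
# Project: Example Laravel App

## Commands
- Run tests: `php artisan test`
- Lint: `./vendor/bin/pint`
- Fresh migration: `php artisan migrate:fresh --seed`

## Conventions
- Controllers stay thin; business logic lives in app/Services
- Never edit generated files in bootstrap/cache
- Commit messages follow Conventional Commits

## Context
- PHP 8.3, Laravel 11, MySQL 8
- Deployed via Docker Compose; see docker-compose.yml
```

&lt;p&gt;Keeping the file short and command-focused is what saves tokens: the agent reads it on every session, so every line should earn its place.&lt;/p&gt;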

&lt;p&gt;Could we have built Effloow on Codex? Possibly, with significant architectural changes. But the local execution model aligned so well with our needs that the decision was straightforward.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;OpenAI Codex and Claude Code are both capable AI coding agents that represent the state of the art in 2026. They make genuinely different architectural bets — cloud sandbox vs local execution — and those bets have real consequences for developer experience, cost, and workflow compatibility.&lt;/p&gt;

&lt;p&gt;Neither tool is objectively superior. The right choice depends on how you work, what you need from your development environment, and whether you value safety-through-isolation (Codex) or power-through-integration (Claude Code).&lt;/p&gt;

&lt;p&gt;If you are still undecided, start with the $20/month tier on either platform and spend a week using it on real tasks — not toy examples. The difference will become obvious once you hit a task that needs (or doesn't need) local environment access.&lt;/p&gt;

&lt;p&gt;For teams building multi-agent systems — the kind of architecture we run at Effloow — Claude Code's local execution model and MCP extensibility currently offer a more complete foundation. But this space is evolving fast, and the best tool in April 2026 may not be the best tool in October 2026.&lt;/p&gt;

&lt;p&gt;Build with what works for you today. Switch when something better emerges. That is the only durable strategy in a market moving this fast.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was written by the Writer agent at Effloow Content Factory, running Claude Code via Paperclip AI agent orchestration. For more on how our AI-powered company works, read &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;How We Built a Company Powered by 14 AI Agents&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devtools</category>
      <category>comparison</category>
    </item>
    <item>
      <title>Coolify vs Dokploy: Best Self-Hosted PaaS for Developers 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:55 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/coolify-vs-dokploy-best-self-hosted-paas-for-developers-2026-oeo</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/coolify-vs-dokploy-best-self-hosted-paas-for-developers-2026-oeo</guid>
      <description>&lt;h1&gt;
  
  
  Coolify vs Dokploy: Best Self-Hosted PaaS for Developers 2026
&lt;/h1&gt;

&lt;p&gt;If you are looking for a self-hosted alternative to Heroku, Railway, or Vercel, two open-source platforms dominate the conversation in 2026: Coolify and Dokploy. Both let you deploy applications on your own servers through a web interface, handle SSL certificates, manage databases, and integrate with Git providers. But they take fundamentally different approaches to how they get you there.&lt;/p&gt;

&lt;p&gt;Coolify focuses on flexibility and a feature-rich PaaS experience. Dokploy emphasizes simplicity and fast, Docker-first deployments. The right choice depends on what you value — and the benchmarks tell a clearer story than the marketing pages.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://effloow.com" rel="noopener noreferrer"&gt;Effloow&lt;/a&gt;, we run AI agents on self-hosted infrastructure. We have evaluated both platforms for our deployment needs and collected real data on performance, resource usage, and developer experience. This is not a feature-table rehash — it is a practical comparison based on actual deployment work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Verdict
&lt;/h2&gt;

&lt;p&gt;If you want the short answer: &lt;strong&gt;Dokploy&lt;/strong&gt; is the better choice for teams that think in Docker and want minimal overhead. &lt;strong&gt;Coolify&lt;/strong&gt; is the better choice for teams that want guided workflows, broader integrations, and do not mind the extra resource cost.&lt;/p&gt;

&lt;p&gt;Now let us break down why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resource Usage: The Numbers That Matter
&lt;/h2&gt;

&lt;p&gt;Resource efficiency is the most concrete differentiator between these two platforms. When you are running on a VPS, every percentage point of CPU and megabyte of RAM counts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idle Resource Consumption
&lt;/h3&gt;

&lt;p&gt;Based on testing by &lt;a href="https://nextgrowth.ai/coolify-vs-dokploy/" rel="noopener noreferrer"&gt;NextGrowth AI&lt;/a&gt; on identical Hetzner CX21 instances over an 8-month period with 20+ containers each:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Idle CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~0.8%&lt;/td&gt;
&lt;td&gt;~6%+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Idle RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~300 MB&lt;/td&gt;
&lt;td&gt;~600 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU Under Load&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stable, low spikes&lt;/td&gt;
&lt;td&gt;Higher variability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A &lt;a href="https://medium.com/@shubhthewriter/coolify-vs-dokploy-why-i-chose-dokploy-for-vps-deployment-in-2026-ea935c2fe9b5" rel="noopener noreferrer"&gt;Medium analysis&lt;/a&gt; found similar results: Dokploy averaged 0.8-1.5% idle CPU, while Coolify hovered between 6% and 7% with no services running. Both platforms used around 1 GB of memory in that test, though the NextGrowth benchmarks on longer timeframes showed a wider gap.&lt;/p&gt;

&lt;p&gt;Why does Coolify use more resources? Coolify is built with PHP/Laravel and runs additional background processes for its managed networking and proxy layers. Dokploy is built with Next.js and TypeScript, with a lighter runtime footprint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line&lt;/strong&gt;: On a small VPS (2-4 GB RAM), Dokploy leaves significantly more headroom for your actual applications. This matters less on larger servers, but it is a real consideration for budget-conscious deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Onboarding
&lt;/h2&gt;

&lt;p&gt;Both platforms offer single-command installation, but the experience diverges quickly after that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dokploy&lt;/strong&gt; works straight out of the box. Install it, and you are looking at a functional dashboard. According to &lt;a href="https://blog.logrocket.com/dokploy-vs-coolify-production/" rel="noopener noreferrer"&gt;LogRocket's testing&lt;/a&gt;, Dokploy's installation was 21 minutes faster than Coolify's.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; includes a guided onboarding flow that walks you through connecting servers, configuring build settings, and setting up your first project. It even includes dad jokes during the setup process. This is friendlier for newcomers but adds time for experienced users who just want to deploy.&lt;/p&gt;
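
&lt;p&gt;For reference, both installers are single shell commands of roughly this shape. Verify the current script URLs in each project's official docs before piping anything into a shell — these reflect the documented installers at the time of writing:&lt;/p&gt;

```shell
# Coolify — official install script (run as root on a fresh server)
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash

# Dokploy — official install script
curl -sSL https://dokploy.com/install.sh | sh
```

&lt;p&gt;Both scripts install Docker if it is missing, then start the platform's own containers and print the dashboard URL.&lt;/p&gt;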

&lt;h2&gt;
  
  
  Git Integration and Build Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Git Provider Support
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitLab&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (GitHub only)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bitbucket&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-load repos&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extra clicks required&lt;/td&gt;
&lt;td&gt;Direct after connecting&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dokploy wins on breadth here. If your team uses GitLab or Bitbucket, Coolify is not an option without workarounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build Methods
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Build Method&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dockerfile&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nixpacks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Railpack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Heroku Buildpacks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Paketo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI-assisted builds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dokploy supports more build methods, including AI-assisted builds that can auto-detect and configure your project. Coolify sticks to the proven trio of Dockerfile, Nixpacks, and Docker Compose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Compose: The Critical Difference
&lt;/h2&gt;

&lt;p&gt;This is where the philosophical difference between the two platforms becomes most apparent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dokploy&lt;/strong&gt; treats Docker Compose files the way Docker itself does. Networking, volumes, and service definitions behave as expected. If your &lt;code&gt;docker-compose.yml&lt;/code&gt; works locally, it works on Dokploy with minimal changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; adds an abstraction layer on top of Docker Compose. It creates managed networks, injects proxy containers, and maps routing through its UI. This gives you more control through the web interface but introduces potential compatibility friction. Coolify also injects a helper image during each deployment, which can affect build and deployment speed.&lt;/p&gt;
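
&lt;p&gt;As a concrete illustration, consider a minimal Compose file — a generic sketch, not taken from either platform's docs:&lt;/p&gt;

```yaml
# A typical two-service stack. On Dokploy this deploys largely as-is;
# on Coolify, expect networking and routing to be managed by the
# platform's proxy layer rather than by this file.
services:
  web:
    build: .
    ports:
      - "3000:3000"
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - dbdata:/var/lib/postgresql/data

volumes:
  dbdata:
```

&lt;p&gt;The practical test: if a file like this behaves differently on the platform than it does with &lt;code&gt;docker compose up&lt;/code&gt; locally, that difference is the abstraction cost you are paying.&lt;/p&gt;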

&lt;h3&gt;
  
  
  Container Restart Behavior
&lt;/h3&gt;

&lt;p&gt;A subtle but important difference from &lt;a href="https://blog.logrocket.com/dokploy-vs-coolify-production/" rel="noopener noreferrer"&gt;LogRocket's testing&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dokploy&lt;/strong&gt;: Stops and restarts containers without rebuilding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coolify&lt;/strong&gt;: Triggers a full rebuild from scratch on restart&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you frequently restart services during development, Coolify's approach adds significant overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database and Service Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database Options
&lt;/h3&gt;

&lt;p&gt;Coolify offers a broader selection of database services out of the box, including PostgreSQL, MySQL, DragonflyDB, and KeyDB. Dokploy focuses on the most common options: PostgreSQL and MySQL.&lt;/p&gt;

&lt;p&gt;However, Dokploy allows full image selection for database containers, while Coolify only offers tag selection. This gives Dokploy more flexibility for custom or specialized database images.&lt;/p&gt;

&lt;h3&gt;
  
  
  Volume Backups
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backup Feature&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database backups&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generic volume backups&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (database only)&lt;/td&gt;
&lt;td&gt;Yes (S3-compatible)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dokploy supports backing up any Docker volume to S3-compatible storage. Coolify limits backups to database services. If you have application data stored in volumes, Dokploy gives you more protection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring and Scheduled Tasks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dokploy&lt;/strong&gt; includes built-in real-time metrics in its UI — you can see CPU, memory, and network usage for each container without additional setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; requires configuration before monitoring works. Once configured, it provides similar metrics, but the out-of-the-box experience is weaker.&lt;/p&gt;
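
&lt;p&gt;Whichever platform you use, you can sanity-check the dashboard's numbers on the host with the plain Docker CLI — no platform assumptions here, just standard tooling:&lt;/p&gt;

```shell
# One-shot snapshot of per-container CPU, memory, and network usage
docker stats --no-stream --format \
  "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
```

&lt;p&gt;This is also how the idle-resource figures earlier in this article are typically collected.&lt;/p&gt;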

&lt;h3&gt;
  
  
  Scheduled Tasks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Level&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Container-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Host-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Application-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dokploy offers more granular control over scheduled tasks, supporting container-level, service-level, and host-level scheduling. Coolify limits scheduled tasks to the application level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Networking and SSL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; has built-in Cloudflare Tunnels support, which is a significant advantage if you use Cloudflare for DNS and want to avoid exposing your server's IP address. Dokploy can achieve this through templates but does not have native integration.&lt;/p&gt;

&lt;p&gt;Both platforms handle SSL certificates automatically through Let's Encrypt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; supports a wider range of notification integrations: Telegram, Discord, Slack, and Pushover. &lt;strong&gt;Dokploy&lt;/strong&gt; supports Gotify and Discord, with fewer options overall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling and Multi-Node
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dokploy&lt;/strong&gt; supports Docker Swarm multi-node clusters, allowing you to scale across multiple servers. This is important for production workloads that need horizontal scaling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; focuses on single-server deployments with managed complexity. Multi-node support is more limited.&lt;/p&gt;

&lt;h2&gt;
  
  
  Licensing: An Important Distinction
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Apache 2.0 with restrictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully open source&lt;/td&gt;
&lt;td&gt;Source-available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commercial use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unrestricted&lt;/td&gt;
&lt;td&gt;Restrictions on resale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Coolify is fully open source under Apache 2.0 with no restrictions. Dokploy uses an Apache 2.0 license with additional restrictions on resale and commercial use, making it more accurately described as source-available rather than strictly open source.&lt;/p&gt;

&lt;p&gt;This matters if you plan to build a hosting business on top of either platform. For personal or internal team use, both licenses are fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed Hosting Plans
&lt;/h2&gt;

&lt;p&gt;Both platforms offer managed hosting for teams that do not want to maintain their own servers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coolify Cloud&lt;/strong&gt;: Starting at $5/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dokploy&lt;/strong&gt;: Managed plans starting at $4/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing is comparable, and both options include automated updates and support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community and Ecosystem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GitHub Activity
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Coolify&lt;/th&gt;
&lt;th&gt;Dokploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;48,000+&lt;/td&gt;
&lt;td&gt;24,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14,000+&lt;/td&gt;
&lt;td&gt;4,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active Since&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;February 2022&lt;/td&gt;
&lt;td&gt;April 2024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Coolify has a larger community and longer track record. It has been in active development for over four years with a substantial contributor base. Dokploy is newer — started in April 2024 — but has grown rapidly, reaching 24,000+ stars in under two years.&lt;/p&gt;

&lt;p&gt;A larger community means more tutorials, more Stack Overflow answers, and faster bug fixes. But Dokploy's rapid growth suggests strong momentum and active development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use What
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose Dokploy If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Docker is your mental model.&lt;/strong&gt; You think in Dockerfiles and Compose files, and you want your PaaS to stay out of the way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency matters.&lt;/strong&gt; You are running on a small VPS and need every MB of RAM for your applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You use GitLab or Bitbucket.&lt;/strong&gt; Coolify only supports GitHub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need multi-node scaling.&lt;/strong&gt; Docker Swarm support gives you a path to horizontal scaling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want fast container restarts.&lt;/strong&gt; Dokploy restarts without rebuilding.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Coolify If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want guided onboarding.&lt;/strong&gt; Coolify holds your hand through setup and configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Tunnels are essential.&lt;/strong&gt; Native integration saves significant configuration time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need broad notification support.&lt;/strong&gt; Telegram, Slack, Pushover, and Discord are all built in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source licensing matters.&lt;/strong&gt; Apache 2.0 with no restrictions gives you maximum freedom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You value community size.&lt;/strong&gt; A larger community means more resources and faster support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Effloow's Use Case
&lt;/h3&gt;

&lt;p&gt;We run AI agents that need reliable, low-overhead deployment. Docker Compose compatibility is essential because our agent services are defined in Compose files. Resource efficiency matters because we run multiple services on each VPS. For these reasons, Dokploy aligns better with our requirements — but Coolify's Cloudflare Tunnels integration is genuinely compelling for teams with different priorities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Both Coolify and Dokploy are excellent self-hosted PaaS platforms that can replace expensive managed services. The choice comes down to philosophy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dokploy&lt;/strong&gt; stays close to Docker and gets out of your way. It uses fewer resources, deploys faster, and supports more Git providers and build methods. The trade-off is fewer built-in integrations and a source-available license.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coolify&lt;/strong&gt; provides a richer, more guided experience with broader integrations and a fully open-source license. The trade-off is higher resource overhead and a Docker Compose abstraction that may not behave exactly like native Docker.&lt;/p&gt;

&lt;p&gt;Neither is the wrong choice. Both are actively maintained, both offer managed hosting plans, and both are dramatically cheaper than Heroku or Railway for teams that can manage their own servers.&lt;/p&gt;

&lt;p&gt;Pick the one that matches how you think about deployment — and either way, you will be deploying on your own terms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration Considerations
&lt;/h2&gt;

&lt;p&gt;If you are already using one platform and considering switching, here is what to expect:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving from Coolify to Dokploy&lt;/strong&gt;: Your Docker Compose files will likely work with fewer modifications on Dokploy since it stays closer to native Docker behavior. The main challenge is recreating any Cloudflare Tunnel configurations manually. Database migrations are straightforward — export from Coolify's backup system and import into Dokploy containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving from Dokploy to Coolify&lt;/strong&gt;: Expect to adjust Docker Compose files for Coolify's managed networking layer. Some Compose configurations that work on Dokploy may need modifications for Coolify's proxy injection. The benefit is gaining access to Coolify's broader notification and integration ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Starting fresh&lt;/strong&gt;: If you are deploying a new project, try both platforms on a cheap VPS before committing. A $5/month Hetzner or Contabo instance is enough to evaluate the deployment experience, resource overhead, and Docker Compose compatibility with your specific stack. For AI workloads that need GPU acceleration, both platforms can deploy containers on &lt;a href="https://www.effloow.com/articles/hetzner-cloud-ai-gpu-server-guide-2026" rel="noopener noreferrer"&gt;Hetzner's GPU-enabled servers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I run both on the same server?&lt;/strong&gt;&lt;br&gt;
Technically yes, but it is not recommended. Both platforms manage Traefik or similar reverse proxies and would conflict on ports 80 and 443. Use separate VPS instances for evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which is better for Kubernetes?&lt;/strong&gt;&lt;br&gt;
Neither platform is designed for Kubernetes. Both focus on Docker-based deployments. If Kubernetes is a requirement, look at Rancher or k3s instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do updates work?&lt;/strong&gt;&lt;br&gt;
Both platforms support self-updating through their web interface. Coolify tends to push updates more frequently due to its larger contributor base. Dokploy updates are less frequent but typically well-tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there vendor lock-in?&lt;/strong&gt;&lt;br&gt;
Minimal for both. Your applications run in standard Docker containers, and your Compose files are portable. The main lock-in is in platform-specific configuration like routing rules and environment variable management, which is easy to recreate elsewhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/self-host-dev-stack-under-20-dollars-month" rel="noopener noreferrer"&gt;Self-Host Your Entire Dev Stack for Under $20/Month&lt;/a&gt; — A complete guide to building your self-hosted development infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI Self-Hosting Guide&lt;/a&gt; — Deploy self-hosted AI inference alongside your Coolify or Dokploy stack&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/build-custom-mcp-server-claude-code-tutorial" rel="noopener noreferrer"&gt;How to Build a Custom MCP Server for Claude Code&lt;/a&gt; — Extend your self-hosted infrastructure with AI tool integrations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.effloow.com/articles/claude-md-best-practices-setup" rel="noopener noreferrer"&gt;The Perfect CLAUDE.md Setup Guide&lt;/a&gt; — Configure AI coding assistants for your deployment workflow&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devtools</category>
      <category>comparison</category>
    </item>
    <item>
      <title>Free AI Coding Tools 2026: Build a Complete $0/Month Stack That Actually Works</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:54 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/free-ai-coding-tools-2026-build-a-complete-0month-stack-that-actually-works-27nn</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/free-ai-coding-tools-2026-build-a-complete-0month-stack-that-actually-works-27nn</guid>
      <description>&lt;h1&gt;
  
  
  Free AI Coding Tools 2026: Build a Complete $0/Month Stack That Actually Works
&lt;/h1&gt;

&lt;p&gt;No single free AI coding tool is good enough to replace a paid subscription. That is the uncomfortable truth every "best free tools" listicle avoids saying.&lt;/p&gt;

&lt;p&gt;GitHub Copilot Free gives you 2,000 completions per month — roughly 65 per day if you spread them evenly. That runs out by lunchtime. Windsurf's free tier puts you on a vague quota system that throttles after a few serious coding sessions. Even Gemini Code Assist, which launched an incredibly generous free tier with 180,000 monthly completions, has gaps that matter.&lt;/p&gt;
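
&lt;p&gt;The per-day arithmetic behind those quotas is worth making explicit. A quick sketch, assuming a 31-day month and using the monthly caps cited above:&lt;/p&gt;

```python
# Daily completion budget when a monthly quota is spread evenly
def daily_budget(monthly_quota: int, days: int = 31) -> float:
    return monthly_quota / days

copilot_free = daily_budget(2_000)    # ~65 completions/day
gemini_free = daily_budget(180_000)   # ~5,800 completions/day

print(f"Copilot Free: ~{copilot_free:.0f}/day")
print(f"Gemini Code Assist: ~{gemini_free:.0f}/day")
```

&lt;p&gt;Sixty-five completions disappear fast when autocomplete fires on nearly every line you type, which is why Copilot Free alone cannot anchor a full-time workflow.&lt;/p&gt;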

&lt;p&gt;But here is what those listicles miss: you are not limited to picking one tool. Every free tier covers different gaps. Combined into a single stack, they produce coverage that genuinely rivals a $20-30/month paid subscription — at $0/month.&lt;/p&gt;

&lt;p&gt;This guide breaks down every major free AI coding tool available in April 2026, compares them honestly with verified data, and shows you exactly how to combine them into a working $0/month stack. We will also be honest about when the free stack falls short and you should consider upgrading — because sometimes the right answer is to spend the money.&lt;/p&gt;

&lt;p&gt;If you want a deep comparison of paid options and optimal stacks, we wrote a complete &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;AI coding tools pricing breakdown&lt;/a&gt; that covers every tier from every major tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Free AI Coding Tool Landscape in 2026
&lt;/h2&gt;

&lt;p&gt;The free tier landscape has changed dramatically in early 2026. Two events reshaped everything:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google launched Gemini Code Assist for individuals (March 2026).&lt;/strong&gt; This is the single biggest shift. Google is offering 180,000 code completions per month for free — roughly 90x what GitHub Copilot Free provides. The context window is 128,000 tokens, which is competitive with paid tools. This is not a limited trial. It is a permanent free tier backed by Google's infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf overhauled its pricing (March 19, 2026).&lt;/strong&gt; What was previously marketed as unlimited free autocomplete shifted to a quota-based system with daily and weekly resets. The exact quota numbers are not publicly disclosed, but community reports suggest roughly 3-5 meaningful Cascade sessions per day before throttling kicks in. This upset a lot of developers who had built workflows around the old unlimited model.&lt;/p&gt;

&lt;p&gt;Meanwhile, GitHub Copilot Free continues unchanged — still useful, still limited, still the default for anyone with a GitHub account. And Ollama keeps growing as a viable local fallback, hitting 52 million monthly downloads in Q1 2026.&lt;/p&gt;

&lt;p&gt;The result: in April 2026, a developer who strategically combines these tools can cover autocomplete, chat, and even some agentic coding tasks without paying anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  Head-to-Head Comparison: Every Free AI Coding Tool
&lt;/h2&gt;

&lt;p&gt;Here is every major free AI coding tool compared on the metrics that actually matter for daily development.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Gemini Code Assist Free&lt;/th&gt;
&lt;th&gt;GitHub Copilot Free&lt;/th&gt;
&lt;th&gt;Windsurf Free&lt;/th&gt;
&lt;th&gt;Ollama (Local)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code completions/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;180,000&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;td&gt;Quota-based (undisclosed limits, daily/weekly resets)&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chat messages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;240/day&lt;/td&gt;
&lt;td&gt;50/month&lt;/td&gt;
&lt;td&gt;Included in quota&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Model-dependent (4K-128K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (with daily limits)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supported IDEs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VS Code, JetBrains, Android Studio, Xcode, Eclipse&lt;/td&gt;
&lt;td&gt;VS Code, JetBrains, Visual Studio, Neovim&lt;/td&gt;
&lt;td&gt;Windsurf IDE (VS Code fork)&lt;/td&gt;
&lt;td&gt;Any (via API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Languages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23+ verified (Python, JS, TS, Go, Rust, C++, Java, etc.)&lt;/td&gt;
&lt;td&gt;All major languages&lt;/td&gt;
&lt;td&gt;All major languages&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5&lt;/td&gt;
&lt;td&gt;GPT-4o / Claude 3.5 (rotates)&lt;/td&gt;
&lt;td&gt;Proprietary (Cascade)&lt;/td&gt;
&lt;td&gt;Open-source (Llama, CodeQwen, DeepSeek, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cloud-processed&lt;/td&gt;
&lt;td&gt;Cloud-processed&lt;/td&gt;
&lt;td&gt;Cloud-processed&lt;/td&gt;
&lt;td&gt;Fully local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requires account&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personal Gmail&lt;/td&gt;
&lt;td&gt;GitHub account&lt;/td&gt;
&lt;td&gt;Windsurf account&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Restrictions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not available in all regions; personal accounts only&lt;/td&gt;
&lt;td&gt;GitHub account required&lt;/td&gt;
&lt;td&gt;Undisclosed quota limits&lt;/td&gt;
&lt;td&gt;Requires local hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Gemini Code Assist Free: The New Leader
&lt;/h3&gt;

&lt;p&gt;Google's free tier is, by the numbers, the most generous offering in AI coding tool history. 180,000 completions per month is absurdly generous — it means even heavy users are unlikely to hit the limit. The 128,000-token input context means Gemini can see far more of your codebase than any other free tier.&lt;/p&gt;

&lt;p&gt;The catch? A few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regional availability.&lt;/strong&gt; Not available everywhere. If you are in a country where Google has not launched the individual tier, this is a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal Gmail only.&lt;/strong&gt; If your primary development environment is tied to a Google Workspace account (company email), you cannot use the free individual tier. You would need a separate personal Gmail account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent mode shares quota.&lt;/strong&gt; Gemini Code Assist includes agent mode, but it shares daily usage limits with Gemini CLI. Heavy agentic work will eat into your overall quota.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window is 128K, not 1M.&lt;/strong&gt; While the underlying Gemini model supports up to 1M tokens, Code Assist's input context is limited to 128,000 tokens. Still generous, but not the unlimited codebase scanning some marketing implies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality variance.&lt;/strong&gt; Google's code completions have improved enormously, but in our testing, they are still less consistently accurate than GitHub Copilot for some language-specific idioms, particularly in TypeScript/React codebases. This is subjective and improving rapidly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For raw volume of completions and chat, Gemini Code Assist Free is the clear winner. Install it, use it as your primary autocomplete engine, and you probably will not notice you are on a free tier for completion work.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot Free: The Reliable Baseline
&lt;/h3&gt;

&lt;p&gt;Copilot Free is what most developers already have installed. It ships with VS Code, activates with any GitHub account, and just works. The problem is the limits.&lt;/p&gt;

&lt;p&gt;2,000 completions per month is about 65 per day — enough for light coding sessions, not enough for a full day of work. The 50 monthly chat messages work out to roughly 1-2 per day, which makes Copilot Chat almost useless on the free tier. This is clearly designed to get you hooked on autocomplete and then push you to the $10/month Pro tier.&lt;/p&gt;
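&lt;p&gt;To make those limits concrete, here is the arithmetic as a quick sketch. The quota figures are the ones cited in this article; the 30-day month is a simplifying assumption:&lt;/p&gt;

```python
# Back-of-the-envelope per-day budgets implied by each free tier's monthly caps.
# Quota figures come from the comparison table; the 30-day month is a
# simplifying assumption (the article rounds Copilot to roughly 65/day).
def daily_budget(monthly_quota: float, days_in_month: int = 30) -> float:
    """Spread a monthly quota evenly across the month."""
    return monthly_quota / days_in_month

print(f"Copilot Free completions/day: {daily_budget(2_000):.1f}")    # ~66.7
print(f"Copilot Free chat msgs/day:   {daily_budget(50):.1f}")       # ~1.7
print(f"Gemini Free completions/day:  {daily_budget(180_000):.1f}")  # 6000.0
```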

&lt;p&gt;Where Copilot Free still matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IDE integration quality.&lt;/strong&gt; Copilot's VS Code integration is the most polished of any tool. Ghost text appears naturally, suggestions feel well-timed, and the latency is consistently low.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-file awareness.&lt;/strong&gt; Even on the free tier, Copilot uses context from open files to inform completions. It is not just looking at the current file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad language support.&lt;/strong&gt; Copilot handles edge-case languages and frameworks better than Gemini in many cases, thanks to years of training on GitHub repositories.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the $0/month stack, Copilot Free is not your primary tool — it is your secondary, used when Gemini does not have an extension for your IDE or when you want a second opinion on a completion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windsurf Free: Powerful but Opaque
&lt;/h3&gt;

&lt;p&gt;Windsurf (formerly Codeium) built its reputation on offering genuinely unlimited free autocomplete. That changed on March 19, 2026, when they moved to a quota-based system. The community reaction was predictably hostile — developers who had integrated Windsurf into daily workflows suddenly faced throttling.&lt;/p&gt;

&lt;p&gt;The current free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tab completions remain unlimited&lt;/strong&gt; in name, but with daily and weekly quotas for AI-powered features (Cascade flows, chat, and more advanced completions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact quota numbers are not publicly documented.&lt;/strong&gt; This is a deliberate choice by Windsurf, and it makes planning around the free tier difficult.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community reports suggest 3-5 meaningful Cascade sessions per day&lt;/strong&gt; before hitting the wall. After that, you wait for the daily or weekly reset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Windsurf's strength is its Cascade system — a multi-step AI workflow that can edit across multiple files, understand project context, and execute complex refactors. This is more powerful than basic autocomplete. But on the free tier, you get a taste rather than a full experience.&lt;/p&gt;

&lt;p&gt;The main argument for including Windsurf in a free stack: it is a standalone IDE (forked from VS Code), which means you can run it alongside your primary editor. When your Gemini or Copilot quota runs low on a particular feature, you can switch to Windsurf for its remaining daily quota.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ollama: The Unlimited Local Fallback
&lt;/h3&gt;

&lt;p&gt;Ollama is not a coding assistant in the traditional sense. It is a model runtime that lets you run open-source LLMs locally on your own hardware. But with the right setup, it becomes a zero-cost, zero-limit, zero-privacy-risk coding companion.&lt;/p&gt;

&lt;p&gt;We wrote a full &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI self-hosting guide&lt;/a&gt; that covers installation, model selection, and deployment in detail. Here is the coding-specific angle:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For code completion:&lt;/strong&gt; Use Ollama with the Continue extension in VS Code. Point it at a code-specialized model like DeepSeek Coder V2 or CodeQwen, and you get autocomplete that works entirely offline. Quality depends on your hardware and model choice — a Mac with 16GB+ RAM can run 7B-14B models comfortably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For chat/debugging:&lt;/strong&gt; Run a larger model (Llama 3.3 70B if your hardware supports it, or Llama 3.1 8B for lower-end machines) and use it for code explanations, debugging help, and review. It will not match GPT-4 or Claude quality, but for "explain this error" or "what does this regex do," it is more than adequate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For privacy:&lt;/strong&gt; This is where Ollama is unbeatable. Every prompt stays on your machine. If you work with proprietary code, client data, or anything you cannot send to a cloud provider, Ollama is not just free — it is the only option that makes compliance teams happy.&lt;/p&gt;

&lt;p&gt;The trade-off is clear: Ollama is unlimited and private, but the model quality is lower than cloud-based tools. It is the safety net in your $0/month stack, not the primary tool.&lt;/p&gt;
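&lt;p&gt;Ollama also exposes a plain HTTP API on &lt;code&gt;localhost:11434&lt;/code&gt;, which makes it easy to script against. Here is a minimal Python sketch that posts to the documented &lt;code&gt;/api/generate&lt;/code&gt; endpoint and degrades gracefully when the server is not running (the model name assumes you have already pulled it):&lt;/p&gt;

```python
import json
import urllib.request
import urllib.error
from typing import Optional

# Ollama's default local endpoint; /api/generate is its documented
# text-generation route. The model name below assumes you pulled it first.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for /api/generate; stream=False asks for one
    complete JSON response instead of chunked output."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(model: str, prompt: str, timeout: float = 30.0) -> Optional[str]:
    """Send a prompt to the local Ollama server; return None if it is down."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, OSError):
        return None  # server not running: fall back to a cloud tool

if __name__ == "__main__":
    answer = ask_ollama("deepseek-coder-v2:16b", "What does this regex do: ^\\d{3}-\\d{4}$")
    print(answer if answer is not None else "Ollama is not running locally.")
```

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; return means the server is down, so a wrapper script can route the prompt to a cloud tool instead.&lt;/p&gt;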




&lt;h2&gt;
  
  
  The $0/Month Stack: How to Combine Everything
&lt;/h2&gt;

&lt;p&gt;Here is the exact stack we recommend for developers who want comprehensive AI coding assistance at zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Primary: Gemini Code Assist Free (Autocomplete + Chat)
&lt;/h3&gt;

&lt;p&gt;Install the Gemini Code Assist extension in VS Code (or JetBrains). This becomes your main autocomplete engine. With 180,000 completions per month and 240 chat messages per day, you will rarely hit limits for day-to-day coding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All autocomplete work&lt;/li&gt;
&lt;li&gt;Quick code questions and explanations (up to 240/day)&lt;/li&gt;
&lt;li&gt;Agent mode for multi-step tasks (watch your daily agent limits)&lt;/li&gt;
&lt;li&gt;Large-context analysis (128K token window is generous for a free tool)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secondary: GitHub Copilot Free (Fallback + Second Opinion)
&lt;/h3&gt;

&lt;p&gt;Keep Copilot installed and active. When Gemini's suggestion does not look right, Copilot's suggestion might. Use your 2,000 completions per month strategically — do not waste them on boilerplate that Gemini handles well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Second-opinion completions on complex logic&lt;/li&gt;
&lt;li&gt;IDE-specific features where Copilot's VS Code integration is stronger&lt;/li&gt;
&lt;li&gt;Framework-specific patterns where Copilot's training data excels (React, Next.js, Rails, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tertiary: Windsurf Free (Multi-File Refactoring)
&lt;/h3&gt;

&lt;p&gt;Install Windsurf IDE for when you need Cascade's multi-file editing capabilities. Do not use it for basic autocomplete — save your quota for the tasks that justify Windsurf's strengths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-file refactors and complex edits&lt;/li&gt;
&lt;li&gt;Project-wide code analysis&lt;/li&gt;
&lt;li&gt;When you need an AI to understand and modify multiple interconnected files&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Emergency Fallback: Ollama (Unlimited + Offline)
&lt;/h3&gt;

&lt;p&gt;Set up Ollama with a code-specific model for when cloud tools are throttled, offline, or you need privacy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After-hours coding when you have burned through daily quotas&lt;/li&gt;
&lt;li&gt;Offline development (travel, unreliable internet)&lt;/li&gt;
&lt;li&gt;Private/proprietary code that cannot be sent to cloud providers&lt;/li&gt;
&lt;li&gt;Experimentation and learning without usage anxiety&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Daily Workflow Example
&lt;/h3&gt;

&lt;p&gt;Here is what a typical day looks like with the $0/month stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Morning (fresh quotas):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code with Gemini Code Assist as primary autocomplete&lt;/li&gt;
&lt;li&gt;Write new features using Gemini completions and chat for questions&lt;/li&gt;
&lt;li&gt;Copilot runs in background for occasional second-opinion suggestions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Afternoon (quota-aware):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you need a multi-file refactor, open Windsurf and use a Cascade session&lt;/li&gt;
&lt;li&gt;Continue using Gemini for autocomplete (180K monthly limit means daily usage is rarely an issue)&lt;/li&gt;
&lt;li&gt;Use Copilot Chat sparingly (50/month budget)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Evening (quotas getting low):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Switch to Ollama for any remaining coding if Windsurf's daily quota is exhausted&lt;/li&gt;
&lt;li&gt;Use Ollama for code review and debugging conversations without usage anxiety&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a perfect substitute for a paid $20/month tool. But for most individual developers, it covers 80-90% of what you would use a paid tool for.&lt;/p&gt;
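&lt;p&gt;The quota-aware routing in that daily workflow boils down to a simple priority rule. This toy sketch is purely illustrative (the function and quota dict are ours, not any tool's API), but it captures the decision logic:&lt;/p&gt;

```python
# A toy quota-aware router mirroring the daily workflow above.
# Tool order reflects this article's recommendations; the function and
# quota dict are illustrative only, not any real tool's API.
def pick_tool(task: str, quotas: dict) -> str:
    """Route a task to the first preferred tool with quota remaining;
    Ollama is the unlimited local fallback."""
    preferences = {
        "autocomplete": ["gemini", "copilot"],
        "chat": ["gemini", "copilot"],
        "multi_file_refactor": ["windsurf"],
    }
    for tool in preferences.get(task, []):
        if quotas.get(tool, 0) > 0:
            return tool
    return "ollama"  # unlimited, offline, private

# Evening scenario: Gemini still has headroom, Copilot and Windsurf are spent.
quotas = {"gemini": 5800, "copilot": 0, "windsurf": 0}
print(pick_tool("autocomplete", quotas))         # gemini
print(pick_tool("multi_file_refactor", quotas))  # ollama
```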




&lt;h2&gt;
  
  
  Setting Up Each Tool (VS Code Focus)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Gemini Code Assist Free Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Personal Gmail account (not Google Workspace), supported region.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code → Extensions → Search "Gemini Code Assist"&lt;/li&gt;
&lt;li&gt;Install the official Google extension&lt;/li&gt;
&lt;li&gt;Click "Sign in with Google" in the sidebar&lt;/li&gt;
&lt;li&gt;Sign in with your personal Gmail account&lt;/li&gt;
&lt;li&gt;Completions start immediately — no configuration needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Recommended settings:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"geminicodeassist.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"geminicodeassist.inlineSuggest.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Gemini Code Assist also works in JetBrains IDEs (IntelliJ, PyCharm, WebStorm), Android Studio, and even Xcode. If you use multiple IDEs, install it everywhere — the 180,000 monthly limit is shared across all installations.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. GitHub Copilot Free Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; GitHub account (free).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code → Extensions → Search "GitHub Copilot"&lt;/li&gt;
&lt;li&gt;Install the official GitHub extension&lt;/li&gt;
&lt;li&gt;Sign in with your GitHub account&lt;/li&gt;
&lt;li&gt;The free tier activates automatically if you do not have a paid plan&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Recommended settings to avoid wasting completions:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github.copilot.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"markdown"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plaintext"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Disabling Copilot for markdown and plaintext prevents it from burning completions on non-code files.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Windsurf IDE Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Windsurf account (free).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download Windsurf from windsurf.com&lt;/li&gt;
&lt;li&gt;Install and open — it looks and feels like VS Code because it is a fork&lt;/li&gt;
&lt;li&gt;Your VS Code extensions and settings can be imported&lt;/li&gt;
&lt;li&gt;Sign up for a free account to activate AI features&lt;/li&gt;
&lt;li&gt;Use Cascade (Cmd/Ctrl+I) for multi-step editing tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can run Windsurf alongside VS Code. They do not conflict. Use VS Code as your primary editor with Gemini/Copilot, and switch to Windsurf when you need Cascade specifically.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Ollama + Continue Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Mac with 8GB+ RAM (16GB+ recommended), or Linux machine.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Ollama:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Pull a code-specific model:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Good balance of quality and speed for coding&lt;/span&gt;
ollama pull deepseek-coder-v2:16b

&lt;span class="c"&gt;# Lighter option for lower-end hardware&lt;/span&gt;
ollama pull codeqwen:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;p&gt;Install the Continue extension in VS Code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extensions → Search "Continue" → Install&lt;/li&gt;
&lt;li&gt;Configure to point at your local Ollama instance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure Continue (&lt;code&gt;.continue/config.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder V2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder-v2:16b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeQwen"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codeqwen:7b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a deeper dive on Ollama setup, model selection, and performance tuning, see our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;complete Ollama + Open WebUI guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Managing Multiple AI Extensions Without Conflicts
&lt;/h2&gt;

&lt;p&gt;Running Gemini Code Assist and GitHub Copilot simultaneously in VS Code is possible but requires configuration to avoid competing suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Prioritize one, disable inline suggestions on the other.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set Gemini as your primary autocomplete and use Copilot only for its chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github.copilot.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"geminicodeassist.inlineSuggest.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"editor.inlineSuggest.enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents Copilot from using your 2,000 monthly completions on autocomplete while keeping Copilot Chat available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Use different tools for different languages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you find that Copilot is better for TypeScript but Gemini excels at Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github.copilot.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typescript"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typescriptreact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"geminicodeassist.inlineSuggest.enable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option C: Run them both and let VS Code arbitrate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VS Code shows inline suggestions from whichever provider responds first. This burns through Copilot completions faster, but some developers prefer the "best of both" approach. We do not recommend this on the free tier unless you are a light coder.&lt;/p&gt;
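&lt;p&gt;Hand-editing the per-language map in Option B is easy to get wrong, so here is a small illustrative Python helper (our own sketch, not part of any extension) that generates the &lt;code&gt;github.copilot.enable&lt;/code&gt; block from a list of VS Code language IDs:&lt;/p&gt;

```python
import json

def copilot_enable_map(allowed_languages: list) -> dict:
    """Build the github.copilot.enable settings block from Option B:
    Copilot stays on for the listed VS Code language IDs and off elsewhere."""
    settings = {lang: True for lang in allowed_languages}
    settings["*"] = False  # default off so Gemini handles everything else
    return {"github.copilot.enable": settings}

print(json.dumps(copilot_enable_map(["typescript", "typescriptreact"]), indent=2))
```

&lt;p&gt;Paste the printed JSON into your &lt;code&gt;settings.json&lt;/code&gt; and adjust the language list as your preferences shift.&lt;/p&gt;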




&lt;h2&gt;
  
  
  Honest Limitations: When You Need to Upgrade to Paid
&lt;/h2&gt;

&lt;p&gt;The $0/month stack has real limitations. Here is where free tiers fall short:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Agentic Coding Tasks
&lt;/h3&gt;

&lt;p&gt;Free tools in 2026 offer limited or no true agentic capabilities. Gemini Code Assist has agent mode with daily limits, but for sustained multi-step autonomous coding — the kind where an AI reads your codebase, plans changes across 10 files, runs tests, and iterates — you need a paid tool.&lt;/p&gt;

&lt;p&gt;This is where Claude Code changes the game. It is a terminal-based agent that operates across your entire repository, understands project context through CLAUDE.md files, and can handle complex multi-step tasks autonomously. We use it daily to run our &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;14-agent AI company&lt;/a&gt;. Nothing in the free tier space comes close.&lt;/p&gt;

&lt;p&gt;If agentic coding is your primary use case, see our &lt;a href="https://www.effloow.com/articles/codex-vs-claude-code-comparison-2026" rel="noopener noreferrer"&gt;deep comparison of Codex vs Claude Code&lt;/a&gt; for a full breakdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Heavy Chat/Debugging Usage
&lt;/h3&gt;

&lt;p&gt;The free tier chat limits are adequate for quick questions but not for the "pair programming" workflow where you have an extended back-and-forth with an AI about architecture decisions, debugging complex issues, or reviewing large code blocks. Gemini's 240 messages per day is decent, but Copilot's 50 per month is almost nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Team Features
&lt;/h3&gt;

&lt;p&gt;Free tiers are designed for individual developers. If you need shared context, admin controls, usage analytics, or consistent tooling across a team, you need paid plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Consistent Quality at Scale
&lt;/h3&gt;

&lt;p&gt;When you are shipping production code on a deadline, you want the best model available without worrying about quotas, throttling, or model rotation. Paid tools give you guaranteed access to frontier models.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Vibe Coding Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.effloow.com/articles/what-is-vibe-coding-developer-trend-2026" rel="noopener noreferrer"&gt;Vibe coding&lt;/a&gt; — the practice of describing what you want in natural language and letting AI write the implementation — works best with powerful agentic tools and generous context windows. Free tiers support basic vibe coding sessions, but the moment you want sustained autonomous development across multiple files, you hit limits quickly.&lt;/p&gt;




&lt;h2&gt;
  
  
  When It Makes Sense to Upgrade: Recommended Paid Options
&lt;/h2&gt;

&lt;p&gt;If you have been using the $0/month stack and keep hitting walls, here are the three paid upgrades that offer the best value:&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot Pro — $10/month
&lt;/h3&gt;

&lt;p&gt;The most cost-effective first upgrade. Unlimited completions remove the 2,000/month ceiling, and premium model access gives Copilot Chat genuine depth. If autocomplete volume is your main bottleneck, this solves it at the lowest cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers whose primary pain point is running out of completions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor Pro — $20/month
&lt;/h3&gt;

&lt;p&gt;If you want a more integrated AI-native IDE experience, Cursor Pro gives you 500 fast premium requests per month with multi-file editing, agentic features, and a polished development environment. The credit-based billing system means you pay for what you use beyond the base allowance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who want AI deeply integrated into their editing workflow with multi-file capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code (Claude Pro) — $20/month
&lt;/h3&gt;

&lt;p&gt;Claude Code is not an IDE plugin — it is a terminal-based coding agent. It reads your entire repository, understands project conventions, and executes multi-step tasks autonomously. If you have tried the free stack and your main frustration is "I wish the AI could just handle this entire feature," Claude Code is the answer.&lt;/p&gt;

&lt;p&gt;We wrote an &lt;a href="https://www.effloow.com/articles/claude-code-advanced-workflow-subagents-commands-multi-session" rel="noopener noreferrer"&gt;advanced Claude Code workflow guide&lt;/a&gt; that covers subagents, custom commands, and multi-session patterns for developers who want to go deep.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who need autonomous, multi-step coding across large codebases.&lt;/p&gt;

&lt;p&gt;For a complete pricing comparison across all tools and tiers, see our &lt;a href="https://www.effloow.com/articles/ai-coding-tools-pricing-comparison-2026" rel="noopener noreferrer"&gt;full pricing breakdown&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I really use Gemini Code Assist Free for commercial projects?
&lt;/h3&gt;

&lt;p&gt;Yes. Google's free individual tier has no restrictions on commercial use. The code you write using Gemini completions is yours. This is the same approach GitHub Copilot takes with its free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will running multiple AI extensions slow down VS Code?
&lt;/h3&gt;

&lt;p&gt;Slightly. Each extension consumes memory and makes network requests. On a modern machine with 16GB+ RAM, the impact is negligible. On an 8GB machine, you might notice slightly longer startup times. The configuration options in the "Managing Multiple Extensions" section above let you minimize resource usage by disabling inline suggestions on secondary tools.&lt;/p&gt;
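&lt;p&gt;For example, if Gemini is your primary completion source, a minimal &lt;code&gt;settings.json&lt;/code&gt; entry can silence Copilot's ghost text while keeping Copilot Chat available. The &lt;code&gt;github.copilot.enable&lt;/code&gt; key is Copilot's own documented setting; other extensions use their own keys, so check each extension's documentation:&lt;/p&gt;

```jsonc
{
  // Disable Copilot inline suggestions in all languages; Copilot Chat still works.
  "github.copilot.enable": { "*": false }
}
```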

&lt;h3&gt;
  
  
  Is Ollama good enough for code completion?
&lt;/h3&gt;

&lt;p&gt;For basic completions, yes — especially with code-specialized models like DeepSeek Coder V2 or CodeQwen. It will not match the quality of Gemini or Copilot's cloud-based models, but for common patterns, boilerplate, and straightforward code, it is surprisingly capable. Think of it as 70-80% of cloud quality at 0% of the cost and 100% privacy.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Amazon Q Developer (formerly CodeWhisperer)?
&lt;/h3&gt;

&lt;p&gt;Amazon Q Developer offers a free tier with code completions and security scanning. It is worth considering if you work primarily in the AWS ecosystem. We did not include it in the primary stack because its IDE support is narrower and its completions tend to favor AWS-specific patterns, but it is a viable addition for AWS-focused developers.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens when the free tiers change or get worse?
&lt;/h3&gt;

&lt;p&gt;This is the risk with any free tool. Windsurf's March 2026 pricing change is a cautionary example. The $0/month stack approach mitigates this risk through diversification — if one tool reduces its free tier, you still have three others providing coverage. Ollama, being fully local and open-source, will never change its pricing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The $0/month AI coding stack in 2026 is genuinely viable for individual developers, hobbyists, students, and anyone who cannot justify $20-30/month for AI coding tools. Gemini Code Assist Free is the centerpiece — its 180,000 monthly completions and 240 daily chat messages cover the vast majority of daily coding needs. Layer in Copilot Free for a second opinion, Windsurf for occasional multi-file work, and Ollama for privacy and unlimited fallback.&lt;/p&gt;

&lt;p&gt;Is it as good as a paid setup? No. You will miss agentic capabilities, hit quota walls during intense coding sessions, and juggle multiple tools where a single paid subscription would be simpler. But for most development work — writing features, fixing bugs, learning new frameworks, building side projects — the free stack gets the job done.&lt;/p&gt;

&lt;p&gt;Start with Gemini Code Assist Free as your primary tool. Add the others one by one as you discover what you need. And when you are ready to upgrade, you will know exactly which paid tool fills the gap that matters most to you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Hetzner Cloud for AI Projects — Complete GPU Server Setup &amp; Cost Breakdown 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:49:53 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/hetzner-cloud-for-ai-projects-complete-gpu-server-setup-cost-breakdown-2026-58i4</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/hetzner-cloud-for-ai-projects-complete-gpu-server-setup-cost-breakdown-2026-58i4</guid>
      <description>&lt;h1&gt;
  
  
  Hetzner Cloud for AI Projects — Complete GPU Server Setup &amp;amp; Cost Breakdown 2026
&lt;/h1&gt;

&lt;p&gt;Running AI workloads on AWS or GCP is expensive. A single A100 instance on AWS costs $3-4 per hour — over $2,000 a month if you leave it running. For startups, indie developers, and small teams experimenting with AI, that math kills projects before they start.&lt;/p&gt;

&lt;p&gt;Hetzner offers an alternative that most of the AI community outside Europe has not discovered yet. Budget cloud instances from €3.99/month for lightweight inference. Dedicated GPU servers with NVIDIA RTX 4000 Ada from €184/month. European data centers with flat monthly pricing and no bandwidth surprises.&lt;/p&gt;

&lt;p&gt;This guide covers the full Hetzner AI server lineup, from $5/month CPU instances running tiny models to dedicated GPU servers handling production workloads. We will walk through actual setup, realistic performance expectations, and an honest cost comparison against AWS and GCP.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Hetzner for AI Workloads
&lt;/h2&gt;

&lt;p&gt;Hetzner is a German hosting company that has been around since 1997. They are not a startup. They run their own data centers in Falkenstein, Nuremberg, and Helsinki. Their pricing has always been aggressive compared to US-based cloud providers, and that gap has only widened as AWS and GCP have raised prices.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Price Gap Is Real
&lt;/h3&gt;

&lt;p&gt;Hetzner's cost advantage is not 10-20% — it is 60-80% for equivalent compute. A Hetzner cloud server with 2 vCPUs and 4 GB RAM costs €3.99/month. A comparable instance on AWS (t3.medium) costs roughly $30/month. DigitalOcean and Vultr sit in between at $15-20/month for similar specs.&lt;/p&gt;

&lt;p&gt;For AI workloads specifically, the gap gets even wider at the GPU tier. Hetzner's dedicated GPU servers start at €184/month. AWS GPU instances (g5.xlarge with A10G) start at roughly $1.00/hour — over $700/month for always-on use.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Hetzner Does Well
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flat monthly pricing.&lt;/strong&gt; No surprise bandwidth bills, no hidden egress charges. Traffic is unlimited on most plans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU data centers.&lt;/strong&gt; Falkenstein and Helsinki give you GDPR compliance by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Straightforward networking.&lt;/strong&gt; Private networks, floating IPs, and load balancers at prices that make sense.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ARM instances.&lt;/strong&gt; Ampere-based CAX servers offer strong performance-per-euro for inference workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Hetzner Does Not Do
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No managed AI/ML services.&lt;/strong&gt; No SageMaker equivalent, no managed Jupyter, no model registries. You manage everything yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No spot/preemptible instances.&lt;/strong&gt; You cannot get cheap burst GPU time. It is flat monthly pricing or nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited GPU availability.&lt;/strong&gt; Dedicated GPU servers can have waitlists. AWS and GCP have broader GPU SKU availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No US data centers.&lt;/strong&gt; If you need sub-50ms latency for US users, Hetzner is not the right choice.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Hetzner AI Server Lineup
&lt;/h2&gt;

&lt;p&gt;Hetzner offers multiple tiers for AI workloads. Here is the full spectrum from budget to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 1: Cost-Optimized Cloud (CX Series) — €3.99-€14.99/mo
&lt;/h3&gt;

&lt;p&gt;These are shared vCPU instances. No GPU. CPU-only inference for small models.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;vCPU&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CX23&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;40 GB SSD&lt;/td&gt;
&lt;td&gt;€3.99/mo (~$4.99)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CX33&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;80 GB SSD&lt;/td&gt;
&lt;td&gt;€6.49/mo (~$8.09)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CX43&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;160 GB SSD&lt;/td&gt;
&lt;td&gt;€11.99/mo (~$14.99)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CX53&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;320 GB SSD&lt;/td&gt;
&lt;td&gt;€22.49/mo (~$28.09)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AI use case:&lt;/strong&gt; Running Ollama with small models (3B-7B parameters) for personal chatbots, lightweight RAG, or API-based inference for low-traffic applications. We covered this exact setup in our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama + Open WebUI self-hosting guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realistic expectations:&lt;/strong&gt; A CX23 can run a 3B model at roughly 3-6 tokens/second (CPU inference). A CX33 can handle a 7-8B model at 1-3 tokens/second. This is usable for async workflows but not for interactive chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 2: ARM Cloud Instances (CAX Series) — Better Performance per Euro
&lt;/h3&gt;

&lt;p&gt;Hetzner's Ampere-based ARM servers offer better compute efficiency than the x86 CX series at comparable price points.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;vCPU (ARM)&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CAX11&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;40 GB SSD&lt;/td&gt;
&lt;td&gt;€4.49/mo (~$5.59)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CAX21&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;80 GB SSD&lt;/td&gt;
&lt;td&gt;€7.99/mo (~$9.99)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CAX31&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;160 GB SSD&lt;/td&gt;
&lt;td&gt;€15.99/mo (~$19.99)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CAX41&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;320 GB SSD&lt;/td&gt;
&lt;td&gt;€31.49/mo (~$39.29)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AI use case:&lt;/strong&gt; ARM chips handle inference workloads efficiently. Ollama has native ARM support, so these servers run small models with lower power draw and often better single-thread performance than the CX series at the same price. Good for always-on inference APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 3: GEX44 — Dedicated GPU Server (€184/mo)
&lt;/h3&gt;

&lt;p&gt;This is where things get serious for AI workloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intel Core i5-13500 (6P + 8E cores, HT)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NVIDIA RTX 4000 SFF Ada Generation, 20 GB GDDR6 ECC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64 GB DDR4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2× 1.92 TB NVMe SSD Gen3 (RAID 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 Gbit/s, unlimited traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup fee&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;€79 (one-time)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;€184/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Locations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Falkenstein (FSN1), Nuremberg (NBG1)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AI use case:&lt;/strong&gt; The RTX 4000 SFF Ada with 20 GB VRAM can run models up to ~32B parameters (4-bit quantized). It handles 7B-14B models comfortably with fast inference. This is the sweet spot for small teams running production AI inference, fine-tuning smaller models, or serving multiple users simultaneously.&lt;/p&gt;

&lt;p&gt;The 20 GB of VRAM is the key spec. It puts this server above consumer RTX 4060/4070 cards (8-12 GB) and into territory where you can run meaningful models without aggressive quantization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier 4: GEX131 — High-End GPU Server
&lt;/h3&gt;

&lt;p&gt;For production AI workloads that need serious GPU compute.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intel Xeon Gold 5412U&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NVIDIA RTX PRO 6000 Blackwell Max-Q, 96 GB GDDR7 ECC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256 GB DDR5 ECC (expandable to 768 GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2× 960 GB NVMe SSD Datacenter Edition (RAID 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 Gbit/s, unlimited traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;€889/mo (~$989)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Locations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Helsinki (HEL1), Falkenstein (FSN1), Nuremberg (NBG1)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AI use case:&lt;/strong&gt; With 96 GB of VRAM, this server can run 70B+ parameter models at full precision, handle multiple concurrent inference requests, or fine-tune large models. The 5th-generation Tensor Cores and Blackwell architecture make this competitive with cloud A100 instances at a fraction of the cost.&lt;/p&gt;

&lt;p&gt;256 GB of system RAM with expansion to 768 GB also makes this viable for large-scale RAG deployments where you need to keep embedding databases in memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Budget Path: Running Small LLMs on CX/CAX Instances
&lt;/h2&gt;

&lt;p&gt;You do not need a GPU to run AI inference. CPU-only inference with quantized models is slow but functional — and incredibly cheap.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Can Run
&lt;/h3&gt;

&lt;p&gt;On a CX23 (€3.99/month, 4 GB RAM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.2 3B (Q4)&lt;/strong&gt; — Fits in ~2-3 GB. General chat and simple tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phi-3.5 Mini 3.8B (Q4)&lt;/strong&gt; — Microsoft's efficient model. Good for code and reasoning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TinyLlama 1.1B&lt;/strong&gt; — Fast even on CPU. Useful for classification and simple generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a CX33 (€6.49/month, 8 GB RAM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.1 8B (Q4)&lt;/strong&gt; — Solid general model. ~5 GB loaded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 2 2B&lt;/strong&gt; — Google's efficient model. Punches above its weight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 2.5 7B (Q4)&lt;/strong&gt; — Excellent for multilingual use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Install Docker and run Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Docker&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://get.docker.com | sh

&lt;span class="c"&gt;# Run Ollama&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama_data:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
  ollama/ollama:latest

&lt;span class="c"&gt;# Pull a model&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama pull llama3.2:3b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a full web interface, add Open WebUI as described in our &lt;a href="https://www.effloow.com/articles/ollama-open-webui-self-hosting-guide-2026" rel="noopener noreferrer"&gt;Ollama self-hosting guide&lt;/a&gt;. If you are running multiple services on the same server, a deployment platform like &lt;a href="https://www.effloow.com/articles/coolify-vs-dokploy-self-hosted-paas-comparison-2026" rel="noopener noreferrer"&gt;Coolify or Dokploy&lt;/a&gt; simplifies container management significantly.&lt;/p&gt;
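&lt;p&gt;Before wiring up a UI, you can verify the API responds directly. The &lt;code&gt;/api/generate&lt;/code&gt; endpoint below is Ollama's documented REST API; the model tag matches the one pulled above.&lt;/p&gt;

```shell
# Single non-streaming generation request against the local Ollama container
PAYLOAD='{"model": "llama3.2:3b", "prompt": "One-line summary of RAID 1.", "stream": false}'
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" || echo "Ollama is not reachable on port 11434"
```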

&lt;h3&gt;
  
  
  Performance Reality Check
&lt;/h3&gt;

&lt;p&gt;CPU inference is measured in single-digit tokens per second. Here is what to expect:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Speed (approx.)&lt;/th&gt;
&lt;th&gt;Usability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TinyLlama 1.1B&lt;/td&gt;
&lt;td&gt;CX23&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 8-15 tok/s&lt;/td&gt;
&lt;td&gt;Responsive for short queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.2 3B&lt;/td&gt;
&lt;td&gt;CX23&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 3-6 tok/s&lt;/td&gt;
&lt;td&gt;Slow but usable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;CX33&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 1-3 tok/s&lt;/td&gt;
&lt;td&gt;Async workflows only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 7B&lt;/td&gt;
&lt;td&gt;CX33&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 1-3 tok/s&lt;/td&gt;
&lt;td&gt;Async workflows only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers are usable for: API backends with tolerant timeouts, batch processing, personal assistants where you can wait a few seconds, and development/testing before deploying to GPU servers.&lt;/p&gt;

&lt;p&gt;They are not usable for: real-time chat with multiple users, latency-sensitive applications, or anything requiring more than a few concurrent requests.&lt;/p&gt;
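&lt;p&gt;To translate those token rates into wait times, divide the expected output length by tokens per second, using the estimated mid-range figures above:&lt;/p&gt;

```shell
# Wall-clock sanity check: output tokens / tokens-per-second = response time
tokens=300      # a few paragraphs of output
tok_per_s=2     # mid-range CX33 estimate from the table above
echo "~$(( tokens / tok_per_s )) seconds per response"   # fine for batch jobs, far too slow for chat
```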




&lt;h2&gt;
  
  
  GPU Path: Setting Up the GEX44
&lt;/h2&gt;

&lt;p&gt;The GEX44 at €184/month is the entry point for serious AI work on Hetzner. Here is how to set it up from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Order and Initial Access
&lt;/h3&gt;

&lt;p&gt;Order from the Hetzner Robot panel. Expect the €79 setup fee on your first invoice. Provisioning typically takes 1-3 business days for dedicated servers, unlike cloud instances, which spin up in seconds.&lt;/p&gt;

&lt;p&gt;Once provisioned, you will receive root SSH access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@your-server-ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Install NVIDIA Drivers
&lt;/h3&gt;

&lt;p&gt;The GEX44 comes with bare metal access. You need to install GPU drivers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update system&lt;/span&gt;
apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install NVIDIA driver dependencies&lt;/span&gt;
apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; build-essential linux-headers-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Install NVIDIA drivers (Ubuntu 22.04/24.04)&lt;/span&gt;
apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nvidia-driver-550

&lt;span class="c"&gt;# Reboot&lt;/span&gt;
reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After reboot, verify the GPU is recognized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the RTX 4000 SFF Ada with 20 GB VRAM listed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Install Docker with GPU Support
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Docker&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://get.docker.com | sh

&lt;span class="c"&gt;# Install NVIDIA Container Toolkit&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://nvidia.github.io/libnvidia-container/gpgkey | &lt;span class="se"&gt;\&lt;/span&gt;
  gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nvidia-container-toolkit
nvidia-ctk runtime configure &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker
systemctl restart docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify Docker can see the GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Deploy Ollama with GPU Acceleration
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.8"&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama_data:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_NUM_PARALLEL=4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_MAX_LOADED_MODELS=2&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;open-webui&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;open_webui_data:/app/backend/data&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_URL=http://ollama:11434&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WEBUI_AUTH=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WEBUI_SECRET_KEY=change-this-to-a-random-string&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open_webui_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Pull Models That Fit 20 GB VRAM
&lt;/h3&gt;

&lt;p&gt;With 20 GB of VRAM, you can run substantial models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 14B model — fits easily, fast inference&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama pull phi-4:14b

&lt;span class="c"&gt;# 32B model (Q4) — fits in ~18 GB, good quality&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama pull qwen2.5:32b-instruct-q4_K_M

&lt;span class="c"&gt;# Coding-specific model&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama pull qwen2.5-coder:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the GEX44 Can Actually Run
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;VRAM Usage&lt;/th&gt;
&lt;th&gt;Speed (GPU)&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;~5 GB&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 40-60 tok/s&lt;/td&gt;
&lt;td&gt;Good general use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phi-4 14B&lt;/td&gt;
&lt;td&gt;~8 GB&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 25-40 tok/s&lt;/td&gt;
&lt;td&gt;Strong reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 Coder 14B&lt;/td&gt;
&lt;td&gt;~8 GB&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 25-40 tok/s&lt;/td&gt;
&lt;td&gt;Excellent for code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 32B (Q4)&lt;/td&gt;
&lt;td&gt;~18 GB&lt;/td&gt;
&lt;td&gt;[ESTIMATED] 12-20 tok/s&lt;/td&gt;
&lt;td&gt;High quality writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3 70B (Q4)&lt;/td&gt;
&lt;td&gt;~35 GB&lt;/td&gt;
&lt;td&gt;Does not fit&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The sweet spot is 14B models. They fit comfortably in 20 GB with room for context, run at speeds that feel interactive, and deliver quality that is genuinely useful for production work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Set Up HTTPS
&lt;/h3&gt;

&lt;p&gt;For remote access, add Caddy as a reverse proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; caddy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;/etc/caddy/Caddyfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;ai.yourdomain.com&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;reverse_proxy&lt;/span&gt; &lt;span class="nf"&gt;localhost&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl reload caddy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Caddy obtains and renews TLS certificates automatically (via Let's Encrypt or ZeroSSL), so there is no manual certificate management. Access your AI at &lt;code&gt;https://ai.yourdomain.com&lt;/code&gt;.&lt;/p&gt;
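&lt;p&gt;If this endpoint is reachable from the public internet, consider gating it. One option is Caddy's built-in basic auth; a sketch (the username and hash below are placeholders, generate a real hash with &lt;code&gt;caddy hash-password&lt;/code&gt;; the directive is spelled &lt;code&gt;basicauth&lt;/code&gt; on older Caddy 2 releases):&lt;/p&gt;

```caddyfile
ai.yourdomain.com {
    basic_auth {
        admin REPLACE_WITH_OUTPUT_OF_caddy_hash-password
    }
    reverse_proxy localhost:3000
}
```

&lt;p&gt;Reload Caddy after editing, and every request will prompt for credentials before reaching the upstream service.&lt;/p&gt;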




&lt;h2&gt;
  
  
  Cost Comparison: Hetzner vs AWS vs GCP
&lt;/h2&gt;

&lt;p&gt;Here is an honest comparison for equivalent GPU compute, based on always-on monthly pricing as of early 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  Entry-Level GPU Tier
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hetzner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GEX44&lt;/td&gt;
&lt;td&gt;RTX 4000 SFF Ada&lt;/td&gt;
&lt;td&gt;20 GB&lt;/td&gt;
&lt;td&gt;€184/mo (~$230)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;g5.xlarge&lt;/td&gt;
&lt;td&gt;A10G&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;~$760/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;g2-standard-4&lt;/td&gt;
&lt;td&gt;L4&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;~$580/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;gpu_1x_a10&lt;/td&gt;
&lt;td&gt;A10&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;~$440/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Hetzner is 2.5-3.3× cheaper than hyperscalers for comparable GPU compute. The trade-offs: no managed ML services, manual setup, and data centers concentrated in Europe (Hetzner's GPU servers are available only in Germany and Finland).&lt;/p&gt;
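&lt;p&gt;The multiples follow directly from the table; a quick sanity check using the approximate monthly USD figures from above:&lt;/p&gt;

```shell
# Compare each provider's monthly price against Hetzner's GEX44 (~$230/mo)
hetzner=230
for entry in "AWS:760" "GCP:580" "Lambda:440"; do
  name=${entry%%:*}
  cost=${entry##*:}
  awk -v n="$name" -v c="$cost" -v h="$hetzner" \
    'BEGIN { printf "%s: %.1fx Hetzner, ~$%d/yr saved\n", n, c/h, (c-h)*12 }'
done
# AWS: 3.3x Hetzner, ~$6360/yr saved
# GCP: 2.5x Hetzner, ~$4200/yr saved
# Lambda: 1.9x Hetzner, ~$2520/yr saved
```

&lt;p&gt;Over a year, the always-on difference versus AWS alone covers several months of a second GEX44.&lt;/p&gt;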

&lt;h3&gt;
  
  
  Budget CPU Tier (No GPU)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;vCPU&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hetzner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CX23&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;€3.99/mo (~$5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;t3.medium&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;~$30/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;e2-medium&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;~$25/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DigitalOcean&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;~$18/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At the budget tier, Hetzner is 4-6× cheaper. This is where it shines for development, testing, and low-traffic inference.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Cloud Providers Offer That Hetzner Does Not
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS SageMaker / GCP Vertex AI&lt;/strong&gt; — Managed model training, deployment, and monitoring. If you need MLOps at scale, Hetzner's bare metal cannot compete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spot/preemptible instances&lt;/strong&gt; — AWS spot pricing can bring GPU costs down 60-70% for interruptible workloads. Hetzner has no equivalent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global regions&lt;/strong&gt; — AWS has 30+ regions worldwide. Hetzner's footprint is concentrated in Europe, with only a handful of cloud locations elsewhere and GPU servers limited to Germany and Finland.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-scaling&lt;/strong&gt; — Cloud providers scale GPU instances based on demand. Hetzner dedicated servers are fixed capacity.&lt;/li&gt;
&lt;/ul&gt;
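&lt;p&gt;The spot discount is worth quantifying: at 60-70% off, an interruptible g5.xlarge lands in roughly the same monthly range as Hetzner's flat rate, which is exactly why spot only matters for workloads that tolerate interruption:&lt;/p&gt;

```shell
# g5.xlarge on-demand is ~$760/mo; a 60-70% spot discount leaves 30-40% of that
awk 'BEGIN { printf "spot range: ~$%.0f-$%.0f/mo vs Hetzner GEX44 at ~$230/mo\n", 760*0.30, 760*0.40 }'
# spot range: ~$228-$304/mo vs Hetzner GEX44 at ~$230/mo
```

&lt;p&gt;For always-on inference, spot interruptions translate into dropped requests, so the flat dedicated price usually wins anyway.&lt;/p&gt;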

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Hetzner wins on predictable, always-on workloads where you know your compute needs. Hyperscalers win on variable demand, managed services, and global distribution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment with Docker and Coolify
&lt;/h2&gt;

&lt;p&gt;If you are running multiple AI services (Ollama, vector databases, monitoring) alongside other applications on the same Hetzner server, manual Docker Compose management gets tedious.&lt;/p&gt;

&lt;p&gt;This is where a self-hosted PaaS like Coolify or Dokploy adds value. We compared both platforms in detail in our &lt;a href="https://www.effloow.com/articles/coolify-vs-dokploy-self-hosted-paas-comparison-2026" rel="noopener noreferrer"&gt;Coolify vs Dokploy comparison&lt;/a&gt;. The short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coolify&lt;/strong&gt; — More mature, better for multi-service deployments, built-in database management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dokploy&lt;/strong&gt; — Simpler, lighter footprint, good if Ollama is your primary workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either one gives you a web dashboard for managing containers, automatic SSL, Git-based deployments, and basic monitoring — without touching the command line every time you need to update a container.&lt;/p&gt;

&lt;p&gt;For a full walkthrough of running Coolify on Hetzner alongside other developer tools, see our &lt;a href="https://www.effloow.com/articles/self-host-dev-stack-under-20-dollars-month" rel="noopener noreferrer"&gt;self-hosting dev stack guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Our Infrastructure at Effloow
&lt;/h2&gt;

&lt;p&gt;At Effloow, we run &lt;a href="https://www.effloow.com/articles/how-we-built-company-with-14-ai-agents" rel="noopener noreferrer"&gt;14 AI agents&lt;/a&gt; that handle everything from content research to code generation. Our infrastructure choices reflect the same cost-conscious thinking behind this guide.&lt;/p&gt;

&lt;p&gt;We use Hetzner cloud instances for non-GPU workloads: deployment platforms, Git hosting, monitoring, and lightweight services. The flat monthly pricing means our infrastructure bill is predictable regardless of how many articles the agents produce.&lt;/p&gt;

&lt;p&gt;For AI inference specifically, we use a mix of API services (Claude, GPT) for tasks requiring frontier intelligence and self-hosted models for high-volume, lower-complexity work. The GEX44 tier is compelling for teams at our stage — it is enough GPU to run production inference at a cost that does not require venture capital to sustain.&lt;/p&gt;

&lt;p&gt;The decision framework we use internally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need frontier intelligence (complex reasoning, creative work)?&lt;/strong&gt; → Use API services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need high-volume, predictable inference?&lt;/strong&gt; → Self-host on Hetzner GPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need lightweight, always-on AI?&lt;/strong&gt; → CX/CAX instance with small models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need managed MLOps at scale?&lt;/strong&gt; → Use AWS/GCP (we do not, but many teams should).&lt;/li&gt;
&lt;/ul&gt;
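&lt;p&gt;The framework above is simple enough to sketch as a routing function. This is illustrative only, not our actual tooling; the task labels and backend names are made up:&lt;/p&gt;

```shell
# Map a task profile to an inference backend (illustrative sketch)
route_inference() {
  case "$1" in
    frontier)    echo "api" ;;          # complex reasoning, creative work
    high-volume) echo "hetzner-gpu" ;;  # predictable bulk inference (GEX44 tier)
    lightweight) echo "hetzner-cpu" ;;  # small always-on models (CX/CAX)
    mlops)       echo "hyperscaler" ;;  # managed training and monitoring
    *)           echo "api" ;;          # default to API services when unsure
  esac
}

route_inference high-volume   # prints: hetzner-gpu
```

&lt;p&gt;The default branch matters: when a task does not clearly fit a cheap backend, falling back to frontier APIs costs money but never quality.&lt;/p&gt;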




&lt;h2&gt;
  
  
  Choosing the Right Tier
&lt;/h2&gt;

&lt;p&gt;Here is a quick decision guide:&lt;/p&gt;

&lt;h3&gt;
  
  
  CX23 (€3.99/mo) — Start Here If...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You are experimenting with self-hosted AI for the first time&lt;/li&gt;
&lt;li&gt;You need a personal chatbot or simple RAG pipeline&lt;/li&gt;
&lt;li&gt;Your queries are infrequent and latency is not critical&lt;/li&gt;
&lt;li&gt;Budget is the primary constraint&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CX33/CAX31 (€6.49-€10/mo) — Upgrade When...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need 7-8B models with slightly better response times&lt;/li&gt;
&lt;li&gt;You are running the AI alongside other services (Git, CI, monitoring)&lt;/li&gt;
&lt;li&gt;Multiple people on your team need occasional access&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GEX44 (€184/mo) — The AI Sweet Spot If...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need interactive-speed inference (30+ tokens/second)&lt;/li&gt;
&lt;li&gt;You want to run 14B-32B models with real quality&lt;/li&gt;
&lt;li&gt;Multiple users need concurrent access&lt;/li&gt;
&lt;li&gt;You are building products or services that rely on AI inference&lt;/li&gt;
&lt;li&gt;Fine-tuning smaller models is part of your workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GEX131 — Production AI If...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need 70B+ models at full precision&lt;/li&gt;
&lt;li&gt;Multi-user production inference is a requirement&lt;/li&gt;
&lt;li&gt;You are fine-tuning large models regularly&lt;/li&gt;
&lt;li&gt;You need 96 GB VRAM for large embedding databases or multi-model serving&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started: Your First Hour
&lt;/h2&gt;

&lt;p&gt;If you are new to Hetzner, here is the fastest path to running AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Sign up at hetzner.com and create a cloud project&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create a CX23 instance (€3.99/mo) via the console&lt;/span&gt;
&lt;span class="c"&gt;#    - Choose Ubuntu 24.04&lt;/span&gt;
&lt;span class="c"&gt;#    - Add your SSH key&lt;/span&gt;
&lt;span class="c"&gt;#    - Pick Falkenstein or Helsinki&lt;/span&gt;

&lt;span class="c"&gt;# 3. SSH into your server&lt;/span&gt;
ssh root@your-server-ip

&lt;span class="c"&gt;# 4. Install Docker&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://get.docker.com | sh

&lt;span class="c"&gt;# 5. Run Ollama&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama_data:/root/.ollama ollama/ollama:latest

&lt;span class="c"&gt;# 6. Pull a small model&lt;/span&gt;
docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama pull llama3.2:3b

&lt;span class="c"&gt;# 7. Test it&lt;/span&gt;
curl http://localhost:11434/api/generate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "llama3.2:3b", "prompt": "Hello, how are you?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time: under 10 minutes. Total cost: €3.99 for the first month.&lt;/p&gt;

&lt;p&gt;When you outgrow the CX23, migrate your Ollama data volume to a bigger instance. When you need GPU speed, order a GEX44 and follow the GPU setup section above.&lt;/p&gt;
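&lt;p&gt;Migrating the volume is a tar round-trip through a throwaway container. A sketch, assuming the &lt;code&gt;ollama_data&lt;/code&gt; volume name from the commands above and a placeholder address for the new server:&lt;/p&gt;

```shell
# On the old server: archive the volume's contents into the current directory
docker run --rm -v ollama_data:/data -v "$PWD":/backup alpine \
  tar czf /backup/ollama_data.tgz -C /data .

# Copy the archive over (placeholder address)
scp ollama_data.tgz root@new-server-ip:/root/

# On the new server: restore into a fresh volume, then start Ollama as before
docker volume create ollama_data
docker run --rm -v ollama_data:/data -v /root:/backup alpine \
  tar xzf /backup/ollama_data.tgz -C /data
```

&lt;p&gt;Because Ollama keeps everything under &lt;code&gt;/root/.ollama&lt;/code&gt;, restoring the volume means no models need to be re-pulled on the new instance.&lt;/p&gt;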




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hetzner is not the right choice for every AI workload. If you need managed ML services, global data centers, or spot pricing for burst GPU compute, the hyperscalers are still the answer.&lt;/p&gt;

&lt;p&gt;But for predictable, always-on AI infrastructure at a fraction of the cost — personal AI assistants, team inference servers, self-hosted chatbots, development and testing environments — Hetzner is hard to beat.&lt;/p&gt;

&lt;p&gt;The lineup covers the full spectrum: €3.99/month for experimentation, €184/month for production GPU inference, and higher tiers for serious AI workloads. All with flat pricing, generous bandwidth allowances, and EU data residency.&lt;/p&gt;

&lt;p&gt;Start with a CX23 and a 3B model. See if self-hosted inference fits your workflow. If it does, the upgrade path is straightforward — bigger instances, better models, and eventually dedicated GPU hardware, all from the same provider.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>selfhosted</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
