<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sachin Singh</title>
    <description>The latest articles on DEV Community by Sachin Singh (@sachinsingh2156).</description>
    <link>https://dev.to/sachinsingh2156</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3991771%2Fa925b91a-3e3e-4a6d-be39-206d9e4b0ab3.gif</url>
      <title>DEV Community: Sachin Singh</title>
      <link>https://dev.to/sachinsingh2156</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sachinsingh2156"/>
    <language>en</language>
    <item>
      <title>From Chaos to Consistency: Docker for Modern AI Workflows</title>
      <dc:creator>Sachin Singh</dc:creator>
      <pubDate>Fri, 19 Jun 2026 05:08:19 +0000</pubDate>
      <link>https://dev.to/sachinsingh2156/from-chaos-to-consistency-docker-for-modern-ai-workflows-2gb7</link>
      <guid>https://dev.to/sachinsingh2156/from-chaos-to-consistency-docker-for-modern-ai-workflows-2gb7</guid>
      <description>&lt;p&gt;You trained the model. The notebook runs. The demo works. You push it to a teammate, and forty minutes later you get the message every engineer dreads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey, I'm getting a CUDA error. And &lt;code&gt;torch&lt;/code&gt; won't import. And what version of Python is this?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And you say the six words that have haunted software since the dawn of time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"But it works on my machine."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the uncomfortable truth: &lt;em&gt;"it works on my machine"&lt;/em&gt; isn't a defense. It's a confession. It means your code depends on something living on your laptop that you never wrote down: a Python version, a system library, a CUDA toolkit, a stray environment variable, a model file sitting in &lt;code&gt;~/Downloads&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Docker is how you stop making that confession. Let's fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: AI projects are dependency monsters
&lt;/h2&gt;

&lt;p&gt;A typical web app has a handful of dependencies. An AI project has &lt;em&gt;layers&lt;/em&gt; of them, and each layer can betray you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python packages&lt;/strong&gt;: &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;transformers&lt;/code&gt;, &lt;code&gt;numpy&lt;/code&gt;, and the version conflicts between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System libraries&lt;/strong&gt;: things like &lt;code&gt;libgl1&lt;/code&gt; or &lt;code&gt;ffmpeg&lt;/code&gt; that pip won't install for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The CUDA / driver stack&lt;/strong&gt;: the single most common reason "it works on my machine" and nowhere else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The model weights themselves&lt;/strong&gt;: multi-gigabyte files that aren't in your repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python itself&lt;/strong&gt;: 3.10 on your laptop, 3.12 on the server, subtle breakage everywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;requirements.txt&lt;/code&gt; captures &lt;em&gt;one&lt;/em&gt; of those five layers. Docker captures all of them. That's the whole pitch.&lt;/p&gt;
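
&lt;p&gt;To see how much of this is usually left unwritten, here is a minimal sketch that prints what a Python process actually sees. The function name &lt;code&gt;environment_fingerprint&lt;/code&gt; is made up for illustration, and it only covers the Python-visible layers (interpreter, OS, packages), not system libraries or the CUDA stack.&lt;/p&gt;

```python
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Return the interpreter, OS, and package versions this process sees."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    report = environment_fingerprint(["torch", "transformers", "numpy"])
    print(json.dumps(report, indent=2))
```

&lt;p&gt;Run it on two machines and compare the output. Any difference is a dependency nobody wrote down.&lt;/p&gt;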

&lt;h2&gt;
  
  
  What Docker actually is
&lt;/h2&gt;

&lt;p&gt;Forget the whale logo and the buzzwords for a second.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Docker image&lt;/strong&gt; is a frozen snapshot of everything your program needs to run: the operating system's files, Python, your packages, your code, and your config, all baked into one shareable package. A &lt;strong&gt;container&lt;/strong&gt; is a running copy of that snapshot.&lt;/p&gt;

&lt;p&gt;The mental model that makes it click: a virtual machine simulates an entire computer including its own operating system kernel, which is heavy and slow. A container shares your machine's kernel and only packages everything &lt;em&gt;above&lt;/em&gt; it. So it boots in seconds, not minutes, and a single image runs the same way on your laptop, your teammate's laptop, and a cloud GPU server, as long as the CPU architecture matches.&lt;/p&gt;

&lt;p&gt;You write the recipe once. Everyone gets the exact same kitchen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your first Dockerfile for a PyTorch project
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;Dockerfile&lt;/code&gt; is just that recipe, a plain text file of instructions. Here's a real one for a PyTorch project, with every line explained:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start from an official Python image. The "-slim" variant is smaller.&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;

&lt;span class="c"&gt;# Install system libraries that pip can't. Many vision/audio&lt;/span&gt;
&lt;span class="c"&gt;# libraries need these, and forgetting them is a classic&lt;/span&gt;
&lt;span class="c"&gt;# "works on my machine" trap.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    build-essential &lt;span class="se"&gt;\
&lt;/span&gt;    libgl1 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Set the working directory inside the container.&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Copy ONLY requirements first, then install.&lt;/span&gt;
&lt;span class="c"&gt;# This is a caching trick, see the note below.&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Now copy the rest of your code.&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="c"&gt;# The command that runs when the container starts.&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "predict.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup relies on two habits that head off common beginner mistakes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pin your versions.&lt;/strong&gt; Your &lt;code&gt;requirements.txt&lt;/code&gt; should look like this, not just bare package names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;torch==2.3.1
transformers==4.41.2
fastapi==0.111.0
uvicorn==0.30.1
numpy==1.26.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;torch&lt;/code&gt; without a version is a future outage waiting to happen. The whole point of Docker is reproducibility, so don't undermine it by letting versions float.&lt;/p&gt;
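
&lt;p&gt;A small check along these lines can catch floating versions before they reach a build. This is a rough sketch, not a full requirements parser: the helper name &lt;code&gt;unpinned_requirements&lt;/code&gt; is hypothetical, and it treats only an exact &lt;code&gt;==&lt;/code&gt; pin as acceptable.&lt;/p&gt;

```python
import re

# Matches "name==version", optionally with extras such as "pkg[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^=\s]+")


def unpinned_requirements(text):
    """Return requirement lines that lack an exact '==' pin."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


if __name__ == "__main__":
    sample = "torch==2.3.1\ntransformers\nnumpy>=1.26\n"
    print(unpinned_requirements(sample))
```

&lt;p&gt;For the sample above it reports &lt;code&gt;transformers&lt;/code&gt; and &lt;code&gt;numpy&gt;=1.26&lt;/code&gt; as unpinned and lets &lt;code&gt;torch==2.3.1&lt;/code&gt; through.&lt;/p&gt;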

&lt;p&gt;&lt;strong&gt;2. Copy &lt;code&gt;requirements.txt&lt;/code&gt; before your code.&lt;/strong&gt; Docker builds in layers and caches each one. If you copy everything at once, changing a single line of code forces it to reinstall &lt;code&gt;torch&lt;/code&gt; (a multi-minute download) every single build. By copying requirements first, Docker reuses the cached install layer and only re-runs steps that actually changed. Your build goes from minutes to seconds.&lt;/p&gt;
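
&lt;p&gt;The caching behaviour can be sketched with a toy model. This is not Docker's real implementation, only an illustration of the idea that each layer's cache key depends on the layer before it, the instruction, and the files that instruction copies. All names here are invented for the example.&lt;/p&gt;

```python
import hashlib


def layer_keys(steps, files):
    """Toy cache keys: previous key + instruction text + copied file contents."""
    keys = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in copied:
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """Instructions whose cache key changes between two versions of the files."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [steps[i][0] for i in range(len(steps)) if old[i] != new[i]]


REQS_FIRST = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "predict.py"]),
]
CODE_FIRST = [
    ("COPY . .", ["requirements.txt", "predict.py"]),
    ("RUN pip install -r requirements.txt", []),
]

# Only the code changes between the two builds.
before = {"requirements.txt": "torch==2.3.1", "predict.py": "print('v1')"}
after = {"requirements.txt": "torch==2.3.1", "predict.py": "print('v2')"}

if __name__ == "__main__":
    print(rebuilt_steps(REQS_FIRST, before, after))
    print(rebuilt_steps(CODE_FIRST, before, after))
```

&lt;p&gt;With requirements copied first, only the final &lt;code&gt;COPY . .&lt;/code&gt; step is rebuilt. With everything copied at once, the &lt;code&gt;pip install&lt;/code&gt; step is invalidated too, because the layer it sits on has changed.&lt;/p&gt;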

&lt;p&gt;To build and run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; my-model &lt;span class="nb"&gt;.&lt;/span&gt;
docker run my-model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;-t my-model&lt;/code&gt; just names the image. The &lt;code&gt;.&lt;/code&gt; is the build context: Docker uses the current folder as the source for &lt;code&gt;COPY&lt;/code&gt; and, by default, looks for the &lt;code&gt;Dockerfile&lt;/code&gt; there. That's it. You now have a portable, reproducible model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't bake your model weights into the image
&lt;/h2&gt;

&lt;p&gt;Beginners often &lt;code&gt;COPY model.bin&lt;/code&gt; straight into the image. Don't. A 5GB image is painful to build, push, and pull, and you'll rebuild it every time the weights change.&lt;/p&gt;

&lt;p&gt;Instead, keep large files &lt;em&gt;outside&lt;/em&gt; the image and mount them at runtime with a &lt;strong&gt;volume&lt;/strong&gt; mount, a folder shared between your machine and the container (strictly speaking, &lt;code&gt;-v&lt;/code&gt; with a host path creates a bind mount):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/models:/app/models my-model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This maps your local &lt;code&gt;models/&lt;/code&gt; folder to &lt;code&gt;/app/models&lt;/code&gt; inside the container. The weights live on the host's disk, the image stays lean, and you can swap models without rebuilding anything.&lt;/p&gt;
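
&lt;p&gt;Because the weights now arrive through a mount, a forgotten &lt;code&gt;-v&lt;/code&gt; flag shows up as a missing file at runtime. A small guard like the following can turn that into a readable error. It is a sketch: &lt;code&gt;resolve_model_path&lt;/code&gt; and the &lt;code&gt;MODEL_DIR&lt;/code&gt; variable are assumptions for illustration, with the default matching the &lt;code&gt;/app/models&lt;/code&gt; mount target.&lt;/p&gt;

```python
import os
from pathlib import Path


def resolve_model_path(filename="model.pt", env=None):
    """Find a weights file in the mounted models directory, or fail with a hint."""
    env = os.environ if env is None else env
    # MODEL_DIR is a hypothetical override; /app/models is the mount target.
    model_dir = Path(env.get("MODEL_DIR", "/app/models"))
    path = model_dir / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Was the models folder mounted, "
            f"e.g. with -v $(pwd)/models:{model_dir} ?"
        )
    return path
```

&lt;p&gt;Calling &lt;code&gt;resolve_model_path()&lt;/code&gt; at startup fails fast with a message that points at the mount, instead of a bare stack trace from the model loader.&lt;/p&gt;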

&lt;h2&gt;
  
  
  Serving a model as an API
&lt;/h2&gt;

&lt;p&gt;Most of the time you don't just want to run a script. You want a model behind an endpoint your app can call. Here's a minimal FastAPI server, &lt;code&gt;app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Load the model ONCE at startup, not on every request.
# This is the single biggest performance mistake beginners make.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/app/models/model.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;map_location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/predict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Placeholder call: most real models need the text tokenized into tensors first.&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# A tensor is not JSON-serializable; convert it (for example with .tolist()) before returning.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prediction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the model loads &lt;em&gt;once&lt;/em&gt; when the server boots, not inside &lt;code&gt;predict()&lt;/code&gt;. Loading weights on every request will make your API crawl, a mistake that's easy to miss until production traffic hits.&lt;/p&gt;

&lt;p&gt;Now adjust the Dockerfile's last line to launch the server instead of a script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;--host 0.0.0.0&lt;/code&gt; matters. Inside a container, the default &lt;code&gt;127.0.0.1&lt;/code&gt; means "only reachable from inside this container", so requests from outside would never arrive. Binding to &lt;code&gt;0.0.0.0&lt;/code&gt; makes the server reachable. Then map the port when you run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/models:/app/models my-model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



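&lt;p&gt;&lt;code&gt;-p 8000:8000&lt;/code&gt; connects port 8000 on your machine to 8000 in the container. Open &lt;code&gt;http://localhost:8000/health&lt;/code&gt; in a browser to check the server is up, then send a POST request with a JSON body to &lt;code&gt;http://localhost:8000/predict&lt;/code&gt; and you're serving a model from a container.&lt;/p&gt;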
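
&lt;p&gt;A client for the &lt;code&gt;/predict&lt;/code&gt; endpoint might look like the sketch below. It uses only the standard library, and the helper names are made up for illustration. The JSON body follows the &lt;code&gt;text&lt;/code&gt; field defined in &lt;code&gt;app.py&lt;/code&gt;.&lt;/p&gt;

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /predict endpoint defined in app.py."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and decode the JSON reply; needs the container running."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

&lt;p&gt;With the container running, &lt;code&gt;predict("some text")&lt;/code&gt; returns whatever dictionary the server sends back.&lt;/p&gt;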

&lt;h2&gt;
  
  
  When one container isn't enough: docker-compose
&lt;/h2&gt;

&lt;p&gt;Real AI apps rarely run alone. You've got your model API, plus maybe a Redis cache for results and a vector database for embeddings. Starting three containers by hand, with the right flags and in the right order, gets old fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt; lets you define your whole stack in one &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;model-api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./models:/app/models&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;

  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7-alpine&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379:6379"&lt;/span&gt;

  &lt;span class="na"&gt;vector-db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qdrant/qdrant:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6333:6333"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./qdrant_data:/qdrant/storage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the entire stack starts with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command, three services, wired together and talking to each other. And because services can reach each other by name, your API connects to Redis at &lt;code&gt;redis:6379&lt;/code&gt;, with no IP addresses to chase down. Shut it all down with &lt;code&gt;docker compose down&lt;/code&gt;. This is the moment most people fall in love with Docker.&lt;/p&gt;
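
&lt;p&gt;One caveat worth knowing: in its short form, &lt;code&gt;depends_on&lt;/code&gt; controls start order but does not wait for Redis to accept connections. A small startup check in the API can cover that gap. This is a sketch using plain TCP, and &lt;code&gt;wait_for_port&lt;/code&gt; is a hypothetical helper, not part of Docker or Redis.&lt;/p&gt;

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False if the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

&lt;p&gt;Inside the &lt;code&gt;model-api&lt;/code&gt; container, &lt;code&gt;wait_for_port("redis", 6379)&lt;/code&gt; uses the service name as the hostname, exactly as described above.&lt;/p&gt;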

&lt;h2&gt;
  
  
  A few habits that separate pros from beginners
&lt;/h2&gt;

&lt;p&gt;A short list of things worth doing from day one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;.dockerignore&lt;/code&gt; file.&lt;/strong&gt; Just like &lt;code&gt;.gitignore&lt;/code&gt;, it keeps junk out of your image. At minimum: &lt;code&gt;__pycache__&lt;/code&gt;, &lt;code&gt;.git&lt;/code&gt;, &lt;code&gt;venv&lt;/code&gt;, &lt;code&gt;*.pt&lt;/code&gt;, and &lt;code&gt;data/&lt;/code&gt;. Without it, you'll accidentally copy gigabytes of cache and datasets into your build.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;-slim&lt;/code&gt; or official ML base images.&lt;/strong&gt; &lt;code&gt;python:3.11-slim&lt;/code&gt; over the full image saves hundreds of megabytes. For GPU work, start from an official CUDA-enabled base like &lt;code&gt;pytorch/pytorch&lt;/code&gt; so the CUDA libraries are handled for you. The host still needs an NVIDIA driver and the NVIDIA Container Toolkit, and the container must be started with &lt;code&gt;--gpus all&lt;/code&gt; to see the GPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One process per container.&lt;/strong&gt; Resist the urge to cram your API, database, and worker into one container. Split them; that's exactly what Compose is for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never bake secrets into images.&lt;/strong&gt; API keys and tokens go in environment variables (&lt;code&gt;-e MY_KEY=...&lt;/code&gt; or a &lt;code&gt;.env&lt;/code&gt; file), never hardcoded into the Dockerfile. Anyone with the image can read what's baked in.&lt;/li&gt;
&lt;/ul&gt;
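
&lt;p&gt;On the application side, reading a secret from the environment can be as small as the sketch below. The helper name &lt;code&gt;require_env&lt;/code&gt; is an assumption for illustration, and &lt;code&gt;MY_KEY&lt;/code&gt; is the placeholder name from the list above.&lt;/p&gt;

```python
import os


def require_env(name, env=None):
    """Read a required setting from the environment, failing loudly if absent."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name}; "
            f"pass it with -e {name}=... or an env file"
        )
    return value
```

&lt;p&gt;Calling &lt;code&gt;require_env("MY_KEY")&lt;/code&gt; at startup keeps the key out of the image and makes a missing value obvious the moment the container boots.&lt;/p&gt;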

&lt;h2&gt;
  
  
  The payoff
&lt;/h2&gt;

&lt;p&gt;Go back to that teammate who couldn't run your model. With Docker, the entire conversation becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone your-repo
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two commands. Same Python, same CUDA, same packages, same everything, whether on their laptop, on the cloud GPU, or on the production server. No "what version are you on?" No "did you install ffmpeg?" No 40-minute debugging session.&lt;/p&gt;

&lt;p&gt;You don't need to master Kubernetes or become a DevOps engineer to get this. You just need a &lt;code&gt;Dockerfile&lt;/code&gt;, a pinned &lt;code&gt;requirements.txt&lt;/code&gt;, and maybe a &lt;code&gt;docker-compose.yml&lt;/code&gt;. Start with the PyTorch example above, get one model running in a container today, and build from there.&lt;/p&gt;

&lt;p&gt;The next time someone asks if your project works on their machine, you'll already know the answer.&lt;/p&gt;

&lt;p&gt;It works on &lt;em&gt;every&lt;/em&gt; machine.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a comment with the trickiest "works on my machine" bug you've hit.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>docker</category>
      <category>python</category>
    </item>
  </channel>
</rss>
