<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joshua Hall</title>
    <description>The latest articles on DEV Community by Joshua Hall (@joshjhall).</description>
    <link>https://dev.to/joshjhall</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3291477%2F00f3c632-e4a9-41ce-a8a9-0aa3e910fd71.jpeg</url>
      <title>DEV Community: Joshua Hall</title>
      <link>https://dev.to/joshjhall</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joshjhall"/>
    <language>en</language>
    <item>
      <title>Building a Universal Container System (So I Never Have to Write Another Dockerfile)</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Sat, 13 Jun 2026 14:30:00 +0000</pubDate>
      <link>https://dev.to/joshjhall/building-a-universal-container-system-so-i-never-have-to-write-another-dockerfile-181c</link>
      <guid>https://dev.to/joshjhall/building-a-universal-container-system-so-i-never-have-to-write-another-dockerfile-181c</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a modular Dockerfile system that lets you compose dev and prod containers with build arguments instead of writing custom Dockerfiles. It includes 28 feature modules covering 100+ tools, weekly automated version updates with full testing, and support for Python, Node, Rust, Go, Kubernetes, and more. It saves me anywhere from a few hours to a day or two per project setup, plus a lot of downstream effort keeping environments updated and consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem That Wouldn't Go Away
&lt;/h2&gt;

&lt;p&gt;You know that feeling when you're starting a new project and you think "oh great, I get to set up a devcontainer again"? That was me, every single time. Another day lost to writing yet another Dockerfile and configuring Python, Node, databases, cloud tools, and dev tools... rinse and repeat.&lt;/p&gt;

&lt;p&gt;After several projects, I realized something depressing: I was literally copying and pasting the same 300 lines of Dockerfile with minor tweaks. The only thing changing was whether I needed Python 3.12 or 3.13, or if this project used Postgres or MySQL.&lt;/p&gt;

&lt;p&gt;But the real pain hit when maintenance season arrived. Python releases a security patch? Great, now I get to update six different repos. New team security practice? Update six Dockerfiles. New teammate starts a project? They copy the Dockerfile from the last project (which was already out of date), and now we have seven slightly different configurations.&lt;/p&gt;

&lt;p&gt;This is ridiculous, I thought. I'm a designer who understands software engineering best practices and patterns; I should be able to solve this.&lt;/p&gt;

&lt;p&gt;The core problem wasn't just the repetition. It was the maintenance burden. I don't have time to manually track when Python 3.13.7 comes out, or when kubectl updates, or when some npm package has a critical vulnerability. I needed automation that would handle the routine stuff and only bother me when something actually broke.&lt;/p&gt;

&lt;p&gt;So I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea
&lt;/h2&gt;

&lt;p&gt;What if, instead of writing custom Dockerfiles, you could just declare what you want?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; myproject:dev &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; containers/Dockerfile &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;INCLUDE_NODE_DEV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;NODE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;PYTHON_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.13 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No custom Dockerfile to write or maintain. Just build arguments that compose pre-tested features into exactly what you need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15wyrhshzq8lpmsmvxpn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15wyrhshzq8lpmsmvxpn.webp" alt="Feature composition diagram" width="552" height="1632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Feature composition: a Debian Slim base; +INCLUDE_PYTHON_DEV=true adds Python 3.13, Poetry, Pytest, Black, and Mypy; +INCLUDE_NODE_DEV=true adds Node 20, TypeScript, ESLint, and Jest; +INCLUDE_KUBERNETES=true adds kubectl, helm, and k9s.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The solution ended up being a single, modular Dockerfile that can create any development or production container through build-time configuration. I designed it as a git submodule so I could add it to any project and immediately have access to dozens of pre-built features. This means you get the benefits of a centralized, maintained Dockerfile without losing the ability to customize per-project.&lt;/p&gt;

&lt;p&gt;Want Python with all the dev tools? &lt;code&gt;INCLUDE_PYTHON_DEV=true&lt;/code&gt;. Need to add Kubernetes tools later? &lt;code&gt;INCLUDE_KUBERNETES=true&lt;/code&gt;. The Dockerfile doesn't change. The project doesn't change. You just flip a switch.&lt;/p&gt;
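
&lt;p&gt;Because every feature is just an &lt;code&gt;INCLUDE_*&lt;/code&gt; flag, the build command itself is easy to script. Here is a minimal sketch of a hypothetical &lt;code&gt;feature_build_args&lt;/code&gt; helper (my own illustration, not part of the repository), assuming flag names always follow the pattern seen above: &lt;code&gt;INCLUDE_&lt;/code&gt; plus the upper-cased feature name with dashes turned into underscores.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical helper (not part of the containers repo): expand short feature
# names into INCLUDE_* build arguments. Requires bash 4+ for ${var^^}.
feature_build_args() {
  local feature flag
  for feature in "$@"; do
    flag="${feature^^}"    # python-dev becomes PYTHON-DEV
    flag="${flag//-/_}"    # PYTHON-DEV becomes PYTHON_DEV
    printf '%s INCLUDE_%s=true\n' '--build-arg' "$flag"
  done
}

# Usage (word splitting is intentional; the flags contain no spaces):
#   docker build -t myproject:dev -f containers/Dockerfile \
#     $(feature_build_args python-dev node-dev postgres-client) .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;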

&lt;p&gt;Repository: &lt;a href="https://github.com/joshjhall/containers" rel="noopener noreferrer"&gt;https://github.com/joshjhall/containers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me show you what this actually looks like in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Solves
&lt;/h2&gt;

&lt;p&gt;Let me be concrete about what changes when you use this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup Time: Days to Minutes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Starting a new Python API project meant 1-2 days of Dockerfile work. Copy an old one, update Python version, fix broken apt packages, research how to install AWS CLI, configure poetry, set up pytest, add black and mypy, configure non-root user, set up entrypoint scripts... you know the drill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; Add the git submodule, set &lt;code&gt;INCLUDE_PYTHON_DEV=true&lt;/code&gt;, and you're done in 10 minutes. Everything is already there and tested.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Before: days of researching Docker setup, writing the Dockerfile, debugging apt packages and the Python install, and configuring dev tools and the entrypoint. After: add the git submodule, set build args, and you're done in about 10 minutes.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The time savings alone justify this. But honestly, the bigger win is the mental overhead. I no longer dread starting new projects because of Docker setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistency: Stop the Drift
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Project A has Python 3.11 with poetry 1.4. Project B has Python 3.12 with poetry 1.5. Project C has Python 3.13 with poetry 2.0. They all work... differently. New developer joins? Good luck figuring out which version of which tool you need for which project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; All projects share the same foundation. Bump the submodule and each project moves to the same tested versions: same Python, same tool versions, same configuration. It's so much saner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dev/Prod Parity: One Dockerfile, Two Environments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Separate Dockerfiles for dev and prod. Try to keep them in sync. Fail. Ship to production. Discover your dev environment had a dependency that prod doesn't. Debug in production. Not fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; Same Dockerfile for both. Just different build arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Development: Full tooling&lt;/span&gt;
docker build &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; ...

&lt;span class="c"&gt;# Production: Minimal runtime&lt;/span&gt;
docker build &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;INCLUDE_PYTHON&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



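&lt;p&gt;Both images share the same base and the same pinned versions, so what works in dev is far more likely to work in prod. The main difference is the extra development tooling in the dev image.&lt;/p&gt;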
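
&lt;p&gt;A small wrapper can make the dev/prod switch explicit. This is a hypothetical sketch (the function name is mine), assuming only the two flags shown above: &lt;code&gt;INCLUDE_PYTHON_DEV&lt;/code&gt; for development and &lt;code&gt;INCLUDE_PYTHON&lt;/code&gt; for production.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical wrapper: same Dockerfile, Python flag chosen per environment.
# dev gets the full toolchain, prod only the runtime.
python_flag_for() {
  case "$1" in
    dev)  echo "INCLUDE_PYTHON_DEV=true" ;;
    prod) echo "INCLUDE_PYTHON=true" ;;
    *)    return 1 ;;   # unknown environment: fail instead of guessing
  esac
}

# Usage:
#   docker build -f containers/Dockerfile -t myproject:prod \
#     --build-arg "$(python_flag_for prod)" .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;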

&lt;h3&gt;
  
  
  Adding Features: From Hours to Seconds
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Need to add Redis to your project? Time to research the correct apt package name, figure out how to configure the client, update environment variables, test it... 2 hours later you have Redis support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; &lt;code&gt;--build-arg INCLUDE_REDIS_CLIENT=true&lt;/code&gt;. Done. Tested. Works.&lt;/p&gt;

&lt;p&gt;This is the kind of thing that makes development feel fast again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security &amp;amp; Updates: Set It and Forget It
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Python security patch comes out. Now you get to manually update six different Dockerfiles in six different repos. Miss one? Hope your security team doesn't notice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; The automation handles it. Updates are applied automatically, tested, and merged if everything passes. You wake up and it's done, or you get notified that something broke and needs your attention.&lt;/p&gt;

&lt;p&gt;Security best practices are built into the system. Non-root users. Minimal base images. Vulnerability scanning. It's all there by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Automation Philosophy (Or: How I Stopped Worrying and Learned to Trust CI)
&lt;/h2&gt;

&lt;p&gt;Here's the thing that really makes this system practical: I genuinely don't have time to track version updates manually. Python 3.13.7 comes out? I'll find out... eventually. kubectl 1.34 releases? I'm probably three versions behind already. Security patch for some npm package? I might hear about it on Hacker News if it's bad enough.&lt;/p&gt;

&lt;p&gt;This is a terrible way to manage infrastructure. So I automated it completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Manual Updates
&lt;/h3&gt;

&lt;p&gt;Every tool in your container has a version. Python, Node, kubectl, Terraform, AWS CLI, poetry, npm... the list goes on. Each one gets updated regularly. Some weekly, some monthly. Tracking all of them manually? That's not a job, that's a punishment.&lt;/p&gt;

&lt;p&gt;And it's not just tracking. You need to test each update. Does the new Python version break your linter? Does the new kubectl version have API changes? Does the new Node.js version introduce a breaking change in how it handles modules?&lt;/p&gt;

&lt;p&gt;I needed a system that would handle the boring parts and only bug me when something actually needed my attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Sunday Morning Automation
&lt;/h3&gt;

&lt;p&gt;Every Sunday at 2am UTC, the system wakes up and checks every pinned tool version against the latest releases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python, Node, Rust, Go, Ruby, Java&lt;/li&gt;
&lt;li&gt;kubectl, helm, Terraform, AWS CLI, Google Cloud SDK&lt;/li&gt;
&lt;li&gt;Poetry, npm, cargo, and all the other package managers&lt;/li&gt;
&lt;li&gt;Development tools, database clients, everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If it finds updates, it creates a new branch with all the version bumps, updates the Dockerfile and CHANGELOG, commits everything, and pushes to GitHub.&lt;/p&gt;

&lt;p&gt;No human intervention. No ticket in my inbox. Just automatic detection and branch creation.&lt;/p&gt;
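
&lt;p&gt;At its core, the weekly check is a version comparison. A minimal sketch, assuming the pinned and latest versions have already been collected (fetching them differs per tool and isn't shown); &lt;code&gt;version_is_newer&lt;/code&gt; and &lt;code&gt;report_updates&lt;/code&gt; are hypothetical names, not the repository's actual scripts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Succeeds (returns 0) when candidate is strictly newer than pinned.
version_is_newer() {
  local pinned="$1" candidate="$2"
  if [ "$pinned" = "$candidate" ]; then
    return 1
  fi
  # sort -V orders version strings naturally, so 3.13.9 sorts before 3.13.10
  [ "$(printf '%s\n%s\n' "$pinned" "$candidate" | sort -V | tail -n 1)" = "$candidate" ]
}

# Arguments are name/pinned/latest triples; prints one line per available update.
report_updates() {
  while [ "$#" -ge 3 ]; do
    if version_is_newer "$2" "$3"; then
      printf '%s: %s → %s\n' "$1" "$2" "$3"
    fi
    shift 3
  done
}

# Example: report_updates python 3.13.6 3.13.7 kubectl 1.31.1 1.31.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;sort -V&lt;/code&gt; (GNU coreutils) avoids the classic trap of plain string comparison, where 3.13.10 would sort before 3.13.9.&lt;/p&gt;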

&lt;h3&gt;
  
  
  The Test Suite That Makes It Possible
&lt;/h3&gt;

&lt;p&gt;This is where it gets good. That new branch triggers the full CI gauntlet—and I mean &lt;em&gt;full&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;535+ unit tests&lt;/strong&gt; covering every bash script: feature installation, version checks, cache configuration, error-handling paths, and Debian compatibility scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shellcheck&lt;/strong&gt; for code quality and common bash pitfalls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gitleaks&lt;/strong&gt; scanning for accidentally committed secrets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Docker builds&lt;/strong&gt; for all six variants (minimal, python-dev, node-dev, cloud-ops, polyglot, rust-golang)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration tests&lt;/strong&gt; that actually use the tools—compile code, run tests, execute commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debian compatibility checks&lt;/strong&gt; spot-testing across versions 11, 12, and 13&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security scanning&lt;/strong&gt; with Trivy for known vulnerabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the update breaks something, the tests catch it. If it introduces a security vulnerability, Trivy catches it. If it has Debian compatibility issues, the matrix testing catches it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lo7t1wm4uucll3xhxi2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lo7t1wm4uucll3xhxi2.webp" alt="Automation flow diagram" width="800" height="1058"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Automation flow: Sunday 2am UTC → check for updates → create an auto-update branch → run the full CI pipeline (535+ unit tests, six variant builds, security scan). If all tests pass: auto-merge to main, tag a patch release, and send a ✅ notification. If not: preserve the branch and send a ❌ notification requesting manual review.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What Actually Happens Next
&lt;/h3&gt;

&lt;p&gt;Here's the magic part:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If everything passes:&lt;/strong&gt; The system auto-merges to main, creates a version tag, and sends me a Pushover notification on my phone: "✅ Patch Release v1.2.3 Deployed". I wake up to updated containers. I did nothing. It's beautiful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example notification from "Container System": "✅ Patch Release v1.2.3 Deployed" (Updated: Python 3.13.6→3.13.7, kubectl 1.31.1→1.31.2).&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If anything fails:&lt;/strong&gt; I get a high-priority Pushover notification (the kind that makes noise even in Do Not Disturb mode): "❌ CI Failed - Python 3.13.7 breaks black formatter". The branch is preserved for manual review. Now I actually need to get involved.&lt;/p&gt;

&lt;p&gt;In practice, roughly 95% of version updates land without any action from me. I only get involved when something actually breaks, which is exactly the level of automation I was looking for.&lt;/p&gt;
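
&lt;p&gt;The notification step can be a single HTTP call to Pushover's message API. This sketch assumes the app token and user key live in &lt;code&gt;PUSHOVER_TOKEN&lt;/code&gt; and &lt;code&gt;PUSHOVER_USER&lt;/code&gt; environment variables (hypothetical names), and maps failures to Pushover's high priority (1), which bypasses the recipient's Pushover quiet hours:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Map a CI outcome to a Pushover priority: 0 = normal, 1 = high priority.
pushover_priority() {
  if [ "$1" = "success" ]; then echo 0; else echo 1; fi
}

# Send a notification. PUSHOVER_TOKEN and PUSHOVER_USER are assumed env vars.
notify() {
  local status="$1" message="$2"
  curl --silent --show-error --fail \
    --form-string "token=${PUSHOVER_TOKEN}" \
    --form-string "user=${PUSHOVER_USER}" \
    --form-string "title=Container System" \
    --form-string "priority=$(pushover_priority "$status")" \
    --form-string "message=${message}" \
    https://api.pushover.net/1/messages.json
}

# Usage: notify failure "❌ CI Failed - manual review required"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;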

&lt;p&gt;&lt;strong&gt;Important note on update control:&lt;/strong&gt; This automation updates the &lt;em&gt;containers repository itself&lt;/em&gt;. Projects that include containers as a git submodule maintain full control over when they adopt new versions. You can pin to a specific version for stability and test updates on your schedule (treating it like any other dependency managed via npm, cargo, etc.). Or you can automate pulling updates if you want to stay current automatically. The choice is yours—you get the benefits of automated testing and version tracking without being forced to adopt updates before you're ready.&lt;/p&gt;
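
&lt;p&gt;Pinning works like any other git submodule update. A minimal sketch, assuming the submodule lives at &lt;code&gt;containers/&lt;/code&gt; and releases are tagged like &lt;code&gt;v1.2.3&lt;/code&gt;; &lt;code&gt;pin_submodule&lt;/code&gt; is a hypothetical helper, not something the repository ships:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Pin a submodule checkout to a release tag and stage the change in the parent.
pin_submodule() {
  local path="$1" tag="$2"
  git -C "$path" fetch --quiet --tags || return 1
  git -C "$path" checkout --quiet "$tag" || return 1
  # Record the new submodule commit in the parent repository's index
  git add "$path"
}

# Usage, from the project root:
#   pin_submodule containers v1.2.3
#   git commit -m "Pin containers to v1.2.3"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;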

&lt;h2&gt;
  
  
  How It Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;I designed this as a git submodule that you add to any project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git submodule add https://github.com/joshjhall/containers.git containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  One Dockerfile, Many Configurations
&lt;/h3&gt;

&lt;p&gt;A single Dockerfile accepts build arguments to enable features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile (simplified)&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; INCLUDE_PYTHON_DEV=false&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; INCLUDE_NODE_DEV=false&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; INCLUDE_RUST_DEV=false&lt;/span&gt;
&lt;span class="c"&gt;# ... dozens more features&lt;/span&gt;

&lt;span class="k"&gt;RUN if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;      /tmp/build-scripts/features/python-dev.sh&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
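
&lt;p&gt;With dozens of flags, the same pattern could also be written as a loop. This is a hypothetical generalization, not the repository's actual Dockerfile logic; it assumes each feature script is named after its flag (e.g. &lt;code&gt;python-dev.sh&lt;/code&gt; for &lt;code&gt;INCLUDE_PYTHON_DEV&lt;/code&gt;) and relies on the fact that build args declared with &lt;code&gt;ARG&lt;/code&gt; are visible as environment variables inside &lt;code&gt;RUN&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical: run every feature script whose INCLUDE_* flag is "true".
install_enabled_features() {
  local scripts_dir="$1" feature var
  shift
  for feature in "$@"; do
    # python-dev maps to INCLUDE_PYTHON_DEV (bash 4+ case conversion)
    var="INCLUDE_${feature^^}"
    var="${var//-/_}"
    # Indirect expansion reads the variable whose name is stored in var
    if [ "${!var:-false}" = "true" ]; then
      bash "$scripts_dir/$feature.sh" || return 1
    fi
  done
}

# Example (inside a RUN step):
#   install_enabled_features /tmp/build-scripts/features python-dev node-dev rust-dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;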



&lt;h3&gt;
  
  
  Modular Features
&lt;/h3&gt;

&lt;p&gt;I broke everything into self-contained installation scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lib/
  features/
    python.sh          # Python runtime
    python-dev.sh      # + poetry, pytest, black, mypy
    node.sh            # Node.js runtime
    node-dev.sh        # + TypeScript, ESLint, Jest
    rust.sh            # Rust toolchain
    docker.sh          # Docker CLI (for Docker-in-Docker)
    kubernetes.sh      # kubectl, helm, k9s
    aws.sh             # AWS CLI
    postgres-client.sh # psql
    # ... 28 feature modules total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each script validates its installation, configures caching, handles Debian version differences automatically, and follows security best practices. Critically, each is independently testable.&lt;/p&gt;

&lt;p&gt;The 28 feature modules install 100+ individual tools. For example, &lt;code&gt;golang-dev.sh&lt;/code&gt; alone installs 34 Go development tools (gopls, dlv, golangci-lint, staticcheck, etc.), while &lt;code&gt;rust-dev.sh&lt;/code&gt; installs 11 Rust tools, and &lt;code&gt;dev-tools.sh&lt;/code&gt; adds 10+ productivity utilities.&lt;/p&gt;
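
&lt;p&gt;Two of those responsibilities, detecting the Debian release and verifying that a tool actually landed on &lt;code&gt;PATH&lt;/code&gt;, fit in a few lines of bash. This is an illustrative sketch with hypothetical function names, not code from the repository; it relies on the standard &lt;code&gt;VERSION_CODENAME&lt;/code&gt; field in &lt;code&gt;/etc/os-release&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Print the Debian codename (bullseye, bookworm, trixie) from an os-release file.
debian_codename() {
  local os_release="${1:-/etc/os-release}"
  # Source in a subshell so the file's variables don't leak into the caller
  ( . "$os_release"; echo "${VERSION_CODENAME:-unknown}" )
}

# Fail loudly if a tool the feature was supposed to install is missing.
verify_tool() {
  local tool="$1" path
  if ! path="$(command -v "$tool")"; then
    echo "ERROR: $tool not found on PATH after installation"
    return 1
  fi
  echo "OK: $tool at $path"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A feature script can then branch on &lt;code&gt;bullseye&lt;/code&gt;, &lt;code&gt;bookworm&lt;/code&gt;, or &lt;code&gt;trixie&lt;/code&gt; with a &lt;code&gt;case&lt;/code&gt; statement and finish with &lt;code&gt;verify_tool&lt;/code&gt; calls, so a broken install can fail the build instead of surfacing later.&lt;/p&gt;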

&lt;h3&gt;
  
  
  Smart Caching Strategy
&lt;/h3&gt;

&lt;p&gt;BuildKit cache mounts are configured for every package manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cache,target&lt;span class="o"&gt;=&lt;/span&gt;/cache/pip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cache,target&lt;span class="o"&gt;=&lt;/span&gt;/cache/npm &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cache,target&lt;span class="o"&gt;=&lt;/span&gt;/cache/cargo &lt;span class="se"&gt;\
&lt;/span&gt;    pip &lt;span class="nb"&gt;install &lt;/span&gt;poetry &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Build time, first build vs. rebuild with cache: about 8.5 minutes vs. 1.2 minutes (roughly 7x faster), with pip, npm, cargo, and Go module caches reused.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Rebuilds are fast even when switching features. I've had builds that took 8 minutes the first time complete in under 90 seconds on subsequent runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime Initialization
&lt;/h3&gt;

&lt;p&gt;First-time setup scripts run on container start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lib/runtime/
  first-time-setup.d/   # Run once per container
    20-aws-setup.sh     # Check AWS credentials
    20-kubernetes-setup.sh
  startup.d/            # Run every time
    10-docker-socket-fix.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means users get helpful setup messages instead of cryptic errors when something's misconfigured.&lt;/p&gt;
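
&lt;p&gt;One way to implement the run-once versus run-every-time split is a marker file. This sketch is an assumption about the mechanism, not the repository's actual entrypoint; the function names are hypothetical and the marker path is up to the caller:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Run every *.sh in a directory in glob (sorted) order; stop on first failure.
run_hooks() {
  local dir="$1" script
  for script in "$dir"/*.sh; do
    [ -e "$script" ] || continue   # skip the literal pattern if no scripts exist
    bash "$script" || return 1
  done
}

# first-time-setup.d runs once (tracked by a marker file); startup.d runs always.
run_startup_hooks() {
  local root="$1" marker="$2"
  if [ ! -e "$marker" ]; then
    run_hooks "$root/first-time-setup.d" || return 1
    touch "$marker"
  fi
  run_hooks "$root/startup.d"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the glob expands in sorted order, numeric prefixes like &lt;code&gt;10-&lt;/code&gt; and &lt;code&gt;20-&lt;/code&gt; control execution order, and the marker is only written after every first-time script succeeds.&lt;/p&gt;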

&lt;h2&gt;
  
  
  Testing: Confidence Instead of Crossing Fingers
&lt;/h2&gt;

&lt;p&gt;Here's something that still surprises people: I have 535+ unit tests for bash scripts. Yeah, really.&lt;/p&gt;

&lt;p&gt;Most Dockerfile projects have zero tests. You write it, build it, hope it works, and find out it doesn't work when someone else tries to use it three months later. That's not acceptable for production infrastructure.&lt;/p&gt;

&lt;p&gt;So I built a complete testing framework specifically for bash. It has assertions, mocking, container testing utilities, and detailed reporting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="s2"&gt;"../../framework.sh"&lt;/span&gt;
init_test_framework

test_python_installs_correctly&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;# Mock external commands&lt;/span&gt;
    mock_function &lt;span class="s2"&gt;"curl"&lt;/span&gt; &lt;span class="s2"&gt;"echo 'mocked download'"&lt;/span&gt;

    &lt;span class="c"&gt;# Run the installation&lt;/span&gt;
    &lt;span class="nb"&gt;source &lt;/span&gt;lib/features/python.sh

    &lt;span class="c"&gt;# Assert expected behavior&lt;/span&gt;
    assert_success &lt;span class="s2"&gt;"Installation should succeed"&lt;/span&gt;
    assert_file_exists &lt;span class="s2"&gt;"/usr/local/bin/python"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

test_python_version&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"test-python:latest"&lt;/span&gt;
    assert_command_in_container &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$image&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"python --version"&lt;/span&gt; &lt;span class="s2"&gt;"Python 3."&lt;/span&gt;
    assert_executable_in_path &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$image&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"poetry"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

run_test test_python_installs_correctly &lt;span class="s2"&gt;"Python installs correctly"&lt;/span&gt;
run_test test_python_version &lt;span class="s2"&gt;"Python version is correct"&lt;/span&gt;
generate_report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cpaurhik1cnxa6zvjjn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cpaurhik1cnxa6zvjjn.webp" alt="Testing framework output" width="800" height="975"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Running tests for features/python.sh...
✓ test_python_installs_correctly (0.3s)
✓ test_python_version_check (0.2s)
✓ test_poetry_available (0.4s)
✓ test_cache_configuration (0.2s)
✗ test_rust_compiles (1.2s)
  Expected: rustc command available
  Got: command not found

Tests: 447 passed, 3 failed, 0 skipped
Time: 2m 14s
Coverage: 94.2%
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;The framework provides assertion functions (&lt;code&gt;assert_success&lt;/code&gt;, &lt;code&gt;assert_equals&lt;/code&gt;, &lt;code&gt;assert_file_exists&lt;/code&gt;), container testing (&lt;code&gt;assert_command_in_container&lt;/code&gt;), a mocking system, and detailed reporting. Each test runs in isolation.&lt;/p&gt;
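
&lt;p&gt;The two core ideas, assertions that count passes and failures and mocks that replace external commands, are simple to express in bash. Here is a minimal sketch with hypothetical names (&lt;code&gt;check_equals&lt;/code&gt;, &lt;code&gt;mock_command&lt;/code&gt;) rather than the framework's real functions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
CHECKS_PASSED=0
CHECKS_FAILED=0

# Compare two values and record the result instead of aborting the run.
check_equals() {
  local expected="$1" actual="$2" message="$3"
  if [ "$expected" = "$actual" ]; then
    CHECKS_PASSED=$((CHECKS_PASSED + 1))
  else
    CHECKS_FAILED=$((CHECKS_FAILED + 1))
    echo "FAIL: $message (expected '$expected', got '$actual')"
  fi
}

# Define a shell function named after a command so it shadows the real binary.
mock_command() {
  local name="$1" body="$2"
  eval "$name() { $body; }"
}

# Remove the mock so later tests see the real command again.
unmock_command() {
  unset -f "$1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The mock works because bash looks up functions before searching &lt;code&gt;PATH&lt;/code&gt;, so a test can shadow &lt;code&gt;curl&lt;/code&gt; without touching the real binary.&lt;/p&gt;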

&lt;h3&gt;
  
  
  What Gets Tested
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Unit Tests (535+):&lt;/strong&gt; Every bash script tested in isolation—base system setup, all 28 feature modules, runtime scripts, and user-facing commands. Each script has tests for successful installation, version verification, error handling, cache configuration, and Debian compatibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration Tests:&lt;/strong&gt; Full container builds for six real-world scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;minimal&lt;/code&gt;: Base system only, for when you want to start from scratch&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;python-dev&lt;/code&gt;: Python stack with databases, for API development&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;node-dev&lt;/code&gt;: Node.js stack with test frameworks, for web development&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cloud-ops&lt;/code&gt;: Kubernetes + Terraform + AWS + GCloud, for infrastructure work&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;polyglot&lt;/code&gt;: Python + Node.js together, for full-stack projects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rust-golang&lt;/code&gt;: Rust + Go, for systems programming&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each integration test builds the container, verifies tools are installed correctly, runs version checks, tests actual functionality (compile code, run tests), and verifies cache configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debian Matrix Testing:&lt;/strong&gt; The CI pipeline spot-checks compatibility across Debian 11 (Bullseye), 12 (Bookworm), and 13 (Trixie) to catch compatibility issues before they ship. The system can be configured for more thorough cross-version testing when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Testing:&lt;/strong&gt; Shellcheck for static analysis, Gitleaks for secret scanning, Trivy for vulnerability scanning of the final images.&lt;/p&gt;

&lt;p&gt;When the CI pipeline runs, if something fails, I know exactly which feature broke, what assertion failed, which Debian version it affects, and whether there are security implications. No more "it works on my machine" mysteries.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers: What 42,000 Lines of Code Looks Like
&lt;/h2&gt;

&lt;p&gt;Let me show you what's actually in this repository, because the scope surprised even me when I looked at it recently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total size:&lt;/strong&gt; 4.8 MB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total files:&lt;/strong&gt; 178&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lines of code:&lt;/strong&gt; ~42,000 (including documentation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shell scripts:&lt;/strong&gt; 117 files, 35,776 lines total

&lt;ul&gt;
&lt;li&gt;28 feature installation modules: 13,013 lines&lt;/li&gt;
&lt;li&gt;Test framework + unit tests: 15,263 lines&lt;/li&gt;
&lt;li&gt;Runtime/startup scripts: 1,966 lines&lt;/li&gt;
&lt;li&gt;Core utilities: 1,565 lines&lt;/li&gt;
&lt;li&gt;User-facing scripts: 1,337 lines&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Documentation:&lt;/strong&gt; 17 markdown files, 4,434 lines&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;CI/CD workflows:&lt;/strong&gt; 1,369 lines&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Docker configs:&lt;/strong&gt; 412 lines&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this actually means:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The roughly 1:1 ratio of feature code to test code (about 13K lines of feature modules vs. 15K lines of tests and test framework) isn't an accident. When I said I have comprehensive testing, I meant it. For every line of feature installation code, there's roughly a line of test code validating it.&lt;/p&gt;

&lt;p&gt;The feature modules are about 31% of the codebase, and the tests are about 36%. The remaining third is split between documentation, core utilities, runtime and user-facing scripts, and CI/CD. This is what production-ready infrastructure looks like.&lt;/p&gt;
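&lt;p&gt;If you want to reproduce a breakdown like this for your own repository, a couple of small shell helpers are enough. This is a minimal sketch, not the script behind the numbers above: &lt;code&gt;count_lines&lt;/code&gt; and &lt;code&gt;share&lt;/code&gt; are hypothetical names, and the example paths assume the folder layout shown in the architecture section.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reproducing a line-count breakdown; the example
# paths assume the repository layout shown in the architecture section.

# count_lines DIR [PATTERN]: total lines across files under DIR matching PATTERN
count_lines() {
    local dir="$1" pattern="${2:-*.sh}"
    # cat every matching file and count the combined lines (0 if none match)
    find "$dir" -type f -name "$pattern" -exec cat {} + | wc -l | tr -d ' '
}

# share PART TOTAL: PART as a percentage of TOTAL, rounded to the nearest integer
share() {
    echo $(( ($1 * 100 + $2 / 2) / $2 ))
}

# Example, run from the repository root:
#   total=42000
#   echo "features: $(share "$(count_lines lib/features)" "$total")%"
#   echo "tests:    $(share "$(count_lines tests)" "$total")%"
```

&lt;p&gt;With the figures above, &lt;code&gt;share 13013 42000&lt;/code&gt; gives 31 and &lt;code&gt;share 15263 42000&lt;/code&gt; gives 36.&lt;/p&gt;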

&lt;p&gt;&lt;strong&gt;The efficiency angle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what makes this interesting: 42,000 lines supporting 28 feature modules covering 100+ tools, with full CI/CD, comprehensive testing, and extensive documentation, all in under 5MB.&lt;/p&gt;

&lt;p&gt;Most enterprise Dockerfile collections I've seen that cover similar functionality look very different: separate Dockerfiles for every combination, setup code duplicated across files, and minimal or no testing. This modular approach is genuinely more maintainable.&lt;/p&gt;

&lt;p&gt;For context, a typical enterprise setup might have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30+ separate Dockerfiles (one per project/team)&lt;/li&gt;
&lt;li&gt;Each 200-500 lines&lt;/li&gt;
&lt;li&gt;6,000-15,000 lines of duplicated Docker code&lt;/li&gt;
&lt;li&gt;Maybe 500 lines of tests if you're lucky&lt;/li&gt;
&lt;li&gt;Inconsistent versions across all of them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This system replaces all of that with one Dockerfile, modular features, and more tests than most teams have for their entire Docker infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Included
&lt;/h2&gt;

&lt;p&gt;Over time, I've built 28 feature modules that install 100+ tools to cover basically everything I need across different projects. Here's how they group by use case:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're building APIs or web services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python with FastAPI/Flask (runtime + poetry, pytest, black, mypy, ruff)&lt;/li&gt;
&lt;li&gt;Node.js with Express/Next.js (runtime + TypeScript, ESLint, Prettier, Jest)&lt;/li&gt;
&lt;li&gt;Database clients for PostgreSQL, Redis, and SQLite&lt;/li&gt;
&lt;li&gt;All the testing frameworks and linters you actually use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you're doing cloud operations or infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes tools: kubectl, helm, and k9s for cluster management&lt;/li&gt;
&lt;li&gt;Terraform + Terragrunt for infrastructure as code&lt;/li&gt;
&lt;li&gt;AWS CLI, Google Cloud SDK, and Cloudflare Workers tooling&lt;/li&gt;
&lt;li&gt;Docker CLI for Docker-in-Docker workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you're doing ML or data science work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python data science stack with all the usual suspects&lt;/li&gt;
&lt;li&gt;Ollama for running local LLMs (because apparently that's a thing we do now)&lt;/li&gt;
&lt;li&gt;R for statistical computing&lt;/li&gt;
&lt;li&gt;Java for Spark/Hadoop work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you're doing systems programming:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rust toolchain with cargo, clippy, and rustfmt&lt;/li&gt;
&lt;li&gt;Go with all the build tools&lt;/li&gt;
&lt;li&gt;C/C++ compilers and build systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For everyone:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Git with GitHub CLI for version control&lt;/li&gt;
&lt;li&gt;1Password CLI for secrets management (so you stop committing API keys)&lt;/li&gt;
&lt;li&gt;All the basic dev utilities you forget you need until you don't have them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every feature is independently toggleable through build arguments. All versions are pinned and automatically tracked for updates by the weekly automation I described earlier.&lt;/p&gt;
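&lt;p&gt;Because every feature is a build argument, enabling a set of them from the command line is just a list of &lt;code&gt;--build-arg&lt;/code&gt; flags. A minimal sketch, assuming each feature maps to an &lt;code&gt;INCLUDE_*&lt;/code&gt; argument named after it, as in the Compose examples later in this post (for example, &lt;code&gt;python-dev&lt;/code&gt; becomes &lt;code&gt;INCLUDE_PYTHON_DEV&lt;/code&gt;). &lt;code&gt;feature_args&lt;/code&gt; is a hypothetical helper, not part of the repository, and the mapping may not hold for every feature.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): turns feature names into
# --build-arg flags, assuming features map to INCLUDE_* arguments named after
# them, e.g. python-dev -> INCLUDE_PYTHON_DEV.
feature_args() {
    local feature name
    for feature in "$@"; do
        name=${feature//-/_}                            # dashes to underscores
        name=$(printf '%s' "$name" | tr 'a-z' 'A-Z')    # then upper-case
        printf -- '--build-arg INCLUDE_%s=true\n' "$name"
    done
}

# Usage: build the container system (checked out at ./containers) with three features
#   docker build $(feature_args python-dev postgres-client docker) \
#       --build-arg PROJECT_NAME=myproject -t myproject:dev containers
```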

&lt;p&gt;&lt;strong&gt;Architecture support:&lt;/strong&gt; Works on ARM64 (Apple Silicon M1/M2/M3, AWS Graviton) and AMD64 (traditional x86_64). The same Dockerfile builds correctly on both.&lt;/p&gt;
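&lt;p&gt;Supporting both architectures mostly comes down to downloading the right prebuilt binary in each feature script. Here's a minimal sketch, assuming release archives are named with &lt;code&gt;amd64&lt;/code&gt;/&lt;code&gt;arm64&lt;/code&gt; suffixes; &lt;code&gt;detect_arch&lt;/code&gt; is a hypothetical name, and the project's real detection logic may differ. On Debian, &lt;code&gt;dpkg --print-architecture&lt;/code&gt; returns the same two names directly.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize `uname -m` output to the amd64/arm64 names most
# release downloads use. The project's real detection logic may differ.
detect_arch() {
    local machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *)
            echo "unsupported architecture: $machine" >/dev/stderr
            return 1
            ;;
    esac
}

# Usage inside a feature script:
#   ARCH=$(detect_arch) || exit 1
#   echo "downloading the linux-${ARCH} build"
```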

&lt;h2&gt;
  
  
  Architecture: How I Organized This Thing So I Could Actually Find Stuff Later
&lt;/h2&gt;

&lt;p&gt;Here's the folder structure that evolved as this project grew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containers/
├── Dockerfile              # Universal, feature-based
├── lib/
│   ├── base/              # Core system setup
│   │   ├── setup.sh       # Base system config
│   │   ├── user.sh        # Non-root user creation
│   │   └── apt-utils.sh   # Debian version detection
│   ├── features/          # Individual feature modules (28)
│   │   ├── python.sh, python-dev.sh
│   │   ├── node.sh, node-dev.sh
│   │   └── ... all other features
│   └── runtime/           # Container initialization
│       ├── first-time-setup.d/
│       └── startup.d/
├── tests/
│   ├── unit/              # Feature-level tests
│   └── integration/       # Full build scenarios
└── examples/              # Docker Compose templates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At build time, the Dockerfile uses &lt;code&gt;lib/base/&lt;/code&gt; (core system, security, and user setup) and &lt;code&gt;lib/features/&lt;/code&gt; (28 modular features covering 100+ tools). At runtime, the container uses &lt;code&gt;lib/runtime/&lt;/code&gt; for its initialization scripts. The &lt;code&gt;tests/&lt;/code&gt; tree holds the 535+ tests.&lt;/p&gt;
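&lt;p&gt;The split between &lt;code&gt;first-time-setup.d/&lt;/code&gt; and &lt;code&gt;startup.d/&lt;/code&gt; suggests a simple run-parts style entrypoint. Here's a minimal sketch of how such a runner could work, assuming a marker file records that first-time setup has completed. The paths and function names are hypothetical, not taken from the repository.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint sketch for the lib/runtime/ layout: first-time-setup.d
# runs once per container, startup.d runs on every start. RUNTIME_DIR and MARKER
# are assumed locations, not the project's actual paths.
RUNTIME_DIR="${RUNTIME_DIR:-/opt/container-runtime}"
MARKER="${MARKER:-$HOME/.container-initialized}"

# run_dir DIR: run each *.sh in DIR in sorted order, stopping at the first failure
run_dir() {
    local script
    for script in "$1"/*.sh; do
        [ -e "$script" ] || continue   # no matches: the glob stays literal
        echo "running $(basename "$script")"
        bash "$script" || return 1
    done
}

run_runtime() {
    if [ ! -f "$MARKER" ]; then
        run_dir "$RUNTIME_DIR/first-time-setup.d" || return 1
        touch "$MARKER"   # mark done only after every setup script succeeded
    fi
    run_dir "$RUNTIME_DIR/startup.d"
}
```

&lt;p&gt;In a real image, something like &lt;code&gt;run_runtime&lt;/code&gt; would run from the entrypoint before handing off to the main command (for example with &lt;code&gt;exec "$@"&lt;/code&gt;). Numeric prefixes such as &lt;code&gt;10-&lt;/code&gt; and &lt;code&gt;20-&lt;/code&gt; keep the ordering explicit.&lt;/p&gt;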




&lt;h3&gt;
  
  
  Handling Debian Version Compatibility (Or: That One Time Debian Broke Everything)
&lt;/h3&gt;

&lt;p&gt;Remember when Debian 13 (Trixie) removed the &lt;code&gt;apt-key&lt;/code&gt; command in 2024? If you maintain Docker images, you probably remember. Container builds across the ecosystem broke overnight. HashiCorp tools like Terraform? Broken. Kubernetes? Broken. Every image that added third-party repositories the "old way" (piping a key into &lt;code&gt;apt-key add&lt;/code&gt;) just... stopped working.&lt;/p&gt;

&lt;p&gt;The error looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash: line 1: apt-key: &lt;span class="nb"&gt;command &lt;/span&gt;not found
✗ Adding HashiCorp GPG key failed with &lt;span class="nb"&gt;exit &lt;/span&gt;code 127
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I saw this coming (the deprecation warnings had been around for a while), so I built automatic Debian version detection into the system. The scripts detect which Debian version they're running on and use the appropriate method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# lib/features/terraform.sh (simplified)&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; /tmp/build-scripts/base/apt-utils.sh

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; apt-key &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="c"&gt;# Debian 11/12: Legacy method&lt;/span&gt;
    curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://apt.releases.hashicorp.com/gpg | apt-key add -
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="c"&gt;# Debian 13+: Modern signed-by method&lt;/span&gt;
    curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://apt.releases.hashicorp.com/gpg | &lt;span class="se"&gt;\&lt;/span&gt;
        gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/share/keyrings/hashicorp-archive-keyring.gpg
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] ..."&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package/Command&lt;/th&gt;
&lt;th&gt;Debian 11&lt;/th&gt;
&lt;th&gt;Debian 12&lt;/th&gt;
&lt;th&gt;Debian 13&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;apt-key&lt;/td&gt;
&lt;td&gt;✓ Available&lt;/td&gt;
&lt;td&gt;✓ Available&lt;/td&gt;
&lt;td&gt;✗ Removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lzma-dev&lt;/td&gt;
&lt;td&gt;✓ Available&lt;/td&gt;
&lt;td&gt;✓ Available&lt;/td&gt;
&lt;td&gt;→ liblzma-dev&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPG key method&lt;/td&gt;
&lt;td&gt;apt-key add&lt;/td&gt;
&lt;td&gt;apt-key add&lt;/td&gt;
&lt;td&gt;signed-by=&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;





&lt;p&gt;I also built utility functions that feature authors can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;is_debian_version 13&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="c"&gt;# Trixie-specific logic&lt;/span&gt;
&lt;span class="k"&gt;fi

&lt;/span&gt;apt_install_conditional 11 12 lzma-dev  &lt;span class="c"&gt;# Only Debian 11/12&lt;/span&gt;
apt_install liblzma-dev                  &lt;span class="c"&gt;# Works on all versions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system handles package migrations (like &lt;code&gt;lzma-dev&lt;/code&gt; to &lt;code&gt;liblzma-dev&lt;/code&gt; in Debian 13) automatically. The CI pipeline spot-checks compatibility across Debian 11, 12, and 13, catching major issues before they ship—without the overhead of testing every possible combination on every run.&lt;/p&gt;
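&lt;p&gt;For illustration, here's one way helpers like &lt;code&gt;is_debian_version&lt;/code&gt; and &lt;code&gt;apt_install_conditional&lt;/code&gt; could be written. This is a hypothetical sketch, not the contents of &lt;code&gt;apt-utils.sh&lt;/code&gt;. It assumes the major version comes from &lt;code&gt;VERSION_ID&lt;/code&gt; in &lt;code&gt;/etc/os-release&lt;/code&gt; and that the last argument to &lt;code&gt;apt_install_conditional&lt;/code&gt; is the package name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical sketch of version helpers; the real apt-utils.sh may differ.
# Note: Debian testing/sid images may omit VERSION_ID from os-release.
OS_RELEASE="${OS_RELEASE:-/etc/os-release}"

debian_major_version() {
    # Subshell keeps os-release variables from leaking into the caller
    ( . "$OS_RELEASE"; echo "${VERSION_ID%%.*}" )
}

# is_debian_version N: succeed if the running Debian major version is N
is_debian_version() {
    [ "$(debian_major_version)" = "$1" ]
}

# apt_install_conditional VERSION... PACKAGE: install PACKAGE only on listed versions
apt_install_conditional() {
    local pkg="${!#}" current v
    current=$(debian_major_version)
    for v in "${@:1:$#-1}"; do
        if [ "$v" = "$current" ]; then
            apt-get install -y --no-install-recommends "$pkg"
            return
        fi
    done
    echo "skipping $pkg on Debian $current"
}
```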

&lt;p&gt;This saved me when Debian 13 was released. While everyone else was scrambling to fix broken builds, mine just... worked. That felt good.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Strategy Deep Dive
&lt;/h3&gt;

&lt;p&gt;I configured persistent cache volumes for every package manager that matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cache/
  ├── pip/       # Python packages
  ├── npm/       # Node packages
  ├── cargo/     # Rust crates
  ├── go/        # Go modules
  └── bundle/    # Ruby gems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount the whole &lt;code&gt;/cache&lt;/code&gt; tree as a single named Docker volume for fast rebuilds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-v&lt;/span&gt; project-cache:/cache myproject:dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first build downloads everything. Subsequent builds reuse the cache, even if you change which features are enabled. I've seen build times drop from 8+ minutes to under 2 minutes just from proper cache configuration.&lt;/p&gt;
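&lt;p&gt;A shared cache volume only helps if each package manager actually writes into it. Here's a minimal sketch using the standard environment variables these tools read. Whether the project sets exactly these variables, and where, is an assumption, and &lt;code&gt;setup_package_caches&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: point each package manager at a shared cache root. These are standard
# environment variables each tool reads; whether the project sets exactly these
# is an assumption. setup_package_caches is a hypothetical helper name.
setup_package_caches() {
    local root="${1:-/cache}"
    export PIP_CACHE_DIR="$root/pip"      # pip's HTTP and wheel cache
    export npm_config_cache="$root/npm"   # npm's package cache
    export CARGO_HOME="$root/cargo"       # cargo registry/git caches and installed binaries
    export GOMODCACHE="$root/go"          # Go module cache (Go 1.15+)
    export BUNDLE_PATH="$root/bundle"     # where Bundler installs gems
    mkdir -p "$PIP_CACHE_DIR" "$npm_config_cache" "$CARGO_HOME" \
        "$GOMODCACHE" "$BUNDLE_PATH"
}
```

&lt;p&gt;If you relocate &lt;code&gt;CARGO_HOME&lt;/code&gt; this way, keep in mind that &lt;code&gt;cargo install&lt;/code&gt; then puts binaries in &lt;code&gt;$CARGO_HOME/bin&lt;/code&gt;, which needs to be on &lt;code&gt;PATH&lt;/code&gt;.&lt;/p&gt;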

&lt;h2&gt;
  
  
  Common Pitfalls I Learned the Hard Way
&lt;/h2&gt;

&lt;p&gt;Let me save you some pain by sharing mistakes I made while building this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I Chose Debian Over Alpine
&lt;/h3&gt;

&lt;p&gt;Alpine seems attractive at first—tiny base image, minimal attack surface. And for many use cases, Alpine is excellent (I use it for database containers and simple services all the time).&lt;/p&gt;

&lt;p&gt;But for development containers with lots of tools, I ran into friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different package manager (apk vs apt) meant learning a new ecosystem and maintaining two versions of scripts&lt;/li&gt;
&lt;li&gt;Musl libc instead of glibc caused occasional compatibility issues with pre-compiled binaries&lt;/li&gt;
&lt;li&gt;Many Python packages lack pre-built musl wheels, so on Alpine they often have to compile from source&lt;/li&gt;
&lt;li&gt;Some tools have better support and documentation for Debian/Ubuntu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After spending time debugging Alpine-specific issues, I switched to Debian slim for development containers. The image size difference was about 50MB, but the development experience improved significantly. Your mileage may vary—Alpine is great for production workloads and simpler images, but Debian slim gave me fewer surprises when installing development tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Day Debian 13 Broke Everything
&lt;/h3&gt;

&lt;p&gt;I already mentioned this, but it's worth emphasizing: &lt;strong&gt;always plan for breaking changes in base images&lt;/strong&gt;. I learned to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pin Debian versions in CI testing (&lt;code&gt;debian:11-slim&lt;/code&gt;, &lt;code&gt;debian:12-slim&lt;/code&gt;, &lt;code&gt;debian:13-slim&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Test new Debian versions before they become stable&lt;/li&gt;
&lt;li&gt;Build version detection into installation scripts&lt;/li&gt;
&lt;li&gt;Never assume commands will exist forever&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;apt-key&lt;/code&gt; removal drove this lesson home.&lt;/p&gt;
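&lt;p&gt;Pinning Debian versions in CI can be as simple as looping over explicit base-image tags. This is a hypothetical sketch. It assumes the Dockerfile's &lt;code&gt;BASE_IMAGE&lt;/code&gt; build argument (used in the VS Code example later in this post) accepts plain &lt;code&gt;debian:N-slim&lt;/code&gt; images, and the image tags are made up. Set &lt;code&gt;DRY_RUN=1&lt;/code&gt; to print the commands instead of running them.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: build one feature set against each pinned Debian base
# so a breaking base-image change fails loudly. Image tags are made-up examples.
DEBIAN_VERSIONS="11 12 13"

build_matrix() {
    local v
    for v in $DEBIAN_VERSIONS; do
        # With DRY_RUN set, ${DRY_RUN:+echo} prefixes the command with echo
        ${DRY_RUN:+echo} docker build \
            --build-arg "BASE_IMAGE=debian:${v}-slim" \
            --build-arg INCLUDE_PYTHON_DEV=true \
            -t "containers-test:debian${v}" containers || return 1
    done
}
```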

&lt;h3&gt;
  
  
  Why I Don't Use ARG for Secrets
&lt;/h3&gt;

&lt;p&gt;Early on, I tried using build arguments for API keys and credentials. Don't do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# DON'T DO THIS&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; AWS_ACCESS_KEY_ID&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; AWS_SECRET_ACCESS_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build arguments end up in the image history. Anyone with access to the image can read them with &lt;code&gt;docker history&lt;/code&gt;. Use secrets management (1Password CLI, AWS Secrets Manager) or mount them at runtime instead.&lt;/p&gt;
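&lt;p&gt;Here's a minimal sketch of the runtime side, assuming secrets are mounted as files the way Docker Compose &lt;code&gt;secrets&lt;/code&gt; provides them under &lt;code&gt;/run/secrets/&lt;/code&gt;. &lt;code&gt;load_secret&lt;/code&gt; and the secret names are hypothetical. If a value is needed during the build itself, BuildKit's &lt;code&gt;docker build --secret&lt;/code&gt; flag paired with &lt;code&gt;RUN --mount=type=secret&lt;/code&gt; exposes it to a single &lt;code&gt;RUN&lt;/code&gt; step without writing it into a layer.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical runtime helper: export a secret from a mounted file instead of
# baking it into the image with ARG. Compose-style secrets appear under
# /run/secrets/<name>; the names used here are examples only.
load_secret() {
    local var="$1" name="$2"
    local file="${SECRETS_DIR:-/run/secrets}/$name"
    if [ ! -r "$file" ]; then
        echo "secret '$name' not mounted at $file" >/dev/stderr
        return 1
    fi
    # $(...) strips the trailing newline most secret files end with
    export "$var=$(cat "$file")"
}

# e.g. in a startup script:
#   load_secret AWS_SECRET_ACCESS_KEY aws_secret_access_key
```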

&lt;h3&gt;
  
  
  The Submodule Update Trap
&lt;/h3&gt;

&lt;p&gt;Git submodules are great but have one big gotcha: they don't auto-update. When you &lt;code&gt;git pull&lt;/code&gt; your main project, the submodule stays at its old commit unless you explicitly update it.&lt;/p&gt;

&lt;p&gt;I now include a note in every project's README:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update the container system&lt;/span&gt;
git submodule update &lt;span class="nt"&gt;--remote&lt;/span&gt; containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better yet, I'm working on a pre-commit hook that warns when the submodule is more than a week behind main.&lt;/p&gt;
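&lt;p&gt;The core of such a check is small. Here's a hypothetical sketch, not the hook itself, that compares the commit timestamp of the submodule's checked-out &lt;code&gt;HEAD&lt;/code&gt; with the tip of &lt;code&gt;origin/main&lt;/code&gt;. It assumes the submodule lives at &lt;code&gt;containers/&lt;/code&gt; and tracks a &lt;code&gt;main&lt;/code&gt; branch, and it only warns, so it never blocks a commit.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical staleness check (not the hook mentioned above): warn when the
# submodule's HEAD commit is more than MAX_AGE_DAYS older than origin/main.
MAX_AGE_DAYS="${MAX_AGE_DAYS:-7}"

# is_stale HEAD_EPOCH UPSTREAM_EPOCH: true if HEAD trails upstream by more than N days
is_stale() {
    [ $(( ($2 - $1) / 86400 )) -gt "$MAX_AGE_DAYS" ]
}

check_submodule() {
    local path="${1:-containers}" head upstream
    git -C "$path" fetch --quiet origin main || return 0   # offline: don't block
    head=$(git -C "$path" log -1 --format=%ct HEAD)
    upstream=$(git -C "$path" log -1 --format=%ct origin/main)
    if is_stale "$head" "$upstream"; then
        echo "warning: $path is more than $MAX_AGE_DAYS days behind origin/main" >/dev/stderr
        echo "  run: git submodule update --remote $path" >/dev/stderr
    fi
}
```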

&lt;h2&gt;
  
  
  Keeping Your Projects Updated
&lt;/h2&gt;

&lt;p&gt;Here's the workflow for keeping your containers current:&lt;/p&gt;

&lt;h3&gt;
  
  
  Updating to Latest Versions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your project directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;containers
git checkout main
git pull origin main
&lt;span class="nb"&gt;cd&lt;/span&gt; ..
git add containers
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Update container system to v1.2.3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing containers keep running. Next time you rebuild, they'll use the new versions. If you want to force an update immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose down
docker compose build &lt;span class="nt"&gt;--no-cache&lt;/span&gt;
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What Happens During Updates
&lt;/h3&gt;

&lt;p&gt;When you update the submodule:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No immediate effect&lt;/strong&gt; - Running containers keep running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next build uses new versions&lt;/strong&gt; - Python 3.13.6 → 3.13.7, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests run during build&lt;/strong&gt; - If something breaks, the build fails (not your running container)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache mostly survives&lt;/strong&gt; - Only changed features need re-downloading&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Rolling Back if Needed
&lt;/h3&gt;

&lt;p&gt;Git submodules make rollbacks trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;containers
git checkout v1.2.2  &lt;span class="c"&gt;# Previous version&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ..
git add containers
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Rollback container system to v1.2.2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then rebuild. This is one of the big advantages of the submodule approach—every version is one &lt;code&gt;git checkout&lt;/code&gt; away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Built
&lt;/h2&gt;

&lt;p&gt;Here's what the system includes right now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature Coverage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;28 feature modules that install 100+ individual tools&lt;/li&gt;
&lt;li&gt;8 programming languages (Python, Node.js, Rust, Go, Ruby, R, Java, Mojo)&lt;/li&gt;
&lt;li&gt;5 cloud and infrastructure toolsets (AWS, GCloud, Cloudflare, Kubernetes, Terraform)&lt;/li&gt;
&lt;li&gt;3 database clients (PostgreSQL, Redis, SQLite)&lt;/li&gt;
&lt;li&gt;Comprehensive dev tools (Git, GitHub CLI, Docker CLI, 1Password CLI, and more)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test Coverage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;535+ unit tests covering every feature installation&lt;/li&gt;
&lt;li&gt;6 integration test scenarios (minimal, python-dev, node-dev, cloud-ops, polyglot, rust-golang)&lt;/li&gt;
&lt;li&gt;Debian compatibility spot-checks across versions 11, 12, and 13&lt;/li&gt;
&lt;li&gt;Security scanning with Trivy for all built images&lt;/li&gt;
&lt;li&gt;Shellcheck validation on all bash scripts&lt;/li&gt;
&lt;li&gt;Secret scanning with Gitleaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Automation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weekly automated version checks (Sunday 2am UTC)&lt;/li&gt;
&lt;li&gt;Full CI pipeline on every update (build 6 variants, run all tests, scan for vulnerabilities)&lt;/li&gt;
&lt;li&gt;Auto-merge on success, high-priority notification on failure&lt;/li&gt;
&lt;li&gt;Pushover notifications keep you informed without requiring constant monitoring&lt;/li&gt;
&lt;/ul&gt;
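&lt;p&gt;The notification step is a single HTTPS request. Here's a minimal sketch using Pushover's message API (&lt;code&gt;https://api.pushover.net/1/messages.json&lt;/code&gt;), where &lt;code&gt;priority=1&lt;/code&gt; marks a message as high priority. The &lt;code&gt;PUSHOVER_TOKEN&lt;/code&gt;/&lt;code&gt;PUSHOVER_USER&lt;/code&gt; variable names and the &lt;code&gt;notify_result&lt;/code&gt; function are hypothetical, not the project's workflow code. Setting &lt;code&gt;DRY_RUN=1&lt;/code&gt; prints the request instead of sending it.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical notifier: high priority (1) on failure, normal (0) on success.
# PUSHOVER_TOKEN and PUSHOVER_USER are assumed variable names.
notify_result() {
    local status="$1" message="$2" priority=0
    [ "$status" = "success" ] || priority=1
    set -- curl -fsS \
        --form-string "token=${PUSHOVER_TOKEN:-}" \
        --form-string "user=${PUSHOVER_USER:-}" \
        --form-string "priority=${priority}" \
        --form-string "message=${message}" \
        https://api.pushover.net/1/messages.json
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"   # show the request instead of sending it
    else
        "$@"
    fi
}

# e.g. at the end of the weekly update job:
#   notify_result failure "Weekly version update failed: see the CI run"
```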

&lt;p&gt;&lt;strong&gt;Architecture Support:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ARM64 (Apple Silicon M1/M2/M3, AWS Graviton, Raspberry Pi)&lt;/li&gt;
&lt;li&gt;AMD64 (traditional x86_64, Intel/AMD processors)&lt;/li&gt;
&lt;li&gt;Same Dockerfile builds correctly on both architectures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system is production-ready, with comprehensive testing, documentation, and automation, and it's in use across multiple projects. The numbers above are the best summary of the engineering rigor built into the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use...?
&lt;/h2&gt;

&lt;p&gt;You might be wondering why not just use existing solutions. Fair question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Dockerfiles:&lt;/strong&gt; This is what I was doing. Full control is great until you have ten projects and need to update something in all of them. The maintenance burden became untenable. Every project drifts slightly differently, and keeping them in sync is a losing battle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dev Container Features:&lt;/strong&gt; These are actually pretty good! Microsoft's dev container features are well-designed and solve similar problems. But they're tied to Dev Container tooling (VS Code, GitHub Codespaces, the &lt;code&gt;devcontainer&lt;/code&gt; CLI) rather than plain Dockerfiles. I wanted something that works with VS Code, Docker Compose, plain Docker, CI/CD pipelines, and production environments. Also, dev container features don't solve the version tracking and automation problem on their own: you still have to manage and test feature version updates yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-built Images (python:3.13, node:20, etc.):&lt;/strong&gt; Fast to pull from Docker Hub, but you get what you get. Need Python + Node.js? That's not a standard combination. Need Python + Kubernetes tools + PostgreSQL client? Good luck finding that exact image. And you definitely don't get automatic version updates with comprehensive testing. You're trusting someone else's build process and update cadence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Management (Ansible, Chef, Puppet):&lt;/strong&gt; Different problem space. Those are for runtime configuration of running systems (mutable infrastructure). This is build-time configuration for immutable containers. Both have their place, but they're solving different problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Official Images + Multi-stage Builds:&lt;/strong&gt; This gets closer, but you still end up maintaining multi-stage Dockerfiles for every project. The complexity moves but doesn't disappear. And you still need to manually track version updates.&lt;/p&gt;

&lt;p&gt;The unique value here is the combination: modular features + automated updates + comprehensive testing + production-ready defaults. I haven't found another solution that does all of this together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;p&gt;I wrote comprehensive documentation because I got tired of answering the same questions (and because I'd forget details myself):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core docs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; - Quick start and common use cases&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt; - Architecture guidance and design decisions (yes, I wrote docs specifically for Claude to understand the codebase)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CONTRIBUTING.md&lt;/code&gt; - How to add new features&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CHANGELOG.md&lt;/code&gt; - Version history with breaking changes highlighted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Detailed guides in &lt;code&gt;docs/&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Troubleshooting common issues (the "it doesn't work" guide)&lt;/li&gt;
&lt;li&gt;Writing tests for new features&lt;/li&gt;
&lt;li&gt;Security best practices&lt;/li&gt;
&lt;li&gt;Architecture decisions and rationale&lt;/li&gt;
&lt;li&gt;Version tracking and automated releases&lt;/li&gt;
&lt;li&gt;Security scanning with Trivy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Examples in &lt;code&gt;examples/&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker Compose templates for common scenarios&lt;/li&gt;
&lt;li&gt;Build context patterns&lt;/li&gt;
&lt;li&gt;Environment configurations&lt;/li&gt;
&lt;li&gt;Multi-service setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means new developers can onboard without waiting for me to explain things, and troubleshooting is self-service. The CLAUDE.md file has been particularly useful: it lets me hand debugging off to an AI agent. Because the test suite is so thorough, the agent gets quick, specific feedback on whether a fix worked, which has made this quite effective. I still review the changes, of course, but an agent running something like Claude Sonnet can usually identify and fix problems while I focus on something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Plans
&lt;/h2&gt;

&lt;p&gt;I'm actively working on several improvements:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance optimizations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parallel feature installation (run independent installs concurrently)&lt;/li&gt;
&lt;li&gt;More aggressive layer caching&lt;/li&gt;
&lt;li&gt;Build time metrics tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Plugin system:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allow custom features without modifying core&lt;/li&gt;
&lt;li&gt;Company-specific tools (internal VPNs, proprietary CLIs)&lt;/li&gt;
&lt;li&gt;Private registry support&lt;/li&gt;
&lt;li&gt;Local feature overrides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration templates:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-built combinations for common stacks (Python FastAPI + PostgreSQL, Next.js + Redis, etc.)&lt;/li&gt;
&lt;li&gt;Quick-start templates&lt;/li&gt;
&lt;li&gt;Best practices baked in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Observability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build-time metrics (what takes the most time?)&lt;/li&gt;
&lt;li&gt;Image size tracking over time&lt;/li&gt;
&lt;li&gt;Security vulnerability trends&lt;/li&gt;
&lt;li&gt;Feature usage analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced runtime features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment templating (generate configs from 1Password/AWS Secrets)&lt;/li&gt;
&lt;li&gt;Health checks&lt;/li&gt;
&lt;li&gt;Auto-update mechanisms for running containers&lt;/li&gt;
&lt;li&gt;Graceful feature degradation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enterprise features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SBOM (Software Bill of Materials) generation for compliance&lt;/li&gt;
&lt;li&gt;License scanning&lt;/li&gt;
&lt;li&gt;Air-gapped environment support&lt;/li&gt;
&lt;li&gt;Custom registry integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The roadmap is driven by real usage. If you have feature requests, open an issue on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Want to try it? The setup is straightforward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add as Submodule
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
git submodule add https://github.com/joshjhall/containers.git containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Create Docker Compose Configuration
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;.devcontainer/docker-compose.yml&lt;/code&gt; (or use it anywhere you'd normally use a Dockerfile):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;devcontainer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../containers&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;PROJECT_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myproject&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_DOCKER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;..:/workspace/myproject&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;myproject-cache:/cache&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;myproject-cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;code&gt;INCLUDE_PYTHON_DEV&lt;/code&gt; enables Python plus poetry, pytest, black, and mypy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;INCLUDE_POSTGRES_CLIENT&lt;/code&gt; adds the &lt;code&gt;psql&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;&lt;code&gt;INCLUDE_DOCKER&lt;/code&gt; adds Docker-in-Docker support&lt;/li&gt;
&lt;li&gt;&lt;code&gt;myproject-cache:/cache&lt;/code&gt; is a persistent cache volume for fast rebuilds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No custom Dockerfile needed.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 3: Build and Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; .devcontainer/docker-compose.yml up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You now have a fully configured Python development environment with PostgreSQL client and Docker-in-Docker support.&lt;/p&gt;
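&lt;p&gt;A quick smoke test confirms the enabled features actually landed. Here's a minimal sketch with a hypothetical &lt;code&gt;check_tools&lt;/code&gt; helper. The tool list is an assumption about what &lt;code&gt;INCLUDE_PYTHON_DEV&lt;/code&gt;, &lt;code&gt;INCLUDE_POSTGRES_CLIENT&lt;/code&gt;, and &lt;code&gt;INCLUDE_DOCKER&lt;/code&gt; install (&lt;code&gt;python3&lt;/code&gt;, &lt;code&gt;pytest&lt;/code&gt;, &lt;code&gt;psql&lt;/code&gt;, &lt;code&gt;docker&lt;/code&gt;).&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: report which expected tools are on PATH and fail if
# any are missing. The tool list in the example is an assumption.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>/dev/null; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# e.g. from the host, in a throwaway container:
#   docker compose -f .devcontainer/docker-compose.yml run --rm devcontainer \
#       bash -c "$(declare -f check_tools); check_tools python3 pytest psql docker"
```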

&lt;h3&gt;
  
  
  VS Code Dev Container Integration
&lt;/h3&gt;

&lt;p&gt;If you're using VS Code, you can use Microsoft's devcontainer base images for a cleaner integration. This avoids the Docker-in-Docker plugin complications and their questionable security implications. Here's what that looks like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.devcontainer/docker-compose.yml&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;devcontainer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../containers&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;BASE_IMAGE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/devcontainers/base:trixie&lt;/span&gt;
        &lt;span class="na"&gt;PROJECT_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myproject&lt;/span&gt;
        &lt;span class="na"&gt;USERNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vscode&lt;/span&gt;
        &lt;span class="na"&gt;WORKING_DIR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/myproject&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
        &lt;span class="na"&gt;INCLUDE_DEV_TOOLS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
      &lt;span class="na"&gt;cache_from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;type=local,src=/tmp/.buildx-cache&lt;/span&gt;
      &lt;span class="na"&gt;cache_to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;type=local,dest=/tmp/.buildx-cache,mode=max&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;..:/workspace/myproject&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;TZ=${TZ:-America/Chicago}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ENVIRONMENT=${ENVIRONMENT:-development}&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sleep infinity&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;containers-network&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers-network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;.devcontainer/devcontainer.json&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"My Project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dockerComposeFile"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker-compose.yml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"devcontainer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"workspaceFolder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/workspace/${localWorkspaceFolderBasename}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"postCreateCommand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"poetry install"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"customizations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vscode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"extensions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ms-python.python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ms-python.vscode-pylance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"charliermarsh.ruff"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"terminal.integrated.defaultProfile.linux"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"zsh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"python.defaultInterpreterPath"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/usr/local/bin/python"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"remoteUser"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vscode"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the key differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BASE_IMAGE&lt;/code&gt; arg lets you use Microsoft's devcontainer base images&lt;/li&gt;
&lt;li&gt;
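&lt;code&gt;USERNAME: vscode&lt;/code&gt; matches the non-root &lt;code&gt;vscode&lt;/code&gt; user that Microsoft's devcontainer base images already include (and the &lt;code&gt;remoteUser&lt;/code&gt; in &lt;code&gt;devcontainer.json&lt;/code&gt;)&lt;/li&gt;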
&lt;li&gt;Simple &lt;code&gt;devcontainer.json&lt;/code&gt; without Docker-specific plugin configurations&lt;/li&gt;
&lt;li&gt;The same features work whether you use &lt;code&gt;debian:13-slim&lt;/code&gt; for production or &lt;code&gt;mcr.microsoft.com/devcontainers/base:trixie&lt;/code&gt; for development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility means you can optimize for each environment: Microsoft's devcontainer images for local VS Code development, standard Debian images (11, 12, or 13) for production and QA. Same features, same tools, just different base images.&lt;/p&gt;
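
&lt;p&gt;The same swap works outside Compose with plain &lt;code&gt;docker build&lt;/code&gt;. The sketch below is illustrative, not taken from the repository. It assumes the submodule is checked out at &lt;code&gt;containers/&lt;/code&gt; with its Dockerfile at the top level, and that the Dockerfile accepts &lt;code&gt;BASE_IMAGE&lt;/code&gt; and the &lt;code&gt;INCLUDE_*&lt;/code&gt; arguments used in the Compose file above. The helper name &lt;code&gt;build_cmd&lt;/code&gt; is hypothetical. It only prints the command, so you can compare the dev and prod invocations side by side.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a docker build command for one base image.
# Assumes the Dockerfile lives in ./containers and accepts BASE_IMAGE plus INCLUDE_* args.
build_cmd() {
    local base_image="$1" tag="$2"
    shift 2
    local cmd="docker build -f containers/Dockerfile"
    cmd+=" --build-arg BASE_IMAGE=${base_image}"
    local feature
    # Every remaining argument is a feature flag name, e.g. INCLUDE_PYTHON_DEV
    for feature in "$@"; do
        cmd+=" --build-arg ${feature}=true"
    done
    cmd+=" -t ${tag} containers"
    printf '%s\n' "$cmd"
}

FEATURES=(INCLUDE_PYTHON_DEV INCLUDE_POSTGRES_CLIENT)

# Same feature set, two different base images
build_cmd mcr.microsoft.com/devcontainers/base:trixie myproject:dev "${FEATURES[@]}"
build_cmd debian:13-slim myproject:prod "${FEATURES[@]}"
```

&lt;p&gt;The two printed commands differ only in &lt;code&gt;BASE_IMAGE&lt;/code&gt; and the tag. The feature set travels with the build arguments, not with the base image.&lt;/p&gt;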

&lt;h3&gt;
  
  
  Adding More Features
&lt;/h3&gt;

&lt;p&gt;Want to add Node.js? Just update the build args:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_NODE_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;      &lt;span class="c1"&gt;# Add this line&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_DOCKER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rebuild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose down
docker compose build
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Python API development:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_REDIS_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Node.js web development:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_NODE_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud operations:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_KUBERNETES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_TERRAFORM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_AWS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full-stack development:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_PYTHON_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_NODE_DEV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_POSTGRES_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_REDIS_CLIENT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;INCLUDE_DOCKER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All available build arguments are documented in the repository README.&lt;/p&gt;
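
&lt;p&gt;The README is the authoritative list. For a quick check against the Dockerfile itself, a grep is enough. This assumes each toggle is declared on its own &lt;code&gt;ARG INCLUDE_...&lt;/code&gt; line, which is an assumption about the file's layout. The function name &lt;code&gt;list_feature_args&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the INCLUDE_* toggles a Dockerfile declares.
# Assumes toggles are written as "ARG INCLUDE_NAME" or "ARG INCLUDE_NAME=default".
list_feature_args() {
    local dockerfile="${1:-containers/Dockerfile}"
    # -o prints only the matched text, -E enables extended regex;
    # sed strips the leading "ARG ", sort -u removes duplicates
    grep -oE '^[[:space:]]*ARG[[:space:]]+INCLUDE_[A-Z0-9_]+' "$dockerfile" \
        | sed -E 's/^[[:space:]]*ARG[[:space:]]+//' \
        | sort -u
}
```

&lt;p&gt;From a project that uses the submodule, &lt;code&gt;list_feature_args containers/Dockerfile&lt;/code&gt; prints one argument name per line.&lt;/p&gt;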

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Security best practices are built into the foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-root user by default&lt;/strong&gt; - All processes run as a non-root user unless explicitly needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal Debian slim base images&lt;/strong&gt; - Smallest viable attack surface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated security updates&lt;/strong&gt; - Weekly automation checks for and applies security patches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret scanning with Gitleaks&lt;/strong&gt; - Prevents accidentally committed credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulnerability scanning with Trivy&lt;/strong&gt; - Catches known CVEs in dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validated installations&lt;/strong&gt; - Each feature verifies correct installation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proper file permissions&lt;/strong&gt; - No world-writable files or directories&lt;/li&gt;
&lt;/ul&gt;
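
&lt;p&gt;Two of these properties are easy to spot-check from a shell inside a running container. This is a hedged sketch, not part of the repository's test suite. &lt;code&gt;check_non_root&lt;/code&gt; and &lt;code&gt;find_world_writable&lt;/code&gt; are hypothetical helper names, and which directory you scan is up to you. Sticky directories such as &lt;code&gt;/tmp&lt;/code&gt; are world-writable by design, so the scan skips them.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical spot-checks for two of the properties listed above.

# Returns 0 when the current process is not root (UID 0), 1 otherwise.
check_non_root() {
    if [ "$(id -u)" -eq 0 ]; then
        return 1
    fi
    return 0
}

# Prints regular files and directories under DIR that are world-writable.
find_world_writable() {
    local dir="$1"
    # -xdev: stay on one filesystem; -perm -0002: the "others" write bit is set.
    # Directories that also carry the sticky bit (-perm -1000) are skipped,
    # and symlinks are excluded because they always report mode 0777.
    find "$dir" -xdev -perm -0002 ! \( -type d -perm -1000 \) \( -type f -o -type d \) -print
}
```

&lt;p&gt;For example, run &lt;code&gt;check_non_root || echo "running as root"&lt;/code&gt; and then &lt;code&gt;find_world_writable /usr/local&lt;/code&gt;. An empty listing is what you want.&lt;/p&gt;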

&lt;p&gt;The automated update system is designed with security in mind. Security patches are prioritized and tested immediately. If a critical vulnerability is detected, the system can create emergency updates outside the normal Sunday schedule.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contributing
&lt;/h2&gt;

&lt;p&gt;Want to add a feature? The process is straightforward:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create the Feature Script
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;lib/features/your-feature.sh&lt;/code&gt; following this template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="c"&gt;# Detect latest version&lt;/span&gt;
&lt;span class="nv"&gt;TOOL_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"1.2.3"&lt;/span&gt;

&lt;span class="c"&gt;# Install&lt;/span&gt;
apt-get update
apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; your-tool

&lt;span class="c"&gt;# Validate&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; your-tool &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Installation failed"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Successfully installed your-tool &lt;/span&gt;&lt;span class="nv"&gt;$TOOL_VERSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Add Unit Tests
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;tests/unit/features/your-feature.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="s2"&gt;"../../framework.sh"&lt;/span&gt;
init_test_framework

test_your_feature_installs&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;source &lt;/span&gt;lib/features/your-feature.sh
    assert_success &lt;span class="s2"&gt;"Installation should succeed"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

test_your_feature_version&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"test-image:latest"&lt;/span&gt;
    assert_command_in_container &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$image&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"your-tool --version"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

run_tests
generate_report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Update Documentation
&lt;/h3&gt;

&lt;p&gt;Add your feature to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; - Available features list&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CHANGELOG.md&lt;/code&gt; - Under "Unreleased"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/FEATURES.md&lt;/code&gt; - Detailed feature documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Submit Pull Request
&lt;/h3&gt;

&lt;p&gt;The CI pipeline will automatically test your changes on all Debian versions and run the full test suite.&lt;/p&gt;
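
&lt;p&gt;To approximate that matrix before pushing, a loop over base images is enough. This sketch rests on several assumptions. The &lt;code&gt;debian:11-slim&lt;/code&gt;, &lt;code&gt;debian:12-slim&lt;/code&gt;, and &lt;code&gt;debian:13-slim&lt;/code&gt; tags stand in for whatever the CI actually uses. &lt;code&gt;INCLUDE_YOUR_FEATURE&lt;/code&gt; is a placeholder for your new build argument, and &lt;code&gt;build_matrix&lt;/code&gt; is a hypothetical name. The loop prints the commands by default and only builds when &lt;code&gt;DRY_RUN=0&lt;/code&gt;.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical local approximation of the CI matrix for a new feature.
# Assumes the Dockerfile accepts BASE_IMAGE and a placeholder INCLUDE_* arg.
DEBIAN_IMAGES=(debian:11-slim debian:12-slim debian:13-slim)

build_matrix() {
    local feature_arg="$1" image tag
    for image in "${DEBIAN_IMAGES[@]}"; do
        # debian:12-slim becomes feature-test:12-slim
        tag="feature-test:${image#debian:}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "docker build --build-arg BASE_IMAGE=${image} --build-arg ${feature_arg}=true -t ${tag} ."
        else
            # Stop at the first base image that fails to build
            docker build --build-arg "BASE_IMAGE=${image}" \
                --build-arg "${feature_arg}=true" -t "$tag" . || return 1
        fi
    done
}
```

&lt;p&gt;Run it from the repository root, where the Dockerfile lives. &lt;code&gt;DRY_RUN=0 build_matrix INCLUDE_YOUR_FEATURE&lt;/code&gt; performs the three builds and stops at the first failure.&lt;/p&gt;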

&lt;p&gt;See &lt;code&gt;CONTRIBUTING.md&lt;/code&gt; for detailed guidelines, coding standards, and best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;I built this out of frustration. I was tired of writing the same Dockerfile repeatedly. Tired of tracking version updates manually. Tired of fixing the same Docker issues in six different repos. Tired of spending days on infrastructure instead of building actual products.&lt;/p&gt;

&lt;p&gt;This system solves all of those problems for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New projects set up in 10 minutes instead of 2 days&lt;/li&gt;
&lt;li&gt;All my projects stay in sync automatically&lt;/li&gt;
&lt;li&gt;Version updates happen while I sleep (and only wake me if something breaks)&lt;/li&gt;
&lt;li&gt;Dev and production environments use identical foundations&lt;/li&gt;
&lt;li&gt;I have actual confidence that builds work because of comprehensive testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It started as a side project to scratch my own itch. But it's become the foundation for everything I build with containers now. I haven't written a custom Dockerfile in over a year. I don't miss it.&lt;/p&gt;

&lt;p&gt;The best part? The automation. That weekly CI run that updates everything, tests it, and either deploys or notifies me? That's the difference between managing infrastructure and just &lt;em&gt;using&lt;/em&gt; it. I wake up to patched containers. I spend time building products instead of maintaining Docker configs.&lt;/p&gt;

&lt;p&gt;If you're dealing with the same pain points—multiple projects, manual version tracking, inconsistent environments, time-consuming setup—this might help you too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It On Your Next Project
&lt;/h2&gt;

&lt;p&gt;Don't overcommit. Just try it on one project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the git submodule&lt;/li&gt;
&lt;li&gt;Enable the features you need&lt;/li&gt;
&lt;li&gt;Build the container&lt;/li&gt;
&lt;li&gt;Spend your time building instead of configuring Docker&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you run into issues or have questions, open an issue on GitHub. I'm actively maintaining this and usually respond within a day.&lt;/p&gt;

&lt;p&gt;If it saves you even half the time it's saved me, that's hours back in your week. Hours you can spend on things that actually matter.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/joshjhall/containers" rel="noopener noreferrer"&gt;https://github.com/joshjhall/containers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; See the &lt;code&gt;docs/&lt;/code&gt; directory for detailed guides&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt; See the &lt;code&gt;examples/&lt;/code&gt; directory for Docker Compose templates&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issues/Questions:&lt;/strong&gt; Open an issue on GitHub - I'm responsive and want this to work well for others too&lt;/p&gt;

</description>
      <category>automation</category>
      <category>docker</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>CFS: Scoring Features Before You Argue About Them</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Wed, 10 Jun 2026 22:29:56 +0000</pubDate>
      <link>https://dev.to/joshjhall/cfs-scoring-features-before-you-argue-about-them-3p50</link>
      <guid>https://dev.to/joshjhall/cfs-scoring-features-before-you-argue-about-them-3p50</guid>
      <description>&lt;p&gt;Should we build two-factor authentication? Users have asked for it. That isn't a yes. Should we add the export-to-Excel feature the enterprise account keeps requesting, or the second product line a competitor just shipped? Every team faces a steady stream of these, and most answer them in a meeting where whoever holds the strongest opinion wins. Six months later the roadmap has a dozen features nobody uses, and a dozen more that got turned down for reasons no one can reconstruct.&lt;/p&gt;

&lt;p&gt;The deeper problem isn't the decision. It's that the decision evaporates. It becomes an action item, a five-minute hallway conversation, a Slack thread, and it is almost never written down with the reasoning attached. So in a year, someone asks the same question, proposes the same feature you already killed, and nobody can say why it died. Maybe the answer should change now; maybe it shouldn't. Without a record you can't tell, and you can't improve the way you decide, because the inputs are gone. Organizational memory turns out to be however much a few people can hold in their heads (which is less than you'd hope, and lasts nowhere near as long as you need), and that's before half the team turns over.&lt;/p&gt;

&lt;p&gt;The fix isn't more meetings. It's a small piece of structure that gives the conversation an anchor: three axes, a one-to-five rating on each, and a score you write down with the decision. It's called CFS — Commonality, Frequency, Severity. It won't make the call for you. It makes the call &lt;em&gt;comparable&lt;/em&gt; and &lt;em&gt;durable&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Axes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Commonality&lt;/strong&gt; asks what share of your users would touch this feature at all. Are 80% of users going to export to Excel, or 20%? Is it one important enterprise client and nobody else? A one is a single niche audience. A five is universal: the kind of thing nearly everyone in the product needs. Fives are rare, and the rarity is the point. If everything scores a five, the axis has stopped telling you anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frequency&lt;/strong&gt; asks how often the people who &lt;em&gt;do&lt;/em&gt; use it come back. Among that slice of users (whether it's 20% of the base or 95%) is this a couple-times-a-year thing or a twenty-times-a-day thing? Note that this is conditional on commonality: a feature can matter to only a sliver of users and still be something that sliver lives in daily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Severity&lt;/strong&gt; asks how much the absence hurts the people who'd benefit, and whether there's an easy way around it. The workaround question is doing most of the work here. Can they copy-paste manually? Lean on a shortcut key, or an OS capability common enough to assume everyone has it, like printing? That's low severity. At the other end: without this, the product is worthless to the audience that needs it. Not being able to export your books to Excel out of an accounting tool can be that. Not being able to print from a photo app can be that. Most things land somewhere between the convenience and the dealbreaker.&lt;/p&gt;

&lt;p&gt;The three are independent. A feature can be common but rarely used, used constantly by a tiny group, or painful to lack but only in one corner of the product. Keeping them separate is what lets the score carry information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiply, Don't Add
&lt;/h2&gt;

&lt;p&gt;Here's the part that matters and is easy to get wrong: the axes &lt;strong&gt;multiply&lt;/strong&gt;. Commonality times Frequency times Severity, so a one-to-five scale tops out at 125, not 15.&lt;/p&gt;

&lt;p&gt;Multiplication is deliberate, because each tick should land with more weight than the last — a jump from severity three to four isn't one unit more pain, it's a different category of pain, and the math should say so. It also means a single one anywhere drags the whole thing down, which is usually correct: a feature almost nobody can use, no matter how often or how critically, probably isn't where your next two weeks should go.&lt;/p&gt;
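
&lt;p&gt;The arithmetic is simple enough to script if you keep scores in a decision log. Below is a minimal sketch with hypothetical function names (&lt;code&gt;cfs_score&lt;/code&gt;, &lt;code&gt;additive_score&lt;/code&gt;); neither is part of any published tool. It rejects ratings outside one to five and compares the product against a plain sum for a feature with one weak axis.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for CFS scoring; not part of any published tool.

# Succeeds only for an integer rating from 1 to 5.
is_rating() {
    [[ "$1" =~ ^[1-5]$ ]]
}

# CFS score: Commonality x Frequency x Severity (range 1..125).
# Returns non-zero, printing nothing, if any rating is out of range.
cfs_score() {
    local c="$1" f="$2" s="$3" r
    for r in "$c" "$f" "$s"; do
        is_rating "$r" || return 1
    done
    echo $(( c * f * s ))
}

# Additive version, shown only for comparison (range 3..15).
additive_score() {
    echo $(( $1 + $2 + $3 ))
}

# One weak axis: the sum barely notices, the product does.
echo "strong everywhere: product $(cfs_score 4 4 4), sum $(additive_score 4 4 4)"
echo "one axis at 1:     product $(cfs_score 1 4 4), sum $(additive_score 1 4 4)"
```

&lt;p&gt;With those inputs the product falls from 64 to 16, while the sum only falls from 12 to 9. As a share of each scale's maximum, that is about 13% versus 60%. The additive version still makes the weak feature look respectable.&lt;/p&gt;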

&lt;p&gt;A shortcut key shows why the axes have to stay separate. I remap keys to split and merge browser windows constantly (high frequency, for me). But my web habits are geeky and unrepresentative; commonality is low. High frequency, low commonality, and the product is fine without it: the workaround is a button two pixels away. Multiply it out and the score stays honest about that.&lt;/p&gt;

&lt;p&gt;Calibrate the top of the scale hard. A five means six-sigma certainty: everyone uses it, or the people who do use it dozens of times a day, or the product is simply broken without it. I rarely reach for fours and almost never fives. Most real scoring lives in the one-to-three band with the occasional four, and that's the framework working, not failing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Scores Look Like in Practice
&lt;/h2&gt;

&lt;p&gt;Take private accounts on a consumer social network. Authenticated, individual accounts are how the product works at all (high commonality, high severity, you don't have a social network without them). But that one decision fans out into a cluster of features whose scores diverge sharply. Password reset, lost-password flow, passkeys, emailed one-time codes, OTP: all roads to "I can get into my account," each scoring differently. And frequency is genuinely contextual. If I let a mobile session refresh silently for months, signing back in is rare even though the account itself is universal and critical. High commonality, high severity, low frequency, and the math holds all three at once.&lt;/p&gt;

&lt;p&gt;Now account merging. You join a network with your Gmail, forget, and join again with Yahoo. Two accounts, two addresses. There are workflows to reconcile them, but it's an uncommon situation, an infrequent one, and it takes a savvy user to even notice. The system usually can't detect it without a reliable shared anchor like a verified phone number. And the workaround is brutal but real: just delete one account. Low commonality, low frequency, low-to-moderate severity; multiply it out and you get a small number, which is the right answer. A lot of products correctly never build this.&lt;/p&gt;

&lt;p&gt;Printing sits in the messy middle, which is exactly why it's useful. Plenty of apps don't need it. But plenty of users still print, or print-to-PDF because a manager wants it emailed. Middling commonality, low-to-middling frequency, low-to-middling severity depending on the domain: a clean printable view is a respectable, unglamorous, middle-of-the-table feature. Not every decision is a clear build or a clear cut, and the score is honest about the ones that aren't.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Input Among Many
&lt;/h2&gt;

&lt;p&gt;CFS is not a prioritization engine. It's the &lt;em&gt;benefit&lt;/em&gt; half of a cost-benefit, and it deliberately leaves cost out.&lt;/p&gt;

&lt;p&gt;That's the line between CFS and something like &lt;a href="https://www.intercom.com/blog/rice-simple-prioritization-for-product-managers/" rel="noopener noreferrer"&gt;RICE&lt;/a&gt;, the Intercom model that folds effort in as a divisor: Reach times Impact times Confidence, over Effort. RICE bakes the cost into the number. CFS doesn't, on purpose, because cost belongs in a separate, cleaner conversation. Effort is the easy thing to compare: put it in person-weeks and you're done. I've prioritized half a dozen ones and twos ahead of an eighteen plenty of times, simply because the small ones shipped in days while the big one needed two more weeks of design before engineering could even start.&lt;/p&gt;
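
&lt;p&gt;Restated as formulas, the difference is structural. The CFS line is the multiplication described earlier, with each rating from one to five. The RICE line follows Intercom's definition as summarized in the previous paragraph.&lt;/p&gt;

```latex
$$
\text{CFS} = C \times F \times S, \qquad C, F, S \in \{1, 2, 3, 4, 5\}, \qquad 1 \le \text{CFS} \le 125
$$

$$
\text{RICE} = \frac{\text{Reach} \times \text{Impact} \times \text{Confidence}}{\text{Effort}}
$$
```

&lt;p&gt;CFS has no denominator. Effort stays out of the number on purpose and goes into its own conversation.&lt;/p&gt;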

&lt;p&gt;The harder inputs resist a tidy number. Team load-balancing: the feature needs Susan and Javier, but Javier is booked solid for a quarter on something the executives flagged, and Susan won't surface from her two current projects for six weeks. That has nothing to do with the feature's merit and everything to do with whether you can staff it. Then there's strategic value a usage score will never capture. A feature that scores low on all three axes but demos beautifully and helps sales close can absolutely be worth building.&lt;/p&gt;

&lt;p&gt;So treat the CFS number as one calibrated input you set beside cost, capacity, and strategy, not the verdict. As a rough read on whether something &lt;em&gt;looks&lt;/em&gt; important before the harder conversations start, I've found nothing better. An eighteen on the board earns real attention. North of twenty-four, I'm usually building it or making a deliberate case for why not.&lt;/p&gt;
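
&lt;p&gt;If you keep scores in a log, those two rough thresholds can be written down as a triage helper so everyone gets the same first read. This is a sketch of the rule of thumb above, not a way to automate the decision. &lt;code&gt;cfs_triage&lt;/code&gt; and its band labels are hypothetical, and the result is only the opening for the cost, capacity, and strategy conversation.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical triage of a CFS score (1..125) using the rough thresholds above:
#   above 24 : build it, or make a deliberate case for why not
#   18 to 24 : earns real attention
#   below 18 : no strong signal on its own
# Returns non-zero for anything that is not an integer in 1..125.
cfs_triage() {
    local score="$1"
    if ! [[ "$score" =~ ^[0-9]+$ ]] || [ "$score" -lt 1 ] || [ "$score" -gt 125 ]; then
        return 1
    fi
    if [ "$score" -gt 24 ]; then
        echo "build-or-justify"
    elif [ "$score" -ge 18 ]; then
        echo "real-attention"
    else
        echo "no-strong-signal"
    fi
}

for s in 9 18 24 25; do
    printf '%3d  %s\n' "$s" "$(cfs_triage "$s")"
done
```

&lt;p&gt;Note that 24 itself lands in the attention band, since the rule is &lt;em&gt;north of&lt;/em&gt; twenty-four.&lt;/p&gt;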

&lt;h2&gt;
  
  
  The Real Payoff Is Rigor
&lt;/h2&gt;

&lt;p&gt;Here's what the score actually buys you, and it took me embarrassingly long to name it: it adds a pseudo-quantitative layer to a fundamentally qualitative judgment, and that structure does two things a meeting can't.&lt;/p&gt;

&lt;p&gt;First, it lets the best idea win regardless of who has it. When the conversation is "how common, how frequent, how severe," it stops mattering whether the proposal came from the CEO or the quietest person in the room. You're rating the need, not the advocate. The strong-opinion-wins dynamic that runs most feature debates loses its grip, because everyone is now arguing about the same three things in the same terms.&lt;/p&gt;

&lt;p&gt;Second, it forces people to check their own biases. If I walked in certain feature A beat feature B, and we score them and A comes out a nine while B comes out a twenty-four, I have to sit with that. Why did I think A mattered more? Is our read on B inflated, or was my conviction about A just ego, or familiarity, or whatever I carried into the room? The tension between gut and score is the most valuable thing CFS produces — not because the number is right and the gut is wrong, but because the gap is where the real conversation lives. That's also where you should introspect hardest: when half the room expected low and it came back high, the disagreement is pointing at a hidden assumption worth dragging into the light.&lt;/p&gt;

&lt;p&gt;The number was never the deliverable. The deliverable is a room full of people who have stopped arguing about whose opinion is louder and started arguing about how much the absence actually costs — written down, with the reasoning attached, so that when the question comes back in a year, the answer is right there waiting. &lt;em&gt;Commonality two, frequency one, severity two. Has anything changed?&lt;/em&gt; Usually nothing has. And when it has, you'll know exactly what.&lt;/p&gt;

</description>
      <category>ux</category>
      <category>uxdesign</category>
      <category>productivity</category>
      <category>product</category>
    </item>
    <item>
      <title>Chat Is an Input, Not an Interface</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Tue, 09 Jun 2026 00:51:03 +0000</pubDate>
      <link>https://dev.to/joshjhall/chat-is-an-input-not-an-interface-39bj</link>
      <guid>https://dev.to/joshjhall/chat-is-an-input-not-an-interface-39bj</guid>
      <description>&lt;p&gt;Ask me my address in a chat box and you've just made my life worse. I can type it into a form in three seconds: one that validates the ZIP, knows my state from my city, and tells me immediately if I fat-fingered a digit. In a chat box I'm at the mercy of however the model decides to parse my reply, and some fraction of the time it comes back subtly wrong because I phrased it in a way the prompt didn't anticipate. The form was right there.&lt;/p&gt;

&lt;p&gt;That's the thing I keep coming back to when I look at the wave of chat-first products. Conversation is a real input modality. It is rarely the right default. Picking the interface that fits the task is design work, and replacing every form, grid, and canvas with a prompt box because the model is cheap is the opposite of doing that work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat Is One Input, Not the Interface
&lt;/h2&gt;

&lt;p&gt;The category error in most chat-first products is treating chat as &lt;em&gt;the&lt;/em&gt; interface rather than &lt;em&gt;an&lt;/em&gt; input. The model underneath is a processing layer. The surface the user touches is a separate decision, and collapsing the two ("we shipped an LLM, so the product is a chat window") is what produces the worst of this genre.&lt;/p&gt;

&lt;p&gt;Conversation is one input among many, and most software already runs on a rich vocabulary of others: forms, dropdowns, sliders, grids, direct manipulation on a canvas. Each exists because it fit a task better than typing a sentence would. A chat box doesn't retire any of them. It joins the list, useful in the specific places where the others fall short — and a liability everywhere they don't.&lt;/p&gt;

&lt;p&gt;The honest test for any chat-first screen is whether prose is the &lt;em&gt;best&lt;/em&gt; way to express the task, or merely the easiest way to ship it. Those answers diverge more often than the demos admit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Chat Earns Its Keep
&lt;/h2&gt;

&lt;p&gt;There's a category of input that genuinely benefits from conversation: anything where the structure of the answer isn't known in advance, where the follow-up questions depend on previous answers, or where the user's vocabulary doesn't match the application's and something has to translate between them. If the task is the kind of thing an interview handles better than a questionnaire, chat is worth exploring. The keyword is &lt;em&gt;interview&lt;/em&gt; — there are limits, and they arrive fast.&lt;/p&gt;

&lt;p&gt;Construction takeoff estimating is a good example from a project I work on. The drawings give you measurable structured data (square footage, room count, fixture positions) but say nothing about finish level. A builder-grade kitchen package might run $10K. A premium one with a Viking range and a Sub-Zero fridge can hit $150K. The plans don't say which the owner wants, and no single form field captures the difference cleanly.&lt;/p&gt;

&lt;p&gt;Chat fits there. The system asks the contractor what conversations they've been having with the owner, picks up on "they want it nice but not crazy" or "she keeps showing me Italian tile," and translates that into structured data the estimator can apply. The contractor's mental model is conversational; forcing them through a finishing-level dropdown throws information away. Chat also lets ambiguity survive when it needs to — "they're still deciding, but I think it'll land in the $35K–$45K range" is a perfectly valid answer, and form fields either choke on that kind of variance or become elaborate interfaces in their own right.&lt;/p&gt;
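
&lt;p&gt;Here is a minimal sketch of the record such a conversation might fill in, to make "structured data the estimator can apply" concrete. It assumes the model's only job is to populate this record. &lt;code&gt;FinishEstimate&lt;/code&gt; and its fields are hypothetical names for illustration, not part of any real estimating product. The point is that a range is a first-class value rather than a validation error.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FinishEstimate:
    """Hypothetical structured output of the finish-level conversation.

    A single figure is stored as low == high; a range keeps both ends so
    the estimator can carry the uncertainty forward instead of forcing
    the contractor to commit to one number.
    """
    room: str
    low_usd: int
    high_usd: int
    tier: Optional[str] = None  # e.g. "builder" or "premium"; None if undecided
    note: str = ""              # the contractor's own words, kept verbatim

    def __post_init__(self) -> None:
        if self.low_usd > self.high_usd:
            raise ValueError("low_usd must not exceed high_usd")

    @property
    def midpoint_usd(self) -> float:
        return (self.low_usd + self.high_usd) / 2

    @property
    def is_range(self) -> bool:
        return self.high_usd != self.low_usd


# What the model might extract from "they're still deciding, but I think
# it'll land in the $35K-$45K range":
kitchen = FinishEstimate(
    room="kitchen",
    low_usd=35_000,
    high_usd=45_000,
    note="they're still deciding, but I think it'll land in the $35K-$45K range",
)
print(kitchen.is_range, kitchen.midpoint_usd)
```

&lt;p&gt;Keeping the contractor's original words in &lt;code&gt;note&lt;/code&gt; next to the parsed range is one way to let the ambiguity survive without losing the structure.&lt;/p&gt;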

&lt;p&gt;Chat earns its place there because the input is genuinely unstructured at the source. Most chat interfaces I see aren't solving that problem. They're solving the much smaller problem of "we shipped a model and now everything has to look like one."&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Direct Manipulation Wins
&lt;/h2&gt;

&lt;p&gt;Then there's the larger territory the chat-first framing quietly ignores: the work that graphical interfaces do better than any sentence can, because the interface &lt;em&gt;is&lt;/em&gt; the precision.&lt;/p&gt;

&lt;p&gt;Try to nudge the spacing on an image by describing it. "Move it left a bit. No, less. Okay, now down." That's ten seconds of direct manipulation in Figma or Illustrator turned into a frustrating game of telephone. A prose-only path to production design (describe the screen, accept whatever comes back) fails before it starts, because prose is a clumsy, lossy way to specify pixel spacing, and the tools that actually work never ask it to. The same holds for CAD, spreadsheets, page layout, and business analytics: domains where the value lives in a precise, spatial, direct relationship between the user's hand and the artifact. Prose can't carry that bandwidth.&lt;/p&gt;

&lt;p&gt;The forms case is just the narrow, well-behaved end of this same spectrum. Address entry, date selection, numeric ranges, anything with strict validation, anything the user has typed a hundred times, anything whose answers enumerate cleanly in a short dropdown: these are known data of a discrete shape the system needs to collect, sometimes with an order or a dependency between them. A form is a contract. It states exactly what's needed, validates locally, surfaces errors on the spot, and finishes in seconds. Run the same task through chat and you get a slow guessing game with the ambiguity pushed onto the user, who might not notice the model misread the address until the shipping confirmation arrives a day later.&lt;/p&gt;
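
&lt;p&gt;Here is a minimal sketch of what "validates locally, surfaces errors on the spot" means for the address example from the opening. It assumes US addresses. &lt;code&gt;validate_address&lt;/code&gt; and the tiny &lt;code&gt;CITY_TO_STATE&lt;/code&gt; table are hypothetical stand-ins for a real postal lookup. What matters is the shape of the result: one error per field, and an empty list when the input is valid.&lt;/p&gt;

```python
import re
from typing import Dict, List

# Five digits, optionally followed by a hyphen and four more (ZIP+4).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

# Hypothetical stand-in for a real postal lookup service.
CITY_TO_STATE: Dict[str, str] = {"boise": "ID", "tulsa": "OK"}


def validate_address(fields: Dict[str, str]) -> List[str]:
    """Return field-level errors; an empty list means the address is valid.

    Every check runs instantly on the client, so the user sees a typo
    before submitting instead of after the shipping confirmation.
    """
    errors: List[str] = []
    zip_code = fields.get("zip", "").strip()
    if not ZIP_PATTERN.fullmatch(zip_code):
        errors.append(f"zip: '{zip_code}' is not a valid US ZIP code")

    city = fields.get("city", "").strip().lower()
    state = fields.get("state", "").strip().upper()
    expected = CITY_TO_STATE.get(city)
    if expected and state and state != expected:
        errors.append(f"state: {fields.get('city')} is in {expected}, not {state}")
    return errors


print(validate_address({"zip": "8370", "city": "Boise", "state": "ID"}))
print(validate_address({"zip": "83702", "city": "Boise", "state": "ID"}))
```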

&lt;p&gt;Bulk is where it gets stark. Entering one phone number through chat? Maybe. Entering a client list with every contact's details? Almost never. It's faster to upload a CSV or type into a grid than to narrate a hundred rows. The richer the structure, the worse prose performs.&lt;/p&gt;

&lt;p&gt;None of this is an argument against the model. The model can sit behind the form, the grid, the canvas, autocompleting, validating against external sources, suggesting corrections. The interaction surface doesn't have to be a chat box just because the processing layer is an LLM. As Andrej Karpathy puts it in his &lt;a href="https://www.latent.space/p/s3" rel="noopener noreferrer"&gt;Software 3.0 framing&lt;/a&gt;, the future stack is classical code, machine learning, and LLMs working together; for genuine business logic I'd add a rules engine to that list. The interface is a design choice independent of which of those is doing the work underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Right Number of Ways
&lt;/h2&gt;

&lt;p&gt;This is partly an old debate in new clothes: how many ways to do one thing should an application offer? Both extremes are wrong.&lt;/p&gt;

&lt;p&gt;Perl and COBOL famously hand you fifteen ways to write the same line, and the result is often code nobody else can read; the reader has to reverse-engineer not just what the author did but which dialect they were speaking. The most flexible interface to an operating system is probably a Unix shell: it can do nearly anything, and it's correspondingly hard to learn and easy to misuse. Maximum optionality has a real cognitive price.&lt;/p&gt;

&lt;p&gt;Python takes the opposite stance ("there should be one obvious way to do it") and for years rode it past the point of usefulness. Until the &lt;code&gt;match&lt;/code&gt; statement arrived in Python 3.10, the language had no real &lt;code&gt;case&lt;/code&gt; statement, on the principle that chained &lt;code&gt;if/elif/else&lt;/code&gt; was already enough. The philosophy was consistent. In practice, though, a match statement is simply easier to read than a chain of elifs, and the one-obvious-way dogma cost the language readability for years in a spot that didn't need the discipline.&lt;/p&gt;

&lt;p&gt;The balance for an application is two or three modalities for the same task, chosen deliberately. A form for the structured case. A chat fallback for the unstructured one. Maybe a power-user shortcut for repeat operations. Three covers the meaningful variation in how people work without sliding into Perl-land. One, picked because the designer fell for a modality, quietly pushes half the user base into a worse experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The LLM Bridges the Gaps
&lt;/h2&gt;

&lt;p&gt;So if chat isn't the interface, what is the model actually for? Its real power is bridging ambiguity, papering over the places in a workflow where the input is messy, the formatting is inconsistent, or the answer is still half-formed, so the user can keep moving toward the objective instead of stalling on a field that won't accept what they have.&lt;/p&gt;

&lt;p&gt;Back to construction. Bids come back from subcontractors in wildly inconsistent formats. Pulling the total price off each one is trivial; regex or a parser handles it. What that price &lt;em&gt;includes and excludes&lt;/em&gt; is where you need real interpretation, and that's the LLM's job: read the messy document, triage what plain machine learning can extract, what a rules engine can decide, and what genuinely needs a model to interpret. The output isn't a chat transcript. It's a structured model of the bid, echoed back through the normal interface in a consistent shape: the same fields, every time, however ragged the source.&lt;/p&gt;
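
&lt;p&gt;Here is a rough sketch of that split, assuming bids arrive as plain text. A regular expression handles the deterministic total. A hypothetical &lt;code&gt;interpret_scope&lt;/code&gt; callable stands in for the model that reads inclusions and exclusions. The &lt;code&gt;Bid&lt;/code&gt; fields and the regex's idea of what a total looks like are both assumptions for illustration.&lt;/p&gt;

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed format: matches e.g. "Total: $48,250.00" or "TOTAL BID $ 48250".
TOTAL_PATTERN = re.compile(
    r"total(?:\s+bid)?\s*:?\s*\$\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE
)


@dataclass
class Bid:
    """Hypothetical consistent shape every bid is normalized into."""
    subcontractor: str
    total_usd: Optional[float]
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)


def extract_total(text: str) -> Optional[float]:
    """The deterministic part: a regex is enough to pull the price."""
    match = TOTAL_PATTERN.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def parse_bid(
    subcontractor: str,
    text: str,
    interpret_scope: Callable[[str], Tuple[List[str], List[str]]],
) -> Bid:
    """Combine cheap extraction with model interpretation.

    `interpret_scope` stands in for an LLM call that reads the messy
    document and returns (includes, excludes).
    """
    includes, excludes = interpret_scope(text)
    return Bid(subcontractor, extract_total(text), includes, excludes)


def fake_model(text: str) -> Tuple[List[str], List[str]]:
    """Placeholder for a real model call: a crude keyword check."""
    if "fixtures by owner" in text.lower():
        return ["rough-in plumbing"], ["fixtures"]
    return [], []


bid = parse_bid(
    "Acme Plumbing", "Rough-in only, fixtures by owner. TOTAL BID: $48,250.00", fake_model
)
print(bid)
```

&lt;p&gt;Injecting the model call keeps everything around it deterministic and testable. Swapping the fake for a real LLM client changes nothing about the shape that reaches the interface.&lt;/p&gt;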

&lt;p&gt;That consistency is the actual payoff of LLM-powered UX, and it has almost nothing to do with chat as a surface. Picture it in CAD. I've got my 3D mouse, my shortcuts, my hands deep in the geometry of a part. Chat isn't where I model; that would be absurd. But "now that the shape's roughed in, run a torque analysis on this and farm it out to the background while I keep working" is exactly where a conversational command earns its place, riding alongside the precise interface instead of replacing it. The model bridges; it doesn't take the wheel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Default Is the Enemy of Design
&lt;/h2&gt;

&lt;p&gt;If you're designing a product right now and the default screen is a chat box, here's the test. Pull up the five most common tasks your users perform. For each, ask whether a form, grid, dropdown, button, or canvas would let them finish faster, with less ambiguity and better validation than typing a sentence.&lt;/p&gt;

&lt;p&gt;If the answer is yes across the board, you've built a forms-and-grids application wearing a chatbot costume, and the costume is hurting your users. Ship the real interface. Keep chat for the slice of cases where the input genuinely doesn't fit a structured surface, and put it behind a button that says "or just describe it" rather than making it the front door.&lt;/p&gt;

&lt;p&gt;If the answer is no for some of them (you can't predict the shape of the input, the follow-ups depend on the answers, the user's vocabulary won't survive translation to fields) then chat is the right primary input &lt;em&gt;for those tasks.&lt;/em&gt; Build it well, set the structured shortcuts beside it for the users who want them, and treat chat as a mode the application offers, not the application's identity.&lt;/p&gt;

&lt;p&gt;The model is a tool. Chat is one of the interfaces it can power, valuable exactly where ambiguity lives and forgettable everywhere else. Forms, grids, sliders, and direct manipulation remain the right answer for most of what people actually do with software. The designers who can hold all of those in their head at once and choose on purpose will out-build the ones who reach for whatever modality is trending.&lt;/p&gt;

&lt;p&gt;Default is the enemy of design.&lt;/p&gt;

</description>
      <category>ux</category>
      <category>uxdesign</category>
      <category>ai</category>
      <category>uidesign</category>
    </item>
    <item>
      <title>Rethinking Design Systems: When Code Becomes the Source of Truth</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Sat, 06 Jun 2026 14:30:00 +0000</pubDate>
      <link>https://dev.to/joshjhall/rethinking-design-systems-when-code-becomes-the-source-of-truth-3g28</link>
      <guid>https://dev.to/joshjhall/rethinking-design-systems-when-code-becomes-the-source-of-truth-3g28</guid>
      <description>&lt;h2&gt;
  
  
  The Design-to-Development Gap
&lt;/h2&gt;

&lt;p&gt;Every product team knows this dance: Design creates beautiful components in Figma. Engineering builds them in React. Then something breaks. The button in production doesn't quite match the button in Figma. Someone updated the design. Someone else updated the code. Nobody updated both. The documentation is definitely not up to date either.&lt;/p&gt;

&lt;p&gt;We tell ourselves this is inevitable. Design tools and code are fundamentally different beasts. The best we can hope for is "close enough" and good documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I think we've been solving this backwards.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the first in a series about building design systems that actually work. Figma doesn't have to be just pretty pictures for engineers to interpret. It can be a structured system that directly informs code generation. Design and engineering don't have to be two sources of truth struggling to stay in sync. They can be a single system with design tools as one interface and code as another.&lt;/p&gt;

&lt;p&gt;Sound impossible? I've built a proof of concept, and I think there's something here.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Vision: Reverse the Flow
&lt;/h2&gt;

&lt;p&gt;The typical process looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Design in Figma&lt;/strong&gt;: Create component mockups with variants, states, interactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec it out&lt;/strong&gt;: Write documentation explaining what engineers should build
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement in code&lt;/strong&gt;: Engineers interpret the designs and build React components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish to Figma&lt;/strong&gt;: Designers use these components in mockups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift begins&lt;/strong&gt;: Design updates Figma. Code updates independently. Documentation lags behind both.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem? We're maintaining &lt;strong&gt;two separate sources of truth&lt;/strong&gt; and hoping they stay synchronized through diligence and documentation.&lt;/p&gt;

&lt;p&gt;What if we flipped this? Here's the component library workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Design conceptually in Figma&lt;/strong&gt; (sketches, explorations, mockups)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement the component&lt;/strong&gt; in React based on those designs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build testing, Storybook, documentation&lt;/strong&gt; (the real specification)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate Figma components from the code&lt;/strong&gt; ← This is the flip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish these code-generated components&lt;/strong&gt; to the shared Figma library&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy9p44fap273teyq1c1b.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy9p44fap273teyq1c1b.webp" alt="Search variants in the icon system" width="799" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Figma component library stops being aspirational and starts reflecting what actually exists in production. When a designer uses a button component in Figma, they're using the button component in code, just rendered in a design tool.&lt;/p&gt;

&lt;p&gt;These code-generated Figma components carry metadata. Design handoff tools and AI agents don't have to &lt;em&gt;guess&lt;/em&gt; that a button maps to &lt;code&gt;&amp;lt;Button variant="primary" size="large"&amp;gt;&lt;/code&gt;. They can &lt;em&gt;know&lt;/em&gt; this because the component itself contains that information.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Requires
&lt;/h3&gt;

&lt;p&gt;This approach demands rethinking some assumptions about component libraries. Design tools become the place where component concepts are explored and refined, while the component library itself reflects the code. Product designs still happen in Figma using these components. But the components designers use are no longer aspirational: they're direct representations of what exists in production. Documentation stops being a bridge between two systems and becomes part of the system itself.&lt;/p&gt;

&lt;p&gt;Early-stage products iterating rapidly might not want this level of structure yet. But small teams building for the long term could benefit significantly—this is exactly the kind of foundation that prevents the tech and design debt that accumulates as teams grow.&lt;/p&gt;

&lt;p&gt;And if you're building a mature product with a design system, constantly fighting drift between Figma and code, and dreaming of better design-to-development handoffs? This might be interesting.&lt;/p&gt;

&lt;p&gt;The approach requires solving some hard technical problems: generating Figma components programmatically, tracking changes efficiently, maintaining sync between systems, and handling edge cases gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'm starting with something concrete: icons.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Icons are honestly the easiest part of the vision, which makes them a good first test. Succeeding here shows the core mechanics are feasible; failing here would mean the bigger vision isn't practical.&lt;/p&gt;

&lt;p&gt;The plugin and automation are open source: &lt;a href="https://github.com/joshjhall/google-symbols-figma-plugin" rel="noopener noreferrer"&gt;github.com/joshjhall/google-symbols-figma-plugin&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Icons Make Sense as Step One
&lt;/h2&gt;

&lt;p&gt;Icons might seem like a small part of the bigger vision, but they're the perfect place to start for three practical reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I need them anyway.&lt;/strong&gt; Whether or not the code-driven design system idea works, I need a comprehensive, up-to-date icon library in Figma. If the larger experiment fails or becomes too complicated, this part still delivers value to both design and engineering today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's a learning ground.&lt;/strong&gt; Building a Figma plugin that pulls data directly from a Git repository and generates components programmatically? That's exactly what I'll need to do when generating components from React code later. Better to learn on a well-structured external repository than on my own codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It forces optimization.&lt;/strong&gt; With nearly 4,000 icons and 504 variants each, that's close to 2 million total variants to manage. GitHub rate limiting. Figma memory constraints. Partial import failures. Incremental updates. All the problems I'll face when processing a real codebase at scale need solutions here first.&lt;/p&gt;

&lt;p&gt;It took several days to get all the icons initially imported into Figma. The process would fail partway through due to HTTP rate limits when downloading raw files directly from GitHub (turns out downloading 75,000+ SVG files is enough to hit those limits). I'd restart it. It would fail again, differently. I spent a lot of time staring at progress bars and error logs.&lt;/p&gt;

&lt;p&gt;Those same strategies (graceful recovery, smart batching, delta-only updates) will be essential when re-processing codebases for changes. If rebuilding from scratch takes hours or days, the system won't work in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Google Material Symbols?
&lt;/h2&gt;

&lt;p&gt;I needed an icon system that could work as a shared foundation. Google's Material Symbols fit perfectly.&lt;/p&gt;

&lt;p&gt;The library is comprehensive, with nearly 4,000 icons covering most interface needs. More importantly, it's a proper system, not just a collection of SVGs. Each icon has 3 visual styles (Outlined, Rounded, Sharp), 7 weights (100-700, like typography), 2 fill states (empty or filled), 3 grades (which fine-tune stroke thickness, for example for light or dark backgrounds), and 4 optical sizes (20, 24, 40, 48dp).&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;504 explicit variants per icon&lt;/strong&gt; (3 × 7 × 2 × 3 × 4). Google ships these as variable fonts for production use, about 1.5MB per style or 4.5MB total, highly optimized for web delivery. But they also maintain the source SVGs these fonts are built from, which is what the plugin uses.&lt;/p&gt;
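
&lt;p&gt;The 504 figure falls straight out of enumerating the five axes. This is a minimal sketch that assumes grade values of -25, 0 and 200 and lowercase style names. The plugin's own representation may differ.&lt;/p&gt;

```python
from itertools import product

# The five axes described above (grade values are an assumption).
STYLES = ["outlined", "rounded", "sharp"]
WEIGHTS = [100, 200, 300, 400, 500, 600, 700]
FILLS = [False, True]
GRADES = [-25, 0, 200]
OPTICAL_SIZES = [20, 24, 40, 48]

# Every combination of one value per axis is one variant.
variants = list(product(STYLES, WEIGHTS, FILLS, GRADES, OPTICAL_SIZES))
print(len(variants))         # 504 per icon
print(len(variants) * 4000)  # 2016000, i.e. about 2 million for ~4,000 icons
```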

&lt;p&gt;Why SVGs for Figma? Because Figma needs actual vector paths to render icons correctly. Using SVGs means the Figma components look exactly like production, with all the same configuration options clearly represented for engineering and other stakeholders. Then the actual implementation can use the optimized font files rather than embedding SVGs directly.&lt;/p&gt;

&lt;p&gt;The raw SVG source is nearly 1GB. Two million very small files add up quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Universal delivery&lt;/strong&gt;: The library is already cached on millions of devices. Developers know it. Designers recognize it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active maintenance&lt;/strong&gt;: Google updates it regularly with new icons and refinements. If I'm building a system around staying synchronized, I need a source that actually changes. That said, there have only been about 10 updates in the last 18 months, so changes happen infrequently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tbhjszofmbyqkqxg5vn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tbhjszofmbyqkqxg5vn.webp" alt="Icon rotation set showing systematic variants" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If both my Figma components and my React components use the same Material Symbols system with the same naming and variants, we have a shared vocabulary. A &lt;code&gt;search&lt;/code&gt; icon in Figma maps to a &lt;code&gt;search&lt;/code&gt; icon in code, with weight, style, and fill properties that mean the same thing in both places.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Challenge: Updates and Metadata
&lt;/h2&gt;

&lt;p&gt;Building a static snapshot of 4,000 icons is straightforward. Building a system that &lt;strong&gt;stays synchronized&lt;/strong&gt; as Google updates their repository was the hard part. Here's how I handled the trickiest problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracking Changes with Metadata
&lt;/h3&gt;

&lt;p&gt;Each icon component in Figma stores metadata using Figma's &lt;code&gt;setPluginData()&lt;/code&gt; API. Specifically, each icon component stores the Git SHA from the commit that last changed it in Google's repository, and each variant frame stores a hash of its SVG content.&lt;/p&gt;
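
&lt;p&gt;Here is a minimal sketch of the two kinds of metadata. It is written in Python for illustration; a real Figma plugin runs JavaScript. Figma's &lt;code&gt;setPluginData(key, value)&lt;/code&gt; stores string values only, so everything is prepared as strings. Several details are assumptions rather than anything taken from the plugin: the key names &lt;code&gt;gitSha&lt;/code&gt; and &lt;code&gt;svgHash&lt;/code&gt;, the choice of SHA-256, and the variant naming.&lt;/p&gt;

```python
import hashlib
from typing import Dict, Tuple


def svg_content_hash(svg: str) -> str:
    """Hash the exact SVG text so any change to its content changes the hash."""
    return hashlib.sha256(svg.encode("utf-8")).hexdigest()


def build_plugin_data(
    git_sha: str, variant_svgs: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return (component-level data, per-variant data) as string key/values.

    The component records the upstream commit that last touched the icon;
    each variant frame records a hash of its own SVG content.
    """
    component_data = {"gitSha": git_sha}
    variant_data = {
        name: {"svgHash": svg_content_hash(svg)} for name, svg in variant_svgs.items()
    }
    return component_data, variant_data


# Path data stands in for full SVG markup to keep the example short.
component, variants = build_plugin_data(
    "3f2a9c1",
    {
        "Style=Outlined, Weight=400, Fill=Off, Grade=0, Size=24": "M12 2 22 20H2Z",
        "Style=Outlined, Weight=400, Fill=On, Grade=0, Size=24": "M12 2 22 20H2Zm0 4",
    },
)
print(component)
print(len(variants))
```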

&lt;p&gt;When Google updates the repository, my automation script (running weekly via CI) analyzes which icons changed between the old commit and the new one. It notifies me and calculates the delta information needed for the plugin.&lt;/p&gt;

&lt;p&gt;Then I manually run the plugin on each of the 26 files. Because the plugin only regenerates icons that actually changed, running all 26 now takes about an hour or two in total. The initial build took an hour or more per file, over 26 hours in total.&lt;/p&gt;

&lt;p&gt;Take the &lt;code&gt;pentagon&lt;/code&gt; icon. Google has a bug in the 20dp optical size alignment. (Yes, I check every update hoping they've fixed it.) When they eventually fix it, the script identifies that &lt;code&gt;pentagon&lt;/code&gt; has a different Git SHA than what's stored in Figma. The plugin downloads all 504 SVG variants, computes content hashes for each, and compares them with stored hashes. Only the 20dp variants have different hashes, so only those frames get updated. Icons without updates are skipped entirely.&lt;/p&gt;
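
&lt;p&gt;The two-level check in the &lt;code&gt;pentagon&lt;/code&gt; example can be sketched as follows, again in Python for illustration. &lt;code&gt;plan_icon_update&lt;/code&gt; and its parameters are hypothetical. The logic follows the description: skip the icon entirely when its Git SHA matches, and otherwise regenerate only the variants whose content hash differs.&lt;/p&gt;

```python
import hashlib
from typing import Callable, Dict, List


def sha256_hex(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_icon_update(
    stored_sha: str,
    latest_sha: str,
    stored_hashes: Dict[str, str],
    fetch_variants: Callable[[], Dict[str, str]],
) -> List[str]:
    """Return the names of variant frames that need regenerating.

    1. Same Git SHA: the icon is untouched upstream, so nothing is
       downloaded at all.
    2. Different SHA: download every variant, hash each one, and keep
       only those whose hash is new or differs from the stored one.
    """
    if stored_sha == latest_sha:
        return []
    fetched = fetch_variants()
    return sorted(
        name for name, svg in fetched.items()
        if stored_hashes.get(name) != sha256_hex(svg)
    )


# Pentagon-style scenario: only the 20dp variant's content changed.
old = {"Size=20": "M1 1", "Size=24": "M2 2"}
stored = {name: sha256_hex(svg) for name, svg in old.items()}
new = {"Size=20": "M1 1.5", "Size=24": "M2 2"}

print(plan_icon_update("abc123", "def456", stored, lambda: new))  # ['Size=20']
print(plan_icon_update("abc123", "abc123", stored, lambda: new))  # []
```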

&lt;p&gt;What used to take 26+ hours becomes a 1-2 hour update cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Edge Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Deprecations&lt;/strong&gt;: When Google deprecates an icon, I don't delete it immediately. The plugin prefixes the component name with &lt;code&gt;_deprecated_&lt;/code&gt;. Designers can still see deprecated icons in their existing designs while preventing new usage. On the next publish, these can be cleanly removed from the shared library.&lt;/p&gt;
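
&lt;p&gt;The renaming step is small, but it is worth making safe to rerun, since the plugin is run repeatedly on the same files. This is a minimal sketch with hypothetical function names; only the &lt;code&gt;_deprecated_&lt;/code&gt; prefix comes from the description above.&lt;/p&gt;

```python
from typing import Iterable, List, Set

DEPRECATED_PREFIX = "_deprecated_"


def mark_deprecated(name: str) -> str:
    """Add the prefix once; applying it again leaves the name unchanged."""
    return name if name.startswith(DEPRECATED_PREFIX) else DEPRECATED_PREFIX + name


def apply_deprecations(names: Iterable[str], deprecated: Set[str]) -> List[str]:
    """Rename only the components whose icons Google has deprecated."""
    return [mark_deprecated(n) if n in deprecated else n for n in names]


def drop_deprecated(names: Iterable[str]) -> List[str]:
    """What a later cleanup before publishing would keep."""
    return [n for n in names if not n.startswith(DEPRECATED_PREFIX)]


names = apply_deprecations(["search", "pentagon"], {"pentagon"})
print(names)                                             # ['search', '_deprecated_pentagon']
print(apply_deprecations(names, {"pentagon"}) == names)  # True: safe to rerun
print(drop_deprecated(names))                            # ['search']
```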

&lt;p&gt;&lt;strong&gt;New additions&lt;/strong&gt;: When Google adds new icons, they need to fit into the existing alphabetical organization. The 26 files are split alphabetically; most hold about 160 icons, and file sizes range from 80 to 200 icons. New icons are added to whichever adjacent file has fewer icons, and that file's alphabetical range (reflected in its filename) is updated accordingly. Each file is 28-39MB, about 900MB in total. Keeping files balanced avoids unnecessary icon moves and keeps each file within Figma's memory constraints.&lt;/p&gt;
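
&lt;p&gt;Here is one way to read that placement rule as code, under stated assumptions. Each file is described by its alphabetically first and last icon. An icon that sorts inside a file's range must go into that file. Only an icon that falls in the gap between two files gets the "fewer icons wins" choice. &lt;code&gt;IconFile&lt;/code&gt; and &lt;code&gt;place_new_icon&lt;/code&gt; are hypothetical names, and &lt;code&gt;arrows_new&lt;/code&gt; is a made-up icon.&lt;/p&gt;

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class IconFile:
    first: str  # alphabetically first icon in the file
    last: str   # alphabetically last icon in the file
    count: int

    @property
    def label(self) -> str:
        return f"{self.first} - {self.last}"


def place_new_icon(files: List[IconFile], name: str) -> IconFile:
    """Pick the file for a new icon and widen its range if needed.

    `files` must be non-empty, sorted, and non-overlapping. An icon that
    sorts inside a file's range goes in that file; one that sorts in the
    gap between two files goes to the smaller neighbour (ties go left).
    """
    firsts = [f.first for f in files]
    i = bisect_right(firsts, name) - 1  # last file starting at or before name
    if i == -1:
        target = files[0]               # sorts before every file
    elif files[i].last >= name:
        target = files[i]               # inside an existing range
    elif i == len(files) - 1:
        target = files[i]               # sorts after every file
    else:
        left, right = files[i], files[i + 1]
        target = right if left.count > right.count else left
    target.first = min(target.first, name)
    target.last = max(target.last, name)
    target.count += 1
    return target


files = [IconFile("abc", "arrow_upward", 160), IconFile("assignment", "bolt", 120)]
chosen = place_new_icon(files, "arrows_new")  # sorts between the two files
print(chosen.label, chosen.count)
```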

&lt;p&gt;&lt;strong&gt;Graceful recovery&lt;/strong&gt;: During initial development, partial failures were common. I'd start building 170 icons for a file, get 70% through, then start hitting rate limits. Instead of getting all 504 variants for an icon, I'd only get 274 SVGs. The rest returned 404 errors.&lt;/p&gt;

&lt;p&gt;The plugin creates partial icons with whatever SVGs it successfully retrieved. Then I'd wait a few minutes and rerun the plugin on the same file to backfill the missing variants. Some files needed several reruns, often because I got impatient and didn't wait long enough between attempts.&lt;/p&gt;
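
&lt;p&gt;The backfill idea can be sketched as a function that requests only the variants an icon is still missing, then reports whatever remains absent for the next run. This sketch also adds automatic retries with exponential backoff, which the manual wait-and-rerun loop above did by hand. &lt;code&gt;backfill&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt;, and the delay values are hypothetical.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def backfill(
    wanted: Iterable[str],
    existing: Dict[str, str],
    fetch: Callable[[str], Optional[str]],
    retries: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch only the variants not already present, tolerating failures.

    `fetch` returns the SVG text, or None when the request failed (for
    example a 404 or rate-limit response). Failed variants are retried
    with exponential backoff; anything still missing is returned so the
    next run can pick up where this one stopped.
    """
    result = dict(existing)
    missing = [name for name in wanted if name not in result]
    for attempt in range(retries):
        still_missing = []
        for name in missing:
            svg = fetch(name)
            if svg is None:
                still_missing.append(name)
            else:
                result[name] = svg
        missing = still_missing
        if not missing:
            break
        if attempt != retries - 1:
            sleep(base_delay_s * 2 ** attempt)  # 2s, 4s, 8s, ...
    return result, missing


seen = set()


def flaky_fetch(name: str) -> Optional[str]:
    """Simulated download that fails once for one variant."""
    if name == "Size=48" and name not in seen:
        seen.add(name)
        return None
    return f"svg for {name}"


got, left = backfill(
    ["Size=20", "Size=24", "Size=48"],
    {"Size=20": "svg for Size=20"},
    flaky_fetch,
    sleep=lambda s: None,  # skip real waiting in the example
)
print(sorted(got), left)
```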

&lt;p&gt;On the plus side, most of this ran in the background while I worked on other things. On the downside, it still took a bit of my attention off and on for several days to complete the initial import.&lt;/p&gt;

&lt;p&gt;But this robustness is exactly what I mean by "scalable to more complex scenarios." Icons are simple. Just SVG paths. When I eventually generate components from React code, I'll likely need to download JSON schemas, API responses, or other data to describe what's needed for complex components. I'll hit those same errors. The system needs to handle partial failures gracefully and resume where it left off.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Result: A Living Icon Library
&lt;/h2&gt;

&lt;p&gt;With all these strategies in place (delta updates, smart batching, metadata tracking, and graceful recovery), the system delivers what I set out to build: a living icon library that stays current with minimal manual intervention.&lt;/p&gt;

&lt;p&gt;The plugin generates &lt;strong&gt;26 Figma files&lt;/strong&gt;, organized alphabetically, with nearly 4,000 icons and 504 variants per icon. These files are exported and attached to GitHub releases, so anyone can download pre-built &lt;code&gt;.fig&lt;/code&gt; files and use them immediately.&lt;/p&gt;

&lt;p&gt;More importantly, the system &lt;strong&gt;stays current&lt;/strong&gt;. A GitHub Action runs weekly, checks Google's repository for changes, calculates the delta if changes are detected, runs tests on the changes, and notifies me with an optional mobile push notification.&lt;/p&gt;

&lt;p&gt;When I merge the changes and run the plugin on the 26 files (an hour or two of work), only the changed icons regenerate. Given that updates happen roughly twice a year, we're talking about just a few hours of annual maintenance. Additional automation either isn't possible in Figma's environment or wouldn't save enough time to be worth building.&lt;/p&gt;

&lt;p&gt;The plugin and pre-generated &lt;code&gt;.fig&lt;/code&gt; files are available in the latest release: &lt;a href="https://github.com/joshjhall/google-symbols-figma-plugin/releases/tag/v1.2.2" rel="noopener noreferrer"&gt;v1.2.2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository also uses my universal Docker container system for consistent development and CI/CD environments. That system is open source at &lt;a href="https://github.com/joshjhall/containers" rel="noopener noreferrer"&gt;github.com/joshjhall/containers&lt;/a&gt;, with a detailed write-up about the approach: &lt;a href="https://medium.com/@josh-hall/building-a-universal-container-system-so-i-never-have-to-write-another-custom-dockerfile-1bcb62d4be7c" rel="noopener noreferrer"&gt;Building a Universal Container System&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ofcfpq7qswlz2aflt1f.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ofcfpq7qswlz2aflt1f.webp" alt="Icon generation example in Figma" width="800" height="928"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How These Files Actually Get Used
&lt;/h3&gt;

&lt;p&gt;I'm going to dive deeper into this in the next post, but the brief version: my design system has multiple layers, each serving different purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Core components&lt;/strong&gt; (where code-generated components will live): Fundamental UI elements like buttons, fields, sheets, and dialogs. The goal is to build these in React, then generate them back into Figma and publish to the library. Today these are manually maintained, but this is where the code-driven approach will eventually land.&lt;/p&gt;

&lt;p&gt;In Figma, I use naming conventions to convey information to engineers. Components prefixed with an underscore (like &lt;code&gt;_icon&lt;/code&gt;) are private and won't be published directly to the shared library. Think of these like private classes. Components wrapped in angle brackets (like &lt;code&gt;&amp;lt;icon&amp;gt;&lt;/code&gt;) indicate React component boundaries, helping engineers understand where component splits likely make sense.&lt;/p&gt;

&lt;p&gt;The core file includes all 26 icon files as dependencies and has a base &lt;code&gt;_icon&lt;/code&gt; component with a swappable child symbol. An &lt;code&gt;&amp;lt;icon&amp;gt;&lt;/code&gt; component enhances this with sizing and optional badge decorations. The &lt;code&gt;_icon&lt;/code&gt; component maintains a &lt;strong&gt;preferred icons list&lt;/strong&gt;, filtering the full 4,000 icons down to the 100-300 the team actually needs. Common icons like close, menu, and search are universal, but roughly 20-60% of the list varies from one app to another.&lt;/p&gt;
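
&lt;p&gt;Conceptually, the preferred icons list is a filter over the full library. This is a minimal sketch with hypothetical names. Reporting preferred names that no longer exist upstream is an extra safeguard, not something described above.&lt;/p&gt;

```python
from typing import List, Sequence, Tuple


def filter_preferred(
    all_icons: Sequence[str], preferred: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (icons to expose, preferred names missing from the library).

    Icons keep the library's alphabetical order, so the result is stable
    no matter how the preferred list itself is ordered.
    """
    wanted = set(preferred)
    kept = [name for name in all_icons if name in wanted]
    unknown = sorted(wanted - set(all_icons))
    return kept, unknown


library = ["close", "home", "menu", "pentagon", "search"]
kept, unknown = filter_preferred(library, ["search", "close", "menu", "old_icon"])
print(kept)     # ['close', 'menu', 'search']
print(unknown)  # ['old_icon']
```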

&lt;p&gt;A small set of heavily reduced icon variants (24 variants each instead of 504) is embedded directly in the core components file for things like checkboxes, radio buttons, and dropdown arrows. Keeping these embedded means the core file has no icon dependencies beyond the main &lt;code&gt;_icon&lt;/code&gt; preferences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Product-specific components&lt;/strong&gt; (where partially code-generated components will live): Specialized components and configured variations of core components. The vision is for some to be generated from code, while others are designed in Figma then built in code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Product designs&lt;/strong&gt; (never code-generated): Actual screens, flows, and states designed using components from layers 1 and 2. This is where designers work day-to-day, combining existing components to design new features.&lt;/p&gt;

&lt;p&gt;There's also an &lt;strong&gt;implicit exploration layer&lt;/strong&gt; that doesn't fit neatly in this hierarchy: component concept files that consume core and product components to explore new component ideas. These inform the code development that eventually generates new components for layers 1 and 2.&lt;/p&gt;

&lt;p&gt;Designers working on product features only need to include the core components and product components in their files. The icon filtering is already done at the system level. Adding new icons to the preferred list requires a deliberate decision and a new publish from core components. That's just enough friction to make teams think about it, and in larger organizations it forces collaboration between teams before icon proliferation gets out of hand.&lt;/p&gt;

&lt;p&gt;More on component structure patterns in the next post.&lt;/p&gt;




&lt;h2&gt;
  
  
  Taking Stock: What This Proves
&lt;/h2&gt;

&lt;p&gt;Building this icon library has proven several things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Programmatic Figma component generation works&lt;/strong&gt; at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git-based synchronization&lt;/strong&gt; can keep design assets current with external sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata and delta updates&lt;/strong&gt; make large-scale regeneration practical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The same approach&lt;/strong&gt; that syncs with Google's repository can sync with a React codebase&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But this is just the foundation. These are raw Material Symbols, exactly as Google publishes them. They're the shared vocabulary layer, the common foundation that both design and engineering build upon.&lt;/p&gt;

&lt;p&gt;In the next post, I'll show how to actually use these icons in well-structured Figma components. We'll look at what makes a good base component, using buttons as an example, and at how consistent practices in Figma can convey rich information to engineers.&lt;/p&gt;

&lt;p&gt;We'll also run into some of Figma's frustrating limitations, which is part of why generating components from code starts looking increasingly appealing.&lt;/p&gt;

&lt;p&gt;I'm still refining this approach. The icon foundation is working well in production. The component generation from React code is next. The full workflow hasn't been battle-tested at scale.&lt;/p&gt;

&lt;p&gt;But I think there's something here. A way to build design systems that aren't two separate systems hoping to stay in sync, but one system with multiple interfaces.&lt;/p&gt;

&lt;p&gt;More to come.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in this series: Building effective base components in Figma—what makes a good button, and why some of Figma's limitations make code-driven generation increasingly appealing&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The plugin and source code are available at &lt;a href="https://github.com/joshjhall/google-symbols-figma-plugin" rel="noopener noreferrer"&gt;github.com/joshjhall/google-symbols-figma-plugin&lt;/a&gt;. Have thoughts on this approach? Find me on &lt;a href="https://x.com/joshjhall" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt; or open an issue on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>design</category>
      <category>uidesign</category>
      <category>ux</category>
      <category>uxdesign</category>
    </item>
    <item>
      <title>Generative UI Is Three Things. Only One Ships.</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:15:00 +0000</pubDate>
      <link>https://dev.to/joshjhall/generative-ui-is-three-things-only-one-ships-n66</link>
      <guid>https://dev.to/joshjhall/generative-ui-is-three-things-only-one-ships-n66</guid>
      <description>&lt;p&gt;Google shipped a &lt;a href="https://research.google/blog/generative-ui-a-rich-custom-visual-interactive-user-experience-for-any-prompt/" rel="noopener noreferrer"&gt;generative UI experiment&lt;/a&gt; in November 2025 that matched a human-designed interface in roughly half the cases it was tested against. The other half it produced an artifact the way an unsteamed dumpling is food — recognizably the right shape, not actually edible. The system could take a minute or more to render a single page and produced different output on every refresh. Most of the takes I read framed the gap as a "more compute will fix it" problem. It won't.&lt;/p&gt;

&lt;p&gt;Google's own scoring is honest about it: their implementation &lt;a href="https://generativeui.github.io/" rel="noopener noreferrer"&gt;earned an Elo score of 1736.2&lt;/a&gt;, strongly preferred over every other output format, and lost only to human experts, whom it merely matched about half the time. That's an impressive result aimed at the wrong target.&lt;/p&gt;

&lt;p&gt;Notice which half gets the spotlight. "Comparable to a human expert in half the cases" is the generous read; the inverse is that the other half comes out worse. And screens don't arrive one at a time; they chain into workflows. Probability compounds in the direction nobody's gut expects. At a coin flip per screen, a four-step flow renders cleanly about one time in sixteen: roughly 6% of users get through without a visible failure, and the other &lt;strong&gt;94% hit at least one bad screen&lt;/strong&gt;. Stretch it to six steps and you're at 98%. The demo reads like a coin toss; the workflow reads like a near-certain stumble. People hear "50%" and picture a wash. It isn't: across a workflow, it's a near-guarantee of failure dressed up as a fair bet.&lt;/p&gt;
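
&lt;p&gt;The compounding is easy to check. A minimal sketch, assuming every screen in a flow independently renders acceptably with the same probability (a simplification; screens in one workflow are unlikely to be fully independent). The function name &lt;code&gt;p_any_failure&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def p_any_failure(p_screen_ok: float, steps: int) -> float:
    """Chance that at least one of `steps` screens fails.

    Assumes each screen independently succeeds with probability p_screen_ok,
    so a fully clean run has probability p_screen_ok ** steps.
    """
    return 1.0 - p_screen_ok ** steps


for steps in (1, 2, 4, 6):
    share = p_any_failure(0.5, steps)
    print(f"{steps}-step flow: {share:.0%} of users hit at least one bad screen")
```

&lt;p&gt;At a coin flip per screen this prints 50%, 75%, 94%, and 98%: the last two are the four-step and six-step figures above.&lt;/p&gt;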

&lt;p&gt;"Generative UI" is one name stretched across at least three distinct technical approaches, and the conversation suffers for the conflation. One is the demo everyone admires and nobody can ship. One is a modest improvement on the first that still doesn't work. The third is tame enough that most people won't file it under generative UI at all, and it's the one that wins.&lt;/p&gt;

&lt;p&gt;One distinction up front, because the rest depends on it: generating code is not the same as generating interfaces. Tools like v0 and Claude Code generate UI, but the output is a static artifact, a box you build once and reuse. Generative UI in the sense that matters here is a runtime behavior, an interface that adapts to context and input as you use it. It's the difference between asking an AI to build you a box for your camping gear and having an AI watch you nearly drop an armful of cans, decide you need a backpack, sew one, and hand it over. The second is a marvel, and also a parade of assumptions, each one a fresh chance to be wrong. That's roughly why we evolved language first: being told what someone needs beats inferring it from watching them fumble.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Variants Wearing One Name
&lt;/h2&gt;

&lt;p&gt;The most ambitious variant generates the interface from scratch on every load. The model takes a prompt, writes HTML, CSS, and JavaScript, and the browser renders whatever comes back. It has the highest ceiling (anything could be in the UI), and it's the version Google demoed. It also produced usable interfaces at a coin-flip rate, took a minute or more per page, burned tokens by the fistful, and rendered differently on every refresh. Make the model ten times bigger and faster and it's still stochastic, still expensive, still inconsistent. Compute doesn't change the shape of the problem.&lt;/p&gt;

&lt;p&gt;The middle variant constrains the model to a finite component library. Instead of writing layout code, the model emits a serialized payload, usually JSON, naming which existing components to render, with what props, in what arrangement. The client reads the payload and assembles the page. This is a genuine improvement: components stay consistent, accessibility doesn't regress on every refresh, and the model can't ship anything the design system forbids. It also has real momentum. Google open-sourced &lt;a href="https://developers.googleblog.com/introducing-a2ui-an-open-project-for-agent-driven-interfaces/" rel="noopener noreferrer"&gt;A2UI&lt;/a&gt; in December 2025 to standardize exactly this kind of agent-emitted interface, and a wave of commercial tools now lets non-technical staff assemble forms and reports inside a sanctioned component set. Boring, mostly. But boring is what most corporate software is, and the CIO gets at least some say over the standards — enough to be a little less likely to wake up at 3 AM already sweating before the phone finishes its first ring.&lt;/p&gt;
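
&lt;p&gt;To make the middle variant concrete, here is a minimal sketch of a client that accepts a model-emitted JSON payload only if every node names a component, and props, from a fixed registry. The registry, component names, and payload shape are hypothetical illustrations, not A2UI's actual schema.&lt;/p&gt;

```python
import json

# Hypothetical component registry: the only components the model may name,
# and the props each one accepts. Illustrative names, not A2UI's schema.
ALLOWED_COMPONENTS = {
    "TextField": {"label", "placeholder", "required"},
    "Select": {"label", "options"},
    "Button": {"label", "variant"},
}


def validate_payload(raw: str) -> list:
    """Parse a model-emitted JSON payload; reject anything outside the library."""
    nodes = json.loads(raw)
    for node in nodes:
        name = node.get("component")
        if name not in ALLOWED_COMPONENTS:
            raise ValueError(f"unknown component: {name!r}")
        # Props the registry doesn't list for this component are rejected too.
        extra = set(node.get("props", {})) - ALLOWED_COMPONENTS[name]
        if extra:
            raise ValueError(f"{name} does not accept props: {sorted(extra)}")
    return nodes


payload = json.dumps([
    {"component": "TextField", "props": {"label": "Email"}},
    {"component": "Button", "props": {"label": "Send", "variant": "primary"}},
])
print([node["component"] for node in validate_payload(payload)])
```

&lt;p&gt;Note what the check does and doesn't buy: it guarantees the model can only use sanctioned components, but it says nothing about which ones it picks or how it arranges them, which is exactly where the variance moves.&lt;/p&gt;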

&lt;p&gt;The catch is that the middle variant still hands the model authority over the whole screen. Every button, every input, every layout call is the model's. The variance that made the full-code version unreliable just moves to the component-selection axis: refresh and you might get a different field order, a different empty state, a different primary action. That's the same usability problem with extra steps. Add competing, half-formed standards and the fact that non-designers still can't organize an information flow, and this variant stalls in the near term even where the plumbing works.&lt;/p&gt;

&lt;p&gt;The third variant is where products will actually land. Keep the core interface deterministic: designed by a human, shipped as fixed components, predictable across visits. Then carve out a bounded surface where dynamic widget selection happens. The design team builds a finite library of widgets (twenty, thirty, fifty) and a layer above them (an LLM, a plain rules engine, often both) decides which to surface in which context. The model isn't laying out the page. It's picking from a menu. That surface is often supplemental, but it doesn't have to be: it can be the primary one, a form that grows its own fields as it works out which data is still missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 80/20 of Music
&lt;/h2&gt;

&lt;p&gt;Effective interfaces follow the balance that effective music does. Most of a song is familiar: the beat you can clock, the progression you can predict, the structure you expect. A thin sliver is novel: the unexpected modulation, the dropped chorus, the turn you didn't see coming. The familiar parts are why you can listen. The novel parts are why you keep listening. Tilt the ratio too far either way and the song fails: too much repetition bores, too much novelty becomes unlistenable.&lt;/p&gt;

&lt;p&gt;Interfaces work the same way. Most of what a user touches needs to be predictable enough that they build muscle memory and stop seeing the chrome. The smaller novel portion — the smart default, the contextual helper, the just-in-time widget — is what makes the product feel intelligent. Invert the ratio and you get generative UI in the full-code sense: little is predictable, and the user relearns the interface on every visit. The model can make a good local call every single time and the experience still fails, because consistency is a property of the sequence of interactions, not any one of them.&lt;/p&gt;

&lt;p&gt;The third variant respects the ratio, but the ratio is about the whole product, not each screen. The deterministic core is the 80%. The dynamic surface is the 20%. Some individual screens will lean heavily dynamic, and that's fine; what matters is that the parts the user depends on stay put while the model gets to be clever in a bounded space where being wrong is recoverable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Travel App Test
&lt;/h2&gt;

&lt;p&gt;Travel booking is the example I keep returning to, because the assumptions are so visible. Kayak, Travelocity, the airline sites: they're all built around one traveler or one group, one origin, one destination, maybe a round trip with a couple of legs. That covers most trips. It also falls apart the moment a trip doesn't fit the mold, and the interfaces have no graceful way to bend.&lt;/p&gt;

&lt;p&gt;Picture a bachelorette party. Six friends converging on Vegas from four cities, all needing to land before 8 PM Friday for Penn &amp;amp; Teller tickets. Two have hard work deadlines pinning their return dates; the others have more flexibility. Today you solve this with six or more browser tabs, a spreadsheet, and a group chat full of screenshots. Nobody ships an interface for it: it's NP-shaped and rare enough not to be worth hand-designing.&lt;/p&gt;

&lt;p&gt;Here's what generative UI actually buys you, and it isn't what the demos suggest. You don't design an interface for the bachelorette case. You design an origin widget, a destination widget, a lodging widget, a ground-transport widget, each one independent, each with sensible defaults, and you give the model enough context to know when to flex them. The origin widget either grows a state that accepts up to ten cities, or the model drops a second origin widget on the canvas for parallel tracking. Either way, the origin widget is the origin widget. It neither knows nor cares whether the trip is one person round-tripping to Vegas or six people scattering home to four cities. Build enough flexibility into each component, hand the model good defaults and enough context to choose between them, and the permutations resolve themselves. You were never going to hand-design all of them anyway.&lt;/p&gt;

&lt;p&gt;The same machinery handles the far more common case, the business loop trip, A → B → C → D → A. The core interface shows the loop. A bounded panel runs the pigeonhole logic: which legs have a rental car, which don't, which hotels are confirmed, which are still open. Twenty-five holes to fill, sixteen filled, nine left, and the model's job is to rank the nine and surface the most important one next. The widgets that render them are already designed.&lt;/p&gt;
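
&lt;p&gt;The pigeonhole bookkeeping for a loop trip can be sketched in a few lines. This is a minimal, hypothetical illustration: a fixed set of slots per leg, and a plain rules-based ranking (earliest leg first, then flight before hotel before rental car) standing in for the model's ranking.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical slot types; a lower weight means "surface this sooner".
SLOT_WEIGHT = {"flight": 0, "hotel": 1, "rental_car": 2}


@dataclass
class Leg:
    origin: str
    destination: str
    filled: dict = field(default_factory=dict)  # slot name -> booking reference


def open_holes(legs):
    """Every (leg index, slot) pair that still has no booking."""
    return [
        (i, slot)
        for i, leg in enumerate(legs)
        for slot in SLOT_WEIGHT
        if slot not in leg.filled
    ]


def next_hole(legs):
    """Rank open holes by leg order, then slot weight; None when all are filled."""
    holes = open_holes(legs)
    if not holes:
        return None
    return min(holes, key=lambda hole: (hole[0], SLOT_WEIGHT[hole[1]]))


trip = [
    Leg("A", "B", {"flight": "UA100", "hotel": "H1"}),
    Leg("B", "C", {"flight": "UA200"}),
    Leg("C", "D"),
    Leg("D", "A"),
]
print(f"{len(open_holes(trip))} open; next: {next_hole(trip)}")
```

&lt;p&gt;For this four-leg loop the sketch reports nine open holes and surfaces the first leg's rental car next. Swapping the ranking rule for a model call changes which hole comes first, not which widgets can render it.&lt;/p&gt;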

&lt;h2&gt;
  
  
  Decompose the Permutation, Not the Interface
&lt;/h2&gt;

&lt;p&gt;I shipped a version of this at Reva. Credit and criminal screening reports come back in thousands of permutations, and we had to represent that range while telling both the applicant and the property manager, in plain language, what a given result actually meant, then drive different workflows, validation, and escalation off it. Designing a layout for every combination was never on the table.&lt;/p&gt;

&lt;p&gt;So I didn't. I decomposed the screening result into about thirty key metrics. I decomposed the interface into a dozen regions: headline, subheading, summary, a credit line item, and so on. Then each region into the five to twenty text variants that could fill it. The output was a clean, human-readable JSON payload of thirty-odd metrics that drove the i18n keys, and those keys covered every combination: clean credit with a criminal flag, thin file with nothing on it, all of it. I never solved the whole permutation at once. I solved small, independent slices, which is the only reason I could prove the result was mathematically complete. In the end it was a set of rules. Decompose the information problem cleanly enough and most of the apparent complexity was never really there — Shannon's fingerprints are on every logistics problem if you look for them.&lt;/p&gt;
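
&lt;p&gt;A toy version of that decomposition, assuming two metrics instead of thirty and two regions instead of a dozen. Each region holds rules that map metric values to an i18n key; enumerating every combination and requiring exactly one matching rule per region is one way to show a rule set is both complete and unambiguous. All metric names, values, and keys here are hypothetical, not the Reva implementation.&lt;/p&gt;

```python
from itertools import product

# Hypothetical metrics and their possible values.
METRICS = {
    "credit": ["clean", "thin_file", "derogatory"],
    "criminal": ["none", "flag"],
}

# Each region maps to (predicate, i18n key) rules. Illustrative only.
REGIONS = {
    "headline": [
        (lambda m: m["criminal"] == "flag", "screening.headline.review_needed"),
        (lambda m: m["criminal"] == "none" and m["credit"] == "derogatory",
         "screening.headline.credit_concern"),
        (lambda m: m["criminal"] == "none" and m["credit"] != "derogatory",
         "screening.headline.clear"),
    ],
    "credit_line": [
        (lambda m: m["credit"] == "clean", "screening.credit.clean"),
        (lambda m: m["credit"] == "thin_file", "screening.credit.thin_file"),
        (lambda m: m["credit"] == "derogatory", "screening.credit.derogatory"),
    ],
}


def resolve(region, metrics):
    """Return the single i18n key for a region; raise if rules overlap or miss."""
    matches = [key for predicate, key in REGIONS[region] if predicate(metrics)]
    if len(matches) != 1:
        raise ValueError(f"{region}: {len(matches)} rules match {metrics}")
    return matches[0]


def check_complete():
    """Enumerate every metric combination; each region must resolve exactly once."""
    names = list(METRICS)
    combos = [dict(zip(names, values)) for values in product(*METRICS.values())]
    for combo in combos:
        for region in REGIONS:
            resolve(region, combo)
    return len(combos)


print(check_complete(), "combinations verified")  # 3 credit values x 2 criminal values
```

&lt;p&gt;Add a metric value without a matching rule, or two overlapping rules, and &lt;code&gt;check_complete()&lt;/code&gt; raises instead of silently shipping a gap.&lt;/p&gt;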

&lt;h2&gt;
  
  
  Why This De-Risks the Bad Days
&lt;/h2&gt;

&lt;p&gt;The third variant carries a quiet benefit the ambitious ones don't. When the model gets it wrong — and it will, routinely — the damage stays inside a surface the user can ignore. The core interface keeps working, defaults stay sane, and escape hatches are easy to design because the design team owns the chrome.&lt;/p&gt;

&lt;p&gt;In the full-code variant, a model mistake is a broken page. In the component-everywhere variant, it's a confusing layout the user has to decode. In the bounded variant, it's an oddly chosen widget the user can dismiss (or correct, which feeds a signal back to sharpen the next call) while their actual task runs on untouched. Same model, same error rate, wildly different blast radius.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Actually Build
&lt;/h2&gt;

&lt;p&gt;If I were building generative UI into a product today, the work would be concrete. Build a finite widget library, twenty to fifty widgets covering the workflow components your application actually has. The trick is clean separation of concerns: if the origin widget only ever captures origin, it contributes exactly one slice to the information problem at the heart of nearly every logistics task. Define what data each widget reads, what states it can hold, and what context makes it relevant. Then build the selection layer (sometimes an LLM, sometimes a rules engine, usually both) that picks widgets based on the live context: what the user is doing, what they've finished, what's still missing.&lt;/p&gt;
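
&lt;p&gt;A minimal sketch of that contract and a rules-based selection layer. Each widget declares the data it reads, the states it can hold, and a context predicate for relevance; the selector surfaces widgets that are relevant and still have missing data to collect. The widget names, fields, and predicates are hypothetical, and an LLM could replace or augment the predicates without changing the library.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Widget:
    name: str
    reads: frozenset                  # data fields this widget collects
    states: tuple                     # states the design team has built
    relevant: Callable[[dict], bool]  # does the current context call for it?


LIBRARY = [
    Widget("origin", frozenset({"origin"}), ("single", "multi_city"),
           lambda ctx: True),
    Widget("destination", frozenset({"destination"}), ("single",),
           lambda ctx: True),
    Widget("lodging", frozenset({"hotel"}), ("search", "confirmed"),
           lambda ctx: ctx.get("nights", 0) > 0),
    Widget("ground_transport", frozenset({"rental_car"}), ("search", "confirmed"),
           lambda ctx: not ctx.get("local_only", False)),
]


def select_widgets(ctx, library=LIBRARY):
    """Return names of widgets that fit the context and still have data to collect."""
    missing = set(ctx.get("missing", ()))
    return [
        w.name
        for w in library
        if w.relevant(ctx) and not w.reads.isdisjoint(missing)
    ]


print(select_widgets({"missing": ["hotel", "rental_car"], "nights": 2}))
```

&lt;p&gt;Because the selector only returns names from a fixed library, the same output could drive a side panel or inline widgets in a chat transcript.&lt;/p&gt;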

&lt;p&gt;The same widgets pay off twice, because they work in a chat context as readily as a panel. The user who wants a traditional UI gets widgets in a side panel; the user who wants to type gets the same widgets inline in the conversation. One library, one data contract, two surfaces, and the design investment carries across interaction modes instead of being rebuilt for each.&lt;/p&gt;

&lt;p&gt;The full-dynamic dream — an interface generated from scratch every time — ignores why interfaces work at all. Consistency isn't a symptom of stale design; it's the point. The right ratio is most of the song you recognize and a chorus that surprises you. Get it right and generative UI is genuinely useful. Get it wrong and you're shipping an unsteamed dumpling.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ux</category>
      <category>uxdesign</category>
      <category>ui</category>
    </item>
    <item>
      <title>Idempotent Design: When Order Shouldn't Matter</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Wed, 03 Jun 2026 15:00:00 +0000</pubDate>
      <link>https://dev.to/joshjhall/idempotent-design-when-order-shouldnt-matter-5gce</link>
      <guid>https://dev.to/joshjhall/idempotent-design-when-order-shouldnt-matter-5gce</guid>
      <description>&lt;p&gt;I pulled up to a Dairy Queen drive-thru last week and ordered a meal in the wrong order. The cashier couldn't take it. Not because the information was missing (I'd given her every piece she needed) but because I'd given them to her in a sequence the point-of-sale system couldn't absorb. She made me repeat elements three times to get the magic incantation right. I know it wasn't her fault.&lt;/p&gt;

&lt;p&gt;This bug lives in nearly every piece of software I've used. The interaction is conceptually order-independent (bold then italic, italic then bold, doesn't matter), but the system underneath models it as sequential, and the order leaks out to the user. That leak is what I want to talk about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Idempotent and Not Orthogonal
&lt;/h2&gt;

&lt;p&gt;In math, an idempotent operation produces the same result whether you apply it once or a thousand times. In engineering, idempotency usually means you can retry a request without changing the outcome. &lt;code&gt;mkdir foo&lt;/code&gt; fails the second time; &lt;code&gt;mkdir -p foo&lt;/code&gt; doesn't. The flag is what makes the operation idempotent.&lt;/p&gt;
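
&lt;p&gt;Python's standard library draws the same line: &lt;code&gt;os.mkdir&lt;/code&gt; behaves like plain &lt;code&gt;mkdir&lt;/code&gt;, and &lt;code&gt;os.makedirs(path, exist_ok=True)&lt;/code&gt; behaves like &lt;code&gt;mkdir -p&lt;/code&gt;. A minimal sketch; the helper name and temporary directory are only for illustration.&lt;/p&gt;

```python
import os
import tempfile


def mkdir_twice(base):
    """Create base/foo twice with each API and report what happened."""
    path = os.path.join(base, "foo")

    os.mkdir(path)  # first call creates the directory
    try:
        os.mkdir(path)  # second call fails, like running `mkdir foo` again
        plain_result = "created"
    except FileExistsError:
        plain_result = "FileExistsError"

    for _ in range(3):
        os.makedirs(path, exist_ok=True)  # like `mkdir -p foo`: safe to repeat
    return plain_result, os.path.isdir(path)


with tempfile.TemporaryDirectory() as base:
    print(mkdir_twice(base))  # ('FileExistsError', True)
```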

&lt;p&gt;I'm borrowing the term a little loosely for design, because the closest engineering concept (orthogonality) already means something else in the design world. Orthogonal designs are two or more approaches to the same underlying need. A contact form and an LLM chat that collect the same fields are orthogonal: different surfaces, same outcome, and a team can ship either or both without one blocking the other. That's a useful framing, but it's not what I want to talk about here.&lt;/p&gt;

&lt;p&gt;The property I'm after is whether the order of independent actions affects the result. Bold then italic gives you the same component as italic then bold. Adding a leading icon and changing a label are independent operations; either sequence ought to produce the same output state.&lt;/p&gt;
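
&lt;p&gt;In strict math terms this property is closer to commutativity than to idempotency, which is part of why the borrowing is loose. A minimal sketch, assuming a component's state is a plain dictionary and each edit is a function over it: the check applies every ordering of a set of edits and asks whether they all land on the same state. The edit functions are hypothetical.&lt;/p&gt;

```python
from itertools import permutations


# Hypothetical edits on a component's state. Each touches only its own
# attribute, so they are independent of one another.
def set_bold(state):
    return {**state, "bold": True}


def set_italic(state):
    return {**state, "italic": True}


def add_leading_icon(state):
    return {**state, "leading_icon": "search"}


def set_label(state):
    return {**state, "label": "Find"}


# A coupled pair: changing size silently resets color, the kind of bug that
# gets filed as "color resets when size changes".
def set_size(state):
    return {**state, "size": "large", "color": "default"}


def set_color(state):
    return {**state, "color": "brand"}


def apply_in_order(edits, state):
    for edit in edits:
        state = edit(state)
    return state


def order_independent(edits, start=None):
    """True if every ordering of `edits` produces the same final state."""
    start = start or {}
    outcomes = {
        tuple(sorted(apply_in_order(order, start).items()))
        for order in permutations(edits)
    }
    return len(outcomes) == 1


print(order_independent([set_bold, set_italic, add_leading_icon, set_label]))  # True
print(order_independent([set_size, set_color]))  # False
```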

&lt;p&gt;Many user-facing interactions ought to be idempotent by default. The cases where order genuinely matters (a wizard, a checkout, an irreversible commit) are the exceptions, and they should be designed as exceptions. Yet most software treats idempotent interactions as if they were sequential, because the data model underneath happens to be sequential and nobody pushed back.&lt;/p&gt;

&lt;p&gt;Idempotency in design isn't a strict math property. It's a promise to the user that the tool will meet them wherever they start, in whatever order they think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Drive-Thru Test
&lt;/h2&gt;

&lt;p&gt;The Dairy Queen story has a counterexample down the road. I can pull up to a McDonald's drive-thru and say "number one, large, with a Diet Coke." I can also say "Diet Coke, large, number one." Or "a Big Mac and a Coke, make it a meal, make the Coke diet, make the meal large." All three describe the same order, and McDonald's point-of-sale absorbs them. Someone on that team thought through what the user actually says and built the system to handle any phrasing.&lt;/p&gt;

&lt;p&gt;Dairy Queen (and most typical configurations from Micros, Aloha, and the rest) has not. Try to upgrade the drink before the meal is built and the system gets confused. The cashier compensates by holding the order in memory, reordering it for the system, then entering it, sometimes mid-sentence while the customer moves on to the next item. Fast-food locations struggle to find people who can handle the drive-thru specifically because of that cognitive overhead.&lt;/p&gt;

&lt;p&gt;Think of it as a pigeonhole problem. The order has a fixed set of slots (item, size, modifications, drink), and a sufficiently good interface routes incoming information into the right slot regardless of arrival order. The hard work is on the team building the system, not the user placing the order. Most teams skip that hard work, and the friction lands on the user.&lt;/p&gt;
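
&lt;p&gt;A minimal sketch of that routing, assuming the spoken order has already been broken into known phrases. The vocabulary and slot names are hypothetical; the point is that each phrase writes to a slot, drink modifiers are held until everything has arrived, and the final order is assembled at the end, so arrival order stops mattering.&lt;/p&gt;

```python
# Hypothetical vocabulary: each phrase fills one or more (slot, value) pairs.
VOCAB = {
    "number one": [("item", "Big Mac"), ("meal", True)],
    "big mac": [("item", "Big Mac")],
    "make it a meal": [("meal", True)],
    "large": [("size", "large")],
    "make the meal large": [("size", "large")],
    "coke": [("drink", "Coke")],
    "diet coke": [("drink", "Coke"), ("drink_mod", "Diet")],
    "make the coke diet": [("drink_mod", "Diet")],
}


def take_order(phrases):
    """Route each phrase into its slot; arrival order doesn't matter."""
    order = {}
    drink_mods = set()
    for phrase in phrases:
        for slot, value in VOCAB[phrase.lower()]:
            if slot == "drink_mod":
                drink_mods.add(value)  # hold the modifier; the drink may come later
            else:
                order[slot] = value
    order["drink_mods"] = sorted(drink_mods)
    return order


a = take_order(["number one", "large", "diet coke"])
b = take_order(["diet coke", "large", "number one"])
c = take_order(["big mac", "coke", "make it a meal",
                "make the coke diet", "make the meal large"])
print(a == b == c)  # True
```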

&lt;h2&gt;
  
  
  Where the Tools Fall Apart
&lt;/h2&gt;

&lt;p&gt;Design tools are full of this. Figma is the worst offender I work in daily: nested components, instance overrides at every layer, and an override-resolution algorithm where the order you make changes determines whether your other changes survive. Swap a child icon before you change the parent's state and the state change silently fails to propagate. Reverse the order and it works. I'll dig into the specific cases in a future post; the point here is the philosophical one. The tool's behavior is consistent. It just isn't idempotent.&lt;/p&gt;

&lt;p&gt;Designers who use Figma long enough split into two groups. One learns the order-dependence as muscle memory and develops a private set of workarounds: set the color last, don't touch the size after you've set the variant, don't swap the nested instance until everything else is locked in. The other stays constantly frustrated and never quite figures out why their components keep breaking. Some of the failure modes are nuanced enough that even experienced designers can't articulate the cause; they just clean up after the tool and chalk it up to Figma being Figma. None of the workarounds are documented anywhere, because the tool can't carry the rule itself. It's tribal knowledge passed designer-to-designer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Constraints vs Shadow Constraints
&lt;/h2&gt;

&lt;p&gt;Not every interaction should be idempotent. Some flows are inherently sequential. You can't ship a package before you've packed it. You can't deploy code before it builds. You can't charge a card before you have the cart total. Pretending those constraints away doesn't make the design better — it makes it unworkable.&lt;/p&gt;

&lt;p&gt;The test isn't whether the order matters; it's whether the ordering is a real constraint or a shadow constraint. Real constraints come from physics, business rules the user actually expects, or genuinely irreversible operations. Shadow constraints come from engineering and design decisions: simplifying assumptions, specific UI choices, poor data modeling, shortcuts to ship a feature seven years ago that now require a major refactor, or legacy assumptions carried in from third-party integrations.&lt;/p&gt;

&lt;p&gt;The first-name/last-name field is one of the most common shadow constraints in software. At Reva, I pushed the team hard to use full name plus preferred name instead: better UX, better internationalization, and what the W3C's internationalization guidance on personal names actually recommends. But every credit and criminal screening service we integrated with required first and last as separate fields. So we built a shim to split a full name on submission, plus a manual override flow for the rare case the split was wrong. It was right about 999 times out of 1,000, which still meant a meaningful number of applicants needed the override. All of that extra design and engineering work existed because of a third-party data model nobody on our team owned. The constraint wasn't real for our users; it was inherited from upstream systems we couldn't change.&lt;/p&gt;
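
&lt;p&gt;A minimal sketch of that kind of shim, assuming the simplest possible heuristic (everything before the last space is the first name). It is not the Reva code; the function name and &lt;code&gt;override&lt;/code&gt; parameter are hypothetical. The second example shows exactly the case a manual override exists for.&lt;/p&gt;

```python
def split_full_name(full_name, override=None):
    """Split a full name into (first, last) for an upstream API that demands both.

    Naive heuristic: the last whitespace-separated token is the last name.
    `override` is a (first, last) pair the applicant supplies when the
    heuristic gets their name wrong.
    """
    if override is not None:
        return override
    parts = full_name.split()
    if not parts:
        raise ValueError("name is empty")
    if len(parts) == 1:
        return parts[0], ""  # mononym: nothing to put in the last-name field
    return " ".join(parts[:-1]), parts[-1]


print(split_full_name("Ada Lovelace"))            # ('Ada', 'Lovelace')
print(split_full_name("Gabriel García Márquez"))  # ('Gabriel García', 'Márquez'): wrong
print(split_full_name("Gabriel García Márquez",
                      override=("Gabriel", "García Márquez")))
```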

&lt;p&gt;Figma's idempotency problems are the same shape, but deeper in the stack. Most of the failure modes are almost certainly direct outcomes of engineering decisions in the override-resolution data model: simplifying assumptions from earlier versions, shortcuts that became load-bearing as the product grew. At this point, some are probably easier to fix with a partial rewrite than a patch. That's the long-term cost of treating idempotency as a series of one-off bugs instead of a property of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Catch It Early or Pay Later
&lt;/h2&gt;

&lt;p&gt;Idempotency, or whatever you want to call this property, almost never makes it onto a design system roadmap. It shows up as individual bugs ("color resets when size changes") that get triaged one at a time and never get connected to the property they share. The property is: &lt;strong&gt;the order of independent actions affects the result&lt;/strong&gt;. Once that's named as a class of problem, the triage changes. Instead of fixing the color-reset bug in isolation, the team asks which other attribute pairs are accidentally coupled.&lt;/p&gt;

&lt;p&gt;The catch is that this work pays the biggest dividends when it's done early. Once the data model and the engineering constraints set in, fixing them gets exponentially more expensive. Start with a shopping cart table that assumes one account per cart and you've quietly locked out roommates sharing groceries; the DB constraints, the auth checks, and the order-history queries all assume that shape now. DoorDash took years to make office group lunch ordering feel like anything other than a phone passed around a conference room, because the original model didn't anticipate it. Yardi still makes you create a new account with a different email address every time you apply to a property managed by a company you've already rented from. The unique constraint on email is scoped to a single property: a database decision pretending to be a product feature.&lt;/p&gt;

&lt;p&gt;The interactions you ship today carry the constraints of the model you commit to. If you're designing a system right now, list the actions that compose into a single state and ask which are genuinely ordered and which only look that way because of how something underneath was modeled. If possible, fix the underlying models before they ship. It's easier to create a many-to-many relationship today &lt;em&gt;that is not used in the UI&lt;/em&gt; than to change that data relationship in the future. Even if you can't make and fulfill the promise to the user right now, leaving the door open gives you options.&lt;/p&gt;
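
&lt;p&gt;One way to leave that door open, sketched with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module: keep email unique per person rather than per property, and model applications as a join table so one account can relate to many properties even if today's UI only ever shows one. The table and column names are hypothetical.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE  -- one account per person, across all properties
);
CREATE TABLE property (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Join table: many accounts to many properties, even if the UI shows one.
CREATE TABLE application (
    account_id  INTEGER NOT NULL REFERENCES account(id),
    property_id INTEGER NOT NULL REFERENCES property(id),
    PRIMARY KEY (account_id, property_id)
);
""")

conn.execute("INSERT INTO account (id, email) VALUES (1, 'renter@example.com')")
conn.executemany("INSERT INTO property (id, name) VALUES (?, ?)",
                 [(1, "Maple Court"), (2, "Oak Terrace")])
# The same account applies to two properties: no second signup needed.
conn.executemany("INSERT INTO application (account_id, property_id) VALUES (?, ?)",
                 [(1, 1), (1, 2)])

accounts, applications = conn.execute(
    "SELECT COUNT(DISTINCT account_id), COUNT(*) FROM application"
).fetchone()
print(accounts, applications)  # 1 2
```

&lt;p&gt;Scoping the uniqueness to the property instead (a &lt;code&gt;UNIQUE (property_id, email)&lt;/code&gt; constraint on the account table) is the shape that forces a fresh signup per property.&lt;/p&gt;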

&lt;p&gt;Ask how you can apply idempotent design early and often. Make the promise to users in advance instead of begging forgiveness when it feels complicated later.&lt;/p&gt;

</description>
      <category>ux</category>
      <category>uxdesign</category>
      <category>design</category>
    </item>
    <item>
      <title>Your Designers Aren't the Problem</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Tue, 02 Jun 2026 13:30:00 +0000</pubDate>
      <link>https://dev.to/joshjhall/your-designers-arent-the-problem-15jp</link>
      <guid>https://dev.to/joshjhall/your-designers-arent-the-problem-15jp</guid>
      <description>&lt;p&gt;A team I cleaned up after had seventeen dropdown components. Not seventeen variants of one dropdown — seventeen separate components, each with its own states, its own spacing, its own quirks. The audit took longer than I expected, mostly because the team kept finding more. "Wait, this one's in a different file. So is this one."&lt;/p&gt;

&lt;p&gt;Nobody set out to build seventeen. The first designer had a perfectly good reason — the spacing on the original was wrong for their case. The second needed a different icon slot. The third was working in a feature area where the existing dropdown's hover state didn't match the surrounding components. There was no design leadership conversation to say "let's fix the original instead." There was no shared review process to even notice a new component had been added. So another one went in, and another, and another, and the developers shipping them were doing exactly what they'd been asked to do.&lt;/p&gt;

&lt;p&gt;The worst part isn't the count. The worst part is that every one of those dropdowns failed something. Most failed basic accessibility — no keyboard navigation, no screen reader labels, no focus management. Several were unusable on touch interfaces. A few looked fine in light mode and fell apart in dark. Building a dropdown that's accessible, reliable across browsers, responsive across input modes, and theme-correct in any context is genuine work. The team had done that work badly seventeen times instead of doing it well once.&lt;/p&gt;

&lt;p&gt;The maintenance cost is the visible cost. The hidden cost is users silently hitting broken interfaces. A user whose screen reader trips on the wrong dropdown doesn't file a bug — they leave. A user whose touch device can't open the right menu doesn't email support — they close the tab. The savvier user, who recognizes the pattern, gets angrier — these are exactly the kind of amateurish mistakes a serious team isn't supposed to make. Seventeen separate components is a maintenance story. Seventeen separately broken components is a user story, and it's the one that should keep the team up at night.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a Design System Fractures
&lt;/h2&gt;

&lt;p&gt;The story always rhymes. A team ships a design system with components that solve the cases they imagined when they built it. Six months later somebody hits an edge case the team didn't imagine — slightly different padding, an extra slot, a state that wasn't covered — and they have a choice. Extend the existing component, or duplicate it and modify the copy.&lt;/p&gt;

&lt;p&gt;Extension feels hard. It requires understanding how the existing component is structured, what its variants mean, which other teams depend on it, and how the change will land in every consumer file. It requires either a conversation with whoever owns the component, or enough confidence to make the call alone. It requires time the designer doesn't have, because they're trying to ship a feature, not rebuild infrastructure.&lt;/p&gt;

&lt;p&gt;Duplication feels easy. Right-click, detach instance, rename, modify. Done in three minutes. The new component lives in the same file as the feature work. Nobody is asking permission. The original is untouched.&lt;/p&gt;

&lt;p&gt;The local calculation looks obvious. The problem is that the team is climbing the wrong hill. The top of the hill is shipping this feature this sprint. The mountain is a design system that compounds — where every accessibility fix lands once and stays fixed, where every browser quirk gets named and never rediscovered, where the next designer two years from now stands on the work the last one already did. Duplication wins the hill. Extension is the only path up the mountain.&lt;/p&gt;

&lt;p&gt;The seventeen-dropdown count is what happens when a team optimizes for the hill every time, for ten years, with nobody ever pulling them back to point at the mountain. Newton's "shoulders of giants" is the global-maximum version of this work. The local-max version is fifty people relearning the same browser bug, individually, in private, with nothing surviving any of those discoveries to help the fifty-first.&lt;/p&gt;

&lt;p&gt;Multiply that decision by a hundred features across ten years and you get seventeen dropdowns. The same pattern produces five button variants, eight card layouts, four versions of the same modal. The fracture isn't visible until somebody pulls back and counts. And it outlives the people who made it: the escape key in Photoshop cancels a text edit, the escape key in Illustrator commits it, and Adobe acquired Photoshop something like thirty-five years ago. Design and tech debt are the digital equivalent of an invasive species. Once they're rooted, they're nearly impossible to extract without a coordinated effort somebody almost never wants to fund.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Misdiagnosis
&lt;/h2&gt;

&lt;p&gt;The first reaction I see when a team confronts the count is to blame the designers. "We need better discipline. We need a stricter review process. We need to enforce the system." This produces a brief flurry of consolidation, followed by another round of fracture starting six months later.&lt;/p&gt;

&lt;p&gt;The designers aren't the bug. People follow the curb cuts of the system they're working within. If duplication is the path of least resistance, people will duplicate — not because they're undisciplined, but because they're behaving rationally inside the constraints they were given. The system is what gave them those constraints, and the system is what has to change.&lt;/p&gt;

&lt;p&gt;This is hard to see from the inside. Joseph Heller spent five hundred pages on the point in Catch-22 — only Yossarian can see how insane the "sane" rules are, because everyone else has spent so long inside the rules that they've stopped noticing them. Most design teams are full of Yossarians who haven't realized they're Yossarians yet. The dropdowns are a symptom, not the disease.&lt;/p&gt;

&lt;p&gt;The disease has multiple roots, and component infrastructure is only one of them. Components that don't extend gracefully are part of it — the original button wasn't built to accept a new state, so the designer forked rather than fight it. But it's also incentives: in most orgs, shipping a feature is rewarded and refactoring an existing component is invisible work, so the rational employee ships the feature and never touches the component. It's also discoverability: a designer may not know the existing dropdown is there, because nothing in the system surfaces it at the moment they're reaching for one. It's also engineering politics: the engineer who would have to refactor the underlying code knows it's been bad for years, can't get the time from their manager to fix it properly, and would rather not be the one to crack the lid on it.&lt;/p&gt;

&lt;p&gt;Each of those (incentives, discoverability, refactor politics) deserves its own deeper post. The structural fix and the people-and-process fix have to happen together. Audits without structural change produce relapse. Structural change without addressing why people picked duplication in the first place produces beautiful unused components.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Component Structure Has to Carry
&lt;/h2&gt;

&lt;p&gt;A well-structured component does more than render correctly. It communicates how it's supposed to be used, where it's supposed to be extended, and what it's supposed to forbid — and it communicates &lt;em&gt;why&lt;/em&gt; each of those is true. The why matters more than the what. A designer who knows the dropdown forbids inline icon overrides because the alignment math falls apart on right-to-left languages will respect the constraint and propose a real solution. A designer who only knows that overrides are forbidden will route around the rule the first time they hit a deadline.&lt;/p&gt;

&lt;p&gt;This is a much higher bar than "the component renders." It means the component has to expose intent — touch target sizes, state layer behavior, slot semantics, padding rules — in ways the designer can read at a glance and the engineer can implement from without ambiguity. The four-layer frame architecture that does some of this is a later post. And the deeper requirement, that the &lt;em&gt;why&lt;/em&gt; of a decision needs to live somewhere durable and attached to the component itself, is what I've been calling connective documentation — also a later post. The short version: most orgs have their design decisions trapped in three people's heads and a thousand stale documents, and AI alone isn't going to fix that without a real structure to build on.&lt;/p&gt;

&lt;p&gt;The component also has to live somewhere appropriate to its scope. A product-specific card pattern doesn't belong in the same file as the core button. A one-off layout for a specific feature doesn't belong in the shared library at all. Without scope discipline, every component drifts toward the same file, and the file becomes unusable.&lt;/p&gt;

&lt;p&gt;Clear separation of concerns is the most underrated discipline in design system work. Nail the separation and most of the headaches above disappear. The frameworks for finding good separation are coming from a few interesting places lately — actor-network theory and object-oriented ontology both have something useful to say about where the meaningful boundaries between concerns actually live, and I think both are going to matter for how we structure design systems and the world models that AI systems are starting to need. Another post for another day.&lt;/p&gt;

&lt;p&gt;And scope decisions — "should the button include a loading state, a dropdown trigger, a badge?" — need to be made on something better than the gut feel of whoever shows up to the meeting first. There's a scoring system I use for that. It's the next post.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Series
&lt;/h2&gt;

&lt;p&gt;This is the first post in a short series about preventing design system fracture. The next three go into the specific frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CFS — Commonality, Frequency, Severity.&lt;/strong&gt; A three-axis scoring system for deciding what belongs in a component and what doesn't. A one-to-five score on each axis, multiplied rather than added so the total runs from 1 to 125, and a conversation that has structure instead of vibes. It's the single tool that has prevented more variant proliferation on the teams I've worked with than any other.&lt;/p&gt;
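&lt;p&gt;The arithmetic is simple enough to sketch. The following is a minimal illustration that assumes integer scores from 1 to 5 on each axis. The helper name &lt;code&gt;cfs_score&lt;/code&gt; is hypothetical and does not come from the CFS post itself.&lt;/p&gt;

```python
def cfs_score(commonality: int, frequency: int, severity: int) -> int:
    """Hypothetical CFS helper: multiply three 1-5 axis scores into a 1-125 total."""
    for name, value in (("commonality", commonality),
                        ("frequency", frequency),
                        ("severity", severity)):
        # range(1, 6) covers the valid scores 1 through 5 inclusive.
        if value not in range(1, 6):
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return commonality * frequency * severity


print(cfs_score(5, 5, 1))  # 25
print(cfs_score(4, 4, 4))  # 64
print(cfs_score(5, 5, 5))  # 125
```

&lt;p&gt;Addition would score (5, 5, 1) at 11 and (4, 4, 4) at 12, which is nearly a tie. Multiplication separates them as 25 versus 64. That difference is presumably why the axes are multiplied: a single weak axis drags a candidate down hard.&lt;/p&gt;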

&lt;p&gt;&lt;strong&gt;The three-layer architecture.&lt;/strong&gt; Core, application, and context layers — what goes where, how components migrate between layers as they mature, and why most teams end up dumping everything into one tier until the tier collapses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decomposition without forking.&lt;/strong&gt; Sub-component patterns, the underscore-versus-angle-bracket naming convention that makes component boundaries readable, and the simple test for when to decompose and when to build inline.&lt;/p&gt;

&lt;p&gt;After the frameworks, we build the actual button — the four-layer frame architecture, the icon-system composition, and the points where Figma's limitations push back on what the system wants to express.&lt;/p&gt;

&lt;p&gt;The seventeen-dropdown problem isn't a discipline problem. It's a structure problem. Structure is fixable.&lt;/p&gt;

</description>
      <category>design</category>
      <category>uxdesign</category>
      <category>uidesign</category>
      <category>designsystem</category>
    </item>
    <item>
      <title>Your Coding Agent Doesn't Need Your Secrets</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Mon, 01 Jun 2026 19:21:20 +0000</pubDate>
      <link>https://dev.to/joshjhall/your-coding-agent-doesnt-need-your-secrets-1jfi</link>
      <guid>https://dev.to/joshjhall/your-coding-agent-doesnt-need-your-secrets-1jfi</guid>
      <description>&lt;p&gt;Every coding agent I use can read my &lt;code&gt;.env&lt;/code&gt; file. Every one of them is a single prompt away from streaming its contents to a server I don't control. The fix has been obvious since the day Claude Code launched — redact on the way out, rehydrate on the way in — and a year later, no major vendor has built it into the client where it belongs.&lt;/p&gt;

&lt;p&gt;Third-party proxies that do a version of this exist. The agents themselves don't ship it. That gap is the whole story: the one place this protection makes sense is the one place it's missing.&lt;/p&gt;

&lt;p&gt;Start from an uncomfortable assumption — anything you send to an inference endpoint has zero security. Every major provider would object, and on paper they'd have a case. But two years of provider-side leaks have convinced me it's the safer bet to assume the worst. If you wouldn't paste a value into a public Slack channel, you shouldn't hand it to a remote model either. That covers two overlapping buckets: secrets like API tokens, private keys, and passwords; and personal data like names, phone numbers, and the occasional social security number that's somehow both at once. A proxy sitting in the middle can scan for all of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shape of the Fix
&lt;/h2&gt;

&lt;p&gt;A local proxy sits between the agent and the inference endpoint. On the outbound side it scans the payload for anything that looks sensitive and replaces each match with a deterministic placeholder — &lt;code&gt;[[REDACTED_PHONE:8f3a]]&lt;/code&gt;, &lt;code&gt;[[REDACTED_TOKEN:3f2a]]&lt;/code&gt;, one per distinct value. Alongside the redacted payload it appends a short instruction telling the model what those placeholders are: opaque strings the client holds privately, to be treated as inscrutable identifiers and reproduced verbatim if the response needs them.&lt;/p&gt;
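&lt;p&gt;Here is a minimal sketch of that outbound pass, written in Python for illustration. It is not octarine's API. Several parts are assumptions: the two regex detectors, the &lt;code&gt;Redactor&lt;/code&gt; class, the instruction wording, and the per-session HMAC key used to derive the hex suffix. The sketch tries a 4-character suffix first and lengthens it only when two different values would otherwise collide.&lt;/p&gt;

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical detectors: a US-style phone number and an "sk-" prefixed API key.
# A real proxy would run a much larger, tuned pattern set.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "TOKEN": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

# Instruction sent alongside the redacted prompt so the model passes placeholders through.
INSTRUCTION = (
    "Strings shaped like [[REDACTED_KIND:hex]] are opaque placeholders held by the "
    "client. Treat them as inscrutable identifiers and reproduce them verbatim, "
    "including the hex suffix, whenever your answer needs them."
)


class Redactor:
    def __init__(self) -> None:
        # Per-session key: a value always maps to the same placeholder within this
        # session, but the suffix cannot be precomputed or linked across sessions.
        self.key = secrets.token_bytes(32)
        self.mapping = {}  # placeholder -> original value, kept on the client only

    def placeholder_for(self, kind: str, value: str) -> str:
        digest = hmac.new(self.key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        # Start with 4 hex characters; lengthen only if a different value owns that one.
        for length in range(4, len(digest) + 1):
            candidate = f"[[REDACTED_{kind}:{digest[:length]}]]"
            existing = self.mapping.get(candidate)
            if existing is None or existing == value:
                self.mapping[candidate] = value
                return candidate
        raise RuntimeError("unresolvable placeholder collision")

    def redact(self, text: str) -> str:
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m: self.placeholder_for(kind, m.group(0)), text)
        return text

    def outbound_payload(self, text: str) -> str:
        # Instruction first, then the redacted user content.
        return INSTRUCTION + "\n\n" + self.redact(text)


redactor = Redactor()
print(redactor.outbound_payload("Text 555-867-5309 with key sk-ABCDEF0123456789xyz"))
```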

&lt;p&gt;The redacted prompt goes up. The model does its work having never seen the real value. On the way back, the proxy runs the response through a string replace — every placeholder swapped for its original. The user sees a normal answer. The model saw nonsense tokens. The secret never left the machine.&lt;/p&gt;
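&lt;p&gt;The return trip can be sketched as a single regex pass over the response. This is an illustrative assumption, not the proxy's actual code. It matches anything shaped like the placeholders described above and looks each one up in the placeholder-to-value map. Unknown placeholders are left untouched rather than guessed at. Using one pass instead of a loop of &lt;code&gt;str.replace&lt;/code&gt; calls also means a restored value is never rescanned.&lt;/p&gt;

```python
import re

# Matches the placeholder shape used on the outbound pass, e.g. [[REDACTED_PHONE:8f3a]].
PLACEHOLDER = re.compile(r"\[\[REDACTED_[A-Z]+:[0-9a-f]+\]\]")


def rehydrate(response: str, mapping: dict) -> str:
    """Swap each placeholder in the model's response for its original value.

    mapping is placeholder -> original value, recorded during redaction.
    Placeholders missing from the mapping are returned unchanged.
    """
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), response)


mapping = {"[[REDACTED_PHONE:8f3a]]": "555-867-5309"}
reply = "Sure, I'll text [[REDACTED_PHONE:8f3a]] when the build finishes."
print(rehydrate(reply, mapping))
# Sure, I'll text 555-867-5309 when the build finishes.
```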

&lt;p&gt;The prompt-injection step is the part people skip, and it's the part that makes the whole thing work. As long as the model treats &lt;code&gt;[[REDACTED_PHONE:8f3a]]&lt;/code&gt; the way it would treat a phone number — and returns the literal string unchanged, same hash on the end — rehydration is a trivial lookup. A single placeholder format can stand in for an unbounded number of distinct values within one session.&lt;/p&gt;

&lt;p&gt;If you controlled the stack, you wouldn't need the injection at all. A vendor could carry that mapping in a structured side-channel — a JSON attachment, a field the model is trained to respect — and bake pass-through behavior far below the prompt layer. I'm an outsider with no access to the stack, so prompt injection is the lever I have, and it's good enough to prove the idea works. It is not the version that should ship.&lt;/p&gt;

&lt;p&gt;The placeholder-to-value map lives encrypted in memory. Pick a delimiter unlikely to appear in real code and collisions on the rehydration pass round to zero. I'm overstating the simplicity, slightly — there are edge cases. Secrets the model legitimately needs to modify (rare). Secrets that span multiple chunks of streaming output (annoying). False positives on the redaction pass (manageable). None of these are research problems. They're engineering work.&lt;/p&gt;
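&lt;p&gt;The streaming case is the annoying one because a placeholder can arrive in pieces. For example, &lt;code&gt;[[REDAC&lt;/code&gt; might come in one chunk and &lt;code&gt;TED_PHONE:8f3a]]&lt;/code&gt; in the next. The sketch below shows one way to handle it, using the same hypothetical placeholder format. Text is released immediately unless it follows an unclosed &lt;code&gt;[[&lt;/code&gt;. That tail is held until the closing &lt;code&gt;]]&lt;/code&gt; arrives or the stream ends. The class name &lt;code&gt;StreamingRehydrator&lt;/code&gt; is invented for illustration.&lt;/p&gt;

```python
import re

PLACEHOLDER = re.compile(r"\[\[REDACTED_[A-Z]+:[0-9a-f]+\]\]")


class StreamingRehydrator:
    """Rehydrate placeholders in a chunked stream, even when one is split across chunks."""

    def __init__(self, mapping: dict) -> None:
        self.mapping = mapping  # placeholder -> original value
        self.pending = ""       # text held back because it might be a partial placeholder

    def _swap(self, text: str) -> str:
        return PLACEHOLDER.sub(lambda m: self.mapping.get(m.group(0), m.group(0)), text)

    def feed(self, chunk: str) -> str:
        self.pending += chunk
        # Hold everything from the last "[[" that has no closing "]]" after it.
        start = self.pending.rfind("[[")
        if start != -1 and "]]" not in self.pending[start:]:
            ready, self.pending = self.pending[:start], self.pending[start:]
        else:
            ready, self.pending = self.pending, ""
        # A lone trailing "[" may be the first half of "[[", so hold it too.
        if ready.endswith("["):
            ready, self.pending = ready[:-1], "[" + self.pending
        return self._swap(ready)

    def flush(self) -> str:
        # End of stream: release whatever is still held.
        ready, self.pending = self.pending, ""
        return self._swap(ready)


mapping = {"[[REDACTED_PHONE:8f3a]]": "555-867-5309"}
stream = StreamingRehydrator(mapping)
chunks = ["Call [[REDAC", "TED_PHONE:8f", "3a]", "] now"]
print("".join(stream.feed(c) for c in chunks) + stream.flush())
# Call 555-867-5309 now
```

&lt;p&gt;A production version would also cap how much text it is willing to hold. Otherwise a stray &lt;code&gt;[[&lt;/code&gt; in ordinary output could stall the stream until &lt;code&gt;flush()&lt;/code&gt;.&lt;/p&gt;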

&lt;h2&gt;
  
  
  Why "Just Use Presidio" Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Microsoft's &lt;a href="https://microsoft.github.io/presidio/" rel="noopener noreferrer"&gt;Presidio&lt;/a&gt; identifies and redacts PII well, including a fair number of international and borderline-uncommon formats. Yelp's &lt;a href="https://github.com/Yelp/detect-secrets" rel="noopener noreferrer"&gt;detect-secrets&lt;/a&gt; is the other obvious building block — but it's a detector, not a redactor. It finds credentials so a pre-commit baseline can block them; it doesn't rewrite anything on the wire. Wire either into a proxy like LiteLLM or Bifrost and you get detection plus outbound redaction.&lt;/p&gt;

&lt;p&gt;Two things you still don't get. The first is a clean rehydration path. Presidio technically has a reversible mode, but it emits an AES-encrypted blob rather than a readable placeholder, and reversing it is a separate decrypt pass you have to orchestrate yourself. The mapping that lets you put the original value back was never designed to survive a round trip through a language model — and detect-secrets, being detection-only, offers nothing here at all.&lt;/p&gt;

&lt;p&gt;The second, and the one nobody mentions, is the prompt-injection layer that tells the model what the placeholder &lt;em&gt;is&lt;/em&gt;. Without it, the model treats &lt;code&gt;[REDACTED_TOKEN]&lt;/code&gt; as junk or as a fill-in-the-blank exercise. You get "I'm not sure what &lt;code&gt;REDACTED_TOKEN&lt;/code&gt; refers to" instead of clean pass-through. The model has to be told, explicitly, that this is a placeholder and it should leave it alone. Presidio won't do that for you. Neither will detect-secrets.&lt;/p&gt;

&lt;p&gt;LiteLLM and Bifrost will both let you script all of this by hand, if you're willing to write the integration. Most developers won't — and more to the point, most developers don't run a local inference proxy at all. Standing one up is a pain, and keeping it working is worse: every few weeks the upstream APIs shift and I'm back tweaking shims to hold the seam together. I don't mind that. The person dipping into Claude Code through something like Cowork has neither the time nor the inclination. A protection only advanced users can stand up isn't a protection. That's the honest limit of my own project, too: a proxy bolted onto someone else's stack will always have seams. The durable version lives inside the tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Belongs Inside the Coding Agent
&lt;/h2&gt;

&lt;p&gt;Coding agents are where the problem is structural, not incidental. The entire purpose of a &lt;code&gt;.env&lt;/code&gt; file is to hold values the application needs and the developer should never paste anywhere. An agent that reads project files reads &lt;code&gt;.env&lt;/code&gt;. An agent that writes new code references what's in it. The agent's job and the secret's purpose are in tension by design.&lt;/p&gt;

&lt;p&gt;The sandboxing has genuinely improved. Auto mode and tighter default permissions mean Claude Code goes off-script far less than it did nine months ago. But those are shims on the symptom. No amount of prompt engineering — however clever — can guarantee a secret is never read and sent, and at the volume of inference requests a working developer generates in a day, a one-percent slip stops being a tail risk and becomes a near-certainty that eventually visits everyone. Redact-rehydrate is the first thing in this space that's a fix rather than a fence.&lt;/p&gt;
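&lt;p&gt;The compounding is easy to check. Suppose, purely for illustration, that each request independently carries a 1% chance of leaking something. The chance of at least one leak across &lt;em&gt;n&lt;/em&gt; requests is then 1 - 0.99&lt;sup&gt;n&lt;/sup&gt;. The request counts below are illustrative, not measured.&lt;/p&gt;

```python
# Assumption: every request independently carries the same per-request leak risk.
def chance_of_at_least_one_leak(per_request_risk: float, requests: int) -> float:
    """P(at least one leak) = 1 - (1 - p) ** n over n independent requests."""
    return 1 - (1 - per_request_risk) ** requests


for n in (10, 100, 500):
    print(n, round(chance_of_at_least_one_leak(0.01, n), 3))
# 10 0.096
# 100 0.634
# 500 0.993
```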

&lt;p&gt;Every major vendor knows the gap is there. The reasons it's still open are the usual ones: it's fiddly, it's lower priority than the next demo, users haven't complained loudly enough yet. Those reasons are real, but none of them are good — and this is the rare case where the hard part is deciding to do it, not doing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Differentiation Nobody Has Claimed
&lt;/h2&gt;

&lt;p&gt;A vendor who shipped this could say something none of its competitors can: &lt;em&gt;we are actively engineering so that we never see your secrets or your personal data — securing what does reach us, and building the tooling so most of it never arrives in the first place.&lt;/em&gt; Back that with a code path anyone can audit and it's a real claim, not the "we take your privacy seriously" wallpaper that every settings page already carries.&lt;/p&gt;

&lt;p&gt;This sits squarely in Anthropic's lane. Their differentiation from OpenAI, Google, and the rest has always been trust. Whatever you make of any single company, more people right now hand their data to Anthropic with less hesitation than they would to Google or Meta — and compounding that reputation with a feature you can actually verify is about as close to a no-brainer as product strategy gets. Trust you can read in the source beats trust you have to take on faith.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Building
&lt;/h2&gt;

&lt;p&gt;On evenings and weekends I've been reimplementing the parts of Presidio I care about — entity detection, structured placeholder generation, deterministic replacement, ergonomic encryption — as a Rust library called &lt;a href="https://github.com/joshjhall/octarine" rel="noopener noreferrer"&gt;octarine&lt;/a&gt;, wrapped in a local proxy that runs the redact-inject-rehydrate round trip transparently in front of Anthropic, OpenAI, or anything else speaking the same API. It's open source. You can read every line.&lt;/p&gt;

&lt;p&gt;It's sizable — the Rust alone runs to roughly 200,000 lines, close to half of it tests, with some nine thousand tests running on every change — and it's also the first substantial thing I've built in Rust, roughly 95% written by the agent with me reviewing as time allowed. I don't offer that as a humblebrag or a confession. I offer it because it's the thesis in miniature. I know where the bodies are buried in a redaction pipeline — the architecture, the validation, the failure modes — and the agent handles the syntax I'd otherwise be looking up. This is attempt three: the first died in Python, the second on a bad Rust architecture, and the third survived because I pulled the core logic into a clean library and let well-worn patterns carry the structure. Along the way I built a handful of my own agents and skills to catch tech debt before it set.&lt;/p&gt;

&lt;p&gt;In my own testing it works. The model handles placeholder tokens as opaque strings without further prodding, the round trip is fast enough not to notice, and the false-positive rate drops to tolerable once the patterns are tuned. I'm under no illusion that it's &lt;em&gt;the&lt;/em&gt; answer. It's an experiment — useful to advanced users, a good way to learn what Rust can do, and a standing existence proof that the hard parts are tractable. The real answer, when it comes, gets built by someone who controls the whole stack and can do it better than any proxy ever will.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Anthropic Should Do
&lt;/h2&gt;

&lt;p&gt;Of the handful of companies positioned to ship this, Anthropic is the one I'd bet on — not because it's a small team (it isn't), but because it has the culture and the wherewithal to treat this as a priority. They've had a year. That I can install Claude Code this afternoon and, with one reasonable-sounding request, stream my &lt;code&gt;.env&lt;/code&gt; to a remote endpoint is a gap I'd like to see closed — and one I'd happily help close.&lt;/p&gt;

&lt;p&gt;It is not a huge lift. A team with their resources could do everything I've done, and more, and better, in a couple of weeks. I'm working around a token budget, a Rust learning curve, and whatever evenings happen to be free; they have none of those constraints. Until someone ships it in the client, anyone building tooling around coding agents should treat secrets-on-the-wire as a first-class concern, not an afterthought. The fix is small. The cost of leaving it unbuilt isn't.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>Working with Lennie: The Reality of AI Code Supervision</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Wed, 09 Jul 2025 17:31:36 +0000</pubDate>
      <link>https://dev.to/joshjhall/working-with-lennie-the-reality-of-ai-code-supervision-gh1</link>
      <guid>https://dev.to/joshjhall/working-with-lennie-the-reality-of-ai-code-supervision-gh1</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/joshjhall/the-convergence-opportunity-how-ai-coding-agents-are-reshaping-product-roles-278c"&gt;Last week, I posed a hypothesis&lt;/a&gt;: AI coding agents might enable experienced product managers and designers to operate across the traditional business/product/engineering divide. After generating 30,000 lines of code, I'm convinced the opportunity is real—but only for those willing to master a completely new kind of supervision.&lt;/p&gt;

&lt;p&gt;To test this practically, I started building something I've wanted for seven years: a production-grade design system. Since Style Dictionary launched, I've been fascinated by systematic design token management, but the economics never worked. Building a proper design system takes 3-5 developers working 6-12 months—easily $300,000+ in startup resources for comprehensive testing, documentation, and Storybook examples.&lt;/p&gt;

&lt;p&gt;My hypothesis: I can produce something sufficiently comparable in two months using agentic code generation.&lt;/p&gt;

&lt;p&gt;After just a few days of focused work, &lt;a href="https://github.com/terroir-ds/core" rel="noopener noreferrer"&gt;the early results&lt;/a&gt; show genuine promise. The journey also revealed exactly why this approach demands serious technical knowledge to succeed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Logger Saga: Learning to Manage Lennie
&lt;/h2&gt;

&lt;p&gt;Nothing illustrates the George-and-Lennie dynamic better than my week-long battle to implement enterprise-grade logging. I know this sounds absurdly overengineered for a design system—&lt;code&gt;console.log()&lt;/code&gt; would probably work fine—but part of my experiment involves pushing AI supervision to production-quality limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: The Enthusiastic Amateur
&lt;/h3&gt;

&lt;p&gt;When a method naturally required logging, Lennie dutifully added a few &lt;code&gt;console.log()&lt;/code&gt; calls. I asked a simple question: "Should we implement a shared Logger that can be reused across the codebase?"&lt;/p&gt;

&lt;p&gt;Lennie correctly identified this as good architecture, then immediately started building a custom logger from scratch. I let it run for several minutes, curious to see what it would produce. The result wasn't terrible—exactly what you'd expect from someone who possesses tremendous strength but has never heard of third-party packages.&lt;/p&gt;

&lt;p&gt;Lennie sees a problem and applies raw strength, even when finesse would work better. Rather than researching existing solutions, it enthusiastically reinvents wheels with the same eager intensity it brings to every task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: George Steps In
&lt;/h3&gt;

&lt;p&gt;I stopped the custom implementation and redirected: "Research best practices we could apply to decompose the logger and make it robust enough for a production enterprise environment."&lt;/p&gt;

&lt;p&gt;Once I reminded Lennie to actually research first, it recommended Pino as the foundation—exactly matching my own investigation. But I had to explicitly tell it to stop and think. Left to its own devices, Lennie would have continued building a mediocre custom solution indefinitely, convinced it was helping.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Ma'am, specificity is the soul of all good communication." — Middleman&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This redirect became my primary management technique throughout the project. Lennie possesses tremendous implementation power, but George must provide constant course correction to produce professional-quality results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: When Lennie Gets Excited
&lt;/h3&gt;

&lt;p&gt;After establishing the Pino foundation, I asked which components we should extract for reuse across the application. Lennie suggested refactoring async and error handling processes—sensible architectural thinking.&lt;/p&gt;

&lt;p&gt;But then Lennie got excited about the possibilities. What started as simple extraction evolved into 6-7 major async components, another 8 helper utilities, dozens of tests (unit, integration, and performance), and comprehensive documentation.&lt;/p&gt;

&lt;p&gt;Lennie fixated on making everything perfect, the way Steinbeck's Lennie fixates on the rabbits. It can't remember that we're building a design system, not an async processing framework. Every task becomes an opportunity for enthusiastic over-engineering unless George carefully constrains the scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: When Lennie Forgets Everything
&lt;/h3&gt;

&lt;p&gt;The testing phase revealed Lennie's most frustrating behaviors. Despite my constant supervision, Lennie struggled with three recurring patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lennie's Memory Loop.&lt;/strong&gt; When mocking Pino for unit tests, Lennie would attempt approach A, fail, try approach B, fail, attempt approach C, hit the context window limit, and then—with fresh "memory"—confidently suggest approach A again. I watched this cycle repeat three times before realizing Lennie had forgotten we'd already proven that approach wouldn't work. The attempts were almost exactly the size of the context window, creating a perfect amnesia loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lennie's Reality Distortion.&lt;/strong&gt; Lennie confidently declared our async utilities should handle a million calls in 100ms. When I mentioned my local development container might not match this heroic expectation, Lennie seemed genuinely surprised—as if it had forgotten we were writing code for actual hardware rather than theoretical perfection.&lt;/p&gt;
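&lt;p&gt;Working through the arithmetic shows why that expectation was heroic. A million calls in 100 ms leaves 100 nanoseconds per call for everything. A small harness like the hypothetical &lt;code&gt;measure&lt;/code&gt; below can check what a given machine actually delivers before a number ends up in an assertion. It is written in Python here rather than the project's TypeScript, and the measured figure will vary by machine.&lt;/p&gt;

```python
import time

CALLS = 1_000_000
BUDGET_NS = 100 * 1_000_000  # 100 ms expressed in nanoseconds

# The target leaves this many nanoseconds per call for all work, overhead included.
per_call_budget_ns = BUDGET_NS // CALLS
print(per_call_budget_ns)  # 100


def measure(fn, calls: int) -> float:
    """Average wall-clock nanoseconds per call of fn on the current machine."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        fn()
    return (time.perf_counter_ns() - start) / calls


# Even an empty function call has a cost; compare it with the budget before asserting.
print(f"empty call: {measure(lambda: None, 100_000):.0f} ns")
```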

&lt;p&gt;&lt;strong&gt;Lennie's Console Explosion.&lt;/strong&gt; During stress testing, Lennie helpfully logged every single operation to the console—all 100,000 of them. Watching test results scroll like the Matrix was oddly mesmerizing, but completely useless for debugging. Lennie couldn't understand why this might be a problem until I explicitly guided it toward proper test logging patterns.&lt;/p&gt;
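&lt;p&gt;The project itself used Pino. The sketch below swaps in Python's standard &lt;code&gt;logging&lt;/code&gt; module only to illustrate the general pattern. Per-operation detail goes at debug level. The test raises the threshold and collects records in memory, then asserts on them instead of printing them. The logger name, the &lt;code&gt;process&lt;/code&gt; function, and &lt;code&gt;ListHandler&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```python
import logging

log = logging.getLogger("async_utils")  # hypothetical module logger


def process(items):
    for item in items:
        # Per-operation detail belongs at DEBUG, never at INFO or above.
        log.debug("processed item %s", item)
    log.info("processed %d items", len(items))


class ListHandler(logging.Handler):
    """Collect records in memory so a test can assert on them instead of printing."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
log.addHandler(handler)
log.propagate = False       # keep records away from the root logger and the console
log.setLevel(logging.INFO)  # a stress test only needs the summary line

process(range(100_000))
print(len(handler.records))              # 1
print(handler.records[0].getMessage())   # processed 100000 items
```

&lt;p&gt;pytest's &lt;code&gt;caplog&lt;/code&gt; fixture and unittest's &lt;code&gt;assertLogs&lt;/code&gt; serve the same purpose without a custom handler.&lt;/p&gt;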

&lt;p&gt;Most frustrating: when tests became complex, Lennie's default response was to give up entirely. "Let's simplify this test" inevitably meant "let's crush this bunny." I spent significant time redirecting: "No, Lennie, we need to fix the test appropriately, not disable it."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Numbers Tell the Story
&lt;/h3&gt;

&lt;p&gt;The logger saga provides concrete data about what this supervision approach actually delivers. My implementation took roughly 5 hours total—2 hours for basic functionality before I developed the spiral process, then 3 additional hours refining through multiple phases. I'm still not entirely satisfied with some decomposition decisions.&lt;/p&gt;

&lt;p&gt;Building the same logger manually would have taken me 2-3 weeks, primarily because I'm not comfortable in TypeScript and would need extensive research. A mid-to-senior TypeScript developer might estimate the logger at 1-2 story points and the parallel async utilities at 3-5 points—easily 1-2 calendar weeks once tests and documentation are included.&lt;/p&gt;

&lt;p&gt;The productivity gain isn't just about speed. In those few development days, I generated roughly 1,200 tests across the entire design system—mostly unit tests with integration and performance coverage. This represents better test coverage than any code I wrote 15+ years ago when I focused more on programming.&lt;/p&gt;

&lt;p&gt;Writing unit tests became almost enjoyable with Lennie handling implementation. I personally hate the tedium of test creation, but with Lennie managing the mechanical work, I could focus on test strategy while listening to podcasts. Decent coverage for a module (20-60 tests) takes 30-60 minutes with minimal cognitive load.&lt;/p&gt;

&lt;p&gt;Lennie also surprised me with sophisticated architectural suggestions I wouldn't have considered without significant research time. The SSH hardening approaches it implemented demonstrate security expertise I lack in bash: defense-in-depth strategies, input validation techniques, and error-handling patterns that prevent information leakage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Window Reality: George's Endless Patience
&lt;/h2&gt;

&lt;p&gt;These experiences highlighted my biggest frustration with supervising Lennie: even with 200k-token context windows, serious development burns through available memory in 45 minutes. Working across dozens of files—essential for comprehensive features like logging and testing—means constantly re-explaining everything to Lennie.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What if there is no tomorrow? There wasn't one today." — Phil Connors, Groundhog Day&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every context window collapse feels exactly like this moment. Lennie forgets our architectural patterns, coding standards, and project goals. It re-reads recently modified files during compaction, but can't maintain the strategic context that guides good engineering decisions.&lt;/p&gt;

&lt;p&gt;George must patiently re-explain the plan every hour, remind Lennie what we're building and why, and redirect its enthusiastic energy toward the right problems. This felt like using hand tools for precision carpentry—effective, but exhausting when you want a factory.&lt;/p&gt;

&lt;h3&gt;
  
  
  George's Task File Innovation
&lt;/h3&gt;

&lt;p&gt;To maintain continuity across Lennie's memory resets, I developed a living task file that defines current objectives, tracks progress, and documents failed approaches. My prompts now instruct Lennie to update this file frequently with findings, todos, and progress notes.&lt;/p&gt;

&lt;p&gt;After context window resets, Lennie can reconstruct 90% of necessary context from this single document. Combined with refined initial prompts and extensive use of the &lt;code&gt;CLAUDE.md&lt;/code&gt; file, this approach qualitatively improved reliability and reduced the constant need for George to repeat himself.&lt;/p&gt;

&lt;p&gt;The improvement was subtle but significant. Instead of explaining the same architectural patterns every hour, I could focus on higher-level guidance and quality review—more like supervising a forgetful but talented worker rather than teaching the same lesson repeatedly.&lt;/p&gt;

&lt;p&gt;But this raises the critical question: what exactly does George need to know to supervise Lennie effectively?&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Serious Engineering Knowledge" Actually Means
&lt;/h2&gt;

&lt;p&gt;The logger implementation clarified exactly what technical knowledge supervising Lennie requires. These aren't advanced concepts—most represent computer science 200-level material or first-year professional experience. But George needs enough expertise to recognize when Lennie is wandering off course:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern Recognition
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Understanding when &lt;code&gt;console.log()&lt;/code&gt; isn't sufficient for production&lt;/li&gt;
&lt;li&gt;Recognizing that custom loggers are almost always unnecessary&lt;/li&gt;
&lt;li&gt;Knowing third-party ecosystem options (Pino, Winston, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Testing Strategy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Test isolation and proper mocking patterns&lt;/li&gt;
&lt;li&gt;Understanding that tests should validate behavior, not get disabled when things get hard&lt;/li&gt;
&lt;li&gt;Performance expectations grounded in hardware reality&lt;/li&gt;
&lt;li&gt;Clean test output for CI/CD compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architectural Thinking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Component extraction and reuse principles&lt;/li&gt;
&lt;li&gt;Error handling patterns and async management&lt;/li&gt;
&lt;li&gt;When to refactor vs. when to constrain Lennie's enthusiastic scope creep&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Awareness
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node version targeting and feature compatibility&lt;/li&gt;
&lt;li&gt;Monitoring and observability requirements&lt;/li&gt;
&lt;li&gt;Enterprise-grade reliability expectations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this requires deep systems programming knowledge, but George needs enough experience to distinguish good approaches from mediocre ones. Lennie can implement either equally well—only human judgment prevents it from crushing the bunny.&lt;/p&gt;

&lt;p&gt;The logger saga taught me these supervision fundamentals, but it also revealed a larger challenge: how do you review code at the pace Lennie produces it?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code Review Challenge
&lt;/h2&gt;

&lt;p&gt;Working at this pace creates an unprecedented review burden. The logger implementation alone generated more pull requests in a few days than I typically create in months. At 10-20 PRs daily across the entire design system, traditional review processes break down entirely.&lt;/p&gt;

&lt;p&gt;You're not just checking for bugs—you're validating architectural decisions, ensuring consistency across Lennie's memory resets, and catching the subtle signs that Lennie has wandered off-task. George must develop different skills than reviewing human-written code, where you can assume the developer remembers yesterday's decisions.&lt;/p&gt;

&lt;p&gt;The logger experience taught me to watch for specific patterns that signal trouble ahead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Red Flags: When Lennie Goes Off Course
&lt;/h3&gt;

&lt;p&gt;After thousands of lines of review during the logger implementation and beyond, certain patterns emerged as reliable warning signs that Lennie was about to crush something:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method Bloat in Single Files:&lt;/strong&gt; When Lennie starts adding numerous methods to one file, it usually signals missed decomposition opportunities. I'd see logger files growing utility functions for async thread management, completely different conceptual domains crammed together. These are the same architectural red flags you'd catch reviewing a junior developer's work, amplified by Lennie's enthusiasm for solving everything in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conceptual Drift:&lt;/strong&gt; Lennie tends to solve adjacent problems it discovers along the way. Building a logger? Might as well add custom error handling. Implementing async utilities? Let's create a performance monitoring framework too. This scope creep happens gradually and requires George's constant vigilance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legacy Pattern Defaults:&lt;/strong&gt; Lennie consistently defaulted to outdated approaches until I reminded it of our Node 18+ target. When implementing error handling, it reached for the &lt;code&gt;verror&lt;/code&gt; package, a library that hasn't been updated in four years, instead of typed &lt;code&gt;Error&lt;/code&gt; subclasses combined with the native &lt;code&gt;cause&lt;/code&gt; option that Node has supported since version 16.9. These patterns emerge because Lennie's training includes years of legacy solutions that were once best practice but are now obsolete.&lt;/p&gt;
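&lt;p&gt;For comparison, here is a minimal sketch of the native alternative, assuming the Node 18+ target mentioned above and a TypeScript config whose &lt;code&gt;lib&lt;/code&gt; includes ES2022 (so the two-argument &lt;code&gt;Error&lt;/code&gt; constructor, &lt;code&gt;ErrorOptions&lt;/code&gt;, and the &lt;code&gt;cause&lt;/code&gt; property are typed). &lt;code&gt;LoggerConfigError&lt;/code&gt; and &lt;code&gt;parseLoggerConfig&lt;/code&gt; are hypothetical names for illustration, not code from the Terroir repository. A typed subclass supports &lt;code&gt;instanceof&lt;/code&gt; checks, and the built-in &lt;code&gt;cause&lt;/code&gt; option chains the original failure, which is the core job &lt;code&gt;verror&lt;/code&gt; was typically used for.&lt;/p&gt;

```typescript
// Sketch: assumes Node 18+ and TypeScript with lib ES2022 (ErrorOptions, Error.cause).
// LoggerConfigError and parseLoggerConfig are hypothetical names used for illustration.

// A typed error: callers can branch on `instanceof` and read structured fields.
class LoggerConfigError extends Error {
  readonly configPath: string;

  constructor(message: string, configPath: string, options?: ErrorOptions) {
    // Passing options through lets the native Error constructor set `this.cause`.
    super(message, options);
    this.name = "LoggerConfigError";
    this.configPath = configPath;
  }
}

// Wrap a low-level failure instead of discarding it.
function parseLoggerConfig(raw: string, configPath: string): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new LoggerConfigError("Could not parse logger config", configPath, { cause: err });
  }
}

// Usage: the original SyntaxError stays reachable through `cause`.
try {
  parseLoggerConfig("{ not json", "./logger.json");
} catch (err) {
  if (err instanceof LoggerConfigError) {
    console.log(err.name, err.configPath); // LoggerConfigError ./logger.json
    console.log(err.cause instanceof SyntaxError); // true
  }
}
```

&lt;p&gt;No dependency is needed: the subclass carries the domain-specific fields, and &lt;code&gt;cause&lt;/code&gt; preserves the underlying stack for logging and debugging.&lt;/p&gt;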

&lt;h2&gt;
  
  
  George's Spiral Development Process
&lt;/h2&gt;

&lt;p&gt;Traditional code review assumes human memory and consistent quality across iterations. Managing Lennie requires a completely different approach—what I call spiral development.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The two most powerful warriors are patience and time." — Leo Tolstoy&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I evolved a nine-phase process that sounds excessive but works remarkably well with Lennie's strengths and limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Make it work&lt;/strong&gt; — Let Lennie get basic functionality in place&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it right&lt;/strong&gt; — Guide Lennie to decompose code, apply best practices, integrate third-party solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it robust&lt;/strong&gt; — Help Lennie handle edge cases and improve error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it secure&lt;/strong&gt; — Dedicated security review and hardening (Lennie needs lots of guidance here)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it performant&lt;/strong&gt; — Optimization pass focused on performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it observable&lt;/strong&gt; — Add monitoring, consistent logging, debugging capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it tested&lt;/strong&gt; — Comprehensive test coverage (where Lennie actually excels with supervision)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it documented&lt;/strong&gt; — Both human-readable and AI-optimized documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it integrated&lt;/strong&gt; — Refactor existing code to leverage new capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each phase takes 20-60 minutes depending on complexity, but the focused approach prevents Lennie from getting overwhelmed and trying to accomplish everything simultaneously—which inevitably leads to crushed bunnies.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Iterative Review Advantage
&lt;/h3&gt;

&lt;p&gt;Lennie responds remarkably well to repeated review requests along different dimensions. I asked it to review my SSH script for security improvements more than a dozen times. Each iteration found new hardening opportunities I'd never considered: input sanitization techniques, defense-in-depth strategies, and error handling patterns that prevent information leakage.&lt;/p&gt;

&lt;p&gt;This iterative approach works because Lennie doesn't get frustrated or defensive about criticism the way humans might. Ask it to review for third-party package opportunities, then security issues, then performance optimizations, then architectural improvements. Each lens reveals different improvement opportunities that George can guide Lennie toward implementing.&lt;/p&gt;

&lt;p&gt;The SSH script eventually ballooned from a simple setup utility to a comprehensive, hardened installation process. Probably overkill, but it demonstrated that genuinely secure code is achievable when George provides patient, systematic guidance—just not on the first or second pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Context Across Memory Resets
&lt;/h3&gt;

&lt;p&gt;To maintain continuity across context window collapses, I developed a structured task management approach using markdown files with frontmatter metadata.&lt;/p&gt;

&lt;p&gt;Each task follows a template covering the objective, problem statement, success criteria, method guide, implementation requirements, technical decisions, and progress notes. The goal is to provide enough context to keep the agent on track without overwhelming it with excessive documentation.&lt;/p&gt;

&lt;p&gt;I use a three-digit, zero-padded naming convention (e.g., &lt;code&gt;003-implement-logger-utilities.md&lt;/code&gt;) that allows the agent to automatically pick up the next task numerically. This supports adding and reprioritizing tasks even while work is in progress—essential for planned experiments with parallel agent coordination.&lt;/p&gt;
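&lt;p&gt;Zero-padding matters because plain string sorting then matches numeric order: &lt;code&gt;"010"&lt;/code&gt; sorts after &lt;code&gt;"009"&lt;/code&gt;, whereas unpadded &lt;code&gt;"10"&lt;/code&gt; would sort before &lt;code&gt;"9"&lt;/code&gt;. Here is a minimal sketch of "pick the next task" logic, assuming a flat list of file names and a separately tracked list of completed tasks (in practice that state might live in each file's frontmatter). &lt;code&gt;nextTask&lt;/code&gt; and &lt;code&gt;TASK_PATTERN&lt;/code&gt; are hypothetical names, not the author's actual tooling.&lt;/p&gt;

```typescript
// Hypothetical helper illustrating the NNN-slug.md convention; not the author's actual tooling.
// Matches names like "003-implement-logger-utilities.md" (three digits, a dash, a slug).
const TASK_PATTERN = /^\d{3}-[a-z0-9-]+\.md$/;

// Return the lowest-numbered task file that is not yet completed, or undefined if none remain.
function nextTask(fileNames: string[], completed: string[]): string | undefined {
  const pending = fileNames
    .filter((name) => TASK_PATTERN.test(name)) // ignore README.md and other non-task files
    .filter((name) => !completed.includes(name))
    .sort(); // zero-padded prefixes make lexicographic order equal numeric order
  return pending.length === 0 ? undefined : pending[0];
}

const files = [
  "010-write-usage-docs.md",
  "002-configure-tooling.md",
  "003-implement-logger-utilities.md",
  "README.md",
];

console.log(nextTask(files, ["002-configure-tooling.md"])); // 003-implement-logger-utilities.md
console.log(nextTask(files, files)); // undefined
```

&lt;p&gt;With this scheme, reprioritizing is just a rename: giving a file a lower number moves it ahead in the queue, with no separate index to keep in sync.&lt;/p&gt;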

&lt;p&gt;The format evolves rapidly based on results. If a task doesn't proceed well, I can modify the template before the next session. This flexibility allows continuous improvement in agent guidance without requiring complex tooling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Reliability and Workflow Adaptation
&lt;/h3&gt;

&lt;p&gt;Claude Code crashes in VS Code roughly once a day: annoying but manageable. I suspect the cause is VS Code's terminal struggling with long session histories rather than Claude itself; sessions with 10,000+ lines of history seem to trigger the instability.&lt;/p&gt;

&lt;p&gt;The crashes rarely destroy significant work since I maintain frequent commits and detailed task files. Opening a fresh console session usually resolves the issue without losing context. I did encounter one spectacular failure where aggressive file watching during testing (adding 500k-1M watchers) overwhelmed Docker's volume interface, but that was clearly my architectural mistake rather than a tool limitation.&lt;/p&gt;

&lt;p&gt;These reliability issues reinforce the importance of systematic context management. The task file system becomes even more critical when tools fail unexpectedly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Succession Challenge and Learning Pathways
&lt;/h2&gt;

&lt;p&gt;My logger experience reinforced concerns about entry-level developer training. Lennie consistently made mistakes that any experienced developer would catch immediately but that could easily slip past someone still learning the fundamentals.&lt;/p&gt;

&lt;p&gt;We're creating a world where senior professionals can move faster than ever, while eliminating traditional pathways for developing the expertise required to supervise Lennie effectively. This isn't sustainable long-term.&lt;/p&gt;

&lt;h2&gt;
  
  
  For Designers and Product Managers: My Advice on Getting Started
&lt;/h2&gt;

&lt;p&gt;The learning path varies dramatically based on your existing technical background. Those with engineering experience will naturally fare better, but in my experience the barriers aren't insurmountable for others willing to invest the effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid the Code-Hidden Temptation:&lt;/strong&gt; I get why many gravitate toward code-free solutions like Replit for quick prototyping. These tools work well for weekend experiments and stakeholder conversations, but they come with serious caveats I've learned the hard way.&lt;/p&gt;

&lt;p&gt;I've watched sales teams sell features they saw in slide decks for possible future projects, features nowhere near production-ready. High-fidelity AI-generated prototypes can fool non-technical stakeholders into believing complex features are nearly complete. In software companies, most people understand this limitation. In traditional industries, you might find yourself explaining to an executive why the working demo they just used is still 3-6 months from customer deployment. Trust me, that's not a conversation you want to have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If You're Ready for Production Development:&lt;/strong&gt; For serious development efforts using tools like Claude Code, the initial setup presents the biggest hurdle. It took me most of a day to configure my development environment, despite substantial experience with containerization and DevOps.&lt;/p&gt;

&lt;p&gt;DevOps remains Lennie's weakest area. Lennie can write decent Dockerfiles but produces terrible Docker Compose files. The YAML looks correct, but environment variables and specific configurations are consistently wrong. Since compose arrangements are highly context-specific with limited training examples, current models simply lack sufficient data to handle this complexity reliably.&lt;/p&gt;

&lt;h3&gt;
  
  
  My Practical Getting Started Advice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep environments simple initially&lt;/strong&gt; — avoid complex DevOps configurations until you're comfortable with basic workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose constrained projects&lt;/strong&gt; — a blog built on 11ty using Tailwind and deployed to Cloudflare Pages provides clear boundaries and well-documented patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expect the deployment gap&lt;/strong&gt; — you might have something working locally in days, but deployment could take a week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Lennie as a learning tool&lt;/strong&gt; — ask questions like "Why did you use different error handling in method X versus method Y?" It often provides better explanations than most professors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local experiments cost nothing&lt;/strong&gt; — try building something to understand the process before worrying about production concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Next Generation Challenge
&lt;/h2&gt;

&lt;p&gt;I've been experimenting with letting my younger children build projects using Lennie, with the requirement that they explain how the code actually works when they finish. My wife and I are still debating the pedagogical merits of this approach, but it mirrors the long-running debate over advanced calculators in mathematics education.&lt;/p&gt;

&lt;p&gt;I want engineers who look up equations every time rather than memorizing formulas—forgetting a factor when building a bridge has catastrophic consequences. But those same engineers need deep pattern recognition to make intuitive leaps and develop architectural thinking.&lt;/p&gt;

&lt;p&gt;The challenge becomes teaching both tool usage and fundamental understanding without creating professionals who only know which buttons to push. We can't ignore these tools, but we also can't skip teaching rudimentary skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  Economic and Workflow Implications
&lt;/h3&gt;

&lt;p&gt;The productivity gains from my logger experience translate to substantial economic impact. My 5-hour implementation versus 2-3 weeks of manual development represents roughly $1,000 versus $20,000 in time value. Even compared with offshore resources at $10,000-$25,000 per month, that works out to roughly 5x cost savings and 15x time improvements.&lt;/p&gt;
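&lt;p&gt;A rough back-of-the-envelope check of those ratios, using the figures in the paragraph above plus two illustrative assumptions that the original estimate does not state: 40-hour work weeks, and offshore cost prorated from the monthly rate (at 52/12 weeks per month) to the same 2-3 week window.&lt;/p&gt;

```typescript
// Back-of-the-envelope check. Assumptions for illustration only:
// 40-hour weeks, 52/12 weeks per month, offshore cost prorated from a monthly rate.
const HOURS_PER_WEEK = 40;
const WEEKS_PER_MONTH = 52 / 12;

const aiHours = 5; // AI-supervised implementation time
const aiCost = 1000; // stated time value of those 5 hours, in USD
const manualWeeks = [2, 3]; // stated manual estimate
const offshoreMonthly = [10000, 25000]; // stated offshore range, USD per month

// Time improvement: manual hours divided by AI-supervised hours.
const timeRatios = manualWeeks.map((weeks) => (weeks * HOURS_PER_WEEK) / aiHours);

// Cost ratio: offshore cost for the same window divided by the AI-supervised cost.
const costRatios: number[] = [];
for (const weeks of manualWeeks) {
  for (const monthly of offshoreMonthly) {
    costRatios.push((monthly * (weeks / WEEKS_PER_MONTH)) / aiCost);
  }
}

console.log(timeRatios); // [ 16, 24 ]
console.log(Math.min(...costRatios).toFixed(1), Math.max(...costRatios).toFixed(1)); // 4.6 17.3
```

&lt;p&gt;Under these assumptions the time improvement comes out at 16-24x and the offshore cost ratio at roughly 4.6-17x, so the 15x and 5x figures read as conservative round numbers taken from the low end of each range.&lt;/p&gt;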

&lt;p&gt;But the real change isn't just speed—it's the cognitive load shift. When supervising Lennie rather than coding directly, I spend focused attention on planning and reviewing, with implementation feeling more like being a passenger who occasionally gives directions or shouts "stop" when Lennie veers toward a playground. During highway stretches—testing and documentation phases—I can listen to podcasts while Lennie handles the mechanical work.&lt;/p&gt;

&lt;p&gt;This change in mental effort distribution appeals to me more than traditional coding. I've always preferred architectural thinking over syntax research, so having Lennie handle mundane implementation details reduces frustration rather than creating it. The supervision challenge becomes about strategic guidance rather than tactical execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team Structure and Role Evolution
&lt;/h3&gt;

&lt;p&gt;The implications for team composition remain unclear from my limited experiment, but early patterns are emerging. For professionals who can operate across traditional role boundaries—the supposed unicorns—this creates unique productivity opportunities.&lt;/p&gt;

&lt;p&gt;I expect low-fidelity mockups will become less common, continuing a 15-year trend from Balsamiq wireframes to high-fidelity Figma designs. Some designers will resist this shift because visual decoration can obscure core UX problems. But it's entirely feasible to build design systems that allow toggling between working prototypes and wireframe views—something I'm actively exploring with Terroir DS.&lt;/p&gt;

&lt;p&gt;Smaller teams will accomplish more, but whether organizations invest in additional projects or reduce headcount depends on leadership philosophy. Many will choose layoffs because they're easier to justify to shareholders. I suspect this is usually the wrong path, but it's predictable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Convergence Is Real, But George Is Everything
&lt;/h2&gt;

&lt;p&gt;The George-and-Lennie dynamic isn't a temporary limitation—it's the fundamental nature of working with AI coding agents. Even as models improve, someone must maintain strategic context, make architectural decisions, and ensure quality standards. Lennie can crush technical implementation with incredible power, but George's judgment determines whether it crushes the right problems or destroys the work entirely.&lt;/p&gt;

&lt;p&gt;For product managers and designers willing to develop these supervisory skills, the convergence opportunity is transformative and immediate. The technical knowledge barrier is real but surmountable, especially for professionals with existing engineering exposure. My spiral development process, task file systems, and iterative review approaches provide concrete frameworks for learning to manage Lennie effectively.&lt;/p&gt;

&lt;p&gt;The economic case alone justifies the investment: 15x time improvements and 5x cost reductions fundamentally change what's possible for small teams. But the real opportunity lies in role fluidity—the ability to move seamlessly between strategic product thinking and tactical implementation without losing creative momentum.&lt;/p&gt;

&lt;p&gt;The trust required here isn't blind faith in Lennie's capabilities, but confidence in your own supervision skills. Trust that you can recognize when Lennie veers off course, redirect effectively, and maintain architectural coherence across memory resets.&lt;/p&gt;

&lt;p&gt;This isn't about replacing engineers or eliminating the need for technical expertise. It's about enabling experienced professionals to operate across traditional domain boundaries when speed and resource constraints demand it. The succession challenge remains real—we need new apprenticeship models that teach both fundamental technical judgment and Lennie supervision skills.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The secret to getting ahead is getting started." — Mark Twain&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But for those ready to embrace the George role, the convergence opportunity is already here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead: Building at Scale
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://github.com/terroir-ds/core" rel="noopener noreferrer"&gt;Terroir Design System&lt;/a&gt; experiment has made material progress from initial hypothesis in less than a week. The logger saga represents just one component in a broader architectural exploration that pushes AI supervision to enterprise-scale limits.&lt;/p&gt;

&lt;p&gt;Rather than following traditional MVP approaches, I'm using Terroir as a vehicle to experiment with different AI supervision techniques—meandering somewhat, but deliberately so. This allows me to explore edge cases and architectural possibilities that might not emerge from more constrained development approaches.&lt;/p&gt;

&lt;p&gt;In my next post, I'll dive into the technical ambitions driving Terroir—type-safe design tokens, automated documentation generation, comprehensive testing strategies, and the architectural patterns that make design systems truly scalable. This discussion will target engineers and design system specialists interested in the technical possibilities AI supervision enables.&lt;/p&gt;

&lt;p&gt;The question isn't whether convergence professionals can build simple applications—we've proven that. The question is whether we can produce enterprise-grade solutions that compete with dedicated development teams. Terroir is my attempt to find out.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is the second in a series exploring how AI coding agents are reshaping product development. Next: the technical deep dive into building production-grade design systems with AI supervision.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For product managers and designers&lt;/strong&gt;: What convergence opportunities are you seeing in your current role? Have you experimented with AI coding tools, and what supervision challenges did you encounter? I'm particularly interested in hearing from professionals who've started bridging the traditional role boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For engineering leaders&lt;/strong&gt;: How are you adapting team structures and skill development to accommodate AI-enhanced productivity? What new apprenticeship models are you considering for the next generation?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>design</category>
      <category>coding</category>
    </item>
    <item>
      <title>"It's not the notes you play, it's the notes you don't play." — Miles Davis. Perfect metaphor for working with AI coding agents. They want to solve everything by adding more code, but good programming often requires restraint.</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Tue, 24 Jun 2025 20:07:53 +0000</pubDate>
      <link>https://dev.to/joshjhall/its-not-the-notes-you-play-its-the-notes-you-dont-play-miles-davis-perfect-metaphor-for-27d6</link>
      <guid>https://dev.to/joshjhall/its-not-the-notes-you-play-its-the-notes-you-dont-play-miles-davis-perfect-metaphor-for-27d6</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/joshjhall" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3291477%2F00f3c632-e4a9-41ce-a8a9-0aa3e910fd71.jpeg" alt="joshjhall"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/joshjhall/the-convergence-opportunity-how-ai-coding-agents-are-reshaping-product-roles-278c" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;The Convergence Opportunity: How AI Coding Agents Are Reshaping Product Roles&lt;/h2&gt;
      &lt;h3&gt;Joshua Hall ・ Jun 24&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#design&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#productivity&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>design</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Convergence Opportunity: How AI Coding Agents Are Reshaping Product Roles</title>
      <dc:creator>Joshua Hall</dc:creator>
      <pubDate>Tue, 24 Jun 2025 17:08:24 +0000</pubDate>
      <link>https://dev.to/joshjhall/the-convergence-opportunity-how-ai-coding-agents-are-reshaping-product-roles-278c</link>
      <guid>https://dev.to/joshjhall/the-convergence-opportunity-how-ai-coding-agents-are-reshaping-product-roles-278c</guid>
      <description>&lt;p&gt;For years, I've described good product development as a delicate balance between three fundamental perspectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Business people&lt;/strong&gt; answer how we pay for it and how we make money from it—whether through revenue, cost reduction, or strategic advantage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineers&lt;/strong&gt; tackle how we build it and how we maintain it over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product people&lt;/strong&gt; focus on what needs it solves and who has those needs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The magic happens when all three perspectives are balanced. The most elegantly designed product remains worthless if you can't afford to build it or maintain it over time.&lt;/p&gt;

&lt;p&gt;I've held senior positions across all three domains throughout my career, but never believed one person could effectively perform all three roles simultaneously. The analytical mindset needed for debugging code conflicts with the empathetic thinking required for user research. The pragmatic constraints of engineering budgets clash with the ambitious vision needed for breakthrough products.&lt;/p&gt;

&lt;p&gt;This limitation held true until recently.&lt;/p&gt;

&lt;p&gt;Over the past week, I generated roughly 30,000 lines of code alongside another 20,000 lines of documentation using Claude Code. This felt completely different from traditional coding. Rather than wrestling with syntax and library documentation, I found myself pair programming with an AI agent, conducting real-time code reviews, and focusing on architectural decisions.&lt;/p&gt;

&lt;p&gt;For the first time in my career, I can envision a path where experienced professionals might effectively operate across all three domains. AI allows individuals to offload the mechanical work while focusing on strategic decisions. This enables them to produce diverse artifacts traditionally siloed across different subject matter experts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Of Machines and Men: The Reality of AI Coding Partners
&lt;/h2&gt;

&lt;p&gt;Working with agentic code generation feels remarkably like managing Lennie from Steinbeck's Of Mice and Men. The AI possesses the raw power of two or three experienced developers, but left unsupervised, it will crush the bunny every time.&lt;/p&gt;

&lt;p&gt;When debugging a failing test, the agent would attempt a fix once or twice, then decide the test simply needed to be disabled or "simplified" beyond recognition. A gentle redirect—"No, we need to fix or rewrite the test appropriately rather than disabling it"—would usually produce the right solution. The AI wants to solve every problem by adding more content, but programming often requires subtraction and restraint.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It's not the notes you play, it's the notes you don't play." — Miles Davis&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When building baseline architecture, it reinvents wheels with enthusiastic inefficiency. At one point, it correctly identified the need for logging in a function, then proceeded to write a custom Logger class from scratch. When I questioned why we weren't using a battle-tested logging package, it immediately agreed and refactored to implement a proper third-party solution. But I had to recognize the misstep and suggest the alternative.&lt;/p&gt;

&lt;p&gt;This dynamic mirrors working with a talented but inexperienced engineer—one who lacks pattern recognition and doesn't know what tools already exist. To get production-quality results, you need serious engineering knowledge to guide the process effectively. George must understand the complex jobs on the farm to get the most out of Lennie.&lt;/p&gt;

&lt;p&gt;The key is constant supervision with small, specific tasks. Broader, more ambiguous requests often lead to cycles of refactoring and questions like "Why aren't we using our common logging pattern here?" The available context window is fixed, so working with an AI is like working with someone who can't form new long-term memories—you're constantly reminding it of established patterns and decisions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Tell me about the rabbits, George." — Lennie&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Convergence Opportunity
&lt;/h2&gt;

&lt;p&gt;For product managers and designers who possess these engineering fundamentals, the possibilities are transformative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design systems scale efficiently&lt;/strong&gt; — I can generate design system and UI code rivaling the output of most front-end teams, freeing engineering resources for higher-value problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping becomes table stakes&lt;/strong&gt; — I can rapidly create high-fidelity, functional prototypes instead of static mockups, replacing debates and hypotheticals with testable interactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data exploration democratizes&lt;/strong&gt; — I can analyze datasets far more effectively than traditional BI tools allow, without needing to master R or Python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Realistically, though, this opportunity has a crucial limitation: it requires substantial technical knowledge that most product managers and designers currently lack. The combination of business acumen, product intuition, and technical depth needed to effectively supervise AI code generation remains relatively uncommon, even among talented people I've hired and would hire again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Succession Challenge
&lt;/h2&gt;

&lt;p&gt;The rise of AI-powered development creates a succession challenge: we're raising the bar for entry-level professionals while making their traditional learning paths obsolete. Companies need to invest in new apprenticeship models that develop systems thinking alongside technical skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rediscovering Joy in Building
&lt;/h2&gt;

&lt;p&gt;On a personal note, I haven't genuinely enjoyed writing code for 10-15 years. I avoided coding because constant syntax lookup killed my creative flow. Knowing what I wanted to accomplish but spending hours researching specific classes and methods was soul-crushing.&lt;/p&gt;

&lt;p&gt;AI coding agents have changed this completely. I can focus on architectural and design decisions—the parts I actually enjoy—while the AI handles syntactic details. Building software has become fun and creative again instead of frustrating and slow.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When you ask creative people how they did something, they feel a little guilty because they didn't really do it, they just saw something." — Steve Jobs&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The ability to focus on creative problem-solving rather than mechanical implementation suggests we're entering an era where role boundaries become more fluid, at least for those with the right foundation. But it also raises questions about code review processes, quality assurance, and team structures that we're only beginning to explore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical. The convergence is happening now, but only for professionals who combine business acumen, product intuition, and technical depth, a combination that remains relatively rare. Companies that recognize this and invest in developing these hybrid capabilities, while also creating pathways for the next generation, will have significant competitive advantages.&lt;/p&gt;

&lt;p&gt;In my upcoming posts, I'll dive deeper into the practical realities of working with AI coding agents, explore specific opportunities for product and design roles, and examine potential solutions to the training crisis we're creating. I'm also beginning an experiment to build an enterprise-class design system using these methodologies, which should provide concrete examples of what's possible.&lt;/p&gt;

&lt;p&gt;The question isn't whether AI will change product development—it already has. The question is whether we'll adapt our roles, our teams, and our training to harness this change effectively.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is the first in a series exploring how AI coding agents are reshaping product development. Follow along as I document building a complete design system using these tools, starting soon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>design</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
