<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raju Dandigam</title>
    <description>The latest articles on DEV Community by Raju Dandigam (@raju_dandigam).</description>
    <link>https://dev.to/raju_dandigam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1726463%2F38d1e46f-d122-4fa3-b130-772169c24466.png</url>
      <title>DEV Community: Raju Dandigam</title>
      <link>https://dev.to/raju_dandigam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raju_dandigam"/>
    <language>en</language>
    <item>
      <title>The Dependency Security Workflow Your Node.js Project Is Missing</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Fri, 12 Jun 2026 14:23:35 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/the-dependency-security-workflow-your-nodejs-project-is-missing-2b32</link>
      <guid>https://dev.to/raju_dandigam/the-dependency-security-workflow-your-nodejs-project-is-missing-2b32</guid>
      <description>&lt;p&gt;&lt;em&gt;Why local, lockfile-aware scanning gives JavaScript teams a more practical path from discovery to remediation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A workflow problem, not a scanner shortage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Node.js teams have no shortage of vulnerability scanners. What they still lack, in many cases, is a dependency security workflow that helps developers act on findings while they are still in the middle of making release decisions.&lt;/p&gt;

&lt;p&gt;That distinction matters more than it may seem. Most organizations can already generate reports, sort findings by severity, and prove that some form of scanning exists in the pipeline. The harder problem begins after the report is produced. A developer still has to understand whether the issue lives in a direct dependency or a transitive one, whether a fixed version is available, whether the vulnerable package appears along more than one path, and whether remediation can happen inside the project today or will depend on upstream maintainers. Those are the questions that shape release decisions, yet they are often the least visible part of the workflow.&lt;/p&gt;

&lt;p&gt;This is why dependency security in JavaScript often feels more mature on paper than it does in practice. From a distance, the workflow sounds complete: packages are installed, CI runs, a scanner executes somewhere in the pipeline, and a list of known issues appears. In the real world, that sequence can still leave teams with findings that are technically correct but operationally weak. The data exists, but the context needed for action is missing.&lt;/p&gt;

&lt;p&gt;For development teams, the real gap is no longer basic detection. It is the ability to turn scan output into a credible next step without waiting for several rounds of trial, error, and pipeline feedback. In other words, the missing layer is not another dashboard. It is a workflow that makes dependency risk understandable at the point where engineers actually change packages, regenerate lockfiles, and prepare releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a local-first approach changes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the lens through which I have been exploring &lt;a href="https://github.com/sonukapoor/cve-lite-cli" rel="noopener noreferrer"&gt;CVE Lite CLI&lt;/a&gt;. The tool is intentionally narrower than many security platforms. It does not attempt to be an everything scanner for infrastructure, containers, secrets, runtime behavior, or exploitability analysis. Instead, it focuses on one job that many JavaScript teams still need done well: scanning real project dependency state locally from the lockfile, mapping known OSV-backed findings, separating direct from transitive issues, and showing dependency paths clearly enough that a developer can decide what to do next.&lt;/p&gt;

&lt;p&gt;The value of that approach is not simply that it produces results on a laptop instead of in CI. The deeper value is that it shortens the distance between discovery and remediation. Once a scan runs locally against the actual lockfile, the developer can inspect the dependency path, make an update, regenerate the dependency state, and immediately verify whether the issue has been removed or merely changed shape somewhere else in the graph. That is a much more practical engineering loop than waiting for a later system to report the same thing after the branch has already moved forward.&lt;/p&gt;

&lt;p&gt;The structure of Node.js dependency graphs is exactly why that loop matters. Many projects carry a modest list of top-level dependencies but resolve into hundreds or thousands of packages once the lockfile is materialized. In that environment, a flat vulnerability count tells only a small part of the story. Two findings with the same severity may imply very different work. One may be a direct dependency upgrade that the project team can make immediately. Another may be introduced several layers down through an indirect chain, where the realistic option is to monitor upstream movement rather than force a change locally.&lt;/p&gt;

&lt;p&gt;Without that distinction, teams often get stuck in a familiar pattern: urgency created by the count, uncertainty created by the graph, and delay created by the lack of ownership clarity. A stronger workflow does not remove all complexity, but it does make the complexity legible. It tells developers what is directly actionable now, what requires a deeper look, and what is effectively blocked outside the project boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the repository scans revealed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To see how that plays out in practice, I ran the tool against several public open source projects, including NestJS, release-it, and pnpm. The point was not to single out those repositories; well-maintained projects can still surface dependency findings, and results can change over time as upstream packages move. The more interesting question was whether a local-first, lockfile-aware workflow could provide developers with something more useful than a generic count.&lt;/p&gt;

&lt;p&gt;The NestJS run was the clearest example. CVE Lite CLI parsed 1,626 packages from package-lock.json and found 25 packages with known OSV matches: one high-severity issue, four medium, and twenty low. On its own, that number can sound alarming. Yet the more useful insight came from structure rather than volume. Twelve findings appeared directly fixable in the project. Thirteen were transitive. That split immediately changes the conversation from "How many issues exist?" to "Which of these can we realistically address ourselves right now?"&lt;/p&gt;

&lt;p&gt;The richer case study that followed made the workflow difference even more obvious. In one dependency path, remediation was not a single upgrade but an iterative sequence. Updating the dependency graph once exposed the next step required to move further toward a clean state. This is where local scanning becomes more than a convenience. If a developer has to rely only on CI-based feedback, that same sequence can turn into repeated pushes, repeated pipeline waits, and repeated rediscovery of the next dependency adjustment. When the scan runs locally, the same work becomes a more natural scan-fix-rescan loop carried out in a single session.&lt;/p&gt;

&lt;p&gt;The NestJS scan also showed how the same package can appear under different remediation conditions. For example, the &lt;code&gt;diff&lt;/code&gt; package surfaced as a high-severity transitive issue through one parent chain, while other instances of it appeared under different dependency relationships. That is a realistic picture of Node.js maintenance. Developers are not just responding to package names; they are responding to how those packages enter the graph and whether the project has meaningful control over the path that introduced them.&lt;/p&gt;

&lt;p&gt;The release-it scan showed a similar pattern on a smaller dependency graph. It parsed 545 packages and found 10 packages with known OSV matches: four medium-severity issues and six low-severity ones. Six appeared directly fixable, while four were transitive. In a smaller project, that level of clarity is arguably even more useful because it helps a maintainer move quickly instead of losing time interpreting a report that treats every issue as if it belonged to the same class of work. Some findings pointed toward straightforward upgrades. Others required understanding how packages such as &lt;code&gt;minimatch&lt;/code&gt; arrived through indirect chains. Again, the practical value came from distinguishing the shape of the work, not just the existence of the finding.&lt;/p&gt;

&lt;p&gt;The pnpm run mattered for a different reason. It parsed 563 packages from pnpm-lock.yaml and returned no known OSV matches. That kind of result is easy to dismiss because it lacks drama, but it is just as important in a real workflow. A useful local security tool should not exist only to raise alarms. It should also help developers establish confidence quickly when the dependency state appears clean. In that sense, a no-findings result can be a productivity feature. It allows a team to validate release readiness without dragging every change through a heavier review cycle simply because the tooling is distant from the developer workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters beyond one tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is why I do not think the future of dependency security will be defined by which tool can generate the longest report. The stronger direction is toward interpretation and decision support. Development teams need security tooling that recognizes the difference between direct and transitive findings, that makes dependency paths visible, that surfaces fixed-version guidance when available, and that allows engineers to verify changes immediately against the real lockfile state of the project.&lt;/p&gt;

&lt;p&gt;That is also why local-first tooling deserves more attention than it usually gets. Running a lockfile-aware scan before release should feel less like a special security event and more like checking tests, lint output, or build health. It belongs inside the engineering loop because dependency decisions are already being made there. The further security context is pushed away from that moment, the more likely it becomes that developers receive findings without usable remediation context.&lt;/p&gt;

&lt;p&gt;For the JavaScript ecosystem, the broader lesson is straightforward. Teams do not simply need more vulnerability visibility. They need better workflow infrastructure for acting on what they already know. A scanner that only reports issues after code has moved downstream will always feel one step removed from the work that matters. A scanner that helps a developer inspect dependency structure, understand fixability, and verify changes locally is much closer to becoming part of everyday engineering practice.&lt;/p&gt;

&lt;p&gt;That is the workflow Node.js projects are still missing. And as dependency graphs continue to deepen and release cycles continue to accelerate, the tools that gain traction will be the ones that help developers move from finding to fix with the least friction, the clearest structure, and the most honest view of what is actually under their control.&lt;/p&gt;

</description>
      <category>node</category>
      <category>security</category>
      <category>typescript</category>
      <category>cli</category>
    </item>
    <item>
      <title>Multi-Architecture Docker Builds for Node.js: From Apple Silicon to AWS Graviton</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Wed, 27 May 2026 23:33:03 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/multi-architecture-docker-builds-for-nodejs-from-apple-silicon-to-aws-graviton-34dn</link>
      <guid>https://dev.to/raju_dandigam/multi-architecture-docker-builds-for-nodejs-from-apple-silicon-to-aws-graviton-34dn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A few years ago, many teams could ignore CPU architecture when building Docker images. Most development machines were x86, most CI runners were x86, and most production servers were x86. If the image worked in CI, it probably worked in production.&lt;/p&gt;

&lt;p&gt;That world has changed.&lt;/p&gt;

&lt;p&gt;Many developers now use Apple Silicon laptops, which run on ARM64. AWS Graviton instances use ARM-based processors and are widely used for cost and performance optimization. Edge devices and small compute environments often use ARM as well. At the same time, many CI pipelines still run on AMD64 Linux runners.&lt;/p&gt;

&lt;p&gt;This creates a practical problem for Node.js teams. An image built for one architecture may not run efficiently on another. It may run through emulation, which can be slower and less predictable, or it may fail outright if it contains native binaries for the wrong architecture.&lt;/p&gt;

&lt;p&gt;Docker Buildx solves this by allowing teams to build multi-platform images from a single Dockerfile. Docker's documentation describes a multi-platform build as a single build invocation that targets multiple operating system or CPU architecture combinations, such as linux/amd64 and linux/arm64.&lt;/p&gt;

&lt;p&gt;For TypeScript and Node.js applications, this is especially useful when the same app needs to work across Apple Silicon development machines, Linux CI, and ARM64 production environments such as AWS Graviton.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real-World Problem
&lt;/h2&gt;

&lt;p&gt;Imagine a team building a TypeScript API. Developers use M-series MacBooks. GitHub Actions builds the image on Ubuntu runners. Production runs on AWS ECS, and the team wants to move some workloads to Graviton for better price performance.&lt;/p&gt;

&lt;p&gt;At first, the Dockerfile looks fine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may work until the team introduces a native dependency such as sharp, canvas, sqlite3, bcrypt, or a browser automation dependency. These packages may use native binaries or system libraries. If the wrong architecture is built, cached, or pulled, the application may fail in confusing ways.&lt;/p&gt;

&lt;p&gt;The issue is not TypeScript itself. TypeScript compiles to JavaScript, which is mostly architecture-independent. The issue is the runtime environment around it: Node.js binaries, native npm modules, base image packages, browser dependencies, and platform-specific libraries.&lt;/p&gt;
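
&lt;p&gt;To see which side of that divide a given machine falls on, the kernel's machine name can be mapped to a Docker platform string. This is a minimal sketch, not part of any Docker tooling: &lt;code&gt;docker_platform&lt;/code&gt; is a hypothetical helper name, and it assumes only the two architectures discussed in this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Map a machine name (as printed by uname -m) to a Docker platform string.
docker_platform() {
  case "$1" in
    x86_64 | amd64) echo "linux/amd64" ;;
    aarch64 | arm64) echo "linux/arm64" ;;
    *) echo "unsupported" ;;
  esac
}

# Report the platform that matches the current machine.
docker_platform "$(uname -m)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Apple Silicon Macs typically report &lt;code&gt;arm64&lt;/code&gt; and Linux ARM hosts report &lt;code&gt;aarch64&lt;/code&gt;; both correspond to linux/arm64 images.&lt;/p&gt;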

&lt;h2&gt;
  
  
  How Multi-Architecture Images Work
&lt;/h2&gt;

&lt;p&gt;A multi-architecture image is usually published as a manifest list, also called an image index. The tag points to multiple platform-specific images. Docker chooses the correct one when the image is pulled.&lt;/p&gt;

&lt;p&gt;Docker's manifest CLI documentation explains that a manifest list contains one or more image names and can be used like an image name in docker pull or docker run.&lt;/p&gt;

&lt;p&gt;Here is the idea.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5w0xr5un9zo1at4ayvls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5w0xr5un9zo1at4ayvls.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The developer does not need separate image names such as my-app-amd64 and my-app-arm64. They pull the same tag, and Docker selects the matching image for the machine.&lt;/p&gt;
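
&lt;p&gt;To check which platforms a tag actually carries, the manifest list can be inspected with Buildx. The sketch below assumes the image has already been pushed to a registry you can reach; &lt;code&gt;list_platforms&lt;/code&gt; is a hypothetical wrapper name, and the tag in the usage comment is the placeholder image name used later in this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Show the manifest list behind a tag, with one entry per platform.
list_platforms() {
  docker buildx imagetools inspect "$1"
}

# Usage (requires Docker with Buildx and registry access):
#   list_platforms yourname/node-agent:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
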
&lt;h2&gt;
  
  
  Setting Up Docker Buildx
&lt;/h2&gt;

&lt;p&gt;Buildx is Docker's extended build tool powered by BuildKit. It is included with Docker Desktop and modern Docker installations.&lt;/p&gt;

&lt;p&gt;Check that it is available.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create and use a builder that supports multi-platform builds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx create &lt;span class="nt"&gt;--name&lt;/span&gt; multiarch-builder &lt;span class="nt"&gt;--driver&lt;/span&gt; docker-container &lt;span class="nt"&gt;--bootstrap&lt;/span&gt;
docker buildx use multiarch-builder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inspect the builder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx inspect &lt;span class="nt"&gt;--bootstrap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see supported platforms such as linux/amd64 and linux/arm64.&lt;/p&gt;
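
&lt;p&gt;That check can also be scripted so a setup script fails early when a platform is missing. This is a rough sketch that simply searches the inspect output for each platform string; &lt;code&gt;builder_supports&lt;/code&gt; is a hypothetical helper name, and the exact layout of the inspect output is not assumed beyond the platform names appearing in it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Succeed only if the current builder lists every platform passed as an argument.
builder_supports() {
  platforms="$(docker buildx inspect --bootstrap)" || return 1
  for wanted in "$@"; do
    echo "$platforms" | grep -qF "$wanted" || return 1
  done
}

# Usage:
#   builder_supports linux/amd64 linux/arm64 || echo "builder is missing a platform"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;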

&lt;h2&gt;
  
  
  A Multi-Architecture Dockerfile for TypeScript
&lt;/h2&gt;

&lt;p&gt;For a straightforward TypeScript API, the Dockerfile does not need to be complicated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; node&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well across architectures when your dependencies support the target platforms. The official Node images support common platforms including AMD64 and ARM64, so the base image is usually not the hard part.&lt;/p&gt;

&lt;p&gt;Build and push a multi-platform image like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; yourname/node-agent:latest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--push&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--platform&lt;/code&gt; flag tells Docker which architectures to build. The &lt;code&gt;--push&lt;/code&gt; flag pushes the multi-platform image to a registry. Docker's multi-platform GitHub Actions documentation notes that the default Docker setup for GitHub Actions runners supports building and pushing multi-platform images to registries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Native npm Modules?
&lt;/h2&gt;

&lt;p&gt;Native modules are where multi-architecture builds become more interesting.&lt;/p&gt;

&lt;p&gt;Packages such as sharp, canvas, bcrypt, and sqlite3 may depend on native code or system libraries. During a multi-platform build, each platform should install dependencies for that platform. That is what you want, but it means your Dockerfile must provide any required build or runtime packages.&lt;/p&gt;

&lt;p&gt;For example, if your app processes images with sharp, you may need extra system packages, depending on your base image and the package version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; python3 make g++ &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="c"&gt;# Compile TypeScript, then drop devDependencies so node_modules can be reused as-is&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm prune &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="c"&gt;# Reuse the modules compiled in the build stage; this stage has no compiler toolchain&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/node_modules ./node_modules&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; node&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not something every project needs. Add build tools only when your dependencies require them. The important lesson is to test both platforms, not assume that a successful AMD64 build proves ARM64 is safe.&lt;/p&gt;
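
&lt;p&gt;One way to make that testing concrete is to build each platform on its own, so a failing native install points at a specific architecture. The following is a sketch under a few assumptions: &lt;code&gt;build_each_platform&lt;/code&gt; is a hypothetical helper name, the Dockerfile is in the current directory, and the build is used only as a compile check (with the docker-container driver, a build without an output flag stays in the build cache).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Build one platform at a time and stop at the first failure.
build_each_platform() {
  image="$1"
  shift
  for platform in "$@"; do
    echo "Building $image for $platform"
    docker buildx build --platform "$platform" -t "$image" . || return 1
  done
}

# Usage:
#   build_each_platform yourname/node-agent:test linux/amd64 linux/arm64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;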

&lt;h2&gt;
  
  
  Playwright and Browser Dependencies
&lt;/h2&gt;

&lt;p&gt;Playwright adds another practical wrinkle. Browser dependencies can be large, platform-specific, and sensitive to the base image.&lt;/p&gt;

&lt;p&gt;If Playwright is used only for testing, keep it out of your production API image. Use the official Playwright image for tests and keep the app image small.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yourname/node-agent:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;

  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/playwright:v1.56.1-noble&lt;/span&gt;
    &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./:/app&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sh -c "npm ci &amp;amp;&amp;amp; npx playwright test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your service truly needs browser automation at runtime, consider a separate browser worker image. Do not force every API container to carry browser dependencies if only one workflow needs them.&lt;/p&gt;

&lt;p&gt;That architecture is usually cleaner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q86o37dgzg5kgaq9ygq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q86o37dgzg5kgaq9ygq.png" alt=" " width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Actions Build
&lt;/h2&gt;

&lt;p&gt;A CI workflow can build and push both AMD64 and ARM64 images.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Multi-Arch Image&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-qemu-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKER_USERNAME }}&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKER_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v6&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
          &lt;span class="na"&gt;platforms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64,linux/arm64&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yourname/node-agent:latest&lt;/span&gt;
          &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha&lt;/span&gt;
          &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Docker setup-buildx action creates and boots a builder for use with Buildx and Docker's build-push action, and its documentation notes that the default docker-container driver supports multi-platform images and cache export through a BuildKit container.&lt;/p&gt;

&lt;p&gt;The Docker build-push action supports Buildx features including multi-platform builds, secrets, and remote cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AWS Graviton Makes This Worth It
&lt;/h2&gt;

&lt;p&gt;Multi-architecture builds are not only about developer convenience. They can also unlock cloud cost and performance options.&lt;/p&gt;

&lt;p&gt;AWS says Graviton-based instances can deliver up to 40% better price performance compared with comparable current-generation x86-based instances, depending on workload and instance family.&lt;/p&gt;

&lt;p&gt;That does not mean every Node.js service automatically saves 40%. You still need to benchmark. But multi-architecture images make it possible to test the same application on x86 and ARM without maintaining two separate image pipelines.&lt;/p&gt;
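
&lt;p&gt;When you do benchmark, it helps to compare cost per unit of work, not raw throughput. The sketch below computes cost per million requests from an hourly instance price and a sustained requests-per-second figure. &lt;code&gt;cost_per_million&lt;/code&gt; is a hypothetical helper name, and the numbers in the usage comment are made-up inputs, not measurements or AWS prices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# cost per million requests = hourly price / (requests per second * 3600) * 1000000
cost_per_million() {
  awk -v price="$1" -v rps="$2" 'BEGIN { printf "%.4f\n", price / (rps * 3600) * 1000000 }'
}

# Usage with hypothetical inputs (0.10 per hour at 1000 requests per second):
#   cost_per_million 0.10 1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running the same calculation for an x86 and an ARM64 deployment of the same image gives a like-for-like comparison for your own workload.&lt;/p&gt;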

&lt;p&gt;For teams running many Node.js APIs, workers, or agent services, that flexibility can matter. Even a modest percentage improvement becomes meaningful at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Both Architectures
&lt;/h2&gt;

&lt;p&gt;After building the image, test both platforms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 yourname/node-agent:latest
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/arm64 yourname/node-agent:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a host that does not match the requested platform, Docker may use emulation. That is useful for basic validation, but it is not a substitute for testing on real ARM64 infrastructure if performance matters.&lt;/p&gt;

&lt;p&gt;For production readiness, run at least one deployment test on the actual target architecture. For AWS, that may mean ECS tasks, EKS nodes, or EC2 instances backed by Graviton.&lt;/p&gt;
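&lt;p&gt;As a quick sanity check, a small script can report which architecture the container is actually running. The sketch below is illustrative and not part of the original setup: &lt;code&gt;describeRuntime&lt;/code&gt; is a hypothetical helper, while &lt;code&gt;process.arch&lt;/code&gt; and &lt;code&gt;process.platform&lt;/code&gt; are standard Node.js properties.&lt;/p&gt;

```typescript
// Hypothetical helper: turns Node's arch/platform strings into a Docker-style label.
export function describeRuntime(arch: string, platform: string): string {
  // Node reports "x64" for AMD64 and "arm64" for ARM64.
  const dockerArch = arch === "x64" ? "amd64" : arch;
  return `${platform}/${dockerArch}`;
}

// Inside a linux/arm64 container this is expected to log "linux/arm64".
console.log(describeRuntime(process.arch, process.platform));
```

&lt;p&gt;Under emulation the script still reports the requested architecture, because &lt;code&gt;process.arch&lt;/code&gt; reflects the Node.js binary that is running. It confirms which image variant was pulled, not that the host is native ARM64.&lt;/p&gt;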

&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;The first mistake is assuming JavaScript means architecture does not matter. Pure JavaScript is portable, but Node.js apps often include native packages, system libraries, and browser tooling.&lt;/p&gt;

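&lt;p&gt;The second mistake is using base images that do not support all target platforms. Official images such as &lt;code&gt;node&lt;/code&gt; generally support the common platforms, but third-party images may not. You can check what a base image publishes with &lt;code&gt;docker buildx imagetools inspect&lt;/code&gt; before committing to it.&lt;/p&gt;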

&lt;p&gt;The third mistake is building multi-architecture images without testing the ARM64 path. A manifest can exist while one platform image still has a runtime bug.&lt;/p&gt;

&lt;p&gt;The fourth mistake is treating multi-architecture builds as free. They add build time and CI complexity. Use caching, and only support platforms you actually need.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Multi-Architecture Builds Are Worth It
&lt;/h2&gt;

&lt;p&gt;Multi-architecture builds are worth it when your developers use Apple Silicon, your production platform includes ARM64, you want to evaluate AWS Graviton, or you deploy to edge devices.&lt;/p&gt;

&lt;p&gt;They are less urgent when your entire development, CI, and production stack is AMD64 and you have no plan to change. In that case, the added complexity may not be justified yet.&lt;/p&gt;

&lt;p&gt;For AI and agent workloads, this decision depends on the runtime. A simple Node.js orchestration service may run well on ARM64. A workload that depends heavily on specific native libraries, browser automation, or local model inference needs more careful testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Multi-architecture Docker builds are a practical way to make Node.js applications work across the hardware landscape developers actually use today.&lt;/p&gt;

&lt;p&gt;Apple Silicon changed local development. AWS Graviton changed cloud cost and performance discussions. Edge and ARM devices continue to grow in number. Docker Buildx connects these worlds by letting one image tag point to a manifest list, from which each machine pulls the image built for its own platform.&lt;/p&gt;

&lt;p&gt;For TypeScript apps, the basic setup is straightforward. Use a portable Dockerfile, build with &lt;code&gt;docker buildx build --platform linux/amd64,linux/arm64&lt;/code&gt;, push to a registry, and let Docker select the correct image at pull time.&lt;/p&gt;

&lt;p&gt;The hard parts are the real-world details: native npm modules, browser dependencies, image caching, and testing both architectures before trusting production. Get those right, and multi-architecture builds become more than a Docker feature. They become a path to better developer experience, more deployment flexibility, and potentially lower cloud costs.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>node</category>
      <category>typescript</category>
      <category>devops</category>
    </item>
    <item>
      <title>Stop Rebuilding Your AI App on Every Change: Docker Compose Watch for Node.js Developers</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Mon, 25 May 2026 17:38:51 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/stop-rebuilding-your-ai-app-on-every-change-docker-compose-watch-for-nodejs-developers-442</link>
      <guid>https://dev.to/raju_dandigam/stop-rebuilding-your-ai-app-on-every-change-docker-compose-watch-for-nodejs-developers-442</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Local AI application development often starts simple. You build a Node.js API, call a model provider, add a prompt, and test the response. Then the stack grows. You add Redis for short-term memory, Postgres for application state, a local model endpoint, maybe a worker service, and a frontend to inspect results.&lt;/p&gt;

&lt;p&gt;At that point, Docker Compose becomes useful because it can run the whole development environment consistently. The problem is the development loop. If every source code change requires stopping containers, rebuilding images, restarting services, and waiting for the app to come back, Docker starts to feel slower than working directly on the host machine.&lt;/p&gt;

&lt;p&gt;Docker Compose Watch helps solve that problem. It lets Compose watch local file changes and either sync files into running containers, rebuild services, or sync files and restart services depending on what changed. Docker's documentation describes Compose Watch as a way to automatically update and preview running Compose services as you edit and save code.&lt;/p&gt;

&lt;p&gt;For Node.js AI apps, this can make local development feel much smoother. You keep the benefits of a containerized stack, but you avoid the manual rebuild cycle for every small TypeScript change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Local AI Development Loop Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid6d4bjw9jk9zp0352p5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid6d4bjw9jk9zp0352p5.png" alt=" " width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A typical local AI stack may include several services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node.js API (TypeScript)
    ├── Redis (session/memory)
    ├── Postgres (state)
    ├── Local LLM endpoint (optional)
    └── Frontend debug UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without watch mode, a small change can turn into a slow loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit a TypeScript file&lt;/li&gt;
&lt;li&gt;Stop containers&lt;/li&gt;
&lt;li&gt;Rebuild the API image&lt;/li&gt;
&lt;li&gt;Restart containers&lt;/li&gt;
&lt;li&gt;Wait for services to initialize&lt;/li&gt;
&lt;li&gt;Test the change&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the frontend also changes, you repeat the same process there. If the change touches dependencies, you rebuild again.&lt;/p&gt;

&lt;p&gt;That delay matters because AI development is highly iterative. You may change the prompt, adjust a tool schema, update response parsing, improve logging, or add one guardrail. These are small changes, but you may make dozens of them in a single session.&lt;/p&gt;

&lt;p&gt;The goal is not to avoid rebuilds forever. Dependency and Dockerfile changes should still rebuild the image. The goal is to avoid rebuilding the entire service when only a source file changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Compose Watch Does
&lt;/h2&gt;

&lt;p&gt;Compose Watch is configured under the &lt;code&gt;develop.watch&lt;/code&gt; section of a service. The Compose Develop specification defines watch actions such as &lt;code&gt;sync&lt;/code&gt;, &lt;code&gt;rebuild&lt;/code&gt;, &lt;code&gt;sync+restart&lt;/code&gt;, and the newer &lt;code&gt;sync+exec&lt;/code&gt;. The actions most Node.js developers need are &lt;code&gt;sync&lt;/code&gt;, &lt;code&gt;rebuild&lt;/code&gt;, and &lt;code&gt;sync+restart&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;sync&lt;/strong&gt; action copies changed files from your host into the running container. This is useful when the process inside the container already has a watcher, such as &lt;code&gt;tsx watch&lt;/code&gt;, &lt;code&gt;nodemon&lt;/code&gt;, or Vite.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;rebuild&lt;/strong&gt; action rebuilds the service image. This is useful when &lt;code&gt;package.json&lt;/code&gt;, a lockfile, or a Dockerfile changes.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;sync+restart&lt;/strong&gt; action copies files and restarts the container. This is useful when the service does not have its own hot-reload process.&lt;/p&gt;

&lt;p&gt;You start the environment with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker also provides a standalone &lt;code&gt;docker compose watch&lt;/code&gt; command. It applies the same watch rules, and it is useful when you do not want application logs mixed with rebuild and file sync output.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Node.js AI API Example
&lt;/h2&gt;

&lt;p&gt;Assume we have a TypeScript API that exposes one endpoint for testing prompts. It talks to Redis for short-term memory and uses an environment variable for the model endpoint.&lt;/p&gt;

&lt;p&gt;A simple development Dockerfile, saved as &lt;code&gt;Dockerfile.dev&lt;/code&gt;, can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["npx", "tsx", "watch", "src/index.ts"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is intentionally for development. It includes dependencies needed to run TypeScript directly with a watcher. The production Dockerfile should usually be different and use a compiled &lt;code&gt;dist&lt;/code&gt; output.&lt;/p&gt;

&lt;p&gt;Now add Compose Watch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.dev&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;
      &lt;span class="na"&gt;REDIS_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis://redis:6379&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_BASE_URL:-http://host.docker.internal:12434/engines/llama.cpp/v1}&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_API_KEY:-local-development-key}&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
    &lt;span class="na"&gt;develop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./src&lt;/span&gt;
          &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app/src&lt;/span&gt;
          &lt;span class="na"&gt;ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.test.ts"&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.spec.ts"&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./package.json&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./package-lock.json&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./tsconfig.json&lt;/span&gt;

  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7-alpine&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379:6379"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now start the stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you edit a TypeScript file in &lt;code&gt;src&lt;/code&gt;, Compose syncs it into &lt;code&gt;/app/src&lt;/code&gt; inside the container. Then &lt;code&gt;tsx watch&lt;/code&gt; notices the change and reloads the process. When you change &lt;code&gt;package.json&lt;/code&gt; or the lockfile, Compose rebuilds the image because the dependency layer needs to change. A change to &lt;code&gt;tsconfig.json&lt;/code&gt; also triggers a rebuild, because that file is copied into the image at build time and is not covered by the sync rule.&lt;/p&gt;

&lt;p&gt;This gives you a better local loop without abandoning containers.&lt;/p&gt;
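&lt;p&gt;On the application side, the API only needs to read the variables that the Compose file sets. The following is a minimal sketch, assuming a hypothetical &lt;code&gt;loadConfig&lt;/code&gt; helper and &lt;code&gt;ApiConfig&lt;/code&gt; shape that are not part of the original project. The variable names and fallback values mirror the Compose file above.&lt;/p&gt;

```typescript
// Hypothetical config shape for the example API; field names are illustrative.
export interface ApiConfig {
  redisUrl: string;
  modelBaseUrl: string;
  modelApiKey: string;
}

type Env = { [name: string]: string | undefined };

// Reads the variables set in the Compose file, with development-only fallbacks.
export function loadConfig(env: Env): ApiConfig {
  return {
    redisUrl: env.REDIS_URL ?? "redis://redis:6379",
    modelBaseUrl:
      env.OPENAI_BASE_URL ??
      "http://host.docker.internal:12434/engines/llama.cpp/v1",
    modelApiKey: env.OPENAI_API_KEY ?? "local-development-key",
  };
}

const config = loadConfig(process.env);
// Log the endpoint only; the API key is deliberately left out of the output.
console.log(`Using model endpoint ${config.modelBaseUrl}`);
```

&lt;p&gt;Because this file lives under &lt;code&gt;src&lt;/code&gt;, editing it goes through the &lt;code&gt;sync&lt;/code&gt; rule. Changing the environment values themselves in the Compose file is different: the container has to be recreated before it sees them.&lt;/p&gt;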

&lt;h2&gt;
  
  
  Why This Helps AI Development
&lt;/h2&gt;

&lt;p&gt;AI development has a different rhythm from many traditional API projects. You often test small changes repeatedly. A developer may adjust a prompt, change a system message, add structured JSON parsing, tweak a retry rule, or update how tool results are summarized.&lt;/p&gt;

&lt;p&gt;These changes usually live in source files. They should not require a full image rebuild. Compose Watch lets those files sync quickly while the rest of the stack stays running.&lt;/p&gt;

&lt;p&gt;For example, you may have a small prompt helper like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildSummaryPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You summarize technical logs clearly. Mention the likely cause and next action.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you change the system message, the API container can reload quickly. Redis stays running. Postgres stays running. Your local model endpoint or cloud model configuration stays the same. You can immediately send another request and compare behavior.&lt;/p&gt;
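&lt;p&gt;To compare behavior after each prompt edit, it helps to have one repeatable request. The sketch below assumes an OpenAI-compatible endpoint that accepts &lt;code&gt;POST /chat/completions&lt;/code&gt; under the configured base URL. &lt;code&gt;buildChatRequest&lt;/code&gt; and &lt;code&gt;ChatMessage&lt;/code&gt; are hypothetical names, and the model name is a placeholder for whatever your endpoint serves.&lt;/p&gt;

```typescript
// Message shape matching what a prompt helper like buildSummaryPrompt returns.
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical helper: assembles an OpenAI-compatible chat completion request.
export function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[]
) {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

&lt;p&gt;The returned &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt; can be passed straight to &lt;code&gt;fetch(url, init)&lt;/code&gt;. Keeping the request builder separate from the prompt helper means a prompt edit changes only the messages, so two responses stay comparable.&lt;/p&gt;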

&lt;p&gt;That is the value. Watch mode helps keep the feedback loop close to the speed of normal Node.js development while preserving the consistency of a Compose stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Frontend Debug UI
&lt;/h2&gt;

&lt;p&gt;Many AI apps eventually need a simple UI for testing prompts, reviewing agent traces, or inspecting responses. Compose Watch works well with frontend tools such as Vite or Next.js too.&lt;/p&gt;

&lt;p&gt;Here is a small multi-service setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.dev&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5173:5173"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;VITE_API_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;
    &lt;span class="na"&gt;develop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend/src&lt;/span&gt;
          &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app/src&lt;/span&gt;
          &lt;span class="na"&gt;ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node_modules/&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend/package.json&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend/package-lock.json&lt;/span&gt;

  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./api&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.dev&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;REDIS_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis://redis:6379&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
    &lt;span class="na"&gt;develop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./api/src&lt;/span&gt;
          &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app/src&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./api/package.json&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./api/package-lock.json&lt;/span&gt;

  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7-alpine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, frontend changes sync into the frontend container, API changes sync into the API container, and Redis keeps running, so its state is untouched by code changes. You can iterate on the UI and backend without rebuilding everything on every change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch Mode vs Bind Mounts
&lt;/h2&gt;

&lt;p&gt;Many developers already use bind mounts for local development:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./src:/app/src&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works, but it can be slower or less predictable on macOS and Windows because Docker Desktop runs containers inside a Linux virtual machine, so file access has to cross the boundary between the host and that VM. Large directories, file watchers, and &lt;code&gt;node_modules&lt;/code&gt; make the cost more visible.&lt;/p&gt;

&lt;p&gt;Compose Watch gives you more explicit control. You decide which paths sync, which paths trigger rebuilds, and which paths should be ignored. Docker's file watch documentation also recommends using ignore rules to prevent unnecessary syncs and notes that ignore patterns are resolved relative to the rule's &lt;code&gt;path&lt;/code&gt;, not to the project directory.&lt;/p&gt;

&lt;p&gt;For source code, watch mode is often clearer than mounting the entire repository. For persistent data such as Postgres, Redis, uploads, or local cache directories, volumes still make sense.&lt;/p&gt;

&lt;p&gt;A good rule is simple: &lt;strong&gt;Use watch mode for files you edit frequently. Use volumes for data you need to persist.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Watch Strategy
&lt;/h2&gt;

&lt;p&gt;For Node.js and TypeScript projects, use &lt;strong&gt;sync&lt;/strong&gt; for source files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./src&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app/src&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;strong&gt;rebuild&lt;/strong&gt; for dependency files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./package.json&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rebuild&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./package-lock.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;strong&gt;sync+restart&lt;/strong&gt; for configuration files if the app does not reload them automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync+restart&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./config&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app/config&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep ignored paths explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node_modules/&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.test.ts"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.spec.ts"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;coverage/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not watch everything by default. A broad watch rule can cause unnecessary syncs, rebuilds, and confusing reloads.&lt;/p&gt;
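&lt;p&gt;One way to reason about these rules is as a small decision function from a changed path to an action. The sketch below is only a mental model under that assumption, not how Compose is implemented. &lt;code&gt;chooseWatchAction&lt;/code&gt; is a hypothetical helper, and the paths are assumed to be relative to the project directory.&lt;/p&gt;

```typescript
type WatchAction = "sync" | "rebuild" | "sync+restart" | "ignore";

// Hypothetical classifier mirroring the strategy above.
export function chooseWatchAction(path: string): WatchAction {
  const ignored =
    path.includes("node_modules/") ||
    path.startsWith("coverage/") ||
    path.endsWith(".test.ts") ||
    path.endsWith(".spec.ts");
  if (ignored) return "ignore";

  // Dependency manifests change what is installed, so they need a rebuild.
  const rebuildFiles = ["package.json", "package-lock.json"];
  if (rebuildFiles.includes(path)) return "rebuild";

  // Config is read at startup, so the process has to restart to pick it up.
  if (path.startsWith("config/")) return "sync+restart";

  // Source files are picked up by the in-container watcher.
  if (path.startsWith("src/")) return "sync";

  // Anything else is left unwatched instead of being synced by default.
  return "ignore";
}
```

&lt;p&gt;The last branch is the important design choice: an unmatched path does nothing, which is the opposite of a broad catch-all rule.&lt;/p&gt;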

&lt;h2&gt;
  
  
  Where This Does Not Belong
&lt;/h2&gt;

&lt;p&gt;Compose Watch is a development feature. It should not be part of your production deployment strategy. Production images should be built, tagged, scanned, and deployed through a normal pipeline.&lt;/p&gt;

&lt;p&gt;It also should not replace a good production Dockerfile. A development Dockerfile may run &lt;code&gt;tsx watch&lt;/code&gt; or &lt;code&gt;nodemon&lt;/code&gt;, but a production Dockerfile should usually compile TypeScript and run the compiled output.&lt;/p&gt;

&lt;p&gt;Compose Watch also does not remove the need for test automation. It improves the local loop, but you still need unit tests, integration tests, end-to-end tests with Cypress or Playwright, and CI validation before merging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;One common mistake&lt;/strong&gt; is combining watch mode and bind mounts for the same path. If you mount &lt;code&gt;./src:/app/src&lt;/code&gt; and also configure watch to sync &lt;code&gt;./src&lt;/code&gt; to &lt;code&gt;/app/src&lt;/code&gt;, you are doing the same job twice. Pick one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Another mistake&lt;/strong&gt; is using &lt;code&gt;sync&lt;/code&gt; for dependency changes. If &lt;code&gt;package.json&lt;/code&gt; changes, the image needs a rebuild so dependencies are installed correctly. Syncing the file alone leaves the old &lt;code&gt;node_modules&lt;/code&gt; in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A third mistake&lt;/strong&gt; is expecting &lt;code&gt;depends_on&lt;/code&gt; to mean a service is ready. By default it only controls startup order and does not wait for the dependency to accept connections. For databases or APIs, define a health check and use &lt;code&gt;condition: service_healthy&lt;/code&gt; under &lt;code&gt;depends_on&lt;/code&gt; when the dependent service must be ready before another service starts.&lt;/p&gt;
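&lt;p&gt;Alongside health checks, the application itself can retry until a dependency answers. The following is a minimal sketch of that idea, assuming a hypothetical &lt;code&gt;waitUntilReady&lt;/code&gt; helper and a caller-supplied probe function. Neither comes from the original project or from a Redis client library.&lt;/p&gt;

```typescript
// A probe returns true (or a promise resolving to true) once the dependency answers.
// It is typed loosely to keep the sketch short.
type Probe = () => unknown;

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls the probe up to `attempts` times, pausing between tries.
export async function waitUntilReady(
  probe: Probe,
  attempts: number,
  delayMs: number
) {
  let remaining = attempts;
  while (remaining > 0) {
    try {
      if ((await probe()) === true) return true;
    } catch {
      // A thrown error is treated as "not ready yet".
    }
    remaining -= 1;
    if (remaining > 0) await sleep(delayMs);
  }
  return false;
}
```

&lt;p&gt;At startup the API could call this with a probe that pings Redis and exit with a clear error if it returns &lt;code&gt;false&lt;/code&gt;. That keeps a slow dependency from surfacing later as a confusing first-request failure.&lt;/p&gt;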

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Docker Compose Watch is one of those features that can quietly improve daily development. It does not change your architecture, and it does not make your AI app smarter. It simply removes friction from the local development loop.&lt;/p&gt;

&lt;p&gt;For Node.js AI apps, that friction matters. Prompt changes, tool schema updates, response parsing fixes, and UI adjustments happen constantly. Rebuilding containers manually after every small change slows down the exact part of development that should feel fast.&lt;/p&gt;

&lt;p&gt;The useful pattern is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run your local AI stack with Docker Compose&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;sync&lt;/code&gt; for source files&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;rebuild&lt;/code&gt; for dependency and build configuration changes&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;sync+restart&lt;/code&gt; when a process cannot hot reload by itself&lt;/li&gt;
&lt;li&gt;Keep Redis, Postgres, and other services running while you iterate on the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you the best of both worlds: a repeatable containerized environment and a fast local feedback loop. Compose Watch is not only a Docker convenience feature. For AI app development, it can be the difference between experimenting freely and waiting on rebuilds all day.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>node</category>
      <category>typescript</category>
      <category>devops</category>
    </item>
    <item>
      <title>Optimizing Docker Images for TypeScript AI Agents with Dive and Multi-Stage Builds</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Sat, 23 May 2026 17:47:20 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/optimizing-docker-images-for-typescript-ai-agents-with-dive-and-multi-stage-builds-3gho</link>
      <guid>https://dev.to/raju_dandigam/optimizing-docker-images-for-typescript-ai-agents-with-dive-and-multi-stage-builds-3gho</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;TypeScript AI agents can become surprisingly heavy Docker images.&lt;/p&gt;

&lt;p&gt;At first, the service may look small. It is just a Node.js app that calls an LLM, uses a few tools, stores some state, and exposes an API. Then the dependencies start growing. You add the OpenAI SDK, LangChain or another agent framework, Prisma, a database client, Playwright for browser automation, test utilities, TypeScript, build tools, and maybe a few internal packages.&lt;/p&gt;

&lt;p&gt;Before long, the Docker image is much larger than expected. It might still run, but the hidden cost shows up in CI, deployments, registry storage, cold starts, and security scans.&lt;/p&gt;

&lt;p&gt;This article walks through a practical optimization path for a TypeScript AI agent. The goal is not to chase the smallest possible image. The goal is to remove obvious waste, keep the runtime image focused, and use tools like Dive to understand what is actually inside the container.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Image Size Matters for AI Agent Services
&lt;/h2&gt;

&lt;p&gt;Large Docker images are not only a storage problem. They slow down CI pipelines because every build and deployment may need to push or pull hundreds of extra megabytes. They slow down new environments because each new instance needs the image before the app can start. They also increase the security surface because more packages usually mean more things to scan, patch, and maintain.&lt;/p&gt;

&lt;p&gt;For AI agent services, this matters even more because the app often includes tooling that is not needed at runtime. TypeScript compilers, test frameworks, browser binaries, local development utilities, and generated artifacts can accidentally end up in the final image. If the agent only needs to run compiled JavaScript and call APIs, the final image should not include everything used to build, test, and develop the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unoptimized Dockerfile
&lt;/h2&gt;

&lt;p&gt;A common first Dockerfile looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works, but it has several problems.&lt;/p&gt;

&lt;p&gt;It uses the full Node.js base image. It copies the entire project into the image, including files that may not be needed. It installs development dependencies. It keeps TypeScript source, tests, local configuration, and build tools in the same image that runs in production. It also makes Docker layer caching less effective because every source change can invalidate dependency installation.&lt;/p&gt;

&lt;p&gt;For a TypeScript AI agent, that can mean shipping a runtime image that contains testing libraries, Playwright setup files, development-only packages, local documentation, and other files that are not needed once the app is compiled.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Better Mental Model
&lt;/h2&gt;

&lt;p&gt;A production Docker image should answer one question: &lt;strong&gt;what does this service need to run?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a TypeScript AI agent, the runtime usually needs compiled JavaScript, production dependencies, package metadata, environment configuration, and maybe Prisma-generated client files or migration-related assets depending on how you deploy.&lt;/p&gt;

&lt;p&gt;It usually does not need TypeScript compiler dependencies, unit tests, Cypress or Playwright test specs, coverage reports, local &lt;code&gt;.env&lt;/code&gt; files, &lt;code&gt;.git&lt;/code&gt; history, or development caches. Source maps are also optional in many production environments.&lt;/p&gt;

&lt;p&gt;Multi-stage builds help enforce that separation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimized Multi-Stage Dockerfile
&lt;/h2&gt;

&lt;p&gt;Here is a cleaner Dockerfile for a TypeScript AI agent API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# syntax=docker/dockerfile:1.7&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; node&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a much better starting point.&lt;/p&gt;

&lt;p&gt;The build stage installs all dependencies and compiles the TypeScript code. The runtime stage starts fresh, installs only production dependencies, and copies only the compiled output from the build stage. The final image does not include the TypeScript compiler, test files, or most development tooling.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;node:22-slim&lt;/code&gt; base image keeps broad compatibility while avoiding the size of the full Node.js image. Alpine can be smaller, but it can introduce compatibility issues with native dependencies. Many TypeScript AI apps use packages that depend on native modules, database clients, or browser-related libraries, so &lt;code&gt;slim&lt;/code&gt; is often the safer first optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add a Proper .dockerignore
&lt;/h2&gt;

&lt;p&gt;The Dockerfile is only part of the optimization. The build context also matters. If Docker receives your entire repository as context, you may accidentally copy unnecessary files into the image or slow down builds.&lt;/p&gt;

&lt;p&gt;A basic &lt;code&gt;.dockerignore&lt;/code&gt; for this kind of project can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;node_modules&lt;/span&gt;
&lt;span class="n"&gt;dist&lt;/span&gt;
&lt;span class="n"&gt;coverage&lt;/span&gt;
&lt;span class="n"&gt;playwright&lt;/span&gt;-&lt;span class="n"&gt;report&lt;/span&gt;
&lt;span class="n"&gt;cypress&lt;/span&gt;/&lt;span class="n"&gt;videos&lt;/span&gt;
&lt;span class="n"&gt;cypress&lt;/span&gt;/&lt;span class="n"&gt;screenshots&lt;/span&gt;
.&lt;span class="n"&gt;git&lt;/span&gt;
.&lt;span class="n"&gt;github&lt;/span&gt;
.&lt;span class="n"&gt;env&lt;/span&gt;
.&lt;span class="n"&gt;env&lt;/span&gt;.*
.&lt;span class="n"&gt;npmrc&lt;/span&gt;
&lt;span class="c"&gt;# A bare pattern matches only at the context root; the **/ prefix matches at any depth&lt;/span&gt;
**/*.&lt;span class="n"&gt;log&lt;/span&gt;
&lt;span class="n"&gt;README&lt;/span&gt;.&lt;span class="n"&gt;md&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt;
&lt;span class="n"&gt;tests&lt;/span&gt;
**/&lt;span class="err"&gt;__&lt;/span&gt;&lt;span class="n"&gt;tests__&lt;/span&gt;
**/*.&lt;span class="n"&gt;spec&lt;/span&gt;.&lt;span class="n"&gt;ts&lt;/span&gt;
**/*.&lt;span class="n"&gt;test&lt;/span&gt;.&lt;span class="n"&gt;ts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be careful with this file. Do not ignore files that your build actually needs. For example, if your app needs Prisma schema files during build, include them intentionally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; prisma ./prisma&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npx prisma generate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is to be deliberate. A Docker image should not receive the entire repository just because &lt;code&gt;COPY . .&lt;/code&gt; was easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Dive Helps
&lt;/h2&gt;

&lt;p&gt;Multi-stage builds are useful, but they do not tell you exactly what is inside the image. Dive helps with that.&lt;/p&gt;

&lt;p&gt;Dive is an open-source tool for exploring Docker images layer by layer. It shows which command created each layer, which files were added or changed, and where wasted space may exist. This makes it easier to see whether your image still contains unexpected files such as test reports, cached package data, source files, or large browser binaries.&lt;/p&gt;

&lt;p&gt;Install Dive locally (the example below uses Homebrew), then build and analyze the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;dive
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; ai-agent-api:optimized &lt;span class="nb"&gt;.&lt;/span&gt;
dive ai-agent-api:optimized
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you open the image in Dive, look for a few things. Check which layers are the largest. Look for files that should not exist in production. Verify that the final image does not include your test folders, local &lt;code&gt;.env&lt;/code&gt; files, coverage reports, or source repository metadata. Check whether &lt;code&gt;node_modules&lt;/code&gt; includes only production dependencies. Look at the efficiency score, but do not treat it as the only goal.&lt;/p&gt;

&lt;p&gt;The best use of Dive is not to obsess over every kilobyte. It is to make the image visible. Once you can see the layers, waste becomes much easier to remove.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: AI Agent with Playwright
&lt;/h2&gt;

&lt;p&gt;Browser automation is common in agent workflows. A support agent may open a web page, a QA agent may validate a flow, or a research agent may inspect a site. Playwright is powerful, but it can also make images much larger because browsers and system dependencies are heavy.&lt;/p&gt;

&lt;p&gt;The important question is whether Playwright is needed in the same runtime image as your API.&lt;/p&gt;

&lt;p&gt;If browser automation is only used in tests, do not include it in the production image. Keep Playwright in a separate test image or CI step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;

  &lt;span class="na"&gt;playwright-tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/playwright:v1.56.1-noble&lt;/span&gt;
    &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./:/app&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sh -c "npm ci &amp;amp;&amp;amp; npx playwright test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the agent truly needs browser automation at runtime, consider isolating that capability into a separate browser worker service instead of bloating the main API image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9enwoalzw07c9fkiqvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9enwoalzw07c9fkiqvj.png" alt=" " width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This separation keeps the main API smaller and makes the browser automation boundary more explicit.&lt;/p&gt;
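
&lt;p&gt;As a rough sketch of that boundary, a separate worker image could be built on the official Playwright image while the API keeps its slim base. The file name &lt;code&gt;Dockerfile.worker&lt;/code&gt; and the entry point &lt;code&gt;dist/worker.js&lt;/code&gt; are assumptions for illustration, and the sketch assumes &lt;code&gt;playwright&lt;/code&gt; is listed under production dependencies for the worker:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;# syntax=docker/dockerfile:1.7
# Hypothetical Dockerfile.worker: file name, paths, and entry point are assumptions.

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

# Keep this tag in line with the playwright version in package.json.
FROM mcr.microsoft.com/playwright:v1.56.1-noble AS worker

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev

COPY --from=build /app/dist ./dist

# pwuser is the non-root user that ships with the official Playwright image.
USER pwuser

CMD ["node", "dist/worker.js"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this split, only the worker pays the size cost of browsers and their system libraries, and the API image can stay on &lt;code&gt;node:22-slim&lt;/code&gt;.&lt;/p&gt;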

&lt;h2&gt;
  
  
  Safer Dependency Installation
&lt;/h2&gt;

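&lt;p&gt;For production images, prefer &lt;code&gt;npm ci&lt;/code&gt; over &lt;code&gt;npm install&lt;/code&gt;. It installs exactly the versions recorded in &lt;code&gt;package-lock.json&lt;/code&gt; and fails if the lockfile is missing or out of sync with &lt;code&gt;package.json&lt;/code&gt;, which makes installs more reproducible:&lt;br&gt;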
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



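&lt;p&gt;If your project uses pnpm, the same idea applies. Copy &lt;code&gt;package.json&lt;/code&gt; and &lt;code&gt;pnpm-lock.yaml&lt;/code&gt; into the image first, then install from the lockfile and avoid shipping development dependencies:&lt;br&gt;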
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;corepack &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pnpm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--frozen-lockfile&lt;/span&gt; &lt;span class="nt"&gt;--prod&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also avoid installing packages at runtime. An AI agent should not dynamically install npm packages in production unless you have a very controlled sandbox and a strong reason. Runtime package installation makes the supply chain harder to scan and the image harder to reproduce.&lt;/p&gt;
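
&lt;p&gt;If the project pulls private packages, the install step needs registry credentials without baking them into a layer. One possible approach, sketched here under the assumption that BuildKit is enabled and that the token lives in a local &lt;code&gt;.npmrc&lt;/code&gt;, is a secret mount combined with a cache mount. The secret id &lt;code&gt;npmrc&lt;/code&gt; is an arbitrary name chosen for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;# syntax=docker/dockerfile:1.7
# Runtime-stage fragment; the secret id "npmrc" is an assumption for this sketch.

FROM node:22-slim AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./

# The cache mount keeps npm's download cache outside the image layers.
# The secret mount exposes .npmrc only while this RUN step executes.
RUN --mount=type=cache,target=/root/.npm \
    --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The secret would be supplied at build time, for example with &lt;code&gt;docker build --secret id=npmrc,src=$HOME/.npmrc .&lt;/code&gt;. Because the cache lives in a mount rather than a layer, a separate &lt;code&gt;npm cache clean&lt;/code&gt; step is not needed in this variant.&lt;/p&gt;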

&lt;h2&gt;
  
  
  Before and After Pattern
&lt;/h2&gt;

&lt;p&gt;The unoptimized version often looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The optimized version should look more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; node&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact size reduction will depend on your project. A small API may save only a few hundred megabytes, most of it from the smaller base image. An AI agent with browser tooling, dev dependencies, generated reports, and cached files may see a much larger improvement. The "900MB to 150MB" story is realistic for some messy Node.js images, but it should be treated as an example, not a promise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add Image Checks to CI
&lt;/h2&gt;

&lt;p&gt;Dive can also run in CI mode with thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;dive ai-agent-api:optimized &lt;span class="nt"&gt;--ci-config&lt;/span&gt; .dive-ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A simple &lt;code&gt;.dive-ci&lt;/code&gt; file can enforce a minimum efficiency score:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efficiency&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efficiency&lt;/span&gt;
    &lt;span class="na"&gt;operation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;="&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.90&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



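&lt;p&gt;This should not be your only quality gate, but it can catch obvious regressions. Dive's efficiency score mainly reflects wasted space, such as files that are added in one layer and then changed or deleted in a later one. If someone accidentally copies Playwright reports, &lt;code&gt;.git&lt;/code&gt;, or local datasets into the image, the score may or may not move enough to fail the check, which is why it helps to track image size as well.&lt;/p&gt;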

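&lt;p&gt;You can also add a simple image size check. The command below prints the image size in bytes, which a CI step can compare against a size budget:&lt;br&gt;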
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker image inspect ai-agent-api:optimized &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{{.Size}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In mature teams, image optimization becomes part of the same quality loop as tests, linting, and vulnerability scanning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Trade-offs
&lt;/h2&gt;

&lt;p&gt;Optimization has trade-offs.&lt;/p&gt;

&lt;p&gt;The smallest image is not always the best image. Alpine images are smaller, but native Node.js packages may require extra work. Distroless images reduce attack surface, but they are harder to debug because they do not include a shell. Aggressive cleanup can make troubleshooting painful. Multi-stage builds improve runtime images, but they may add complexity to the Dockerfile.&lt;/p&gt;

&lt;p&gt;For most TypeScript AI agents, the best starting point is simple: use a slim base image, use multi-stage builds, exclude unnecessary files, install only production dependencies, run as a non-root user, and inspect the result with Dive.&lt;/p&gt;

&lt;p&gt;That will usually get you most of the benefit without turning the Dockerfile into a maintenance burden.&lt;/p&gt;
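
&lt;p&gt;If you later decide the distroless trade-off mentioned above is worth it, the multi-stage structure carries over with one extra stage. The following is only a sketch: it assumes the &lt;code&gt;gcr.io/distroless/nodejs22-debian12&lt;/code&gt; image with its &lt;code&gt;nonroot&lt;/code&gt; tag and the same &lt;code&gt;dist/index.js&lt;/code&gt; entry point used earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;# syntax=docker/dockerfile:1.7

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

# Production dependencies get their own stage because the distroless
# image has no npm or shell to run an install step.
FROM node:22-slim AS deps

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

# Assumption: distroless Node.js 22 image, nonroot variant.
FROM gcr.io/distroless/nodejs22-debian12:nonroot AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package.json ./
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist

EXPOSE 3000

# The distroless Node.js image already uses node as its entrypoint,
# so CMD lists only the script path.
CMD ["dist/index.js"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cost is the one described above: there is no shell in the final image, so debugging usually means using a separate debug image or inspecting the container from outside.&lt;/p&gt;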

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Docker image optimization is not just a performance trick. For TypeScript AI agents, it is part of reliability and security.&lt;/p&gt;

&lt;p&gt;A smaller image pulls faster, deploys faster, scans faster, and usually contains fewer unnecessary files. A cleaner image also makes it easier to understand what your AI service is actually shipping. That matters when the application has access to model APIs, tools, browsers, databases, and credentials.&lt;/p&gt;

&lt;p&gt;Start with the obvious improvements. Replace the one-stage Dockerfile with a multi-stage build. Use a pinned slim tag such as &lt;code&gt;node:22-slim&lt;/code&gt; instead of the full image. Add a careful &lt;code&gt;.dockerignore&lt;/code&gt;. Install only production dependencies in the final stage. Keep Playwright and other heavy tooling out of the main runtime image unless the agent truly needs them. Then use Dive to inspect the image instead of guessing.&lt;/p&gt;

&lt;p&gt;The goal is not to win a smallest-image contest. The goal is to ship a focused, understandable, production-ready image for your AI agent.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>typescript</category>
      <category>optimization</category>
      <category>node</category>
    </item>
    <item>
      <title>Your AI Agent Has a Supply Chain: Securing Node.js Apps with Docker Hardened Images</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Wed, 20 May 2026 23:07:42 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/your-ai-agent-has-a-supply-chain-securing-nodejs-apps-with-docker-hardened-images-1ede</link>
      <guid>https://dev.to/raju_dandigam/your-ai-agent-has-a-supply-chain-securing-nodejs-apps-with-docker-hardened-images-1ede</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;AI applications often look small from the outside. A Node.js service calls a model, connects to a few tools, stores some state, and returns a response. The codebase may be much smaller than a traditional enterprise application.&lt;/p&gt;

&lt;p&gt;The security surface is not small.&lt;/p&gt;

&lt;p&gt;A modern Node.js AI app may use model provider APIs, MCP servers, browser automation, Redis or Postgres, private npm packages, GitHub tokens, internal APIs, and local files. An agent may read repository code, open a browser, inspect logs, summarize customer data, or call tools that perform real actions. That means the container running the app is not just serving HTTP traffic. It is sitting near credentials, tools, data, and execution paths.&lt;/p&gt;

&lt;p&gt;This is why the Docker image matters. The base image, dependency install process, runtime user, filesystem permissions, SBOM, vulnerability scanning, and secret handling are all part of the AI application architecture.&lt;/p&gt;

&lt;p&gt;Many AI tutorials skip this layer. They show how to call a model, build an agent loop, or connect a tool. In production, the question is different: what exactly are we shipping, where did it come from, what can it access, and how much damage can it do if compromised?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Apps Increase Supply Chain Risk
&lt;/h2&gt;

&lt;p&gt;Traditional Node.js applications already have supply chain risk. They depend on npm packages, operating system packages, base images, CI pipelines, and deployment configuration. AI applications add more moving pieces.&lt;/p&gt;

&lt;p&gt;An AI agent may use MCP servers as tool adapters. Each MCP server has its own dependencies, permissions, and credentials. A local LLM workflow may pull model artifacts from registries. A browser automation tool may bring large system dependencies. A code-review agent may need GitHub access. A support assistant may access customer-like data. A test-generation agent may read and write files.&lt;/p&gt;

&lt;p&gt;The application may still be "just a Node.js service," but the dependency graph is much wider than it looks.&lt;/p&gt;

&lt;p&gt;Docker's 2026 supply chain write-up on Trivy and KICS is a useful reminder of the risk. Docker described two incidents where stolen publisher credentials were used to push malicious images through legitimate publishing flows. Docker stated that its infrastructure was not breached, but anyone who pulled the compromised tags was temporarily exposed through the software supply chain.&lt;/p&gt;

&lt;p&gt;That story matters for AI apps because agents often rely on tools they did not build. A compromised image, package, MCP server, or build step can become a path to credentials, source code, cloud systems, or sensitive data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Risky Dockerfile
&lt;/h2&gt;

&lt;p&gt;A common Dockerfile for a Node.js AI app may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:latest&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; OPENAI_API_KEY=sk-example&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; GITHUB_TOKEN=ghp-example&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["npm", "start"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile has several problems.&lt;/p&gt;

&lt;p&gt;It uses &lt;code&gt;node:latest&lt;/code&gt;, which can change over time and make builds less predictable. It copies the entire local directory into the image, which may accidentally include &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, test artifacts, or local files. It uses &lt;code&gt;npm install&lt;/code&gt; instead of a lockfile-based install. It bakes secrets into the image through &lt;code&gt;ENV&lt;/code&gt; instructions, where anyone who can pull the image can read them with &lt;code&gt;docker history&lt;/code&gt; or &lt;code&gt;docker inspect&lt;/code&gt;. It runs as the default user, which is root in the official Node.js image. It does not separate build dependencies from runtime dependencies.&lt;/p&gt;

&lt;p&gt;For a demo, this might work. For an AI app with access to tools and credentials, it is too loose.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Safer Dockerfile Pattern
&lt;/h2&gt;

&lt;p&gt;A better pattern uses a specific base image, a multi-stage build, production-only dependencies, a non-root user, and no embedded secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# syntax=docker/dockerfile:1.7&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;groupadd &lt;span class="nt"&gt;--system&lt;/span&gt; appgroup &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; useradd &lt;span class="nt"&gt;--system&lt;/span&gt; &lt;span class="nt"&gt;--gid&lt;/span&gt; appgroup &lt;span class="nt"&gt;--home&lt;/span&gt; /app appuser &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; appuser:appgroup /app

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; appuser&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already much safer. It pins the Node major version instead of using &lt;code&gt;latest&lt;/code&gt;. It uses &lt;code&gt;npm ci&lt;/code&gt; for reproducible dependency installation. It keeps build tooling out of the runtime stage. It installs only production dependencies in the final image. It runs as a non-root user.&lt;/p&gt;

&lt;p&gt;This is not perfect security, but it is a better foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Docker Hardened Images Fit
&lt;/h2&gt;

&lt;p&gt;Docker Hardened Images take the base-image part of this problem further. Docker describes them as secure, minimal, production-ready images. In December 2025, Docker announced that it was making them free and open source under the Apache 2.0 license, and stated that the catalog included more than 1,000 hardened images and Helm charts.&lt;/p&gt;

&lt;p&gt;The key idea is that the base image should not be an afterthought. A hardened image reduces unnecessary packages, narrows the attack surface, and gives teams a stronger starting point than a general-purpose image.&lt;/p&gt;

&lt;p&gt;The Dockerfile pattern stays mostly the same. The base image changes to the hardened equivalent available in your registry and organization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-hardened-node-image&amp;gt;@sha256:&amp;lt;digest&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tsconfig.json ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-hardened-node-image&amp;gt;@sha256:&amp;lt;digest&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 10001&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



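&lt;p&gt;The exact image name depends on how your team accesses Docker Hardened Images and how your registry is configured. Hardened runtime variants may omit a shell and a package manager. If yours does, keep &lt;code&gt;RUN&lt;/code&gt; steps such as &lt;code&gt;npm ci&lt;/code&gt; in a build stage based on a development variant and copy the resulting &lt;code&gt;node_modules&lt;/code&gt; into the runtime stage. The important habits are to use a trusted base image, avoid &lt;code&gt;latest&lt;/code&gt;, and pin the image by digest when you need stronger reproducibility.&lt;/p&gt;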

&lt;p&gt;Docker's Hardened Images product page also emphasizes drop-in replacements, continuous updates, secure customization, and provenance-preserving workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do Not Bake Secrets Into the Image
&lt;/h2&gt;

&lt;p&gt;AI apps usually need secrets, but the image should not contain them.&lt;/p&gt;

&lt;p&gt;Model provider keys, GitHub tokens, MCP credentials, database passwords, and OAuth tokens should be provided at runtime through your deployment environment or secret manager. They should not appear in the Dockerfile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node-ai-agent:secure&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
      &lt;span class="na"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${GITHUB_TOKEN}&lt;/span&gt;
      &lt;span class="na"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${DATABASE_URL}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is acceptable for local development when values come from a local &lt;code&gt;.env&lt;/code&gt; file that is not committed. In production, prefer a managed secret system such as Kubernetes Secrets, AWS Secrets Manager, Google Secret Manager, Azure Key Vault, HashiCorp Vault, or your platform's built-in secret store.&lt;/p&gt;
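
&lt;p&gt;As a minimal sketch of one step beyond plain environment variables, Compose can mount a secret as a file under &lt;code&gt;/run/secrets&lt;/code&gt; instead of exposing it in the container environment. The secret name &lt;code&gt;openai_api_key&lt;/code&gt;, the local file path, and the &lt;code&gt;OPENAI_API_KEY_FILE&lt;/code&gt; variable are assumptions for illustration, and the application would have to read the key from that file itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  agent:
    image: node-ai-agent:secure
    environment:
      NODE_ENV: production
      # Hypothetical variable: the app must read the key from this file path itself
      OPENAI_API_KEY_FILE: /run/secrets/openai_api_key
    secrets:
      - openai_api_key

secrets:
  openai_api_key:
    # Assumed local path; keep this file out of version control and out of the image
    file: ./secrets/openai_api_key.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The design choice here is that a file-mounted secret does not show up in the container's environment listing, which is one less place for it to leak into logs or crash reports.&lt;/p&gt;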

&lt;p&gt;Build-time secrets are different. If the image build needs temporary access to a private npm package or private Git repository, use Docker Build Secrets or SSH mounts instead of &lt;code&gt;ARG&lt;/code&gt; or &lt;code&gt;ENV&lt;/code&gt;. Docker's build secrets documentation explains that secret mounts and SSH mounts are designed for sensitive data needed during the build, and that the process has two steps: pass the secret to &lt;code&gt;docker build&lt;/code&gt;, then consume it in the Dockerfile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;secret,id&lt;span class="o"&gt;=&lt;/span&gt;npm_token &lt;span class="se"&gt;\
&lt;/span&gt;  npm config &lt;span class="nb"&gt;set&lt;/span&gt; //registry.npmjs.org/:_authToken&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /run/secrets/npm_token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm ci &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm config delete //registry.npmjs.org/:_authToken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then build with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;npm_token,env&lt;span class="o"&gt;=&lt;/span&gt;NPM_TOKEN &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; node-ai-agent:secure &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker's GitHub Actions documentation also covers secret mounts and SSH mounts in CI. There, secret mounts appear as files under &lt;code&gt;/run/secrets&lt;/code&gt; by default.&lt;/p&gt;
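
&lt;p&gt;A hedged sketch of what that can look like as a workflow step, using the &lt;code&gt;secrets&lt;/code&gt; input of &lt;code&gt;docker/build-push-action&lt;/code&gt;. The repository secret name &lt;code&gt;NPM_TOKEN&lt;/code&gt; is an assumption, and the &lt;code&gt;npm_token&lt;/code&gt; id has to match the &lt;code&gt;id&lt;/code&gt; used in the Dockerfile's secret mount:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      - name: Build image with a secret mount
        uses: docker/build-push-action@v6
        with:
          context: .
          tags: node-ai-agent:secure
          # Each entry is id=value; this one is mounted at /run/secrets/npm_token
          secrets: |
            npm_token=${{ secrets.NPM_TOKEN }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;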

&lt;h2&gt;
  
  
  Add Runtime Guardrails
&lt;/h2&gt;

&lt;p&gt;A secure image is only one layer. The container should also run with limited permissions.&lt;/p&gt;

&lt;p&gt;For a Node.js AI app, a practical Compose configuration may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node-ai-agent:secure&lt;/span&gt;
    &lt;span class="na"&gt;read_only&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;tmpfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/tmp&lt;/span&gt;
    &lt;span class="na"&gt;cap_drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
    &lt;span class="na"&gt;security_opt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;no-new-privileges:true&lt;/span&gt;
    &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1g&lt;/span&gt;
    &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
      &lt;span class="na"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${DATABASE_URL}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A read-only root filesystem prevents the app from writing to unexpected places. The &lt;code&gt;tmpfs&lt;/code&gt; mount gives it an in-memory location for temporary files that disappears when the container stops. Dropping all Linux capabilities reduces what a process inside the container can do, even if it is compromised. &lt;code&gt;no-new-privileges&lt;/code&gt; stops processes from gaining additional privileges, for example through setuid binaries. CPU and memory limits reduce the blast radius of a bad loop, runaway browser process, or unexpected agent behavior.&lt;/p&gt;

&lt;p&gt;These settings may need adjustment. Browser automation, file-processing tools, and some native dependencies may require additional permissions or writable directories. The goal is not to blindly copy every restriction. The goal is to start restrictive and open only what the application actually needs.&lt;/p&gt;
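
&lt;p&gt;As one illustration of opening only what is needed, the sketch below keeps the root filesystem read-only and adds a single writable volume plus a size-limited scratch area. The volume name &lt;code&gt;agent-data&lt;/code&gt;, the path &lt;code&gt;/app/data&lt;/code&gt;, and the numeric limits are assumptions, and the mounted directory still has to be writable by the container's non-root user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  agent:
    image: node-ai-agent:secure
    read_only: true
    tmpfs:
      # Cap the in-memory scratch space and block executables in it
      - /tmp:size=64m,noexec,nosuid
    volumes:
      # Hypothetical path: the one directory the app is allowed to persist to
      - agent-data:/app/data
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    # Illustrative ceiling on process count, to contain fork-heavy tools
    pids_limit: 256

volumes:
  agent-data:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;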

&lt;h2&gt;
  
  
  Isolate Tooling and Data Paths
&lt;/h2&gt;

&lt;p&gt;AI agents often call tools. Those tools should not all run with the same permissions.&lt;/p&gt;

&lt;p&gt;A GitHub MCP server may need network access to GitHub but should not need write access to the local filesystem. A filesystem tool may need read access to a specific workspace but should not see the entire host machine. A browser automation tool may need temporary writable space but should not need database credentials.&lt;/p&gt;

&lt;p&gt;A simple architecture separates the app, tools, and data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz3bbhmgvg7d77f0zr3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz3bbhmgvg7d77f0zr3e.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This separation matters because compromise should not mean total access. If the browser tool is compromised, it should not automatically get database credentials. If the filesystem tool is compromised, it should not automatically get a model provider key. If the GitHub tool is compromised, it should have the smallest useful token scope.&lt;/p&gt;
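
&lt;p&gt;A rough Compose sketch of that separation follows. The tool image names under &lt;code&gt;example/&lt;/code&gt; are hypothetical placeholders, as are the network names and the &lt;code&gt;./workspace&lt;/code&gt; path. The point is the shape: each service receives only its own credentials, and the database sits on an internal network that the tools never join:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  agent:
    image: node-ai-agent:secure
    networks: [tools, data]
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      DATABASE_URL: ${DATABASE_URL}

  github-tool:
    # Hypothetical image name; use a reviewed image pinned by version or digest
    image: example/github-mcp-server:1.0.0
    read_only: true
    networks: [tools]
    environment:
      # Only the token this tool needs, with the narrowest useful scope
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  filesystem-tool:
    # Hypothetical image name
    image: example/filesystem-mcp-server:1.0.0
    networks: [tools]
    volumes:
      # Read-only view of one workspace directory, not the host filesystem
      - ./workspace:/workspace:ro

  db:
    image: postgres:17
    networks: [data]
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}

networks:
  tools:
  data:
    # No route to the outside world; only the agent and the database join it
    internal: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this layout only the agent can reach the database, and neither tool container ever sees the model provider key or the database URL.&lt;/p&gt;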

&lt;h2&gt;
  
  
  Use SBOMs and Scanning
&lt;/h2&gt;

&lt;p&gt;You cannot secure what you cannot see. A Software Bill of Materials, or SBOM, lists the components inside your image. Docker Scout uses SBOMs to understand the components in an image and cross-reference them with vulnerability data. Docker's Scout documentation describes it as a supply chain security solution that analyzes images, creates an inventory of components, and matches that inventory against vulnerability databases.&lt;/p&gt;

&lt;p&gt;A basic scan can be part of your CI workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Security Scan&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker build -t node-ai-agent:ci .&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Scan image with Docker Scout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node-ai-agent:ci&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cves&lt;/span&gt;
          &lt;span class="na"&gt;only-severities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical,high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker Scout's image analysis documentation explains that image analysis extracts the SBOM and image metadata, then evaluates it against vulnerability data from security advisories.&lt;/p&gt;

&lt;p&gt;Scanning does not prove an image is safe. It gives you visibility. That visibility is especially important for AI apps because the dependency surface includes npm packages, system packages, browser dependencies, tool servers, and sometimes model artifacts.&lt;/p&gt;
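
&lt;p&gt;Visibility becomes more useful when it can block a merge. As a sketch, assuming the &lt;code&gt;exit-code&lt;/code&gt; and &lt;code&gt;only-fixed&lt;/code&gt; inputs behave as described in the &lt;code&gt;docker/scout-action&lt;/code&gt; README, a scan step can fail the job when fixable critical or high findings exist. The image tag &lt;code&gt;node-ai-agent:ci&lt;/code&gt; assumes the image was built earlier in the same job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      - name: Fail the job on fixable critical or high CVEs
        uses: docker/scout-action@v1
        with:
          command: cves
          image: node-ai-agent:ci
          only-severities: critical,high
          # Report only findings that already have a fixed version available
          only-fixed: true
          # Make the step exit non-zero when matching vulnerabilities are found
          exit-code: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Limiting the gate to fixable findings is a judgment call. It keeps the pipeline from blocking on issues the team cannot act on yet, at the cost of leaving unfixed ones for periodic review.&lt;/p&gt;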

&lt;h2&gt;
  
  
  Pin What Matters
&lt;/h2&gt;

&lt;p&gt;Tags are convenient, but they can move. For production images, pin important base images by digest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22-slim@sha256:&amp;lt;digest&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes builds more predictable because the digest identifies a specific image. You can still update regularly, but updates become intentional instead of accidental.&lt;/p&gt;

&lt;p&gt;The same thinking applies to MCP server images and other tool containers. Avoid pulling unpinned &lt;code&gt;latest&lt;/code&gt; images in production workflows. Use a known version or digest, review the source, and keep an update process.&lt;/p&gt;

&lt;p&gt;This is especially important for agent systems because tools can perform actions. A compromised or unexpectedly changed tool image is more dangerous than a broken static asset build.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Security Checklist
&lt;/h2&gt;

&lt;p&gt;Before shipping a Node.js AI app in Docker, check the basics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a trusted base image, preferably a hardened image when available&lt;/li&gt;
&lt;li&gt;Avoid &lt;code&gt;latest&lt;/code&gt; for production and pin critical images by digest&lt;/li&gt;
&lt;li&gt;Use a multi-stage Dockerfile so build tools do not ship in the runtime image&lt;/li&gt;
&lt;li&gt;Install dependencies with &lt;code&gt;npm ci&lt;/code&gt; and ship only production dependencies&lt;/li&gt;
&lt;li&gt;Run the container as a non-root user&lt;/li&gt;
&lt;li&gt;Do not copy &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, local logs, test reports, or unnecessary files into the image&lt;/li&gt;
&lt;li&gt;Use Docker Build Secrets for private package installation&lt;/li&gt;
&lt;li&gt;Pass model provider keys, GitHub tokens, database credentials, and MCP credentials at runtime through a secret manager&lt;/li&gt;
&lt;li&gt;Run image scanning in CI and review high and critical findings&lt;/li&gt;
&lt;li&gt;Use read-only filesystems, dropped capabilities, resource limits, and narrow network access where possible&lt;/li&gt;
&lt;li&gt;Give every MCP server and tool the smallest useful permission set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This checklist is not glamorous, but it is the work that makes AI systems safer to operate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Your AI agent has a supply chain. It starts with the base image, continues through npm packages and build steps, extends into MCP servers and browser tools, and reaches all the way to model artifacts, credentials, and runtime permissions.&lt;/p&gt;

&lt;p&gt;Docker gives Node.js teams practical controls for this problem. Docker Hardened Images provide a stronger starting point. Multi-stage builds reduce what ships. Build secrets keep private tokens out of image layers. Runtime restrictions limit damage. SBOMs and Docker Scout improve visibility. Digest pinning makes updates intentional.&lt;/p&gt;

&lt;p&gt;None of this makes an AI app perfectly secure. It does create defense in depth. That matters because AI agents are not passive services. They read, call, summarize, browse, and sometimes act. The more capable the agent becomes, the more important its container boundary becomes.&lt;/p&gt;

&lt;p&gt;A good AI architecture is not only about prompts, tools, and models. It is also about what the application is allowed to run, what it is allowed to access, where its dependencies came from, and what happens when something goes wrong.&lt;/p&gt;

&lt;p&gt;Secure the supply chain before the agent becomes part of someone else's.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>Docker as the Safety Net for AI-Generated Frontend Code</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Sat, 16 May 2026 14:36:09 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/docker-as-the-safety-net-for-ai-generated-frontend-code-dmg</link>
      <guid>https://dev.to/raju_dandigam/docker-as-the-safety-net-for-ai-generated-frontend-code-dmg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;AI coding assistants can generate React components, Next.js pages, test files, form handlers, and TypeScript utilities very quickly. That speed is useful, but it also creates a new problem for frontend teams. The code may compile, pass linting, and look reasonable in a pull request, but still fail when a user clicks through the actual flow.&lt;/p&gt;

&lt;p&gt;Frontend code is full of small runtime details that are easy to miss. A generated component may not handle empty states. A form may work with happy-path data but fail when the API returns an error. A modal may render correctly but break keyboard navigation. A layout may look fine on desktop and collapse on mobile. A test may pass on one developer's laptop and fail in CI because the browser or system dependencies are different.&lt;/p&gt;

&lt;p&gt;This is where Docker becomes valuable. Docker does not make AI-generated code correct. It gives teams a repeatable place to verify that code. When Cypress or Playwright tests run inside Docker, the browser dependencies, Node.js version, operating system libraries, and test environment become more consistent across local development and CI.&lt;/p&gt;

&lt;p&gt;The goal is not fully autonomous testing. The healthier pattern is &lt;strong&gt;supervised automation&lt;/strong&gt;. Let AI tools generate or modify code. Run that code in a controlled Docker environment. Use Cypress or Playwright to validate important flows. Then let a human review the code with better evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trust Gap in AI-Generated UI Code
&lt;/h2&gt;

&lt;p&gt;AI-generated frontend code often looks convincing because it follows familiar patterns. It can produce a clean React component, use TypeScript interfaces, add Tailwind classes, and wire up a simple event handler. But correctness in frontend applications is not only about syntax.&lt;/p&gt;

&lt;p&gt;A real user flow depends on rendering, browser behavior, routing, network calls, state updates, accessibility, responsive layout, and integration with the rest of the application. These are exactly the areas where generated code needs verification.&lt;/p&gt;

&lt;p&gt;For example, an AI assistant might generate a profile component like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;UserProfileProps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;avatarUrl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;UserProfile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;avatarUrl&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;UserProfileProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;section&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;testid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user-profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;avatarUrl&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt; &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;avatarUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;alt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; avatar`&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h2&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/section&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The component is simple and probably fine. But several questions remain. What happens when &lt;code&gt;name&lt;/code&gt; is empty? Is the avatar accessible enough? Does the component render properly in the page where it is used? Does the route load the expected data? Does the mobile layout still work? Does an existing test flow break?&lt;/p&gt;

&lt;p&gt;Static checks cannot answer all of those questions. Browser tests that exercise the rendered page can answer most of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Docker Belongs in the Testing Workflow
&lt;/h2&gt;

&lt;p&gt;Cypress and Playwright already solve the browser automation problem. Docker solves the environment problem.&lt;/p&gt;

&lt;p&gt;Cypress maintains Docker images that include the operating system dependencies needed to run Cypress in containers, with different image options depending on whether you want Cypress and browsers preinstalled or want to install Cypress yourself. The Cypress CI documentation also covers Docker images, CI setup, caching, environment variables, and parallel execution.&lt;/p&gt;

&lt;p&gt;Playwright also provides official Docker guidance. Its Docker documentation explains that the Playwright image includes browser system dependencies and browser binaries, while the Playwright package itself should be installed in your project. Playwright's Docker image is intended for CI and other Docker-supported environments.&lt;/p&gt;

&lt;p&gt;That consistency matters when reviewing AI-generated changes. If a test fails, you want the failure to be about the application, not a missing browser dependency or a local machine difference.&lt;/p&gt;

&lt;p&gt;Here is the workflow in one view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3yfx4pfjrps6dc7apf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k3yfx4pfjrps6dc7apf.png" alt=" " width="800" height="1235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The important part is the loop. AI speeds up generation. Docker and browser tests slow the process down just enough to make it safer.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Docker Compose Setup
&lt;/h2&gt;

&lt;p&gt;A practical setup can use one service for the application and one service for the browser tests. The test container talks to the app container through Docker's internal network.&lt;/p&gt;

&lt;p&gt;Here is a simple Compose file for a React or Next.js application with Playwright:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run start&lt;/span&gt;

  &lt;span class="na"&gt;playwright&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/playwright:v1.56.1-noble&lt;/span&gt;
    &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;PLAYWRIGHT_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://app:3000&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./:/app&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sh -c "npm ci &amp;amp;&amp;amp; npx playwright test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup keeps the example intentionally simple. The &lt;code&gt;app&lt;/code&gt; service starts your application. The &lt;code&gt;playwright&lt;/code&gt; service runs tests against &lt;code&gt;http://app:3000&lt;/code&gt;, which works because both services are on the same Docker Compose network.&lt;/p&gt;

&lt;p&gt;For real projects, you should also make sure the test runner waits until the application is actually ready. &lt;code&gt;depends_on&lt;/code&gt; controls startup order, but on its own it does not wait for the application to accept HTTP requests. Docker's Compose documentation explains that Compose waits for a dependency to become healthy when the dependency defines a health check and the dependent service declares the &lt;code&gt;service_healthy&lt;/code&gt; condition.&lt;/p&gt;

&lt;p&gt;A more reliable version adds a health check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run start&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

  &lt;span class="na"&gt;playwright&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/playwright:v1.56.1-noble&lt;/span&gt;
    &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;PLAYWRIGHT_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://app:3000&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./:/app&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sh -c "npm ci &amp;amp;&amp;amp; npx playwright test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



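&lt;p&gt;This avoids a common source of flaky tests: the test runner starts before the app is ready. Keep in mind that the health check command runs inside the &lt;code&gt;app&lt;/code&gt; container, so this version assumes &lt;code&gt;curl&lt;/code&gt; is installed in that image. Slim and Alpine-based Node.js images usually do not include it.&lt;/p&gt;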
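
&lt;p&gt;If the app image does not ship &lt;code&gt;curl&lt;/code&gt;, or you want the test runner itself to confirm readiness, the same wait can be sketched in TypeScript. This is a minimal sketch rather than part of the setup above: &lt;code&gt;waitForApp&lt;/code&gt; and &lt;code&gt;httpProbe&lt;/code&gt; are hypothetical helper names, the option names only mirror the health check's &lt;code&gt;interval&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;, and &lt;code&gt;retries&lt;/code&gt;, and it assumes Node.js 18 or later for the built-in &lt;code&gt;fetch&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical readiness helper (an illustration, not a Playwright or Docker API).
// Returns true when the URL answers with a 2xx status before timeoutMs elapses.
async function httpProbe(url: string, timeoutMs: number) {
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return response.ok;
  } catch {
    return false; // connection refused, DNS failure, or timeout
  }
}

// Polls the app until it is ready or the retry budget is used up.
// The probe is injectable so the loop can be exercised without a network.
async function waitForApp(
  url: string,
  options = { retries: 10, intervalMs: 5000, timeoutMs: 3000 },
  probe: typeof httpProbe = httpProbe,
) {
  for (let remaining = options.retries; remaining > 0; remaining--) {
    if (await probe(url, options.timeoutMs)) return true;
    if (remaining > 1) {
      // Sleep between attempts, like the health check interval.
      await new Promise((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  return false;
}

// Example call (assumed URL taken from the Compose file):
//   const ready = await waitForApp(process.env.PLAYWRIGHT_BASE_URL ?? "http://app:3000");
```

&lt;p&gt;One option is to call a helper like this from a Playwright global setup file and throw when it returns &lt;code&gt;false&lt;/code&gt;, so the run fails fast with a clear message instead of a wall of navigation timeouts.&lt;/p&gt;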

&lt;h2&gt;
  
  
  Testing an AI-Generated Component with Playwright
&lt;/h2&gt;

&lt;p&gt;Assume the AI assistant generated the &lt;code&gt;UserProfile&lt;/code&gt; component and a page renders it at &lt;code&gt;/profile&lt;/code&gt;. A small Playwright test can verify the behavior that matters to users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;profile page displays the user information&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByTestId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user-profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;heading&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Jane Doe&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jane@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;profile page works on a mobile viewport&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setViewportSize&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;390&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;844&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByTestId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user-profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jane@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeVisible&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test does not try to prove everything. It validates the page from the user's point of view. The profile exists, the key information is visible, and the page still works on a mobile-sized viewport.&lt;/p&gt;

&lt;p&gt;You can run it through Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; playwright
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the generated component breaks the route, fails to render expected content, or behaves differently inside the containerized browser environment, the test gives you a clear signal before the code reaches production.&lt;/p&gt;
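
&lt;p&gt;One detail is easy to miss: Playwright resolves relative paths such as &lt;code&gt;/profile&lt;/code&gt; against the &lt;code&gt;baseURL&lt;/code&gt; option in its config, so the &lt;code&gt;PLAYWRIGHT_BASE_URL&lt;/code&gt; variable from the Compose file has to be wired in somewhere. A minimal sketch, assuming a &lt;code&gt;playwright.config.ts&lt;/code&gt; at the project root and tests under &lt;code&gt;./tests&lt;/code&gt; (both are assumptions about your layout, as is the localhost fallback):&lt;/p&gt;

```typescript
// playwright.config.ts (assumed file location)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Assumed test folder; adjust to where your specs live.
  testDir: "./tests",
  use: {
    // Inside Compose this resolves to http://app:3000; the fallback is for local runs.
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    // Keep traces and screenshots only for failing tests to limit artifact size.
    trace: "retain-on-failure",
    screenshot: "only-on-failure",
  },
  // The HTML reporter writes to the playwright-report folder by default.
  reporter: [["html", { open: "never" }]],
});
```

&lt;p&gt;With something like this in place, the same test file runs unchanged on a laptop and in the container. The version of &lt;code&gt;@playwright/test&lt;/code&gt; in your lockfile should also match the Playwright image tag, because the browsers bundled in the image are tied to that version.&lt;/p&gt;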

&lt;h2&gt;
  
  
  The Same Pattern with Cypress
&lt;/h2&gt;

&lt;p&gt;Some teams prefer Cypress because of its developer experience, debugging flow, or dashboard features, or simply because they already have a Cypress test suite. The Docker pattern is similar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run start&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

  &lt;span class="na"&gt;cypress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cypress/included:15.7.0&lt;/span&gt;
    &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/e2e&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CYPRESS_baseUrl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://app:3000&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./:/e2e&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;--browser chrome&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Cypress test for the same page can stay simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Profile page&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;shows user information&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[data-testid='user-profile']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Jane Doe&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jane@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;works on mobile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;390&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;844&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[data-testid='user-profile']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jane@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; cypress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact image tag should match your project and CI strategy. The broader point is that Cypress and Playwright both have strong Docker support, so teams do not need to invent a custom browser environment from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Docker as a Sandbox for AI Changes
&lt;/h2&gt;

&lt;p&gt;Testing is one part of the value. Isolation is another.&lt;/p&gt;

&lt;p&gt;When an AI assistant changes code, especially in a larger repository, you may not fully understand the consequences immediately. Docker gives you a controlled environment to build and run the application without depending too much on the developer's machine.&lt;/p&gt;

&lt;p&gt;For a safer local test environment, you can add basic constraints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;read_only&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;tmpfs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/tmp&lt;/span&gt;
    &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;768m&lt;/span&gt;
    &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These settings are not a complete security sandbox, but they reduce accidental damage. A read-only filesystem limits where the process can write. CPU and memory limits reduce the impact of runaway behavior. A temporary &lt;code&gt;/tmp&lt;/code&gt; gives the app space for normal temporary files without opening the whole container filesystem.&lt;/p&gt;

&lt;p&gt;For frontend validation, the goal is usually not to run completely untrusted code. The goal is to avoid letting generated code run directly against a developer's full local environment before there is some basic confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI for Pull Requests
&lt;/h2&gt;

&lt;p&gt;The best place to apply this pattern is the pull request. AI-generated code should not get a lighter path to merge just because it was generated quickly. If anything, it needs visible validation.&lt;/p&gt;

&lt;p&gt;Here is a simple GitHub Actions workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Frontend E2E Tests&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pages/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cypress/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docker-compose.yml"&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;playwright&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build app image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose build app&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Playwright tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose run --rm playwright&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload Playwright report&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;playwright-report&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;playwright-report&lt;/span&gt;

  &lt;span class="na"&gt;cypress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build app image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose build app&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Cypress tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose run --rm cypress&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload Cypress artifacts&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cypress-artifacts&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;cypress/screenshots&lt;/span&gt;
            &lt;span class="s"&gt;cypress/videos&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may not need to run both Cypress and Playwright in every project. Many teams should choose one primary browser testing framework and use it well. I included both here because many organizations already have Cypress suites while newer projects may prefer Playwright for cross-browser coverage and traces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging Failures
&lt;/h2&gt;

&lt;p&gt;One reason browser tests are valuable for AI-generated changes is that they provide evidence. A failed test is not just a red mark on the pull request. It can include screenshots, videos, traces, console logs, and network details.&lt;/p&gt;

&lt;p&gt;Cypress can record screenshots and videos for failed runs, depending on configuration. Playwright can produce traces that show actions, DOM snapshots, network requests, console logs, and screenshots. These artifacts make it easier to review AI-generated changes because the reviewer can see how the application behaved, not just read the diff.&lt;/p&gt;
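
&lt;p&gt;For Cypress, the "depending on configuration" part might look like the sketch below. It assumes a &lt;code&gt;cypress.config.ts&lt;/code&gt; at the project root. Recent Cypress versions leave video recording off unless it is enabled, so without a setting like this the &lt;code&gt;cypress/videos&lt;/code&gt; folder that a CI job uploads may be empty. The &lt;code&gt;baseUrl&lt;/code&gt; fallback is an assumption for local runs.&lt;/p&gt;

```typescript
// cypress.config.ts (assumed file location)
import { defineConfig } from "cypress";

export default defineConfig({
  // Record a video per spec file; written to cypress/videos by default.
  video: true,
  // Capture a screenshot when a test fails during "cypress run".
  screenshotOnRunFailure: true,
  e2e: {
    // Local fallback; a CYPRESS_baseUrl environment variable overrides this value.
    baseUrl: "http://localhost:3000",
  },
});
```

&lt;p&gt;Because environment variables with the &lt;code&gt;CYPRESS_&lt;/code&gt; prefix take precedence over the config file, a Compose service that sets &lt;code&gt;CYPRESS_baseUrl&lt;/code&gt; would still point the tests at the containerized app.&lt;/p&gt;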

&lt;p&gt;A useful review comment is not &lt;em&gt;"AI broke the page."&lt;/em&gt; A useful review comment is &lt;em&gt;"the generated profile component removed the empty-state branch, and the Playwright trace shows the mobile profile page rendering a blank card when the user has no avatar."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is the kind of feedback loop teams need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Guidelines
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do not try to test every generated line of code with an end-to-end test.&lt;/strong&gt; That will slow the team down and create brittle suites. Focus on user-facing flows and integration points.&lt;/p&gt;

&lt;p&gt;Use unit tests for pure functions, component tests for isolated UI behavior, and Cypress or Playwright for complete flows. Docker is most useful for the tests where environment consistency matters: browser tests, integration tests, and workflows that depend on app services.&lt;/p&gt;

&lt;p&gt;Keep the test environment close to production, but not identical at all costs. A test container should be realistic enough to catch meaningful issues and simple enough that developers can run it repeatedly.&lt;/p&gt;

&lt;p&gt;Avoid giving AI-generated code direct access to sensitive local files, broad credentials, or production services during validation. Use test credentials, local services, and constrained containers.&lt;/p&gt;

&lt;p&gt;Most importantly, &lt;strong&gt;keep a human in the loop.&lt;/strong&gt; Docker and browser tests can tell you whether important behavior still works. They cannot decide whether the generated code is maintainable, aligned with product intent, accessible enough, or architecturally appropriate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI coding tools make frontend development faster, but faster code generation needs stronger verification. A React component that compiles is not automatically safe to merge. A generated page that looks good in a diff still needs to work in a browser, with real routing, layout, user interactions, and error states.&lt;/p&gt;

&lt;p&gt;Docker gives teams a repeatable environment for that verification. Cypress and Playwright provide the browser automation. Together, they create a practical safety net for AI-generated frontend code.&lt;/p&gt;

&lt;p&gt;The pattern is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Let the AI tool propose the change&lt;/li&gt;
&lt;li&gt;Start the app in Docker&lt;/li&gt;
&lt;li&gt;Run Cypress or Playwright in a container&lt;/li&gt;
&lt;li&gt;Capture screenshots, videos, or traces when something fails&lt;/li&gt;
&lt;li&gt;Let a human review the code with evidence instead of guesswork&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the right balance for 2026. Do not blindly trust generated code, and do not reject useful AI assistance out of fear. Put the code in a container, test the behavior, review the result, and merge only when the evidence supports it.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>testing</category>
      <category>frontend</category>
      <category>ai</category>
    </item>
    <item>
      <title>Testing AI-Generated Node.js Code with Real Dependencies using Docker and Testcontainers</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Wed, 13 May 2026 04:56:39 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/testing-ai-generated-nodejs-code-with-real-dependencies-using-docker-and-test-containers-4nee</link>
      <guid>https://dev.to/raju_dandigam/testing-ai-generated-nodejs-code-with-real-dependencies-using-docker-and-test-containers-4nee</guid>
      <description>&lt;p&gt;AI coding tools are becoming part of everyday software development. They can generate API routes, database queries, validation logic, repository classes, test cases, and even Dockerfiles in seconds. That speed is useful, but it also creates a new kind of risk. The generated code may look correct, pass a few mocked tests, and still fail when it meets a real database, a real cache, a real message queue, or a real browser workflow.&lt;/p&gt;

&lt;p&gt;This is where many teams start feeling the weakness of mock-heavy testing. Mocks are fast, but they often test our assumptions instead of the actual behavior of the system. A mocked PostgreSQL client will return exactly what we tell it to return. It will not surprise us with a unique constraint violation, a transaction rollback issue, a timestamp behavior difference, a case-sensitive query problem, or a connection pooling edge case. Real systems behave with more friction, and good integration tests should include some of that friction.&lt;/p&gt;

&lt;p&gt;Testcontainers helps solve this problem by starting real dependencies in Docker containers during test execution. Instead of mocking PostgreSQL, Redis, MongoDB, LocalStack, or another service, your test can start a short-lived container, connect your Node.js application to it, run the test, and clean everything up afterward. The Node.js implementation of Testcontainers is designed for this kind of workflow, and the project describes it as a way to run lightweight, throwaway instances of common databases, Selenium browsers, or anything else that can run in Docker.&lt;/p&gt;

&lt;p&gt;The idea is simple, but the impact is significant. When AI generates or modifies backend code, Testcontainers gives you a safer way to verify whether that code works against real infrastructure behavior. It does not replace unit tests, and it does not remove the need for code review. Instead, it adds a confidence layer between “the code looks fine” and “this is safe enough to merge.”&lt;/p&gt;

&lt;p&gt;Here is the testing flow in one view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9zgkqt0dab8sqmf3vb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9zgkqt0dab8sqmf3vb9.png" alt=" " width="800" height="1456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A typical mocked test might look clean, but it can hide important behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
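&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vitest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;createUser&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;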
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;creates a user and returns an id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;mockResolvedValue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;demo@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hashed-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test is useful for checking that your function handles a successful response, but it does not prove that your SQL works. It does not verify that the users table exists, that the email column has a unique constraint, that the query returns the shape you expect, or that PostgreSQL handles your data types the way your mock suggests. If an AI assistant generated the SQL, this test may give you false confidence.&lt;/p&gt;
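&lt;p&gt;To make that false confidence concrete, here is a minimal sketch that needs no test framework. The &lt;code&gt;fakeDb&lt;/code&gt; object and the deliberately misspelled table and column names are assumptions made up for illustration. The fake never parses the SQL, so the broken statement goes unnoticed.&lt;/p&gt;

```typescript
// Every query the fake receives is recorded here, but never executed.
const calls: { sql: string; params: string[] }[] = [];

// Hand-rolled stand-in for a database client (hypothetical, for illustration).
// It returns a scripted row no matter what SQL it is given.
const fakeDb = {
  async query(sql: string, params: string[]) {
    calls.push({ sql, params });
    return { rows: [{ id: 1 }] };
  },
};

// Hypothetical generated function with two bugs a real database would reject:
// a misspelled table name ("usres") and a column that does not exist ("pasword").
async function createUser(
  db: typeof fakeDb,
  input: { email: string; passwordHash: string }
) {
  const result = await db.query(
    "INSERT INTO usres (email, pasword) VALUES ($1, $2) RETURNING id",
    [input.email, input.passwordHash]
  );
  return result.rows[0];
}

// Mirrors the mocked test: the call resolves with the scripted id
// even though the statement could never run against the real schema.
async function demo() {
  const user = await createUser(fakeDb, {
    email: "demo@example.com",
    passwordHash: "hashed-password",
  });
  return user.id;
}
```

&lt;p&gt;Calling &lt;code&gt;demo()&lt;/code&gt; resolves with the scripted id, which is exactly what the mocked assertion checks. Nothing in this setup can notice the broken statement.&lt;/p&gt;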

&lt;p&gt;A better integration test uses a real PostgreSQL container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@testcontainers/postgresql&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vitest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`
      INSERT INTO users (email, password_hash)
      VALUES ($1, $2)
      RETURNING id, email
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;createUser integration test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres:16-alpine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;app_test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withUsername&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test_user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withPassword&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test_password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getHost&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUsername&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPassword&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;creates a user and returns the generated id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;demo@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hashed-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeGreaterThan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;demo@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fails when the email already exists&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;duplicate@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;first-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;duplicate@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;second-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;rejects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toThrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/duplicate key value violates unique constraint/i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test does something the mock cannot do. It proves that the table definition, SQL statement, unique constraint, PostgreSQL behavior, and TypeScript application code work together. That matters even more when some of the code was generated or refactored by an AI tool.&lt;/p&gt;
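&lt;p&gt;The duplicate-email test above matches on the text of the PostgreSQL error message. A possibly more stable check is the SQLSTATE code, where &lt;code&gt;23505&lt;/code&gt; means &lt;code&gt;unique_violation&lt;/code&gt;. The sketch below assumes the driver exposes that code as a string &lt;code&gt;code&lt;/code&gt; property on the thrown error, as node-postgres does. The helper names &lt;code&gt;isUniqueViolation&lt;/code&gt; and &lt;code&gt;statusForDbError&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// SQLSTATE 23505 is PostgreSQL's code for unique_violation.
const UNIQUE_VIOLATION = "23505";

// Returns true when the thrown value looks like a unique-constraint error.
// Assumption: the driver puts the SQLSTATE on a string `code` property.
function isUniqueViolation(err: unknown): boolean {
  if (typeof err !== "object" || err === null) {
    return false;
  }
  return (err as { code?: unknown }).code === UNIQUE_VIOLATION;
}

// Hypothetical helper: map a database failure to an HTTP status code.
// A duplicate key becomes 409 Conflict; anything else stays a 500.
function statusForDbError(err: unknown): number {
  return isUniqueViolation(err) ? 409 : 500;
}
```

&lt;p&gt;In the integration test, the same idea could be expressed as &lt;code&gt;rejects.toMatchObject({ code: "23505" })&lt;/code&gt; instead of a message regex. Treat that as one option, not a requirement.&lt;/p&gt;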

&lt;p&gt;The same idea applies to API-level testing. Instead of testing only the repository function, you can test an Express route connected to the real database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;supertest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Pool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@testcontainers/postgresql&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vitest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Email and password are required&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existingUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT id FROM users WHERE email = $1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existingUser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;409&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Email already registered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`
          INSERT INTO users (email, password_hash)
          VALUES ($1, $2)
          RETURNING id, email
        `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// Placeholder only: a real route would hash the password (for example with bcrypt or argon2).&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`hashed-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Internal server error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST /users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;ReturnType&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;createApp&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres:16-alpine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nx"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getHost&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUsername&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPassword&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;registers a new user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;new-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;secure-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeDefined&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;new-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;returns conflict for duplicate email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;same-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;first-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;same-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;second-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;409&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Email already registered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a more realistic safety check for AI-generated backend code. If an assistant changes the route, modifies the SQL query, renames a column, removes the duplicate check, or mishandles an error path, this test has a much better chance of catching the issue than a mock-based test.&lt;/p&gt;

&lt;p&gt;Testcontainers also works well with browser testing. Cypress and Playwright are often used to test the full user experience, but those tests are only as reliable as the environment behind them. Cypress maintains Docker images with the required dependencies for running Cypress in Docker, and its CI documentation covers Docker images, caching, parallel execution, and environment configuration. Playwright also provides Docker guidance, including images that contain browser system dependencies for running tests in containerized environments.&lt;/p&gt;

&lt;p&gt;A useful pattern is to let Testcontainers provide the backend dependency while Playwright or Cypress validates the user flow. For example, a registration flow can use a real PostgreSQL container, a real API server, and a real browser test. This gives you confidence that the user interface, HTTP layer, validation logic, database query, and persistence behavior all work together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@testcontainers/postgresql&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Hypothetical helper module; adjust the path to match your project.&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;startApplicationForTests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stopApplicationForTests&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./test-helpers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StartedPostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PostgreSqlContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres:16-alpine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;startApplicationForTests&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getHost&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUsername&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPassword&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;stopApplicationForTests&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;a user can register and see the profile page&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/register`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[name='email']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;playwright-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[name='password']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;secure-password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;button[type='submit']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[data-testid='profile-email']&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toHaveText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;playwright-user@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;startApplicationForTests&lt;/code&gt; function depends on your project structure, but the principle is straightforward. Start the dependency first, pass its runtime connection details into the app, then run the browser test against the real stack.&lt;/p&gt;
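
&lt;p&gt;As a rough sketch only, a helper of this kind could look like the following. It assumes a plain Node.js HTTP server; the &lt;code&gt;TestAppConfig&lt;/code&gt; type and the stand-in request handler are hypothetical, and a real project would build its own app and database pool from the config instead.&lt;/p&gt;

```typescript
import { createServer, type Server } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical config shape; it mirrors the object passed in the Playwright example.
interface TestAppConfig {
  database: { host: string; port: number; name: string; user: string; password: string };
}

let server: Server | undefined;

// Starts the app on an ephemeral port and resolves with its base URL.
export async function startApplicationForTests(config: TestAppConfig) {
  // Stand-in handler: a real project would create its app and database pool
  // from config.database here instead of echoing the host back.
  const app = createServer((_req, res) => {
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ databaseHost: config.database.host }));
  });
  server = app;

  // Port 0 asks the OS for any free port, which avoids clashes between test files.
  await new Promise((resolve) => app.listen(0, "127.0.0.1", () => resolve(undefined)));
  const { port } = app.address() as AddressInfo;
  return `http://127.0.0.1:${port}`;
}

// Stops the server started above; safe to call when nothing is running.
export async function stopApplicationForTests() {
  const current = server;
  server = undefined;
  if (current) {
    await new Promise((resolve) => {
      current.close(() => resolve(undefined));
      // Drop keep-alive sockets so close() can finish promptly.
      current.closeAllConnections();
    });
  }
}
```

&lt;p&gt;Listening on port 0 and returning the resolved URL keeps the browser test independent of any fixed port, in the same way the container exposes a mapped port rather than a hard-coded one.&lt;/p&gt;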

&lt;p&gt;This pattern is especially valuable when AI coding tools are changing frontend and backend code together. A generated form update may look correct in the browser, but it might send a payload that no longer matches the API. A generated API route may compile, but it might break database constraints. A generated repository method may pass unit tests, but fail against PostgreSQL because of an incorrect column name. Real dependency testing helps catch these integration gaps.&lt;/p&gt;

&lt;p&gt;Testcontainers is not only for PostgreSQL. Testcontainers for Node.js ships modules for databases and services such as MongoDB, Redis, and LocalStack, and it supports generic containers for custom services. The official getting started guide demonstrates using PostgreSQL for Node.js tests, while the broader project describes Testcontainers as a way to test with the same kinds of services used in production instead of relying on mocks or in-memory replacements.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GenericContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Wait&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;testcontainers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GenericContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-company/search-service:test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withExposedPorts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withEnvironment&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withWaitStrategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forHttp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/health&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchServiceUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`http://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getHost&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMappedPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Readiness checks are important. A container being “started” does not always mean the service inside it is ready to accept requests. Waiting for an HTTP endpoint, a log message, or a health check can prevent flaky tests that fail only because the test ran too early.&lt;/p&gt;
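
&lt;p&gt;To make the idea concrete, here is a minimal, hand-rolled sketch of what waiting for an HTTP endpoint amounts to. It is not how Testcontainers implements its wait strategies; the &lt;code&gt;waitForHttpOk&lt;/code&gt; name and its default timings are assumptions for illustration, and it relies on the global &lt;code&gt;fetch&lt;/code&gt; available in Node.js 18 and later.&lt;/p&gt;

```typescript
// Polls a URL until it answers with a 2xx status or the deadline passes.
// Hypothetical helper for illustration; the timings are arbitrary defaults.
async function waitForHttpOk(url: string, timeoutMs = 5_000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;

  while (deadline > Date.now()) {
    try {
      const response = await fetch(url);
      if (response.ok) return; // The service answered with 2xx: it is ready.
    } catch {
      // Connection refused or reset: the service is not listening yet, so retry.
    }
    // Pause before the next attempt so the loop does not spin.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`Service at ${url} was not ready within ${timeoutMs} ms`);
}
```

&lt;p&gt;In most cases the built-in wait strategies are the better choice, because they run as part of container startup. A loop like this is mainly useful for services started outside Testcontainers.&lt;/p&gt;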

&lt;p&gt;There are trade-offs. Testcontainers-based tests are slower than unit tests. A PostgreSQL container may take a few seconds to start, especially on the first run when Docker needs to pull the image. These tests also require Docker to be available locally and in CI. That is why Testcontainers should not replace your unit test suite. The best approach is layered testing: keep fast unit tests for pure functions and isolated business logic, and use Testcontainers for integration points where real dependency behavior matters.&lt;/p&gt;

&lt;p&gt;In practice, you can keep performance reasonable by starting containers once per test file, cleaning data between tests, and avoiding unnecessary container restarts. Depending on your application, you can truncate tables, wrap each test in a transaction that is rolled back, or create isolated schemas. Recent Testcontainers for Node.js releases also continue to improve operational behavior. The 11.14.0 release added auto cleanup control for containers and compose environments, along with support for running in parallel for distinct UIDs.&lt;/p&gt;
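
&lt;p&gt;One way to clean data between tests is a small truncate helper, sketched below. The &lt;code&gt;Queryable&lt;/code&gt; interface and the &lt;code&gt;resetTables&lt;/code&gt; name are hypothetical; the interface is kept deliberately small so that a &lt;code&gt;pg&lt;/code&gt; pool should satisfy it, and the SQL assumes PostgreSQL.&lt;/p&gt;

```typescript
// Minimal structural type (an assumption for this sketch): anything with a
// query(sql) method returning a promise, such as a pg Pool or Client.
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Empties the given tables in a single statement so every test starts clean.
async function resetTables(db: Queryable, tables: string[]) {
  if (tables.length === 0) return;

  // Quote identifiers so reserved or mixed-case names work; double any embedded quotes.
  const quoted = tables.map((name) => `"${name.replace(/"/g, '""')}"`).join(", ");

  // RESTART IDENTITY resets SERIAL counters; CASCADE also empties tables
  // that reference these ones through foreign keys.
  await db.query(`TRUNCATE TABLE ${quoted} RESTART IDENTITY CASCADE`);
}
```

&lt;p&gt;In the registration test above, this could run from the test runner's &lt;code&gt;beforeEach&lt;/code&gt; hook, for example &lt;code&gt;beforeEach(() =&gt; resetTables(pool, ["users"]))&lt;/code&gt;, so the container stays up while each test sees an empty table.&lt;/p&gt;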

&lt;p&gt;A simple GitHub Actions setup is usually enough because hosted Ubuntu runners already support Docker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Integration Tests&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run unit and integration tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main requirement is that your CI environment can run Docker containers. Once that is available, your tests can create real dependencies on demand without maintaining long-lived shared test databases.&lt;/p&gt;

&lt;p&gt;The most important mindset shift is this: mocks are not bad, but they are not enough. They are great for speed, edge cases, and isolated logic. They are weak when the risk lives in the contract between your application and a real dependency. AI-generated code increases that risk because it can produce code that looks reasonable while subtly misunderstanding the database schema, query behavior, or runtime environment.&lt;/p&gt;

&lt;p&gt;Testcontainers gives TypeScript teams a practical way to validate those boundaries. It lets you test Node.js APIs against real databases, run browser flows against realistic backends, verify migrations, check queue or cache behavior, and build more trustworthy CI pipelines. For teams adopting AI-assisted development, that confidence layer becomes even more valuable.&lt;/p&gt;

&lt;p&gt;The goal is not to test everything with Docker. The goal is to stop pretending that a mock database proves your application works with a real one. Start with one important flow, such as registration, checkout, booking, authentication, or report generation. Replace the mock-heavy integration test with a Testcontainers-backed test. Run it locally. Add it to CI. Then expand only where the added confidence is worth the extra runtime.&lt;/p&gt;

&lt;p&gt;As AI tools make code generation faster, our validation systems need to become more grounded. Testcontainers is one of the most practical ways to bring that grounding into a modern TypeScript and Node.js workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>node</category>
      <category>cypress</category>
    </item>
    <item>
      <title>Your AI Agent Dockerfile Might Be Leaking Secrets</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Sun, 10 May 2026 17:12:55 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/your-ai-agent-dockerfile-might-be-leaking-secrets-2cei</link>
      <guid>https://dev.to/raju_dandigam/your-ai-agent-dockerfile-might-be-leaking-secrets-2cei</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Dockerfiles are often treated as boring infrastructure files. We copy a working example, adjust a few commands, install dependencies, and move on. That is understandable, but it is also where many security mistakes begin.&lt;/p&gt;

&lt;p&gt;This risk becomes more important when we build AI-enabled Node.js applications. A modern AI app may depend on private npm packages, internal SDKs, GitHub repositories, model provider credentials, MCP server configuration, or private build-time assets. If we are not careful, tokens used during the Docker build can accidentally become part of the image history, image layers, build logs, or final runtime environment.&lt;/p&gt;

&lt;p&gt;Docker Build Secrets solve one specific problem: passing sensitive values to the build process without baking them into the final image. Docker's documentation is clear that build arguments and environment variables are not appropriate for secrets because they can persist in the final image, while secret mounts and SSH mounts are designed for securely exposing sensitive data only during a build step.&lt;/p&gt;

&lt;p&gt;This article focuses on the practical Node.js and AI-agent case: installing private packages, accessing private repositories, and avoiding the common mistake of treating API keys as normal Dockerfile variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Common Mistake
&lt;/h2&gt;

&lt;p&gt;A common Dockerfile pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:22-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; NPM_TOKEN&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NPM_TOKEN=$NPM_TOKEN&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm config &lt;span class="nb"&gt;set&lt;/span&gt; //registry.npmjs.org/:_authToken&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$NPM_TOKEN&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm ci

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, this looks reasonable. The build needs an npm token to install private packages, so the token is passed as an argument and used during &lt;code&gt;npm ci&lt;/code&gt;.&lt;/p&gt;

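&lt;p&gt;The problem is that &lt;code&gt;ARG&lt;/code&gt; and &lt;code&gt;ENV&lt;/code&gt; were not designed for secrets. An &lt;code&gt;ENV&lt;/code&gt; value is stored in the image configuration, so anyone who can pull the image can read it with &lt;code&gt;docker inspect&lt;/code&gt;. An &lt;code&gt;ARG&lt;/code&gt; value can show up in &lt;code&gt;docker history&lt;/code&gt; and in build logs. In this example the &lt;code&gt;npm config set&lt;/code&gt; step also writes the token to an &lt;code&gt;.npmrc&lt;/code&gt; file that stays in that layer. Even if the final container runs fine, the image now carries more information than intended.&lt;/p&gt;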

&lt;p&gt;This gets worse when developers use the same pattern for AI credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; OPENAI_API_KEY&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; OPENAI_API_KEY=$OPENAI_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is usually the wrong place for a model provider key. An OpenAI key, Anthropic key, GitHub token, or MCP server credential should normally be a runtime secret, not a build-time value. The build process usually does not need it. The running application does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Apps Make This Easier to Get Wrong
&lt;/h2&gt;

&lt;p&gt;AI applications often blur the boundary between build time and runtime. A regular Node.js API may only need dependencies during build and database credentials during runtime. An AI-agent application may also need tool credentials, private package access, GitHub access, prompt assets, evaluation data, and model provider keys.&lt;/p&gt;

&lt;p&gt;That complexity leads to shortcuts. A developer may add a token to the Dockerfile just to make the build pass. An AI coding assistant may generate a Dockerfile that uses &lt;code&gt;ARG&lt;/code&gt; because it looks simple. A CI workflow may pass secrets directly into build arguments because it is easy to wire up.&lt;/p&gt;

&lt;p&gt;The safer habit is to ask one question before adding any secret to a Docker build: &lt;strong&gt;does this value need to exist while building the image, or only when running the container?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the secret is needed to install a private npm package, clone a private repository, or download a private build asset, it may be a build secret. If the secret is needed to call a model provider, connect to a database, access an MCP tool, or call an external API at runtime, it should be passed when the container runs.&lt;/p&gt;
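
&lt;p&gt;One way to make that question concrete is to record, for each secret, the phase that needs it and derive the Docker flag from that. The sketch below is a hypothetical helper, not part of Docker or npm: the &lt;code&gt;SecretSpec&lt;/code&gt; type and &lt;code&gt;dockerFlagFor&lt;/code&gt; function are names invented for illustration. The flags it prints are the standard &lt;code&gt;docker build --secret&lt;/code&gt; and &lt;code&gt;docker run -e&lt;/code&gt; options.&lt;/p&gt;

```typescript
// Hypothetical helper: the type and function names are illustrative only.
type SecretPhase = "build" | "runtime";

type SecretSpec = {
  envName: string; // host environment variable that holds the value
  phase: SecretPhase; // when the value is actually needed
};

// Build-time secrets become `docker build --secret` flags.
// Runtime secrets become `docker run -e NAME` flags, which forward the host value.
function dockerFlagFor(spec: SecretSpec): string {
  if (spec.phase === "build") {
    const id = spec.envName.toLowerCase();
    return `--secret id=${id},env=${spec.envName}`;
  }
  return `-e ${spec.envName}`;
}

const secrets: SecretSpec[] = [
  { envName: "NPM_TOKEN", phase: "build" },
  { envName: "OPENAI_API_KEY", phase: "runtime" },
];

const buildFlags = secrets.filter((s) => s.phase === "build").map(dockerFlagFor);
const runFlags = secrets.filter((s) => s.phase === "runtime").map(dockerFlagFor);

console.log("docker build", buildFlags.join(" "), "-t ai-agent-api:local .");
console.log("docker run", runFlags.join(" "), "ai-agent-api:local");
```

&lt;p&gt;Neither command line contains a secret value. Both flags only name a host environment variable, so the value stays out of shell history and CI logs.&lt;/p&gt;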

&lt;h2&gt;
  
  
  The Safer Pattern: Build Secrets
&lt;/h2&gt;

&lt;p&gt;Docker BuildKit supports secret mounts. A secret mount exposes a value as a temporary file during a specific &lt;code&gt;RUN&lt;/code&gt; instruction. By default, Docker mounts secrets under &lt;code&gt;/run/secrets&lt;/code&gt;, and the secret is not automatically copied into the final image unless your command explicitly writes it somewhere permanent. Docker describes this as a two-step process: pass the secret into &lt;code&gt;docker build&lt;/code&gt;, then consume it inside the Dockerfile using a secret mount.&lt;/p&gt;

&lt;p&gt;Here is a safer version for installing private npm packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# syntax=docker/dockerfile:1.7&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;secret,id&lt;span class="o"&gt;=&lt;/span&gt;npm_token &lt;span class="se"&gt;\
&lt;/span&gt;  npm config &lt;span class="nb"&gt;set&lt;/span&gt; //registry.npmjs.org/:_authToken&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /run/secrets/npm_token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm ci &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm config delete //registry.npmjs.org/:_authToken

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runtime&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/dist ./dist&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/package*.json ./&lt;/span&gt;

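&lt;span class="c"&gt;# Private packages are production dependencies too, so this install also needs the token.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;secret,id&lt;span class="o"&gt;=&lt;/span&gt;npm_token &lt;span class="se"&gt;\
&lt;/span&gt;  npm config &lt;span class="nb"&gt;set&lt;/span&gt; //registry.npmjs.org/:_authToken&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /run/secrets/npm_token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm config delete //registry.npmjs.org/:_authToken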

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/index.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then build the image like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;npm_token,env&lt;span class="o"&gt;=&lt;/span&gt;NPM_TOKEN &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; ai-agent-api:local &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the npm token is available only while a &lt;code&gt;RUN&lt;/code&gt; instruction that mounts it is executing. It is not declared with &lt;code&gt;ARG&lt;/code&gt;, not promoted to &lt;code&gt;ENV&lt;/code&gt;, and not stored in the runtime image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture in One View
&lt;/h2&gt;

&lt;p&gt;The important distinction is that &lt;strong&gt;build secrets&lt;/strong&gt; and &lt;strong&gt;runtime secrets&lt;/strong&gt; solve different problems. Build secrets help the image build safely. Runtime secrets help the container run safely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgszxpfeshntnhse2b7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgszxpfeshntnhse2b7z.png" alt="architecture flow" width="800" height="1060"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Actions Example
&lt;/h2&gt;

&lt;p&gt;Docker also documents secret mounts and SSH mounts for GitHub Actions builds. Secret mounts expose values as files for the duration of a build step, while SSH mounts expose an SSH agent socket or keys for operations such as cloning private repositories.&lt;/p&gt;

&lt;p&gt;Here is a simple GitHub Actions workflow using Docker's Build Push Action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Docker Image&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;docker-build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v6&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-agent-api:ci&lt;/span&gt;
          &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;npm_token=${{ secrets.NPM_TOKEN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The matching Dockerfile can read the secret as &lt;code&gt;/run/secrets/npm_token&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;secret,id&lt;span class="o"&gt;=&lt;/span&gt;npm_token &lt;span class="se"&gt;\
&lt;/span&gt;  npm config &lt;span class="nb"&gt;set&lt;/span&gt; //registry.npmjs.org/:_authToken&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /run/secrets/npm_token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm ci &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm config delete //registry.npmjs.org/:_authToken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is much safer than passing the npm token as a build argument, because the value never becomes part of the image's layers or metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About SSH Keys?
&lt;/h2&gt;

&lt;p&gt;Sometimes the build needs to pull code from a private Git repository. For that, SSH mounts are usually a better fit than copying a private key into the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# syntax=docker/dockerfile:1.7&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; git openssh-client &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

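&lt;span class="c"&gt;# Record GitHub's host key so the non-interactive clone passes host key verification.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 0700 ~/.ssh &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ssh-keyscan github.com &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.ssh/known_hosts

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ssh &lt;span class="se"&gt;\
&lt;/span&gt;  git clone git@github.com:your-org/private-agent-tools.git tools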
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build it with SSH forwarding enabled. This assumes an SSH agent is running on the host with the relevant key loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;--ssh&lt;/span&gt; default &lt;span class="nt"&gt;-t&lt;/span&gt; ai-agent-api:local &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SSH key is not copied into the image. The build step gets temporary access through the SSH mount.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Should Not Be a Build Secret
&lt;/h2&gt;

&lt;p&gt;Not every secret belongs in &lt;code&gt;docker build --secret&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model provider keys are usually runtime secrets.&lt;/strong&gt; If your Node.js application calls a model API when it runs, pass the key at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ai-agent-api:local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For local development, Docker Compose can read values from your shell environment or from a git-ignored &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-agent-api:local&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
      &lt;span class="na"&gt;MCP_GITHUB_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${MCP_GITHUB_TOKEN}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, use your platform's secret manager. That may be AWS Secrets Manager, Kubernetes Secrets, Docker Swarm secrets, GitHub environment secrets, or another managed secret store. The key idea is the same: &lt;strong&gt;runtime credentials should be provided to the running container, not baked into the image.&lt;/strong&gt;&lt;/p&gt;
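
&lt;p&gt;On the application side, it helps to check for those runtime credentials once at startup and fail fast if any are missing. The following is a minimal sketch under stated assumptions: &lt;code&gt;requireSecrets&lt;/code&gt; is a hypothetical helper, and the variable names are simply the ones used in the Compose example. It takes the environment as a parameter so it stays easy to test.&lt;/p&gt;

```typescript
// Sketch only: `requireSecrets` and the variable names are illustrative assumptions.
type Env = { [name: string]: string | undefined };

// Returns the requested values, or throws one error naming every missing variable.
// Values are never placed in the error message, so nothing sensitive is logged.
function requireSecrets(env: Env, names: string[]): { [name: string]: string } {
  const found: { [name: string]: string } = {};
  const missing: string[] = [];
  for (const name of names) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      found[name] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing runtime secrets: ${missing.join(", ")}`);
  }
  return found;
}

// In a real service this would be called with `process.env` during startup.
const demoEnv: Env = { OPENAI_API_KEY: "example-value", MCP_GITHUB_TOKEN: "" };
try {
  requireSecrets(demoEnv, ["OPENAI_API_KEY", "MCP_GITHUB_TOKEN"]);
} catch (error) {
  console.error((error as Error).message);
}
```

&lt;p&gt;Because the check reads only the process environment, the same image works whether the values come from &lt;code&gt;docker run -e&lt;/code&gt;, Compose, or a managed secret store.&lt;/p&gt;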

&lt;h2&gt;
  
  
  A Simple Checklist for Node.js AI Apps
&lt;/h2&gt;

&lt;p&gt;Before committing a Dockerfile for an AI application, review it with these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does the Dockerfile use &lt;code&gt;ARG&lt;/code&gt; or &lt;code&gt;ENV&lt;/code&gt; for anything that looks like a token, key, password, or credential?&lt;/li&gt;
&lt;li&gt;Does the build need the secret, or does only the running app need it?&lt;/li&gt;
&lt;li&gt;Are private npm tokens passed through &lt;code&gt;--secret&lt;/code&gt; instead of &lt;code&gt;ARG&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;Are SSH keys forwarded through &lt;code&gt;--ssh&lt;/code&gt; instead of copied?&lt;/li&gt;
&lt;li&gt;Does the final runtime image avoid &lt;code&gt;.npmrc&lt;/code&gt;, private keys, local &lt;code&gt;.env&lt;/code&gt; files, and unnecessary build artifacts?&lt;/li&gt;
&lt;li&gt;Is &lt;code&gt;.dockerignore&lt;/code&gt; excluding files such as &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, &lt;code&gt;.git&lt;/code&gt;, logs, coverage output, and local test data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A basic &lt;code&gt;.dockerignore&lt;/code&gt; should usually include these files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;.&lt;span class="n"&gt;env&lt;/span&gt;
.&lt;span class="n"&gt;env&lt;/span&gt;.*
.&lt;span class="n"&gt;npmrc&lt;/span&gt;
.&lt;span class="n"&gt;git&lt;/span&gt;
&lt;span class="n"&gt;node_modules&lt;/span&gt;
&lt;span class="n"&gt;coverage&lt;/span&gt;
&lt;span class="n"&gt;dist&lt;/span&gt;
*.&lt;span class="n"&gt;log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be careful with &lt;code&gt;dist&lt;/code&gt; if your build process expects it from the host. In most production Docker builds, the image should build its own &lt;code&gt;dist&lt;/code&gt; output inside the container.&lt;/p&gt;
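
&lt;p&gt;The first checklist question can be partly automated. Here is a rough sketch of a name-based check that could run in CI. The function name &lt;code&gt;findSecretLikeVars&lt;/code&gt; and the keyword list are assumptions made for illustration. It looks only at the first variable on each &lt;code&gt;ARG&lt;/code&gt; or &lt;code&gt;ENV&lt;/code&gt; line and ignores line continuations, so treat it as a starting point, not as a replacement for a dedicated scanner.&lt;/p&gt;

```typescript
// Sketch only: the function name and the keyword list are illustrative assumptions.
const SECRET_HINT = /(token|secret|password|passwd|credential|api_?key|private_?key)/i;

type Finding = { line: number; instruction: string; name: string };

// Flags ARG and ENV instructions whose variable name looks like a credential.
// It checks names only; it cannot tell whether a value is actually sensitive.
function findSecretLikeVars(dockerfile: string): Finding[] {
  const findings: Finding[] = [];
  const lines = dockerfile.split(/\r?\n/);
  lines.forEach((text, index) => {
    const match = /^\s*(ARG|ENV)\s+([A-Za-z_][A-Za-z0-9_]*)/i.exec(text);
    if (match === null) {
      return;
    }
    const instruction = (match[1] ?? "").toUpperCase();
    const name = match[2] ?? "";
    if (SECRET_HINT.test(name)) {
      findings.push({ line: index + 1, instruction, name });
    }
  });
  return findings;
}

const sample = [
  "FROM node:22-slim",
  "ARG NPM_TOKEN",
  "ENV NPM_TOKEN=$NPM_TOKEN",
  "ENV NODE_ENV=production",
].join("\n");

for (const finding of findSecretLikeVars(sample)) {
  console.log(`line ${finding.line}: ${finding.instruction} ${finding.name}`);
}
```

&lt;p&gt;On the sample above, the check reports the &lt;code&gt;ARG&lt;/code&gt; and &lt;code&gt;ENV&lt;/code&gt; lines for &lt;code&gt;NPM_TOKEN&lt;/code&gt; and leaves &lt;code&gt;NODE_ENV&lt;/code&gt; alone.&lt;/p&gt;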

&lt;h2&gt;
  
  
  How to Verify You Did Not Leak Something Obvious
&lt;/h2&gt;

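&lt;p&gt;You can inspect the image history. The &lt;code&gt;--no-trunc&lt;/code&gt; flag shows the full command for each layer, which is where a leaked build argument would appear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;history&lt;/span&gt; &lt;span class="nt"&gt;--no-trunc&lt;/span&gt; ai-agent-api:local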
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also run a quick scan of the image filesystem for strings that look like API keys, for example the &lt;code&gt;sk-&lt;/code&gt; prefix some providers use. Listing only the matching file names keeps any secret values out of your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; ai-agent-api:local sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"grep -rIlE 'sk-[A-Za-z0-9_-]{20,}' /app || true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command is not a full security scanner, but it can catch obvious mistakes. For serious workflows, use dedicated secret scanning and image scanning tools in CI.&lt;/p&gt;

&lt;p&gt;This is not theoretical. A 2023 internet-wide study of container images found leaked private keys and API secrets in images from both public and private registries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Docker Build Secrets are not complicated, but they require a clear mental model.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;build secrets&lt;/strong&gt; when the build process needs temporary access to sensitive data, such as private npm packages or private source repositories. Use &lt;strong&gt;runtime secrets&lt;/strong&gt; when the running application needs credentials, such as OpenAI keys, GitHub tokens, database passwords, or MCP server credentials.&lt;/p&gt;

&lt;p&gt;For AI-agent applications, this distinction matters even more. Agents often connect to powerful tools and sensitive systems. A leaked token can expose private repositories, model usage, customer data, internal APIs, or deployment workflows.&lt;/p&gt;

&lt;p&gt;The safer pattern is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not put secrets in &lt;code&gt;ARG&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Do not promote them to &lt;code&gt;ENV&lt;/code&gt; inside the Dockerfile&lt;/li&gt;
&lt;li&gt;Do not copy &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;.npmrc&lt;/code&gt; into the image&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;RUN --mount=type=secret&lt;/code&gt; for build-time secrets&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--mount=type=ssh&lt;/code&gt; for private Git access&lt;/li&gt;
&lt;li&gt;Pass runtime credentials through your runtime environment or secret manager&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your Dockerfile is part of your application's security boundary. Treat it that way, especially when the application is powered by AI and connected to real tools.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>node</category>
      <category>devops</category>
    </item>
    <item>
      <title>MCP Without the Setup Pain: Using Docker MCP Toolkit with TypeScript Agents</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Fri, 08 May 2026 15:00:06 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/mcp-without-the-setup-pain-using-docker-mcp-toolkit-with-typescript-agents-44he</link>
      <guid>https://dev.to/raju_dandigam/mcp-without-the-setup-pain-using-docker-mcp-toolkit-with-typescript-agents-44he</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol, usually called MCP, has quickly become one of the most important ideas in AI application development. It gives AI tools and agents a standard way to connect to external systems such as filesystems, GitHub, databases, browsers, documentation, and internal APIs.&lt;/p&gt;

&lt;p&gt;The protocol is useful because it gives agents a common tool interface. Instead of every AI application inventing its own way to call tools, MCP creates a shared pattern for exposing capabilities.&lt;/p&gt;

&lt;p&gt;However, the protocol is only one part of the story. The real pain starts when developers need to run multiple MCP servers locally. One server may need Node.js, another may need Python, another may need browser dependencies, and another may need OAuth or API keys. Suddenly, your agent is not just an AI workflow. It is a small distributed system running on your laptop.&lt;/p&gt;

&lt;p&gt;Docker MCP Toolkit tries to solve that operational problem. It does not replace MCP, and it does not make your agent intelligent by itself. Its value is simpler and more practical: it helps you discover, configure, run, and manage MCP servers as containerized tools through Docker Desktop and the Docker MCP Gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real MCP Problem Is Setup
&lt;/h2&gt;

&lt;p&gt;A TypeScript agent may look simple at first. It receives a user request, asks an LLM what to do next, and then calls tools. But those tools need to run somewhere.&lt;/p&gt;

&lt;p&gt;Imagine a code-review agent that needs three capabilities. It needs GitHub access to read pull request metadata. It needs filesystem access to inspect local files. It needs Playwright access to open a preview deployment and check whether the application still works.&lt;/p&gt;

&lt;p&gt;Without Docker, each tool may come with a different setup process. You may need to install Node.js packages for one server, Python packages for another server, browser dependencies for Playwright, and local credentials for each integration. That might be acceptable for one developer on one machine. It becomes painful when a second developer joins, when the setup moves to CI, or when the team needs consistent tool versions.&lt;/p&gt;

&lt;p&gt;This is the same problem Docker has always been good at solving. A tool should bring its runtime and dependencies with it. Developers should not need to manually reproduce a long setup document just to run the same agent workflow.&lt;/p&gt;

&lt;p&gt;Docker's MCP documentation describes the Toolkit as a Docker Desktop management interface for setting up, managing, and running containerized MCP servers in profiles and connecting them to AI agents. It also highlights profile-based organization, integrated tool discovery, and zero manual setup as key features.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Docker MCP Toolkit Actually Does
&lt;/h2&gt;

&lt;p&gt;Docker MCP Toolkit sits between your AI client and your MCP servers. The AI client might be Claude Desktop, Cursor, VS Code, Docker AI Agent, or your own local TypeScript agent. The MCP servers are the tools that perform actions.&lt;/p&gt;

&lt;p&gt;The Toolkit helps with the operational layer. It lets you browse MCP servers from Docker's MCP Catalog, add servers to profiles, connect clients, and run those servers through the Docker MCP Gateway. Docker's MCP Catalog documentation says the catalog contains more than 300 verified MCP servers packaged as container images with versioning, provenance, and security updates.&lt;/p&gt;

&lt;p&gt;That packaging matters. A containerized MCP server can include the runtime it needs, the dependencies it needs, and a more predictable execution environment. The Docker MCP Gateway then manages the server lifecycle. Docker's gateway documentation explains that when an AI application needs a tool, the gateway identifies the correct server, starts it as a Docker container if needed, injects required credentials, applies security restrictions, forwards the request, and returns the result.&lt;/p&gt;

&lt;p&gt;The important point is that your agent does not need to know how every MCP server is installed. It only needs to connect through the gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here is the architecture in one view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9eau1r1o424rtti78dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9eau1r1o424rtti78dj.png" alt=" " width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The profile defines which servers are available for a workflow. For example, a frontend development profile might include GitHub, filesystem, Playwright, and documentation search. A backend profile might include GitHub, PostgreSQL, Redis, and observability tools.&lt;/p&gt;

&lt;p&gt;Docker's profile documentation says profiles organize servers into named collections for different projects, and different AI applications can connect to different profiles. It also notes that profiles can be shared with teams through OCI-compliant registries, while credentials are not included in the shared profile for security reasons.&lt;/p&gt;

&lt;p&gt;That gives teams a cleaner model. The profile defines the approved toolset. Each developer configures their own credentials. The agent connects to the profile instead of a random collection of local scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Docker MCP Toolkit
&lt;/h2&gt;

&lt;p&gt;The easiest path is through Docker Desktop. In current Docker documentation, Docker recommends using the MCP Toolkit interface in Docker Desktop, especially for discovery and profile management. The get-started guide explains that you can create a profile from the Profiles tab, browse servers from the Catalog tab, add them to the profile, and connect supported clients from the Clients tab.&lt;/p&gt;

&lt;p&gt;A simple setup flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Docker Desktop&lt;/li&gt;
&lt;li&gt;Select MCP Toolkit&lt;/li&gt;
&lt;li&gt;Create a profile named &lt;code&gt;frontend-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add GitHub, filesystem, and Playwright servers from the Catalog tab&lt;/li&gt;
&lt;li&gt;Configure required credentials or OAuth permissions&lt;/li&gt;
&lt;li&gt;Connect your AI client to the profile&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For clients that are not directly listed in Docker Desktop, Docker documents a manual stdio configuration using the gateway command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"MCP_DOCKER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gateway"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--profile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"frontend-agent"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a useful pattern because many MCP clients support launching a local MCP server process over stdio. In this case, the process is the Docker MCP Gateway, and the gateway manages the actual MCP server containers behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple TypeScript Agent Example
&lt;/h2&gt;

&lt;p&gt;The MCP client SDK APIs may vary based on transport and package version, so the example below is intentionally simple. The goal is to show the application shape, not hide the article behind too much SDK boilerplate.&lt;/p&gt;

&lt;p&gt;A TypeScript agent using MCP tools usually follows this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ToolCall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ToolResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;callMcpTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToolCall&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ToolResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// In a real application, this call goes through your MCP client transport.&lt;/span&gt;
  &lt;span class="c1"&gt;// Docker MCP Gateway handles routing to the correct containerized server.&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Calling MCP tool: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Result from &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;reviewPullRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prDetails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callMcpTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github.get_pull_request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prUrl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;changedFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callMcpTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github.list_changed_files&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prUrl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;packageJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callMcpTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;filesystem.read_file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/workspace/package.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;prDetails&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;changedFiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;changedFiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;packageJson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;packageJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;reviewPullRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://github.com/example/app/pull/42&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a real implementation, &lt;code&gt;callMcpTool&lt;/code&gt; would use an MCP client transport connected to the Docker MCP Gateway. The gateway would route &lt;code&gt;github.*&lt;/code&gt; calls to the GitHub MCP server container and &lt;code&gt;filesystem.*&lt;/code&gt; calls to the filesystem MCP server container.&lt;/p&gt;
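&lt;p&gt;To make the routing idea concrete, here is a minimal in-memory sketch. It is not how Docker MCP Gateway is implemented, and &lt;code&gt;createRouter&lt;/code&gt; and the two stand-in handlers are hypothetical names invented for this illustration. It only shows how a server can be chosen from the prefix of a tool name.&lt;/p&gt;

```typescript
// Hypothetical sketch: pick a handler from the prefix of a tool name,
// e.g. "github.get_pull_request" is handled by the "github" entry.
type ToolCall = { name: string; arguments: { [key: string]: unknown } };
type ToolResult = { content: string };
type ServerHandler = (tool: ToolCall) => ToolResult;

function createRouter(servers: { [prefix: string]: ServerHandler }) {
  const table = new Map(Object.entries(servers));

  return function route(tool: ToolCall): ToolResult {
    // Everything before the first dot names the server.
    const prefix = tool.name.split(".")[0] ?? "";
    const handler = table.get(prefix);
    if (handler === undefined) {
      throw new Error(`No MCP server registered for prefix "${prefix}"`);
    }
    return handler(tool);
  };
}

// Stand-in handlers. A real gateway would forward the call to a container.
const route = createRouter({
  github: (tool) => ({ content: `github handled ${tool.name}` }),
  filesystem: (tool) => ({ content: `filesystem handled ${tool.name}` })
});
```

&lt;p&gt;Calling &lt;code&gt;route&lt;/code&gt; with an unregistered prefix throws, which is usually easier to debug than a silent fallback.&lt;/p&gt;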

&lt;p&gt;The agent itself stays clean. It is not installing GitHub dependencies. It is not launching Playwright. It is not managing Python or Node runtimes for individual tool servers. It is asking for tools by name, and Docker handles the operational boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for TypeScript Agents
&lt;/h2&gt;

&lt;p&gt;TypeScript is a strong fit for agent applications because it helps define tool contracts, workflow state, structured outputs, and runtime validation. But TypeScript alone does not solve the environment problem. A typed tool call still fails if the MCP server is not installed correctly, if the browser dependency is missing, or if a credential is configured differently across machines.&lt;/p&gt;

&lt;p&gt;Docker MCP Toolkit makes the tool layer more repeatable. A team can agree that a specific profile is the standard development toolset. One developer can use it from Cursor. Another can connect it to Claude Desktop. A third can connect a custom TypeScript agent. The server collection stays consistent.&lt;/p&gt;

&lt;p&gt;This becomes more important as agents move beyond simple demos. A real code assistant may need repository access, issue tracker access, local file access, test execution, browser automation, and documentation search. Without a management layer, MCP server sprawl becomes a real problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Docker Helps Most
&lt;/h2&gt;

&lt;p&gt;Docker helps most when your agent needs more than one or two tools. If you are only testing a single local MCP server, manual setup may be fine. But if your workflow depends on several MCP servers, different runtimes, and credentials, the Docker approach becomes much more useful.&lt;/p&gt;

&lt;p&gt;It also helps when teams need consistency. A new developer should not need to install five runtimes and follow a long checklist before trying an agent workflow. The closer the setup gets to "pull the profile, configure credentials, connect the client," the easier it becomes to share.&lt;/p&gt;

&lt;p&gt;Docker also helps with security boundaries. MCP servers are powerful because they can touch real systems. That also makes them risky. A filesystem tool should not automatically access your entire machine. A browser tool should not have unlimited permissions. A GitHub tool should use scoped credentials. Running tools through a gateway and containerized servers does not remove all risk, but it gives teams a better place to apply isolation and control.&lt;/p&gt;

&lt;p&gt;The Docker MCP Gateway repository describes this gateway pattern as &lt;strong&gt;AI Client → MCP Gateway → MCP Servers&lt;/strong&gt;, with servers running as Docker containers and the gateway providing a unified interface, secrets handling, OAuth integration, and dynamic discovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Does Not Solve
&lt;/h2&gt;

&lt;p&gt;Docker MCP Toolkit is not magic. It does not make a weak agent design reliable. It does not decide which tool should be called. It does not validate every tool result for you. It does not remove the need for approval gates when an agent can modify files, open pull requests, deploy code, or touch production-like systems.&lt;/p&gt;
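&lt;p&gt;An approval gate does not have to be complicated. The sketch below assumes a hand-maintained list of tools that can change state; the tool names and the &lt;code&gt;decideToolCall&lt;/code&gt; helper are hypothetical and are not part of Docker MCP Toolkit or any MCP server. The agent would run this check before it forwards a call to the gateway.&lt;/p&gt;

```typescript
// Hypothetical sketch of an approval gate that runs before a tool call.
type Decision = "allow" | "needs_approval";

// Assumed names for tools that modify state. Adjust to your own servers.
const WRITE_TOOLS = ["github.create_pull_request", "filesystem.write_file"];

function decideToolCall(toolName: string, approvedByHuman: string[]): Decision {
  // Read-only tools pass straight through.
  if (!WRITE_TOOLS.includes(toolName)) {
    return "allow";
  }
  // Write tools run only after a human has approved that specific tool.
  return approvedByHuman.includes(toolName) ? "allow" : "needs_approval";
}
```

&lt;p&gt;Keeping the decision in a pure function makes it easy to unit test and to log, whatever the approval UI ends up being.&lt;/p&gt;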

&lt;p&gt;It also does not mean every MCP server is automatically safe. You still need to choose trusted servers, limit permissions, review tool access, and avoid giving broad credentials to experimental workflows. Docker's catalog and container packaging improve the operational story, but security still depends on how the tools are configured and what the agent is allowed to do.&lt;/p&gt;

&lt;p&gt;There is also a learning curve. Developers still need to understand MCP concepts such as clients, servers, tools, transports, and permissions. Docker simplifies the runtime and setup problem. It does not eliminate the need to design the agent workflow carefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Use Case
&lt;/h2&gt;

&lt;p&gt;A good first use case is a local code review assistant. Keep it simple. Give it access to GitHub for pull request metadata, filesystem access to the local repository, and Playwright access to a preview URL.&lt;/p&gt;

&lt;p&gt;The agent flow can be straightforward:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzmd7wjhq1ovy3o8jtqd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzmd7wjhq1ovy3o8jtqd.png" alt=" " width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is useful because it is realistic but still safe enough for a first experiment. The agent is not deploying anything. It is not merging code. It is gathering context and producing a review summary.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Docker MCP Toolkit
&lt;/h2&gt;

&lt;p&gt;Use Docker MCP Toolkit when you are building agents that need multiple external tools, when you want repeatable local setup across a team, or when you want MCP servers to run in isolated containers instead of directly on every developer machine.&lt;/p&gt;

&lt;p&gt;It is especially useful for TypeScript agent projects that combine GitHub, filesystem, browser automation, documentation search, databases, or cloud service tools. It is also useful when you want the same profile available across multiple AI clients.&lt;/p&gt;

&lt;p&gt;Skip it for very small experiments. If you are testing one MCP server for an hour, manual setup may be faster. Bring in Docker MCP Toolkit when the setup starts becoming part of the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP standardizes how agents talk to tools. Docker MCP Toolkit standardizes how those tools are discovered, configured, run, and shared.&lt;/p&gt;

&lt;p&gt;That distinction matters. The future of agent development is not only about better prompts or smarter models. It is also about safer and more repeatable tool execution. Agents become more useful when they can access real systems, but they become harder to manage when every tool brings its own runtime, secrets, permissions, and setup instructions.&lt;/p&gt;

&lt;p&gt;Docker MCP Toolkit gives TypeScript developers a practical way to manage that complexity. It lets teams create profiles, run MCP servers as containers, connect clients through a gateway, and reduce the dependency chaos that comes with multi-tool agents.&lt;/p&gt;

&lt;p&gt;For a small prototype, you may not need it. For a real agent workflow that depends on GitHub, files, browsers, databases, or internal tools, Docker MCP Toolkit can make MCP feel less like a pile of scripts and more like a manageable development platform.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>typescript</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Stop Burning API Credits While Building AI Apps: Run Local LLMs with Docker Model Runner</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Thu, 07 May 2026 16:24:35 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/stop-burning-api-credits-while-building-ai-apps-run-local-llms-with-docker-model-runner-510j</link>
      <guid>https://dev.to/raju_dandigam/stop-burning-api-credits-while-building-ai-apps-run-local-llms-with-docker-model-runner-510j</guid>
      <description>&lt;p&gt;Building AI features usually starts with a cloud API. That is the fastest path when you are experimenting with chat interfaces, summarization, classification, content generation, or agent workflows. You add an SDK, pass an API key, send a prompt, and get a response back.&lt;/p&gt;

&lt;p&gt;That simplicity is great, but during active development it can also become costly. Every prompt experiment, failed test, retry, debugging session, and local demo sends another request to a paid service. For one developer, the cost may be small. For a team building AI features every day, those calls can add up quickly. There is also another concern: not every development prompt should leave your machine, especially when you are testing with internal documents, customer-like data, logs, or proprietary examples.&lt;/p&gt;

&lt;p&gt;Docker Model Runner gives JavaScript developers another option. It lets you run AI models locally using Docker’s workflow and expose them through APIs that feel familiar to developers already using OpenAI-style clients. Docker describes Model Runner as a way to run and manage AI models locally, serve models through OpenAI and Ollama-compatible APIs, and package model files as OCI artifacts. That means AI models can start behaving more like other Docker-managed development dependencies.&lt;/p&gt;

&lt;p&gt;This does not mean local models replace cloud models for every use case. They usually do not. Cloud models are still the better choice for production workloads that need high-quality reasoning, scale, reliability, and the latest model capabilities. The practical point is narrower: local models fit development well, especially when you want fast iteration, predictable cost, and better control over data.&lt;/p&gt;

&lt;p&gt;Here is the workflow in one view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fayzxb3ttazekpqpcuavf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fayzxb3ttazekpqpcuavf.png" alt=" " width="800" height="1041"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The application code can stay almost the same. The main difference is configuration. In development, your OpenAI-compatible client points to Docker Model Runner. In production, it points to your cloud provider.&lt;/p&gt;

&lt;p&gt;Docker Model Runner is integrated with Docker Desktop and Docker Engine. Docker’s API reference shows that host processes can access the Model Runner API at &lt;code&gt;http://localhost:12434&lt;/code&gt;, while containers can access it through Docker networking patterns such as &lt;code&gt;model-runner.docker.internal:12434&lt;/code&gt; when configured through Compose.&lt;/p&gt;
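&lt;p&gt;If the same code runs both on the host and inside a container, the address choice can live in one small helper. This is a sketch under assumptions: the two addresses are the ones quoted in the paragraph above, the &lt;code&gt;/engines/llama.cpp/v1&lt;/code&gt; path is the one the client uses later in this article, and &lt;code&gt;modelRunnerBaseUrl&lt;/code&gt; is a hypothetical name. Check the addresses against Docker's documentation for your setup before relying on them.&lt;/p&gt;

```typescript
// Hypothetical helper: choose the Model Runner address by where the code runs.
type Runtime = "host" | "container";

// Addresses as quoted in this article. Verify them for your Docker setup.
const MODEL_RUNNER_HOSTS = {
  host: "localhost:12434",
  container: "model-runner.docker.internal:12434"
};

function modelRunnerBaseUrl(runtime: Runtime): string {
  // The path matches the OpenAI-compatible endpoint used later in the article.
  return `http://${MODEL_RUNNER_HOSTS[runtime]}/engines/llama.cpp/v1`;
}
```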

&lt;p&gt;Before writing code, enable Docker Model Runner in Docker Desktop if it is not already enabled. Then confirm the CLI is available.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can pull a model with &lt;code&gt;docker model pull&lt;/code&gt;. The exact model you choose depends on what is available in your Docker environment and what your machine can run comfortably.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model pull ai/llama3.2:3B-Q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After pulling a model, you can run a quick prompt from the command line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model run ai/llama3.2:3B-Q4_K_M &lt;span class="s2"&gt;"Explain Docker containers in one sentence."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already useful for quick experiments, but the real value for JavaScript developers comes from calling the local model from a Node.js app.&lt;/p&gt;

&lt;p&gt;Install the OpenAI SDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a small TypeScript helper that talks to the local Docker Model Runner endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;local-development-key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:12434/engines/llama.cpp/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/llama3.2:3B-Q4_K_M&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You summarize technical text clearly and briefly.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Summarize this text in three sentences:\n\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then call it from a simple script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generateSummary&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./generate-summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    Docker Model Runner lets developers run AI models locally and call them
    through familiar API formats. This can reduce development cost and keep
    sensitive experimentation data on the developer machine.
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Failed to generate summary:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The example itself matters less than the boundary it draws. Your application is not tightly coupled to one provider. It is coupled to an OpenAI-compatible interface, which means you can change the backend through configuration instead of code.&lt;/p&gt;

&lt;p&gt;In local development, you can use this environment configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;http://localhost:12434/engines/llama.cpp/v1&lt;/span&gt;
&lt;span class="py"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;local-development-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rest of your application does not need to change. This pattern is valuable because most AI application code should not care whether the model is running locally or remotely. It should care about the contract: send messages, receive a response, handle errors, and validate the output.&lt;/p&gt;
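&lt;p&gt;One way to keep that switch in a single place is a small configuration resolver. The sketch below is an assumption about how you might structure it, not something Docker or the OpenAI SDK prescribes. &lt;code&gt;resolveModelConfig&lt;/code&gt; and the &lt;code&gt;MODEL_NAME&lt;/code&gt; variable are hypothetical; the defaults are the local values used earlier in this article.&lt;/p&gt;

```typescript
// Hypothetical resolver: one place that decides which backend the app talks to.
type Env = { [key: string]: string | undefined };
type ModelConfig = { baseURL: string; apiKey: string; model: string };

const LOCAL_BASE_URL = "http://localhost:12434/engines/llama.cpp/v1";

function resolveModelConfig(env: Env): ModelConfig {
  const configuredUrl = env["OPENAI_BASE_URL"];
  const baseURL = configuredUrl ?? LOCAL_BASE_URL;

  // A placeholder key is only acceptable for the default local endpoint.
  const fallbackKey = configuredUrl === undefined ? "local-development-key" : "";
  const apiKey = env["OPENAI_API_KEY"] ?? fallbackKey;
  if (apiKey === "") {
    throw new Error("OPENAI_API_KEY is required when OPENAI_BASE_URL is set");
  }

  // MODEL_NAME is an assumed variable; cloud providers use different model ids.
  const model = env["MODEL_NAME"] ?? "ai/llama3.2:3B-Q4_K_M";
  return { baseURL, apiKey, model };
}
```

&lt;p&gt;In a Node.js app you would pass &lt;code&gt;process.env&lt;/code&gt; to the resolver once at startup and hand the result to the client constructor and to each request.&lt;/p&gt;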

&lt;p&gt;A practical use case for local models is development-time text processing. For example, imagine you are building an internal support tool that summarizes customer tickets before a human reads them. During development, you may run the same prompt hundreds of times while tuning the wording, testing edge cases, and adjusting the UI. A local model is a good fit for that stage because you are optimizing the workflow, not making final production-quality decisions.&lt;/p&gt;

&lt;p&gt;Here is a slightly more realistic example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Reuses the OpenAI `client` configured in the helper above.&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;TicketSummary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bug&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;account&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;other&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;summarizeTicket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ticketText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TicketSummary&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/llama3.2:3B-Q4_K_M&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Classify the support ticket and summarize it. Return only valid JSON.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ticketText&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;TicketSummary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;other&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The model returned an invalid response.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example is intentionally simple. In a real application, you would validate the response with a schema library such as Zod, add retries for invalid JSON, and log model behavior for debugging. The point is that Docker Model Runner lets you build and test this workflow locally without sending every prompt to a cloud API.&lt;/p&gt;
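
&lt;p&gt;As a rough sketch of that validation step, a hand-written type guard can stand in for a schema library. The &lt;code&gt;TicketSummary&lt;/code&gt; shape and the category list below are assumptions, since the example above only shows the &lt;code&gt;category&lt;/code&gt; and &lt;code&gt;summary&lt;/code&gt; fields, and &lt;code&gt;parseTicketSummary&lt;/code&gt; is a hypothetical helper rather than part of any SDK.&lt;/p&gt;

```typescript
// Assumed shape: only `category` and `summary` appear in the example above.
type TicketSummary = {
  category: string;
  summary: string;
};

// Hypothetical category list, for illustration only.
const ALLOWED_CATEGORIES = ["billing", "bug", "feature_request", "other"];

// Same fallback the example above returns on a parse failure.
const FALLBACK: TicketSummary = {
  category: "other",
  summary: "The model returned an invalid response."
};

// Narrow an unknown value to TicketSummary without trusting the model.
function isTicketSummary(value: unknown): value is TicketSummary {
  if (typeof value !== "object" || value === null) return false;
  const record = value as { [key: string]: unknown };
  const category = record["category"];
  if (typeof category !== "string") return false;
  if (typeof record["summary"] !== "string") return false;
  return ALLOWED_CATEGORIES.indexOf(category) !== -1;
}

// Parse raw model output; fall back when it is not valid JSON
// or does not match the expected shape.
function parseTicketSummary(raw: string): TicketSummary {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isTicketSummary(parsed) ? parsed : FALLBACK;
  } catch {
    return FALLBACK;
  }
}
```

&lt;p&gt;Checking the shape after parsing matters because valid JSON such as an empty object would otherwise pass through a plain type cast unnoticed.&lt;/p&gt;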

&lt;p&gt;Docker is also moving toward making models fit naturally into Compose-based development. The Docker Compose model reference describes a models section where an AI model can be defined as an OCI artifact, pulled and served by Model Runner, and then exposed to an application through injected connection information.&lt;/p&gt;

&lt;p&gt;Conceptually, that means a future local AI development stack can look like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:3000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://model-runner.docker.internal:12434/engines/llama.cpp/v1&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-development-key&lt;/span&gt;
    &lt;span class="na"&gt;extra_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-runner.docker.internal:host-gateway"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the Node.js application containerized while still allowing it to reach the local Model Runner endpoint. Docker’s API docs specifically note that containers may need an &lt;code&gt;extra_hosts&lt;/code&gt; entry to access &lt;code&gt;model-runner.docker.internal&lt;/code&gt; through the host gateway.&lt;/p&gt;

&lt;p&gt;There are several places where this local-first setup is useful.&lt;/p&gt;

&lt;p&gt;It is useful for prompt iteration because you can test many versions without worrying about API usage. It is useful for privacy-sensitive development because test data can stay on your machine. It is useful for offline work after the model is already pulled. It is also useful for CI experiments where you want to run basic LLM-dependent tests without calling a cloud provider, although you should keep those tests small because local inference can be slower and hardware-dependent.&lt;/p&gt;

&lt;p&gt;There are also clear limits.&lt;/p&gt;

&lt;p&gt;Local models usually do not match the quality of the strongest hosted models. Smaller models can summarize, classify, rewrite, and answer simple questions reasonably well, but they may struggle with complex reasoning or long context tasks. Performance depends heavily on your hardware, especially RAM and GPU availability. A small model may run comfortably on a developer laptop, while a larger model may feel too slow for daily use.&lt;/p&gt;

&lt;p&gt;Docker Model Runner is also best understood as a development tool first. Docker’s product page emphasizes local-first inference, no recurring API costs for local usage, privacy, and control. Those are development strengths. They do not automatically make it the right choice for high-scale production serving.&lt;/p&gt;

&lt;p&gt;A healthy architecture is to keep both paths available.&lt;/p&gt;

&lt;p&gt;Use local inference when you are designing prompts, building UI flows, testing basic behavior, working with sensitive examples, or experimenting with agent workflows. Use cloud inference when you need production reliability, stronger model quality, scale, monitoring, and service-level guarantees.&lt;/p&gt;
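
&lt;p&gt;One way to keep both paths available is to resolve the endpoint from configuration. The sketch below is only an illustration: &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; and &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; come from the Compose example above, while &lt;code&gt;INFERENCE_MODE&lt;/code&gt;, &lt;code&gt;CLOUD_BASE_URL&lt;/code&gt;, and &lt;code&gt;CLOUD_API_KEY&lt;/code&gt; are hypothetical names chosen for this example.&lt;/p&gt;

```typescript
type InferenceTarget = { baseURL: string; apiKey: string };
type Env = { [key: string]: string | undefined };

// Container-side URL taken from the Compose example above.
const LOCAL_BASE_URL =
  "http://model-runner.docker.internal:12434/engines/llama.cpp/v1";

// Pick the local or cloud endpoint from environment-style settings.
// INFERENCE_MODE, CLOUD_BASE_URL and CLOUD_API_KEY are hypothetical names.
function resolveInferenceTarget(env: Env): InferenceTarget {
  if (env["INFERENCE_MODE"] === "cloud") {
    const baseURL = env["CLOUD_BASE_URL"];
    const apiKey = env["CLOUD_API_KEY"];
    if (!baseURL || !apiKey) {
      throw new Error("Cloud mode requires CLOUD_BASE_URL and CLOUD_API_KEY");
    }
    return { baseURL, apiKey };
  }
  // Default to local inference so development never calls a paid API by accident.
  return {
    baseURL: env["OPENAI_BASE_URL"] ?? LOCAL_BASE_URL,
    apiKey: env["OPENAI_API_KEY"] ?? "local-development-key"
  };
}
```

&lt;p&gt;In a Node.js application, the result of &lt;code&gt;resolveInferenceTarget(process.env)&lt;/code&gt; could be passed to an OpenAI-compatible client, so the rest of the code stays the same in both modes.&lt;/p&gt;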

&lt;p&gt;The bigger lesson is that AI development is starting to look more like normal software development. We want local dependencies. We want repeatable environments. We want clear configuration. We want the ability to run important parts of the system without depending on external services for every test.&lt;/p&gt;

&lt;p&gt;Docker Model Runner fits into that shift. It brings AI models closer to the Docker workflow many developers already understand. You pull a model, run it locally, expose an API, and connect your application to it. For JavaScript and TypeScript developers, the OpenAI-compatible API makes the adoption path even easier because the application code can remain familiar.&lt;/p&gt;

&lt;p&gt;This is not a replacement for cloud AI platforms. It is a practical addition to the developer toolbox. If you are building AI features in Node.js and you want cheaper prompt iteration, better local privacy, and a Docker-native workflow, Docker Model Runner is worth exploring.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Stop Messy AI Projects: A Clean Folder Structure for Real Agent Systems</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Tue, 05 May 2026 20:42:29 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/stop-messy-ai-projects-a-clean-folder-structure-for-real-agent-systems-502f</link>
      <guid>https://dev.to/raju_dandigam/stop-messy-ai-projects-a-clean-folder-structure-for-real-agent-systems-502f</guid>
      <description>&lt;p&gt;Every AI agent project starts the same way. You create an &lt;code&gt;index.ts&lt;/code&gt;, add a prompt, maybe define a couple of tools, and everything works. For a while, it even feels clean and manageable. Then the system starts to grow. You introduce memory, add logging, experiment with multiple agents, and eventually build workflows. At that point, the simplicity disappears and the codebase turns into a collection of loosely connected files with no clear structure.&lt;/p&gt;

&lt;p&gt;This is the part most tutorials skip. They show how to call a model, but they rarely show how to organize a system around it.&lt;/p&gt;

&lt;p&gt;In a previous article, I discussed why AI agents should be designed as controlled systems where the model proposes actions and the application owns validation, execution, and safety. This article is the practical extension of that idea. If you were starting a TypeScript AI agent project today, this is the folder structure I would use to keep the system understandable and scalable.&lt;/p&gt;

&lt;p&gt;At a high level, the structure looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-ai-agent/
├── src/
│   ├── agents/
│   ├── tools/
│   ├── memory/
│   ├── workflows/
│   ├── mcp/
│   ├── prompts/
│   ├── middleware/
│   ├── types/
│   └── index.ts
├── config/
├── tests/
├── package.json
└── tsconfig.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, it may feel like over-organization. In reality, you do not start with everything. You grow into it. The goal is not to create folders upfront, but to have a clear place for things as complexity increases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesb079h84xm4fbosao53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesb079h84xm4fbosao53.png" alt=" " width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the simplest way to think about the system. Each folder has a single responsibility, and that clarity is what keeps the system predictable as it grows.&lt;/p&gt;

&lt;p&gt;The reason structure matters more in AI systems than in traditional applications is that the execution path is not fixed. In a typical backend, a request follows a known route. In an agent system, the path depends on the model’s decisions. The agent might call different tools, retrieve different memory, or stop midway for approval. That flexibility is powerful, but it also makes systems harder to debug and reason about. Without structure, debugging becomes guesswork. With structure, behavior becomes traceable.&lt;/p&gt;

&lt;p&gt;The best way to approach this is to start smaller than you think. A minimal setup is often enough in the beginning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── agents/
│   └── researcher.ts
├── tools/
│   └── search.ts
└── index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is sufficient for a working agent. As the system grows, you introduce additional layers like memory, workflows, and middleware. The structure expands naturally instead of forcing a painful refactor later.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;agents&lt;/code&gt; folder is where you define what your system does. Each agent represents a role, typically combining a system prompt, a model configuration, and a set of tools. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;researcherAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;researcher&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a research assistant...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;web_search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This folder answers a simple but important question: what roles exist in your system?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tools&lt;/code&gt; folder defines what the agent is allowed to do. Tools are where agents become useful, but they are also where risk enters the system. Each tool should be explicit and controlled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;web_search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
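    &lt;span class="c1"&gt;// Encode the query so spaces and special characters survive the URL&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/search?q=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;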
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key idea is not the implementation of the tool itself, but the boundary it creates. The agent should never have access to everything. It should only see and use tools that you explicitly register.&lt;/p&gt;
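
&lt;p&gt;A minimal sketch of that boundary, assuming a small in-memory registry: &lt;code&gt;ToolRegistry&lt;/code&gt; and &lt;code&gt;RegisteredTool&lt;/code&gt; are hypothetical names, and tools are kept synchronous here only to keep the example short.&lt;/p&gt;

```typescript
// Minimal tool shape, loosely following the searchTool example above.
type RegisteredTool = {
  name: string;
  execute: (query: string) => unknown;
};

// Hypothetical registry: an agent can only resolve tools that were added here.
class ToolRegistry {
  // A prototype-free object, so names like "constructor" are not found by accident.
  private tools: { [name: string]: RegisteredTool } = Object.create(null);

  register(tool: RegisteredTool): void {
    if (this.tools[tool.name] !== undefined) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools[tool.name] = tool;
  }

  // Turn the tool names an agent declares into tool objects,
  // failing loudly on anything that was never registered.
  resolve(names: string[]): RegisteredTool[] {
    return names.map((name) => {
      const tool = this.tools[name];
      if (tool === undefined) {
        throw new Error(`Unknown tool: ${name}`);
      }
      return tool;
    });
  }
}
```

&lt;p&gt;With this in place, an agent definition that lists &lt;code&gt;"web_search"&lt;/code&gt; only works if a tool with that name was registered, and a typo or an unexpected tool name fails at startup rather than at call time.&lt;/p&gt;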

&lt;p&gt;The &lt;code&gt;memory&lt;/code&gt; folder is where many systems become unnecessarily complex. Instead of pushing everything into prompts, memory should be isolated and managed intentionally. A simple starting point is often enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContextMemory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;getAll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can introduce more advanced memory systems such as vector search only when the need becomes real.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;workflows&lt;/code&gt; folder is where individual agent actions become coordinated processes. Most real systems are not single-step interactions. They are sequences of decisions and actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;researchPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;researcherAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;analystAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;research&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the point where you move from an agent to a system.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;mcp&lt;/code&gt; folder introduces a clean boundary for integrating external systems using the Model Context Protocol. As MCP adoption grows, isolating these integrations becomes increasingly valuable. Even with MCP, your application still needs to control access, validation, and permissions.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;prompts&lt;/code&gt; folder is about separating content from logic. As prompts evolve, keeping them inline makes iteration harder. Moving them into dedicated files allows faster updates without touching code.&lt;/p&gt;
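
&lt;p&gt;As one possible shape for this, a prompt can live in its own module as a template and be filled in by a small helper. The file name, the &lt;code&gt;{{name}}&lt;/code&gt; placeholder syntax, and &lt;code&gt;renderPrompt&lt;/code&gt; are assumptions made for this sketch.&lt;/p&gt;

```typescript
// Hypothetical prompt module, for example src/prompts/researcher.ts.
const researcherPrompt =
  "You are a research assistant. Topic: {{topic}}. Answer in {{language}}.";

// Replace {{name}} placeholders with values. A missing variable throws,
// so a typo fails loudly instead of reaching the model.
function renderPrompt(
  template: string,
  variables: { [name: string]: string }
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const value = variables[name];
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return value;
  });
}
```

&lt;p&gt;Calling &lt;code&gt;renderPrompt(researcherPrompt, { topic: "Docker", language: "English" })&lt;/code&gt; fills both placeholders, and the prompt text can change without touching agent logic.&lt;/p&gt;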

&lt;p&gt;The &lt;code&gt;middleware&lt;/code&gt; folder is where production concerns live. This includes token budgets, logging, tracing, and rate limiting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BudgetMiddleware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This layer is often what separates a simple demo from a production-ready system.&lt;/p&gt;
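
&lt;p&gt;Tracking usage is only half of a budget. A slightly fuller sketch, assuming a fixed per-run limit, could refuse work once the limit would be exceeded. &lt;code&gt;TokenBudget&lt;/code&gt; and its methods are hypothetical names, not part of any framework.&lt;/p&gt;

```typescript
// Hypothetical per-run token budget with a hard limit.
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Record usage. Throws instead of recording when the limit would be
  // exceeded, so the caller can stop the run.
  track(usage: number): void {
    const next = this.used + usage;
    if (next > this.limit) {
      throw new Error(`Token budget exceeded: ${next} > ${this.limit}`);
    }
    this.used = next;
  }

  // Tokens still available in this run.
  remaining(): number {
    return this.limit - this.used;
  }
}
```

&lt;p&gt;Throwing keeps the decision in the runtime: the loop that calls &lt;code&gt;track&lt;/code&gt; can catch the error and mark the run as failed or blocked rather than letting the model continue.&lt;/p&gt;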

&lt;p&gt;The &lt;code&gt;types&lt;/code&gt; folder is where TypeScript provides its real value. Centralizing interfaces ensures that when something changes, the impact is visible across the system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes evolving the system much safer.&lt;/p&gt;

&lt;p&gt;What most people miss is that folder structure is not just about organization. It reflects architecture. If your code mixes tools, prompts, memory, and execution logic randomly, your system will behave the same way. If your folders enforce separation of concerns, your system becomes predictable. This aligns directly with the architectural principle that the runtime controls execution, the model proposes actions, and the system validates behavior.&lt;/p&gt;

&lt;p&gt;Testing should follow the same philosophy. You do not need a complex setup at the beginning. A simple structure is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests/
├── unit/
└── integration/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start by testing tools and memory. Add workflow tests as the system evolves. End-to-end testing can come later once the system stabilizes.&lt;/p&gt;

&lt;p&gt;As your project grows, the structure can evolve. You might introduce a &lt;code&gt;providers&lt;/code&gt; folder if you support multiple LLMs, or a &lt;code&gt;skills&lt;/code&gt; layer if capabilities become reusable across agents. At the same time, if the project remains small, it is perfectly valid to flatten the structure. The goal is not to follow a template rigidly, but to avoid chaos as complexity increases.&lt;/p&gt;

&lt;p&gt;Most AI agent tutorials focus heavily on prompts and models. Very few focus on how to structure the system around them. In real-world projects, that is where most of the challenges appear. A good folder structure will not make your agent smarter, but it will make your system understandable, maintainable, and scalable. And in practice, that matters far more.&lt;/p&gt;

&lt;p&gt;In the previous article, &lt;a href="https://dev.to/raju_dandigam/the-typescript-ai-agent-architecture-i-would-use-in-2026-18k6"&gt;The TypeScript AI Agent Architecture I Would Use in 2026&lt;/a&gt;, I covered the architecture behind controlled AI agents and why the model should not own the system. In a future post, I will show how to combine that architecture with this structure to build a minimal but production-ready agent in TypeScript. That is where everything connects.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The TypeScript AI Agent Architecture I Would Use in 2026</title>
      <dc:creator>Raju Dandigam</dc:creator>
      <pubDate>Tue, 05 May 2026 05:46:54 +0000</pubDate>
      <link>https://dev.to/raju_dandigam/the-typescript-ai-agent-architecture-i-would-use-in-2026-18k6</link>
      <guid>https://dev.to/raju_dandigam/the-typescript-ai-agent-architecture-i-would-use-in-2026-18k6</guid>
      <description>&lt;p&gt;Most AI apps do not fail because the model is bad. They fail because the system surrounding the model lacks structure.&lt;/p&gt;

&lt;p&gt;The first version usually starts the same way. A user sends input, the app calls an LLM, and the response is returned. That is enough for a demo, but the moment the system needs to do anything real, the design starts to break.&lt;/p&gt;

&lt;p&gt;A real AI system does more than generate text. It may need to call APIs, use tools, remember context, validate outputs, retry on failures, ask for human approval, and explain what happened. At that point, you are not building a chatbot anymore. You are building a system.&lt;/p&gt;

&lt;p&gt;In 2026, I would not start with prompts. I would start with architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  The model is not the architecture
&lt;/h3&gt;

&lt;p&gt;One of the biggest mistakes I see is treating the LLM as the center of the system. The model can suggest what to do next, but it should not control everything. It should not decide which tools are safe, whether a user has permission, or whether a risky action should proceed.&lt;/p&gt;

&lt;p&gt;The model should propose. The application should decide. This simple shift changes how you design everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Think in terms of a loop, not a prompt
&lt;/h3&gt;

&lt;p&gt;An agent is not a better prompt. It is a loop. The system gives the model a goal and context. The model suggests the next step. The system validates that step, executes it if allowed, records the result, and continues until the task is completed or blocked. Without this structure, agents become unpredictable. They repeat steps, call the wrong tools, or silently fail. With structure, they become workflows you can reason about.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with a simple state model
&lt;/h3&gt;

&lt;p&gt;Before anything else, define state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;AgentState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AgentStep&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nl"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;running&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;blocked&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;AgentStep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;output&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This small structure changes everything. The system is no longer a single request-response call. It becomes a stateful workflow. You can inspect it, debug it, resume it, and control it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rd3ld7j3rji6hl6gp8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rd3ld7j3rji6hl6gp8q.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the simplest way to think about it. The model suggests. The runtime controls. The system decides what actually happens. I would keep the architecture simple and consistent. &lt;/p&gt;

&lt;h3&gt;
  
  
  The five layers that actually matter
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The API layer handles requests, users, and permissions.&lt;/li&gt;
&lt;li&gt;The runtime layer controls the loop, state, and execution.&lt;/li&gt;
&lt;li&gt;The model layer interacts with LLMs through a gateway.&lt;/li&gt;
&lt;li&gt;The tool layer defines what the agent is allowed to do.&lt;/li&gt;
&lt;li&gt;The control layer handles validation, memory, observability, and approvals.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is enough for most real systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools should be contracts, not suggestions
&lt;/h3&gt;

&lt;p&gt;Tools are what make agents useful, but they are also where risk enters the system. If a model can call tools, those tools need structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key idea is simple. The model can request a tool. The system decides if that request is allowed. This is where most demos fall short. They give the model too much control.&lt;/p&gt;
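
&lt;p&gt;A small sketch of that decision, using the &lt;code&gt;risk&lt;/code&gt; field from the tool type above. The rule that high-risk tools need human approval is an assumption made for illustration, and &lt;code&gt;ToolRequest&lt;/code&gt;, &lt;code&gt;Verdict&lt;/code&gt;, and &lt;code&gt;authorize&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
type Risk = "low" | "high";

// Tool metadata only; execution is left out of this sketch.
type ToolPolicy = { name: string; risk: Risk };

// What the model asked for, plus whether a human has signed off.
type ToolRequest = { toolName: string; approvedByHuman: boolean };

type Verdict =
  | { allowed: true }
  | { allowed: false; reason: string };

// Decide whether a model-requested tool call may run.
// Assumed rule (not from the article): high-risk tools need human approval.
function authorize(request: ToolRequest, tools: ToolPolicy[]): Verdict {
  const tool = tools.find((candidate) => candidate.name === request.toolName);
  if (tool === undefined) {
    return { allowed: false, reason: `Unknown tool: ${request.toolName}` };
  }
  if (tool.risk === "high") {
    if (!request.approvedByHuman) {
      return { allowed: false, reason: `Approval required for ${tool.name}` };
    }
  }
  return { allowed: true };
}
```

&lt;p&gt;Returning a verdict with a reason, instead of a bare boolean, gives the runtime something to log and something to show the user when a step is blocked.&lt;/p&gt;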

&lt;h3&gt;
  
  
  Memory should be intentional
&lt;/h3&gt;

&lt;p&gt;More context does not always mean better results. Instead of sending everything to the model, retrieve only what matters. Think of memory as useful signals, not a full transcript. Short-term memory belongs to the current task. Semantic memory stores reusable facts. Episodic memory stores past actions. The important part is not storing memory. It is retrieving the right memory at the right time.&lt;/p&gt;

&lt;p&gt;This keeps the system focused, cheaper, and easier to debug.&lt;/p&gt;
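
&lt;p&gt;To make the retrieval idea concrete, here is a deliberately naive sketch. Keyword overlap stands in for real retrieval such as embeddings, and &lt;code&gt;MemoryItem&lt;/code&gt;, &lt;code&gt;tokenize&lt;/code&gt;, and &lt;code&gt;retrieve&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
type MemoryKind = "short_term" | "semantic" | "episodic";

type MemoryItem = { kind: MemoryKind; text: string };

// Split text into lowercase words, dropping very short ones.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
}

// Return at most `limit` items that share words with the query,
// best matches first. Items with no overlap are never sent to the model.
function retrieve(items: MemoryItem[], query: string, limit: number): MemoryItem[] {
  const queryWords = tokenize(query);
  return items
    .map((item) => {
      const words = tokenize(item.text);
      const score = queryWords.filter((word) => words.indexOf(word) !== -1).length;
      return { item, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.item);
}
```

&lt;p&gt;The scoring is crude, but the shape is the point: memory is stored broadly and retrieved narrowly, with an explicit cap on how much reaches the prompt.&lt;/p&gt;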

&lt;h3&gt;
  
  
  Structured outputs make the system usable
&lt;/h3&gt;

&lt;p&gt;Free text works for user responses. It does not work for system decisions. If the model is deciding what to do next, it should return structured data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;call_tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;finish&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ask_user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the system to validate behavior instead of guessing from text. The model suggests. The system verifies.&lt;/p&gt;
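
&lt;p&gt;A small sketch of what "the system verifies" could look like. It assumes the model returns its decision as a JSON string. &lt;code&gt;parseDecision&lt;/code&gt; and &lt;code&gt;ParseResult&lt;/code&gt; are hypothetical names, and the rule that &lt;code&gt;call_tool&lt;/code&gt; must carry a &lt;code&gt;toolName&lt;/code&gt; is an assumption layered on top of the type.&lt;/p&gt;

```typescript
type Decision = {
  action: "call_tool" | "finish" | "ask_user";
  toolName?: string;
};

type ParseResult =
  | { ok: true; decision: Decision }
  | { ok: false; error: string };

// Turn raw model output into a checked Decision, or a clear error.
function parseDecision(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }

  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }

  const record = data as { action?: unknown; toolName?: unknown };
  const toolName = record.toolName;

  switch (record.action) {
    case "call_tool":
      // Assumed rule: a tool call without a tool name is invalid.
      if (typeof toolName !== "string" || toolName === "") {
        return { ok: false, error: "call_tool requires toolName" };
      }
      return { ok: true, decision: { action: "call_tool", toolName: toolName } };
    case "finish":
      return { ok: true, decision: { action: "finish" } };
    case "ask_user":
      return { ok: true, decision: { action: "ask_user" } };
    default:
      return { ok: false, error: "unknown action" };
  }
}

console.log(parseDecision('{"action":"call_tool","toolName":"searchDocs"}'));
console.log(parseDecision("Sure, I will search the docs for you."));
```

&lt;p&gt;A failed parse is a normal outcome here, not a crash. The system can retry, ask the model to correct itself, or stop.&lt;/p&gt;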

&lt;h3&gt;
  
  
  Observability is not optional
&lt;/h3&gt;

&lt;p&gt;Agent systems are harder to debug because they are not deterministic. The same input may take a different path on each run. If something goes wrong, you need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the model saw&lt;/li&gt;
&lt;li&gt;What it decided&lt;/li&gt;
&lt;li&gt;Which tool it called&lt;/li&gt;
&lt;li&gt;What came back&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, debugging becomes guesswork. Even a simple step trace makes a big difference.&lt;/p&gt;
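
&lt;p&gt;A simple step trace might look like the sketch below. &lt;code&gt;TraceStep&lt;/code&gt; and &lt;code&gt;StepTrace&lt;/code&gt; are hypothetical names, and the in-memory array is an assumption. A real system would likely send these records to a log or tracing backend.&lt;/p&gt;

```typescript
// One record per loop iteration, covering the four questions above.
type TraceStep = {
  step: number;
  modelInput: string;   // what the model saw
  decision: string;     // what it decided
  toolName?: string;    // which tool it called, if any
  toolResult?: unknown; // what came back
  at: string;           // ISO timestamp
};

type TraceEntry = {
  modelInput: string;
  decision: string;
  toolName?: string;
  toolResult?: unknown;
};

class StepTrace {
  private steps: TraceStep[] = [];

  // Append a step, numbering and timestamping it automatically.
  record(entry: TraceEntry): TraceStep {
    const step: TraceStep = {
      ...entry,
      step: this.steps.length + 1,
      at: new Date().toISOString(),
    };
    this.steps.push(step);
    return step;
  }

  // Return a copy so callers cannot rewrite history.
  all(): TraceStep[] {
    return this.steps.slice();
  }

  // One JSON object per line, easy to grep or ship to a log store.
  dump(): string {
    return this.steps.map((s) => JSON.stringify(s)).join("\n");
  }
}

const trace = new StepTrace();
trace.record({
  modelInput: "Find the refund policy",
  decision: "call_tool",
  toolName: "searchDocs",
  toolResult: { hits: 3 },
});
trace.record({ modelInput: "3 documents found", decision: "finish" });

console.log(trace.dump());
```

&lt;p&gt;When a run goes wrong, reading the trace top to bottom shows where the path diverged from what you expected.&lt;/p&gt;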

&lt;h3&gt;
  
  
  Where frameworks fit
&lt;/h3&gt;

&lt;p&gt;Frameworks can help, but they do not replace architecture.&lt;/p&gt;

&lt;p&gt;Tools and standards such as the following are useful for building agent systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vercel AI SDK&lt;/li&gt;
&lt;li&gt;LangGraph&lt;/li&gt;
&lt;li&gt;OpenAI Agents SDK&lt;/li&gt;
&lt;li&gt;Model Context Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they do not define your boundaries.&lt;/p&gt;

&lt;p&gt;You still need to decide how state works, how tools are exposed, how outputs are validated, and how failures are handled.&lt;/p&gt;

&lt;h3&gt;
  
  
  The architecture I would trust
&lt;/h3&gt;

&lt;p&gt;The architecture I would use in 2026 is not the most complex one. It is the one that gives control back to the system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stateful workflow.&lt;/li&gt;
&lt;li&gt;A controlled loop.&lt;/li&gt;
&lt;li&gt;Typed tools.&lt;/li&gt;
&lt;li&gt;Structured outputs.&lt;/li&gt;
&lt;li&gt;Observable steps.&lt;/li&gt;
&lt;li&gt;Clear boundaries between model decisions and system execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what turns an AI demo into something you can actually trust. Because in real systems, reliability matters more than clever prompts.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
