Kazuya

Posted on Dec 5, 2025 • Edited on Dec 8, 2025

AWS re:Invent 2025 - Building Zero-CVE Container Images at Scale: Patterns and Pitfalls (MAM215)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview


In this video, Dale Rodriguez from Chainguard explains how they build zero-CVE container images at scale. He describes the Chainguard factory that rebuilds 1,818 images daily from source using tools like Melange and APKO, processing with ~1,000 CPUs. The system automates CVE monitoring with 7-day critical and 14-day high/medium/low resolution SLAs. AI assists in diagnosing build errors, though humans verify all changes. Images undergo rigorous testing on real Kubernetes clusters before release. Chainguard offers 54 free images, enterprise custom assembly options, and has expanded to VMs and Libraries (Python, Java, JavaScript) with malware scanning and CVE backporting up to 3 years for packages.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Inside the Chainguard Factory: Building Zero-CVE Container Images Through Automation and Scale

Today we're going to talk about building zero-CVE container images at scale. My name is Dale Rodriguez. I'm a Sr. Happiness Engineer and also a Sr. Solutions Engineer at Chainguard. A little bit more about me: I'm a certified Vuln Hustler, which means that we hustle vulnerabilities at Chainguard. I have 10 years of experience in product management, security, and solutions engineering, and another 7 years focused on cloud native security. You can find me on LinkedIn at the URL at the bottom of the slide if you're interested. Let's go to the next slide.

So what is Chainguard, and why are we here? First and foremost, Chainguard's goal is to be the secure source for open source. We believe in and support the open source community, and we want to strengthen it by securing the containers, libraries, and virtual machines it produces, so that you can run this software in production and your compliance and security teams won't have issues with it. We have over 1,818 images and 134,752 versions. We don't sell the factory; we sell images and a couple of other things.

These are the tools we use to build software at Chainguard. Melange is our package build tool; it builds APK packages from source. apko assembles those packages into container images. Cosign, part of the Sigstore project, signs and verifies container images, and Sigstore provides the broader signing infrastructure. We also have a whole bunch of Terraform providers. And last but not least, Malcontent checks your container images for malware. There is no magic to zero CVEs. It takes a lot of automation, debugging, cloud compute spend, testing, and a lot of questioning your own sanity, because this is very hard. Before we talk about the factory, I'm going to talk a little bit about the philosophy of building from source.

Have you ever heard of open source malware? Building from source, first of all, allows you to trust the source. You want defense in depth. What I mean by that: in the small diagram on the right, you'll see that low or limited visibility blinds you to risk. At Chainguard, we want to expand that visibility by building all the way down to the file system itself. Depth mitigates uncertainty, and most importantly, when you have this depth, you can trust what you build.
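Trusting the source starts with knowing that the archive you fetched is the one you meant to build. The Python below is a minimal sketch, not Chainguard's code. It assumes each package recipe pins an expected SHA-256 for its upstream archive, and the function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a downloaded source archive."""
    return hashlib.sha256(data).hexdigest()


def verify_source(data: bytes, expected_sha256: str) -> None:
    """Refuse to build if the fetched source does not match the pinned digest."""
    actual = sha256_of(data)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"source checksum mismatch: expected {expected_sha256}, got {actual}"
        )


if __name__ == "__main__":
    tarball = b"pretend this is project-1.2.3.tar.gz"
    pinned = sha256_of(tarball)  # in practice, read from the package recipe
    verify_source(tarball, pinned)  # matches, so the build may proceed
    try:
        verify_source(tarball + b"tampered", pinned)
    except ValueError as err:
        print(err)  # a modified archive is rejected before anything is compiled
```

Failing closed on a mismatch is the point: a swapped or tampered upstream tarball stops the pipeline instead of quietly becoming a signed package.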

We also build software with safer compiler options. Many upstream projects aren't compiled with hardening options enabled, so we do our best to recompile all packages and binaries with safer options to reduce vulnerabilities. So what is the Chainguard factory? At a high level, we download the OSS project. Automation in the background generates a package build, then generates a CVE remediation if necessary. The result goes into a package repository. The two blue boxes at the bottom represent building and testing the image, and then we push it out to an image repository.
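The talk doesn't list which compiler options Chainguard uses. As an illustration only, here is a sketch that appends a commonly recommended set of GCC/Clang hardening flags to whatever an upstream build already sets. The flag selection and the helper names are assumptions, not Chainguard's defaults.

```python
# A commonly recommended selection of GCC/Clang hardening flags
# (illustrative; not Chainguard's actual build defaults).
HARDENING_CFLAGS = [
    "-O2",
    "-D_FORTIFY_SOURCE=3",  # needs optimization and a recent toolchain; older ones use =2
    "-fstack-protector-strong",
    "-fstack-clash-protection",
    "-fPIE",
]
HARDENING_LDFLAGS = [
    "-pie",
    "-Wl,-z,relro",
    "-Wl,-z,now",
    "-Wl,-z,noexecstack",
]


def missing_hardening(flags: list[str], required: list[str]) -> list[str]:
    """Return the required flags that are absent from a package's flag list."""
    present = set(flags)
    return [f for f in required if f not in present]


def hardened_env(upstream_cflags: str = "", upstream_ldflags: str = "") -> dict[str, str]:
    """Append any missing hardening flags after the upstream build's own flags."""
    cflags = upstream_cflags.split()
    ldflags = upstream_ldflags.split()
    cflags += missing_hardening(cflags, HARDENING_CFLAGS)
    ldflags += missing_hardening(ldflags, HARDENING_LDFLAGS)
    return {"CFLAGS": " ".join(cflags), "LDFLAGS": " ".join(ldflags)}
```

For example, `hardened_env("-O2 -g")` keeps `-O2 -g` and adds the remaining hardening flags to `CFLAGS`. Appending rather than replacing preserves upstream settings such as debug info.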

The top boxes, the package and image boxes, are just examples of how we do it. Let's dive in a little further, starting with the assembly line. The Chainguard factory's assembly line runs right through the middle of the factory. Everything touches the assembly line, and we have to keep it moving. We build a lot of stuff daily: all 1,818 of those projects are rebuilt every day. That covers zero-day CVEs and keeps us in step with upstream updates automatically.
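The assembly line can be pictured as an ordered set of stages that every project passes through on every daily rebuild. Below is a toy model of that flow. The stage names mirror the factory diagram, but the functions are placeholders, and the whole structure is an assumption for illustration, not the factory's real orchestration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Artifact:
    """State carried down a (hypothetical) assembly line for one project."""
    project: str
    log: list[str] = field(default_factory=list)
    failed: bool = False


Stage = Callable[[Artifact], bool]  # a stage returns True on success


def run_assembly_line(artifact: Artifact, stages: list[tuple[str, Stage]]) -> Artifact:
    """Run stages in order and stop at the first failure so nothing half-built ships."""
    for name, stage in stages:
        ok = stage(artifact)
        artifact.log.append(f"{name}:{'ok' if ok else 'FAILED'}")
        if not ok:
            artifact.failed = True
            break
    return artifact


def daily_rebuild(projects: list[str], stages: list[tuple[str, Stage]]) -> dict[str, Artifact]:
    """Rebuild every project, not only the ones that changed upstream."""
    return {p: run_assembly_line(Artifact(p), stages) for p in projects}


# Placeholder stages following the diagram; real stages would do actual work.
STAGES: list[tuple[str, Stage]] = [
    ("fetch-source", lambda a: True),
    ("build-package", lambda a: True),
    ("remediate-cves", lambda a: True),
    ("publish-package", lambda a: True),
    ("build-image", lambda a: True),
    ("test-image", lambda a: a.project != "flaky-app"),  # simulated test failure
    ("push-image", lambda a: True),
]
```

Rebuilding everything daily, instead of only what changed, is what lets a newly patched dependency flow into every image that contains it without anyone tracking which images are affected.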

Most importantly, we have a lot of GitHub PRs, and I'll dive into those in a second. Our package build system runs on Kubernetes, and in the background it executes our package build tool, Melange. As I mentioned, we really have tons of PRs. This is just a screenshot of some revision request updates; you can see 6 in this particular screenshot. We have about 65,000 closed and 407 open.

When there's a new upstream release, our automation compares it with the version we currently package. GitHub checks and commit expectations make sure it's the right image. Then we run builds and tests on each of those. We do that twice, because we want to make sure the image is correct, functional, and, most importantly, performing.
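One way to express the version comparison is to group upstream releases into version streams and flag any stream whose newest release is ahead of what is packaged. The sketch below does that. It assumes plain dotted-numeric tags (optionally prefixed with `v`), and the function names are hypothetical.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as 'v1.27.3' (pre-release tags not handled)."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def stream_of(v: str, depth: int = 2) -> tuple[int, ...]:
    """The version stream a release belongs to, e.g. '1.27.3' -> (1, 27)."""
    return parse_version(v)[:depth]


def pending_updates(packaged: dict[str, str], upstream_releases: list[str]) -> dict[str, str]:
    """For each packaged stream, return the newest upstream release if it is newer.

    `packaged` maps a stream label like '1.27' to the version currently built.
    """
    updates = {}
    for label, current in packaged.items():
        stream = stream_of(current)
        candidates = [r for r in upstream_releases if stream_of(r) == stream]
        if candidates:
            newest = max(candidates, key=parse_version)
            if parse_version(newest) > parse_version(current):
                updates[label] = newest  # would trigger an automated update PR
    return updates
```

Comparing parsed tuples rather than strings matters: as strings, `"1.27.10"` sorts before `"1.27.2"`, which would make the automation miss the newer release.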

This slide gives you a sense of scale: we rebuild every package from source. The graph shows the number of CPUs we spin up as we build these images. This is our all-world builder. One challenge is that when you spin up this many Kubernetes nodes, the performance of the packaging system can degrade significantly, so we brought this in-house to improve it.

We spin up about 1,000 CPUs to process this and spin them back down once the build completes. We want to automate all the things. We monitor upstream projects for new releases, which is why we release daily. Most importantly, we monitor multiple version streams of each upstream project. We don't just provide the latest; we provide every tag the project supports. The reason is that not everybody uses latest, and you should not pin to latest. Tracking multiple version streams also lets us keep maintaining a stream after upstream end-of-life: if a project goes end-of-life, we can continue supporting it for you for up to six months.
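Supporting a stream for up to six months past upstream end-of-life comes down to a date check per stream. The sketch below is a minimal version of that check. The stream names and dates in any example are hypothetical, and treating "six months" as six calendar months is an assumption.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. Aug 31 + 6 months -> Feb 28/29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def supported_streams(
    eol_dates: dict[str, Optional[date]], today: date, grace_months: int = 6
) -> list[str]:
    """Streams still built: not yet end-of-life, or EOL but inside the grace window.

    A value of None means upstream has not announced an end-of-life date.
    """
    keep = []
    for stream, eol in eol_dates.items():
        if eol is None or today <= add_months(eol, grace_months):
            keep.append(stream)
    return keep
```

Clamping the day keeps the arithmetic well-defined for month-end dates. Counting a fixed 183 days would be a reasonable alternative if the policy is stated in days.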

Most importantly, we monitor and scan for CVEs. At Chainguard, critical CVEs are resolved within seven days, and high, medium, and low CVEs within fourteen days. The robots do most of the work, not me. And you can't have a talk at AWS re:Invent without AI. We use AI to diagnose and fix our build errors. It improves our tests and has made my life much easier. It submits PRs and handles a lot of work for us, but the biggest benefit is the telemetry it generates from our build scans, tests, and test logs. That information helps us improve the factory and make the right decisions when we build our software.
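The remediation windows stated in the talk translate directly into due dates. The sketch below shows that arithmetic. It assumes the clock starts when a finding is detected, since the talk doesn't say whether it starts at detection or disclosure, and the CVE IDs in the example are placeholders.

```python
from datetime import date, timedelta

# Remediation windows in days, as stated in the talk.
SLA_DAYS = {"critical": 7, "high": 14, "medium": 14, "low": 14}


def remediation_due(severity: str, detected: date) -> date:
    """Date by which a CVE of the given severity should be resolved."""
    return detected + timedelta(days=SLA_DAYS[severity.lower()])


def overdue(findings: list[tuple[str, str, date]], today: date) -> list[str]:
    """Return the IDs of open findings whose SLA window has already passed.

    Each finding is (cve_id, severity, detected_on).
    """
    return [cve for cve, sev, seen in findings if today > remediation_due(sev, seen)]
```

A critical finding detected on December 1 comes due on December 8, and a low one on December 15.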

This is an example of AI doing its thing. The main thing I want to point out is that when the AI generates a PR, the PR is also assigned to an engineering group, and a human verifies the work before it's submitted. We don't trust AI yet; we do not want to become Skynet. The AI highlights the errors in the build logs and suggests fixes, and then we triage and remediate. At the moment, that step is done by a human.
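The talk doesn't describe how build-log errors are extracted. A common first step before handing a log to a model or a human is a simple filter that keeps only suspicious lines plus some surrounding context. The sketch below is one such heuristic. It is an assumption for illustration, not Chainguard's AI pipeline, and the patterns are hypothetical.

```python
import re

# Illustrative patterns; a real system would tune these per toolchain.
ERROR_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),
    re.compile(r"\bundefined reference\b"),
    re.compile(r"\bNo such file or directory\b"),
]


def highlight_errors(log: str, context: int = 1) -> list[str]:
    """Return suspicious log lines plus `context` lines around each, in log order.

    A cheap pre-filter so a reviewer (human or model) reads the relevant part of
    a long build log instead of the whole thing.
    """
    lines = log.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if any(p.search(line) for p in ERROR_PATTERNS):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]
```

Collecting indices in a set before emitting them means that overlapping context windows around nearby errors don't produce duplicate lines.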

Testing is also important. Our images do not get tagged until they pass tests. Downloading and rebuilding a package and putting it into an image doesn't mean the work is complete; we want to make sure the image is fully functional. Often that means spinning up an actual Kubernetes cluster and running a real-world test of that project. We do a bit more than hello world when we test these projects.

This example shows an image coming into the PR system, or the factory. It spins up the test harness and deploys the probes, which provide the functional testing. We use Helm, Docker, or whatever deployment tool that particular test needs. Once the test runs and the image is certified, the image is released. The factory is all about scale: building a factory that is both functional and scalable.
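The "no tag until tests pass" rule is a gate: run every probe against the candidate image, and move the tag only if all of them pass. The sketch below models that gate. The probe interface and the dictionary standing in for a registry's tag-to-digest mapping are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    name: str
    passed: bool
    detail: str = ""


Probe = Callable[[str], ProbeResult]  # takes an image reference, returns a result


def gate_and_tag(image_digest: str, probes: list[Probe], tag: str,
                 tags: dict[str, str]) -> list[ProbeResult]:
    """Run every functional probe; point `tag` at the new digest only if all pass.

    `tags` stands in for a registry's tag -> digest mapping.
    """
    results = [probe(image_digest) for probe in probes]
    if results and all(r.passed for r in results):
        tags[tag] = image_digest
    return results
```

Two design choices are worth noting. All probes run even after one fails, so the report is complete. An image with no probes at all is never tagged.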

Chainguard's Product Ecosystem: From Free Images to Enterprise Libraries

Our images are always signed with Sigstore and pushed to our customers' private repos. When you become a Chainguard customer, you get your own private repository with your domain and your images, and we typically release new builds there within an hour. You can clone images from our repository to yours, and we have LDAP and other features that let you customize those images, so you can run your own code and image program on top of Chainguard images. We build those images for you, so you don't have to manage that in-house. The green text on the slide is an example path: your name, then a particular image. In the tag history you can travel back and look up previous builds if you like.
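Because tags move with every rebuild, looking up a previous build in practice means referring to an image by its immutable digest. The sketch below splits a registry path of the shape described on the slide into its parts and rewrites a tag reference into a digest reference. It uses a simplified grammar, not the full OCI reference specification, and `registry.example.com/your-org` is a hypothetical path.

```python
import re

# registry/org/image[:tag][@sha256:<64 hex chars>]; simplified, not the full OCI grammar.
REF = re.compile(
    r"^(?P<registry>[^/]+)/(?P<org>[^/]+)/(?P<image>[^:@]+)"
    r"(?::(?P<tag>\w[\w.-]*))?(?:@(?P<digest>sha256:[0-9a-f]{64}))?$"
)


def parse_ref(ref: str) -> dict:
    """Split an image reference into registry, org, image, tag, and digest."""
    m = REF.match(ref)
    if not m:
        raise ValueError(f"unrecognized image reference: {ref}")
    return m.groupdict()


def pin(ref: str, digest: str) -> str:
    """Rewrite a tag reference as an immutable digest reference."""
    parts = parse_ref(ref)
    return f"{parts['registry']}/{parts['org']}/{parts['image']}@{digest}"
```

Deploying by digest keeps a rollout reproducible even though the tag it came from is rebuilt daily.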

We also have free images, about 54 of them and counting, available as latest only. There are base images (Python, JRE, Ruby, Node, and PHP) and application images you can use for free (Postgres, MariaDB, MongoDB, Redis, and Vault). All of these free images have zero CVEs and come with SBOMs. It's essentially a free version of our enterprise product, and you're welcome to use it. We'd love to see what you build.

Enterprises, commercial customers, and everyone else are not one-size-fits-all, so we built custom assembly. Custom assembly lets you customize the golden images we build and make them your own: you can add your own packages and create your own entry points. This lets our workflows fit within yours. The point of these images is not to change how you do things, but to complement the way you currently do business. You can add packages and still receive all updates, and everything comes signed with Sigstore, with SBOMs, provenance, and more.
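Conceptually, custom assembly layers a customer's additions on top of a golden image definition without forking it. The sketch below models that layering. The spec shape, the package names, and the `customize` helper are all hypothetical and do not reflect Chainguard's actual schema or API.

```python
from copy import deepcopy
from typing import Optional

# Hypothetical shape of a golden image definition.
BASE_PYTHON = {
    "packages": ["python-3.12", "ca-certificates-bundle"],
    "entrypoint": ["/usr/bin/python"],
    "user": "nonroot",
}


def customize(base: dict, extra_packages: list[str],
              entrypoint: Optional[list[str]] = None) -> dict:
    """Layer customer packages and an optional entrypoint onto a golden image spec.

    The base is copied, never mutated, so each rebuild starts from the current
    golden image and picks up its updates.
    """
    spec = deepcopy(base)
    for pkg in extra_packages:
        if pkg not in spec["packages"]:  # skip packages the base already has
            spec["packages"].append(pkg)
    if entrypoint is not None:
        spec["entrypoint"] = list(entrypoint)
    return spec
```

Keeping the customization as a small overlay, rather than a copied image definition, is what lets customer images keep receiving base-image updates automatically.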

This last slide is about how we want you to Chainguard your open source software stack. At the bottom, the foundation of it all is the Chainguard Factory. On top of that is Chainguard OS: we make and distribute our own operating system. That is what lets us offer the seven-day critical SLA and the fourteen-day SLA for high, medium, and low on system packages, because we're not at the mercy of a third-party distribution. Most importantly, this operating system is minimal and has a small footprint. It was actually inspired by Alpine. We used to have musl support and got tired of being confused, so we dropped it a few years ago.

After Chainguard OS, the next product is Chainguard Virtual Machines. They're the same as the hardened container images, but delivered as VMs: instead of an Nginx container, you can have an Nginx VM with zero CVEs. All of our images and VMs also come in compliance flavors. We have FIPS versions of many of our images (over 700 FIPS images), and those FIPS images can also be made into Chainguard VMs.

We've talked about Chainguard Containers and toolchains; we protect the toolchain runtime. We also have Helm charts, called Chainguard Guarded Charts. When Bitnami put their charts behind a paywall, we decided to step up and build charts to serve the community, and we now have roughly 40 to 50 of them available. And last but not least, Chainguard Libraries, one of our newer and more exciting products. Instead of pulling your Python packages from PyPI, for example, you pull them from Chainguard. Those packages come with malware protection, provenance, and CVE remediation. Python and Java are available today, and JavaScript is in early access.

The goal of Libraries is to secure every open source library we can. Go and Rust will most likely be next on the roadmap, followed by a few others over the next couple of quarters. The idea is to secure the entire stack. One thing I want to note about Chainguard Libraries is that you can still use third-party library providers. We don't try to lock you in to our Libraries product, our container product, or our VMs; they're all customizable. Another thing about Libraries, especially Python: as you all know, Python has 700,000-plus packages. We're trying to work toward some balance there. We haven't gotten that far yet, but we have about 20,000 to 30,000 and we're still growing. Java has over 50,000 packages. JavaScript: I can't tell you the numbers because you didn't sign an NDA.
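Using a hardened index alongside third-party providers raises the question of which source wins for a given package. The sketch below shows one explicit policy: consult indexes in priority order and take the first one that carries the package. The index names and the `resolve` helper are hypothetical, and the talk doesn't describe how Chainguard Libraries handles this.

```python
from typing import Optional


def resolve(package: str, indexes: list[tuple[str, set[str]]]) -> Optional[str]:
    """Return the name of the first index that carries `package`, or None.

    `indexes` is an ordered list of (index_name, available_packages). Putting the
    hardened index first means it wins whenever it has the package, and a
    third-party index is only a fallback.
    """
    for name, available in indexes:
        if package in available:
            return name
    return None
```

This priority order is an assumption, not how every installer behaves. For example, pip's `--extra-index-url` does not prefer one index over another; it considers candidates from all configured indexes. That is why strict ordering usually requires a proxy or repository manager in front of the sources.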

So that's the factory. A couple of key points: The packages are built from source. They really are hardened by default by expert humans. I promise we don't use AI for everything. They're updated and patched quickly. That's mostly automated, which is what helps us build at scale. They're minimal container images built from packages. They're reproducible, secure, and tested on real clusters. They're rebuilt consistently to pick up package updates, and most importantly, they deliver hot and fresh to your door like a pizza.

The VM images are exactly the same. Libraries are a bit different: they're built from a verified upstream source, they're malware scanned, and they get CVE backports. One distinct difference from our images and containers is that for libraries we backport fixes to versions up to about three years old. With images and VMs, we're not trying to support three-year-old versions, but on the package side we know some development shops are still using their favorite Java package from like 1995. We want those customers to be able to migrate to a secured version of that package rather than leaving them in the dust. Thank you all for coming to my talk. I appreciate you all.

This article is entirely auto-generated using Amazon Bedrock.