Due to the current job market, I've had a bit more free time on my hands than usual, so I have been spending more of it on personal projects, finally putting effort into the ones sitting forever on the "I'll fix that eventually" list. The small kind of things that come up often enough to notice but are never important enough to focus on.
One of those things was the APT cache I've been running. It generally works, but every so often it randomly starts returning HTTP 500 errors without any useful logs, and is usually fixed by restarting it.
How hard could it be?
I spent a couple of hours looking into it without making much progress, but I did learn that I am not the only person having reliability issues with apt-cacher-ng. Scrolling through the Debian bug tracker, errors caused by race conditions, incomplete downloads, and corrupted files seem to have become a frequent issue¹²³⁴⁵ in the last couple of years, with similar-sounding reports going back over a decade. There are a few alternatives, but most seem to be either unmaintained or subject to similar complaints.
I ended up wondering why I was using it in the first place. APT repositories are just HTTP file servers, and APT already supports proxies natively. It didn’t seem like something that should require a custom piece of software. Being a web developer, my first instinct was to reach for Nginx. It already handles caching extremely well, and APT repositories are largely just static unsecured* files. It seemed like a reasonable fit.

The goal changed from fixing my issues to finding a simpler solution using an off-the-shelf Nginx container. My use case was quite simple: I just needed a pull-through cache so I wasn't downloading the same packages multiple times. I was mostly happy with apt-cacher-ng and didn't need any new features; I just wanted something more reliable. I'm not running a business with it, so the setup can be a bit jank as long as it stays running smoothly.
So I set myself some goals before starting:
- Can be used as a drop-in replacement for the existing apt-cacher-ng container, because I don't want to change how my machines are already configured.
- No custom software, use existing projects. I want a cache, not a whole new piece of software to maintain.
- Support both Debian and Ubuntu; if they work, other distributions that use APT are likely to work as well.
- Aggregate the various Debian mirrors into a single cache bucket to avoid duplication.
- Be able to cache other non-standard package repositories such as updates for Proxmox.
- Cannot require mirroring whole repositories. debmirror is technically always a solution, but I don't consider it a reasonable one.
- Packages should naturally fall out of a full cache or expire over time.
- Behave transparently to the system using it. Machines shouldn't need to know or care if their request was cached or not.
- Support auto-apt-proxy so laptops and CI pipelines which can't use a fixed configuration can auto-magically benefit from the cache.
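For the drop-in goal, the client side stays exactly what apt-cacher-ng users already have: a one-line proxy setting pointing at the cache. A sketch of that configuration, where the hostname is an illustrative assumption and 3142 is apt-cacher-ng's default port:

```
# /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "http://apt-cache.lan:3142";
```

Because the URL and port don't change, swapping the container behind that hostname requires no changes on any client machine.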
Why would you want or need an APT cache?
Okay, let's step back for a second: why am I bothering with this? There are two ways to look at it. There are the feel-good philosophical reasons, and there are the practical self-serving ones.
Part of the positive motivation here is just being considerate of the infrastructure these distributions rely on. Debian, like many open source projects, is largely supported by volunteers and donations. A local cache helps reduce the burden on these services when updating multiple systems or repeatedly installing the same packages as part of a CI/CD pipeline. Even though individual packages are small, it adds up quickly. Basically, be kind to services provided for free.
For the more selfish reasons, the servers behind deb.debian.org have never been particularly fast for me. Even on a gigabit connection, I tend to get speeds in the sub-megabit range. Since I run Debian across a few machines as well as in virtual environments for testing and development, a local cache means that once a package is downloaded the first time, it can be served at full speed for every subsequent request. Updates are considerably faster. It also has the side effect of making the net-installer surprisingly fast compared to the full-size DVD installer when it comes time to do major version upgrades.
Nginx is great, this should be quick and easy
Nginx handled most of what I needed out of the box. There’s an official container image, it can listen on the same port as apt-cacher-ng, and its caching system is well documented and understood. I've used it for much more ridiculous things throughout my career.
Putting together the initial configuration didn't take long: a couple of server blocks for the known mirrors, a catch-all to handle the non-standard repositories, and proxy_pass sending traffic where it needs to go. Add some caching rules to aggregate the Debian mirrors, and I had enough for something working. I started testing with clean copies of Debian and Ubuntu, and other than a few easily solvable issues, like some files needing to be excluded from caching and HTTP 206 responses being cached incorrectly and causing checksum failures, it was smooth sailing. I was ready to call it done in an afternoon.
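A minimal sketch of that Nginx configuration; the zone name, sizes, upstream, and file patterns are illustrative rather than the exact config:

```nginx
# Index files (Release, InRelease, Packages) change frequently
# and must not be served from cache
map $uri $is_index {
    default                               0;
    ~(InRelease|Release|Packages|Sources) 1;
}

proxy_cache_path /var/cache/nginx/apt levels=1:2 keys_zone=apt:10m
                 max_size=20g inactive=30d use_temp_path=off;

server {
    listen 3142;

    location / {
        proxy_pass http://deb.debian.org;
        proxy_cache apt;
        # Key on the request path only, so the same .deb fetched
        # from different Debian mirrors lands in one cache entry
        proxy_cache_key $uri;
        # Only cache complete 200 responses; partial (206) range
        # responses pass through, avoiding checksum failures
        proxy_cache_valid 200 30d;
        proxy_cache_bypass $is_index;
        proxy_no_cache $is_index;
    }
}
```

Keying the cache on `$uri` alone is what makes the mirror aggregation work: `deb.debian.org` and any country mirror serving the same path share one entry.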
Feeling confident in what I had done, I switched my desktop over to the new cache for a more realistic test. It immediately failed trying to run apt update.
Hmm...maybe it's not all HTTP
You may have noticed the asterisk next to "unsecured files". There's a reason for that. APT repositories are mostly served over plain HTTP; Debian's and Ubuntu's are by default. The lack of HTTPS is not usually an issue: APT, being a creation of the 1990s, makes no assumptions about the transport medium and instead verifies file hashes and checks that the package lists are signed by a trusted GPG key.
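A stock Debian install ships exactly this kind of plain-HTTP source line; integrity comes from the signed Release file and per-package hashes, not the transport:

```
deb http://deb.debian.org/debian bookworm main
deb http://security.debian.org/debian-security bookworm-security main
```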
However, in a world that is increasingly SSL by default, many third-party repositories are served over HTTPS. My previous tests used clean installs, but testing on my desktop introduced the messier nature of the real world: notably the Microsoft, Mozilla, and Docker repositories, which all use HTTPS.
As a general rule, HTTPS/SSL traffic is a black box you cannot interfere with. I could have added extra configuration to handle HTTP-to-HTTPS conversion specifically for these repositories, but I wanted the proxy to be transparent, without the client system needing to be aware of the caching. Having to change repository settings on every system to replace HTTPS URLs with HTTP ones is not transparent. It would also make unknown repositories unusable until the container was updated.
Luckily, APT handles proxied HTTPS connections by first sending a CONNECT request over plain HTTP, which makes it possible to identify where the traffic needs to go without decrypting it. This should allow me to route the traffic properly. Unfortunately, Nginx doesn't support handling those requests without additional modules that are only available in the paid version.
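The handshake APT sends through a proxy looks roughly like this; the destination hostname is visible in plain text, even though everything after the tunnel opens is encrypted:

```
CONNECT packages.microsoft.com:443 HTTP/1.1
Host: packages.microsoft.com:443

HTTP/1.1 200 Connection established

(TLS handshake and encrypted traffic follow)
```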
Hello HAProxy
To work around the HTTPS limitation, I added HAProxy. The idea was to let it route requests either to Nginx for caching or forward them along to the requested destination as needed. This idea was based on a misunderstanding of the documentation that led me to believe HAProxy could act as a forward proxy by setting some request variables.
That mistake cost me a couple of hours of frustration trying to debug a setup that was never designed to work. It wasn't all for nothing, though. Besides being able to use ACLs to split the traffic more cleanly, the HAProxy dashboard gave me far more visibility into the traffic than the relatively bare Nginx status page.
I could now route the requests more effectively, but I was still no closer to HTTPS working.
TinyProxy to the rescue
At this point, I remembered SteamCache (now called LANCache) from the 2010s, a project that cached Steam game downloads for LAN events. It used Nginx for something similar to what I was attempting, so I decided to look into how they handled SSL. They use an SNI proxy to MITM requests directly via DNS replacement. Unfortunately, that approach is useless for handling proxy requests.
This left me at a bit of a dead end. It was time to take a step back and do something I should have done long ago instead of fighting with HAProxy: just Google it.
The first result led me to TinyProxy, which was exactly what I needed. It's a small proxy server that handles forwarding HTTPS requests, requires almost zero configuration, and is still actively maintained. Adding it to the container and updating HAProxy to pass the appropriate traffic to it filled in the missing piece: it would handle HTTPS traffic while Nginx continued to handle caching.
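The resulting split can be sketched in HAProxy roughly like this; the backend ports and names are illustrative, though 8888 is TinyProxy's default:

```
frontend apt_proxy
    bind :3142
    mode http
    # HTTPS tunnels announce themselves with a CONNECT request;
    # everything else is plain HTTP that Nginx can cache
    acl is_connect method CONNECT
    use_backend tinyproxy   if is_connect
    default_backend nginx_cache

backend tinyproxy
    mode http
    server tinyproxy 127.0.0.1:8888

backend nginx_cache
    mode http
    server nginx 127.0.0.1:8080
```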
After that, everything started behaving as expected. Testing with Debian and Ubuntu worked without issue, and my desktop updated against both HTTP and HTTPS repositories flawlessly.
Quick pit-stop for auto-apt-proxy
I did not forget about wanting support for auto-apt-proxy, and it turns out the detection method is quite simple. Being an open source tool, I was able to look at how it finds proxies: it loops through a list of known endpoints, requests the root of each, and searches the response for the string "Apt-cache".
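A rough sketch of that detection logic in Python; the candidate endpoint list is illustrative rather than auto-apt-proxy's real one, and the tool itself is implemented differently:

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical candidate endpoints, not auto-apt-proxy's actual list
CANDIDATES = [
    "http://apt-proxy:3142",
    "http://localhost:3142",
]

def looks_like_apt_cache(index_page: str) -> bool:
    """Detection boils down to grepping the index page for a marker string."""
    return "Apt-cache" in index_page

def find_proxy(candidates=CANDIDATES):
    """Return the first candidate whose index page looks like an APT cache."""
    for url in candidates:
        try:
            with urlopen(url + "/", timeout=2) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except (URLError, OSError):
            continue
        if looks_like_apt_cache(body):
            return url
    return None

if __name__ == "__main__":
    proxy = find_proxy()
    if proxy:
        # Emit the apt.conf snippet pointing at the discovered proxy
        print(f'Acquire::http::Proxy "{proxy}";')
```

The marker check is why the project name mattered, as the next paragraph explains.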
It turned out I was already supporting detection. I had been jokingly referring to this project as "Apt-cacher-dt", or Apt Cacher Duct Tape, and my index page already displayed the config line needed to connect, so the marker string was already there. I had unknowingly added support.
So what have I done?
This was a silly little project to fix a personal gripe that ended up taking three different proxy servers slapped together into a container to create, well, a proxy. It's not going to win any prizes for efficiency, but it does what I set out to do, and it has been running flawlessly for a couple of days now.
It was a bit more involved than I originally expected, but it gave me a better understanding of how proxies interact and how APT distributes software. It was a fun detour that also gave me something to write about, which is something else I have been looking to give a try.
I've posted the source to GitHub along with the container image, if anyone would like to take a look or give it a try.