According to this study only ~20% of GitHub Actions workflows use caching to speed up builds! 🤯
Seems incredibly low and wasteful to me, so I was wondering: how many of you don't use caching in your projects? And what are the reasons why not?
Top comments (12)
I do in some cases, but not in others.
If it's something like caching dependencies and the dependency management tooling includes strict versioning (for example, using NPM and making use of `package-lock.json` instead of just `package.json`) then yes, I generally do use caching.

OTOH, if there is no way to determine automatically that the cache should be dropped, I tend to avoid caching. Without automatic cache invalidation, you end up in a situation where you can't be sure that the CI is actually testing the correct thing, or that any failures are actually true failures. This type of thing is a known and well-established issue with caching in build environments (for example, standard advice when you run into a build issue while using `ccache` has always been to manually drop the cache and rebuild things).

To be honest, I was only thinking of npm and Docker build caches, things where I never really ran into any problems with invalid caches. But I see your point.
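For the lockfile case the invalidation really is automatic, because the cache key can embed a hash of the lockfile. A minimal sketch of what that can look like with `actions/cache` (the cache path and key names here are just illustrative, not from anyone's actual workflow):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    # The key changes whenever package-lock.json changes, so the cache is
    # invalidated exactly when the dependency set changes.
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
```

The `restore-keys` fallback lets a run with a changed lockfile still start from the closest older cache instead of downloading everything from scratch.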
What is the main reason you invest in caching? Speed or Cost?
Actually, Docker build caching is one potential source of issues, because you can only safely use caches if all parts of your build process (and all underlying images) are deterministic and invariant of all external resources. For example, if you are building something off of an Ubuntu base image and running `apt-get update && apt-get upgrade -y` as part of the build, you actually can't safely use Docker's built-in cache because of how it handles invalidation (put differently, if you build that same Docker image with a clean cache at two different points in time, you can end up with two different images, which means you can't safely use caching).

Speed primarily, because the difference can be huge, and none of the build infrastructure I use charges for time.
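The invalidation behaviour described above follows from Docker caching layers by instruction text, not by the state of external repositories. A minimal illustration (the base image tag is my own assumption):

```dockerfile
FROM ubuntu:22.04
# Docker reuses this layer whenever the instruction text is unchanged, so a
# warm cache keeps serving the package versions from the *first* build.
# Rebuilding later does not pick up newer apt packages unless the cache is
# cold or explicitly bypassed (docker build --no-cache).
RUN apt-get update && apt-get upgrade -y
```

Two builds of this file at different points in time only produce the same image if the second one hits the cache; with a cold cache, newer package versions can sneak in.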
True, you should always use explicit versioning in your Dockerfile, to have reproducible builds.
But even without it, caching should work and speed things up, and the build should keep working for as long as you keep the cache. At least I don't see a reason why caching unversioned commands should break the build.
I do see the point that, if the cache gets dropped for whatever reason, you may end up with a different build, as different versions of the dependencies may get installed (which you probably don't want if you aim for reproducible builds).
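For reference, explicit versioning in a Dockerfile can look like the sketch below; the package and exact version pin are hypothetical, purely for illustration:

```dockerfile
FROM ubuntu:22.04
# Pinning an exact package version makes the layer reproducible: a rebuild
# with a cold cache installs the same version the cached build did.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl=7.81.0-1ubuntu1 \
 && rm -rf /var/lib/apt/lists/*
```

The trade-off is maintenance: pins have to be bumped by hand (or by tooling) as the upstream repositories move on.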
The flip side though is that sometimes you want (or even need) to always be using the latest versions of dependencies. For example, where I work we use Docker as part of our process of building native DEB/RPM packages for various distros (because it lets us make the process trivially portable), and in that case, we always want to be building against whatever the latest versions of our dependencies are so that the resulting package installs correctly.
In such a situation, caching the Docker build results can cause that requirement for tracking the latest dependency versions to be violated.
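One common way (an assumption on my part, not necessarily what their setup does) to force fresh dependency resolution while keeping earlier layers cacheable is a cache-busting build argument:

```dockerfile
FROM ubuntu:22.04
# Layers above this ARG can still be cached; every layer after it is rebuilt
# whenever the argument's value changes.
ARG CACHE_BUST=unset
RUN echo "cache bust: $CACHE_BUST" \
 && apt-get update && apt-get upgrade -y
```

Invoked as e.g. `docker build --build-arg CACHE_BUST=$(date +%s) .`, this always re-resolves the latest packages; the blunter alternative is `docker build --no-cache`.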
I personally set up caching only for pipelines that take "too long" (more than 15 minutes).
I think it's mostly because the next feature is always more important than tweaking the CI/CD.
But thinking about it, this feels like "skipping TDD". In the end, the benefits in terms of cost savings and developer productivity seem to outweigh the couple of minutes it takes to set up caching and other improvements (I'm looking at you, Dockerfile)
Some caches can be slow to upload and download from, resulting in slower overall build time. Also, each caching step doubles the number of conditional branches that you need to verify after making a change to your CI script.
I would try caching in each case and measure the difference in build time.
I have been leaning more towards having developers run tests locally, produce a text file with a hash based on the source files of the last test run, commit the text file into the repo, and have the PR builds just verify that the last test run matches the source code. Then have the PR merge build do a full build as a last-minute check of reproducibility.
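A rough sketch of that "commit a test-run hash" idea; the file name and hashing scheme are my own assumptions, demonstrated in a throwaway repo:

```shell
set -e
# Demo repository with one tracked source file.
tmp=$(mktemp -d); cd "$tmp"
git init -q
echo 'print("hi")' > app.py
git add app.py
# After a green local test run, record one hash over all tracked sources.
# (.last-green-tests itself is untracked, so it doesn't affect the hash.)
git ls-files -z | sort -z | xargs -0 sha256sum | sha256sum | cut -d' ' -f1 > .last-green-tests
# In CI, recompute the hash and compare instead of re-running the full suite.
current=$(git ls-files -z | sort -z | xargs -0 sha256sum | sha256sum | cut -d' ' -f1)
recorded=$(cat .last-green-tests)
[ "$current" = "$recorded" ] && echo "sources unchanged since last green test run"
```

In a real repo the recorded file would be committed alongside the change, and the PR build would fail when the recomputed hash differs.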
Yeah, I've noticed that getting and updating the cache can take 1-2 minutes, depending on the size of the cache. But given a fairly large list of dependencies it's almost always worth it.
As for changes in CI scripts: can't you set up the cache to re-validate when it detects changes in key files, such as `package.json`?
The idea of running tests locally and uploading proof is interesting. But I'd be worried about uncontrollable environments and side effects. Centralized CI allows for a "source of truth". Or how do you manage your developer environments?
I suspect most people use CI to just ensure all devs are running the tests... for that case, I would just use the check I mentioned, along with scripts that encourage running tests locally with minimum friction. As mentioned, I would still use CI as a final check after PR review is complete, but not on each change in a PR.
The next level is people trying to ensure the tests run from a clean state. You can also encourage this with scripts to clean local environment state. Generally, most languages have some sort of sandbox or dependency management, so builds are relatively isolated and reproducible.
The next level would be people ensuring the build works on multiple operating systems or on a blessed operating system. At that point, CI can help if devs use macos and blessed operating system is linux.
A final level is when people want to speed up their integration tests by splitting them up and running them in parallel. For this, I would use CI.
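For that parallel case, GitHub Actions can fan a suite out with a job matrix. The sketch below assumes a test runner with a `--shard` flag (recent Jest or Vitest have one); the shard count and commands are illustrative:

```yaml
jobs:
  integration:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Each of the four parallel jobs runs a quarter of the suite.
      - run: npx jest --shard=${{ matrix.shard }}/4
```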
For me it's the other way around: I want automatic tests to pass before I review the PR, because developer time is more valuable than CI time.
Sure, you can clean the git state and build directory, however this could still mean that developers run different compiler versions, or different HW architectures (Intel vs Apple Silicon), which essentially makes it very hard, if not impossible, to have reproducible builds and tests. Which is why I prefer having a fast CI setup as gatekeeper and never releasing anything that has not been built by the CI server.
I just vote for fast local tests (I would probably wrap `git push` with an alias that runs `unit tests && git push` instead, and ask everyone to use the alias), along with a final check before merging. If you keep thinking through all the scenarios that make CI tests slower than local dev, you will ultimately notice that there are multiple scenarios where CI will always become the bottleneck in your code workflow. One scenario is when you modify the version of one of your external dependencies. The tests will run quickly locally, but CI will have a busted cache and need to fetch all the dependencies again, and possibly upload the new set to a cache. This will happen on a regular basis. Developers will complain that CI can become slow sometimes.
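The "wrapped push" idea can be done with a git alias; the test command here is an assumption, and the snippet sets it per-repo in a demo repository just to show the mechanics:

```shell
set -e
# Throwaway repo so the alias doesn't touch real global config.
tmp=$(mktemp -d); cd "$tmp"; git init -q
# The leading '!' makes git run the alias as a shell command: tests first,
# push only if they pass.
git config alias.tpush '!npm test && git push'
# Everyone then pushes with `git tpush` instead of `git push`.
val=$(git config alias.tpush)
echo "alias installed: $val"
```

In practice you would set it once with `git config --global` so it applies everywhere.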
I just meant that you have to verify this logic works as you expect, which usually means running the CI build twice to try out the two branches.