Pipeline caching can help reduce build time by allowing the outputs or downloaded dependencies from one run to be reused in later runs, reducing or avoiding the cost to recreate or redownload the same files again.
Let's see how to enable and use caching in Azure Pipelines.
Hi everybody! Today we'll talk about how we can reduce the time our Pipelines take to run by using caching.
Caching is especially useful in scenarios where the same dependencies are downloaded over and over at the start of each run. This is often a time consuming process involving hundreds or thousands of network calls.
If you are a visual learner, or simply prefer to watch and listen instead of reading, here's the video with the whole explanation, which, to be fair, is much more complete than this post.
If you'd rather read, well... let's just continue :)
Why is caching important? On your on-prem or dev machine, the build and release processes already have some sort of caching in place that saves the files or the metadata locally.
However, when it comes to Azure Pipelines agents, every time a new pipeline run starts you are on an agent machine that has been created afresh just for that run, so nothing is stored there apart from the binaries and services the CI/CD process needs. This is why you may want to use the cache service.
Remember that caching can be effective at improving build time provided the time to restore and save the cache is less than the time to produce the output again from scratch. Because of this, caching may not be effective in all scenarios and may actually have a negative impact on build time.
Caching is added to a pipeline using the Cache pipeline task. This task works like any other task and is added to the steps section of a job.
```yaml
variables:
  solution: '**/WebAppTest.sln'
  NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages
```
```yaml
- task: Cache@2
  inputs:
    key: 'nuget | "$(Agent.OS)" | **/packages.lock.json,!**/bin/**,!**/obj/**'
    restoreKeys: |
      nuget | "$(Agent.OS)"
    path: $(NUGET_PACKAGES)
  displayName: Cache NuGet packages
```
The Cache task has two required inputs: key and path:
- path should be set to the directory to populate the cache from (on save) and to store files in (on restore). It can be absolute or relative.
- key should be set to the identifier of the cache you want to restore or save.
There is also another parameter, restoreKeys, which can be used if you want to query against multiple keys or key prefixes. This is useful to fall back to another key in case an exact match for the main key couldn't be found.
In the example above, I use a composite key made of the 'nuget' string literal, the OS the agent is running on (which comes from a system variable), and finally the content of the 'packages.lock.json' file. When you specify a file, the engine uses the hash of the file's content as part of the key, so if the content changes, so does the key. I've also specified additional filters to ignore that file if it is present in the bin and obj folders.
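One prerequisite worth calling out: NuGet doesn't generate packages.lock.json by default, so for the key above to work the project must opt in to lock files. A minimal snippet for the csproj, using NuGet's RestorePackagesWithLockFile property:

```xml
<!-- Enable NuGet lock files so packages.lock.json is generated on restore -->
<PropertyGroup>
  <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile>
</PropertyGroup>
```

After the next restore, commit the generated packages.lock.json so the cache key stays stable across runs.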
When a cache step is encountered during a run, the task will restore the cache based on the provided inputs. If no cache is found, the step completes and the next step in the job is run. After all steps in the job have run and assuming a successful job status, a special "save cache" step is automatically injected and run for each "restore cache" step that was not skipped. This step is responsible for saving the cache.
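The same pattern works for other package managers too. As an illustrative sketch (paths and key parts here are my own choices, following the same structure as the NuGet example), an npm cache could fall back from an exact lockfile match to the most recent cache for the same OS:

```yaml
variables:
  # npm reads this variable and uses it as its cache directory
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
- task: Cache@2
  inputs:
    key: 'npm | "$(Agent.OS)" | package-lock.json'
    restoreKeys: |
      npm | "$(Agent.OS)"
    path: $(npm_config_cache)
  displayName: Cache npm packages
- script: npm ci
  displayName: Install dependencies
```

If package-lock.json changed since the last run, the restoreKeys prefix still lets the run start from the latest cache for that OS instead of an empty directory.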
Skip steps based on cache hit
There are some scenarios in which the successful restoration of the cache should cause a different set of steps to be run. For example, a step that installs dependencies can be skipped if the cache was restored.
```yaml
variables:
  solution: '**/WebAppTest.sln'
  NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages
  CACHE_RESTORED: 'false'
```
```yaml
- task: Cache@2
  inputs:
    key: 'nuget | "$(Agent.OS)" | **/packages.lock.json,!**/bin/**,!**/obj/**'
    restoreKeys: |
      nuget | "$(Agent.OS)"
    path: $(NUGET_PACKAGES)
    cacheHitVar: CACHE_RESTORED
  displayName: Cache NuGet packages

- task: NuGetCommand@2
  inputs:
    restoreSolution: '$(solution)'
  condition: ne(variables.CACHE_RESTORED, 'true')
```
To achieve this, we can use the cacheHitVar parameter and pass a variable name to it. If there is a cache hit, the value of that variable is set to true.
Then we can use that variable in a condition for the task or step we want to skip.
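The condition isn't limited to the NuGetCommand task; any step can be skipped the same way. As a sketch (the script line is just a placeholder for whatever install step your pipeline uses):

```yaml
# Skipped entirely when the Cache task reported a hit
- script: ./install-dependencies.sh
  condition: ne(variables.CACHE_RESTORED, 'true')
  displayName: Install dependencies (skipped on cache hit)
```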
I've created this comparison table so we can have an overview of caching performance.
The first column is the execution with no cache.
The second one, instead, is with caching enabled but on the first run, which means there is a cache miss. As you can see, the overall time is higher: not only do the dependencies have to be restored as before, but the task also has to check whether the cache exists and, since it doesn't, upload the content and create the cache entry.
The third and last column is a cache hit. The cache is present, so the dependencies don't have to be downloaded (hence the NuGet Restore step is faster), and since the cache doesn't change, no time is spent saving it. This run is the fastest, even though only by a few seconds, because the demo app has just a few dependencies.
It is worth noting that caching is currently supported in CI and deployment jobs, but not in classic release jobs.
What do you think of the Azure Pipelines Caching system? Do you use it? Can you see its benefits?
Let me know in the comment section below.
References and Links
- Video with full explanation and examples about Pipelines Caching
- Azure Pipelines Triggers series
- Official documentation about Azure Pipelines Caching
Like, share and follow me 🚀 for more content:
☕ Buy me a coffee
🌐 CoderDave.io Website
👦🏻 Facebook page