Ever wondered why it takes so long for your Azure pipeline to finish whatever it is supposed to do? Can we improve this somehow?
In this post, I will point out an obvious source of slowness in our pipelines, namely running scripts based on the npm ecosystem, describe the solution Azure Pipelines offers, explain how it works, and show what the implementation looks like.
Pipelines in Azure
Azure pipelines are composed of stages, each running jobs that are divided into steps which run your scripts or tasks. These jobs run on top of an agent, which is essentially computing infrastructure (you can think of it as an image with the software needed to run your scripts preinstalled).
These agents spin up in a pristine state. Yes, this is the basis of a predictable CI environment. The problem: your script does everything from scratch every single time, which is probably a big waste of time. Maybe we can shave off a few seconds or more.
In the npm ecosystem, we rely on node modules as our dependencies. In our CI process, we install dependencies in a predictable manner, which gives us a lot of confidence that the CI environment is always going to be the same for given inputs (essentially our lock file), so we can test it carefully and finally deploy or publish our artifact safely.
This installation process, along with our build process among others, can take a significant amount of time as our app grows and becomes more mature. Our pipeline can therefore take several minutes to complete. Why? The problem lies in the nature of a CI ecosystem: the process repeats itself over and over on every single run of our pipeline. Luckily, CI systems, and specifically Azure Pipelines, let us improve on this aspect using a caching mechanism.
Cache your dependencies with Cache task
The following example demonstrates how to use the Azure Pipeline Cache task to cache our yarn dependencies.
The process for caching plain npm dependencies is the same, using your package-lock.json and the .npm cache folder instead.
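For reference, a minimal npm variant might look like the sketch below. It follows the documented Cache task pattern for npm: `npm_config_cache` is the environment variable npm reads to locate its cache folder, and the exact path is our choice here, so adjust it to your setup.

```yaml
variables:
  # npm reads this variable to locate its cache folder (path is our choice)
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
  - task: Cache@2
    inputs:
      key: 'npm | "$(Agent.OS)" | package-lock.json'
      restoreKeys: |
        npm | "$(Agent.OS)"
      path: $(npm_config_cache)
    displayName: Cache npm dependencies

  # npm ci installs strictly from package-lock.json
  - script: npm ci
    displayName: Install dependencies
```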
Here is a brief summary of how the caching process works:
- Use the Yarn task to set Yarn's cache folder to a location of our choice.
```yaml
- task: Yarn@3
  inputs:
    arguments: 'config set cache-folder $(yarnCacheFolder)'
  displayName: Set Yarn cache folder
```
- Use the Cache task to create a cache with a key composed of the static string 'yarn', the agent's operating system identifier, and the contents of your yarn lock file. Azure Pipelines actually creates a hash based on the contents of this file, and when the pipeline runs it checks for a cache hit with this key. At runtime, assuming your agent is running Windows, the key will look similar to this: yarn|"Windows_NT"|FupyH86xxxxxxxxxtBrhv+2fiXTRb/6Dew= .
```yaml
- task: Cache@2
  inputs:
    key: 'yarn | "$(Agent.OS)" | yarn.lock'
    restoreKeys: |
      yarn | "$(Agent.OS)"
      yarn
    path: $(yarnCacheFolder)
  displayName: Cache dependencies
```
- On the first run, there is a cache miss on our key, and the cache is created for the first time.
```
Getting a pipeline cache artifact with one of the following fingerprints:
Fingerprint: `yarn|"Windows_NT"|FupyH86xxxxxxxxxtBrhv+2fiXTRb/6Dew=`
Fingerprint: `yarn|"Windows_NT"|**`
Fingerprint: `yarn|**`
There is a cache miss.
```
At this stage, you might notice an additional Post-Job step was dynamically appended to the pipeline. This step is used to actually create the cache item. The process is printed to the pipeline log:
```
Creating a pipeline cache artifact with the following fingerprint: `yarn|"Windows_NT"|FupyH86xxxxxxxxxtBrhv+2fiXTRb/6Dew=`
Cache item created.
```
- On the second run, assuming no changes to our lock file or our agent's configuration, there will be a cache hit, and dependencies will be fetched from the previously created cache instead of being resolved from the registry. This doesn't happen instantaneously, but it should take less time than doing the whole process from scratch. In the pipeline log, you can see that the cache is downloaded from Azure storage before it is used.
```
There is a cache hit: `yarn|"Windows_NT"|FupyH86xxxxxxxxxtBrhv+2fiXTRb/6Dew=`
Used scope: 30;7375ab34-xxxx-xxxx-8ef7-382c644dca33;refs/heads/xxxxx;
Entry found at fingerprint: `yarn|"Windows_NT"|FupyH86xxxxxxxxxtBrhv+2fiXTRb/6Dew=`
Expected size to be downloaded: 100.0 MB
Downloaded 0.0 MB out of 100.0 MB (0%).
7-Zip 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
Extracting archive:
Downloaded 12 MB out of 100.0 MB (12%).
...
```
If there is a cache miss on the exact key, the task will try to fall back to one of the restore keys.
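As a side note, if you cache node_modules itself rather than Yarn's cache folder, the Cache task's cacheHitVar input lets you skip the install step entirely on an exact hit. A sketch of this variant (the CACHE_RESTORED variable name is our choice; restore keys are deliberately omitted, since a partially restored node_modules should not skip the install):

```yaml
- task: Cache@2
  inputs:
    key: 'yarn | "$(Agent.OS)" | yarn.lock'
    path: $(System.DefaultWorkingDirectory)/node_modules
    # Exposes the outcome: 'true' on an exact hit, 'inexact' on a
    # restore-key hit, 'false' on a miss
    cacheHitVar: CACHE_RESTORED
  displayName: Cache node_modules

# Only run the install when node_modules was not restored as-is
- task: Yarn@3
  condition: ne(variables.CACHE_RESTORED, 'true')
  displayName: 'Install dependencies'
  inputs:
    arguments: '--frozen-lockfile'
```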
Here is the complete example of our .yml file:
```yaml
variables:
  npmRegistry: 'xxxx-xxxx-xxxx-xxxx' # Your custom registry ID
  yarnCacheFolder: $(Pipeline.Workspace)/.yarn

steps:
  - task: Yarn@3
    inputs:
      arguments: 'config set cache-folder $(yarnCacheFolder)'
    displayName: Set Yarn cache folder

  - task: Cache@2
    inputs:
      key: 'yarn | "$(Agent.OS)" | yarn.lock'
      restoreKeys: |
        yarn | "$(Agent.OS)"
        yarn
      path: $(yarnCacheFolder)
    displayName: Cache dependencies

  - task: Yarn@3
    displayName: 'Install dependencies'
    inputs:
      arguments: '--frozen-lockfile'
      customRegistry: useFeed
      customFeed: '$(npmRegistry)'
```
Using the Azure Pipelines caching feature can save us a great deal of time. Here is an example of pipeline performance before applying the caching task. You can see the average run time is not great, to say the least: it revolves around 10 minutes and sometimes more, depending on agent availability, networking, and other aspects of CI and npm ecosystem performance. Clearly there was room for improvement.
And here is a summary of 2 runs after applying caching.
The first run (2nd in the list) is longer because the Post-Job step creates a new cache item, and the npm dependencies are installed from the registry as usual.
The second run (1st in the list) really shows the performance improvement: there is a cache hit, and the npm dependencies are fetched from the cache. Our pipeline run time is now reasonable!