GitHub Actions Day 24: Caching Dependencies

December 24, 2019

This is day 24 of my GitHub Actions Advent Calendar. If you want to see the whole list of tips as they're published, see the index.

Most software projects rely on a set of dependencies that need to be installed as part of the build and test workflows. If you're building a Node application, the first step is usually an npm install to download and install the dependencies. If you're building a .NET application, you'll install NuGet packages. And if you're building a Go application, you'll run go get to fetch your dependencies.

But this initial step of downloading dependencies is expensive. By caching them, we can reduce the time spent setting up our workflow.

When the actions/cache action runs for the first time, at the beginning of our workflow, it looks for our dependency cache. Since this is the first run, it won't find one, so our npm install step runs as normal. But once the job has finished, the path that we specify is stored in the cache.

Subsequent workflow runs will download that cache at the beginning of the run, meaning that our npm install step has everything that it needs, and doesn't need to spend time downloading.

The simplest setup is just to specify a cache key and the path to cache.

- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages
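
To show where that step sits, here is a minimal sketch of a complete workflow. The workflow name, the trigger, the checkout step, and the npm test step are assumptions added for illustration, and the version tags simply mirror the ones used in this post, so check for the current major versions before copying it.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/build.yml
name: build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so package.json is available
      - uses: actions/checkout@v1

      # Restore ~/.npm if a cache with this key already exists;
      # the same action stores the directory once the job finishes
      - uses: actions/cache@v1
        with:
          path: ~/.npm
          key: npm-packages

      # With a restored cache, npm can reuse packages from ~/.npm
      - run: npm install

      # Assumed test script; replace with whatever the project uses
      - run: npm test
```

The cache step has to come before npm install, otherwise there is nothing restored for npm to reuse.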

However, this setup is a little too simplistic, because with a fixed key that cache is shared across the workflows for your repository. That means that if your master branch and a maintenance branch depend on different sets of npm packages, you'd always have to download the packages that differ between those two branches.

That is to say: when the master branch build runs, it will store the packages that it uses in the cache. When the maintenance branch build runs, it will restore the cache of packages from the master branch build. Then npm install will need to download all the packages that aren't in the master branch but are in the maintenance branch.

Instead, you can tailor the cache to exactly what it's storing. The best way to do this is to use a key that identifies exactly what's being cached. You can take a hash of the file that identifies the dependencies you're installing -- in this case, we're using npm, so we'll hash the package-lock.json. This will give us a cache key that is tailored to our packages. We'll actually have multiple caches, one for each distinct package-lock.json, so each branch that changes its dependencies will restore efficiently.

- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
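
If the same workflow runs on more than one operating system, one possible refinement is to make the operating system part of the key as well. This is a sketch of an optional variation, not something the setup above requires; it uses the runner.os context value, and the key layout is just one reasonable choice.

```yaml
- uses: actions/cache@v1
  with:
    # ~/.npm is npm's default cache directory on Linux and macOS;
    # Windows runners keep it elsewhere, so adjust the path there
    path: ~/.npm
    # runner.os expands to Linux, Windows, or macOS
    key: ${{ runner.os }}-npm-packages-${{ hashFiles('**/package-lock.json') }}
```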

Okay, this is an improvement. But we still have a problem: since the key depends on the contents of package-lock.json, any time we change the dependencies at all, we invalidate the cache completely.

We can add one more setting -- in this case, the restore-keys -- that can be used as a fuzzier match. It matches on prefixes of the cache keys. In this case, we could set the restore-keys to npm-packages-. If there's an exact match for the key, then that will be the cache that's restored. But on a cache miss, the action looks for caches with a key that starts with npm-packages- and restores the most recently created one.

This means that npm install will have to download some dependencies, but probably not all of them. So it's a big improvement over the case when there's a total cache miss.

- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-packages-
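
To see which of the two cases a given run landed in, the action exposes a cache-hit output. The sketch below assumes a step id of npm-cache and a reporting step, both of which are hypothetical names added for illustration.

```yaml
- uses: actions/cache@v1
  id: npm-cache   # hypothetical step id, used to read the output below
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-packages-

# cache-hit is 'true' only when the exact key matched; a restore
# through restore-keys does not count as an exact hit
- name: Report cache status
  run: echo "exact cache hit = ${{ steps.npm-cache.outputs.cache-hit }}"

# Still needed either way: ~/.npm holds downloaded packages,
# not the installed node_modules directory
- run: npm install
```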

Using the actions/cache action is a good way to reduce the time spent setting up your dependencies, and it works on a wide variety of platforms. So whether you're building a project with Node, .NET, Java, or another technology, it can speed up your build.
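
The same pattern carries over to other ecosystems by swapping the cached path and the hashed file. The steps below are a sketch under stated assumptions: the paths are the tools' default locations on a Linux runner, the key prefixes are made up for illustration, and the NuGet variant only applies to projects that have lock files enabled.

```yaml
# Go modules: hash go.sum; ~/go/pkg/mod is the module cache
# when GOPATH is left at its default (assumed here)
- uses: actions/cache@v1
  with:
    path: ~/go/pkg/mod
    key: go-modules-${{ hashFiles('**/go.sum') }}
    restore-keys: go-modules-

# Maven: hash pom.xml; ~/.m2/repository is the default local repository
- uses: actions/cache@v1
  with:
    path: ~/.m2/repository
    key: maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: maven-

# NuGet: hash packages.lock.json (present only if lock files are enabled);
# ~/.nuget/packages is the default global packages folder
- uses: actions/cache@v1
  with:
    path: ~/.nuget/packages
    key: nuget-${{ hashFiles('**/packages.lock.json') }}
    restore-keys: nuget-
```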