GitHub Actions Day 24: Caching Dependencies
This is day 24 of my GitHub Actions Advent Calendar. If you want to see the whole list of tips as they're published, see the index.
Most software projects rely on a set of dependencies that need to
be installed as part of the build and test workflows. If you're
building a Node application, the first step is usually an npm install
to download and install the dependencies. If you're building a .NET
application, you'll install NuGet packages. And if you're building a
Go application, you'll go get your dependencies.
But this initial step of downloading dependencies is expensive. By caching them, we can reduce the time spent setting up our workflow.
Basically, when the actions/cache action runs for the first time, at
the beginning of our workflow, it will look for our dependency cache.
Since this is the first run, it won't find it. Our npm install step
will run as normal. But once the job completes successfully, the path
that we specify will be stored in the cache.
Subsequent workflow runs will download that cache at the beginning of
the run, meaning that our npm install step has everything that it
needs, and doesn't need to spend time downloading.
The simplest setup is just to specify a cache key and the path to cache.
- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages
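To show where that step sits in a job, here is a minimal sketch of a complete workflow. The file name, job name, trigger, and the npm test step are illustrative assumptions, and the @v4 tags are newer than the v1 tag used in this post's snippets. The point is only the ordering: the cache step comes before npm install.

```yaml
# Hypothetical file: .github/workflows/ci.yml
name: CI
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Restores ~/.npm if a cache with this key already exists.
      # If it doesn't, the action saves the path at the end of the job.
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-packages

      # Runs after the restore, so it can reuse whatever was cached.
      - run: npm install

      # Assumes the project defines a test script in package.json.
      - run: npm test
```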
However, this setup is a little too simplistic, because caches are shared across all the workflows for your repository. That means that if you had a cache for the npm packages in your master branch, and a cache for the npm packages in a maintenance branch, then you'd always have to download the packages that changed between those two branches.
That is to say: when the master branch build runs, it will store the
packages that it uses in the cache. When the maintenance branch build
runs, it will restore the cache of packages from the master branch
build. Then npm install will need to download all the packages that
aren't in the master branch but are in the maintenance branch.
Instead, you can tailor the cache to exactly what it's storing.
The best way to do this is to use a key that identifies exactly
what's being cached. You can take a hash of the file that identifies
the dependencies you're installing -- in this case, we're using npm,
so we'll hash the package-lock.json. This will give us a cache key
that is tailored to our packages. We'll actually have multiple caches,
one for each distinct package-lock.json, so each branch that changes
its dependencies will restore efficiently.
- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
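If the same workflow runs on more than one operating system, the key can be tailored one step further. The sketch below is a variation on the step above, not something this post requires: it assumes a multi-OS build and adds the runner.os context to the key so that each platform keeps its own cache. The @v4 tag is also an assumption.

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    # runner.os expands to Linux, Windows, or macOS,
    # so each operating system gets a separate cache entry.
    key: ${{ runner.os }}-npm-packages-${{ hashFiles('**/package-lock.json') }}
```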
Okay, this is an improvement. But we still have a problem: since the
key depends on the contents of package-lock.json, any time we change
the dependencies at all, we invalidate the cache completely.
We can add one more key -- in this case, the restore-keys -- that
can be used as a fuzzier match. It will match the prefixes of the
cache keys. In this case, we could set the restore-keys to
npm-packages-. If there's an exact match for the key, then that
will be the cache that's restored. But on a cache miss, it will look
for the most recently created cache with a key that starts with
npm-packages-.
This means that npm install will have to download some dependencies,
but probably not all of them. So it's a big improvement over the
case when there's a total cache miss.
- uses: actions/cache@v1
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-packages-
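To see which kind of match a run got, the cache step exposes a cache-hit output that is true only when the exact key matched. The sketch below is one way to surface that; the step id npm-cache, the step name, and the echo message are hypothetical, and the @v4 tag is an assumption.

```yaml
- uses: actions/cache@v4
  id: npm-cache   # hypothetical id, used to read the step's outputs below
  with:
    path: ~/.npm
    key: npm-packages-${{ hashFiles('**/package-lock.json') }}
    # Prefix fallback, tried only when the exact key is not found.
    restore-keys: |
      npm-packages-

# cache-hit is 'true' only for an exact key match, so this step runs
# for both a restore-keys (prefix) match and a total miss.
- name: Report a partial or missed cache
  if: steps.npm-cache.outputs.cache-hit != 'true'
  run: echo "No exact cache match; npm install will download what is missing."

- run: npm install
```

Because ~/.npm is npm's download cache and not node_modules, npm install still has to run in every case; the cache only decides how much it needs to fetch.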
Using the actions/cache action is a good way to reduce the time
spent setting up your dependencies, and it works on a wide variety
of platforms. So whether you're building a project with Node, .NET,
Java, or another technology, it can speed up your build.
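As an illustration of the same pattern outside of npm, here is a rough sketch for a Java project built with Maven. It assumes Maven's default local repository at ~/.m2/repository and that pom.xml is the file that identifies the dependencies; the maven-packages- prefix is a made-up name chosen to mirror the npm examples, and the @v4 tag is an assumption.

```yaml
- uses: actions/cache@v4
  with:
    # Default location of Maven's local repository.
    path: ~/.m2/repository
    # One cache per distinct set of pom.xml contents.
    key: maven-packages-${{ hashFiles('**/pom.xml') }}
    # Fall back to the most recent Maven cache on a miss.
    restore-keys: |
      maven-packages-
```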