
shrun: A modern CLI testing framework

Ryland G ・ Originally published at cdevn.com

TL;DR: Test your CLI commands in isolated docker containers using the Jest test environment you already love.

A few years ago, I was working as lead architect for a startup building a high-performance competitor to AWS Lambda. One of my responsibilities was maintaining a standalone CLI and SDK for the functions backend. The CLI/SDK was built with Node and commander (eventually yargs), and its structure and usage were very similar to those of the popular Serverless framework.

A while after I built out this initial SDK/CLI, we started having some internal frustrations regarding the process user-facing features would go through before eventually reaching the CLI. We would often design a backend feature only to realize later that the CLI/API would need to be quite nasty to satisfy it. This frustration had a measurably negative effect on both the quality of new features and the velocity at which they could be released. Many readers might assume that we simply had bad communication and planning, and while there was definitely room for improvement in that area, it didn't help that our team was separated by a 10-11 hour time difference.

Regardless of the cause, at some point one of my coworkers started a conversation with me to explore ways we could make our process more declarative and reliable. After an especially frustrating day, he came to me with an amazing idea. He suggested that we create a "spec" format that would allow us to both test the CLI and propose new user-facing features in a concrete way. I understood exactly where he was going, so I immediately started building a prototype. A day later I had an MVP version of the tool, which consumed YAML-based spec tests and ran them automatically against our open source CLI.

Below is an example to show you the format of the spec (testing the npm init --help command):

- test: Test init help output
  setup:
    - "curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -"
    - "sudo apt install nodejs"
  steps:
    -   in: npm init --help
        out: |-
          npm init [--force|-f|--yes|-y|--scope]
          npm init <@scope> (same as `npx <@scope>/create`)
          npm init [<@scope>/]<name> (same as `npx [<@scope>/]create-<name>`)
          aliases: create, innit

Spec format

test: string - each spec test must have a test stanza with a unique name. For those who are familiar with Jest/Ava/Mocha, this maps directly to the test("someName", () => {}) format used by those frameworks.

setup?: string[] - the setup section allows you to run a series of shell commands before the test itself runs. This is convenient for tests that rely on a specific set of environment variables, need iptables configured, and so on. For those who are familiar with Jest/Ava/Mocha, this partially maps to the beforeEach construct (more like beforeThis, since you specify it per test).

steps: Step[] - steps are where the bulk of your test logic is defined, and there is no limit to the number you can have per test. Every step must have an in entry, which is the command that is actually run in the container's internal shell. If a step is expected to succeed, it is a PassStep and must have an out entry. The output of in and the value of out map to actual and expected in traditional testing frameworks. If a step is not expected to succeed (non-zero exit code), it must have either an err or an exit entry. err is similar to out but is checked against stderr instead of stdout. exit specifies the exit code expected from running the step's in statement.
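The step rules above can be sketched as a small TypeScript check. This is only an illustration of the rules as described here, not shrun's actual implementation: the Step interface and the classifyStep function are hypothetical names, and only the in, out, err, and exit fields come from the spec format.

```typescript
// Hypothetical types: a sketch of the step rules described in the post,
// not shrun's internal definitions.
interface Step {
  in: string;    // command run in the container's shell
  out?: string;  // expected stdout, for a step expected to succeed
  err?: string;  // expected stderr, for a step expected to fail
  exit?: number; // expected exit code, for a step expected to fail
}

type StepKind = "pass" | "fail";

// Classify a step by the rules in the text, or throw if it breaks them.
function classifyStep(step: Step): StepKind {
  if (!step.in || step.in.trim() === "") {
    throw new Error("every step needs an `in` entry");
  }
  if (step.err !== undefined || step.exit !== undefined) {
    return "fail";
  }
  if (step.out !== undefined) {
    return "pass";
  }
  throw new Error("a step needs `out`, or `err`/`exit` if it is expected to fail");
}

const steps: Step[] = [
  { in: "echo hello", out: "hello" },
  { in: "sh -c 'exit 3'", exit: 3 },
];
console.log(steps.map(classifyStep).join(", ")); // pass, fail
```

A step with only an in entry is rejected, which mirrors the requirement that every step declares what it expects.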

There are also two other stanzas not shown by the above spec:

cleanup?: string[] - the exact same as setup but runs after the test has finished. Useful for resource cleanup. Maps to the afterEach/afterThis construct in traditional testing frameworks.

foreach: Map<string, string>[] - allows a single test to be run multiple times with different input values.
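Putting the stanzas together, the following TypeScript sketch shows one way to picture the order in which a single test's commands run. The SpecTest and PlannedRun types and the planRuns function are hypothetical, not part of shrun. It assumes setup and cleanup are repeated for every foreach entry, and it deliberately does not model how foreach values are injected into commands, since that is not covered above.

```typescript
// Hypothetical shape collecting the stanzas from the post; the field names
// come from the spec format, everything else is an illustrative assumption.
interface SpecTest {
  test: string;
  setup?: string[];
  steps: { in: string; out?: string; err?: string; exit?: number }[];
  cleanup?: string[];
  foreach?: Record<string, string>[]; // YAML maps parsed as plain objects
}

interface PlannedRun {
  name: string;
  values: Record<string, string>; // one foreach entry, or {} if there is none
  commands: string[];             // setup, then step inputs, then cleanup
}

// Expand one spec test into its runs and the command order of each run.
function planRuns(spec: SpecTest): PlannedRun[] {
  const entries: Record<string, string>[] =
    spec.foreach && spec.foreach.length > 0 ? spec.foreach : [{}];
  return entries.map((values) => ({
    name: spec.test,
    values,
    commands: [
      ...(spec.setup ?? []),
      ...spec.steps.map((step) => step.in),
      ...(spec.cleanup ?? []),
    ],
  }));
}

const runs = planRuns({
  test: "Echo works",
  setup: ["export GREETING=hello"],
  steps: [{ in: "echo hello", out: "hello" }],
  cleanup: ["rm -rf /tmp/scratch"],
  foreach: [{ NAME: "alice" }, { NAME: "bob" }],
});
console.log(runs.length); // 2
for (const run of runs) {
  console.log(run.values, run.commands.join(" -> "));
}
```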

Why shrun?

Some of you may think a dockerized solution like this is overkill. I understand that sentiment but there are convincing reasons why shrun brings value:

  • Each test runs in its own isolated environment. CLI testing is unique in the sense that it's often the ultimate point of contact between your product and the user. Ensuring that a set of steps runs from start to finish on X environment is paramount.
  • Tests have minimal ability to interfere with each other. There are still issues such as noisy neighbors and throttling by external services, but generally speaking parallel test runs will not degrade the reliability of the tests.
  • The containers of troublesome failing tests can be sent to other developers and debugged quickly.
  • You can run shrun on any platform that supports Docker (basically all of them).


This is the initial release of shrun so don't expect things to be perfect. In the future I hope to improve the framework and add all relevant but missing Jest flags. Contributors and feedback are welcome and desired, so I'd love to hear how shrun could be improved to better fit your needs. If you like what you saw, please star the project on GitHub so it can be useful to a wider audience.


Ryland G


Head of Product Experience at Temporal. Previously lead architect and low-level systems programmer for a scale-out SaaS offering. Game engine developer, ML engineering expert. DMs open on Twitter.



Wow... This might be what I was looking for for the past 2 years :))

I need to dive deeper, because my CLIs are often manipulating files on the HDD, so:

input = file
output = processed file

But the whole syntax looks more promising than fiddling with regexps ;)

BTW. Have you tried using alpine or any other docker image that is not measured in GBs?


I'm so happy to hear that shrun might improve your development experience. Manipulating files is definitely not a problem but it may be more tricky if you want to modify files on the host machine from the container. Your point about regexps is spot on, that's what shrun has to do internally so the user doesn't!

I have used many other docker images over the last few years with the tool but haven't used alpine since the rewrite. My implementation doesn't actually care about the specifics of the container, as I'm just communicating with the docker daemon directly via its HTTP API. Even the default shrun image is much lighter than a GB, coming in at ~300MB: not tiny, but not bad. I'd love to hear more about why image size is important to you.

Thanks for the great feedback and data points.


Ouh, right, because the input and output files are both outside of the container. Hmm. Maybe that's an idea for a config option (what to mount where; -v in docker, afair).

Hmmm. For some reason when I ran shrun build it started downloading ubuntu 19, which is usually overkill. :)

Wow amazing feedback. Completely agree about allowing users to take advantage of mounting volumes. I immediately opened an issue and think it's one of the higher priority items. github.com/rylandg/shrun/issues/12

As for ubuntu 19, users are free to bring their own docker image; shrun build is mainly provided as a convenience for those unfamiliar with docker. But I actually use ubuntu 19 for real shrun tests because I want a runtime as close to a real user's as possible. This might not make sense for some shrun users but it does for me. Maybe I'll add a few default image options to shrun for users who don't want to make their own but also want some level of control.

Thanks again for the awesome feedback.

I actually went and wrote a quick patch. shrun 0.0.56 now supports a volume flag. It may not work perfectly, but it should be more than enough to get by. One caveat is that I couldn't use the -v flag as it's already used by Jest, so I opted for --vo instead. Obviously you can also use --volume. I added an example in the shrun-basic-demo repo of usage here:


I'll probably add the option to use an alpine image later today.

Edit: I also just added an alpine image to the repo and reworked the build command so it allows you to choose between ubuntu and alpine.

npx shrun build <command-name> --image alpine | ubuntu

Example is in demo repo json github.com/rylandg/shrun-basic-dem... under script name "build-alpine". Not an alpine expert but tried to port all config I use in ubuntu and didn't have trouble with local runs.

Wow, that was quick :D
Thank you :)

No problem. Tbh I was pretty annoyed with myself for not thinking of volumes for the initial release. Thanks so much for mentioning it, should also make nested docker possible.


Could you explain a little more about the cleanup?: string[] spec stanza? Does it stop/remove the container or just stop processes within the container?

Also, are there examples of use in the pipeline for the repo?


As a user you never actually interact with the container, so you can assume all steps, setup and cleanup run inside of the already initialized container. Resource cleanup is very important so that's a great question regardless. shrun automatically creates a new container before each test runs and cleans up that container after it finishes.

"Also, are there examples of use in the pipeline for the repo?"

Could you explain what you mean by this?


"Could you explain what you mean by this?"

Sure! I was thinking more along the lines of a few examples of basic implementation. I see the basic demo project you've included with the post, maybe something more along those lines, showing how to implement most (if not all) features?

The demo project should address exactly that. I wrote a fake CLI to showcase a full testing setup with shrun. In the README of shrun I also reference the spec file of that other repo. Do you think more is needed? I've wondered it myself and would love some more data points.


Thanks for sharing your tool. In the coming days I have some gitlab pipeline scripts to refactor and would like to give it a spin to do some TDD for CLI executions.

Do you have any suggestions on how to set it up to test functions within a docker container that is used to run a job?


Hey, first off thanks for taking the time to read my article and look into shrun. shrun is a great fit for TDD CLI work on paper, so I'm guessing it can be useful to you.

I don't fully understand your question about "test functions". It doesn't matter what docker image you're using, that's completely up to you. shrun has the optional flag --dockerImage which allows you to use whatever images are available locally. I actually created a repo with a fully contained example of using shrun which is available here github.com/rylandg/shrun-basic-demo. I hope this answer helps, but please let me know if it doesn't and I can help you work through whatever issues you run into.


That's a pretty cool idea, way to go 👏


So glad you like the idea, thanks for the kind words.