loading...
Cover image for Property Testing with JSVerify · Part I

Property Testing with JSVerify · Part I

glebec profile image Gabriel Lebec ・8 min read

Abstract: JSVerify is a JavaScript library for property-based testing, inspired by the classic Haskell library QuickCheck. JSVerify's API documentation appears to assume a certain level of familiarity with "type tetris" and functional idioms like monads. This article demonstrates the basics of the JSVerify library API, including how to create and use arbitrary values, data generators, and data shrinks, in ways that a typical JavaScript developer can understand.

Part I: What is Property Testing?

John Hughes, one of the original authors of QuickCheck, gives a compelling demonstration of the power of property tests in the following video. Watch if you wish, or read on – we will cover the essentials further below.

In short, property tests:

  • Specify expected laws (aka properties) for all intended inputs
  • Generate large numbers of random inputs to check, usually via a convenient DSL or API
  • Verify that the expected properties hold for all generated cases
  • Simplify failing cases progressively, to find a boundary that introduces the failure

Property testing is a form of evidence by example that code might be correct, or is definitely incorrect. It lies between the two extremes of unit tests (manually-specified individual cases) and formal machine-checked logic proofs (impractical in some languages or systems).

Unit Testing

Let's consider a small example. If we wanted to verify the correctness of a custom sort function, we might write a unit test in the form of a Chai assertion executed via the Mocha test framework:

describe('`sort`', () => {
    // unit test:
    it('transforms `[2, 1, 3]` to `[1, 2, 3]`', () => {
        expect(sort([2, 1, 3])).to.deep.equal([1, 2, 3])
    })
})

How confident are you that sort is correct? Probably not very. Sort could easily be broken without failing this test, if it:

  • merely swaps the first two elements
  • always returns [1, 2, 3]
  • fails on two-digit numbers (or higher)
  • fails on a list containing duplicate elements
  • fails on lists with only the first (or last) item out of order
  • fails on an empty list
  • fails on lists of even length

As humans, we tend to fall short of imagining every possible edge and corner case. This is partly because we make assumptions about how code will be used, focusing on the happy path. Writing multiple unit tests for each of the above would help, but is a lot of work for only a moderate increase in confidence that sort is generally correct.

Ad-Hoc Property Testing via Dynamic Assertions

When a programmer specifies that sorting [2, 1, 3] returns [1, 2, 3], they don't necessarily care about that one specific array. Rather, the point is to verify certain properties (characteristics) of sorting, such as:

  • sort results in an ordered array
  • sorting an already-sorted array does nothing (idempotence)
  • sort merely moves elements without addition, change, or removal

Such properties describe what actually matters conceptually, far more than any one specific case. Testing veterans will sometimes have developed ad-hoc techniques for confirming such concepts, for example by dynamically generating inputs and assertions.

const TESTS = 100
const LIMIT_LEN = 50
const LIMIT_NUM = 1000

const randomInt = () =>
    Math.floor(Math.random() * LIMIT_NUM)

const randomArr = () =>
    Array(Math.floor(Math.random() * LIMIT_LEN))
        .fill(null)
        .map(randomInt)

const isOrdered = (arr) => {
    for (let i = 0; i < arr.length - 1; i++) {
        if (arr[i] > arr[i + 1]) return false
    }
    return true
}

describe('`sort`', () => {
    for (let t = 0; t < TESTS; t++) {
        it('outputs ordered arrays`', () => {
            expect(isOrdered(sort(randomArr())).to.equal(true)
        })
    }
})

This is a step in the direction of full-blown property testing, but if the spec fails, all we see is the following opaque message:

ERROR: expected `false` to be `true`

There are some nontrivial downsides to the above attempt:

  • a lot of ad-hoc code for generating random inputs
  • the randomArray function probably produced the empty list more than necessary (once suffices)
  • did you notice that the randomInt function never produces negative numbers? Oops!
  • failure is non-instructive, what input caused it?
  • what did sort actually return for that input?
  • if the random input which failed is large, which part of it actually caused the failure?
  • if the suite is run multiple times, randomness means we may never encounter that failure case again

Some of this can be mitigated, e.g. by writing custom Chai matchers (extra work), logging (polluting test output), or other means. However, some of these issues have already been addressed by property-testing systems.

First-Class Property Testing with JSVerify

Here is the same check for ordering, written using a dedicated generative testing library.

const jsc = require('jsverify')

const isOrdered = (arr) => {
    for (let i = 0; i < arr.length - 1; i++) {
        if (arr[i] > arr[i + 1]) return false
    }
    return true
}

describe('`sort`', () => {
    it('outputs ordered arrays', () => {
        // property test:
        jsc.assert(jsc.forall('[integer]', (arr) => {
            return isOrdered(sort(arr))
        }))
    })
})

Note that we did not need to write our own randomInt and randomArr functions – JSVerify handled this for us. Breaking it down:

  • jsc.forall – the heart and soul of this library, it ties together data generators with a property-checking function; forall returns a "Property" value (meant to be used by other JSVerify functions).
    • '[integer]' – a string declaration of what data we want JSVerify to generate, specifically an array of integers. JSVerify parses a Domain-Specific Language (DSL) for describing input data (more on this later).
    • (arr) => { ... } – a function which will receive the randomly-generated data, responsible for verifying the expected property by returning true if that property holds.
  • jsc.assert – takes the "Property" returned by forall, and executes the logic it represents as an assertion (throwing an error on failure). This lets JSVerify integrate with test frameworks like Mocha.

Running this code gives us a more telling result:

Error: Failed after 7 tests and 3 shrinks. rngState: 86ac1d5e6093784bf2;
Counterexample: [2, 10]

Interesting! What does it all mean?

  • Error – Mocha is reporting that our assertion failed.
  • Failed after 7 tests – JSVerify generated six input cases before it discovered a seventh case that violated our property.
  • and 3 shrinks. – The failing counterexample could be simplified in three steps to reach a minimal failing counterexample (JSVerify actually took more steps than this, but who's counting?).
  • rngState: 86ac1d5e6093784bf2; – JSVerify uses a seed-based pseudo-random generator to produce values. Unlike Math.random(), we can deliberately use this seed to deterministically reproduce the failing test, if desired.
  • Counterexample: [2, 10]. JSVerify indicates that [2, 10] is the smallest input data it could find that failed our property check.

If we (temporarily) instrument our code as follows, we can investigate a bit deeper:

describe('`sort`', () => {
    it('outputs ordered arrays', () => {
        jsc.assert(jsc.forall('[integer]', (arr) => {
            const output = sort(arr)
            const ok = isOrdered(output)
+           console.log(
+               ok ? '✔️' : '❌',
+              'input:', arr,
+              'output:', output
+           )
            return ok
-       }))
+       }), {
+           rngState: '86ac1d5e6093784bf2'
+       })
    })
})

Note the second parameter to jsc.assert is an options object permitting us to manually set the pseudorandom seed, replicating the results we saw previously. (Alternatively, when using Mocha you can force the use of a particular rngState using --jsverifyRngState.) When this modified test is run, we see the following logged:

OK Input Output Notes
✔️ [] [] JSVerify begins with "small" inputs.
✔️ [2] [2] The first six cases are all ok.
✔️ [9] [9]
✔️ [23] [23]
✔️ [-9, 33, 33, 18, 31] [-9, 18, 31, 33, 33]
✔️ [3] [3]
[20, 9, 10] [10, 20, 9] Oops! Our property failed on the seventh case.
[9, 10] [10, 9] JSVerify "shrinks" the array, but it still fails.
✔️ [10] [10] Shrunk again, but now the prop holds. Back up.
✔️ [0, 10] [0, 10] JSVerify shrinks the values in the array. 9 -> 0 is still ok.
[4, 10] [10, 4] Halving 9 -> 4 fails, so shrinking restarts from here.
✔️ [10] [10] JSverify isn't perfect; we see some redundant checks.
✔️ [0, 10] [0, 10]
[2, 10] [10, 2] Halving 4 -> 2 fails again; shrink from here.
✔️ [10] [10] More redundant checks.
✔️ [0, 10] [0, 10]
✔️ [1, 10] [1, 10] This time, 2 -> 1 results in a passing output.
✔️ [-1, 10] [-1, 10] What about negative ints? Still ok.
✔️ [2] [2] From here on, JSVerify does a logarithmic search of integers between -10 and 10.
✔️ [2, 0] [0, 2]
✔️ [2, 5] [2, 5]
✔️ [2, -5] [-5, 2]
✔️ [2, 7] [2, 7]
✔️ [2, -7] [-7, 2]
✔️ [2, 8] [2, 8]
✔️ [2, -8] [-8, 2]
✔️ [2, 9] [2, 9]
✔️ [2, -9] [-9, 2] No failures, so JSVerify concludes [2, 10] is a minimal counterexample.

This experiment demonstrates several clever aspects of JSVerify's shrinking procedure for arrays of integers:

  • It checks for both smaller arrays and smaller values in the array
  • It checks commonly-neglected edge cases like [], 0, and negative integers
  • It uses binary search for good performance

Diagnosing the Problem

Revisiting our original test, the combination of global it + jsc.assert(jsc.forall(...)) is common enough that JSVerify offers a property method for it. (Using this convenience method means you can no longer pass in the options object to assert, so keep that in mind.)

describe('`sort`', () => {
-   it('outputs ordered arrays', () => {
-       jsc.assert(jsc.forall('[integer]', (arr) => {
-           return isOrdered(sort(arr))
-       }))
-   })
+   jsc.property('outputs ordered arrays', '[string]', (arr) => {
+       return isOrdered(sort(arr))
+   })
})

Manually running our test suite (without fixing the PRNG seed) multiple times eventually generates at least two distinct minimal failing input cases for us to consider:

Error: Failed after 4 tests and 13 shrinks. rngState: 04214a57f39806becd;
Counterexample: [2, 10]
...
Error: Failed after 1 tests and 8 shrinks. rngState: 0092ae01573f6d84bc;
Counterexample: [-1, -2]
...

Sorting aficionados may already have figured out what's going on – like Array#sort, our sort operates lexicographically rather than numerically. If the developer didn't intuit this issue, they could manually trace the logic of their sorting function on these inputs, either on paper or using debugging tools. Since JSVerify "shrunk" the counterexamples, debugging them will be simpler than if the original failure was reported.

But Wait, There's More!™ Check Out This One Weird Trick™

There is (apparently undocumented, as of yet) a feature that can make failure reporting more informative. If you return anything other than a boolean from your property check, it is appended as a string to the error message. This lets us include the output (or anything else we care to log) upon failure.

jsc.property('outputs ordered arrays', '[string]', (arr) => {
    const output = sort(arr)
    return isOrdered(output)
+       ? true
+       : 'sort returned ' + JSON.stringify(output)
})
Error: Failed after 2 tests and 14 shrinks. rngState: 86fd6b284e01a5c793;
Counterexample: [2, 10]; Error: sort returned [10,2]

Summary & Try It Yourself

In this first part, we saw the fundamentals of generative testing and getting started with JSVerify. In Part II, we will explore the JSVerify API in greater depth. Before doing so, however, try using JSVerify yourself below:

const jsc = require('jsverify') // https://en.wikipedia.org/wiki/De_Morgan%27s_laws const deMorgansLawsHold = (p, q) => { theorem1 = !(p && q) === (!p || !q) theorem2 = !(p || q) === (!p && !q) return theorem1 && theorem2 } // A correct property const result1 = jsc.check(jsc.forall( jsc.bool, jsc.bool, deMorgansLawsHold ), { quiet: true }) // silences console logs // An incorrect property const result2 = jsc.check(jsc.forall( jsc.nearray(jsc.bool), (nonEmptyArrBools) => { const idxOfFalse = nonEmptyArrBools.indexOf(false) return (idxOfFalse >= 0) ? true : 'indexOf found false at: ' + idxOfFalse } ), { quiet: true }) // Predict! Is this property correct or incorrect? Try it and see… const result3 = jsc.check(jsc.forall( '[json]', 'json', // jsc.array(jsc.json), jsc.json, // equivalent spec (arr, jsVal) => { const newArr = arr.concat(jsVal) return newArr.length === arr.length + 1 } ), { quiet: true }) console.log("de Morgan's laws hold?", result1) console.log("index of false in a nonempty array is at least 0?", result2) console.log("concatenating on a random JS value increases length by 1?", result3)

Discussion

pic
Editor guide