O.F.K.

Posted on Sep 13 • Edited on Sep 17

One... Two... Testing

#elixir #fsharp #testing

Nobody `Expecto` the Spanish Inquisition

On our previous hike, we modelled the domain of FunPark - our miniature theme-park management application.

It's wise to also test our code.

F# has a bunch of testing frameworks: starting from the general-purpose .NET ones, NUnit and xUnit.net, going through FsUnit, and several others of varying levels of adoption or ease of use (yes, while I love it for trivial cases, I'm looking at you, Unquote).

I decided to go with Expecto, one of the F# community's most loved testing frameworks.

Expecto is truly a framework, not a library: it has a built-in test organizer, a test runner, and lastly, it has its own extensive assertion library.

To top things off, its documentation is top-notch (though, sadly, not perfect - there are some parts no longer compatible with current APIs, so caveat emptor, and lean on the great F# community for help!)

Let me give you an example

Expecto is an example-based testing framework.

What that means is that we run assertions on concrete instances: we run our functions-under-test with pre-defined inputs that we expect would yield a deterministic result. We then assert that the function's output matches the expected result.
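As a minimal sketch of what such a test could look like in Expecto, here is a script-style example. The function under test, `applyDiscount`, is a made-up stand-in and not part of the FunPark code; only `testList`, `testCase`, `Expect.equal` and `runTestsWithCLIArgs` come from Expecto.

```fsharp
// Script form; in a project, reference the Expecto NuGet package instead.
#r "nuget: Expecto"

open Expecto

// Hypothetical function under test: takes a percentage off a ticket price.
let applyDiscount (percent: int) (price: int) = price - (price * percent / 100)

let tests =
    testList "applyDiscount" [
        testCase "10% off 200 is 180" <| fun _ ->
            // Pre-defined input, deterministic expected output.
            Expect.equal (applyDiscount 10 200) 180 "10% off 200 should be 180"

        testCase "0% off leaves the price untouched" <| fun _ ->
            Expect.equal (applyDiscount 0 50) 50 "no discount, same price"
    ]

// Runs the tests; the returned exit code is 0 when they all pass.
let exitCode = runTestsWithCLIArgs [] [||] tests
```

Each `testCase` is one concrete example: one input, one expected output, one assertion.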

Yes, that probably sounds like the testing you're used to, that you simply call "testing", and what's with the "example-based" nonsense?

That will be explained later.

FYI

There isn't much to say about Expecto that its documentation and the code examples in the test/server/FunParkTest folder don't cover.

Take a look.

When push comes to shove, example-based testing is mostly a “solved problem” in the domain of software engineering; what changes are the implementation details.

I personally love Expecto, as does most of the F# community, though I listed some alternatives you may want to tinker with to see what tickles your fancy.

Property of <entity under test>

All the frameworks and libraries in the previous section were, as noted, example-based.

Example-based testing is actually very well suited to Functional Programming languages due to those languages' attributes such as immutability-by-default, and pure functions having no side effects.

But what if we could test our function on a deeper level? What if we could identify some core invariant(s) of our function, that are true no matter what input argument(s) we pass it?

This is what property-based testing is about.
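To make the idea concrete, here is a small sketch in plain F#, with no testing library involved: a property is just a function from an arbitrary input to a bool. The names `sortKeepsLength` and `sortIsIdempotent` are illustrative and don't come from the FunPark code.

```fsharp
// Two invariants of List.sort that should hold for every input list:
let sortKeepsLength (xs: int list) = List.length (List.sort xs) = List.length xs
let sortIsIdempotent (xs: int list) = List.sort (List.sort xs) = List.sort xs

// An example-based test pins down one input and one expected output...
let exampleHolds = List.sort [3; 1; 2] = [1; 2; 3]

// ...while a property is expected to hold for any input we can think of.
let samples = [ []; [1]; [3; 1; 2]; [5; 5; 5]; [-1; 0; 1] ]

let propertiesHold =
    samples |> List.forall (fun xs -> sortKeepsLength xs && sortIsIdempotent xs)
```

Here the inputs are still hand-picked; a PBT library's job is to come up with them for us.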

Standing on the shoulders of giants

F# guru, and all-round stand-up guy, Scott Wlaschin has a series on property-based testing (PBT): a true masterpiece, showing Scott's uncanny humor, as well as giving a thorough tour of PBT basics, and then some.

Check it out!

The hedgehog can never be buggered at all

As of this post's publication date, F# has two well-established libraries for property-based testing:

In the left corner, weighing in at v3.x, hailing from the OG PBT library, Haskell's QuickCheck, of which it's a direct port, we have FsCheck - F#'s premier PBT library, well established, battle-tested, community-driven, just solid.
In the right corner, weighing in at v0.13.0, but very actively developed by its maintainers, Hedgehog - a multi-language PBT library (available also for Haskell and... sorry, I have to say a bad word, Scala). Not as well established as the incumbent, but still very welcome by the community, and an all-round excellent PBT library.

For my little reimplementation project, I decided to go with Hedgehog!

The reasoning? Twofold:

I just wanted to play with the "other" tool, the runner-up, full of promise.
Unlike FsCheck, which needs to be told how to shrink custom types, Hedgehog boasts built-in shrinking!

Honey, I shrunk the test

I'm sure the second bullet point made perfect sense, especially to those who only learned of property-based testing just now, so we can move on.

Just kidding.

To understand what shrinking is, let's take a hand-waving detour of how property-based testing works.

How the property sausage is made

At its core, a PBT library generates random values according to the types of the inputs to the function-under-test, "feeds" them into said function, and asserts that a certain invariant, the property, holds for all those values.

(How those values are generated is a topic unto itself, which you can read about here.)
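The generate-feed-assert loop can be sketched by hand in a few lines of plain F#. This is a toy under stated assumptions, not how FsCheck or Hedgehog are implemented; `genIntList` and `check` are made-up names, and the generator only knows about lists of ints.

```fsharp
open System

// Hypothetical generator: a list of 0..maxLen random ints.
let genIntList (rng: Random) (maxLen: int) : int list =
    List.init (rng.Next(0, maxLen + 1)) (fun _ -> rng.Next(-1000, 1001))

// Feeds `runs` random inputs into the property and returns
// the first counter-example found, if any.
let check (seed: int) (runs: int) (prop: int list -> bool) : int list option =
    let rng = Random(seed)
    Seq.init runs (fun _ -> genIntList rng 20)
    |> Seq.tryFind (fun xs -> not (prop xs))

// A property that holds: reversing twice gives back the original list.
let revTwice = check 42 100 (fun xs -> List.rev (List.rev xs) = xs)

// A property that is wrong: it only holds for palindromic lists.
let revOnce = check 42 100 (fun xs -> List.rev xs = xs)
```

`revTwice` comes back as `None`, while `revOnce` will almost certainly carry some random-looking list as its counter-example. The explicit seed is what makes a run repeatable.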

We, however, want to discuss what happens when the library generates a value that disproves the property.

True, the library could just return the offending value and call it a day, but, due to the input values being generated in a pseudo-random fashion, that value might be non-informative.

As a silly, but illustrative example, think of a function that, due to a faulty implementation, expects all its inputs, which are strings, to contain the character 'a'.

Most PBT libraries generate strings by appending a random draw of ASCII, or even UTF-8, characters to each other for a given length. It's not uncommon for a generated string to look like "tIg87%^K🔥FHkjがf".

Now suppose our library found that this string invalidates our property and returned it.

What useful information did we get? Why is that string invalid? Is it because of the emoji? The Japanese character? Too long? Too short? Because it has two consecutive digits (did you catch that?)

There are too many possibilities, and even when inspecting the (faulty) function's implementation, we might miss the issue, because we'd be trying to spot a flaw that matches our assumptions, the ones I just listed, instead of looking for a "must contain an 'a' character" flaw!

Could the library instead show us a failing input so simple that, just by looking at it, we'd see what's wrong and have an idea of how to fix it? For our silly example function, that would be the empty string, "".

It would certainly help!

The act of producing ever simpler examples until finding the most trivial example to invalidate the invariant is called "shrinking".
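For the silly "must contain an 'a'" function, a hand-rolled shrinker could look like the sketch below. It's a naive greedy strategy written for illustration, not the algorithm any particular library uses; `accepts`, `candidates` and `shrink` are made-up names, and the starting string is an ASCII-only variant of the one above.

```fsharp
// The faulty function from the text: it only accepts strings containing 'a'.
let accepts (s: string) = Seq.contains 'a' s

// All strings obtained by deleting exactly one character.
let candidates (s: string) =
    [ for i in 0 .. s.Length - 1 -> s.Remove(i, 1) ]

// Greedy shrink: keep moving to a smaller string that still fails,
// and stop when no smaller failing string exists.
let rec shrink (fails: string -> bool) (s: string) : string =
    match candidates s |> List.tryFind fails with
    | Some smaller -> shrink fails smaller
    | None -> s

// The property "every string is accepted" fails for any string without an 'a'.
let minimal = shrink (fun s -> not (accepts s)) "tIg87%^KFHkjf"
// minimal = ""
```

Every step drops one character while the property keeps failing, so the noisy original ends up as the empty string.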

I like small values and I can not lie

For simple types, e.g., primitive types, most PBT libraries know how to shrink towards a trivial example on their own: strings get iteratively shrunk into the empty string, numbers are shrunk from large ones to zero, lists are shrunk to the empty list, and so forth.

But... what happens when the library is faced with a complex type? How does shrinking happen then?

Without knowledge of how to shrink, some PBT libraries would be forced to return the first offending value they find, not the most trivial one.

The following example is taken from Hedgehog's documentation

(I feel it's not explained well, and the code snippets, for both FsCheck and Hedgehog are incompatible with their respective current APIs.)

Assume we have a type, type MyVersion = MyVersion of maj: int * min: int * patch: int, denoting a SemVer-compatible version number. (Not using Version as the type's name since there is a System.Version already.)

We now have this, again silly, notion that all SemVer-compatible version numbers are reversible, i.e., equal to their own reverse (not only palindromes such as "1.2.1", but also, e.g., "240.35.240"), so we define the function let revVer (MyVersion (x, y, z)) = MyVersion (z, y, x). (Yes, I called it revVer on purpose, see if you get why?)

Now, let's look at what FsCheck returns for this obviously wrong property:

// Assuming the FsCheck library is in context, of course
open FsCheck
open FsCheck.FSharp // New in v3.x of FsCheck

type MyVersion = MyVersion of maj: int * min: int * patch: int

// Swaps the major and patch components
let revVer (MyVersion (ma, mi, pa)) = MyVersion (pa, mi, ma)

// Arb.Generator<'T> is not compatible with FsCheck v3.x, so we get creative.
// The byte range 0..255 is there purely to keep the values at a reasonable size!
let version =
    Gen.choose (0, 255)
    |> Gen.three
    |> Gen.map (fun (ma, mi, pa) -> MyVersion (ma, mi, pa))
    |> Arb.fromGen

// The (obviously wrong) property: every version equals its own reverse
let reversible = Prop.forAll version (fun ver -> ver = revVer ver)

// Run the property and print the result
Check.Quick reversible

Here's the output I got:

Falsifiable, after 1 test (0 shrinks) (7912673337469139209,2146181641386198071)
Last step was invoked with size of 2 and seed of (16035987728726137193,1406933410677738447):
Original:
MyVersion (63, 87, 87)
with exception:
System.Exception: Expected true, got false.

There are two takeaways from this very informative error message:

The counter-example: MyVersion (63, 87, 87).
No shrinking was done, because FsCheck doesn't know how to shrink the custom type MyVersion: "Falsifiable, after 1 test (0 shrinks)".

Now, before we discuss this any further, let's see how Hedgehog deals with the same property:

// Assuming Hedgehog is in context
open Hedgehog

type MyVersion = MyVersion of maj: int * min: int * patch: int

let revVer (MyVersion (ma, mi, pa)) = MyVersion (pa, mi, ma)

let version =
    Range.constantBounded ()
    |> Gen.byte
    |> Gen.map int
    |> Gen.tuple3
    |> Gen.map (fun (ma, mi, pa) -> MyVersion (ma, mi, pa))

property {
    let! vers = Gen.list (Range.linear 0 100) version
    return (List.forall (fun ver -> ver = revVer ver) vers)
}
|> Property.checkBool
|> printfn "%A"

And the error:

Error: System.Exception: *** Failed! Falsifiable (after 1 test and 10 shrinks):
[MyVersion (0, 0, 1)]
This failure can be reproduced by running:
> Property.recheck "0_3673599252239236771_13184715031196647977_1010110110110110110110110110" <property>
at Hedgehog.ReportModule.tryRaise(Report report)
at Hedgehog.Property.checkBool(Property`1 g)
at <StartupCode$FSI_0010>.$FSI_0010.main@()
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

We can already see the difference:

The counter-example is as minimal as can be: MyVersion (0, 0, 1).
Shrinking occurred! Hedgehog knows how to shrink even custom types, provided the types comprising the custom types are either primitive, or otherwise well-defined in our code: "Failed! Falsifiable (after 1 test and 10 shrinks)".

"You're not integral, to the project"

Indeed, for a trivial example such as our MyVersion type, the difference between MyVersion (143, 2, 5) and MyVersion (0, 0, 1) is, well, trivial. But it's not hard to think of properties where the minimal counter-example would be tenfold easier to parse than a non-trivial one (e.g., a record type with 10 fields, each field itself a custom type.)

The thing is, when dealing with custom types, FsCheck, unless provided a shrinking strategy, for which it has a specific API, treats the custom type opaquely and atomically, i.e., as a single unit.

When a custom type fails a property, FsCheck returns the offending non-trivial example, because it's the best it can do.

Hedgehog, on the other hand, and I'd be damned if I know how, that's why I'm only writing about it, not maintaining it, does know how to pierce into a custom type and shrink its constituents - assuming they're primitive types, or otherwise well-defined in the code.

This property, pun not intended but so very fitting, that some other PBT libraries in other programming languages also have, is called integral shrinking (you'll more commonly see it called integrated shrinking), because, unlike FsCheck, which requires external shrinking strategies, frameworks with integral shrinking know about shrinking on their own.
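To get a feel for what "piercing into a custom type and shrinking its constituents" can mean, here is a hand-rolled sketch in plain F#. It is not Hedgehog's actual algorithm; it's closer to the kind of shrinking strategy you'd have to write yourself for a library without integral shrinking. `shrinkInt`, `shrinkVersion` and `shrink` are made-up names.

```fsharp
type MyVersion = MyVersion of maj: int * min: int * patch: int

let revVer (MyVersion (ma, mi, pa)) = MyVersion (pa, mi, ma)

// Smaller candidates for a non-negative int: zero, half, and one less.
let shrinkInt (n: int) : int list =
    [ 0; n / 2; n - 1 ]
    |> List.filter (fun c -> c >= 0 && c < n)
    |> List.distinct

// Shrink a version one component at a time.
let shrinkVersion (MyVersion (ma, mi, pa)) : MyVersion list =
    List.concat [
        shrinkInt ma |> List.map (fun c -> MyVersion (c, mi, pa))
        shrinkInt mi |> List.map (fun c -> MyVersion (ma, c, pa))
        shrinkInt pa |> List.map (fun c -> MyVersion (ma, mi, c))
    ]

// Greedy shrink: move to the first smaller version that still fails.
let rec shrink (fails: MyVersion -> bool) (v: MyVersion) : MyVersion =
    match shrinkVersion v |> List.tryFind fails with
    | Some smaller -> shrink fails smaller
    | None -> v

// Start from the counter-example FsCheck reported above.
let minimal = shrink (fun v -> v <> revVer v) (MyVersion (63, 87, 87))
// minimal = MyVersion (0, 0, 1)
```

Starting from FsCheck's MyVersion (63, 87, 87), this lands on the same MyVersion (0, 0, 1) that Hedgehog reported, because the shrinker gets to work on each int inside the type rather than on the type as one opaque unit.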

With great shrinking comes great responsibility

Both FsCheck and Hedgehog boast great generator APIs, the APIs for creating (pseudo-)random inputs.

Actually, they're so great that for the most part you can use those APIs to generate inputs even for custom types without fail or problem.

But sometimes we need more: a generator so customized, wrapping over a type so complex, that we have to create it ourselves.

Luckily, both libraries allow us that too. With ease!

Check out tests/server/FunParkHedgehog/HedgehogGenerators.fs.

To grok the file, you need to know that Bogus is a .NET port of faker.js, in the same spirit as the great Ruby Faker gem - a library for creating plausible-sounding values in a variety of domains.

I wanted to use it to create human-sounding names for my Patrons and Rides.

Could I have used Bogus directly in the constructors for these types? Yes, of course.

But that would lose the (integral) shrinking the PBT libraries offer - Bogus dishes out values from an (extensive) internal database. It has no notion of what shrinking is, and is not designed for that purpose.

Great idea in general, not good enough for my tests. I need more!

So, let's create a custom generator wrapping Bogus - a generator that calls a Bogus module, extracting a value from it, while retaining Hedgehog's ability to shrink the values if need be.

First, we define a way to consistently get the same wrapper.

As you may have noted before, in the example code earlier, both libraries, upon failing, note the seed they used to create the random inputs.

Knowing the seed is essential for making sure we can re-run the tests with the same randomized inputs to verify the failing tests now pass.

The implementation is rather straightforward, using Bogus's API for creating a seeded Faker object.

The real "magic" happens in genBogus (and its size-dependent cousin genBogusSized, though I've never used a size-dependent generator in any of my tests, and I'm not sure why they even exist).

In truth, genBogus is not magical at all: it uses Gen module's API to map from the given seed generator, a constraint imposed by Gen.map input expectations, to a generator wrapper around a specific Bogus API.

That really is all there is to it!

Tinker, tailored custom generators, soldier, spy

To see how it's used, look at tests/server/FunParkHedgehog/RideTest.fs, specifically at the name generator.

We define fakerName using our genBogus which returns a generator wrapping over Bogus.Faker.Name module, so, basically: Gen<Bogus.Faker.Name>.

If you hover over fakerName though, you'll see its type is Bogus.Faker.Name, unwrapped.

That is because we defined it using let! (i.e., "let-bang"), which binds the unwrapped value.

(This non-magical "magic" is due to how the gen builder, a computation expression, another unique F# feature, is implemented.

I advise reading Scott Wlaschin's (who else?) series on computation expressions to understand these complex, but very rewarding features.)
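To show where the unwrapping comes from, here is a toy computation expression in plain F#. `ToyGen`, `runGen`, `intBetween` and `ToyGenBuilder` are all made up for this sketch; Hedgehog's real `Gen` type and `gen` builder are more involved (they also carry shrinking information), but `let!` desugars to a `Bind` call in the same way.

```fsharp
open System

// A toy generator: a function from a random source to a value.
type ToyGen<'T> = ToyGen of (Random -> 'T)

let runGen (rng: Random) (ToyGen f) = f rng

let intBetween (lo: int) (hi: int) : ToyGen<int> =
    ToyGen (fun (rng: Random) -> rng.Next(lo, hi + 1))

type ToyGenBuilder() =
    // `return x` wraps a plain value into a generator.
    member _.Return(x: 'T) : ToyGen<'T> = ToyGen (fun _ -> x)

    // `let! x = m` calls Bind: it runs `m`, hands the unwrapped value
    // to the rest of the block (`f`), and runs whatever generator comes back.
    member _.Bind(m: ToyGen<'T>, f: 'T -> ToyGen<'U>) : ToyGen<'U> =
        ToyGen (fun rng -> runGen rng (f (runGen rng m)))

let toyGen = ToyGenBuilder()

let dicePair : ToyGen<int * int> =
    toyGen {
        let! first = intBetween 1 6    // first : int, not ToyGen<int>
        let! second = intBetween 1 6   // second : int
        return (first, second)
    }

let (a, b) = runGen (Random(7)) dicePair
```

Inside the block, `first` and `second` are plain ints, even though `intBetween` returns a `ToyGen<int>`; the block as a whole is a generator again.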

One more touch I added to my custom generator was custom shrinking!

Strictly speaking, I didn't have to: as we now know, Hedgehog has integral shrinking. Left alone, though, it would resort to plain string shrinking, since a name is a string in the end, and I wanted the shrunk values to still come from the Faker.Name module. So, actually, I did have to define a custom shrinker, but due to a business logic constraint, not a shortcoming of the library!

The last nicety I added, strictly for flair and showing-off, I admit, was tagSelector.

Using F#'s reflection capabilities, nothing to do with Hedgehog, it simply makes sure no tag is repeated in the tag list the Ride generator creates.
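The general idea can be sketched as follows. This is not the actual tagSelector from the repository: the `Tag` type, `allTags` and `pickDistinctTags` are hypothetical, and the sketch uses System.Random where the real code works with Hedgehog generators. Only `FSharpType.GetUnionCases` and `FSharpValue.MakeUnion` are the real reflection APIs.

```fsharp
open System
open Microsoft.FSharp.Reflection

// Hypothetical tag type; the real one lives in the FunPark domain model.
type Tag =
    | Thrilling
    | FamilyFriendly
    | Water
    | Indoor

// Every case of the union, discovered via reflection
// (this works because none of the cases carries any fields).
let allTags : Tag list =
    FSharpType.GetUnionCases(typeof<Tag>)
    |> Array.map (fun caseInfo -> FSharpValue.MakeUnion(caseInfo, [||]) :?> Tag)
    |> Array.toList

// Shuffle the cases and take the first `count`: since every case
// appears exactly once in allTags, no tag can be repeated.
let pickDistinctTags (rng: Random) (count: int) : Tag list =
    allTags
    |> List.map (fun tag -> (rng.Next(), tag))
    |> List.sortBy fst
    |> List.map snd
    |> List.truncate count

let picked = pickDistinctTags (Random(1)) 3
```

The nicety of going through reflection is that adding a new case to the union needs no change to the selector.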

The worth of a testing library

One question to contemplate is "is this integral shrinking worth it?"

That's not a trivial question! While FsCheck is lacking in that department (at least currently; who knows what its maintainers have in store), it does have a slew of other helper functions that make it an extraordinary testing framework: integral logging, custom distribution of generated inputs, and much more.

It may well be that Hedgehog has all those features too, but its documentation is very lacking, and if it does have those features... I don't know of them, or how to invoke them.

(Note that FsCheck's documentation is also lacking, and the official docs on the website are actually still those of v2.x, while the library is now at v3.x!)

The answer to that question is left to you, dear reader, and your needs, and abilities, per project.

I wanted to try Hedgehog's internal shrinking, and this is, after all, a toy project I do for fun... no one's going to fault me for writing wrong tests, not having 100% coverage, and so forth.

In real-life, production, projects you need to weigh all constraints you'd be facing and make an informed decision.

Hopefully this post helps with that.

And now, the time has come

This has been an even longer, bumpier, ride, ha-ha, get it, Ride, tough crowd, than the previous one, because the subject of testing, and property-based testing in particular is vast, to say the least.

I hope you did get at least some idea of how we test in F#, of what property-based tests are, why they're useful, and a glimpse of how to actually use them.

As with the previous post, I put useful links throughout the text, to folks much more proficient than myself in F# (and the skill of writing succinctly, yet precisely. I know my own faults, yes.)
You'd be wise to follow up on these links, especially the ones to any of Scott Wlaschin's posts!

Adieu

The next two chapters in the original material are about custom equality and comparison.

Elixir doesn't have any built-in facilities to deal with either.

F# on the other hand does. For both, custom equality, and custom comparison.

But it's going to take me a while to read the chapters in Elixir, code them in F#, and write voluminous posts about it. 😁

So, for the time being, this is goodbye.

Hope you enjoyed these two posts, that you learned something, that something clicked for you.

And hey, maybe let's call it "till next time".

It's going to take a while, but I promise you, like Arnie... I'll be back!
