
The Orchid, the Wasp, and the Test Fixture

Dian Fay. Originally published at di.nmfay.com

I write a lot of integration tests that operate on data. The usual format for this is a setup function which gets the database into a particular state, a test or tests which validate the appropriate application functionality, and then a teardown function which cleans everything up so the next test suite can do its thing. There are different names and some little complexities (Mocha and AVA offer a before and a beforeEach, for example), but generally speaking this is How It's Done in every language/framework I've written tests in. This seems less a product of conscious architecture than it does a natural evolution of testing processes; nobody's* really nailed down a formal model for test data management yet.

The end result is that these setup functions, or fixtures, tend to be developed ad hoc and inconsistently. It's not difficult to wind up with two test suites taking completely different approaches to generate what is, practically speaking, the same data. It gets worse when something changes and a bunch of your tests fall out of date with you none the wiser until a bug report lands in your lap. I've written a lot of fixtures like that, and I want to stop.

The only solution to inconsistency is centralization: there needs to be a single source of data. If there's one place to go for fixture data, that goes a long way toward ensuring tests stay current. However, just bringing all the fixtures under one roof isn't enough. If some tests exercise carryout orders and others exercise delivery orders, the database state could be 75% identical -- but one has a phone number and a pickup time attached, the other an address and a driver. One fixture alone won't do the job, and breaking it up is backsliding towards the original problem. Centralization is only part of the solution; fixtures have to be flexible as well.

Meanwhile, in Southwestern Australia

The hammer orchid has a very specific mechanism of reproduction. Each of the species in the Drakaea genus mimics the scent (not to mention color and shape) of the female of a symbiotic species of wasp. The scent attracts male wasps, which attempt to mate with the flower only to become covered in the orchid's pollen. Eventually they give up and fly off. Enough of them proceed to fall for the same trick again, rubbing the pollen off onto a new flower, to ensure the survival of the orchids; and, presumably, enough of them find actual mates to ensure the survival of their own species.

Of course, to say the orchid tricks the wasp is a blatant anthropomorphization. The orchid may be a marvel of evolutionary architecture, but it can't think and it can't plan. It is simply following a program which requires that it become, in a certain sense -- quite literally, smell -- a wasp. An orchid which fails to be a wasp does not reproduce. The wasp, too, is an orchid when it deposits pollen on the waiting stigma of another flower.

The poststructuralists Gilles Deleuze and Félix Guattari used the orchid and the wasp to exemplify what they called a rhizome. The rhizome is an organizational model, a way of thinking about structure and process and the structure of process, which counterpoints the more familiar hierarchical or arborescent model. A corporation is a hierarchy of power which flows top to bottom; meanwhile, a labor union may have officials and bureaucracy, but these local hierarchies don't define the entire organization. Power in a union flows in many directions. There's a lot to like about the rhizomatic model, but one of its principal attributes is just what we're looking for: flexibility.

Deleuze and Guattari identify six characteristics of a rhizome in A Thousand Plateaus. The first two are closely related and considered together, as are the last two.

Connection and Heterogeneity

A rhizome is a crowd or cluster of different (heterogeneous) things which can be and are connected non-hierarchically. This describes a lot of technological stuff, especially distributed systems! If you're thinking of serverless applications, Cassandra, or Kubernetes clusters: that's where we're going with this.

Our data consists, at an atomic level, of records in different tables. If we consider an "initializer" function which generates one of these records as an element of a rhizome, we can compose multiple initializers to generate any data state we need to test.

An initializer looks something like this:

async (db, data) => {
  return db.drivers.insert({name: 'Taylor', license: 'abc123'});
};

Other initializers may cover the franchises table, the destinations table, and the orders table. Each is as simple as possible, generating records of one and only one type. An initializer which creates records of multiple types is a throwback to the complex fixtures we're trying to avoid.
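To make the one-record-type-per-initializer rule concrete, here is a minimal sketch of four initializers run by hand. It assumes a hypothetical in-memory stand-in for `db` that only imitates the `db.<table>.insert()` call shape used above (it is not a real data mapper), and the column names (`franchise_id`, `driver_id`, `destination_id`, `address`) are illustrative assumptions. Each initializer reads whatever foreign keys it needs out of `data`, the records created so far.

```javascript
// Hypothetical in-memory stand-in for the database handle. It only imitates
// the `db.<table>.insert(record)` call shape; it is not a real data mapper.
function createFakeDb(tables) {
  const db = {};
  for (const table of tables) {
    let nextId = 1;
    const rows = [];
    db[table] = {
      rows,
      async insert(record) {
        const row = { id: nextId++, ...record }; // assign a primary key
        rows.push(row);
        return row;
      },
    };
  }
  return db;
}

// Each initializer creates records of exactly one type. Anything it needs
// from earlier initializers (foreign keys) comes from `data`.
const initializers = {
  franchise: async (db, data) => db.franchises.insert({ name: 'Downtown' }),
  driver: async (db, data) =>
    db.drivers.insert({ name: 'Taylor', license: 'abc123', franchise_id: data.franchise.id }),
  destination: async (db, data) => db.destinations.insert({ address: '123 Main St' }),
  'delivery-order': async (db, data) =>
    db.orders.insert({ driver_id: data.driver.id, destination_id: data.destination.id }),
};

// Run them by hand, threading the accumulated records through `data`.
async function main() {
  const db = createFakeDb(['franchises', 'drivers', 'destinations', 'orders']);
  const data = {};
  for (const name of ['franchise', 'driver', 'destination', 'delivery-order']) {
    data[name] = await initializers[name](db, data);
  }
  return { db, data };
}
```

Composing a different data state is then a matter of running a different list of names.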

There are always some tests that need to do something specific with the data. What happens when a driver doesn't have a license? If Taylor always has one, we can't exercise that code. We have a few options here:

  • Update Taylor's record to remove her license at the beginning of the "drivers without licenses get ticketed" test
  • Create a second driver-without-license initializer which generates a record for Taylor's hapless compatriot Tyler, sans license
  • Generate records for both Taylor, with a license, and Tyler, without, in the single driver initializer

There's no cut-and-dried answer here; the best solution depends on the situation. Here, if there's only one test that depends on having a driver without a license, I'd go with the first option. If there are several, it might be time to consider the others.

Multiplicities

A rhizome must be thought of in terms of the discrete elements which make it up, and how those elements interact with the elements of other systems. The reproduction of the hammer orchid consists of flowers and wasps, and both flower and wasp interact with things outside. Deleuze and Guattari offer a more direct example: a puppet's strings, considered as a multiplicity, are connected not to the will of the puppeteer but to another multiplicity of nerves. The puppeteer's nervous system becomes a puppet in the same way that the hammer orchid becomes a wasp.

Thinking in multiplicities inverts the question of how fixture data is set up. It's no longer about the state for this or that test, but about the ability to describe and therefore build any data state. Each test suite selects the initializer functions it requires and builds a rhizome from them. The order of invocation does matter for local hierarchies; for example, we can't create a delivery order without a driver.
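One way to keep local hierarchies honest is to have each initializer declare what it depends on and check a requested ordering before running anything. This is a hypothetical sketch: the `{ deps }` registry shape and the `checkOrder` helper are assumptions for illustration, not something the original setup is stated to do.

```javascript
// Hypothetical registry: each initializer lists the initializers it depends on.
const registry = {
  franchise: { deps: [] },
  driver: { deps: ['franchise'] },
  destination: { deps: [] },
  'delivery-order': { deps: ['driver', 'destination'] },
};

// Throws if any requested initializer comes before one of its dependencies
// (or if a dependency was never requested at all); otherwise returns the list.
function checkOrder(names) {
  const seen = new Set();
  for (const name of names) {
    const entry = registry[name];
    if (!entry) throw new Error(`Unknown initializer: ${name}`);
    for (const dep of entry.deps) {
      if (!seen.has(dep)) {
        throw new Error(`'${name}' requires '${dep}' to be initialized first`);
      }
    }
    seen.add(name);
  }
  return names;
}
```

Independent hierarchies can interleave freely; `['destination', 'franchise', 'driver', 'delivery-order']` passes, while putting `'delivery-order'` ahead of `'driver'` fails immediately instead of surfacing later as an undefined foreign key.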

I have a ContextFactory to which I can pass the names of initializer functions. This factory returns a new function which, when executed, runs the initializers in sequence and collects the records each generates, passing the current state or context into each succeeding initializer so elements in local hierarchies can create their relationships correctly. Each test suite's before function creates a new ContextFactory in the global scope:

contextFactory = await ContextFactory(
  'franchise',
  'driver',
  'destination',
  'delivery-order'
);

This example contains two local hierarchies: franchise-driver-order and destination-order. The only constraint on ordering is that nothing can appear before its dependencies; for example, we could create the destination before anything else, but delivery-order has to be created last.
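The ContextFactory described above could be sketched roughly as follows. This is not the author's implementation: the initializer registry, the hypothetical `connect()` helper (one plausible reason the factory itself would be awaited), and the in-memory tables are all assumptions. Because the initializers are async, the function the factory returns is async as well.

```javascript
// Hypothetical stand-in for connecting to a database; returns in-memory
// tables that only imitate the `db.<table>.insert(record)` call shape.
async function connect() {
  const db = {};
  for (const table of ['franchises', 'drivers', 'destinations', 'orders']) {
    let nextId = 1;
    db[table] = { async insert(record) { return { id: nextId++, ...record }; } };
  }
  return db;
}

// Hypothetical registry of initializers, keyed by the names passed to the factory.
const registry = {
  franchise: async (db) => db.franchises.insert({ name: 'Downtown' }),
  driver: async (db, data) =>
    db.drivers.insert({ name: 'Taylor', license: 'abc123', franchise_id: data.franchise.id }),
  destination: async (db) => db.destinations.insert({ address: '123 Main St' }),
  'delivery-order': async (db, data) =>
    db.orders.insert({ driver_id: data.driver.id, destination_id: data.destination.id }),
};

async function ContextFactory(...names) {
  const db = await connect();
  // Resolve names up front so a typo fails when the suite starts, not mid-test.
  const steps = names.map((name) => {
    const init = registry[name];
    if (!init) throw new Error(`Unknown initializer: ${name}`);
    return [name, init];
  });

  // Each call builds a fresh context, running initializers in order and
  // passing everything created so far into the next one.
  return async function buildContext() {
    const ctx = {};
    for (const [name, init] of steps) {
      ctx[name] = await init(db, ctx);
    }
    return ctx;
  };
}
```

In a Mocha or AVA suite, the `before` hook would hold the `await ContextFactory(...)` call, and each test would `await contextFactory()` to get its own records.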

Asignifying Rupture

Have I mentioned that poststructuralism takes a lot of heat for impenetrable jargon? In fairness, it's difficult to establish a vocabulary for talking about things as abstract as the ones it deals with, but its reputation is still deserved to a certain extent. Think of asignifying rupture as a "self-healing" capability: if one of the components of the rhizome breaks down, the rhizome carries on. If a single wasp doesn't make it to a second flower, it makes little difference; there are other wasps and other flowers. Political rhizomes especially have a way of recurring even under harsh repression, as does quackgrass.

This is a useful property for distributed architectures and concurrent processing: if a Spark job loses partial results because an executor went offline, Spark can rerun the lost tasks on other executors and recompute the missing data. But for our purposes, a breakdown means inconsistency, so this is a point of departure for us -- we're better off raising an exception and aborting.
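In practice, "raise an exception and abort" means the context builder stops at the first failure instead of handing a test a partially built data state. A minimal sketch, assuming a hypothetical `runInitializers(db, steps)` helper that takes `[name, initializer]` pairs; it treats a missing record as a failure too and rethrows with the initializer's name attached.

```javascript
// Runs initializers in order; any failure aborts the whole context instead of
// returning a partially built (and therefore inconsistent) data state.
async function runInitializers(db, steps) {
  const ctx = {};
  for (const [name, init] of steps) {
    let record;
    try {
      record = await init(db, ctx);
    } catch (err) {
      // Keep the original error as the cause, but say which element broke.
      throw new Error(`Initializer '${name}' failed: ${err.message}`, { cause: err });
    }
    if (record === undefined || record === null) {
      throw new Error(`Initializer '${name}' did not return a record`);
    }
    ctx[name] = record;
  }
  return ctx;
}
```

Nothing here tries to route around the broken element; the test suite fails loudly, which is the point.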

Cartography and Decalcomania

A rhizome is "a map and not a tracing". Where the latter creates an immutable still-life representation, a map is open to interpretation, interrogation, and most importantly, modification. Maps change all the time, because what they represent is permanently in flux. Territories declare independence, are recognized or not, are annexed; borders shift, connections are made and broken, cultures and languages ebb and flow. Maps do more than merely show this information: they transfer it ("decalcomania" is a process of reproducing images, the origin of the more common and subtly different word "decal"). A border defines the understood limits of a territory; a route on an atlas becomes a route in the mind of a driver.

When the function returned by the ContextFactory is invoked, it returns an object mapping each initializer's name to the data it created.

ctx = await contextFactory();

assert.equal(ctx.driver.name, 'Taylor');

A monolithic fixture is a tracing: it freezes a snapshot of the data model as it appeared at one point in time. The initializers, by contrast, map out our application's data model bit by bit, each piece adding more definition. If the information which makes up a driver changes -- adding a last name or whether they're on shift -- that gets added to the initializer. Every test is automatically up to date. If one breaks, that's a good thing! It means the code being exercised can't handle the new information correctly, and needs to be fixed before we can ship.

End

The rhizomatic model makes test fixtures endlessly flexible. Where monolithic fixtures multiply complexity and fall out of date with little warning, a unified, composable set of discrete fixtures keeps data generation centralized and ensures that tests that exercise related functionality use a consistent and current data set.

* The Doctrine O/RM for PHP provides a framework for loading and executing discrete centralized test fixtures, making it the only example I've seen in the wild of what I'm about to cover, if you're the kind of person who skips down to read footnotes before continuing. Anyway, score one for PHP!

Posted on Mar 7 '18 by Dian Fay (@dmfay)

It's pronounced Diane. At any given point I'm pick-at-least-two from data architect, developer, and ops...ish. In my spare time I maintain Massive.js, a data mapper for Node.js and PostgreSQL.
