Alex Fedorov

Posted on • Originally published at hackernoon.com

How Can Hierarchical Test Structure Absolutely Make a Mess?

Have you ever written your unit tests using a simple xUnit style testing framework?

Then you probably know that as tests get more complex, they collect more boilerplate and duplication, spread among either the test methods or the setup functions.

Now, hierarchical context frameworks are pretty good at mitigating this boilerplate problem and removing this duplication. They allow you to have nested contexts, each one having its own little bit of setup and “inheriting” the setup of its parent contexts.

This way, you can express lots of different scenarios without actually repeating yourself even once in the test setup code.
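To make the “inheriting” part concrete, here is a minimal hand-rolled sketch of how nested contexts could collect setup from their parents. It is not the API or implementation of any real framework: `Context`, `describe`, and `runDemo` are hypothetical names made up for this illustration, and tests run eagerly as soon as they are declared, which real frameworks usually don’t do.

```kotlin
// Hypothetical, hand-rolled sketch: not the API of any real test framework.
class Context(private val name: String, private val parent: Context? = null) {
    private val setups = mutableListOf<() -> Unit>()

    // Register a setup step that belongs to this context.
    fun beforeEachTest(setup: () -> Unit) {
        setups.add(setup)
    }

    // Open a nested context; it keeps a link to its parent to reuse its setup.
    fun context(name: String, body: Context.() -> Unit) {
        Context("${this.name} > $name", this).body()
    }

    // Run a test eagerly: inherited setup first, then the test body.
    fun it(description: String, test: () -> Unit) {
        runSetups()
        println("$name > $description")
        test()
    }

    // Outermost setup runs first, so inner contexts can override outer defaults.
    private fun runSetups() {
        parent?.runSetups()
        setups.forEach { setup -> setup() }
    }
}

fun describe(name: String, body: Context.() -> Unit) = Context(name).body()

// Records which status each test observed after all inherited setup has run.
fun runDemo(): List<String> {
    val log = mutableListOf<String>()
    var status = ""

    describe("Fetcher") {
        beforeEachTest { status = "accepted" } // default for the whole suite

        it("sees the default") { log.add("default: $status") }

        context("when the call completes immediately") {
            beforeEachTest { status = "successful" } // override for this context

            it("sees the override") { log.add("nested: $status") }
        }

        it("sees the default again") { log.add("after: $status") }
    }
    return log
}
```

Calling `runDemo()` shows that a test only states what is special about its own context; everything else comes from the contexts above it.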

Great, isn’t it?

Now, what if I told you that a hierarchical test structure can cause a more subtle kind of duplication (the very thing it was supposed to prevent in the first place), one that is hard to spot and refactor?

Let me give you an oversimplified example:

Two Fetchers, elusively alike

The other day, I was working on two “fetcher” classes. Each one talks to an HTTP client for a 3rd-party API, contains the business logic of what needs to be done, and handles both immediate “successful” and asynchronous “accepted” responses.

Their code was pretty similar, and the whole logic of handling the response and re-scheduling the async task was duplicated, so we extracted it into a separate collaborator object in the production code.
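A rough sketch of what such a collaborator could look like. Every name here (`ResponseHandler`, `handle`, `schedule`, and the stand-in `AsyncQueue`) is an assumption made for illustration, not the real production code, and the status is passed in directly instead of coming from the HTTP client to keep the example short.

```kotlin
// Hypothetical stand-in for the real queue: it only remembers scheduled jobs.
class AsyncQueue {
    val jobs = mutableListOf<() -> Unit>()

    fun schedule(job: () -> Unit) {
        jobs.add(job)
    }
}

// Hypothetical collaborator holding the logic both fetchers used to duplicate.
class ResponseHandler(private val asyncQueue: AsyncQueue) {
    // "accepted" means the work continues remotely, so a follow-up job is queued.
    // Any other status (for example "successful") needs no follow-up.
    fun handle(status: String, followUp: () -> Unit) {
        if (status == "accepted") asyncQueue.schedule(followUp)
    }
}

class FetcherOne(private val handler: ResponseHandler) {
    fun perform(status: String) {
        handler.handle(status, this::pollStatus)
    }

    fun pollStatus() { /* would ask the API whether the work has finished */ }
}

class FetcherTwo(private val handler: ResponseHandler) {
    fun perform(status: String) {
        handler.handle(status, this::checkLastOperation)
    }

    fun checkLastOperation() { /* would look up the last operation's result */ }
}
```

Each fetcher keeps its own follow-up action, while the decision of whether to queue it lives in one place.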

Then we thought: “Well, we refactored the duplication in the production code. There is bound to be duplication in the test code. Let’s get it DRYed as well!”

Not so fast!

When we started reading the two test suites side-by-side, it turned out that they didn’t look much alike. It was hard to spot and isolate the duplication.

They were written in a hierarchical style, and they were leveraging the full power of nested contexts.

Here is a reduced example for the first test (in Kotlin):

describe("FetcherOne") {
    lateinit var asyncQueue: AsyncQueue
    lateinit var fetcher: FetcherOne

    val client = Client()
        .withMockResponseStatus("accepted")

    // … more irrelevant variables here …

    beforeEachTest {
        asyncQueue = AsyncQueue()
        fetcher = FetcherOne(asyncQueue, client)
    }

    describe("perform") {
        beforeEachTest {
            fetcher.perform()
        }

        it("tells client to do stuff") {
            assertThat(client.actions).contains("doStuff")
        }

        it("triggers an async polling job") {
            assertThat(asyncQueue).contains(fetcher::pollStatus)
        }

        // … more irrelevant tests here …

        context("when there is no need to check for status") {
            beforeGroup {
                client.mockResponseStatus("successful")
            }

            it("does not trigger an async polling job") {
                assertThat(asyncQueue).isEmpty()
            }

            // … more irrelevant tests here …
        }

        // … more irrelevant contexts here …
    }
}

So the test suite starts with the root context. Here is where you can set the global defaults for your test suite, and override some of them in the nested contexts.

This is good if you have a typical “happy path” scenario, and then you have more “special cases,” for which you’ll use nested contexts.

Notice that we’re mocking the HTTP client to respond with an “accepted” status:

val client = Client()
    .withMockResponseStatus("accepted")

And then in one of the nested contexts, we override the response status to “successful”:

beforeGroup {
    client.mockResponseStatus("successful")
}

With that all in mind, take a look at the second test suite:

describe("FetcherTwo") {
    lateinit var asyncQueue: AsyncQueue
    lateinit var fetcher: FetcherTwo

    val client = Client()

    // … more irrelevant variables here …

    beforeGroup {
        client.mockResponseStatus("successful")
    }

    beforeEachTest {
        asyncQueue = AsyncQueue()
        fetcher = FetcherTwo(asyncQueue, client)
    }

    describe("perform") {
        beforeEachTest {
            fetcher.perform()
        }

        it("tells client to do some other stuff") {
            assertThat(client.actions).contains("doSomeOtherStuff")
        }

        it("does not trigger an async polling job") {
            assertThat(asyncQueue).isEmpty()
        }

        // … more irrelevant tests here …

        context("when there is a need to check operation status async") {
            beforeGroup {
                client.mockResponseStatus("accepted")
            }

            it("triggers an async job to check the last operation") {
                assertThat(asyncQueue).contains(fetcher::checkLastOperation)
            }

            // … more irrelevant tests here …
        }

        // … more irrelevant contexts here …
    }
}

As you can see, the author of this test suite has chosen a different response status, “successful”, as the default “happy-path” scenario:

val client = Client()

beforeGroup {
    client.mockResponseStatus("successful")
}

(and used a different mocking style on top of that)

And the nested context overrides the response status to “accepted”:

beforeGroup {
    client.mockResponseStatus("accepted")
}

Now, with this example, you could figure out how to refactor it quickly enough, after inverting the default and custom scenarios for one of the test suites, right?

This example is elementary. What we had was more complicated. Imagine that your async polling and retrying logic (the part we were trying to refactor) depended on 4 factors. Then:

  • You’ll have 4 levels of nested contexts, each one describing one factor as a part of the overall scenario;
  • Each context may choose a different case as its default, and a different set of cases to override in the inner contexts.

As you can see, this can quickly become a mess, and very hard to refactor.
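To get a feel for the size of the problem, here is a small back-of-the-envelope sketch. It assumes that each of the 4 factors has only two possible values, which is my simplification and not something taken from the real code; `allScenarios` and `differingDefaults` are hypothetical helper names.

```kotlin
// Assumption for illustration only: every factor is binary (true or false).
// Each scenario is one combination of values, one per factor.
fun allScenarios(factorCount: Int): List<List<Boolean>> =
    (0 until (1 shl factorCount)).map { bits ->
        List(factorCount) { index -> ((bits shr index) and 1) == 1 }
    }

// Counts the factors on which two suites picked a different default value.
fun differingDefaults(a: List<Boolean>, b: List<Boolean>): Int =
    a.zip(b).count { (x, y) -> x != y }
```

Under that assumption, `allScenarios(4)` yields 16 combinations, and the “default path” of a suite is just one of them. Two suites written independently only line up if their authors happened to pick the same one; every factor counted by `differingDefaults` is a level where one suite’s default is the other suite’s override.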

If we had a classic flat test structure, we’d have a little bit of duplication in the test suite, but it would be so much easier to compare test suites side-by-side and refactor them.

Let me show you an example:

describe("FetcherOne") {
    lateinit var asyncQueue: AsyncQueue
    lateinit var fetcher: FetcherOne

    val client = Client()

    beforeEachTest {
        asyncQueue = AsyncQueue()
        fetcher = FetcherOne(asyncQueue, client)
    }

    it("perform - calls client") {
        client.mockResponseStatus("accepted")

        fetcher.perform()

        assertThat(client.actions).contains("doStuff")
    }

    it("perform - triggers async polling job when accepted") {
        client.mockResponseStatus("accepted")

        fetcher.perform()

        assertThat(asyncQueue).contains(fetcher::pollStatus)
    }

    it("perform - does not trigger async job when successful") {
        client.mockResponseStatus("successful")

        fetcher.perform()

        assertThat(asyncQueue).isEmpty()
    }
}

The second test suite will now look very similar (not worth showing here, but you can see the gist if you like).

As you can see, refactoring the duplication between two test suites now is almost a no-brainer.
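One way that refactoring could go is sketched below. This is only an illustration under stated assumptions: the classes are minimal hypothetical stand-ins (only the “accepted” and “successful” statuses and the job names come from the examples above), the queue stores job names as plain strings, and `queuedJobsAfterPerform` is a made-up helper, not code from the real project.

```kotlin
// Minimal hypothetical stand-ins so the sketch is self-contained.
class AsyncQueue {
    val jobs = mutableListOf<String>()
}

class Client {
    var status = "successful"

    fun mockResponseStatus(newStatus: String) {
        status = newStatus
    }
}

interface Fetcher {
    fun perform()
}

class FetcherOne(private val queue: AsyncQueue, private val client: Client) : Fetcher {
    override fun perform() {
        if (client.status == "accepted") queue.jobs.add("pollStatus")
    }
}

class FetcherTwo(private val queue: AsyncQueue, private val client: Client) : Fetcher {
    override fun perform() {
        if (client.status == "accepted") queue.jobs.add("checkLastOperation")
    }
}

// Shared flat scenario: every input is stated explicitly, nothing is inherited.
// Returns the jobs that were queued so each suite can assert on them.
fun queuedJobsAfterPerform(
    status: String,
    makeFetcher: (AsyncQueue, Client) -> Fetcher
): List<String> {
    val queue = AsyncQueue()
    val client = Client()
    client.mockResponseStatus(status)

    makeFetcher(queue, client).perform()

    return queue.jobs
}
```

Both suites can then call the same helper, for example `queuedJobsAfterPerform("accepted", ::FetcherOne)` and `queuedJobsAfterPerform("accepted", ::FetcherTwo)`, and differ only in the job they expect to find.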

Would you like to learn more about testing and TDD in Kotlin?

I have written a 4-part (350 pages in total) “Ultimate Tutorial: Getting Started With Kotlin” (+ more to come), and you can get it as a free bonus by becoming a member of my monthly newsletter.

On top of just Kotlin, it is full of goodies like TDD, Clean Code, Software Architecture, Business Impacts, 5 WHYs, Acceptance Criteria, Personas, and more.

—Sign up here and start learning how to build full-fledged Kotlin applications with TDD!

Conclusion

Now, I’m not trying to convince you of one style or another. I’m just trying to tell you that both styles have their strengths and weaknesses.

And as we’ve seen, even a strength of a style might let you shoot yourself in the foot. So be careful: with power comes responsibility.

Don’t make your hierarchical tests too complex and too nested!

Your Turn

If you like my ideas, consider giving this article lots of dev.to reactions 😊 and share your own experiences with both styles in the comments below:

What tricky situations with testing can YOU remember?


originally published on HackerNoon
picture credit: pexels
